From traverso at dm.unipi.it Wed Dec 1 02:25:59 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Wed Dec 1 02:26:23 2004
Subject: [gutvol-d] request for input (first timer here)
In-Reply-To: <20041130172908.237d8b64.stuart@ww1aviationlinks.cjb.net> (message from stuart on Tue, 30 Nov 2004 17:29:08 -0800)
References: <20041130172908.237d8b64.stuart@ww1aviationlinks.cjb.net>
Message-ID: <200412011025.iB1APx35025750@posso.dm.unipi.it>

>>>>> "stuart" == stuart writes:

    stuart> I am starting on a project to convert Jane's All The
    stuart> World's Aircraft 1919 to an ebook, any suggestions
    stuart> welcome. If interested, take a look at
    stuart> http://ww1aviationlinks.cjb.net/janes/index.html
    stuart> _______________________________________________ gutvol-d
    stuart> mailing list gutvol-d@lists.pglaf.org
    stuart> http://lists.pglaf.org/listinfo.cgi/gutvol-d

The 1913 edition has been proofread by Distributed Proofreaders, and is currently in post-processing. You might contact the post-processor for mutual suggestions.

Carlo

From flis at detk.com Wed Dec 1 11:35:12 2004
From: flis at detk.com (William Flis)
Date: Wed Dec 1 11:28:47 2004
Subject: [gutvol-d] Unicode versions?
Message-ID:

I prepared ("postprocessed" at Distributed Proofreaders) also a Unicode (UTF-8) version of this book, but it doesn't seem to have made it to posting at PG. The UTF-8 elements were all pronunciation symbols, which one might expect to be important in such a book. I'm currently working on another volume in this series, with similar symbols. I've also been working on a book on Native American sign language, which contains a good number of special symbols used in transcribing the NA spoken languages, also requiring UTF-8.

So my first question is, did my Unicode version just get lost somewhere? (I usually upload directly to PG but submitted this one the long way around through DP's "Post-Processing Verification" system so someone else would take a look at it, since it was my first attempt at Unicode.)

Second, if not, are Unicode versions welcome?

Bill Flis

> Society for Pure English, Tract 2, on English Homophones, Robert
> Bridges 14227
> [Link: http://www.gutenberg.net/1/4/2/2/14227 ]
> [Files: 14227.txt; 14227-8.txt; 14227-h.htm]

From dwidger at adelphia.net Wed Dec 1 11:41:11 2004
From: dwidger at adelphia.net (David Widger)
Date: Wed Dec 1 11:41:17 2004
Subject: [gutvol-d] Unicode versions?
References:
Message-ID: <009401c4d7dd$b64e1010$6901a8c0@novocon.net>

Hi Bill,

Unicode is very welcome. Here is the note I sent to Frank this morning and should have sent a copy to you.

Hi Frank,

I have been toying with this file for several days. The original problem was your provision of two html files, one Latin-1 and one Unicode. We can only post one html file in the directory for the eBook. There is one way around this (and the one I was thinking of trying) which is to make a main html file with links to the two html files you provided. However this went down the drain when I found the utf-8 html file has an invalid CSS statement (see the attached W3C CSS validation report).

So I elected to post the valid Latin-1 html file and the text file alone. If you object to my approach kindly provide a file such as I suggested above with links to both html files and be sure that all the html files validate on all three W3C checks.

Thanks,
David

PS. I sent Frank a copy of the CSS validator report but no longer have it--something about not allowing content in the prolog which I did not understand.

DW

From marcello at perathoner.de Thu Dec 2 06:02:13 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Dec 2 06:02:19 2004
Subject: [gutvol-d] Anybody want to test of this gutenberg browser ?
Message-ID: <41AF2065.9030900@perathoner.de>

-------- Original Message --------
From: Clif Flynt
Message-Id: <200412012104.iB1L4rm27796@clif.cflynt.com>
Subject: Re: Project Gutenberg Browser
To: marcello@perathoner.de (Marcello Perathoner)
Date: Wed, 1 Dec 2004 16:04:53 -0500 (EST)

Hi,

My apology for being slow in replying to your mail.

Being a spare-time project, the TkGutenbrowser doesn't get as much attention as it deserves, and the move from "works-for-me" to "suitable-for-general-use" is always slower than I'd like.

The browser software is currently at http://www.mod3.net:~clif

This is very much a work-in-progress, but the software does what I consider the minimal set of tasks now. It will read and display text, save text as PDB for C-Spot-Run on a PDA, and download non-text documents (images, sound files) to a disk file.

I've not added support for bookmarks yet and the help is rudimentary.

I believe that Project Gutenberg has changed the data in the catalog.rdf file since I started the project, and it appears that most (all?) of the information is now in catalog.rdf, instead of being split between catalog.rdf and GUTINDEX.ALL. My browser is only using catalog.rdf now, and not downloading the other files.

...
Clif

.... Clif Flynt ... http://www.cflynt.com ... clif@cflynt.com ...
..Tcl/Tk: A Developer's Guide (2'nd edition) - Morgan Kaufmann ..

--
Marcello Perathoner
webmaster@gutenberg.org

From Gutenberg9443 at aol.com Thu Dec 2 07:52:00 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Dec 2 07:52:18 2004
Subject: [gutvol-d] interim report on eBookWise reader, info about Rocket
Message-ID: <1a7.2c565d81.2ee09420@aol.com>

Hi all--

One quick note on the Rocket: If your power cord has gone out, call Fox International at 1-800-321-6993 for a replacement. If you're in the US it will cost about $33 including shipping. I don't know whether you can even get it outside the US. Don't try to go online to get it so you'll have a printout of the order; they'll just tell you to call the phone number.

Now to the eBookWise reader:

Judging from the info I have so far, its footprint is slightly larger than that of the Rocket, but it weighs less--about a pound, whereas Rocket weighs 22 oz. It's still slightly smaller than a trade paperback, though, and weighs considerably less. Some of its controls seem counterintuitive to me, but that may be just because I've used the Rocket so many years.

There are thousands (about 7,500 to be precise) of commercial books for those who can afford to buy them; they're at eBookWise.com, which is a subsidiary of FictionWise.com, and is presently engaged in converting all, or almost all, of its content into an eBookWise format. Some of them are new books, often bestsellers, and others are classics that are not yet out of copyright. If I had it to spend, I could spend a thousand dollars there in two shakes of a puppy-dog's tail. Alas, I don't have a thousand dollars.

Although loading your own content is right now rather clunky, involving upload to a server and then download through a telephone line, it is doable. You can keep your own bookshelf at the server, and download by telephone, so you don't have to take your computer on the road with you to change the books in your reader. They have their software engineers working on a direct USB download program, the kind Rocket has, but it isn't ready yet. This close to Christmas, they made the decision to make the device available now and fix the software later.

This is a quotation from eBookWise's propaganda sheet:

"In addition, the eBookwise-1150 can display your own personal content in the following file formats: plain text (.txt), rich text format (.rtf), Microsoft Word documents (.doc), HTML (.htm or .html), and Rocket eBook Editions (.rb)."

So that means that just about any free online book is readable on this reader, except for the far fewer than 1% that are available only in PDF. So--PG, PG Oz, Blackmask, http://www.sacred-texts.com, Phoenix-Library.com and umpteen dozen other sites are now your oyster, in a reader you can fit into any briefcase or backpack and most large-size handbags. (Phoenix-Library has good language-to-language dictionaries in a surprisingly large number of languages.)

It will hold only about 20 texts at a time, which was also normal for the Rocket unless you bought the extra large storage device when you ordered your Rocket. I did, and my Rocket holds about 50 texts. However, it has a slot for a SmartMedia card; I checked, and you can get SmartMedia cards holding anything from 4 MB up to half a gig. (The half-gig card costs about $250.)

It does not do much with illustrations, as it is grayscale only. However, it appears that at least some small illustrations can be put into it.

If you want one right now, go to eBookWise and create an account. Put $110 into your account. You can immediately order your machine, which will use 10 cents short of the entire $110, but it will then give you $20 in book credits. Wait to buy them until your device arrives, but you can go through and select them and put them into your cart now. Of course, you can deposit as much money as you want to into your account, and put the books into your shopping cart, to purchase when your device arrives.

This is not the best possible ebook reader, but it is the best presently available for anybody who is not content to read ebooks only on a computer or a PDA.

Anne

From joshua at hutchinson.net Thu Dec 2 08:10:12 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec 2 08:10:17 2004
Subject: [gutvol-d] interim report on eBookWise reader, info about Rocket
Message-ID: <20041202161012.185664F4C6@ws6-5.us4.outblaze.com>

Good review... Just one quick quibble.

The SmartMedia card is only supported up to 128MB and it lists for $30.11 at Amazon (which means you can probably find it cheaper other places, but that's a good ballpark figure).

Josh

From Gutenberg9443 at aol.com Thu Dec 2 08:31:59 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Dec 2 08:32:29 2004
Subject: [gutvol-d] interim report on eBookWise reader, info about Rocket
Message-ID: <199.3375a734.2ee09d7f@aol.com>

In a message dated 12/2/2004 9:10:38 AM Mountain Standard Time, joshua@hutchinson.net writes:

> The SmartMedia card is only supported up to 128MB and it lists for
> $30.11 at Amazon (which means you can probably find it cheaper other
> places, but that's a good ballpark figure).

Thank you. I wasn't sure about that. It's $35.18 at CompUSA, plus shipping, so the Amazon price is probably about the best.

Well, 128 MB will hold about 256 average texts, so that plus the built-in storage for another 20 books should suffice for most people for the average airline trip or emergency room wait!

The only time I read more than one complete book in either of those situations was the first time I flew after 9/11. I had to get to the airport two hours early and then the plane was two hours late. I spent most of the time sitting on the floor with my Rocket plugged in so I wouldn't drain the battery.

Anne

From blondeel at clipper.ens.fr Thu Dec 2 20:28:39 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Thu Dec 2 20:28:56 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203042839.GA3074@clipper.ens.fr>

Hello,

I hacked some scripts doing the following:

RTF -> XML

RTF: from Word, using a (very) simple stylesheet: just paragraphs, 3 title levels, footnotes, and italics
     Meta-information is in the properties of the document.
     My script can extract images too, if wanted.

XML: using a personal and simple DTD (embedded), probably easy to port to any more complete DTD, such as TEI

This is the hard part, and I am never quite sure it will not break in case the Word file is weird.

From that, I then did other (proof-of-concept) scripts to produce:

XML -> PG TXT
XML -> (LaTeX) -> PDF, DVI, PS (with hyperlinks)
XML -> valid HTML 4.01 (probably useless)
XML -> XHTML 1.0 Strict with some CSS (embedded)

The programming is very defensive, so when all transforms finish I am confident enough the stuff is right.

You can find examples of those formats at http://www.eleves.ens.fr/home/blondeel/ebooksgratuits/ (most of the books there don't have the meta-info properly set up, so don't worry too much about that).

My scripts also clean up small typography mistakes (they are specialized in French rules but can of course be taught anything). They will be used to help give PG nicer formats from the ebooksgratuits team (until now their Word macros could only produce PG TXT, which is not very sexy to read for the end user).

Regards,

From joshua at hutchinson.net Fri Dec 3 05:46:10 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec 3 05:46:13 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com>

I'm curious to see if your script can handle tables. That is our current biggest bugaboo when it comes to transforming to PG TXT format.

Josh

From blondeel at clipper.ens.fr Fri Dec 3 07:16:02 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Fri Dec 3 07:16:08 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com>
References: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com>
Message-ID: <20041203151602.GA10478@clipper.ens.fr>

On Fri, Dec 03, 2004 at 08:46:10AM -0500, Joshua Hutchinson wrote:
> I'm curious to see if your script can handle tables. That is our
> current biggest bugaboo when it comes to transforming to PG TXT
> format.

My DTD doesn't mention them (yet?). It focuses mainly on the French books of the ebooksgratuits site. I guess it can very easily be injected in a more complete DTD (TEI, Docbook, whatever).

I already did Perl (not XSLT!) translations of XML tables (Docbook, for example) to other formats (HTML: easy; LaTeX: harder...; TXT: w3m -dump of the HTML version is usually good enough) for other projects.

I heard there were now Perl modules able to deal with XML and XSLT so it should be even easier to take care of. XSLT-style of programming is not for me...

How complex are your tables and what do you need to do with them? Any example of (input, output desired, and constraints [API, language...] of the transformation)?
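
From joshua at hutchinson.net Fri Dec 3 07:52:25 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec 3 07:52:29 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>

The hard part is getting the table info within PG text 80 column width.

A typical table might be 4 columns wide and 5 rows tall. Here is a fairly simple one from a Baseball Guide text I'm working on...

Club.        Won.  Lost.  P.C.
Chicago       42    14    .788
Hartford      47    21    .691
St. Louis     45    19    .703
Boston        39    31    .557
Louisville    30    36    .455
Mutual        21    35    .375
Athletic      14    45    .237
Cincinnati     9    56    .135

Here is one a little more complex... It has more text columns.

                          THE RECORD OF 1875.

Club.           Won. Lost. P.C.    Club.               Won. Lost. P.C.
Boston ........  71    8   .809    St. Louis Reds ....   4   14   .222
Athletic ......  55   28   .756    Washington ........   4   22   .156
Hartford ......  54   28   .639    New Haven .........   7   39   .152
St. Louis* ....  29   39   .574    Centennial.........   2   13   .133
Philadelphia ..  37   31   .544    Western ...........   1   12   .077
Chicago .......  30   37   .448    Atlantic ..........   2   42   .065
Mutual ........  29   38   .426

FYI, this table becomes this in TEI markup (NOTE: I made the second Club column just continue under the first for simplicity's sake):

<table>
THE RECORD OF 1875.
<row><cell>Club.</cell><cell>Won.</cell><cell>Lost.</cell><cell>P.C.</cell></row>
<row><cell>Boston</cell><cell>71</cell><cell>8</cell><cell>.809</cell></row>
<row><cell>Athletic</cell><cell>55</cell><cell>28</cell><cell>.756</cell></row>
<row><cell>Hartford</cell><cell>54</cell><cell>28</cell><cell>.639</cell></row>
<row><cell>St. Louis</cell><cell>29</cell><cell>39</cell><cell>.574</cell></row>
<row><cell>Philadelphia</cell><cell>37</cell><cell>31</cell><cell>.544</cell></row>
<row><cell>Chicago</cell><cell>30</cell><cell>37</cell><cell>.448</cell></row>
<row><cell>Mutual</cell><cell>29</cell><cell>38</cell><cell>.426</cell></row>
<row><cell>St. Louis Reds</cell><cell>4</cell><cell>14</cell><cell>.222</cell></row>
<row><cell>Washington</cell><cell>4</cell><cell>22</cell><cell>.156</cell></row>
<row><cell>New Haven</cell><cell>7</cell><cell>39</cell><cell>.152</cell></row>
<row><cell>Centennial</cell><cell>2</cell><cell>13</cell><cell>.133</cell></row>
<row><cell>Western</cell><cell>1</cell><cell>12</cell><cell>.077</cell></row>
<row><cell>Atlantic</cell><cell>2</cell><cell>42</cell><cell>.065</cell></row>
</table>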
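
The 80-column constraint described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PGTEI converter or anyone's actual script: it assumes the table uses row and cell elements inside a table element, and the names SAMPLE, is_number, render_table, max_width and gap are invented for the example. Cells that parse as numbers are right-aligned, everything else is left-aligned, and the function refuses a table that would not fit.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a few rows of the 1875 table in <row>/<cell> markup.
SAMPLE = """<table>
<row><cell>Club.</cell><cell>Won.</cell><cell>Lost.</cell><cell>P.C.</cell></row>
<row><cell>Boston</cell><cell>71</cell><cell>8</cell><cell>.809</cell></row>
<row><cell>Athletic</cell><cell>55</cell><cell>28</cell><cell>.756</cell></row>
</table>"""


def is_number(text):
    """Return True for cells such as '71' or '.809'."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def render_table(xml_text, max_width=80, gap=2):
    """Render <row>/<cell> markup as aligned plain text, or fail if too wide."""
    root = ET.fromstring(xml_text)
    rows = [[(cell.text or "").strip() for cell in row.iter("cell")]
            for row in root.iter("row")]
    ncols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    total = sum(widths) + gap * (ncols - 1)
    if total > max_width:
        raise ValueError(f"table needs {total} columns, limit is {max_width}")
    lines = []
    for row in rows:
        cells = [c.rjust(w) if is_number(c) else c.ljust(w)
                 for c, w in zip(row, widths)]
        lines.append((" " * gap).join(cells).rstrip())
    return "\n".join(lines)


print(render_table(SAMPLE))
```

A table whose natural width exceeds the limit raises an error here rather than being truncated; deciding how to wrap such a table is a separate problem.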

From blondeel at clipper.ens.fr Fri Dec 3 08:53:49 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Fri Dec 3 08:53:55 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
References: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
Message-ID: <20041203165349.GA193@clipper.ens.fr>

On Fri, Dec 03, 2004 at 10:52:25AM -0500, Joshua Hutchinson wrote:
> The hard part is getting the table info within PG text 80 column width.

It is not always possible of course.

> FYI, this table becomes this in TEI markup (NOTE: I made the second
> Club column just continue under the first for simplicity's sake):

That looks simple enough.

Change it to HTML:

-=-=-=
[...]
-=-=-=

then replace:
  row -> tr
  cell -> td

then "w3m -dump table.html" gives:

$ w3m -dump table.html
THE RECORD OF 1875.
Club.          Won. Lost. P.C.
Boston         71   8     .809
Athletic       55   28    .756
Hartford       54   28    .639
St. Louis      29   39    .574
Philadelphia   37   31    .544
Chicago        30   37    .448
Mutual         29   38    .426
St. Louis Reds 4    14    .222
Washington     4    22    .156
New Haven      7    39    .152
Centennial     2    13    .133
Western        1    12    .077
Atlantic       2    42    .065

(the star after St. Louis has disappeared).

If you need it embedded in a program I can try to code the algorithm, depending on the programming language you want (Perl should be easy).

Then you can detect cells with just numbers in them should be right-aligned, etc.

It should also be easy to translate this to LaTeX for PDF/DVI/PS output.

From joshua at hutchinson.net Fri Dec 3 09:07:31 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec 3 09:07:36 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203170731.E8BE5EDEA0@ws6-1.us4.outblaze.com>

The problem I've always run into is where the table tries to grow beyond 80 characters wide. For instance, say that one row looks like this in the original book.

Data label that     Now we have a column     Now we have a column
is extremely long   of data that is also     of data that is also
and is broken up    very long and broken     very long and broken
accordingly over    up over multiple lines.  up over multiple lines.
multiple lines.

Most automated text converters will put each cell on one line with no line breaks.

A web browser will generate line breaks within cells so that the table will end up looking very similar to the above. I haven't tried w3m ... will it handle the above scenario? I've tried lynx dumping to a text file and IE/Mozilla dumping to a text, and they all fail miserably.
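
The row layout shown above, where each cell wraps inside its own column, can be approximated with Python's standard textwrap module. This is a rough sketch, not what w3m or any converter mentioned in this thread actually does; the function name wrap_row and the chosen column widths (20, 24, 24) are assumptions made for the example.

```python
import textwrap


def wrap_row(cells, widths, gap=2):
    """Lay out one table row, wrapping each cell to its own column width."""
    # textwrap.wrap returns [] for empty text, so fall back to one blank line.
    wrapped = [textwrap.wrap(text, width=w) or [""]
               for text, w in zip(cells, widths)]
    height = max(len(lines) for lines in wrapped)
    out = []
    for i in range(height):
        parts = [(lines[i] if i < len(lines) else "").ljust(w)
                 for lines, w in zip(wrapped, widths)]
        out.append((" " * gap).join(parts).rstrip())
    return out


row = [
    "Data label that is extremely long and is broken up accordingly "
    "over multiple lines.",
    "Now we have a column of data that is also very long and broken "
    "up over multiple lines.",
    "Now we have a column of data that is also very long and broken "
    "up over multiple lines.",
]

for line in wrap_row(row, widths=[20, 24, 24]):
    print(line)
```

With these widths the row occupies at most 20 + 24 + 24 characters plus two 2-character gaps, 72 in all, so it stays inside the 80-column limit whatever the cell text is.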

Josh

From marcello at perathoner.de Fri Dec 3 09:17:00 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Dec 3 09:17:08 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
References: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
Message-ID: <41B09F8C.8000704@perathoner.de>

Joshua Hutchinson wrote:

> <table>
> THE RECORD OF 1875.

Shouldn't that be

<table>
<head>THE RECORD OF 1875.</head>

?

--
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net Fri Dec 3 09:26:17 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec 3 09:26:31 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203172617.5A5439E832@ws6-2.us4.outblaze.com>

I didn't think you could have a <head> inside a <table> ... So I went back and looked at the TEI-Lite. It doesn't mention it, so I thought I was right. Then, just to be sure, I checked the full spec. There, it does mention <head> as a viable element inside a <table>.

So, you are right. <head> markup would be the more correct route.

Josh

From jussi.kukkonen at welho.com Fri Dec 3 11:00:07 2004
From: jussi.kukkonen at welho.com (Jussi Kukkonen)
Date: Fri Dec 3 11:00:04 2004
Subject: [gutvol-d] possible fix for overwide tables in PGTEI text (was: XML version of some books...)
Message-ID: <41B0B7B7.1010605@welho.com>

Joshua Hutchinson wrote:
> I'm curious to see if your script can handle tables. That is our
> current biggest bugaboo when it comes to transforming to PG TXT
> format.

Now that you mentioned it....

I've been playing with PGTEI, and encountered this problem (too wide tables) also. If anyone is wondering what we're talking about, please search for string "1271-95." in http://koti.welho.com/jkukkone/geo/teioutput.txt

Oh, feel free to see the html version also while you're there: http://koti.welho.com/jkukkone/geo/teioutput.html (warning for modem users - some images might still be pretty large).

So, I spent some time with Groff* and Tbl** manuals and I think I found a fix for this. Currently Tbl input tables look like this (3 rows, 2 columns):

***
1873.     Livingstone discovers Lake Moero.
1874-75.  Lieut. Cameron crosses equatorial Africa.
1875-94.  Élisée Reclus publishes his _Géographie Universelle_.
***

Tbl _can_ be instructed to wrap lines when needed by changing the input to this:

***
T{
1873.
T}	T{
Livingstone discovers Lake Moero.
T}
T{
1874-75.
T}	T{
Lieut. Cameron crosses equatorial Africa.
T}
T{
1875-94.
T}	T{
Élisée Reclus publishes his _Géographie Universelle_.
T}
***

According to my tests it works surprisingly well. Sometimes Tbl does wrap too eagerly, but I saw nothing that wasn't acceptable.

Marcello (or anyone with some authority on PGTEI), I can try and come up with a patch for tei2nroff-common.xsl, if that's wished for. Let me know.
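
The change of Tbl input described above is mechanical, so it can be sketched as a small Python helper. This is an illustration only, not the patch to tei2nroff-common.xsl and not how PGTEI does it: the function name tbl_row is invented, and the sketch assumes the default tab character as the Tbl column separator.

```python
def tbl_row(cells, sep="\t"):
    """Return one Tbl data row with every cell wrapped in a T{ ... T} text block.

    T{ ends a line, the cell text follows on its own line, and the closing
    T} starts a line; the separator joins it to the next cell's T{.
    """
    return sep.join("T{\n" + cell.strip() + "\nT}" for cell in cells)


rows = [
    ["1873.", "Livingstone discovers Lake Moero."],
    ["1874-75.", "Lieut. Cameron crosses equatorial Africa."],
]

for row in rows:
    print(tbl_row(row))
```

The helper only produces the data rows; the surrounding table start, format specification and table end that Tbl also needs are left out here.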
- jussi

*, ** For those not familiar with the more obscure unix tools: Groff, or GNU Troff, is a document typesetting tool used in PGTEI to produce TXT and PDB versions. Tbl is a table formatting tool used by Groff.

-- Jussi Kukkonen

From marcello at perathoner.de Fri Dec 3 11:07:11 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Dec 3 11:07:23 2004
Subject: [gutvol-d] possible fix for overwide tables in PGTEI text (was: XML version of some books...)
In-Reply-To: <41B0B7B7.1010605@welho.com>
References: <41B0B7B7.1010605@welho.com>
Message-ID: <41B0B95F.5030805@perathoner.de>

Jussi Kukkonen wrote:

> Tbl _can_ be instructed to wrap lines when needed by changing the input
> to this:
> ***
> T{
> 1873.
> T}	T{
> Livingstone discovers Lake Moero.
> T}
> T{
> 1874-75.
> T}	T{
> Lieut. Cameron crosses equatorial Africa.
> T}
> T{
> 1875-94.
> T}	T{
> Élisée Reclus publishes his _Géographie Universelle_.
> T}
> ***
>
> According to my tests it works surprisingly well. Sometimes Tbl does wrap
> too eagerly, but I saw nothing that wasn't acceptable.
>
> Marcello (or anyone with some authority on PGTEI), I can try and come up
> with a patch for tei2nroff-common.xsl, if that's wished for. Let me know.

The new forthcoming version 0.3 of the PGTEI converter already does this. (with a little help from the markup: you have to manually specify the width of the column.)

-- Marcello Perathoner webmaster@gutenberg.org

From joshua at hutchinson.net Fri Dec 3 11:20:46 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec 3 11:20:54 2004
Subject: [gutvol-d] possible fix for overwide tables in PGTEI text (was:XML version of some books...)
Message-ID: <20041203192046.DBD1C10993F@ws6-4.us4.outblaze.com>

----- Original Message -----
From: "Marcello Perathoner"
>
> The new forthcoming version 0.3 of the PGTEI converter already does this.
> (with a little help from the markup: you have to manually specify the width of
> the column.)
> I saw that in the preliminary document you sent me, Marcello. I haven't had a chance to really dig into it yet. (US Thanksgiving holiday has put me behind) Dumb, off-the-top-of-my-head question: Would it be possible for the converter to assume that if no manual width is specified, that it should just divide the table width by the number of columns and apply that value automatically to each column? That way, if a quick-and-dirty table will work, no further markup is needed. But if special formatting is needed (perhaps for a really big, complex table?), the etext preparer can take the time to do so. Josh From blondeel at clipper.ens.fr Fri Dec 3 12:12:28 2004 From: blondeel at clipper.ens.fr (Sebastien Blondeel) Date: Fri Dec 3 12:12:44 2004 Subject: [gutvol-d] XML version of some books of PG (and other formats) In-Reply-To: <20041203170731.E8BE5EDEA0@ws6-1.us4.outblaze.com> References: <20041203170731.E8BE5EDEA0@ws6-1.us4.outblaze.com> Message-ID: <20041203201228.GA5096@clipper.ens.fr> On Fri, Dec 03, 2004 at 12:07:31PM -0500, Joshua Hutchinson wrote: > A web browser will generate line breaks within cells so that the table > will end up looking very similar to the above. I haven't tried w3m > ... will it handle the above scenario? I've tried lynx dumping to a You should have. Yes it does: -=-=-= $ cat /tmp/toto.html
Data label that is extremely long and is broken up accordingly over multiple lines. Now we have a column of data that is also very long and broken up over multiple lines. Now we have a column of data that is also very long and broken up over multiple lines.
$ w3m -cols 72 -dump /tmp/toto.html Data label that is Now we have a column of Now we have a column of extremely long and is data that is also very data that is also very broken up accordingly long and broken up over long and broken up over over multiple lines. multiple lines. multiple lines. $ w3m -cols 48 -dump /tmp/toto.html Data label that Now we have a Now we have a is extremely column of data column of data long and is that is also that is also broken up very long and very long and accordingly broken up over broken up over over multiple multiple lines. multiple lines. lines. -=-=-= (Note: for the 72 columns version I don't know why there is an extra space between columns 1 and 2. Probably a bug of w3m: it was already there in the base-ball example. This is easy to detect and fix I guess: use ``border=1'' and clean out the frames: $ w3m -cols 76 -dump /tmp/toto.html +-------------------------------------------------------------------------+ |Data label that is |Now we have a column of |Now we have a column of | |extremely long and is |data that is also very |data that is also very | |broken up accordingly |long and broken up over |long and broken up over | |over multiple lines. |multiple lines. |multiple lines. | +-------------------------------------------------------------------------+ ^^^ ^^ You can detect those useless empty columns and remove them (or decide to have 2 or 3 blanks between columns). Doing this without frames is more dangerous, and the columns more difficult to detect: $ w3m -cols 72 -dump /tmp/toto.html Data la el that is Now we have a column of Now we have a column of extreme y long and is data that is also very data that is also very brokenx p accordingly long and broken up over long and broken up over over mu tiple lines. multiple lines. multiple lines. ^^^ this is not a column break. > text file and IE/Mozilla dumping to a text, and they all fail > miserably. 
w3m is better than lynx for tables (and many other things: it is able to display images in console mode and inside xterms!). links is good too. As for the example given in a later message: -=-=-= $ w3m -cols 72 -dump teioutput.html | head 150. Ptolemy publishes his geography. 230. The Peutinger Table pictures the Roman roads. 400-14. Fa-hien travels through and describes Afghanistan and India. 499. Hoei-Sin said to have visited the kingdom of Fu-sang, 20,000 furlongs east of China (identified by some with California). 518-21. Hoei-Sing and Sung-Yun visit and describe the Pamirs and the Punjab. 540. Cosmas Indicopleustes visits India, and combats the sphericity of the globe. 629-46. Hiouen-Tshang travels through Turkestan, Afghanistan, India, $ perl ~/work/PGDP/ebooksgratuits/Fmt.pl 72 teioutput.html | head
150.Ptolemy publishes his geography.
230.The Peutinger Table pictures the Roman roads.
400-14.Fa-hien travels through and describes Afghanistan and India.
499.Hoei-Sin said to have visited the kingdom of Fu-sang, 20,000 furlongs east of China (identified by some with California).
518-21.Hoei-Sing and Sung-Yun visit and describe the -=-=-= Note: sometimes the columns in w3m are, weirdly, unbalanced. I don't have an example right here but I guess you can help with percentage-width attributes in the columns (if that is possible at all in TEI). I am using an old version of w3m, too (Debian stable). With links: -=-=-= $ links -dump teioutput.html | head 150. Ptolemy publishes his geography. 230. The Peutinger Table pictures the Roman roads. 400-14. Fa-hien travels through and describes Afghanistan and India. 499. Hoei-Sin said to have visited the kingdom of Fu-sang, 20,000 furlongs east of China (identified by some with California). 518-21. Hoei-Sing and Sung-Yun visit and describe the Pamirs and the Punjab. 540. Cosmas Indicopleustes visits India, and combats the sphericity of the globe. 629-46. Hiouen-Tshang travels through Turkestan, Afghanistan, India, $ links -dump toto.html +------------------------------------------------------------------------+ | Data label that is | Now we have a column of | Now we have a column | | extremely long and is | data that is also very | of data that is also | | broken up accordingly | long and broken up over | very long and broken | | over multiple lines. | multiple lines. | up over multiple | | | | lines. | +------------------------------------------------------------------------+ $ links -dump toto.html # without border Data label that is Now we have a column of Now we have a column of extremely long and is data that is also very data that is also very broken up accordingly over long and broken up over long and broken up over multiple lines. multiple lines. multiple lines. 
-=-=-= From joshua at hutchinson.net Fri Dec 3 12:36:36 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Dec 3 12:36:45 2004 Subject: [gutvol-d] XML version of some books of PG (and other formats) Message-ID: <20041203203636.C862E2F9A7@ws6-3.us4.outblaze.com> Anyone have a link to w3m in a windows executable (command line is fine, I just don't have access to a way to compile the source where I'm at right now)? This definitely looks interesting. Josh ----- Original Message ----- From: "Sebastien Blondeel" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] XML version of some books of PG (and other formats) Date: Fri, 3 Dec 2004 21:12:28 +0100 > > On Fri, Dec 03, 2004 at 12:07:31PM -0500, Joshua Hutchinson wrote: > > A web browser will generate line breaks within cells so that the table > > will end up looking very similar to the above. I haven't tried w3m > > ... will it handle the above scenario? I've tried lynx dumping to a > > You should have. Yes it does: > > -=-=-= > $ cat /tmp/toto.html > > > > > > >
Data label that is extremely long and is broken up accordingly over > multiple lines.Now we have a column of data that is also very long and broken up > over multiple lines.Now we have a column of data that is also very long and broken up > over multiple lines.
> > $ w3m -cols 72 -dump /tmp/toto.html > Data label that is Now we have a column of Now we have a column of > extremely long and is data that is also very data that is also very > broken up accordingly long and broken up over long and broken up over > over multiple lines. multiple lines. multiple lines. > > $ w3m -cols 48 -dump /tmp/toto.html > Data label that Now we have a Now we have a > is extremely column of data column of data > long and is that is also that is also > broken up very long and very long and > accordingly broken up over broken up over > over multiple multiple lines. multiple lines. > lines. > -=-=-= > > (Note: for the 72 columns version I don't know why there is an extra > space between columns 1 and 2. Probably a bug of w3m: it was already > there in the base-ball example. This is easy to detect and fix I guess: > use ``border=1'' and clean out the frames: > > $ w3m -cols 76 -dump /tmp/toto.html > +-------------------------------------------------------------------------+ > |Data label that is |Now we have a column of |Now we have a column of | > |extremely long and is |data that is also very |data that is also very | > |broken up accordingly |long and broken up over |long and broken up over | > |over multiple lines. |multiple lines. |multiple lines. | > +-------------------------------------------------------------------------+ > ^^^ ^^ > > You can detect those useless empty columns and remove them (or decide to have 2 > or 3 blanks between columns). Doing this without frames is more dangerous, and > the columns more difficult to detect: > > $ w3m -cols 72 -dump /tmp/toto.html > Data la el that is Now we have a column of Now we have a column of > extreme y long and is data that is also very data that is also very > brokenx p accordingly long and broken up over long and broken up over > over mu tiple lines. multiple lines. multiple lines. > ^^^ > this is not a column break. 
> > > text file and IE/Mozilla dumping to a text, and they all fail > > miserably. > > w3m is better than lynx for tables (and many other things: it is able to > display images in console mode and inside xterms!). links is good too. > > As for the example given in a later message: > > -=-=-= > $ w3m -cols 72 -dump teioutput.html | head > 150. Ptolemy publishes his geography. > 230. The Peutinger Table pictures the Roman roads. > 400-14. Fa-hien travels through and describes Afghanistan and India. > 499. Hoei-Sin said to have visited the kingdom of Fu-sang, 20,000 > furlongs east of China (identified by some with California). > 518-21. Hoei-Sing and Sung-Yun visit and describe the Pamirs and the > Punjab. > 540. Cosmas Indicopleustes visits India, and combats the > sphericity of the globe. > 629-46. Hiouen-Tshang travels through Turkestan, Afghanistan, India, > > $ perl ~/work/PGDP/ebooksgratuits/Fmt.pl 72 teioutput.html | head > > > > > >
150.Ptolemy publishes his geography.
230.The Peutinger Table pictures the Roman > roads.
400-14.Fa-hien travels through and describes > Afghanistan and India.
499.Hoei-Sin said to have visited the kingdom of > Fu-sang, 20,000 furlongs east of China (identified by some with > California).
518-21.Hoei-Sing and Sung-Yun visit and describe the > -=-=-= > > Note: sometimes the columns in w3m are, weirdly, unbalanced. I don't have an > example right here but I guess you can help with percentage-width attributes in > the columns (if that is possible at all in TEI). I am using an old version of > w3m, too (Debian stable). > > With links: > > -=-=-= > $ links -dump teioutput.html | head > 150. Ptolemy publishes his geography. > 230. The Peutinger Table pictures the Roman roads. > 400-14. Fa-hien travels through and describes Afghanistan and India. > 499. Hoei-Sin said to have visited the kingdom of Fu-sang, 20,000 > furlongs east of China (identified by some with California). > 518-21. Hoei-Sing and Sung-Yun visit and describe the Pamirs and the > Punjab. > 540. Cosmas Indicopleustes visits India, and combats the sphericity > of the globe. > 629-46. Hiouen-Tshang travels through Turkestan, Afghanistan, India, > > $ links -dump toto.html > +------------------------------------------------------------------------+ > | Data label that is | Now we have a column of | Now we have a column | > | extremely long and is | data that is also very | of data that is also | > | broken up accordingly | long and broken up over | very long and broken | > | over multiple lines. | multiple lines. | up over multiple | > | | | lines. | > +------------------------------------------------------------------------+ > > $ links -dump toto.html # without border > Data label that is Now we have a column of Now we have a column of > extremely long and is data that is also very data that is also very > broken up accordingly over long and broken up over long and broken up over > multiple lines. multiple lines. multiple lines. 
> -=-=-= > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Gutenberg9443 at aol.com Fri Dec 3 18:49:38 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Dec 3 18:50:01 2004 Subject: [gutvol-d] Update 2 on the eFictionWise Message-ID: <159.459950f3.2ee27fc2@aol.com> Hi all-- My eFictionWise 1150 arrived this morning, and I have been fiddling with it ever since. Ten seconds after I got my Rocket out of its box, I was in love with it. If my first experience with a dedicated ebook reader had been an eFictionWise 1150, I would never have even tried to use a dedicated ebook reader again. It is nicely packaged. It feels good in the hand. The problem of having to point the down arrow to go up and the up arrow to go down has been remedied. BUT-- (1) There seem to be only two type sizes available, larger and smaller. Although the Rocket has the same built-in capacity, it also has a way to download other font sizes from the RocketLibrary. (2) With my vision problems, I need black on white. What I'm getting is grey-beige on white-grey-beige. The controls do not allow me to create black on white. I will be able to read the books in bed, because they are backlighted, but I'm not going to be able to do it without glasses, as I can the Rocket. (3) I still haven't figured out how to do most things. The biggest problem is that three of the four on-screen icons don't mean the same as they do on the Rocket. All kinds of pulldown menus are located in very illogical places. (4) I think I have figured out how to highlight passages, but not how to insert bookmarks. (5) The write-and-draw feature does not work. I have not checked out the keyboard yet, but it's pretty useless if I can't get back to the notes I have made. That's why I need bookmarks. (6) The method of downloading books is a total, utter, and complete nightmare. 
When I am downloading a Rocket or Microwave Reader book from FictionWise.com, I pay for it and then I download it onto my screen and then, if it's a Rocket, it goes directly to the RocketLibrarian, which then asks me if I want to put it into the Rocket eBook reader. If I say yes, the job is done in a few seconds. If I say no, the file sits right there in my RocketLibrarian on my computer until I need it.

If I want to import a personal file, from PG or any of several other sources, I make sure that I have saved it in .txt or .htm. Then I go into the RocketLibrarian, press a button and import the file, name the file, and then do exactly what I do for a book that is already .rb.

Now, read the following eBookWise version of this task:

I had put two ebooks in my eBookWise cart a couple of days ago. When I went back today and tried to buy them, the program didn't let me. Instead, it told me that I already had them.

In order to try out capabilities, I picked out another ebook in which I was mildly interested and bought it. I had to jump through hoops to identify myself and my online "bookshelf." Then, in order to get it onto my eBookWise reader, I had to use a special heavy telephone cord to get it from the "bookshelf" to the reading device.

Then I went to load one of my personal books onto the eBookWise. In order to do this, I had to go to my computer and then upload the file into my "bookshelf" online. This required me to jump through a few more hoops. I am allowed only 10 MB of "personal" stuff. Then I had to go back to the reader and play telephone games a while longer.

I'll try it out again when the new program allowing USB uploads onto the eBookWise, and allowing the unlimited library that the Rocket allows, arrives. In the meantime, I'm using my Rocket.

My own recommendation is that eBookWise take a VERY good look at a Rocket and then rewrite the abominable software for this device.

I'm sorry. I wish I had better news.
Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041203/f05f1a89/attachment.html From ciesiels at bigpond.net.au Sat Dec 4 00:53:43 2004 From: ciesiels at bigpond.net.au (Michael Ciesielski) Date: Sat Dec 4 00:54:55 2004 Subject: [gutvol-d] promo.net/pg Message-ID: <41B17B17.6070805@bigpond.net.au> promo.net/pg is still the second result for a Google search for "Project Gutenberg". Is there any reason why this site still exists other then as a redirect to gutenberg.org? At the moment, the top two choices are: ** Welcome to Project Gutenberg - Project Gutenberg PROJECT GUTENBERG OFFICIAL HOME SITE - INDEX -- Free Books On-Line ... Not knowing anything about PG, I'd be most inclined to go for the second, which is promo.net. Once opened, the promo.net site provides no indication that it is not the real PG website, and even the link to gutenberg.net is phrased such that a casual glance would make one think that promo.net/pg was the official site. Mike From marcello at perathoner.de Sat Dec 4 02:36:51 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Dec 4 02:37:15 2004 Subject: [gutvol-d] possible fix for overwide tables in PGTEI text (was:XML version of some books...) In-Reply-To: <20041203192046.DBD1C10993F@ws6-4.us4.outblaze.com> References: <20041203192046.DBD1C10993F@ws6-4.us4.outblaze.com> Message-ID: <41B19343.8030201@perathoner.de> Joshua Hutchinson wrote: > Would it be possible for the converter to assume that if no manual > width is specified, that it should just divide the table width by the > number of columns and apply that value automatically to each column? That is exactly what it does unless you specify some column width. 
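The fallback described in this exchange is easy to picture in code. What follows is only a rough Python sketch, under the assumption that the table width is shared out evenly and any remainder goes to the leftmost columns. The function name `default_column_widths` is made up for this example, and the real converter may well do it differently, for instance by reserving space for gutters between columns.

```python
def default_column_widths(table_width, n_columns):
    """Share `table_width` characters equally among `n_columns` columns.

    Assumption of this sketch: leftover characters are handed out one
    each to the leftmost columns, and no space is reserved for gutters.
    """
    if n_columns < 1:
        raise ValueError("a table needs at least one column")
    base, extra = divmod(table_width, n_columns)
    return [base + 1 if i < extra else base for i in range(n_columns)]


if __name__ == "__main__":
    print(default_column_widths(72, 3))  # [24, 24, 24]
```

A table that still looks wrong with these defaults is the case where the manual column width in the markup comes in.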
-- Marcello Perathoner webmaster@gutenberg.org

From blondeel at clipper.ens.fr Sat Dec 4 05:30:19 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Sat Dec 4 05:30:27 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203203636.C862E2F9A7@ws6-3.us4.outblaze.com>
References: <20041203203636.C862E2F9A7@ws6-3.us4.outblaze.com>
Message-ID: <20041204133019.GD27551@clipper.ens.fr>

On Fri, Dec 03, 2004 at 03:36:36PM -0500, Joshua Hutchinson wrote:
> Anyone have a link to w3m in a windows executable (command line is
> fine, I just don't have access to a way to compile the source where
> I'm at right now)? This definitely looks interesting.

A friend of mine competent in Windows stuff suggests using the following:

==== xml2txt.js ====
// Print every <p> element of an XML file as plain text, wrapped at 77 columns.
var x = new ActiveXObject("Msxml2.FreeThreadedDOMDocument");
x.async = false;                 // load synchronously so the tree is ready below
x.load(WScript.Arguments(0));
var p = x.documentElement.selectNodes("//p");
for (var it = new Enumerator(p); !it.atEnd(); it.moveNext()) {
  // Collapse every run of whitespace to one space (the g flag is needed
  // for that; without it only the first run is replaced).
  var t = it.item().text.replace(/\s+/g, " ");
  if (t.charAt(0) == " ") t = t.slice(1);
  var l = t.length;
  var i = 0;
  var j = 0;
  while (i < l) {
    j = i + 77;
    if (j < l) {
      // Break at the last space at or before column 77 ...
      j = t.lastIndexOf(" ", j);
      if (j < i) {
        // ... or, for an overlong word, at the first space after it.
        j = t.indexOf(" ", i);
        if (j == -1) j = l;
      }
    } else {
      j = l;                     // the rest of the paragraph fits on one line
    }
    WScript.Echo(t.slice(i, j));
    i = j + 1;
  }
  WScript.Echo("");              // blank line between paragraphs
}
WScript.Quit(0);
====================

Run it with the following command:

-=-=-=
cscript //nologo xml2txt.js your_file.xml
-=-=-=

With the paragraphs in the XML marked as <p>
(Cf the selectNodes call). From marcello at perathoner.de Sat Dec 4 11:02:37 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Dec 4 11:02:45 2004 Subject: [gutvol-d] XML version of some books of PG (and other formats) In-Reply-To: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com> References: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com> Message-ID: <41B209CD.8050406@perathoner.de> Joshua Hutchinson wrote: > I'm curious to see if your script can handle tables. That is our > current biggest bugaboo when it comes to transforming to PG TXT > format. HTML, TXT and TEI versions of the 0.3 docs are up at: http://www.gutenberg.org/tei/marcello/0.3/doc/ There are two tables in the docs, a small one and a bigger one that needs manual specifying of the column width. -- Marcello Perathoner webmaster@gutenberg.org From sly at victoria.tc.ca Sat Dec 4 11:37:49 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat Dec 4 11:38:01 2004 Subject: [gutvol-d] promo.net/pg In-Reply-To: <41B17B17.6070805@bigpond.net.au> References: <41B17B17.6070805@bigpond.net.au> Message-ID: On Sat, 4 Dec 2004, Michael Ciesielski wrote: > promo.net/pg is still the second result for a Google search for "Project > Gutenberg". Is there any reason why this site still exists other then as > a redirect to gutenberg.org? My understanding is that there is a problem with who has write permissions for those particular pages. There was some discussion of this on gutvol-d a while ago, but it was before this list was moved to pglaf.org so I don't know if you could find it archived. 
I have sent emails to many people who have mentioned the URL promo.net/pg on web pages or in newsgroups to let them know that the current, most correct URL is http://www.gutenberg.org/ Andrew From joshua at hutchinson.net Sat Dec 4 14:10:47 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat Dec 4 14:10:57 2004 Subject: =?iso-8859-1?q?Re:_[gutvol-d]_possible_fix_for_overwide_tables?= =?iso-8859-1?q?_in_PGTEI_text=09(was:XMLversion_of_some_books.?= =?iso-8859-1?q?..)?= Message-ID: <20041204221047.51C049E79E@ws6-2.us4.outblaze.com> ----- Original Message ----- From: "Marcello Perathoner" > > Joshua Hutchinson wrote: > > > Would it be possible for the converter to assume that if no manual > > width is specified, that it should just divide the table width by the > > number of columns and apply that value automatically to each column? > > That is exactly what it does unless you specify some column width. > > Good. Thanks for clearing that up for me. Josh From Gutenberg9443 at aol.com Sat Dec 4 14:45:28 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Dec 4 14:45:42 2004 Subject: [gutvol-d] Update on eBookWise -more sanguine Message-ID: <1ee.303f8519.2ee39808@aol.com> I'm feeling a little better about the eBookWise reader than I did yesterday. The problem with getting ebooks from the online bookshelf into my reader turned out to be a problem with the telephone company serving the bookshelf. I checked on that and this afternoon I was able to download with no problem at all. The "handwriting" feature suddenly started working. I don't know why it wasn't working before. I'm still not happy about the contrast, and obviously, until an eBookWise Librarian program allows personal content to be used without limit, I'm not going to be happy about the present very limited personal content, especially since I like to do a lot of my editing on my Rocket. 
It will actually be far easier on eBookWise once unlimited personal content is allowed, because it's much easier for me to write in the changes and then transfer by hand to the computer than it is to monkey around with punching each letter on a teeny little keyboard with the stylus. eBookWise took care of a rather peculiar problem with my ability to order a book. Their technicians are noticeably quick to correct problems that do not entail new programs, and their customer service reps also are prompt. So I think that once the USB computer to reader problem is solved, thereby solving the problem of extensive use of personal content, the reader will be highly user-friendly. Obviously eBookWise would rather I purchase books from them than use my own content, and I will definitely be purchasing as many books as I can afford. As I have been saying for years, I can't imagine anyone who has given a fair trial to a good ebook reader chosing to go back to tree books if there's an ebook available. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041204/cea73bfd/attachment.html From alex at awstudios.net Sat Dec 4 15:09:37 2004 From: alex at awstudios.net (Alex Wilson) Date: Sat Dec 4 15:09:56 2004 Subject: [gutvol-d] NYT upbeat on e-books Message-ID: Just saw this note on Slashdot, though it'd be of interest. http://slashdot.org/article.pl?sid=04/12/04/181228 "Sunday's NYT Book Review will carry an upbeat article on e-books, complete with mention of the New York Public Library's impressive 3,000-title efforts...." Links and discussion at the above link. Alex. 
http://www.alexwilson.com - Alex Wilson Studios
http://www.telltaleweekly.org - Funding a Free Audiobook Library

From nwolcott2 at kreative.net Sat Dec 4 14:00:47 2004
From: nwolcott2 at kreative.net (Norm Wolcott)
Date: Sat Dec 4 19:10:09 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
References: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
Message-ID: <017101c4da77$e350c020$2371fea9@gateway>

As I recall, the 80-column rule didn't use to be a hard and fast rule for tables. When a table contained too much information, one was supposed to expand it by the minimum amount necessary; at least that is what I recall MH saying.

nwolcott2@post.harvard.edu
Friar Wolcott, Gutenberg Abbey, Sherwood Forrest

----- Original Message -----
From: "Joshua Hutchinson"
To: "Project Gutenberg Volunteer Discussion"
Sent: Friday, December 03, 2004 10:52 AM
Subject: Re: [gutvol-d] XML version of some books of PG (and other formats)

> The hard part is getting the table info within PG text 80 column width.
>
> A typical table might be 4 columns wide and 5 rows tall.
>
> Here is a fairly simple one from a Baseball Guide text I'm working on...
>
> Club.        Won.  Lost.  P.C.
> Chicago       42    14    .788
> Hartford      47    21    .691
> St. Louis     45    19    .703
> Boston        39    31    .557
> Louisville    30    36    .455
> Mutual        21    35    .375
> Athletic      14    45    .237
> Cincinnati     9    56    .135
>
> Here is one a little more complex... It has more text columns.
>
> THE RECORD OF 1875.
> Club.           Won. Lost. P.C.   Club.               Won. Lost. P.C.
> Boston ........  71    8   .809   St. Louis Reds ....   4   14   .222
> Athletic ......  55   28   .756   Washington ........   4   22   .156
> Hartford ......  54   28   .639   New Haven .........   7   39   .152
> St. Louis* ....  29   39   .574   Centennial.........   2   13   .133
> Philadelphia ..  37   31   .544   Western ...........   1   12   .077
> Chicago .......  30   37   .448   Atlantic ..........   2   42   .065
> Mutual ........  29   38   .426
>
> FYI, this table becomes this in TEI markup (NOTE: I made the second Club column just continue under the first for simplicity's sake):
>
> THE RECORD OF 1875.
> Club. Won. Lost. P.C.
> Boston 71 8 .809
> Athletic 55 28 .756
> Hartford 54 28 .639
> St. Louis 29 39 .574
> Philadelphia 37 31 .544
> Chicago 30 37 .448
> Mutual 29 38 .426
> St. Louis Reds 4 14 .222
> Washington 4 22 .156
> New Haven 7 39 .152
> Centennial 2 13 .133
> Western 1 12 .077
> Atlantic 2 42 .065
>
> > ----- Original Message ----- > From: "Sebastien Blondeel" > To: "Project Gutenberg Volunteer Discussion" > Subject: Re: [gutvol-d] XML version of some books of PG (and other formats) > Date: Fri, 3 Dec 2004 16:16:02 +0100 > > > > > On Fri, Dec 03, 2004 at 08:46:10AM -0500, Joshua Hutchinson wrote: > > > I'm curious to see if your script can handle tables. That is our > > > current biggest bugaboo when it comes to transforming to PG TXT > > > format. > > > > My DTD doesn't mention them (yet?). It focuses mainly on the French > > books of the ebooksgratuits site. I guess it can very easily be injected > > in a more complete DTD (TEI, Docbook, whatever). > > > > I already did Perl (not XSLT!) translations of XML tables (Docbook, for > > example) to other formats (HTML: easy; LaTeX: harder...; TXT: w3m -dump > > of the HTML version is usually good enough) for other projects. > > > > I heard there were now Perl modules able to deal with XML and XSLT so it > > should be even easier to take care of. XSLT-style of programming is not > > for me... > > > > How complex are your tables and what do you need to do with them? Any > > example of (input, output desired, and constraints [API, language...] of > > the transformation)? > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > From Gutenberg9443 at aol.com Tue Dec 7 11:53:19 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Tue Dec 7 11:53:34 2004 Subject: [gutvol-d] NYT upbeat on e-books; so am I. 
Message-ID: In a message dated 12/4/2004 4:09:58 PM Mountain Standard Time, alex@awstudios.net writes: "Sunday's NYT Book Review will carry an upbeat article on e-books, complete with mention of the New York Public Library's impressive 3,000-title efforts...." Good news. But, as has been pointed out often, nobody really wants to sit at the computer or take the computer to bed in order to read a book. Nobody is going to provide book-size ebook readers for very long unless doing so becomes financially feasible. So if you don't want to read on a computer or on a PDA, I would appreciate it if anybody who can afford it would go to either eBookWise.com or FictionWise.com and spend at least $20 a month. If you don't have an ebook reader you can download Microsoft Reader to your desktop free; however, if you purchase th e 1150 you will have a highly acceptable tool you can use for many years. By buying all the remaining 1150s and making them available dirt cheap; transforming over 7500 ebooks, mostly proprietary, into the right format for the 1150; hiring software engineers to fix perceived problems; and hiring hardware engineers to make improved readers, FictionWise has stuck its neck so far out it looks like a giraffe. Now we need to feed that giraffe. As to my complaints about the 1150, I was mistaken on most of them. Some of the changes from Rocket are definitely an improvement. For example, so far I have zorched TWO Rocket powercords because the location of the cord port is such that the cord is often bent at a right angle. The 1150's cord port is at the top, which obviates that problem. It is possible to insert bookmarks (I was just plain wrong on that earlier). It is also possible to handwrite your notes to yourself as you're reading. I have only two remaining objections: First, of course, is the limited ability to use personal material; that is being worked on right now, and will be fixed as soon as possible by allowing direct USB downloading from your computer. 
The other, which I hadn't noticed earlier, is that there is no dictionary capability. I use that extensively in the Rocket, both English dictionary and language-to-language dictionaries. I do not know whether there is any intention of adding that. I won't update again until the USB problem is remedied. But on second thought, if you have it to spare, spend $50 a month at Fictionwise and/or eBookwise. Unfortunately, $20 is my limit and I don't always have that. But I think that the combination of free ebooks from us and many other sources, and commercial ebooks, is going to be a long-range win for all of us. Anne PS--Yes, you can read at the beach if you keep your 1150 inside a ziplock plastic bag, though I wouldn't do it because the possibility of somebody stealing it or walking on it is too high. As for underwater . . . If you're underwater watch the fishies instead of reading a book. I still wouldn't read it in the bathtub, but then I never read in the bathtub anyway since I dropped a rather expensive library book into the water. AW -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041207/e746d584/attachment.html From sly at victoria.tc.ca Wed Dec 8 00:28:04 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Dec 8 00:28:26 2004 Subject: [gutvol-d] PG books used by visually impaired Message-ID: Dear fellow PG volunteers, Here's a recent newsgroup posting that I came across showing more of the use that people do get out of the texts that we are creating. Andrew Newsgroups: alt.disability.blind.social Date: 2004-12-07 11:15:06 PST I often listen to books from the Gutenberg Project on my laptop. Most of those books are available from the US Library of Congress on audio cassette tape. So mainly, when I listen to a book on my laptop, it's because I forgot to order another book and am having a "book emergency". 
You know, when you finish one book and have nothing else to read and it's cold and rainy outside and there's nothing on TV? But I've listened to a lot of books on my laptop and after a while you don't notice the voice -- if it's a good book. From nihil_obstat at mindspring.com Wed Dec 8 07:16:11 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Wed Dec 8 07:16:16 2004 Subject: [gutvol-d] PG books used by visually impaired Message-ID: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net> On a related thread, does anyone know a user-friendly way to make mp3s (or other format) with digitized voices out of P.G. e-books? Specifically looking for using typical software on Windows or Mac format machines. I have been able to get newer versions of Adobe Acroread to "Read Out Loud" (an option under "View"). This has a couple of problems, though: 1) Most people do not have access to Acrobat to make PDF files. Acroread is free and lets you read PDF files, but you need Acrobat (somewhat expensive) to make PDFs. 2) This is fine for listening at your computer, but I could not find a way to export to MP3 for listening to later using other equipment. Looked for an option in RealPlayer, but no dice. This is specifically for someone who drives a lot, and wants books-on-disk. Thanks. -Dennis McCarthy anno Domini MMIIII, a.d. VI Id. Dec., dies Mercvri Feast of the Immaculate Conception -----Original Message----- From: Andrew Sly Sent: Dec 8, 2004 3:28 AM To: gutvol-d@lists.pglaf.org Subject: [gutvol-d] PG books used by visually impaired Dear fellow PG volunteers, Here's a recent newsgroup posting that I came across showing more of the use that people do get out of the texts that we are creating. Andrew Newsgroups: alt.disability.blind.social Date: 2004-12-07 11:15:06 PST I often listen to books from the Gutenberg Project on my laptop. Most of those books are available from the US Library of Congress on audio cassette tape.
So mainly, when I listen to a book on my laptop, it's because I forgot to order another book and am having a "book emergency". You know, when you finish one book and have nothig else to read and it's cold and rainy outside and there's nothing on TV? But I've listened to a lot of books on my laptop and after a while you don't notice the voice -- if it's a good book. _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From alex at awstudios.net Wed Dec 8 08:40:29 2004 From: alex at awstudios.net (Alex Wilson) Date: Wed Dec 8 08:40:36 2004 Subject: [gutvol-d] PG books used by visually impaired In-Reply-To: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net> Message-ID: On 12/8/04 10:16 AM, "Dennis McCarthy" wrote: > > On a related thread, does anyone know a user friendly way to make mp3s (or > other format) with digitized voices out of P.G. e-books? On the Mac side, Real Mac Software has a program called "Voice Box" which does just that. The functionality is actually built in to Mac OS X so all you really need is a simple AppleScript, but VoiceBox gives you more options. Alex. http://www.alexwilson.com - Alex Wilson Studios http://www.telltaleweekly.org - Funding a Free Audiobook Library From kth at srv.net Wed Dec 8 08:24:41 2004 From: kth at srv.net (Kevin Handy) Date: Wed Dec 8 09:04:33 2004 Subject: [gutvol-d] PG books used by visually impaired In-Reply-To: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net> References: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net> Message-ID: <41B72AC9.4080105@srv.net> Dennis McCarthy wrote: >On a related thread, does anyone know a user friendly way to make mp3s (or other format) with digitized voices out of P.G. e-books? > >Specifically looking for using typical software on Windows or Mac format machines. 
> >I have been able to get newer versions of Abode Acroread to "Read Out Loud" (an option under "View"). This has a couple problems, though: >1) Most people do not have access to Acrobat to make PDF files. Acroread is free and lets you read PDF files, but you need Acrobat (somewhat expensive) to make PDFs. >2) This is fine for listening at your computer, but I could not find a way to export to MP3 for listening to later using other equipment. Looked for an option in RealPlayer, but no dice. > >This is specifically for someone who drives alot, and wants books-on-disk. > >Thanks. > > Not sure of the availability for whatever platforms you are using, but 'festival' will generate '.wav' files using the included 'text2wav' program. They should be easy to convert to mp3s. It's a monotone reading, but usable. More recent versions (i.e. FC3) seem to sound smoother (less computerized) than earlier ones (i.e. RH9). Don't know if this is because of the Linux or the festival versions. Comes standard with most recent RedHat and Fedora Core Linux installs, and probably others. Price is right (free). http://www.cstr.ed.ac.uk/projects/festival/manual/ From M.J.Farmer at bham.ac.uk Wed Dec 8 08:33:35 2004 From: M.J.Farmer at bham.ac.uk (Malcolm Farmer) Date: Wed Dec 8 09:09:34 2004 Subject: [gutvol-d] PG books used by visually impaired In-Reply-To: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net> References: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net> Message-ID: <41B72CDF.6050505@bham.ac.uk> Dennis McCarthy wrote: >On a related thread, does anyone know a user friendly way to make mp3s (or other format) with digitized voices out of P.G. e-books? > >Specifically looking for using typical software on Windows or Mac format machines. > >I have been able to get newer versions of Abode Acroread to "Read Out Loud" (an option under "View"). This has a couple problems, though: >1) Most people do not have access to Acrobat to make PDF files. 
Acroread is free and lets you read PDF files, but you need Acrobat (somewhat expensive) to make PDFs. > > PDF creator is free, for Windows: http://sourceforge.net/projects/pdfcreator Don't know how good it is, but if you're starting with PG plain text files, it shouldn't have too much problem. Or go the whole hog and use Open Office: free, and its word processor has an option to export to PDF That solves the first part under Windows. Getting Real to save the output as audio or MP3, I don't know. The only version of Real I've had much experience with (for Linux) just doesn't have any provision to allow saving output. It sounds as if Windows is the same. However, under Linux, the "mplayer" program can use the Real codec for playing Real audio, and will happily give a variety of outputs, including dumping raw sound output to disk for burning or re-encoding, but I have no idea if there's a free equivalent for Windows. You might have to resort to feeding one PC's soundcard line out to another PC's line in. These are synthesised voices we're talking about, so the loss in quality won't matter, if you're encoding to MP3 anyway. From tb at baechler.net Wed Dec 8 23:41:30 2004 From: tb at baechler.net (Tony Baechler) Date: Wed Dec 8 23:39:25 2004 Subject: [gutvol-d] PG books used by visually impaired In-Reply-To: <41B72CDF.6050505@bham.ac.uk> References: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net> <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net> Message-ID: <5.2.0.9.0.20041208233722.04129bf0@snoopy2.trkhosting.com> Hi. Apologies in advance, but I don't have links for most of this software. Also, I can only comment on Windows. There are many easy ways to convert to mp3. Probably the easiest is something like TextAloud but I don't know how much it is and I don't use it. What I do is use an audio capture program. In other words, it records anything from the sound card output, whether MIDI, RA, etc. to wave or mp3. 
The problem is that it does this in real time so if the book is 10 hours long, it takes that long to record. This probably doesn't help, but newer OCR programs for the blind such as Kurzweil 1000 version 6 and up and newer versions of Openbook have this built-in. They will convert text to mp3 relatively quickly. If you have a specific book you want converted, I will do it for you. Write me off list. The particular capture program I use is RecAll. It is shareware but there are free alternatives. You can go here if you want a 30 day demo. http://www.sagebrush.com/ From nwolcott2 at kreative.net Sat Dec 11 07:26:18 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Dec 11 07:26:39 2004 Subject: [gutvol-d] Induce law Message-ID: <003201c4df95$ca040a00$2371fea9@gateway> The people who brought you Sonny Bono now are giving you the Induce Law. Prohibitting the manufacture or sale of any device which might be "reasonably" assumed to "induce" anyone into violating any law or regulation. A VCR is a good example. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041211/fa68c917/attachment.html From hart at pglaf.org Sat Dec 11 08:09:34 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Dec 11 08:09:36 2004 Subject: [gutvol-d] Induce law In-Reply-To: <003201c4df95$ca040a00$2371fea9@gateway> References: <003201c4df95$ca040a00$2371fea9@gateway> Message-ID: On Sat, 11 Dec 2004, Norm Wolcott wrote: > The people who brought you Sonny Bono now are giving you the Induce Law. > Prohibitting the manufacture or sale of any device which might be > "reasonably" assumed to "induce" anyone into violating any law or regulation. > A VCR is a good example. 
> > nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest Laws have also been introduced in Congress to include skipping ads on TV as part of the "violating any law or regulation." Brought to you by your friendly local Thought Police. . . . The next series could make it illegal to even TALK about skipping ads. mh From Gutenberg9443 at aol.com Sat Dec 11 08:10:23 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Dec 11 08:10:30 2004 Subject: [gutvol-d] Induce law Message-ID: In a message dated 12/11/2004 8:27:09 AM Mountain Standard Time, nwolcott2@kreative.net writes: >>Prohibitting the manufacture or sale of any device >>which might be "reasonably" assumed to "induce" >>anyone into violating any law or regulation. A VCR is a >>good example. Passed or proposed? In what legislative body? It seems to me that would hit photocopiers, scanners, tape recorders, large and/or external hard drives, and . . . the list could go on quite a while longer. This is absurd. Anne Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041211/06760870/attachment.html From hyphen at hyphenologist.co.uk Sat Dec 11 09:09:27 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Dec 11 09:09:58 2004 Subject: [gutvol-d] Induce law In-Reply-To: <003201c4df95$ca040a00$2371fea9@gateway> References: <003201c4df95$ca040a00$2371fea9@gateway> Message-ID: <0camr01g1hq9k9i459g5b1e6rvi9ge2p9s@4ax.com> On Sat, 11 Dec 2004 10:26:18 -0500, "Norm Wolcott" wrote: | The people who brought you Sonny Bono now are giving you the | Induce Law. Prohibitting the manufacture or sale of any | device which might be "reasonably" assumed to "induce" | anyone into violating any law or regulation. | A VCR is a good example. Thank ghod I do not live in the USA. 
-- Dave F From bill at truthdb.org Sat Dec 11 16:55:41 2004 From: bill at truthdb.org (bill jenness) Date: Sat Dec 11 16:55:56 2004 Subject: [gutvol-d] Re: Induce law In-Reply-To: <20041211200003.13D518C83B@pglaf.org> References: <20041211200003.13D518C83B@pglaf.org> Message-ID: <32790.134.117.137.83.1102812941.squirrel@134.117.137.83> Wouldn't that also include guns, fast cars and money? The idea is patently ridiculous. From webmaster at gutenberg.org Sun Dec 12 07:57:04 2004 From: webmaster at gutenberg.org (Marcello Perathoner) Date: Mon Dec 13 20:03:05 2004 Subject: [gutvol-d] [Fwd: Folio files] Message-ID: <41BC6A50.2040207@gutenberg.org> -------- Original Message -------- Subject: Folio files Date: Sat, 11 Dec 2004 22:45:42 -0000 From: Charles Crosby To: I have downloaded a folio version of Gibbon's 'Decline and Fall...' What program do I need to read it? Hoping you can be of assistance, Charles Crosby. -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Mon Dec 13 20:11:44 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Dec 13 20:11:46 2004 Subject: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41BC6A50.2040207@gutenberg.org> References: <41BC6A50.2040207@gutenberg.org> Message-ID: <20041214041144.GC28632@pglaf.org> On Sun, Dec 12, 2004 at 04:57:04PM +0100, Marcello Perathoner wrote: > > > -------- Original Message -------- > Subject: Folio files > Date: Sat, 11 Dec 2004 22:45:42 -0000 > From: Charles Crosby > To: > > I have downloaded a folio version of Gibbon's 'Decline and Fall...' > What program do I need to read it? > Hoping you can be of assistance, > Charles Crosby. Hi, Charles. You're probably better off with a different version of this eBook (visit http://gutenberg.org and type "gibbon" in an Author search box). "Folio" is by a company that we haven't heard from in awhile. They had some proprietary software for eBooks. I'm unaware of any current programs that can view these files properly. 
We keep the files as part of the archive because we don't like to delete things, but as you can see this format was not much of a success, from today's point of view. -- Greg Newby From hart at pglaf.org Tue Dec 14 04:19:17 2004 From: hart at pglaf.org (Michael Hart) Date: Tue Dec 14 04:19:20 2004 Subject: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41BC6A50.2040207@gutenberg.org> References: <41BC6A50.2040207@gutenberg.org> Message-ID: On Sun, 12 Dec 2004, Marcello Perathoner wrote: > > > -------- Original Message -------- > Subject: Folio files > Date: Sat, 11 Dec 2004 22:45:42 -0000 > From: Charles Crosby > To: > > I have downloaded a folio version of Gibbon's 'Decline and Fall...' > What program do I need to read it? > Hoping you can be of assistance, > Charles Crosby. The program I remember was called "Folio View" There once was a free reader, but it was discontinued. Michael From marcello at perathoner.de Tue Dec 14 04:45:17 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Dec 14 04:46:03 2004 Subject: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <41BC6A50.2040207@gutenberg.org> Message-ID: <41BEE05D.3080109@perathoner.de> Michael Hart wrote: > The program I remember was called > "Folio View" > > There once was a free reader, > but it was discontinued. Then we should either get hold of a copy of that reader and offer it for download or delete the files. No point in holding files nobody can read.
-- Marcello Perathoner webmaster@gutenberg.org From nihil_obstat at mindspring.com Tue Dec 14 08:22:56 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Tue Dec 14 08:23:02 2004 Subject: [gutvol-d] Google On-Line Library Message-ID: <15491867.1103041377021.JavaMail.root@wamui08.slb.atl.earthlink.net> FYI, Here is an article about Google on-line library project: http://www.foxnews.com/story/0,2933,141433,00.html "The ambitious initiative announced late Monday gives Mountain View, Calif.-based Google the right to index material from the New York Public Library as well as libraries at four universities--Harvard, Stanford, Michigan and Oxford in England." Not sure what their profit angle is. Supposedly public domain works will be free to access. Maybe they get a cut of copyrighted books viewed via this service. --------------------------- Dennis McCarthy nihil_obstat@mindspring.com From marcello at perathoner.de Tue Dec 14 09:01:20 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Dec 14 09:01:27 2004 Subject: [gutvol-d] [Fwd: [ibiblio-announce] ibiblio and all sites offline] Message-ID: <41BF1C60.7070007@perathoner.de> -------- Original Message -------- Subject: [ibiblio-announce] ibiblio and all sites offline Date: Tue, 14 Dec 2004 11:47:00 -0500 From: John Reuning Reply-To: help@ibiblio.org To: ibiblio-announce@lists.ibiblio.org One of the core file servers crashed this morning. All web and ftp services will be offline until this system has been restored. 
-jrr _______________________________________________ ibiblio-announce mailing list ibiblio-announce@lists.ibiblio.org http://lists.ibiblio.org/mailman/listinfo/ibiblio-announce -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Tue Dec 14 09:55:58 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Tue Dec 14 09:56:00 2004 Subject: [gutvol-d] Google Partners with Oxford, Harvard & Others to Digitize Libraries Message-ID: <20041214175558.GA15809@pglaf.org> Here's an extract from http://searchenginewatch.com/searchday/article.php/3447411 '"In-copyright" books that are in these collections will have basic bibliographic information available but the full text will not be accessible. Smith told us that out-of copyright material will be available in full text, though printing will be disabled when viewing this content.' This doesn't sound like competition to PG, to me, and in fact the second sentence above means they won't even meet my definition of an eBook. Not to say that these things aren't worthwhile. After all, *we* could generate eBooks from scans etc. made available for public domain content. This could be very helpful. As to "why," when a few PG'ers met with Google last year, they stressed that from their point of view, any growth in online content is good for them. More stuff "out there" means there's more for them to find. So, this is partially altruistic, but also partially for the public good. It was interesting to see that UC Berkeley, UIUC and Yale were not among the libraries chosen (those are the 2-4th largest academic collections in the US, after Harvard). 
-- Greg From hart at pglaf.org Tue Dec 14 09:58:11 2004 From: hart at pglaf.org (Michael Hart) Date: Tue Dec 14 09:58:13 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41BEE05D.3080109@perathoner.de> References: <41BC6A50.2040207@gutenberg.org> <41BEE05D.3080109@perathoner.de> Message-ID: On Tue, 14 Dec 2004, Marcello Perathoner wrote: > Michael Hart wrote: > >> The program I remember was called >> "Folio View" >> >> There once was a free reader, >> but it was discontinued. > > Then we should either get hold of a copy of that reader and offer it for > download or delete the files. No point in holding files nobody can read. It is VERY important to keep example of files that once had free readers that are available no longer. . .if nothing more than examples of why we don't put everything into any particular proprietary format. Michael S. Hart From hart at pglaf.org Tue Dec 14 10:11:21 2004 From: hart at pglaf.org (Michael Hart) Date: Tue Dec 14 10:11:23 2004 Subject: [gutvol-d] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <20041214175558.GA15809@pglaf.org> References: <20041214175558.GA15809@pglaf.org> Message-ID: On Tue, 14 Dec 2004, Greg Newby wrote: > Here's an extract from > http://searchenginewatch.com/searchday/article.php/3447411 > > '"In-copyright" books that are in these collections will have basic > bibliographic information available but the full text will not be accessible. > > Smith told us that out-of copyright material will be available in full > text, though printing will be disabled when viewing this content.' I wonder what Smith means by "full text" ??? > This doesn't sound like competition to PG, to me, and in > fact the second sentence above means they won't even meet > my definition of an eBook. > > Not to say that these things aren't worthwhile. After all, > *we* could generate eBooks from scans etc. made > available for public domain content. This could be > very helpful. 
I've also heard they intend to start with 40,000 books only of interest to rare book people and scholars. The two projections I heard were 7 and 10 years for the project. > As to "why," when a few PG'ers met with Google last year, > they stressed that from their point of view, any growth > in online content is good for them. More stuff "out there" > means there's more for them to find. So, this is partially > altruistic, but also partially for the public good. Of course, Google didn't follow up in any way on this meeting, and in fact didn't reply to my followup inquiries. > It was interesting to see that UC Berkeley, UIUC and Yale > were not among the libraries chosen (those are the > 2-4th largest academic collections in the US, after Harvard). Yale was originally announced, at least by NPR, and they had to announce a retraction. > -- Greg michael From marcello at perathoner.de Tue Dec 14 10:42:39 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Dec 14 10:42:48 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <41BC6A50.2040207@gutenberg.org> <41BEE05D.3080109@perathoner.de> Message-ID: <41BF341F.9060101@perathoner.de> Michael Hart wrote: > It is VERY important to keep example of files that once had free readers > that are available no longer. . .if nothing more than examples of why we > don't put everything into any particular proprietary format. Would there be a better way to keep those "examples" than to keep them in the collection buried beneath a ton of other files where they just pop up per chance to disgruntle users who inadvertently download them ? 
-- Marcello Perathoner webmaster@gutenberg.org From jon at noring.name Tue Dec 14 10:48:28 2004 From: jon at noring.name (Jon Noring) Date: Tue Dec 14 10:48:39 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41BF341F.9060101@perathoner.de> References: <41BC6A50.2040207@gutenberg.org> <41BEE05D.3080109@perathoner.de> <41BF341F.9060101@perathoner.de> Message-ID: <73101645671.20041214114828@noring.name> Marcello wrote: > Michael Hart wrote: >> It is VERY important to keep example of files that once had free readers >> that are available no longer. . .if nothing more than examples of why we >> don't put everything into any particular proprietary format. > Would there be a better way to keep those "examples" than to keep them > in the collection buried beneath a ton of other files where they just > pop up per chance to disgruntle users who inadvertently download them ? It does seem to me that old "texts" in a obsolete proprietary format be "retired" to a home of some sort. Keep them, but move them somewhere else. Btw, do the Folio version(s) exist in plain text or HTML form? Jon From flis at detk.com Tue Dec 14 11:52:39 2004 From: flis at detk.com (William Flis) Date: Tue Dec 14 11:46:13 2004 Subject: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041214041144.GC28632@pglaf.org> Message-ID: > > I have downloaded a folio version of Gibbon's 'Decline and Fall...' > > What program do I need to read it? > > Hoping you can be of assistance, > > Charles Crosby. > > "Folio" is by a company that we haven't heard from > in awhile. They had some proprietary software for > eBooks. I'm unaware of any current programs that can > view these files properly. > > We keep the files as part of the archive because we > don't like to delete things, but as you can see this format > as not much of a success, from today's point of view. 
Out of curiosity, I tried Google to find this file (thought maybe I could bust it open), and it seems that most of the versions of this book out on the web are identified as "Folio", including those that are available in plain text and HTML formats. Maybe he just meant the size of the original book? (He did write "folio", not "Folio"!) Bill Flis From hart at pglaf.org Tue Dec 14 13:22:13 2004 From: hart at pglaf.org (Michael Hart) Date: Tue Dec 14 13:22:14 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <73101645671.20041214114828@noring.name> References: <41BC6A50.2040207@gutenberg.org> <41BEE05D.3080109@perathoner.de> <41BF341F.9060101@perathoner.de> <73101645671.20041214114828@noring.name> Message-ID: On Tue, 14 Dec 2004, Jon Noring wrote: > Marcello wrote: >> Michael Hart wrote: > >>> It is VERY important to keep example of files that once had free readers >>> that are available no longer. . .if nothing more than examples of why we >>> don't put everything into any particular proprietary format. > >> Would there be a better way to keep those "examples" than to keep them >> in the collection buried beneath a ton of other files where they just >> pop up per chance to disgruntle users who inadvertently download them ? > > It does seem to me that old "texts" in a obsolete proprietary format > be "retired" to a home of some sort. Keep them, but move them > somewhere else. No. . .we want them right where people can see the effect of what would happen if they relied on proprietary formats. "Lest we forget." > > Btw, do the Folio version(s) exist in plain text or HTML form? They must somewhere, but I don't have them.
Michael From hart at pglaf.org Tue Dec 14 13:25:28 2004 From: hart at pglaf.org (Michael Hart) Date: Tue Dec 14 13:25:30 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41BF341F.9060101@perathoner.de> References: <41BC6A50.2040207@gutenberg.org> <41BEE05D.3080109@perathoner.de> <41BF341F.9060101@perathoner.de> Message-ID: On Tue, 14 Dec 2004, Marcello Perathoner wrote: > Michael Hart wrote: > >> It is VERY important to keep example of files that once had free readers >> that are available no longer. . .if nothing more than examples of why we >> don't put everything into any particular proprietary format. > > Would there be a better way to keep those "examples" than to keep them in the > collection buried beneath a ton of other files where they just pop up per > chance to disgruntle users who inadvertently download them ? People SHOULD be disgruntled about such things. . . . We are NOT going to rewrite Project Gutenberg history to make it appear this didn't happen, nor are we going to downplay that it happened. I, personally, met with the President of Folio, before we embarked on this project, and he assured me that the free Folio reader would always be available. . .and he seemed far more friendly than Adobe ever has appeared. Michael From gbnewby at pglaf.org Tue Dec 14 14:25:48 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Tue Dec 14 14:25:50 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <41BC6A50.2040207@gutenberg.org> <41BEE05D.3080109@perathoner.de> <41BF341F.9060101@perathoner.de> <73101645671.20041214114828@noring.name> Message-ID: <20041214222548.GA23236@pglaf.org> > > > >Btw, do the Folio version(s) exist in plain text or HTML form? Sure: plain text. Visit gutenberg.org, search for "gibbon" in the Author field. 
-- Greg From sly at victoria.tc.ca Tue Dec 14 15:56:11 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Dec 14 15:56:26 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41BF341F.9060101@perathoner.de> References: <41BC6A50.2040207@gutenberg.org> <41BEE05D.3080109@perathoner.de> <41BF341F.9060101@perathoner.de> Message-ID: On Tue, 14 Dec 2004, Marcello Perathoner wrote: > Michael Hart wrote: > > > It is VERY important to keep example of files that once had free readers > > that are available no longer. . .if nothing more than examples of why we > > don't put everything into any particular proprietary format. > > Would there be a better way to keep those "examples" than to keep them > in the collection buried beneath a ton of other files where they just > pop up per chance to disgruntle users who inadvertently download them ? > When old files get reposted in the new directory structure, any formats like this, that cannot be updated, are moved into an "old" directory. Is that something like what you were thinking? Andrew From gbnewby at pglaf.org Tue Dec 14 17:40:19 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Tue Dec 14 17:40:21 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <41BC6A50.2040207@gutenberg.org> <41BEE05D.3080109@perathoner.de> <41BF341F.9060101@perathoner.de> Message-ID: <20041215014019.GA10991@pglaf.org> On Tue, Dec 14, 2004 at 03:56:11PM -0800, Andrew Sly wrote: > > > On Tue, 14 Dec 2004, Marcello Perathoner wrote: > > > Michael Hart wrote: > > > > > It is VERY important to keep example of files that once had free readers > > > that are available no longer. . .if nothing more than examples of why we > > > don't put everything into any particular proprietary format. > > > > Would there be a better way to keep those "examples" than to keep them > > in the collection buried beneath a ton of other files where they just > > pop up per chance to disgruntle users who inadvertently download them ? 
> > > > When old files get reposted in the new directory structure, > any formats like this, that cannot be updated, are moved into > an "old" directory. > > Is that something like what you were thinking? > > Andrew While this is our usual method, unfortunately this particular title (#900) is its own eBook # in the Folio format. May 1997 Decline/Fall Of The Roman Empire, by Gibbon, Folio[dfre310f.xxx] 900 I think this is the only .nfo file we have. -- Greg From marcello at perathoner.de Wed Dec 15 02:17:00 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Dec 15 02:17:20 2004 Subject: [gutvol-d] [Fwd: Re: [webgroup] ibib's downtime/server resore] Message-ID: <41C00F1C.40302@perathoner.de> -------- Original Message -------- Subject: Re: [webgroup] ibib's downtime/server resore Date: Tue, 14 Dec 2004 19:49:42 -0500 (EST) From: Paul Jones CC: webgroup@lists.ibiblio.org this morning we lost the fileserver for the first time in 2 years of continuous uptime. it took until nearly 4 o'clock EST USA to get us back, but we're back now and the response time is fine for pages and even for my mysql driven blog. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Wed Dec 15 05:23:57 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Dec 15 05:24:00 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com> ----- Original Message ----- From: "Greg Newby" > > On Tue, Dec 14, 2004 at 03:56:11PM -0800, Andrew Sly wrote: > > > > > > On Tue, 14 Dec 2004, Marcello Perathoner wrote: > > > > > Michael Hart wrote: > > > > > > > It is VERY important to keep example of files that once had free readers > > > > that are available no longer. . .if nothing more than examples of why we > > > > don't put everything into any particular proprietary format. 
> > > > > > Would there be a better way to keep those "examples" than to keep them > > > in the collection buried beneath a ton of other files where they just > > > pop up per chance to disgruntle users who inadvertently download them ? > > > > > > > When old files get reposted in the new directory structure, > > any formats like this, that cannot be updated, are moved into > > an "old" directory. > > > > Is that something like what you were thinking? > > > > Andrew > > While this is our usual method, unfortunately this > particular title (#900) is its own eBook # in the > Folio format. > > May 1997 Decline/Fall Of The Roman Empire, by Gibbon, Folio[dfre310f.xxx] 900 > > I think this is the only .nfo file we have. > So any chance we can convert this file to a text file, make that the main entry and move the .nfo file to the OLD subdirectory? Josh From hart at pglaf.org Wed Dec 15 09:37:34 2004 From: hart at pglaf.org (Michael Hart) Date: Wed Dec 15 09:37:36 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com> References: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com> Message-ID: On Wed, 15 Dec 2004, Joshua Hutchinson wrote: > > ----- Original Message ----- > From: "Greg Newby" >> >> On Tue, Dec 14, 2004 at 03:56:11PM -0800, Andrew Sly wrote: >>> >>> >>> On Tue, 14 Dec 2004, Marcello Perathoner wrote: >>> >>>> Michael Hart wrote: >>>> >>>>> It is VERY important to keep example of files that once had free readers >>>>> that are available no longer. . .if nothing more than examples of why we >>>>> don't put everything into any particular proprietary format. >>>> >>>> Would there be a better way to keep those "examples" than to keep them >>>> in the collection buried beneath a ton of other files where they just >>>> pop up per chance to disgruntle users who inadvertently download them ? 
>>>> >>> >>> When old files get reposted in the new directory structure, >>> any formats like this, that cannot be updated, are moved into >>> an "old" directory. >>> >>> Is that something like what you were thinking? >>> >>> Andrew >> >> While this is our usual method, unfortunately this >> particular title (#900) is its own eBook # in the >> Folio format. >> >> May 1997 Decline/Fall Of The Roman Empire, by Gibbon, Folio[dfre310f.xxx] 900 >> >> I think this is the only .nfo file we have. >> > > So any chance we can convert this file to a text file, make that the main > entry and move the .nfo file to the OLD subdirectory? > > Josh Please stop trying to rewrite history. . . . This should be kept as a straighforward example of what can and DOES happen with proprietary formats. michael PS You probably don't remember the previous example of WordStar. From joshua at hutchinson.net Wed Dec 15 09:49:25 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Dec 15 09:49:30 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041215174925.3624B9E79C@ws6-2.us4.outblaze.com> ----- Original Message ----- From: "Michael Hart" > > > > So any chance we can convert this file to a text file, make that the main > > entry and move the .nfo file to the OLD subdirectory? > > > > Josh > > Please stop trying to rewrite history. . . . > > This should be kept as a straighforward example > of what can and DOES happen with proprietary formats. > > Sorry. Gotta call bullshit on this one. Keeping the file in the OLD subdirectory maintains the history for those that wish to find it, while allowing better usability for those folks that simply want to read this particular work. How frustrating do you think people would be if they went to their local library, found a book in the catalog that they wanted, but the only place they are allowed to access the book is in a backroom that is pitch black. Yeah, they have the book, but it is completely useless to the reader. 
So, yeah, we have the book in PG... but it is completely useless to you. /nelson-voice-from-The-Simpsons HA HA /end-nelson-voice-from-The-Simpsons Josh From hart at pglaf.org Wed Dec 15 09:56:56 2004 From: hart at pglaf.org (Michael Hart) Date: Wed Dec 15 09:56:58 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041215174925.3624B9E79C@ws6-2.us4.outblaze.com> References: <20041215174925.3624B9E79C@ws6-2.us4.outblaze.com> Message-ID: On Wed, 15 Dec 2004, Joshua Hutchinson wrote: > > ----- Original Message ----- > From: "Michael Hart" > >>> >>> So any chance we can convert this file to a text file, make that the main >>> entry and move the .nfo file to the OLD subdirectory? >>> >>> Josh >> >> Please stop trying to rewrite history. . . . >> >> This should be kept as a straighforward example >> of what can and DOES happen with proprietary formats. >> >> > > Sorry. Gotta call bullshit on this one. Keeping the file in the OLD > subdirectory maintains the history for those that wish to find it, while > allowing better usability for those folks that simply want to read this > particular work. Barnyard epithets aside, this is too important to sweep under the carpet. There is plenty of usability in other formats, so leave it be. . . . > How frustrating do you think people would be if they went to their local > library, found a book in the catalog that they wanted, but the only place > they are allowed to access the book is in a backroom that is pitch black. > Yeah, they have the book, but it is completely useless to the reader. That's the whole point. . .so don't hide it. . .MAKE the point, publicly. > So, yeah, we have the book in PG... but it is completely useless to you. No. . .it's available in other formats. . .if you take a look. > > /nelson-voice-from-The-Simpsons > > HA HA > > /end-nelson-voice-from-The-Simpsons > > Josh > Yes, you are correct, you are making a silly argument. 
The President of Folio came to visit us here, and promised the free Folio reader. . . . Of course this is ancient history to you, but some of us remember, and do not want such an effort wiped out of our history. It was a LOT of work. . . . Leave it be. . . . Michael From marcello at perathoner.de Wed Dec 15 09:59:53 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Dec 15 10:00:00 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com> Message-ID: <41C07B99.10100@perathoner.de> Michael Hart wrote: > Please stop trying to rewrite history. . . . Please stop kicking history in the teeth of people who don't care and just want to read a book. > This should be kept as a straighforward example > of what can and DOES happen with proprietary formats. It should be kept, yes, but not in the main archive. Please, write a "Hall of Shame" page or something and link the files from there. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Wed Dec 15 10:06:03 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Dec 15 10:06:09 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com> You're missing the point I'm trying to make, Michael. Keep the history. No problem. Just don't make it the DEFAULT that pops up when someone does a search. Joe User couldn't care less about our history. He just wants to read the book. So, give him the book that he CAN read. Put the "historical mistake" in the OLD subdirectory, where it is still available for those of us that care about such things. Josh PS I think the pop culture Simpsons reference flew right by you. 
;) ----- Original Message ----- From: "Michael Hart" To: "The gutvol-d Mailing List" Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files] Date: Wed, 15 Dec 2004 09:56:56 -0800 (PST) > > > On Wed, 15 Dec 2004, Joshua Hutchinson wrote: > > > > > ----- Original Message ----- > > From: "Michael Hart" > > > >>> > >>> So any chance we can convert this file to a text file, make that the main > >>> entry and move the .nfo file to the OLD subdirectory? > >>> > >>> Josh > >> > >> Please stop trying to rewrite history. . . . > >> > >> This should be kept as a straighforward example > >> of what can and DOES happen with proprietary formats. > >> > >> > > > > Sorry. Gotta call bullshit on this one. Keeping the file in the OLD > > subdirectory maintains the history for those that wish to find it, while > > allowing better usability for those folks that simply want to read this > > particular work. > > Barnyard epithets aside, this is too important to sweep under the carpet. > > There is plenty of usability in other formats, so leave it be. . . . > > > > How frustrating do you think people would be if they went to their local > > library, found a book in the catalog that they wanted, but the only place > > they are allowed to access the book is in a backroom that is pitch black. > > Yeah, they have the book, but it is completely useless to the reader. > > That's the whole point. . .so don't hide it. . .MAKE the point, publicly. > > > > So, yeah, we have the book in PG... but it is completely useless to you. > > No. . .it's available in other formats. . .if you take a look. > > > > > > /nelson-voice-from-The-Simpsons > > > > HA HA > > > > /end-nelson-voice-from-The-Simpsons > > > > Josh > > > > Yes, you are correct, you are making a silly argument. > > > The President of Folio came to visit us here, > and promised the free Folio reader. . . . > > Of course this is ancient history to you, > but some of us remember, and do not want > such an effort wiped out of our history. 
> > It was a LOT of work. . . . > > Leave it be. . . . > > > Michael > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From prishan at bom3.vsnl.net.in Wed Dec 15 09:53:45 2004 From: prishan at bom3.vsnl.net.in (avinash kothare) Date: Wed Dec 15 10:06:24 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] References: Message-ID: <41C07A29.000001.55643@AVINASH> -------Original Message------- From: Michael S. Hart; Project Gutenberg Volunteer Discussion Date: 12/15/04 23:07:36 To: Project Gutenberg Volunteer Discussion Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files] > Please stop trying to rewrite history. . . . > > This should be kept as a straightforward example > of what can and DOES happen with proprietary formats. > > michael Duh! I have downloaded so many reading formats to make reading a pleasure for the eyes. In the final count, it all boils down to getting a good version of the text [I shrink from saying a 'perfect version'] and running a script which could make it easier on your eyes. Aesthetics dictate that all those beautiful images be included. What else does 99% of the whole wide world of readers need? > You probably don't remember the previous example of WordStar. Beg your pardon, Sir. :-) Avinash. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041215/0a412bf6/attachment.html From hart at pglaf.org Wed Dec 15 10:12:43 2004 From: hart at pglaf.org (Michael Hart) Date: Wed Dec 15 10:12:45 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com> References: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com> Message-ID: On Wed, 15 Dec 2004, Joshua Hutchinson wrote: > You're missing the point I'm trying to make, Michael. > > Keep the history. No problem. Just don't make it the DEFAULT that pops up > when someone does a search. Joe User couldn't care less about our history. > He just wants to read the book. So, give him the book that he CAN read. I don't see any messages that say the .nfo file is the default, out of the dozen or so that I have here. _I_ certainly haven't said it should be the default, just that it should not be moved away from the main directories. > Put the "historical mistake" in the OLD subdirectory, where it is still > available for those of us that care about such things. It is our job to make sure people are aware of this. They can't care if they are not aware of it. > Josh Michael > PS I think the pop culture Simpsons reference flew right by you. ;) Doh! > > > ----- Original Message ----- From: "Michael Hart" To: "The > gutvol-d Mailing List" Subject: Re: !@!Re: > [gutvol-d] [Fwd: Folio files] Date: Wed, 15 Dec 2004 09:56:56 -0800 (PST) > >> >> >> On Wed, 15 Dec 2004, Joshua Hutchinson wrote: >> >>> >>> ----- Original Message ----- >>> From: "Michael Hart" >>> >>>>> >>>>> So any chance we can convert this file to a text file, make that the main >>>>> entry and move the .nfo file to the OLD subdirectory? >>>>> >>>>> Josh >>>> >>>> Please stop trying to rewrite history. . . . >>>> >>>> This should be kept as a straighforward example >>>> of what can and DOES happen with proprietary formats. >>>> >>>> >>> >>> Sorry. Gotta call bullshit on this one. 
Keeping the file in the OLD >>> subdirectory maintains the history for those that wish to find it, while >>> allowing better usability for those folks that simply want to read this >>> particular work. >> >> Barnyard epithets aside, this is too important to sweep under the carpet. >> >> There is plenty of usability in other formats, so leave it be. . . . >> >> >>> How frustrating do you think people would be if they went to their local >>> library, found a book in the catalog that they wanted, but the only place >>> they are allowed to access the book is in a backroom that is pitch black. >>> Yeah, they have the book, but it is completely useless to the reader. >> >> That's the whole point. . .so don't hide it. . .MAKE the point, publicly. >> >> >>> So, yeah, we have the book in PG... but it is completely useless to you. >> >> No. . .it's available in other formats. . .if you take a look. >> >> >>> >>> /nelson-voice-from-The-Simpsons >>> >>> HA HA >>> >>> /end-nelson-voice-from-The-Simpsons >>> >>> Josh >>> >> >> Yes, you are correct, you are making a silly argument. >> >> >> The President of Folio came to visit us here, >> and promised the free Folio reader. . . . >> >> Of course this is ancient history to you, >> but some of us remember, and do not want >> such an effort wiped out of our history. >> >> It was a LOT of work. . . . >> >> Leave it be. . . . >> >> >> Michael >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d@lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Wed Dec 15 10:14:01 2004 From: hart at pglaf.org (Michael Hart) Date: Wed Dec 15 10:14:02 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41C07B99.10100@perathoner.de> References: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com> <41C07B99.10100@perathoner.de> Message-ID: On Wed, 15 Dec 2004, Marcello Perathoner wrote: > Michael Hart wrote: > >> Please stop trying to rewrite history. . . . 
> > Please stop kicking history in the teeth of people who don't care and just > want to read a book. Not trying to make it the default, if that is now the issue. > > >> This should be kept as a straighforward example >> of what can and DOES happen with proprietary formats. > > It should be kept, yes, but not in the main archive. Please, write a "Hall of > Shame" page or something and link the files from there. Sorry, it should not be relegated to museum status. > > > > -- > Marcello Perathoner > webmaster@gutenberg.org > From jon at noring.name Wed Dec 15 10:30:42 2004 From: jon at noring.name (Jon Noring) Date: Wed Dec 15 10:31:27 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com> References: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com> Message-ID: <1941350328.20041215113042@noring.name> Joshua wrote: > You're missing the point I'm trying to make, Michael. > > Keep the history. No problem. Just don't make it the DEFAULT > that pops up when someone does a search. Joe User couldn't care > less about our history. He just wants to read the book. So, > give him the book that he CAN read. Good point. I agree with this. > Put the "historical mistake" in the OLD subdirectory, where it > is still available for those of us that care about such things. Now, to show support for Michael's reasoning, PG definitely needs to make a strong point about the importance of using easy to repurpose open standards for formatting etexts. But mixing obsolete proprietary formats with usable formats actually works against making this point, as Joshua notes. It also aggravates users who may want to read the work, but can't (and thus they will develop a negative view towards PG.) I say move them to a special directory so they are *easier* to find, and then create a web site describing why proprietary formats are bad (especially those which are very difficult to repurpose even when the format is published.) 
Provide links at this web site to those works in the collection using proprietary formats. I guess one could call it a "PG Hall of Shame" collection. Just a suggestion. Jon From scottsch at ncweb.com Wed Dec 15 11:41:41 2004 From: scottsch at ncweb.com (Scott Schmucker) Date: Wed Dec 15 11:42:06 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <1941350328.20041215113042@noring.name> References: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com> <1941350328.20041215113042@noring.name> Message-ID: <41C09375.4070801@ncweb.com> Jon Noring wrote: > >Now, to show support for Michael's reasoning, PG definitely needs to >make a strong point about the importance of using easy to repurpose >open standards for formatting etexts. But mixing obsolete proprietary >formats with usable formats actually works against making this point, >as Joshua notes. It also aggravates users who may want to read the >work, but can't (and thus they will develop a negative view towards >PG.) > >I say move them to a special directory so they are *easier* to find, >and then create a web site describing why proprietary formats are bad >(especially those which are very difficult to repurpose even when the >format is published.) Provide links at this web site to those works >in the collection using proprietary formats. I guess one could call it >a "PG Hall of Shame" collection. > > I do support this suggestion. The reasoning behind my support is: Were I a random reader searching Project Gutenberg for a copy of Edward Gibbon's "History of the Decline and Fall of the Roman Empire" I would be met by a series of files available for download. I find several textual files for each volume of the history, and one Folio formatted document (which does, for the record, appear first, perhaps because 'Folio' is alphabetically prior to 'Volume'). 
I choose the first item on the list, or perhaps, if it has somehow been moved from the top of the list, I select the one that does not specify a volume number, intending to locate the full set of volumes. I then see the following comment, which appears in the notes for this Folio-formatted document: DO NOT DOWNLOAD !!! see #892 for HTML format, #733 for plain text. The Folio format is obsolete. You won't be able to display the file. Thank goodness that this comment is here, but I suggest that it does not have the effect that we intend, and that Michael very strongly supports. As a random reader, I do not look at this and say "What a tragic result of proprietary e-book formats!" Rather, the only thought that I can imagine is one of confusion: "What a foolish thing for Project Gutenberg to have!" not "What a foolish thing for anybody to do!" I do support Jon's suggestion of creating a Project Gutenberg "Hall of Shame" of sorts, which provides the argument against proprietary e-book formats. I suggest that the Folio-formatted e-books could be moved into this portion of the site. Of course, as Michael has pointed out, the intention is not to hide the documents away. With that intention in mind, it would not be unreasonable to leave the original entry within the database, but replace the above note "DO NOT DOWNLOAD, etc" with a more detailed reference to the aforementioned "Hall of Shame." This provides Joe Reader with more of a justification for the document's presence, and possibly sends him away with a different perspective on proprietary e-book formats, which, after all, is the intention. 
- Scott Schmucker From maitriv at yahoo.com Wed Dec 15 11:46:29 2004 From: maitriv at yahoo.com (maitri venkat-ramani) Date: Wed Dec 15 11:46:36 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41C07B99.10100@perathoner.de> Message-ID: <20041215194629.71470.qmail@web52308.mail.yahoo.com> People, people, >From the standpoint of an archivist, I have to fall on the side of keeping all file formats available and accessible to the reader. PG is as much a historical catalog as it is a library. How hard is it to have the search page list ALL of the files we have in our collection under that name (pssst ... much like we have now?). Leave it to the discretion of the user to download the format they wish to. If the Folio files aren't listed along with the others, how will our readers know they are there? Some solutions: 1. List all of the files in PG per book, along with a legend that explains to the user what the formats mean. 2. Put up an info page that lets users know ALL file formats we carry. I'm happy to help put this together if appropriate. There's nothing wrong in keeping archive formats around. As Chief Wiggum says, "I hope this has taught you kids a lesson: kids never learn." Maitri --- Marcello Perathoner wrote: > Michael Hart wrote: > > > Please stop trying to rewrite history. . . . > > Please stop kicking history in the teeth of people who don't care and > > just want to read a book. > > > > This should be kept as a straighforward example > > of what can and DOES happen with proprietary formats. > > It should be kept, yes, but not in the main archive. Please, write a > "Hall of Shame" page or something and link the files from there. > > > > -- > Marcello Perathoner > webmaster@gutenberg.org __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From bill at truthdb.org Wed Dec 15 12:14:07 2004 From: bill at truthdb.org (bill jenness) Date: Wed Dec 15 12:14:17 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41C09375.4070801@ncweb.com> References: <41C09375.4070801@ncweb.com> Message-ID: <32842.134.117.137.162.1103141647.squirrel@134.117.137.162> ... > Were I a random reader searching Project Gutenberg for a copy of Edward > Gibbon's "History of the Decline and Fall of the Roman Empire" I would > be met by a series of files available for download. I find several > textual files for each volume of the history, and one Folio formatted > document (which does, for the record, appear first, perhaps because > 'Folio' is alphabetically prior to 'Volume'). I choose the first item > on the list, and perhaps, if it has been somehow moved from the top of ... > - Scott Schmucker > > > If this is the case, perhaps it would be easier to add XTINCT prior to Folio in the title, thus commenting on proprietary formats and changing the sort order although dropping the E from extinct might look funny.... From jmdyck at ibiblio.org Wed Dec 15 12:55:34 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Wed Dec 15 12:56:00 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] References: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com> <1941350328.20041215113042@noring.name> <41C09375.4070801@ncweb.com> Message-ID: <41C0A4C6.C66EEE2A@ibiblio.org> Scott Schmucker wrote: > > I then see the following > comment which appears in the notes for this Folio-formatted document: > > DO NOT DOWNLOAD !!! see #892 for HTML format, #733 for plain text. > The Folio format is obsolete. You won't be able to display the file. Etexts #892 and #733 are each "Decline & Fall" Volume 3. Why point to volume 3? 
-Michael From hart at pglaf.org Wed Dec 15 07:34:11 2004 From: hart at pglaf.org (Michael Hart) Date: Wed Dec 15 18:43:25 2004 Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <20041214234529.2976.qmail@web60701.mail.yahoo.com> References: <20041214234529.2976.qmail@web60701.mail.yahoo.com> Message-ID: On Tue, 14 Dec 2004, Tony Kline wrote: > > > Bowerbird@aol.com wrote: > > tony said: >>> That's very good, though image files hardly meet the needs of those users >>> who want digital text and the ability to download, cut and paste etc > >> well, since google _is_ a search engine, they'll obviously o.c.r. the text. >> and clean up the text, because errors would muck up their search engine. > > Did they say OCR or did you deduce that? I got the impression they are > imaging pages, and maybe adding some identifying keywords for each page. > That is you'll be able to Google to a title chapter and page maybe, but > you won't be able to Google within pages. Try OCR'ing some of the stuff > in the Bodleian...there ain't no such fonts!! Does anyone know what > they mean by digitizing? Here's what I have gleaned from 5 TV network news shows and the various NYT, SF Chron, etc., articles: There will be one "full text" repository at Google, but users won't be able to access more than a "snippet" around any quotation they look up, much as with general Google searches today. Then, if they want more, they will have to click on the item and will arrive at a second database, this one provided by one of the five libraries [NYCPL, Harvard, Michigan, Stanford, Oxford], where they will get a graphical representation of the non-printable page that contains the quotation. Why they chose to call it "Google Print" when printing is outlawed, I have no idea. 
Michael From hart at pglaf.org Wed Dec 15 07:48:59 2004 From: hart at pglaf.org (Michael Hart) Date: Wed Dec 15 18:43:26 2004 Subject: [gutvol-d] Re: [ebook-community] Google Question for Michael Hart In-Reply-To: <000001c4e263$6e637cf0$0200a8c0@BABA> References: <000001c4e263$6e637cf0$0200a8c0@BABA> Message-ID: On Tue, 14 Dec 2004, Roy Lewis wrote: > > What is Michael Hart's take on the latest from Google? I wonder how > this will impact Project Gutenberg? Will this do what you have been > trying to do but with LOTS of MONEY behind it? > > Roy Lewis > Garland, TX Today is the day I have to write and send the Project Gutenberg Weekly Newsletter, and I have only 2:20 to the deadline, so I hope you will allow me to come back to answer that in a bit, but if you have a specific question I hope I can answer right away. However, I think you will find that these billion dollar giants don't actually have anything in mind that would be definable as an eBook as the term has been used. . .i.e. you can't keep it, you can't print it, you can't cut and paste quotations that are more than a "snippet," as they call it, you can't make your own concordance, index, edition, or carry a million dollars of retail value books with you on a DVD or two. In addition, I guarantee that Project Gutenberg will be the first to offer such a "Million Dollar DVD" of eBooks, and will be the first to present a collection of 50,000 eBooks, and, most likely, will be the first to offer 100,000 eBooks for any kind of service, but certainly for free download. As for getting into the millions, I'm going to wait until we're approaching 100,000 to focus all that tightly on getting into 7 figures of eBooks. BTW, they said 15 million eBooks. . .and I'm not sure they HAVE 15 million eBooks that they can legally use in the worldwide service they announced yesterday. I'd certainly be willing to bet dinner on it! More later, Thanks!!! So Nice To Hear From You! Happy Holidays!!! Michael Give FreeBooks!!! 
In 39 Languages!!! As of December 12, 2004 ~14,683 FreeBooks at: ~317 to go to 15,000 http://www.gutenberg.org http://www.gutenberg.net We are ~96% of the way from 10,000 to 15,000. Now even more PG eBooks In 104 Languages!!! http://gutenberg.cc http://gutenberg.us Michael S. Hart Project Gutenberg Executive Coordinator "*Internet User ~#100*" If you do not receive a prompt reply, please resend, keep resending.

From george at pglaf.org Wed Dec 15 11:07:08 2004
From: george at pglaf.org (George Davis)
Date: Wed Dec 15 18:43:27 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID:

Apologies for coming in late here, but it should be noted that the following entry was added to the GUTINDEX in April, 2003:

May 1997 Decline/Fall Of The Roman Empire, by Gibbon, Folio[dfre310f.xxx] 900
  (NOTE: in proprietary Folio .nfo format; Vol. 3 only.)
  (See also: #890-895 for HTML format, #731-736 for plain text.)

Also the following notes, especially for #892:

Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V6 htm[dfre6xxh.xxx] 895
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V5 htm[dfre5xxh.xxx] 894
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V4 htm[dfre4xxh.xxx] 893
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V3 htm[dfre3xxx.xxx] 892
  [This vol only also available as plain text in dfre3xx.txt/.zip]
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V2 htm[dfre2xxh.xxx] 891
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V1 htm[dfre1xxh.xxx] 890
  [Author: Edward Gibbon]
  (Note: The above 6 files are HTML conversions of ebook #'s 731-736)

Occasionally, one may find other tidbits of useful information inside GUTINDEX.ALL, especially for some of the more esoteric items. It is a holding place for such info until such time as something better comes along. For example, the above may or may not be useful when updating a bibrec in the future.
And as I haven't expressed an opinion lately, herewith is mine for #900: #900 should be moved to /9/0/900/old/, and a 900-readme.txt or dfre310f- readme.txt should be placed in /9/0/900/ explaining the situation, including the reasons it has not been discarded; it _is_ a part of PG history, as documented in the newsletters over the years, and prior to that, in the various maillists. And besides, it makes for lively discussions every couple of years. If no one else wants to, I'll write a brief (less than 10K words?) readme when the time comes to "update" this posting. FWIW, [eorge] From hacker at gnu-designs.com Wed Dec 15 19:56:57 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Dec 15 19:57:26 2004 Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: References: <20041214234529.2976.qmail@web60701.mail.yahoo.com> Message-ID: > > Did they say OCR or did you deduce that? I got the impression they > > are imaging pages, and maybe adding some identifying keywords for > > each page. That is you'll be able to Google to a title chapter and > > page maybe, but you won't be able to Google within pages. Try > > OCR'ing some of the stuff in the Bodleian...there ain't no such > > fonts!! Does anyone know what they mean by digitizing? I don't think so. Have you seen catalog.google.com? David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From shalesller at writeme.com Wed Dec 15 20:18:06 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Dec 15 20:18:22 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041216041806.956774BDAB@ws1-1.us4.outblaze.com> > From the standpoint of an archivist, I have to fall on the side of > keeping all file formats available and accessible to the reader. PG is > as much a historical catalog as it is a library. Since when? Why? 
If we want to teach people about the death of old formats, maybe we should have a page about old formats, and how WordStar and Folio and other formats were da bomb, and how it's hard to find anything that can read them now. If they come across them in a search, how will they even know that it's an old format nobody can read? For all I would have known before this discussion, you could run out and buy an ebook reader that takes Folio, or download a program to read them. Remember that it's not just proprietary formats that die; I seem to remember code to read WordStar files in one of my old programming books, and there's a bunch of open source programs where you'd have to go through old CDs to find a version of the program that could read your files. From j.hagerson at comcast.net Wed Dec 15 20:23:57 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Wed Dec 15 20:24:21 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041216041806.956774BDAB@ws1-1.us4.outblaze.com> Message-ID: <000c01c4e327$12d3da00$6401a8c0@sarek> What is the purpose of PG? Is it to be a white-haired old man, with a scraggy beard, carrying a sign on the beach that says "Proprietary e-book formats may die!" or is it to provide a repository of information that is useful today and into the future? I wholeheartedly second the notion of moving obsolete formats into a "hall of shame." From jon at noring.name Wed Dec 15 20:39:35 2004 From: jon at noring.name (Jon Noring) Date: Wed Dec 15 20:40:01 2004 Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: References: <20041214234529.2976.qmail@web60701.mail.yahoo.com> Message-ID: <19677883578.20041215213935@noring.name> Tony Kline wrote: > Bowerbird@aol.com wrote: >> well, since google _is_ a search engine, they'll obviously o.c.r. the text. 
>> and clean up the text, because errors would muck up their search engine. > Did they say OCR or did you deduce that? I got the impression they are > imaging pages, and maybe adding some identifying keywords for each page. > That is you'll be able to Google to a title chapter and page maybe, but > you won't be able to Google within pages. Try OCR'ing some of the stuff > in the Bodleian...there ain't no such fonts!! Does anyone know what > they mean by digitizing? My understanding, which may be wrong, is that Google will OCR the page scans, but do only cursory machine cleanup of the raw unstructured text that results (which I call "raw digital text" or RDT), and use the still-error-laden RDT in their search system to pull up the page scans (or simply to refer to book title and page number.) [Obviously, RDT will have numerous scanning errors, and those who are familiar with the output of OCR engines know that that RDT is overall one big ball of wax. Certainly Google can write some advanced program to try to clean up the more obvious scanning errors in the RDT, but it will only correct some of the errors, but the result is probably good enough for search purposes. I rather doubt they will do any human proofing (it is way too expensive, and anyway, it's better to turn the public domain stuff over to Distributed Proofreaders who will do it *for free* via enthusiastic volunteer power. Any corporate entity that does not take advantage of free human labor to further their business is not serving their stockholders!)] Interestingly, this is what the University of Michigan (one of the Google partners I believe) did in their "Making of America" collection, which has been around for a few years now. See: http://www.hti.umich.edu/m/moagrp/ MoA scanned the books, placed the scanned page images online (they are freely available -- it's a cool collection that, strangely, hardly anyone has heard of), and built a search engine to search the resulting RDT from OCR. 
Then one by one they have been converting the RDT from selected books to highly-proofed SDT (structured digital text) using human proofers and TEI (I think) for structuring. So, the scans came first, and then the cleanup was (and is being) done at a later time. It's entirely possible that Google will give, upon request, the page scans for any public domain books they've scanned to established groups like Distributed Proofreaders for conversion into proofed SDT, so long as Google gets a copy of the resulting high-quality SDT. I hope they will do this. If not, it will be disappointing -- but at least we have the Internet Archive who will make all their scanned books available to the world. They may end up with over one million books, enough to feed Distributed Proofreaders for quite a while. Jon Noring

From Bowerbird at aol.com Wed Dec 15 22:50:08 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 15 22:50:29 2004 Subject: [gutvol-d] excuse me Message-ID: <1d8.32b83f3f.2ef28a20@aol.com> jon noring, please stop sending your replies to my bookpeople posts to other listserves, including one from which you've banned me. thank you. -bowerbird

From hart at pglaf.org Thu Dec 16 06:51:01 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 06:51:03 2004 Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <19677883578.20041215213935@noring.name> References: <20041214234529.2976.qmail@web60701.mail.yahoo.com> <19677883578.20041215213935@noring.name> Message-ID: From what I understand, Google and the five libraries are going to do some serious DRM on all their sites and these scans and files will NOT go out. mh
On Wed, 15 Dec 2004, Jon Noring wrote:
> It's entirely possible that Google will give, upon request, the page scans for any public domain books they've scanned to established groups like Distributed Proofreaders for conversion into proofed SDT, so long as Google gets a copy of the resulting high-quality SDT.

From hart at pglaf.org Thu Dec 16 07:05:54 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 07:05:55 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <000c01c4e327$12d3da00$6401a8c0@sarek> References: <000c01c4e327$12d3da00$6401a8c0@sarek> Message-ID: On Wed, 15 Dec 2004, John Hagerson wrote:
> What is the purpose of PG? Is it to be a white-haired old man, with a scraggy beard, carrying a sign on the beach that says "Proprietary e-book formats may die!" or is it to provide a repository of information that is useful today and into the future?
Riiight. . .one book out of an entire library. Just put the appropriate note in it and move on. . . . [I think the current note could be improved. . .I don't know who wrote it, but it wasn't much of an issue then, and shouldn't be now.
Tempest=Teapot] As long as people are proposing .pdf for all eBooks, which they are, this is a serious issue, but even when, hopefully, it is not, it is not something that should be forgotten. Such as when WordStar went to court to claim copyright on ALL documents stored in WordStar format, and wanted royalties every time you used the documents you wrote yourself. Think this is silly? It was a HUGE case! Now swept under the carpet. [not to mention the people who tried to copyright the human genome, or who DID patent one person's genome. . .Mr. Moore, who was immune to a form of cancer.] This is a non-issue. . .no one else is ever going to notice as much as this week.

From nihil_obstat at mindspring.com Thu Dec 16 07:09:04 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Thu Dec 16 07:09:09 2004 Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries Message-ID: <28363984.1103209744189.JavaMail.root@wamui02.slb.atl.earthlink.net> "Non-printable" page? If you can display it on a screen, it should not be too difficult to capture the image. Can the "Print Screen" capture method be disabled? (Copies the screen's visual display to the clipboard, at least on MS-Windows--presume there is something similar for Linux and Mac.) Or will they try to figure out a way to keep that captured image from being fed to an OCR program (or to render it unreadable by one)? Time will tell, but my guess is that these page images will one way or another become a source of material for future PG volunteers.
-----Original Message----- From: Michael Hart Sent: Dec 15, 2004 10:34 AM To: Book People Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries
> if they want more, they will have to click on the item and will then arrive at a second database, this one provided by one of the five libraries [NYCPL, Harvard, Michigan, Stanford, Oxford] where they will get a graphical representation of the non-printable page that contains the quotation.
--------------------------- Dennis McCarthy nihil_obstat@mindspring.com

From hart at pglaf.org Thu Dec 16 07:09:51 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 07:09:53 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041216041806.956774BDAB@ws1-1.us4.outblaze.com> References: <20041216041806.956774BDAB@ws1-1.us4.outblaze.com> Message-ID: On the one hand people complain that eBooks in general will never last, simply because those big gov't databases were kept in formats no one can read today. . .on the other hand you don't want this to be mentioned up front. . . . None of the people arguing this case were there when we met with the President of Folio, none of them were part of doing Gibbon's "Roman Empire" . . .so please just leave it be. Some day, when you are all gone, perhaps someone else will sweep your efforts under the carpet. . .and Google will go down as the inventor of eBooks and the first eBook library.
On Wed, 15 Dec 2004, D. Starner wrote:
>> From the standpoint of an archivist, I have to fall on the side of keeping all file formats available and accessible to the reader. PG is as much a historical catalog as it is a library.
> Since when? Why?
> If we want to teach people about the death of old formats, maybe we should have a page about old formats, and how WordStar and Folio and other formats were da bomb, and how it's hard to find anything that can read them now.

From joshua at hutchinson.net Thu Dec 16 07:20:52 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 16 07:20:55 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> You know, it's like you're deliberately trying to make me angry! NO ONE HAS SUGGESTED SWEEPING IT AWAY! In fact, every person that has suggested a change of some kind has advocated putting the obsolete format document somewhere accessible. Just not right out in front where an uninformed visitor will see it, click it and get frustrated. It reflects poorly on PG as a whole and turns off potential users from ever coming back. Move the bloody thing into the OLD subdirectory. That's what the OLD subdirectory is for. Use it as such. Is the text version we have the exact same document as the folio version or were they created from separate sources? If it is the same, we should move the folio into the text's etext number and free up a number. If they are from separate sources, can anyone somehow generate a text file from the Folio file we have?
Josh

From jonathan.gorman at gmail.com Thu Dec 16 07:36:13 2004 From: jonathan.gorman at gmail.com (Jon Gorman) Date: Thu Dec 16 07:36:18 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> Message-ID: <4a6dc7604121607365bb13e91@mail.gmail.com> On Thu, 16 Dec 2004 10:20:52 -0500, Joshua Hutchinson wrote:
> You know, it's like you're deliberately trying to make me angry!
> NO ONE HAS SUGGESTED SWEEPING IT AWAY!
> In fact, every person that has suggested a change of some kind has advocated putting the obsolete format document somewhere accessible. Just not right out in front where an uninformed visitor will see it, click it and get frustrated. It reflects poorly on PG as a whole and turns off potential users from ever coming back.
Given my rather infrequent posting to this list (although long time lurking from a variety of email addresses) I'm rather hesitant to throw more fuel on the fire. But I have to agree with the idea behind Joshu Hutchinson and Jon Noring's suggestions. The folio is confusing when it is the first return result, and people do have a tendency to hit the first result. I believe Greenstone (the new software behind the scenes at gutenberg.org) allows pretty precise sorting of returns on various conditions. Would it be possible to always return the text format as the first return?
This would help highlight the importance of the text format without having to decide when a format is outdated or unsupported, needs to be moved to the suggested "old" directory, or a "stupid, stupid formats" page. I know my first thought when seeing the folio was to think "That's got to be an error, who would be crazy enough to publish that as a folio". But my excuse was it was a long day ;). Jon Gorman

From hart at pglaf.org Thu Dec 16 07:58:26 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 07:58:27 2004 Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <28363984.1103209744189.JavaMail.root@wamui02.slb.atl.earthlink.net> References: <28363984.1103209744189.JavaMail.root@wamui02.slb.atl.earthlink.net> Message-ID: On Thu, 16 Dec 2004, Dennis McCarthy wrote:
> "Non-printable" page?
> If you can display it on a screen, it should not be too difficult to capture the image.
> Can the "Print Screen" capture method be disabled? (Copies the screen's visual display to the clipboard, at least on MS-Windows--presume there is something similar for Linux and Mac.)
> Or will they try to figure out a way to keep that captured image from being fed to (or rendered unreadable) an OCR program?
As I always predict, with every generation of DRM, some 14 year old will figure out a way immediately, before they have even finished their initial tests of the Google Print project. Interesting, tho, that they called it Google PRINT when PRINT is exactly what you can NOT do. . . . I wonder if they plan to charge for printing?
mh From hart at pglaf.org Thu Dec 16 08:04:49 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 08:04:51 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <4a6dc7604121607365bb13e91@mail.gmail.com> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> Message-ID: On Thu, 16 Dec 2004, Jon Gorman wrote: > On Thu, 16 Dec 2004 10:20:52 -0500, Joshua Hutchinson > wrote: >> You know, it's like you're deliberately trying to make me angry! >> >> NO ONE HAS SUGGESTED SWEEPING IT AWAY! >> >> In fact, every person that has suggested a change of some kind has advocated putting >the obsolete format document somewhere accessible. Just not right out in front where an >uninformed visitor will see it, click it and get frustrated. It reflects poorly on PG as a whole >and turns off potential users from ever coming back. > > Given my rather infrequent posting to this list (although long time > lurking from a variety of email addresses) I'm rather hesitant to > throw more fuel on the fire. But I have to agree with the idea behind > Joshu Hutchinson and Jon Noring's suggestions. The folio is > confusing when it is the first return result, and people do have a > tendency to hit the first result. No one is suggesting it should be the first result. > I believe Greenstone (the new software behind the scenes at > gutenberg.org) allows pretty precise sorting of returns on various > conditions. Would it be possible to always return the text format as > the first return? This would help highlight the importance of the > text format without having to decide when a format is outdated or > unsupported, needs to be moved to the suggested "old" directory, or a > "stupid, stupid formats" page. However, this sort of sweeping out of sight is not acceptable. Try again when those of us who spent all the effort on this Folio project are dead, eh? 
> I know my first thought when seeing the folio was to think "That's got > to be an error, who would be crazy enough to publish that as a folio". That's the whole point. . .let us make that point. From hacker at gnu-designs.com Thu Dec 16 08:08:13 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Dec 16 08:08:42 2004 Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: References: <28363984.1103209744189.JavaMail.root@wamui02.slb.atl.earthlink.net> Message-ID: > Interesting, tho, that they called it Google PRINT when PRING is > exactly what you can NOT do. . . . Noun vs. verb. Its "Print"(ed) media, but you cannot "print" it on your normal printer for personal or commercial use. The same sort of confusion surrounds "DRM", which has absolutely nothing to do with "Rights" at all. > I wonder if they plan to charge for printing? If they do, this means they have 100% rights to do so, from the copyright holder(s), assuming the copyright is still in effect. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From hart at pglaf.org Thu Dec 16 08:10:26 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 08:10:27 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> Message-ID: On Thu, 16 Dec 2004, Joshua Hutchinson wrote: > You know, it's like you're deliberately trying to make me angry! Sweeping it under the carpet is exactly what you are promoting here. > > NO ONE HAS SUGGESTED SWEEPING IT AWAY! Again: Sweeping it under the carpet is exactly what you are promoting here. > > In fact, every person that has suggested a change of some kind has advocated > putting the obsolete format document somewhere accessible. Just not right > out in front where an uninformed visitor will see it, click it and get > frustrated. 
It reflects poorly on PG as a whole and turns off potential users from ever coming back.
If this were the case, lots of people would have complained by now. You are insiders. . .you have a distinctly different viewpoint.
> Move the bloody thing into the OLD subdirectory. That's what the OLD subdirectory is for. Use it as such.
Again: Sweeping it under the carpet is exactly what you are promoting here.
> Is the text version we have the exact same document as the folio version or were they created from separate sources? If it is the same, we should move the folio into the text's etext number and free up a number. If they are from separate sources, can anyone somehow generate a text file from the Folio file we have?
This is exactly the reason for having a separate number, so people will NOT get the .nfo format unless they want it. BTW, you can still get the Folio reader with the TIME Magazine CDs which sell for $1.

From marcello at perathoner.de Thu Dec 16 08:21:46 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Dec 16 08:21:55 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <4a6dc7604121607365bb13e91@mail.gmail.com> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> Message-ID: <41C1B61A.5010704@perathoner.de> Jon Gorman wrote:
> I believe Greenstone (the new software behind the scenes at gutenberg.org) allows pretty precise sorting of returns on various conditions.
What? Who installed Greenstone without my noticing it?
> Would it be possible to always return the text format as > the first return? How can the software know that #900, #733 and #892 are the same book ? (If they are indeed the same, which I cannot establish, lacking a Folio viewer.) The Right Thing to do is to reindex all formats (TXT, HTML, Folio) under one etext number. Then the software would sort it in a sensible way. -- Marcello Perathoner webmaster@gutenberg.org From jonathan.gorman at gmail.com Thu Dec 16 08:22:10 2004 From: jonathan.gorman at gmail.com (Jon Gorman) Date: Thu Dec 16 08:22:15 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> Message-ID: <4a6dc7604121608222f463d12@mail.gmail.com> On Thu, 16 Dec 2004 08:04:49 -0800 (PST), Michael Hart wrote: > > > On Thu, 16 Dec 2004, Jon Gorman wrote: > > > On Thu, 16 Dec 2004 10:20:52 -0500, Joshua Hutchinson > > wrote: > >> You know, it's like you're deliberately trying to make me angry! > >> > >> NO ONE HAS SUGGESTED SWEEPING IT AWAY! > >> > >> In fact, every person that has suggested a change of some kind has advocated putting >the obsolete format document somewhere accessible. Just not right out in front where an >uninformed visitor will see it, click it and get frustrated. It reflects poorly on PG as a whole >and turns off potential users from ever coming back. > > > > Given my rather infrequent posting to this list (although long time > > lurking from a variety of email addresses) I'm rather hesitant to > > throw more fuel on the fire. But I have to agree with the idea behind > > Joshu Hutchinson and Jon Noring's suggestions. The folio is > > confusing when it is the first return result, and people do have a > > tendency to hit the first result. > > No one is suggesting it should be the first result. 
> > > > I believe Greenstone (the new software behind the scenes at > > gutenberg.org) allows pretty precise sorting of returns on various > > conditions. Would it be possible to always return the text format as > > the first return? This would help highlight the importance of the > > text format without having to decide when a format is outdated or > > unsupported, needs to be moved to the suggested "old" directory, or a > > "stupid, stupid formats" page. > > However, this sort of sweeping out of sight is not acceptable. Michael, I think people are trying to understand what you mean by hiding or sweeping away. The mere fact the folio appears first is an unintentional accident of sorting. You yourself says that no one is arguing it should be first. Yet, down here you say changing the order is not acceptable. Should we be moving all the obsolete formats to the front, essentially doing the opposite? What service does that provide? Indeed by having the text format be first and foremost, it should send a clear signal of the preferred format, and can be linked to another page explaining why. > > Try again when those of us who spent all the effort on this Folio > project are dead, eh? > Michael, no one is trying to disparage your efforts. Indeed, I have some questions. Do we have it in writing there would always be a free reader? I've seen some algorithms and code that decodes the folio format. Would the lack a free reader for the folio allow these to be legally available for a person to develop a reader/converter program for it? > > > I know my first thought when seeing the folio was to think "That's got > > to be an error, who would be crazy enough to publish that as a folio". > > That's the whole point. . .let us make that point. Right.....but how often are encodings named after words like folio or quarto? Just serves another dose of confusion. 
> _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jonathan.gorman at gmail.com Thu Dec 16 08:34:23 2004 From: jonathan.gorman at gmail.com (Jon Gorman) Date: Thu Dec 16 08:34:28 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41C1B61A.5010704@perathoner.de> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> <41C1B61A.5010704@perathoner.de> Message-ID: <4a6dc7604121608345a650580@mail.gmail.com> On Thu, 16 Dec 2004 17:21:46 +0100, Marcello Perathoner wrote: > Jon Gorman wrote: > > > I believe Greenstone (the new software behind the scenes at > > gutenberg.org) allows pretty precise sorting of returns on various > > conditions. > > What? Who installed Greenstone without my noticing it? Errr, oops. Dang, sorry about that. Could have sworn I heard a bit ago that you guys were putting it in and the new site (gutenburg.org) sure looks like Greenstone. Sorry about that. This is exactly why I usually keep my mouth shut about these types of things ;). Jon Gorman From joshua at hutchinson.net Thu Dec 16 08:34:53 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 16 08:35:03 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041216163453.D512F4F507@ws6-5.us4.outblaze.com> Michael, I know you're not this obtuse, so there must be something we are not understanding about each other's stance. > > > You know, it's like you're deliberately trying to make me angry! > > Sweeping it under the carpet is exactly what you are promoting here. > Maybe it is the word sweeping. "Sweeping it away" means deleting to me. I am most definitely NOT advocating that. (And no one I've seen has been.) I am advocating MOVING the file so that it is not the first thing someone sees when they do a search for that text. The easiest way to do that is to move it into the OLD directory. 
It is what it was created for. It is still there for anyone interested in PG history, but it doesn't confuse the average user who just wants to be able to read the e-book. > > It this were the case, lots of people would have complained by now. > > You are insiders. . .you have a distinctly different viewpoint. > It is well known that when people are searching the web, if they don't understand something, they are FAR more likely to just click away and never return. For every one person that complains, you'll have hundreds, if not thousands, that just clicked away never to return. > > > Is the text version we have the exact same document as the folio version or > > where they created from separate sources? If it is the same, we should move > > the folio into the text's etext number and free up a number. If they are > > from separate sources, can any somehow generate a text file from the Folio > > file we have? > > This is exactly the reason for having a separate number, > so people will NOT get the .nfo format unless they want it. > > BTW, you can still get the Folio reader with the TIME Magazing CDs > which sell for $1. > Moving the file to a new number is probably not a good idea. I was just kind of thinking out loud there. But it should be moved to the OLD directory so that Joe User doesn't see it as the FIRST THING IN HIS SEARCH LIST. Josh From hart at pglaf.org Thu Dec 16 08:38:08 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 08:38:09 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <4a6dc7604121608222f463d12@mail.gmail.com> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> <4a6dc7604121608222f463d12@mail.gmail.com> Message-ID: On Thu, 16 Dec 2004, Jon Gorman wrote: >>> Joshu Hutchinson and Jon Noring's suggestions. The folio is >>> confusing when it is the first return result, and people do have a >>> tendency to hit the first result. 
>> >> No one is suggesting it should be the first result. >> >> >>> I believe Greenstone (the new software behind the scenes at >>> gutenberg.org) allows pretty precise sorting of returns on various >>> conditions. Would it be possible to always return the text format as >>> the first return? This would help highlight the importance of the >>> text format without having to decide when a format is outdated or >>> unsupported, needs to be moved to the suggested "old" directory, or a >>> "stupid, stupid formats" page. >> >> However, this sort of sweeping out of sight is not acceptable. > > Michael, I think people are trying to understand what you mean by > hiding or sweeping away. Putting something where it is not likely be be seen is sweeping under the carpet. . .period. > The mere fact the folio appears first is an unintentional accident of sorting. Then change the sorting technique so it is last. . . . > You yourself says that no one is arguing it should be first. I haven't seen anyone say it should be first, no one at all. > Yet, down here you say changing the order is not acceptable. No, I don't. . .just moving it to another directory is. > Should we be moving all the obsolete formats to the front, What kind of question is that? > essentially doing the opposite? What service does that provide? Indeed by > having the text format be first and foremost, it should send a clear signal > of the preferred format, and can be linked to another page explaining why. As above, I am not saying it should come up as first, as default, etc. >> Try again when those of us who spent all the effort on this Folio >> project are dead, eh? >> > > Michael, no one is trying to disparage your efforts. Indeed, I have > some questions. Do we have it in writing there would always be a free > reader? I've seen some algorithms and code that decodes the folio > format. 
Would the lack of a free reader for the folio allow these to be > legally available for a person to develop a reader/converter program for it? Personally, I wouldn't go through the effort, even if Folio has folded and we can get the rights. It's JUST an example. . .leave it be, put in a comment describing this better than the one that is in there now. From hart at pglaf.org Thu Dec 16 08:46:49 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 08:46:51 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041216163453.D512F4F507@ws6-5.us4.outblaze.com> References: <20041216163453.D512F4F507@ws6-5.us4.outblaze.com> Message-ID: On Thu, 16 Dec 2004, Joshua Hutchinson wrote: > Michael, I know you're not this obtuse, so there must be something we are not understanding about each other's stance. > >> >>> You know, it's like you're deliberately trying to make me angry! >> >> Sweeping it under the carpet is exactly what you are promoting here. >> > > Maybe it is the word sweeping. > > "Sweeping it away" means deleting to me. Are you intentionally misquoting me and thinking no one will notice? "Sweeping under the carpet/rug" is what I said. Putting out of view. > I am most definitely NOT advocating that. (And no one I've seen has been.) It appears the opposite. > I am advocating MOVING the file so that it is not the first thing someone > sees when they do a search for that text. The easiest way to do that is to > move it into the OLD directory. It is what it was created for. It is still > there for anyone interested in PG history, but it doesn't confuse the average > user who just wants to be able to read the e-book. I'm fine with changes "so that it is not the first thing someone sees when they do a search for that text." I am NOT fine with sweeping it out of the normal directory. You can do that when there is no one left to remember the issue, or you can try to help them remember the issue, because it IS going to come up again.
>> If this were the case, lots of people would have complained by now. >> >> You are insiders. . .you have a distinctly different viewpoint. >> > > It is well known that when people are searching the web, if they don't > understand something, they are FAR more likely to just click away and never > return. For every one person that complains, you'll have hundreds, if not > thousands, that just clicked away never to return. I get messages all the time about things to improve; this has never been one of them. . .not even once. When we get ONE message we consider it. Even if we only get one per year, it is still considered, but it is not considered as the kind of major issue you want it to be. Change the search so it is last. Change the comments about not downloading it unless you have a Folio View. Please add a remark that there used to be a free viewer but that Folio changed its mind, just as any other company might do, such as Adobe about .pdf files. . .and I will be MORE than happy for you, and for me. Good enough for now? Thanks for coming a bit in my direction! Michael PS I hope you will also thank me for moving in your direction. From hart at pglaf.org Thu Dec 16 08:48:39 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 08:48:41 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <4a6dc7604121608345a650580@mail.gmail.com> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> <41C1B61A.5010704@perathoner.de> <4a6dc7604121608345a650580@mail.gmail.com> Message-ID: On Thu, 16 Dec 2004, Jon Gorman wrote: > On Thu, 16 Dec 2004 17:21:46 +0100, Marcello Perathoner > wrote: >> Jon Gorman wrote: >> >>> I believe Greenstone (the new software behind the scenes at >>> gutenberg.org) allows pretty precise sorting of returns on various >>> conditions. >> >> What? Who installed Greenstone without my noticing it? > > Errr, oops. Dang, sorry about that.
Could have sworn I heard a bit > ago that you guys were putting it in and the new site (gutenburg.org) > sure looks like Greenstone. > > Sorry about that. This is exactly why I usually keep my mouth shut > about these types of things ;). > > Jon Gorman Probably some confusion about domain names here: gutenberg.org = gutenberg.net the old site pgcc.net = gutenberg.us = gutenberg.cc the new site mh From gbnewby at pglaf.org Thu Dec 16 08:51:32 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Dec 16 08:51:34 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> Message-ID: <20041216165132.GA6868@pglaf.org> On Thu, Dec 16, 2004 at 08:10:26AM -0800, Michael Hart wrote: > ... > BTW, you can still get the Folio reader with the TIME Magazing CDs > which sell for $1. I wasn't aware of this - do you have a copy? We can make sure Brewster's site archives it, and maybe even provide our own archival copy. -- Greg From ciesiels at bigpond.net.au Thu Dec 16 08:51:09 2004 From: ciesiels at bigpond.net.au (Michael Ciesielski) Date: Thu Dec 16 08:52:05 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> <41C1B61A.5010704@perathoner.de> <4a6dc7604121608345a650580@mail.gmail.com> Message-ID: <41C1BCFD.9050500@bigpond.net.au> Michael Hart wrote: > Probably some confusion about domain names here: > > gutenberg.org = gutenberg.net the old site > > pgcc.net = gutenberg.us = gutenberg.cc the new site > Uh, excuse me? When did "PG2"/"PGCC"/"WEL" become the new Project Gutenberg site? 
-- Michael Ciesielski From hart at pglaf.org Thu Dec 16 08:54:02 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 08:54:04 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041216165132.GA6868@pglaf.org> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <20041216165132.GA6868@pglaf.org> Message-ID: On Thu, 16 Dec 2004, Greg Newby wrote: > On Thu, Dec 16, 2004 at 08:10:26AM -0800, Michael Hart wrote: >> ... >> BTW, you can still get the Folio reader with the TIME Magazing CDs >> which sell for $1. > > I wasn't aware of this - do you have a copy? We can > make sure Brewster's site archives it, and maybe even > provide our own archival copy. I have TIME, but the reader is NOT the free one. > -- Greg > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From gbnewby at pglaf.org Thu Dec 16 08:56:35 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Dec 16 08:56:36 2004 Subject: greenstone (Re: !@!Re: [gutvol-d] [Fwd: Folio files]) In-Reply-To: <4a6dc7604121608345a650580@mail.gmail.com> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> <41C1B61A.5010704@perathoner.de> <4a6dc7604121608345a650580@mail.gmail.com> Message-ID: <20041216165635.GB6868@pglaf.org> On Thu, Dec 16, 2004 at 10:34:23AM -0600, Jon Gorman wrote: > On Thu, 16 Dec 2004 17:21:46 +0100, Marcello Perathoner > wrote: > > Jon Gorman wrote: > > > > > I believe Greenstone (the new software behind the scenes at > > > gutenberg.org) allows pretty precise sorting of returns on various > > > conditions. > > > > What? Who installed Greenstone without my noticing it? > > Errr, oops. Dang, sorry about that. Could have sworn I heard a bit > ago that you guys were putting it in and the new site (gutenburg.org) > sure looks like Greenstone. iBiblio runs the Greenstone search engine, which we link to. 
It's not bad, but it takes a long time to re-index the site (and doesn't do all the filetypes), and is not updated too regularly. We link to it from the gutenberg.org/gutenberg.net pages, as well as to Yahoo & Google (which, similarly, we don't run: they just index us as part of their service). > Sorry about that. This is exactly why I usually keep my mouth shut > about these types of things ;). Not at all - it's often not too clear what's "ours" (as in stuff we run) and what's not ours, without detailed reading. -- Greg From hart at pglaf.org Thu Dec 16 08:59:56 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 08:59:58 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41C1BCFD.9050500@bigpond.net.au> References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> <41C1B61A.5010704@perathoner.de> <4a6dc7604121608345a650580@mail.gmail.com> <41C1BCFD.9050500@bigpond.net.au> Message-ID: On Fri, 17 Dec 2004, Michael Ciesielski wrote: > Michael Hart wrote: > >> Probably some confusion about domain names here: >> >> gutenberg.org = gutenberg.net the old site >> >> pgcc.net = gutenberg.us = gutenberg.cc the new site >> > Uh, excuse me? > > When did "PG2"/"PGCC"/"WEL" become the new Project Gutenberg site? This was announced many times in the Weekly Newsletter, and discussed in several listserv conversations. The current site was online for testing at least since Jun 22; the official date of change from testing to opening was Nov 4. gutenberg.org replaced gutenberg.net as the preferred domain name for that site during the same period.
From gbnewby at pglaf.org Thu Dec 16 09:00:36 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Dec 16 09:00:37 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <20041216165132.GA6868@pglaf.org> Message-ID: <20041216170036.GA7970@pglaf.org> On Thu, Dec 16, 2004 at 08:54:02AM -0800, Michael Hart wrote: > > > On Thu, 16 Dec 2004, Greg Newby wrote: > > >On Thu, Dec 16, 2004 at 08:10:26AM -0800, Michael Hart wrote: > >>... > >>BTW, you can still get the Folio reader with the TIME Magazing CDs > >>which sell for $1. > > > >I wasn't aware of this - do you have a copy? We can > >make sure Brewster's site archives it, and maybe even > >provide our own archival copy. > > I have TIME, but the reader is NOT the free one. Brewster @ TIA has a project to archive such orphaned software (copyrighted, but not being sold/owned). It's legit, & might be a good place to send this software. -- Greg From marcello at perathoner.de Thu Dec 16 09:03:02 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Dec 16 09:03:10 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> Message-ID: <41C1BFC6.4020408@perathoner.de> Michael Hart wrote: > Sweeping it under the carpet is exactly what you are promoting here. Actually we are advocating greater visibility of the files in question. Current situation: if a reader who looks for Gibbon *by chance* happens to download the Folio files, she *may* realize that proprietary formats are bad. Disadvantage: more probably she will not realize where the problem is because nobody told her and just form a bad opinion of PG: "What the hell do they keep around files if nobody can read them ?" Proposed change: move the Folio files out of the catalog, write a "Hall of Shame" page explaining the problem and link to the Folio files from there. 
Advantage: people who don't look for Gibbon can see the "Hall of Shame" page. People actually realize the problem because it is explained to them. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Thu Dec 16 09:17:13 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Dec 16 09:17:19 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <4a6dc7604121607365bb13e91@mail.gmail.com> <41C1B61A.5010704@perathoner.de> <4a6dc7604121608345a650580@mail.gmail.com> <41C1BCFD.9050500@bigpond.net.au> Message-ID: <41C1C319.6060103@perathoner.de> Michael Hart wrote: >> When did "PG2"/"PGCC"/"WEL" become the new Project Gutenberg site? > > This was announced many times in the Weekly Newsletter, > and discussed in several listserv conversations. And most everybody took exception with the "new" and "old" connotation. > The current site was online for testing at least since Jun 22, > the offical date of change from testing to opening was Nov 4. > gutenberg.org replaced gutenberg.net as the preferred domain > name for that site during the same period. The two changes are completely unrelated. gutenberg.net was abandoned in favor of gutenberg.org because .org is the standard TLD for non-profits. Getting rid of multiple domains also gave us better search engine ranking. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Thu Dec 16 09:17:17 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Dec 16 09:17:23 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: References: <20041216163453.D512F4F507@ws6-5.us4.outblaze.com> Message-ID: <41C1C31D.7030500@perathoner.de> Michael Hart wrote: > I'm find with changes "so that it is not the first thing someone sees > when they do a search for that text." I edited the title so it will sort later. > I am NOT fine with sweeping it out of the normal directory. 
How fine are you with removing the files from the catalog database? > I get messages all the time about things to improve, this has never > been one of them. . .not even once. Actually we just got a message. The one that started this discussion. > When we get ONE message we consider it. Alright. We got ONE message. > Change the comments about not downloading it unless you have a Folio View. > Please add a remark that there used to be a free viewer but that Folio > changed its mind, just and any other company might do, such as Adobe > about .pdf files. . .and I will be MORE than happy for you, and for me. Go ahead and write a better comment. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Thu Dec 16 09:39:35 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 16 09:39:41 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041216173935.5F7944F49F@ws6-5.us4.outblaze.com> > > On Thu, 16 Dec 2004, Joshua Hutchinson wrote: > > > Michael, I know you're not this obtuse, so there must be something we are > > not understanding about each other's stance. > > > >> > >>> You know, it's like you're deliberately trying to make me angry! > >> > >> Sweeping it under the carpet is exactly what you are promoting here. > >> > > > > Maybe it is the word sweeping. > > > > "Sweeping it away" means deleting to me. > > Are you intentionally misquoting me and thinking no one will notice. > Yeah, Michael. I'm misquoting you. That's why your original words are exactly two lines above it. My quote was a simple mistake of putting the quote in the wrong spot. It was supposed to read: "Sweeping it" away means deleting it to me. > "Sweeping under the carpet/rug" is what I said. > > Putting out of view. > > > > I am most definitely NOT advocating that. (And no one I've seen has been.) > > It appears the opposite. > > Yep, it looks like you define sweeping differently than I do. 
Moving the file doesn't mean get rid of it or put where no one can see it. It moves it so that the search bar doesn't bring it up as the default search result. > > You can do that when there is no one left to remember the issue, > or you can try to help them remember the issue, because it IS > going to come up again. > If this really was an issue you cared about, you would have put a section up on the web page with links to the examples of why this is a "bad thing." You just seem to be arguing against change for the sake of arguing. As we are all fond of saying around here, if this bothers you so much, DO something about it. Create a page deriding proprietary formats. > > > I get messages all the time about things to improve, this has never > been one of them. . .not even once. > > When we get ONE message we consider it. What the heck do you call the message that started this whole thread? A big thank you for having an unreadable file out there? Come on, we do have a complaint message! > > Even if we only get one per year, it is still considered, > but it is not considered as the kind of major issue you want it to be. You're right. This isn't major. It should be fixed in about 30 seconds. But for some reason, you're arguing like mad to keep something that results in a lower level of usability. It is really boggling my mind. > > Change the search so it is last. That is exactly what moving the file to the OLD directory (which is a subdirectory of its current location), would do! > > Change the comments about not downloading it unless you have a Folio View. > Please add a remark that there used to be a free viewer but that Folio > changed its mind, just and any other company might do, such as Adobe > about .pdf files. . .and I will be MORE than happy for you, and for me. > Why do we need to handle this one differently than we would any other file in the collection. 
As (I believe) Andrew pointed out, this would normally be handled by moving the file to the OLD directory. So we have an established manner of handling these situations. You just seem to want to fight against it. If, however, putting a disclaimer in the search field is the best we can get... fine, I'll take it. At least it is something (if not the "best practice" method I'd like to see). Josh From joshua at hutchinson.net Thu Dec 16 09:43:20 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 16 09:43:25 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041216174320.88AA54F42B@ws6-5.us4.outblaze.com> Amen, Marcello. And thank you for spelling it out much clearer and calmer than I have been able to. Josh ----- Original Message ----- From: "Marcello Perathoner" To: "Michael S. Hart" , "Project Gutenberg Volunteer Discussion" Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files] Date: Thu, 16 Dec 2004 18:03:02 +0100 > > Michael Hart wrote: > > > > Sweeping it under the carpet is exactly what you are promoting here. > > Actually we are advocating greater visibility of the files in question. > > > Current situation: if a reader who looks for Gibbon *by chance* happens to > download the Folio files, she *may* realize that proprietary formats are bad. > > Disadvantage: more probably she will not realize where the problem is because > nobody told her and just form a bad opinion of PG: "What the hell do they keep > around files if nobody can read them ?" > > > > Proposed change: move the Folio files out of the catalog, write a "Hall of > Shame" page explaining the problem and link to the Folio files from there. > > Advantage: people who don't look for Gibbon can see the "Hall of Shame" page. > People actually realize the problem because it is explained to them. 
> > > > > -- Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From joshua at hutchinson.net Thu Dec 16 09:49:25 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 16 09:49:30 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041216174925.E50A09E82B@ws6-2.us4.outblaze.com> As a side question --- Is PGCC a part of PG? Officially? Does Greg, for instance, have oversight over it. Is anyone from PG (outside of Michael) invovled in this site in any way? I know this is reopening old wounds around here, but the last paragraph on the PGCC home page makes it sound like this is an official part of PG, which I never understood to be the actual case... [quote] Up until now, Project Gutenberg has focused on the creation of the eBooks rather than their distribution and we have spent as much of our time on copyright as on eBook creation and distribution. This is our first attempt focused on distribution rather than creation. [/quote] Another paragraph, though, indicates that PGCC is a subset of the World eBook Library Consortia, which I'm almost positive is not related directly to PG. [quote] Project Gutenberg Consortia Center, promoting global literacy by multiplying intellectual properties though Internet library lending and increasing access to digital archives and repositories. Project Gutenberg Consortia Center is a branch of The World eBook Library Consortia. [/quote] Josh PS Oh, and I strongly disagree with the characterisation of pgcc.net as the "new PG" site, which no matter how you MEAN it sound, it will imply that it is the "replacement" for the OLD site (gutenberg.org). ----- Original Message ----- From: "Michael Ciesielski" To: "Michael S. 
Hart" , "Project Gutenberg Volunteer Discussion" Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files] Date: Fri, 17 Dec 2004 03:51:09 +1100 > > Michael Hart wrote: > > > Probably some confusion about domain names here: > > > > gutenberg.org = gutenberg.net the old site > > > > pgcc.net = gutenberg.us = gutenberg.cc the new site > > > Uh, excuse me? > > When did "PG2"/"PGCC"/"WEL" become the new Project Gutenberg site? > > -- > Michael Ciesielski > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From hart at pglaf.org Thu Dec 16 10:55:18 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 16 10:55:20 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <20041216174925.E50A09E82B@ws6-2.us4.outblaze.com> References: <20041216174925.E50A09E82B@ws6-2.us4.outblaze.com> Message-ID: On Thu, 16 Dec 2004, Joshua Hutchinson wrote: > As a side question --- > > Is PGCC a part of PG? Officially? Does Greg, for instance, have oversight > over it. Is anyone from PG (outside of Michael) invovled in this site in any > way? Yes, Greg has spent plenty of time on PGCC, more than plenty. > I know this is reopening old wounds around here, but the last paragraph on > the PGCC home page makes it sound like this is an official part of PG, which > I never understood to be the actual case... Different people want to have different views. . .we haven't pushed it very hard. . .keeping the reports and numbers separate in the Weekly Newsletter, etc. > [quote] Up until now, Project Gutenberg has focused on the creation of the > eBooks rather than their distribution and we have spent as much of our time > on copyright as on eBook creation and distribution. This is our first attempt > focused on distribution rather than creation. 
[/quote] Yes, the Mission Statements of Project Gutenberg have always made it clear that PG is intended to focus on both the creation and distribution of eBooks. Obviously eBooks must be created before they can be distributed. Many eBook creators insist their books be left in certain formats that have not passed muster with PG processing and post-processing standards, and we pass these on to PGCC, who is willing to post them in original formats, pagination, files, etc. In addition, PGCC surfs the web for any and all possible eBook sites and sends requests to them. > Another paragraph, though, indicates that PGCC is a subset of the World eBook > Library Consortia, which I'm almost positive is not related directly to PG. > [quote] Project Gutenberg Consortia Center, promoting global literacy by > multiplying intellectual properties though Internet library lending and > increasing access to digital archives and repositories. Project Gutenberg > Consortia Center is a branch of The World eBook Library Consortia. [/quote] This is probably something that obviously needs correction, if you can send the exact location I will forward it so it can be corrected immediately. As far as I know, there should be no reference to World eBook Library. BTW, this is a different World Library than donated us the Shakespeare files from which we made book #100. > Josh > > PS Oh, and I strongly disagree with the characterisation of pgcc.net as the > "new PG" site, which no matter how you MEAN it sound, it will imply that it > is the "replacement" for the OLD site (gutenberg.org). Sorry, that was quoted from someone else who used the term "new site (gutenberg.org)" in reference to the location of the Greenstone program, and I obviously should have made that quotation clearly marked. 
From joshua at hutchinson.net Thu Dec 16 11:24:37 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 16 11:24:44 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] Message-ID: <20041216192437.399854F432@ws6-5.us4.outblaze.com> ----- Original Message ----- From: "Michael Hart" > > On Thu, 16 Dec 2004, Joshua Hutchinson wrote: > > > Another paragraph, though, indicates that PGCC is a subset of the World > > eBook Library Consortia, which I'm almost positive is not related directly > > to PG. > > > [quote] Project Gutenberg Consortia Center, promoting global literacy by > > multiplying intellectual properties though Internet library lending and > > increasing access to digital archives and repositories. Project Gutenberg > > Consortia Center is a branch of The World eBook Library Consortia. [/quote] > > This is probably something that obviously needs correction, if you can send > the exact location I will forward it so it can be corrected immediately. > As far as I know, there should be no reference to World eBook Library. > BTW, this is a different World Library than donated us the Shakespeare > files from which we made book #100. > On the main pgcc.net page in the lower right corner, in the green side bar area. There is a Project Gutenberg Consortia Center logo with the above text below it. > > Josh > > > > PS Oh, and I strongly disagree with the characterisation of pgcc.net as the > > "new PG" site, which no matter how you MEAN it sound, it will imply that it > > is the "replacement" for the OLD site (gutenberg.org). > > Sorry, that was quoted from someone else who used the term > "new site (gutenberg.org)" in reference to the location of > the Greenstone program, and I obviously should have made > that quotation clearly marked. > Fair enough and thanks for the clarification. 
Josh From gbnewby at pglaf.org Thu Dec 16 12:04:47 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Dec 16 12:04:48 2004 Subject: [gutvol-d] pgcc In-Reply-To: <20041216192437.399854F432@ws6-5.us4.outblaze.com> References: <20041216192437.399854F432@ws6-5.us4.outblaze.com> Message-ID: <20041216200447.GB14891@pglaf.org> Two clarifications: From one message: >On Thu, 16 Dec 2004, Joshua Hutchinson wrote: >>As a side question --- >> >>Is PGCC a part of PG? Officially? Does Greg, for instance, have oversight >>over it? Is anyone from PG (outside of Michael) involved in this site in >>any way? > >Yes, Greg has spent plenty of time on PGCC, more than plenty. The question was whether I have oversight. The answer to that question is basically, "no," but a qualified "no." It's definitely true that I've spent plenty of time on PGCC. As mentioned way back when, there were just a few criteria on my list for what the PGCC (née PG2) site needed to do in order to use the "Project Gutenberg" trademark. Both Michael and I (via PGLAF) have oversight for use of the mark. John Guagliardo (john@guagliardo.cc) is the person behind PGCC who funded & orchestrated it. The list was fairly straightforward: make sure correct small print & attribution is there; don't put our free eBooks behind for-fee sites; decouple the World eBook Library (John's for-fee site) from the PGCC. Also some fundamental usability stuff, though I kept that to a minimum since I don't want to design or maintain someone else's site. John complied with all those things. As Michael mentioned, the newsletter carried requests for proofreading & feedback for the pgcc site for something over 4 months, leading up to a grand opening on November 4. I'm not sure how grand it was, but nevertheless it's there and available. We carry periodic updates from PGCC in the weekly newsletter.
During the same time the PGCC site was being rolled out and tested, I expended a fair amount of effort (with Michael & John) to help solve the issues listed above, and also work on some ideas for moving forward. As someone quoted elsewhere, the idea is for PGCC to be a "collection of collections," rather than a producer of eBooks. This is a pretty clear delineation between what we do (gutenberg.org) and what pgcc does, with no substantial overlap in activities. During that same time, Michael and I rolled out some new documents (mis-named FAQ0, FAQ1 and FAQ3) that better describe the link between the mission ("to encourage the creation and distribution of eBooks") and various activities that either spin-off or augment gutenberg.org (such as pg-eu), or that work towards the mission in fairly different ways (such as pgcc). We also ran requests for feedback etc. in the newsletter for several months, and got several good suggestions. >On Thu, Dec 16, 2004 at 02:24:37PM -0500, Joshua Hutchinson wrote: >> >> ----- Original Message ----- >> From: "Michael Hart" >> > >> > On Thu, 16 Dec 2004, Joshua Hutchinson wrote: >> > >> > > Another paragraph, though, indicates that PGCC is a subset of the World >> > > eBook Library Consortia, which I'm almost positive is not related directly >> > > to PG. >> > >> > > [quote] Project Gutenberg Consortia Center, promoting global literacy by >> > > multiplying intellectual properties though Internet library lending and >> > > increasing access to digital archives and repositories. Project Gutenberg >> > > Consortia Center is a branch of The World eBook Library Consortia. [/quote] >> > >> > This is probably something that obviously needs correction, if you can send >> > the exact location I will forward it so it can be corrected immediately. >> > As far as I know, there should be no reference to World eBook Library. >> > BTW, this is a different World Library than donated us the Shakespeare >> > files from which we made book #100. 
>> > >> >> On the main pgcc.net page in the lower right corner, in the green side bar area. There is a Project Gutenberg Consortia Center logo with the above text below it. >> >> > > Josh >> > > >> > > PS Oh, and I strongly disagree with the characterisation of pgcc.net as the >> > > "new PG" site, which no matter how you MEAN it to sound, it will imply that it >> > > is the "replacement" for the OLD site (gutenberg.org). >> > >> > Sorry, that was quoted from someone else who used the term >> > "new site (gutenberg.org)" in reference to the location of >> > the Greenstone program, and I obviously should have made >> > that quotation clearly marked. >> > >> >> Fair enough and thanks for the clarification. Sorry to correct Michael on this, but in fact we specifically decided that it was fine for PGCC to have some sort of credit/reference to WEB, since WEB is the sponsor. My view is that a recognition of such sponsorship is fine, but that it's inappropriate to further entangle the sites (for example, having links on pgcc that go to other pgcc pages, and also to WEB pages, interspersed). My view is that the pgcc site has an appropriate & minimalist set of links & info about WEB. As always, feedback (to this list, to John, etc.) is welcome. And, let me remind you that there is always opportunity for even more new efforts to support the Gutenberg mission - see http://gutenberg.net/about - there is plenty of good work left to do!!! -- Greg From jmdyck at ibiblio.org Thu Dec 16 13:53:25 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Thu Dec 16 13:55:12 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> Message-ID: <41C203D5.86003F86@ibiblio.org> Michael Hart wrote: > > This is exactly the reason for having a separate number, > so people will NOT get the .nfo format unless they want it.
The latter is a fine goal, but it seems to me that giving a Folio file a separate etext number achieves precisely the opposite effect. If a volume is available in several formats, the easiest way to convey this fact is in a tabular listing with a "format" column. This is what the PG online catalog's 'bibrec' pages do. However, (I'm pretty sure) a bibrec page can only show data associated with one etext number. Conversely, the pages that show info for multiple etexts (e.g., search results or browse authors) do *not* convey format information. Thus, having a different etext number for a Folio version (or for any particular-format version) actually obscures the format distinction, making it *more* likely that someone will get the .nfo format when they don't want it (or plain text when they'd prefer html, or vice versa, etc). Of course, the decision for Decline & Fall was made back in 1997, before we had bibrec pages, or even much of an online catalog, I think. Perhaps it made more sense given the access and indexing methods of the day, though as far as I can tell, very little use was made of etext numbers in accessing files. (Instead, one used filenames like etext97/dfre310xx.xxx.) Anyway, the argument of people not getting unwanted formats would seem to point in the opposite direction now. Or, as Marcello put it: "The Right Thing to do is to reindex all formats (TXT, HTML, Folio) under one etext number. Then the software would sort it in a sensible way." 
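The reindexing idea in the message above can be sketched roughly as follows. This is only an illustration, not the catalog software's actual data model: the etext numbers, filenames, and function names are all made up for the example. It groups files by etext number so that one listing per number can show a "format" column.

```python
from collections import defaultdict

# Hypothetical file records: (etext_number, filename, format).
# None of these numbers or filenames come from the real catalog.
FILES = [
    (1001, "example.txt", "TXT"),
    (1001, "example-h.htm", "HTML"),
    (1001, "example.nfo", "Folio"),
    (1002, "other.txt", "TXT"),
]


def group_by_etext(files):
    """Collect every format of a work under its single etext number."""
    grouped = defaultdict(list)
    for number, filename, fmt in files:
        grouped[number].append((fmt, filename))
    # Sort each group so the listing order is stable.
    return {number: sorted(entries) for number, entries in grouped.items()}


def format_listing(grouped, number):
    """Render a small table with a format column for one etext number."""
    lines = [f"etext #{number}"]
    for fmt, filename in grouped[number]:
        lines.append(f"  {fmt:<6} {filename}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_listing(group_by_etext(FILES), 1001))
```

With every format under one number, a reader who lands on that number sees all the choices at once and can avoid a format they do not want.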
-Michael From traverso at dm.unipi.it Thu Dec 16 23:40:26 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Thu Dec 16 23:40:48 2004 Subject: !@!Re: [gutvol-d] [Fwd: Folio files] In-Reply-To: <41C203D5.86003F86@ibiblio.org> (message from Michael Dyck on Thu, 16 Dec 2004 13:53:25 -0800) References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> <41C203D5.86003F86@ibiblio.org> Message-ID: <200412170740.iBH7eQbH003244@posso.dm.unipi.it> >>>>> "Michael" == Michael Dyck writes: Michael> Or, as Marcello put it: "The Right Thing to do is to Michael> reindex all formats (TXT, HTML, Folio) under one etext Michael> number. Then the software would sort it in a sensible Michael> way." I agree, but if MH objects to the renumbering another option is to have all the formats in all the numbers. This can be done with symbolic links, so that no duplication of files occurs. We will have a duplication of bibrec records, but even choosing the wrong number you'll get the correct file anyway. And another link can go to the "Hall of shame". Carlo Traverso From marcello at perathoner.de Fri Dec 17 07:22:08 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Dec 17 07:22:13 2004 Subject: [gutvol-d] Distributed Proofreaders in German Message-ID: <41C2F9A0.4090004@perathoner.de> There's a new DP for German texts only at http://www.gaga.net/ producing books for PG-DE http://gutenberg.spiegel.de/ -- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Fri Dec 17 08:45:25 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Dec 17 08:45:27 2004 Subject: [gutvol-d] [BP] "A call to Arms" (fwd) Message-ID: Can anyone send me a plain text copy? http://www.library.unisa.edu.au/about/papers/calltoarms.pdf Thanks & Best Regards, Veenu [Moderator: The URL above is for an 11-page paper titled "A Call to Arms: What in the World is Happening to Information?" by Karen Williams, reference librarian at the University of South Australia. 
In the abstract, she writes: "We are fighting a battle, and that battle is all about the provision of and access to information. This paper looks briefly at how the provision of information has created gaps between those who have access, and those who do not .... Only if national information policies are moulded with a basis of equal access for all will the future be brighter.... The author concludes that much can be done by librarians..." - JMO] ----------------------------------------------------------------------------- This message was sent via the Book People mailing list. Posting address: spok+bookpeople@cs.cmu.edu Admin. & unsubscribe address: spok+bookpeople-request@cs.cmu.edu Charter: http://onlinebooks.library.upenn.edu/bplist/ From flis at detk.com Fri Dec 17 12:16:12 2004 From: flis at detk.com (William Flis) Date: Fri Dec 17 12:09:41 2004 Subject: [gutvol-d] PG books used by visually impaired In-Reply-To: <41B72AC9.4080105@srv.net> Message-ID: > It's a monotone reading, but usable. More recent versions (i.e. FC3) seem > to sound smoother (less computerized) than earlier ones (i.e. RH9). Don't > know if this is because of the Linux or the festival versions. Comes > standard with most recent RedHat and Fedora Core Linux installs, and > probably others. My old Macintosh (OS 7.5-8) had different "voices" to choose from. One called "Cellos" was definitely not monotone--it sang the words in the melody of "Hall of the Mountain King". I wrote a little poem that fit the meter/rhythm and recorded it as my voice-mail answer--I still use it. The funniest part is that I occasionally receive messages that people have left, sung in the same melody (those are "keepers"!). Call me up for a demo. William J. Flis DE Technologies, Inc. 
3620 Horizon Drive King of Prussia, PA 19406 610-270-9700 x130 From stephen.thomas at adelaide.edu.au Wed Dec 15 21:08:20 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Fri Dec 17 16:12:23 2004 Subject: [gutvol-d] Re: Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <41BEB3A3.1030805@adelaide.edu.au> References: <41BEB3A3.1030805@adelaide.edu.au> Message-ID: <41C11844.9060106@adelaide.edu.au> There's actually quite a lot about this project -- "Google Print" -- to be learned from their own web site: http://print.google.com/ which may dispel some of the misconceptions I see being bandied about. It also raises some more questions. (E.g. it is not at all clear whether their "print only" policy will apply to everything, or only the copyright books.) But also, to keep this in perspective, it may be worth remembering the recent Google stock float, which may have some influence on the timing of this press release and the previous (last week's) release about Google Scholar. Clearly, these things are going to do no harm to their stock price. Google Print has a long way to go, and I wish them well. Steve -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M ----------------------------------------------------------- This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. 
Virus scanning is recommended and is the responsibility of the recipient. From bill at janssen.org Thu Dec 16 00:07:12 2004 From: bill at janssen.org (Bill Janssen) Date: Fri Dec 17 16:12:24 2004 Subject: [ebook-community] Re: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: Your message of "Wed, 15 Dec 2004 20:39:35 PST." <19677883578.20041215213935@noring.name> Message-ID: <04Dec16.000715pst."58617"@synergy1.parc.xerox.com> > It's entirely possible that Google will give, upon request, the page > scans for any public domain books they've scanned to established > groups like Distributed Proofreaders for conversion into proofed SDT, > so long as Google gets a copy of the resulting high-quality SDT. My guess is that part of the deal is that the libraries are going to get copies of those page scans, and they will probably make them available in various ways in addition to whatever Google does with them. By the way, it's astonishing to me how far OCR has come in the last 10 years. I think the low cost of storage has made page image storage of many historical documents feasible, relatively suddenly, and that means that the problem of OCR'ing handwritten text, odd fonts, early books, and other similar things has suddenly become a hot research topic. Bill From jlinden at ticluse.com Fri Dec 17 13:30:44 2004 From: jlinden at ticluse.com (James Linden) Date: Fri Dec 17 16:12:26 2004 Subject: [gutvol-d] [BP] "A call to Arms" (fwd) In-Reply-To: References: Message-ID: <41C35004.4030004@ticluse.com> http://www.kodekrash.com/project/calltoarms.txt -- James Michael Hart wrote: > > Can anyone send me a plain text copy? > > http://www.library.unisa.edu.au/about/papers/calltoarms.pdf > > > Thanks & Best Regards, > Veenu > > [Moderator: The URL above is for an 11-page paper titled > "A Call to Arms: What in the World is Happening to Information?" > by Karen Williams, reference librarian at the University of South > Australia. 
> > In the abstract, she writes: "We are fighting a battle, and that battle > is all about the provision of and access to information. This paper > looks > briefly at how the provision of information has created gaps between > those > who have access, and those who do not .... Only if national information > policies are moulded with a basis of equal access for all will the > future be brighter.... The author concludes that much can be done > by librarians..." - JMO] > ----------------------------------------------------------------------------- > > This message was sent via the Book People mailing list. > Posting address: spok+bookpeople@cs.cmu.edu > Admin. & unsubscribe address: spok+bookpeople-request@cs.cmu.edu > Charter: > http://onlinebooks.library.upenn.edu/bplist/ > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > From hart at pglaf.org Sat Dec 18 09:25:51 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Dec 18 09:25:53 2004 Subject: [ebook-community] Re: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <04Dec16.000715pst."58617"@synergy1.parc.xerox.com> References: <04Dec16.000715pst."58617"@synergy1.parc.xerox.com> Message-ID: On Thu, 16 Dec 2004, Bill Janssen wrote: >> It's entirely possible that Google will give, upon request, the page >> scans for any public domain books they've scanned to established >> groups like Distributed Proofreaders for conversion into proofed SDT, >> so long as Google gets a copy of the resulting high-quality SDT. > > My guess is that part of the deal is that the libraries are going to > get copies of those page scans, and they will probably make them > available in various ways in addition to whatever Google does with them. AFAIK each library will keep the scans of their own books, and z/j/ealously guard them. . . . 
> By the way, it's astonishing to me how far OCR has come in the last 10 > years. I think the low cost of storage has made page image storage of > many historical documents feasible, relatively suddenly, and that > means that the problem of OCR'ing handwritten text, odd fonts, early > books, and other similar things has suddenly become a hot research topic. I heard they are still having huge troubles with older books. . . . mh From hart at pglaf.org Sat Dec 18 09:28:11 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Dec 18 09:28:13 2004 Subject: [gutvol-d] Re: Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <41C11844.9060106@adelaide.edu.au> References: <41BEB3A3.1030805@adelaide.edu.au> <41C11844.9060106@adelaide.edu.au> Message-ID: On Thu, 16 Dec 2004, Steve Thomas wrote: > There's actually quite a lot about this project -- "Google Print" -- to be > learned from their own web site: > > http://print.google.com/ > > which may dispel some of the misconceptions I see being bandied about. It > also raises some more questions. (E.g. it is not at all clear whether their > "print only" policy will apply to everything, or only the copyright books.) You can only SEE a few pages from the copyrighted books, but can NEVER print ANY pages from ANY books, as far as I can tell. > But also, to keep this in perspective, it may be worth remembering the recent > Google stock float, which may have some influence on the timing of this press > release and the previous (last week's) release about Google Scholar. Clearly, > these things are going to do no harm to their stock price. As far as I can tell, the timing was based on being exactly one year from our meeting with them when we pitched the eLibrary idea at their headquarters, after which the silence was deafening. . .no replies to emails. . . .
> mh From stephen.thomas at adelaide.edu.au Sun Dec 19 20:48:30 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Sun Dec 19 20:48:50 2004 Subject: [gutvol-d] Re: Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <41C11844.9060106@adelaide.edu.au> References: <41BEB3A3.1030805@adelaide.edu.au> <41C11844.9060106@adelaide.edu.au> Message-ID: <41C6599E.4000008@adelaide.edu.au> Playing around with this a little more, I found something interesting: First I did a standard Google for "William Morris". The results page lists one book result at the top, which takes you to a page for the Cambridge Uni. press edition of "News from nowhere". (Surely they have other editions/titles too?) This only provides a few pages, but if you use "Search within this book" you can get other pages. Specifically, I typed in the keyword "the" and seem to have made it list ALL the pages (assuming the word "the" would appear on every page). They have used some Javascript magic to prevent you from saving the page images. I dare say someone will figure out a way round that. Also they seem to have some kind of counter that stops you viewing too many pages at once. All of this is aimed at their publisher market of course -- they want publishers to let them scan their books and make them searchable, and this is the trade-off. Still not clear whether they'll treat the PD stuff from libraries the same way. Steve -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M From stephen.thomas at adelaide.edu.au Sun Dec 19 21:32:12 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Sun Dec 19 21:32:31 2004 Subject: [gutvol-d] Re: Google Partners with Oxford, Harvard & Others to Digitize Libraries In-Reply-To: <41C6599E.4000008@adelaide.edu.au> References: <41BEB3A3.1030805@adelaide.edu.au> <41C11844.9060106@adelaide.edu.au> <41C6599E.4000008@adelaide.edu.au> Message-ID: <41C663DC.6080006@adelaide.edu.au> Strangely, searching for "William Morris News from Nowhere" does NOT bring up the book link! So I guess Google still have a few wrinkles to iron out. Steve Steve Thomas wrote: > Playing around with this a little more, I found something interesting: > > First I did a standard Google for "William Morris". > > The results page lists one book result at the top, which takes you to a > page for the Cambridge Uni. press edition of "News from nowhere". > (Surely they have other editions/titles too?) > > This only provides a few pages, but if you use "Search within this book" > you can get other pages. Specifically, I typed in the keyword "the" and > seem to have made it list ALL the pages (assuming the word "the" would > appear on every page). > > They have used some Javascript magic to prevent you from saving the page > images. I dare say someone will figure out a way round that. Also they > seem to have some kind of counter that stops you viewing too many pages > at once. > > All of this is aimed at their publisher market of course -- they want > publishers to let them scan their books and make them searchable, and > this is the trade-off.
> > Still not clear whether they'll treat the PD stuff from libraries the > same way. > > > Steve > -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M From jonhendry at mac.com Sun Dec 19 22:48:42 2004 From: jonhendry at mac.com (Jonathan Hendry) Date: Sun Dec 19 22:49:11 2004 Subject: [gutvol-d] 'CDDB' for Gutenberg texts Message-ID: <300811F0-5253-11D9-ABD1-000A956D5546@mac.com> Hi, I'm new here. I hope this isn't out of place. I'm working on a Mac OS X program for reading Gutenberg e-texts. It occurs to me that it would be useful if there were something for Gutenberg e-texts akin to the CDDB database for MP3s. It would hold information about e-texts, keyed to the Gutenberg filename. The sort of information stored would be things like long-format titles, author's name, information about the Gutenberg file if it's a revision, information about the original source text, etc. This would all be useful for developers of ebook readers, or web interfaces to the Gutenberg texts. This information is often available in the files themselves, but it would be difficult to extract it through software.
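As a rough illustration of the record described so far (metadata kept apart from the e-text and keyed to its filename), here is a minimal sketch. It assumes nothing about any existing service: the class name, field names, and the sample entry are all hypothetical and only mirror the kinds of information listed in the message.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class EtextMeta:
    """Descriptive data about one e-text; every field name is illustrative."""
    filename: str                        # the lookup key
    long_title: str
    author: str
    revision_info: Optional[str] = None  # notes if the file is a revision
    source_text: Optional[str] = None    # notes on the original printed source


def build_index(records: Iterable[EtextMeta]) -> Dict[str, EtextMeta]:
    """Index records by filename."""
    return {record.filename: record for record in records}


def lookup(index: Dict[str, EtextMeta], filename: str) -> Optional[EtextMeta]:
    """Return the stored record, or None if no one has supplied one."""
    return index.get(filename)


# Made-up sample entry, not real catalog data.
SAMPLE = EtextMeta(
    filename="example10.txt",
    long_title="An Example Title, Being a Longer Form of the Short Title",
    author="A. N. Author",
    revision_info="Hypothetical revised edition of the file",
    source_text="Hypothetical 1900 printing",
)

INDEX = build_index([SAMPLE])

if __name__ == "__main__":
    print(lookup(INDEX, "example10.txt"))
    print(lookup(INDEX, "missing.txt"))
```

A reader program would call `lookup` with the name of the file it just opened and fall back to showing the plain text when nothing is found.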
It might be extended to include character lists for novels or plays, synopses, summaries, connections to other works, byte offsets to chapter starts, file-specific aids to parsing, and other useful bits of information. The information would be supplied by users, piece by piece, similar to the way people submit track listings to CDDB. Ideally, etext reader apps would have a UI for entering and uploading new information. There'd be no change to the Gutenberg files themselves. The meta-info would all be kept apart from the e-texts. So the format need not change, old texts wouldn't need updating, and the files would remain universally compatible. If the user has an etext program which supports it, then after downloading a text, they would have the option to download the meta-info from a separate 'gtdb' server. The program could then use the meta-info to enhance the user interface. Naturally, the "gtdb" database would be non-commercial, and in some non-proprietary format, and/or available as SQL dumps. So, my questions. 1) Is anyone working on such a thing already? 2) Has such a thing been discussed? 3) Does anyone else think it'd be a good thing? Thanks, Jon From hacker at gnu-designs.com Sun Dec 19 23:35:55 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Sun Dec 19 23:37:03 2004 Subject: [gutvol-d] 'CDDB' for Gutenberg texts In-Reply-To: <300811F0-5253-11D9-ABD1-000A956D5546@mac.com> References: <300811F0-5253-11D9-ABD1-000A956D5546@mac.com> Message-ID: > It occurs to me that it would be useful if there were something for > Gutenberg e-texts akin to the CDDB database for MP3s. You mean like the RDF catalog of all of the Gutenberg texts? http://gutenberg.net/browse/rdf/catalog.rdf.bz2 I've posted Perl here before that splits this apart and imports it into SQL in about 8 lines of code. Search the archives. David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From hart at pglaf.org Mon Dec 20 09:55:46 2004 From: hart at pglaf.org (Michael Hart) Date: Mon Dec 20 09:55:49 2004 Subject: [gutvol-d] Need Help With MS Word File Message-ID: A volunteer who has been working on a book for us for three years is very near completion, but can't do any more due to medical issues. If anyone is willing to take a look at this book and help get it into a final format, please let me know. Forwarded message: Date: Mon, 20 Dec 2004 11:13:38 -0500 From: Jeanette Hayward To: hart@beryl.ils.unc.edu Subject: Ebook submission Hello, My name is Jeanette Hayward. I began transcribing Henry A. Beers, A History of English Romanticism in the Eighteenth Century nearly 3 years ago. Because of a number of issues, I am just now getting to the finished state. Unfortunately, I will not be able to do much more with the transcription. But, I did want to submit the work because I feel it is important to be able to add it to the collection. I tried sending this message to the submission team; however, my ISP apparently had difficulty recognizing the address as a valid e-mail address. *** My HUGE thanks to anyone who can help! Michael From nihil_obstat at mindspring.com Mon Dec 20 14:42:02 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Mon Dec 20 14:42:14 2004 Subject: [gutvol-d] Need Help With MS Word File Message-ID: <3221311.1103582522659.JavaMail.root@wamui10.slb.atl.earthlink.net> The original message was a little vague on the details, so I contacted Jeanette Hayward for some more information on the help request for: Henry A. Beers A History of English Romanticism in the Eighteenth Century Mrs. Hayward began this text three years ago, but has had to stop in a near finished state to deal with serious family medical issues.
A friend loaned some webspace, so you may download a sample 10 pages of the text at: http://www.lakeclaire.org/beers/beers_sample.doc The full text is about 250 pages worth of text. If you like what you see, and wish to contact Mrs. Hayward to adopt this project, please contact her at: jeanett@teacher.com She can also give you the web address for the full as-is transcription. -------------- "Finished state" refers to the proofing, and/or final version (HTML, etc). She has typed in everything except the index within the book which isn't necessary in this format. Currently, it is in MS-Word, but can be converted to anything you wish to use. She has proofed as she went along, but as always, it would be better to have someone else proof also. There are some "misspellings" but those are the author's words, not typos. So, she is willing to ship the original copy to the proofer. -----Original Message----- From: Michael Hart Sent: Dec 20, 2004 12:55 PM To: Project Gutenberg Whitewashers , The gutvol-d Mailing List Subject: [gutvol-d] Need Help With MS Word File A volunteer who has been working on a book for us for three years is very near completion, but can't do any more due to medical issues. If anyone is willing to take a look at this book and help get it into a final format, please let me know. Forwarded message: Date: Mon, 20 Dec 2004 11:13:38 -0500 From: Jeanette Hayward To: hart@beryl.ils.unc.edu Subject: Ebook submission Hello, My name is Jeanette Hayward. I began transcribing Henry A. Beers, A History of English Romanticism in the Eighteenth Century nearly 3 years ago. Because of a number of issues, I am just now getting to the finished state. Unfortunately, I will not be able to do much more with the transcription. But, I did want to submit the work because I feel it is important to be able to add it to the collection. I tried sending this message to the submission team; however, my ISP apparently had difficulty recognizing the address as a valid e-mail address.
*** My HUGE thanks to anyone who can help! Michael _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d --------------------------- Dennis McCarthy nihil_obstat@mindspring.com From ajhaines at shaw.ca Tue Dec 21 10:35:10 2004 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Tue Dec 21 10:35:10 2004 Subject: [gutvol-d] Help with German text Message-ID: <000c01c4e78b$cdd9a600$6401a8c0@ahainesp2600> I'm working on a book that's mostly in English, but that has several short passages in German. Each German passage is immediately followed by its English translation. According to my research, the German material is using the Fraktur alphabet. None of the German characters appear to use any accenting - umlauts, etc. I think I've generally managed to transliterate the German characters into their English equivalents, but since I don't understand German, I'm baffled as to how to tell the difference, in some contexts, between its lower-case f's and s's, and between its lower-case k's and t's. Is there someone out there to whom I can send the six page scans involved and their matching text files, to have my transliteration checked? Al From hart at pglaf.org Thu Dec 23 09:58:14 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 23 09:58:17 2004 Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J.
Roland] In-Reply-To: <20041223031630.16124F2548@boggle.pobox.com> References: <20041223031630.16124F2548@boggle.pobox.com> Message-ID: On Wed, 22 Dec 2004, Lloyd Benson wrote: > From: Jon Roland > Date: Tue, 21 Dec 2004 14:50:22 -0600 > Subject: Google Print questions > > The announcements would seem to suggest that Google intends to not only scan > the images of all these books, but to OCR and correct the recognition errors > of all of them, so they can be made searchable, offer the complete texts of > all the public domain works, and excerpts of the copyrighted ones (presumably > under the fair use doctrine). One announcement also estimated a cost of $10 > per volume. Project Gutenberg has already produced and distributed nearly 15,000 eBooks, with a budget that has yet to reach a significant total for all 33+ years, and is projected to reach a million eBooks without undue expense or effort. We'll just have to wait and see if either Google Print, or any of the various "Million eBook Projects" will ever come up with even 1% of a million eBooks that you can carry with you on a one inch stack of plain homemade DVDs. If it hasn't been proofread, and if you can't take it with you, it is only of limited value. . .sort of like reading over someone's shoulder. With Project Gutenberg eBooks, you OWN them. . .forever. . .and can save them in your own favorite formats, fonts, margination, pagination, or whatever, and you can search, quote, print, and do all the normal eBook functions. "A picture of a book is not an eBook." The term eBook should not be used to describe raw scans or raw OCR, as has been tried by some of the Google and "Million eBook" participants over the past decade. I would say that an eBook has to be at least 99.9% accurate, and that it should then be a process as people read the eBooks, to send in corrections. Most of the Project Gutenberg and Distributed Proofreaders would say it has to be over 99.99% and perhaps even over 99.999%.
99.999% would be one error perhaps every 100 pages or so, and I'm pretty sure the source materials we have are not that accurate. . .not that eBooks won't become more and more accurate, closer and closer to 100% accuracy, but I'm not sure they have to be all that much better than 99.9% before they can be made available. > This is highly ambitious, even to scan the images. The experience of the U. > of Michigan should show that it is not feasible to OCR these works accurately > for that cost, or in that timeframe. While uncorrected OCRs might enable > search, since most words appear more than once in a work, and at least one of > them might be expected to be recognized accurately, searching on entire > phrases could be expected to be much more problematic. I have heard this described before. . .has anyone tried their test eBooks??? > As one who works from a lot of older works to not only scan and OCR but > correct them, I know how much human labor is involved. There are volunteer > efforts like Distributed Proofreaders http://www.pgdp.net/c/default.php , but > I have concluded that it takes me more time to set up a project for them than > it would take for me to do the proofreading myself, and my work would likely > be more accurate, since I would understand the underlying content and know > how to render obscure text. While it does take a little time to set up one's first project with the Distributed Proofreaders, it is usually quite a bit easier the second time, not to mention that we have volunteers who will walk you through processes the first few times around, which seems to do the trick for nearly everyone. > So my basic question and concern is, how do we ensure that this project does > not release too many uncorrected texts into the world that never get > corrected, and perhaps propagate errors that come to be accepted as accurate > even when they are not? I wonder how many of these will be "released into the world". .
.I have a strong suspicion that the answer is "none." Unless some outside source does it. > I would submit that it would be better to prioritize these works and release > fully corrected and annotated digital editions of the most important first, > going for quality rather than quantity. This has been the approach used by > the online collections such as ours at > http://www.constitution.org/liberlib.htm Although we do put some works up > before the correcting and reformatting is finished, we always flag those that > are still in progress, indicating the state of completion, and we stand by to > quickly make corrections that outsiders may discover are needed. I view all eBooks as "still in progress" as I have never proofread one in which I didn't find any mistakes. . . . My own views are that I would prefer to have access to twice as many eBooks at the 99.95% accuracy level [the Library of Congress standard] than half as many at the 99.995% level I think is being suggested here. After all, the books that get read the most will be the ones that get the most corrections. . .an obvious way to aim effort at the proper targets! Not only that, but, viewing the entire eBook effort as a 50 year process, of which I have walked 33+ years, I must state for the record that I think OCR, spellcheckers, grammarcheckers, etc. will be so much better a decade from now that doing the proofreading on the more obscure works will require so much less effort than it does today, that it will be a great trade-off. I'm not at all sure why people want eBooks to be so perfect to start with. I would prefer to get all 10 million public domain works we can find. . . or at least a million of them. . .online and freely downloadable before we try to approach the 100% accuracy level.
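To put the accuracy levels named in this message side by side, the arithmetic can be sketched as below. The one figure not taken from the message is the page size: 1,000 characters per page is an assumption chosen for illustration, and it happens to reproduce the "one error perhaps every 100 pages" estimate for 99.999%. A denser page (say 2,000 characters) would halve every result.

```python
from decimal import Decimal

# Assumption for illustration only: characters on a typical page.
CHARS_PER_PAGE = 1000


def pages_per_error(accuracy_percent, chars_per_page=CHARS_PER_PAGE):
    """Average number of pages read before meeting one wrong character.

    accuracy_percent is passed as a string such as "99.95" so the
    decimal arithmetic stays exact.
    """
    error_rate = (Decimal(100) - Decimal(accuracy_percent)) / Decimal(100)
    chars_per_error = 1 / error_rate
    return chars_per_error / Decimal(chars_per_page)


if __name__ == "__main__":
    # The accuracy levels mentioned in the message.
    for level in ("99.9", "99.95", "99.99", "99.995", "99.999"):
        pages = pages_per_error(level)
        print(f"{level}% accurate: about one error every {float(pages):g} pages")
```

Under that assumption, the 99.95% and 99.995% levels being compared differ by a factor of ten: roughly one error every 2 pages versus one every 20.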
Of course, I don't believe in the "raw OCR" idea that seems to be what the Google Print idea has in mind, even with spelling and "scanno" checkers, and I also don't believe in going so far in the other direction that we try for such accuracy levels that the number of eBooks only grows at half the rate it has been growing. The path is obviously somewhere in the middle. . .machine production is obviously not accurate enough [except in certain tests I have seen run with high contrast new materials] and after a certain point it becomes inefficient to keep proofreading before letting the public have access. After all, the public IS what this is all about, is it not? So let's let the public do the final proofreading, as a process, for all the years to come. . .at least until we have OCR that makes only 1 error in a million characters. . .and thus most of the errors we find are from the original publications. [Bye the bye, this is one of the reasons for using more than one paper edition to produce an eBook, when multiples of paper editions are available. Then the machine processes can compare to find even more errors.] Well, enough now. . .let's make more eBooks!!! Thanks!!! So Nice To Hear From You! Happy Holidays!!! Michael Give FreeBooks!!! In 39 Languages!!! As of December 23, 2004 ~14,780 FreeBooks at: ~220 to go to 15,000 http://www.gutenberg.org http://www.gutenberg.net We are ~95% of the way from 10,000 to 15,000. Now even more PG eBooks In 104 Languages!!! http://gutenberg.cc http://gutenberg.us Michael S. Hart Project Gutenberg Executive Coordinator "*Internet User ~#100*" If you do not receive a prompt reply, please resend, keep resending. From joshua at hutchinson.net Thu Dec 23 10:12:52 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 23 10:12:59 2004 Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J.
Roland] Message-ID: <20041223181252.F236510992F@ws6-4.us4.outblaze.com> ----- Original Message ----- From: "Michael Hart" > > As one who works from a lot of older works to not only scan and OCR but > > correct them, I know how much human labor is involved. There are volunteer > > efforts like Distributed Proofreaders http://www.pgdp.net/c/default.php , > > but I have concluded that it takes me more time to set up a project for them > > than it would take for me to do the proofreading myself, and my work would > > likely be more accurate, since I would understand the underlying content and > > know how to render obscure text. > > While it does take a little time to set up one's first project with the > Distributed Proofreaders, it is usually quite a bit easier the second time, > not to mention that we have volunteers who will walk you through processes > the first few times around, which seems to do the trick for nearly everyone. > I just want to make a quick comment on this part (since I somehow missed the initial e-mail). Setting up projects at DP is not time consuming (well, the upload of the image files can be, depending on your internet connection), especially once you've done it a few times. As one of the larger DP project managers (currently at 687 projects created for DP), I can tell you that there is NO WAY to proof even an easy text in the amount of time it takes to create and upload the project to DP. Even if I take into account OCR time (which I batch up and run overnight), it is still less time than I would take to proof the work. I can also reiterate Michael's comment that there are plenty of folks ready to help out new content providers on their first few projects. It can be a little daunting the first time, but it gets easier once you've done it a couple of times. Also, for folks that don't want to get heavily involved, we can usually work something out with someone that just wants to provide the image scans.
We can usually take it from there (assuming they are public domain scans, of course). Josh PS I also haven't created any new projects in many months because of the backlog we've got in the system. I wanted to help clear out some more work before sending more into the queue. So those 687 were done in a much shorter frame of time than my login statistics at DP might otherwise imply. From stephen.thomas at adelaide.edu.au Wed Dec 22 19:36:44 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Thu Dec 23 10:37:41 2004 Subject: [gutvol-d] Internet Archive to build alternative to Google [Print] Message-ID: <41CA3D4C.1030002@adelaide.edu.au> Yet another story -- this one on an alternative to GP: http://www.iwr.co.uk/IWR/1160176 -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M ----------------------------------------------------------- This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient. From nwolcott at dsdial.net Thu Dec 23 08:58:26 2004 From: nwolcott at dsdial.net (N Wolcott) Date: Thu Dec 23 10:37:42 2004 Subject: [gutvol-d] Need Help With MS Word File References: <3221311.1103582522659.JavaMail.root@wamui10.slb.atl.earthlink.net> Message-ID: <00bb01c4e910$a4443ae0$7d9895ce@gw98> ok ----- Original Message ----- From: Dennis McCarthy To: Michael S. 
Hart ; Project Gutenberg Volunteer Discussion Cc: Sent: Monday, December 20, 2004 5:42 PM Subject: Re: [gutvol-d] Need Help With MS Word File > > The original message was a little vague on the details, so I contacted Jeanette Hayward for some more information on the help request for: > > Henry A. Beers > A History of English Romanticism in the Eighteenth Century > > > Mrs. Hayward began this text three years ago, but has had to stop in a near finished state to deal with serious family medical issues. > > A friend loaned some webspace, so you may download a sample 10 pages of the text at: > http://www.lakeclaire.org/beers/beers_sample.doc > > The full text is about 250 pages worth of text. > > If you like what you see, and wish to contact Mrs. Hayward to adopt this project, please contact her at: jeanett@teacher.com > > She can also give you the web address for the full as-is transcription. > > -------------- > > "Finished state" refers to the proofing, and/or final version (HTML, etc). She has typed in everything except the index within the book which isn't necessary in this format. > > Currently, it is in MS-Word, but can be converted to anything you wish to use. > > She has proofed as she went along, but as always, it would be better to have someone else proof also. > > There are some "misspellings" but those are the author's words, not typos. So, she is willing to ship the original copy to the proofer. > > > > -----Original Message----- > From: Michael Hart > Sent: Dec 20, 2004 12:55 PM > To: Project Gutenberg Whitewashers , > The gutvol-d Mailing List > Subject: [gutvol-d] Need Help With MS Word File > > > A volunteer who has been working on a book for us for three years > is very near completion, but can't do any more due to medical issues. > > If anyone is willing to take a look at this book and help get it > into a final format, please let me know.
> > Forwarded message: > > Date: Mon, 20 Dec 2004 11:13:38 -0500 > From: Jeanette Hayward > To: hart@beryl.ils.unc.edu > Subject: Ebook submission > > Hello, > My name is Jeanette Hayward. I began transcribing Henry A. Beers, > A History of English Romanticism in the Eighteenth Century nearly 3 > years ago. Because of a number of issues,I am just now getting to the > finished state. Unfortunately, I will not be able to do much more > with the transcription. But, I did want to submit the work because > I feel it is important to be able to add it to the collection. > > I tried sending this message to the submission team; however, my ISP > apparently had difficulty recognizing the address as a valid e-mail > address. > > *** > > My HUGE thanks to anyone who can help! > > Michael > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > --------------------------- > Dennis McCarthy > nihil_obstat@mindspring.com > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott at dsdial.net Thu Dec 23 09:18:29 2004 From: nwolcott at dsdial.net (N Wolcott) Date: Thu Dec 23 10:37:44 2004 Subject: [gutvol-d] test if going through Message-ID: <018801c4e913$a03e3ba0$7d9895ce@gw98> test message x -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041223/d7dc03d5/attachment.html From Gutenberg9443 at aol.com Thu Dec 23 15:38:36 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Thu Dec 23 15:38:57 2004 Subject: [gutvol-d] Fwd: Project Googleberg Message-ID: <110.3fbf95f2.2efcb0fc@aol.com> Skipped content of type multipart/alternative-------------- next part -------------- An embedded message was scrubbed... 
From: Gutenberg9443@aol.com Subject: Re: Project Googleberg Date: Thu, 23 Dec 2004 18:35:27 EST Size: 6675 Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041223/cda11de6/attachment.mht From hacker at gnu-designs.com Thu Dec 23 16:41:13 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Dec 23 16:42:10 2004 Subject: [gutvol-d] Fwd: Project Googleberg In-Reply-To: <110.3fbf95f2.2efcb0fc@aol.com> References: <110.3fbf95f2.2efcb0fc@aol.com> Message-ID: > [ Part 2: "Included Message" ] > Date: Thu, 23 Dec 2004 18:35:27 EST > From: Gutenberg9443@aol.com > To: hart@pobox.com > Subject: Re: Project Googleberg First and foremost, when composing email, the best place for your text is in the _body_ of the email, not sent as an attachment (a non-RFC-compliant attachment at that). Please don't do that. > I have examined what seems to be the preliminary "Googleprint" > catalog. It consists of books scanned and posted by other people > including us. At least half of them that I looked at are available > only as page scans, and I have to want a book an awful lot to put > page scans together just for my own use. They use a LOT of our > books; in fact, everything they have that they are aware we have > shows us as the best or only site to go and get the book. Second, when you wish to post to a mailing list about a particular subject, it is best to read the archives first, in full, so you can see if the subject or question you were about to ask has been discussed before, as this one has. Please go back and re-read the archives of the last few weeks to bring yourself up to speed on the issues, concerns, support, and other items related to "Google Print". David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From stephen.thomas at adelaide.edu.au Thu Dec 23 17:00:01 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Thu Dec 23 17:00:15 2004 Subject: [gutvol-d] Fwd: Project Googleberg In-Reply-To: References: <110.3fbf95f2.2efcb0fc@aol.com> Message-ID: <41CB6A11.5010104@adelaide.edu.au> Hey, who put David D. in charge of the Internet?! David, if you can't cope with the way other people do things on the 'net, you'd better save yourself a lot of grief and leave now. Or you could try a little tolerance, and accept that some people, not having your supreme level of skill, will occasionally do "the wrong thing". Get over it man, and while I'm on your case, get some manners. Sheesh. Merry Christmas all. Steve David A. Desrosiers wrote: >> [ Part 2: "Included Message" ] > > >>Date: Thu, 23 Dec 2004 18:35:27 EST >>From: Gutenberg9443@aol.com >>To: hart@pobox.com >>Subject: Re: Project Googleberg > > > First and foremost, when composing email, the best place for > your text is in the the _body_ of the email, not sent as an attachment > (a non-RFC-compliant attachment at that). > > Please don't do that. > > >>I have examined what seems to be the preliminary "Googleprint" >>catalog. It consists of books scanned and posted by other people >>including us. At least half of them that I looked at are available >>only as page scans, and I have to want a book an awful lot to put >>page scans together just for my own use. They use a LOT of our >>books; in fact, everything they have that they are aware we have >>shows us as the best or only site to go and get the book. > > > Second, when you wish to post to a mailing list about a > particular subject, it is best to read the archives first, in full, so > you can see if the subject or question you were about to ask has been > discussed before, as this one has. 
> > Please go back and re-read the archives of the last few weeks > to bring yourself up to speed on the issues, concerns, support, and > other items related to "Google Print". > > > David A. Desrosiers > desrod@gnu-designs.com > http://gnu-designs.com > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M ----------------------------------------------------------- This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient. From holden.mcgroin at dsl.pipex.com Thu Dec 23 20:04:10 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Thu Dec 23 20:04:30 2004 Subject: [gutvol-d] Ibn Batuta (Was Re: Fwd: Project Googleberg) In-Reply-To: <110.3fbf95f2.2efcb0fc@aol.com> References: <110.3fbf95f2.2efcb0fc@aol.com> Message-ID: <41CB953A.50402@dsl.pipex.com> Gutenberg9443@aol.com wrote: > By the way, does ANYBODY know where we can get a public domain copy of > Ibn Batuta? I've had no luck finding one online. I even asked the king > of Saudi Arabia for a copy, but His Majesty didn't answer. The few > snippets I've seen are fascinating. 
He left his home to go on a haj, and > then kept going, spending 29 years travelling and writing fascinating > notes of where he went, namely everywhere you could get to without going > to Arctica, Antarctica, or the Americas. I have to agree with Anne. Every time I hear about Ibn Batuta's amazing travels, I feel the urge to read his writings. Is there any chance we could get them online as part of Gutenberg's collection? Cheers, Holden From hacker at gnu-designs.com Thu Dec 23 20:20:22 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Dec 23 20:21:22 2004 Subject: [gutvol-d] Fwd: Project Googleberg In-Reply-To: <41CB6A11.5010104@adelaide.edu.au> References: <110.3fbf95f2.2efcb0fc@aol.com> <41CB6A11.5010104@adelaide.edu.au> Message-ID: > if you can't cope with the way other people do things on the 'net, > you'd better save yourself a lot of grief and leave now. It's important for others to realize that not everyone reads their email on desktop machines, or on fully-featured email clients. What about text-to-speech readers and PDAs? It's best to stick to the standards, and not make up your own. Open your eyes, and realize the world isn't just like you. > Or you could try a little tolerance, and accept that some people, > not having your supreme level of skill, will occasionally do "the > wrong thing". I find this to be the case in a lot of things, unfortunately. > Get over it man, and while I'm on your case, get some manners. I've got plenty of manners, but thanks for pointing it out to others who may not have the same "supreme" level of diplomacy that I often exhibit. Google for my name, if you feel I'm some sort of rude person without manners. You might be surprised at what you find. > Sheesh. Merry Christmas all. Happy Christmahanakwanzaka to all as well. David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From sly at victoria.tc.ca Thu Dec 23 23:15:38 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Dec 23 23:15:59 2004 Subject: [gutvol-d] Ibn Batuta In-Reply-To: <41CB953A.50402@dsl.pipex.com> References: <110.3fbf95f2.2efcb0fc@aol.com> <41CB953A.50402@dsl.pipex.com> Message-ID: On Fri, 24 Dec 2004, Holden McGroin wrote: > I have to agree with Anne. Every time I hear about Ibn Batuta's amazing > travels, I feel the urge to read his writings. Is there any chance we > could get them online as part of Gutenberg's collection? Of course there is. I believe it would just be a matter of how much effort and expense some volunteers would like to go to in order to make it happen. Oh, and a bit of luck too. After reading what I could find about Ibn Batuta, I agree, this could be worth searching out... (I would imagine in an English translation) Andrew From Gutenberg9443 at aol.com Fri Dec 24 10:26:54 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Dec 24 10:27:17 2004 Subject: [gutvol-d] Fwd: Project Googleberg Message-ID: In a message dated 12/23/2004 5:42:07 PM Mountain Standard Time, hacker@gnu-designs.com writes: Second, when you wish to post to a mailing list about a particular subject, it is best to read the archives first, in full, so you can see if the subject or question you were about to ask has been discussed before, as this one has. Please go back and re-read the archives of the last few weeks to bring yourself up to speed on the issues, concerns, support, and other items related to "Google Print". Thank you for your suggestions. Now I am going to ignore them completely, as I have far too much to do to go back and reread all the archives. I prefer to risk being redundant than to risk leaving a question unanswered. A question was asked. I looked into the matter. I answered the question. Period. Anne -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/98b2af58/attachment.html From Gutenberg9443 at aol.com Fri Dec 24 10:29:46 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Dec 24 10:29:58 2004 Subject: [gutvol-d] Fwd: Project Googleberg Message-ID: <9a.1c8862d0.2efdba1a@aol.com> In a message dated 12/23/2004 5:42:07 PM Mountain Standard Time, hacker@gnu-designs.com writes: First and foremost, when composing email, the best place for your text is in the the _body_ of the email, not sent as an attachment (a non-RFC-compliant attachment at that). Please don't do that. Where was the attachment? What was the attachment? I ask because I didn't remember an attachment and when I went back and looked at my "send" list I didn't find an attachment. Therefore, if there was an attachment, somebody else attached it and I'd like to know who, when, how, and why. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/3de05947/attachment-0001.html From jlinden at pglaf.org Fri Dec 24 10:35:19 2004 From: jlinden at pglaf.org (James Linden) Date: Fri Dec 24 10:37:34 2004 Subject: [gutvol-d] Fwd: Project Googleberg In-Reply-To: <9a.1c8862d0.2efdba1a@aol.com> References: <9a.1c8862d0.2efdba1a@aol.com> Message-ID: <41CC6167.10704@pglaf.org> Anne, All your messages come through with an attachment because you use HTML formatted mail. While I do hate HTML email, it's nothing to worry about. -- James Gutenberg9443@aol.com wrote: > In a message dated 12/23/2004 5:42:07 PM Mountain Standard Time, > hacker@gnu-designs.com writes: > > First and foremost, when composing email, the best place for > your text is in the the _body_ of the email, not sent as an attachment > (a non-RFC-compliant attachment at that). > > Please don't do that. > > Where was the attachment? What was the attachment? 
I ask because I > didn't remember an attachment and when I went back and looked at my > "send" list I didn't find an attachment. Therefore, if there was an > attachment, somebody else attached it and I'd like to know who, when, > how, and why. > > Anne > >------------------------------------------------------------------------ > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > From marcello at perathoner.de Fri Dec 24 09:37:25 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Dec 24 10:38:32 2004 Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. Roland] In-Reply-To: References: <20041223031630.16124F2548@boggle.pobox.com> Message-ID: <41CC53D5.3010005@perathoner.de> Michael Hart wrote: > Project Gutenberg has already produced and distributed nearly 15,000 > eBooks, > with a budget that has yet to reach a significant total for all 33+ years, > and is projected to reach a million eBooks without undue expense or effort. PG produces books at a lower cost only if you neglect the cost of volunteer work. I'm sure a big organized corporation like Google can create eBooks way cheaper than a loosely organized group of volunteers like PG. > We'll just have to wait and see if either Google Print, or any of the > various > "Million eBook Projects" will ever come up with even 1% of a million eBooks > that you can carry with you on a one inch stack of plain homemade DVDs. Whereas PG already has reached 1.5% of a million books with 98.5% still to go. > If it hasn't been proofread, and if you can't take it with you, it is only > of limited value. . .sort of like reading over someone's shoulder. Depends on what you want to do with the book. If you only want to cite some work a page scan (that you cannot take with you but is error-free) is much better than a proofread eBook (which may contain OCR errors). > With Project Gutenberg eBooks, you OWN them. . 
.forever. . .and can save > them > in your own favorite formats, fonts, margination, pagination, or whatever, > and you can search, quote, print, and do all the normal eBook fuctions. Yours forever ... until new copyright laws separate you. > I would say that an eBook has to be at least 99.9% accurate, and that it > should then be a process as people read the eBooks, to send in corrections. That is ~ 2 errors per page if you assume a line length of 55 and page length of 40 (~ 2000) chars. > Most of the Project Gutenberg and Distributed Proofeaders would say it has > to be over 99.99% and perhaps even over 99.999%. That is approx. one error every 5 pages or every 50 pages. Still not very good. > Not only that, but, viewing the entire eBook effort as a 50 year process, > of which I have walked 33+ years, I must state for the record that I think > OCR, spellcheckers, grammarcheckers., etc. will be so much better a decade > from now that doing the proofreading on the more obscure works will require > so much less effort than it does today, that it will be a great trade-off. Which poses the question: isn't Google's approach to just scan the books today and wait, better suited to achieve the 1 million target? Every progress in OCR technology automatically "proof-reads" all books Google has scanned. -- Marcello Perathoner webmaster@gutenberg.org From sharris at steveharris.net Thu Dec 23 20:55:10 2004 From: sharris at steveharris.net (steve harris) Date: Fri Dec 24 14:21:21 2004 Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta In-Reply-To: <41CB953A.50402@dsl.pipex.com> Message-ID: The Library of Congress doesn't list anything before 1929. The British Library shows: Author - personal Batu?ta, Ibn. Title The travels of Ibn Batu?ta : translated from the abridged manuscript copies, preserved in the public library of Cambridge with notes, illustrative of the history, geography, botany, antiquities, &c. occurring through the work / by Samuel Lee. 
Publisher/year London : Darf, 1984, 1829. Added name Lee, Samuel. holdings (1) All items Holdings (BL) 89/27495 DSC Request ISBN 1850770352 Good Luck. Thx, steve h sharris@steveharris.net > -----Original Message----- > From: gutvol-d-bounces@lists.pglaf.org > [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Holden McGroin > Sent: Thursday, December 23, 2004 8:04 PM > To: Project Gutenberg Volunteer Discussion > Subject: [gutvol-d] Ibn Batuta (Was Re: Fwd: Project Googleberg) > > > Gutenberg9443@aol.com wrote: > > By the way, does ANYBODY know where we can get a public > domain copy of > > Ibn Batuta? I've had no luck finding one online. I even > asked the king > > of Saudi Arabia for a copy, but His Majesty didn't answer. The few > > snippets I've seen are fascinating. He left his home to go > on a haj, and > > then kept going, spending 29 years travelling and writing > fascinating > > notes of where he went, namely everywhere you could get to > without going > > to Arctica, Antarctica, or the Americas. > > I have to agree with Anne. Every time I hear about Ibn > Batuta's amazing > travels, I feel the urge to read his writings. Is there any chance we > could get them online as part of Gutenberg's collection? > > Cheers, > Holden > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d > From gbuchana at rogers.com Fri Dec 24 14:44:45 2004 From: gbuchana at rogers.com (Gardner Buchanan) Date: Fri Dec 24 14:45:07 2004 Subject: [gutvol-d] Ibn Batuta In-Reply-To: Message-ID: Andrew Sly wrote: > > > On Fri, 24 Dec 2004, Holden McGroin wrote: > >> I have to agree with Anne. Every time I hear about Ibn Batuta's amazing >> travels, I feel the urge to read his writings. Is there any chance we >> could get them online as part of Gutenberg's collection? > > Of course there is. 
I believe it would just be a matter of how much > effort and expense some volunteers would like to go to in order to > make it happen. Oh, and a bit of luck too. > > After reading what I could find about Ibn Batuta, I agree, this > could be worth searching out... (I would imagine in an english > translation) > The 1829 English translation by (Reverend) Samuel Lee looks like the best bet: The Travels of Ibn Batuta; Translated from the Abridged Arabic Manuscript Copies, preserved in the Public Library of Cambridge. Translated by Rev. Samuel Lee. London: Printed for the Oriental Translation Committee, 1829 I see a couple of 1985 re-prints going for ~$60US. The Gibb translation is too new. It looks like it was published in the 50s or so. Gibb lived until 1971. ============================================================ Gardner Buchanan Ottawa, ON FreeBSD: Where you want to go. Today. From Gutenberg9443 at aol.com Fri Dec 24 15:14:31 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Dec 24 15:14:48 2004 Subject: [gutvol-d] Fwd: Project Googleberg Message-ID: <1b9.98e68bb.2efdfcd7@aol.com> In a message dated 12/24/2004 11:37:36 AM Mountain Standard Time, jlinden@pglaf.org writes: All your messages come through with an attachment because you use HTML formatted mail. I didn't know that. What kind of an attachment is it? Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/f8238e44/attachment.html From flis at detk.com Fri Dec 24 17:50:04 2004 From: flis at detk.com (William Flis) Date: Fri Dec 24 17:43:38 2004 Subject: [gutvol-d] Fwd: Project Googleberg In-Reply-To: <1b9.98e68bb.2efdfcd7@aol.com> Message-ID: The attachment I get in a lot of mesages from this list says this: _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d Who would put such an attachment? 
Bill Flis -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Gutenberg9443@aol.com Sent: Friday, December 24, 2004 6:15 PM To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] Fwd: Project Googleberg In a message dated 12/24/2004 11:37:36 AM Mountain Standard Time, jlinden@pglaf.org writes: All your messages come through with an attachment because you use HTML formatted mail. I didn't know that. What kind of an attachment is it? Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/51e020b5/attachment.html From Gutenberg9443 at aol.com Fri Dec 24 18:15:30 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Dec 24 18:15:53 2004 Subject: [gutvol-d] Fwd: Project Googleberg Message-ID: <199.35331039.2efe2742@aol.com> In a message dated 12/24/2004 6:43:46 PM Mountain Standard Time, flis@detk.com writes: Who would put such an attachment? I haven't the foggiest. Some great guru of the Internet must have done it. I'm glad I'm not the culprit. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/db9b3605/attachment.html From gbuchana at rogers.com Fri Dec 24 19:41:10 2004 From: gbuchana at rogers.com (Gardner Buchanan) Date: Fri Dec 24 19:41:44 2004 Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta In-Reply-To: Message-ID: On 04:55:10 steve harris wrote: > The Library of Congress doesn't list anything before 1929. > > > Good Luck. > It's not as bad as all that. The Lee translation (1829) is available pretty easily. Amazon.com has an edition for $12. Look for ISBN 0486437655. There was a 1940s re-print of the Lee translation that seems to go for $60 used. The French 1859 translation by Defremery and Sanguinetti is based on more/better source material - but I don't read French. 
============================================================ Gardner Buchanan Ottawa, ON FreeBSD: Where you want to go. Today. From hyphen at hyphenologist.co.uk Fri Dec 24 23:28:09 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Dec 24 23:28:43 2004 Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta In-Reply-To: References: Message-ID: On Fri, 24 Dec 2004 22:41:10 -0500 (EST), Gardner Buchanan wrote: | | On 04:55:10 steve harris wrote: | > The Library of Congress doesn't list anything before 1929. | > | > | > Good Luck. | > | | It's not as bad as all that. The Lee translation (1829) is available | pretty easily. Amazon.com has an edition for $12. Look for ISBN | 0486437655. There was a 1940s re-print of the Lee translation that | seems to go for $60 used. | | The French 1859 translation by Defremery and Sanguinetti is based | on more/better source material - but I don't read French. The original ?Arabic? would be nice. -- Dave F From shalesller at writeme.com Fri Dec 24 23:55:11 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Dec 24 23:55:30 2004 Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta Message-ID: <20041225075511.0EC874BDAB@ws1-1.us4.outblaze.com> "Dave Fawthrop" writes: > The original ?Arabic? would be nice. In etext form? Yes. In paper form, it'd be downright useless for PG. We just don't have anyone really capable of handling it. We don't have OCR--the Urdu team at DP-EU is completely type-in, and I don't know of anyone interested in proofing more than a few lines of Arabic.
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From hyphen at hyphenologist.co.uk Sat Dec 25 01:11:03 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Dec 25 01:11:37 2004 Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta In-Reply-To: <20041225075511.0EC874BDAB@ws1-1.us4.outblaze.com> References: <20041225075511.0EC874BDAB@ws1-1.us4.outblaze.com> Message-ID: <8vaqs0plv9mb4vjj0d99lb4rg2ldi8bj7f@4ax.com> On Fri, 24 Dec 2004 23:55:11 -0800, "D. Starner" wrote: | "Dave Fawthrop" writes: | > The original ?Arabic? would be nice. | | In etext form? Yes. In paper form, it'd be down right useless for PG. | We just don't have anyone really capable of handling it. Maybe a friend of a friend of someone on gutvol-d? | We don't have | OCR--the Urdu team at DP-EU is completely type-in, and I don't know | of anyone interested in proofing more than a few lines of Arabic. There are quite a lot of people who can read Classical Arabic, though not perhaps in the USA. I ended guessed an Arabic word and ended up with a long discussion with a Muslim lady who is quadra lingual about the differences between Classical Arabic and modern Arabics. Unicode has all the characters required. Also right to left writing. Also because Arabic is a language designed for calligraphy, a page scan of a well written copy would be useful. Sorry but my knowledge of Arabic is theoretical :-( -- Dave F From hart at pglaf.org Sat Dec 25 10:14:07 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Dec 25 10:14:10 2004 Subject: !@!Re: [gutvol-d] RE: [gavel-d] Ibn Batuta In-Reply-To: <8vaqs0plv9mb4vjj0d99lb4rg2ldi8bj7f@4ax.com> References: <20041225075511.0EC874BDAB@ws1-1.us4.outblaze.com> <8vaqs0plv9mb4vjj0d99lb4rg2ldi8bj7f@4ax.com> Message-ID: If you can forward this to me as an attachment, I think I have someone who can proof it for you. 
Michael On Sat, 25 Dec 2004, Dave Fawthrop wrote: > On Fri, 24 Dec 2004 23:55:11 -0800, "D. Starner" > wrote: > > | "Dave Fawthrop" writes: > | > The original ?Arabic? would be nice. > | > | In etext form? Yes. In paper form, it'd be down right useless for PG. > | We just don't have anyone really capable of handling it. > > Maybe a friend of a friend of someone on gutvol-d? > > | We don't have > | OCR--the Urdu team at DP-EU is completely type-in, and I don't know > | of anyone interested in proofing more than a few lines of Arabic. > > There are quite a lot of people who can read Classical Arabic, though not > perhaps in the USA. I ended guessed an Arabic word and ended up with a > long discussion with a Muslim lady who is quadra lingual about the > differences between Classical Arabic and modern Arabics. > > Unicode has all the characters required. Also right to left writing. > > Also because Arabic is a language designed for calligraphy, a page scan of > a well written copy would be useful. > > Sorry but my knowledge of Arabic is theoretical :-( > > > -- > Dave F > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From shalesller at writeme.com Sat Dec 25 10:18:06 2004 From: shalesller at writeme.com (D. Starner) Date: Sat Dec 25 10:18:12 2004 Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta Message-ID: <20041225181806.8BEBC101D0@ws1-3.us4.outblaze.com> "Dave Fawthrop" writes: > Maybe a friend of a friend of someone on gutvol-d? I don't think it's helpful to try and push people into handling a entire specific book right at the start, especially without OCR. > There are quite a lot of people who can read Classical Arabic, There's a lot of language communities out there that PG doesn't have much contact with. > Unicode has all the characters required. Also right to left writing. 
Unicode has pretty much all the letters we need, sans the myriad varieties
of early 20th-century phonetic characters. That doesn't mean we can
transcribe them easily.

> Also because Arabic is a language designed for calligraphy, a page scan of
> a well written copy would be useful.

It's not really any different from English. A page of English calligraphy
may be beautiful, but it's not a text version.

From hart at pglaf.org Sat Dec 25 10:37:58 2004
From: hart at pglaf.org (Michael Hart)
Date: Sat Dec 25 10:37:59 2004
Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. Roland]
In-Reply-To: <41CC53D5.3010005@perathoner.de>
References: <20041223031630.16124F2548@boggle.pobox.com> <41CC53D5.3010005@perathoner.de>
Message-ID:

On Fri, 24 Dec 2004, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> Project Gutenberg has already produced and distributed nearly 15,000
>> eBooks, with a budget that has yet to reach a significant total for all
>> 33+ years, and is projected to reach a million eBooks without undue
>> expense or effort.
>
> PG produces books at a lower cost only if you neglect the cost of volunteer
> work. I'm sure a big organized corporation like Google can create eBooks way
> cheaper than a loosely organized group of volunteers like PG.

We'll find out, won't we?

I'm still betting we will be first to 100,000.

Then it'll be fun to see how it goes to 1,000,000.

Of course, after 10,000,000, things will really slow down,
in the sense that it will become hard to find more books.

>> We'll just have to wait and see if either Google Print, or any of the
>> various "Million eBook Projects" will ever come up with even 1% of a
>> million eBooks that you can carry with you on a one inch stack of plain
>> homemade DVDs.
>
> Whereas PG already has reached 1.5% of a million books with 98.5% still to
> go.
Hopefully more news on this front shortly.

>> If it hasn't been proofread, and if you can't take it with you, it is
>> only of limited value. . .sort of like reading over someone's shoulder.
>
> Depends on what you want to do with the book. If you only want to cite some
> work a page scan (that you cannot take with you but is error-free) is much
> better than a proofread eBook (which may contain OCR errors).

I have yet to read any paper book that is error free. . . .

Eventually the eBook will be more accurate than the source, perhaps
in your lifetime for many eBooks.

>> With Project Gutenberg eBooks, you OWN them. . .forever. . .and can save
>> them in your own favorite formats, fonts, margination, pagination, or
>> whatever, and you can search, quote, print, and do all the normal eBook
>> functions.
>
> Yours forever ... until new copyright laws separate you.

Luckily US and AU copyright changes are not retroactive, as are those of
more olde worlde countries. . . .

>> I would say that an eBook has to be at least 99.9% accurate, and that it
>> should then be a process as people read the eBooks, to send in
>> corrections.
>
> That is ~ 2 errors per page if you assume a line length of 55 chars and a
> page length of 40 lines (~ 2000 chars).

The Library of Congress standard is 99.95%. . .one error per page.

Of course, some people count a stray character in the margins as an error,
or a typo in the header/footer/page#. . .I only count the author's words.

>> Most of the Project Gutenberg and Distributed Proofreaders would say it
>> has to be over 99.99% and perhaps even over 99.999%.
>
> That is approx. one error every 5 pages or every 50 pages. Still not very
> good.

Reading one of Brewster's books with Greg the other day, it was obvious
only the author's words had been proofed, the headers/footers/page# were
often messy, but the book itself was quite readable. It had perhaps
less than 1,000 characters per page, but only one real error. .
.another was a capitalization error that may bother some and not others. . .in
about 10 pages. That's at least one "hard" error, and one "soft" error per
10K, 99.99% or 99.98%. . .if you don't count header/footer/page# errors. . . .

This is well beyond the Library of Congress standards of 99.95% if someone
were to decide to "sew all the pages together" into a single file eBook,
and eliminate the headers/footers/page#'s etc.

I was quite impressed. . .and I will have to look at more of them.

>> Not only that, but, viewing the entire eBook effort as a 50 year process,
>> of which I have walked 33+ years, I must state for the record that I think
>> OCR, spellcheckers, grammarcheckers, etc. will be so much better a decade
>> from now that doing the proofreading on the more obscure works will
>> require so much less effort than it does today, that it will be a great
>> trade-off.
>
> Which poses the question: isn't Google's approach to just scan the books
> today and wait, better suited to achieve the 1 million target? Every progress
> in OCR technology automatically "proof-reads" all books Google has scanned.

This has been the approach of all the "quick and dirty" eBook projects,
certainly all those that project a million eBooks in the next 10 years.

Except, of course, Project Gutenberg.

Thanks!!!

So Nice To Hear From You!

Happy Holidays!!!

Michael

Give FreeBooks!!!
In 39 Languages!!!

As of December 25, 2004
~14,815 FreeBooks at:
~185 to go to 15,000
http://www.gutenberg.org
http://www.gutenberg.net

We are ~96% of the way from 10,000 to 15,000.

Now even more PG eBooks
In 104 Languages!!!
http://gutenberg.cc
http://gutenberg.us

Michael S. Hart
Project Gutenberg
Executive Coordinator
"*Internet User ~#100*"

If you do not receive a prompt reply, please resend, keep resending.
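
The accuracy percentages traded in the exchange above can be checked with a
little arithmetic. What follows is only a minimal sketch in Python, not
anything either correspondent ran: it assumes Perathoner's rounded page of
about 2,000 characters and Hart's sample of roughly 10 pages at 1,000
characters each, and the function names are hypothetical.

```python
from fractions import Fraction


def errors_per_page(accuracy_percent: str, chars_per_page: int) -> Fraction:
    """Expected wrong characters per page at a given accuracy.

    The accuracy is passed as a string ("99.95") so that Fraction can parse
    it exactly, avoiding binary floating-point rounding.
    """
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return error_rate * chars_per_page


def pages_per_error(accuracy_percent: str, chars_per_page: int) -> Fraction:
    """Average number of pages between errors (inverse of errors_per_page)."""
    return 1 / errors_per_page(accuracy_percent, chars_per_page)


def accuracy_from_errors(errors: int, total_chars: int) -> Fraction:
    """Accuracy, in percent, for a counted number of errors in a sample."""
    return 100 * (1 - Fraction(errors, total_chars))


if __name__ == "__main__":
    page = 2000  # assumed: the rounded 55 x 40 character page
    for accuracy in ("99.9", "99.95", "99.99", "99.999"):
        print(accuracy, float(errors_per_page(accuracy, page)))

    sample = 10 * 1000  # assumed: about 10 pages of about 1,000 characters
    print(float(accuracy_from_errors(1, sample)))  # the "hard" error only
    print(float(accuracy_from_errors(2, sample)))  # "hard" plus "soft" error
```

On these assumptions the figures in the thread agree with one another: 99.9%
is two errors per 2,000-character page, 99.95% is one, 99.99% and 99.999% are
one error every 5 and 50 pages, and one or two errors in 10,000 characters
work out to 99.99% or 99.98%.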
From hart at pglaf.org Sat Dec 25 10:42:13 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Dec 25 10:42:15 2004 Subject: !@!Re: [gutvol-d] Fwd: Project Googleberg (fwd) Message-ID: Request permission to quote/forward your comments, you can be anonymous if you like. . .as it may get back to Google. . . . > I have examined what seems to be the preliminary "Googleprint" > catalog. It consists of books scanned and posted by other people > including us. At least half of them that I looked at are available > only as page scans, and I have to want a book an awful lot to put > page scans together just for my own use. They use a LOT of our > books; in fact, everything they have that they are aware we have > shows us as the best or only site to go and get the book. Though from this message it wasn't clear whose words these are. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From hyphen at hyphenologist.co.uk Sat Dec 25 12:17:32 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Dec 25 12:18:25 2004 Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta In-Reply-To: <20041225181806.8BEBC101D0@ws1-3.us4.outblaze.com> References: <20041225181806.8BEBC101D0@ws1-3.us4.outblaze.com> Message-ID: On Sat, 25 Dec 2004 10:18:06 -0800, "D. Starner" wrote: | There's a lot of language communities out there that PG doesn't | have much contact with. Shame about that :-( -- Dave F From holden.mcgroin at dsl.pipex.com Sat Dec 25 15:15:50 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Sat Dec 25 15:16:08 2004 Subject: [gutvol-d] Ibn Batuta In-Reply-To: References: Message-ID: <41CDF4A6.3000906@dsl.pipex.com> steve harris wrote: > The Library of Congress doesn't list anything before 1929. The British Library shows: > > Author - personal Batu?ta, Ibn. 
> Title The travels of Ibn Batu?ta : translated from the abridged manuscript copies, preserved in the public library of Cambridge with notes, illustrative of the history, geography, botany, antiquities, &c. occurring through the work / by Samuel Lee. > Publisher/year London : Darf, 1984, 1829. > Added name Lee, Samuel. > holdings (1) All items > Holdings (BL) 89/27495 DSC Request > ISBN 1850770352 Hi! Thanks for the info. I've really wanted to read Ibn Batuta for a while now so perhaps this is the golden opportunity to finally get a PG version in motion. So, does anybody have any experience ordering copies of whole books from the British Library? I'd love to do it (obviously, depending on price) if it's at all possible :-) Cheers, Holden From sly at victoria.tc.ca Sat Dec 25 23:17:06 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat Dec 25 23:17:27 2004 Subject: [gutvol-d] Ibn Batuta In-Reply-To: <41CDF4A6.3000906@dsl.pipex.com> References: <41CDF4A6.3000906@dsl.pipex.com> Message-ID: On Sat, 25 Dec 2004, Holden McGroin wrote: > Thanks for the info. I've really wanted to read Ibn Batuta for a while > now so perhaps this is the golden opportunity to finally get a PG > version in motion. So, does anybody have any experience ordering copies > of whole books from the British Library? I'd love to do it (obviously, > depending on price) if it's at all possible :-) > Hmmm... you may not need to go that far afield. It looks as if there is a reprint in the University Library in my city (Victoria, British Columbia) and given that it was published in New York, I'd expect you could find it in some American cities... Here's a full record: Author/Creator: Ibn Batuta, 1304-1377. Other Author/Creator(s): Lee, Samuel, 1783-1852. Title: The travels of Ibn Batuta. Translated from the abridged Arabic manuscript copies, preserved in the Public Library of Cambridge. 
With notes illustrative of the history, geography, botany, antiquities, occurring throughout the work, by Samuel Lee. Uniform Title: Tuhfat al-nuzzar English. 1971 , _________________________________________________________________ Database: University of Victoria Libraries Location: McPherson Library Call Number: G370 I23 Number of Items: 1 Status: In Library Subject(s): Voyages and travels Africa--Description and travel--To 1900 Asia--Description and travel Published: New York, B. Franklin [1971] Description: xviii, 243 p. 24 cm. Series: Burt Franklin research & source works series, 817 Geography and discovery, 13 Notes: Reprint of the 1829 ed. Translation of Tuhfat al-nuzzar. ISBN: 0833720511 From hyphen at hyphenologist.co.uk Sun Dec 26 01:06:02 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sun Dec 26 01:06:33 2004 Subject: [gutvol-d] Ibn Batuta In-Reply-To: <41CDF4A6.3000906@dsl.pipex.com> References: <41CDF4A6.3000906@dsl.pipex.com> Message-ID: On Sat, 25 Dec 2004 23:15:50 +0000, Holden McGroin wrote: | Thanks for the info. I've really wanted to read Ibn Batuta for a while | now so perhaps this is the golden opportunity to finally get a PG | version in motion. So, does anybody have any experience ordering copies | of whole books from the British Library? I'd love to do it (obviously, | depending on price) if it's at all possible :-) It all depends where the copy is stored. If it is in London, forget it, you can not borrow it, you must get readership permission and consult it there. If it is in Boston Spa, you can consult it at Boston Spa but it takes some time to get it from storage so organize yourself first. It is far easier to get things via your local library. Fill in a request form, and they will first look for an ordinary local library copy, and failing this they will get a copy from Boston Spa. 
************ BEWARE THE BL FINES SYSTEM THEY ARE HORRENDOUS ********** -- Dave F From Gutenberg9443 at aol.com Sun Dec 26 15:45:51 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Dec 26 15:46:09 2004 Subject: [gutvol-d] Final Report on eBookWise 1150 Message-ID: <9b.5597c5bf.2f00a72f@aol.com> I promised a final report after the eBookWise 1500 Librarian was released. It was released last week. It took me a little while to get the computer and the device to talk to each other, but the problem turned out to be that I had only turned the computer off and on after each program was installed. I was supposed to have completely unplugged the computer. Please do not flame me over this. You are authorized to disagree with me. I know that some people would rather spend 42 days stuck in a semiprivate hospital room with a roommate who is in love with the soaps, than possess a device that will allow them to read and mentally tune out the television. Forget about all the negative comments in my initial report. As long as I have loved and been faithful to my Rocket, I have to admit that the 1500 is better. It will allow me to do things that my Rocket won't allow, including making handwritten notes with my stylus, so henceforth the Rocket will be my pleasure reading device and the 1500 (we have named her Isis, to relate well to my computer, whose name is Sesheta. Sesheta was the ancient Egyptian goddess of libraries and librarians; Isis was the Lady high everything else.) will be my work device. The only problem I'm still having is the fact that Isis holds only about 20 books. But after I get to the computer store and get a SmartMedia card and its driver, Isis will hold over 300 books very easily. This is a winner. If you have any interest in being able to carry 300, or even 20, books around in your purse or backpack without tearing your shoulder into shreds, hie yourself over to eBookWise.com and buy the eBookWise while the price is right. 
$110 will buy and deliver it, and you'll then be given $20 to spend on books. All books are 20% off for about another week. With this device, you can read ANY BOOK ON THE INTERNET, unless it is in an encrypted format incompatible with .txt and .htm and .doc. You can definitely use it to read anything posted on PG and anything posted on Blackmask. Between FictionWise and eBookWise, literally thousands of commercial titles are available. Also anything you can get in .rb will transform itself into the right format in less than a minute. If you use a PDA or even a cellular phone with a lot of memory, you can carry around one to three books, if you don't mind reading in teensie weensie typesize with a line that consists of three words (or two if they're longer words). Blech. Yesterday at the family Christmas party I was showing this thing to my husband's former wife, and she held it in her hand and said, "Well, it doesn't weigh less than a book that size." I conceded the point--a Rocket is 18 ounces and I think an eBookWise 1500 is about 22 ounces-- but then pointed out that it weighs far less than 20, or 300, books that size. (When anybody in our family is in the hospital for more than a day, daily book runs are necessary to take away the read books and bring new ones. This device will obviate that necessity. Although it is true that you should never have valuable stuff lying around your hospital room, as thieves are familiar with hospital rooms, you can ask a kind nurse to lock it away for you when you're about to go to sleep.) Or--You're travelling cross-country. Every evening you stop at a motel. At the motel, you have your choice of the Gideon Bible if it hasn't been swiped, the food service menu, or television--or your own selection of 300 books. OR--if you're stuck in the hospital emergency room waiting interminably for somebody to come and attend to you or your loved one, wouldn't you like to have something to read with you? 
Maybe even two somethings, so that the person in bed, if it's not you, can also have something to read? If I had the money I'd give one of these to everybody I know. If you don't want one, don't buy it. But if you can afford it, at least give it a one-week trial. Then if you still don't like it, give it to somebody you don't like. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041226/532b2d28/attachment.html From Gutenberg9443 at aol.com Sun Dec 26 15:48:06 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Dec 26 15:48:22 2004 Subject: [gutvol-d] Ibn Batuta Message-ID: <111.4043b5d3.2f00a7b6@aol.com> In a message dated 12/26/2004 12:17:36 AM Mountain Standard Time, sly@victoria.tc.ca writes: << Snippet of conversation found on usenet: > > I have probably 2,000 books and > > perhaps 50 videos, not a single one of them clutter. Getting them from the > > library and then returning them just isn't the same as rereading > well-worn, familiar pages, either, for instance. > > > Yes, it's quite different. Tho come to think of it, I feel that way about a > few books I get from the library once a year or so to reread. Elizabeth > Goudge's CITY OF BELLS for one. But those are old books themselves, been > there a long time, nice old pictures and nice old typeface.... > > What I'm trying to do is transfer that feeling to Project Gutenberg's 'plain > vanilla' texts. I already kind of feel like their site is a very old shabby > library, smelling of leather, mice, and I forget what all else.... > > From shimmin at uiuc.edu Mon Dec 27 07:30:44 2004 From: shimmin at uiuc.edu (Robert Shimmin) Date: Mon Dec 27 07:30:50 2004 Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. 
Roland]
In-Reply-To:
References: <20041223031630.16124F2548@boggle.pobox.com> <41CC53D5.3010005@perathoner.de>
Message-ID: <41D02AA4.3010301@uiuc.edu>

Michael Hart wrote:

> I'm still betting we will be first to 100,000.

Anyone can beat us to 100,000 if they mirror all of our content and add
some of their own.

-- RS

From stephen.thomas at adelaide.edu.au Sat Dec 25 15:16:16 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Wed Dec 29 13:35:43 2004
Subject: [gutvol-d] Revolutionary chapter / Google's ambitious book-scanning plan seen as key shift in paper-based culture
Message-ID: <41CDF4C0.10200@adelaide.edu.au>

Not sure if I sent this already, but it provides some useful info on the
process Google is adopting. In the light of discussion on this topic so
far, this may be enlightening for some readers.

http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2004/12/20/BUGROAD6QT1.DTL

--
Stephen Thomas, Senior Systems Analyst, Adelaide University Library
ADELAIDE UNIVERSITY SA 5005 AUSTRALIA
Tel: +61 8 8303 5190 Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From hart at pglaf.org Tue Dec 28 03:49:41 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Dec 29 13:35:46 2004
Subject: [gutvol-d] !@!Googleberg eBooks
Message-ID:

How many of you have tried Google Print?

Have you noticed that the initial offering of eBooks
strongly resembles the Project Gutenberg catalogue???

We'd love to hear your experiences with Google Print.

Thanks!!!

Michael S. Hart

From nihil_obstat at mindspring.com Wed Dec 29 14:36:05 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Wed Dec 29 15:23:50 2004
Subject: [gutvol-d] !@!Googleberg eBooks
Message-ID: <22839263.1104359765940.JavaMail.root@wamui02.slb.atl.earthlink.net>

There is no "Google Print" library in a sense that I would think of one:
i.e. I cannot seem to get any catalog of its collection or just browse
Google Print.
It works as an added feature to its regular search--so Google Print titles
come up, as well as external links.

This is fine in a way, because if I search for a text, and PG has it, the
search results usually have a high ranking link to a P.G. server. (So in a
way, all of P.G. is essentially as findable to Google clients as Google
Print is.)

The lack of a catalog is bad in a way, for there is no easy way to see just
what is available at Google that is in the public domain--in case you
actually wanted to read an entire book on-line. You have to search for
topics or people and try to wend through what comes up in the search
results.

Of course Google does not claim to be a library. From its own website:

"In general, Google Print is designed to help you discover books, not read
them from start to finish. It's like going to a bookstore and browsing --
only with a Google twist."

I do not find it a very useful service. But it is something that was not
there before. I am not going to complain about it--that would be like
looking a semi-lame-gift-nag-with-a-google-twist in the mouth.

-----Original Message-----
From: Michael Hart
Sent: Dec 28, 2004 6:49 AM
To: undisclosed-recipients: ;
Subject: [gutvol-d] !@!Googleberg eBooks

How many of you have tried Google Print?

Have you noticed that the initial offering of eBooks
strongly resembles the Project Gutenberg catalogue???

We'd love to hear your experiences with Google Print.

Thanks!!!

Michael S. Hart

---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From servalan at ar.com.au Thu Dec 30 03:16:58 2004
From: servalan at ar.com.au (Pauline)
Date: Thu Dec 30 03:17:34 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To:
References:
Message-ID: <41D3E3AA.30508@ar.com.au>

Michael Hart wrote:
>
> How many of you have tried Google Print?
> > Have you noticed that the intitial offering of eBooks > strongly resembles the Project Gutenberg catalogue??? Why not include info in all PG ebooks which make it: a) easy for readers to identify the source of the book (PG & the "Produced by" line) b) easy for readers/mirror sites/republishers to send corrections back to the source (PG &/| the producers) c) not OK to drop this info from PG ebooks when they are republished As a reader, knowing the source of the book is exceedingly valuable. Cheers, P -- Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." From shimmin at uiuc.edu Thu Dec 30 08:35:49 2004 From: shimmin at uiuc.edu (Robert Shimmin) Date: Thu Dec 30 08:35:55 2004 Subject: [gutvol-d] !@!Googleberg eBooks In-Reply-To: <41D3E3AA.30508@ar.com.au> References: <41D3E3AA.30508@ar.com.au> Message-ID: <41D42E65.3020408@uiuc.edu> Pauline wrote: > c) not OK to drop this info from PG ebooks when they are republished The idea of a public domain is that anyone can do anything they like with the text, including edit it, republish it, and package it however they wish. -- RS From hart at pglaf.org Thu Dec 30 08:55:14 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 30 08:55:16 2004 Subject: [gutvol-d] !@!Googleberg eBooks In-Reply-To: <41D42E65.3020408@uiuc.edu> References: <41D3E3AA.30508@ar.com.au> <41D42E65.3020408@uiuc.edu> Message-ID: On Thu, 30 Dec 2004, Robert Shimmin wrote: > Pauline wrote: > >> c) not OK to drop this info from PG ebooks when they are republished > > The idea of a public domain is that anyone can do anything they like with the > text, including edit it, republish it, and package it however they wish. But you can't say you are the author. . .and perhaps other things. 
mh

From hart at pglaf.org Thu Dec 30 09:00:40 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 30 09:00:41 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <41D3E3AA.30508@ar.com.au>
References: <41D3E3AA.30508@ar.com.au>
Message-ID:

On Thu, 30 Dec 2004, Pauline wrote:

> Michael Hart wrote:
>>
>> How many of you have tried Google Print?
>>
>> Have you noticed that the initial offering of eBooks
>> strongly resembles the Project Gutenberg catalogue???
>
> Why not include info in all PG ebooks which make it:
> a) easy for readers to identify the source of the book (PG & the
> "Produced by" line)

eBooks often have multiple paper sources.

> b) easy for readers/mirror sites/republishers to send corrections back to
> the source (PG &/| the producers)

There is already an email address for errors in the eBooks, not to mention
bugs@pglaf.org and my own email address. You can pretty much send error
messages to ANY PG address and they will be fixed.

> c) not OK to drop this info from PG ebooks when they are republished

As in earlier messages, we only have something to say if they use the PG
trademark.

mh

From jtinsley at pobox.com Thu Dec 30 13:03:18 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Thu Dec 30 13:03:35 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To:
References:
Message-ID: <20041230210318.GA27098@panix.com>

On Tue, 28 Dec 2004 03:49:41 -0800 (PST), Michael Hart wrote:

>
>How many of you have tried Google Print?
>
>Have you noticed that the initial offering of eBooks
>strongly resembles the Project Gutenberg catalogue???
>

This and some responses made me think that some people are thinking
along the lines that they are using our texts in some way, so I checked
it out. I figure that the answer is no, to both the explicit and implied
questions.
I started by searching for quotes from 20 etexts chosen at random from
etext99, as follows:

book "cardinals, abbots, councillors, legates, bishops, princes"
book "indeed we be no fatted bullocks, we two"
book "Est-ce que je ne connais pas mon filleul?"
book "Suchet's head-quarters at that time was the old palace of the"
book "She always has this man of letters of hers on her"
book "Afterwards," he answered quickly. "A cursed gutta serena."
book "himself with the people, he partially recognizes the truth of his words."
book "Epistles are spurious, as that the Republic, the Timaeus, and the Laws"
book "You may recall that our mutual and dear friend, old Allan Quatermain,"
book "Where rose the husbandman's abode,"
book "the felicity of his fellow beings, and sit down darkling"
book "by a tub, artesian cold, and a loud and joyous singing of"
book "As desires of waking hours are answered in sleep,"
book "Even while speaking at random, perhaps the better to hide"
book "Calm and proud, Tartarin of Tarascon marched on in the night"
book "Another fallacy is produced which turns on the absoluteness of"
book "The evidence for the steadily growing danger of secession"
book "Morose-minded people may complain of this; for myself I regard it"
book "THAT old bell, presage of a train, had just"

All of them returned normal search results, including a few from PG,
but only the second (Jungle Book 2) offered a Google Print link.

(Incidentally, for those who want to try, I find that preceding your
search term with "book" will often produce a Google Print link when
the bare search term doesn't.)

A search for "book Tarzan" yielded, in Print results:

Tarzan of the Apes - by Edgar Rice Burroughs - 320 pages
Human Computer Interaction - edited by Julie Jacko, Constantine Stephanidis - 1348 pages
C Primer Plus - by Stephen Arata, Stephen Prata, Kathleen Prata - 970 pages

Not what I'd consider a typical PG search result! :-)

"book barsoom" and "book mars" did even less well.
No sign of the ERB series.

Erewhon, Alice, Little Women, Oliver Twist, Tom Sawyer, Huck Finn, Zenda,
Decline and Fall, at least some Sherlock Holmes, Last of the Mohicans,
several from Plato and at least most of Shakespeare, are present.

Richard Feverel is there, but Shagpat is nowhere. Tom Swift is AWOL.
Tartarin of Tarascon can't be found. John Carter is once again
mysteriously missing. Kai Lung has effaced himself into invisibility.

And in the process of searching for these, I turned up about twice as
many modern as pre-23 book titles.

The page images I looked at are all from modern reprints, with
"Copyrighted Material" tags on their sides. I imagine that the publishers
would insist on this, which makes much sense of Google wanting to work
with a collection of PD books from libraries.

This pattern is, I think, consistent with what book publishers might be
willing to provide. Any list of books drawn up by English speakers is
going to have the most popular classics on it. An awful lot of the search
results I found were from Penguin Classics, so it may well be that they
simply have the whole Penguin Classics range. If so, a significant overlap
with PG is inevitable. And the Google Print entries seem to have a lot
more modern books than classics.

Hmmm. Interesting. The only Tarzan link for Google Print is "Tarzan of
the Apes", and the only Tarzan search result at the Penguin Classics site
is, guess what? "Tarzan of the Apes". And Penguin Classics does not
publish the Barsoom series. "Coincidence? I think not!"

Interesting: both the searches

book "she could have seen through a pair of stove-lids just as well."

and

book "A robber is more high-toned"

find Tom Sawyer in Google Print, and

book "Christmas won't be Christmas without any presents,"

finds Little Women, but

book "Papa was a pickle bottle"

doesn't, and

book Little Women pickle

does find the book, but with the word pickle much further down in the
book. Hmmm, I see.
The text in the Google Print image reads "pa was a pickle-bottle" instead. So much for any thought of them using our text. The larger reason that they can't be using our text is that their search results point to page images, with the search term highlighted in yellow. You really couldn't do that unless you had mapped your text to the dimensions and placing of the image: it would be vastly easier to do it programmatically from the OCR process than to use an outside text. >We'd love to hear your experiences with Google Print. It will be handy, though probably not as handy as Amazon, for confirming unclear corrections in some older texts. They've somewhat protected their page images from downloading by the casual browser, but it's easy to bypass that. The more significant restriction is the number of pages any one session is allowed to download. This seems, to me, a reasonable compromise for genuinely-copyrighted books, though an annoyance on these reprints where the main story is in the PD and only the bookends are in copyright. It'll be interesting to see what they do with 100% pre-23 guaranteed content. jim From shalesller at writeme.com Thu Dec 30 13:25:36 2004 From: shalesller at writeme.com (D. Starner) Date: Thu Dec 30 13:25:46 2004 Subject: [gutvol-d] !@!Googleberg eBooks Message-ID: <20041230212536.30DEE4BDAB@ws1-1.us4.outblaze.com> Michael Hart writes: > > eBooks often have multiple paper sources. PG eBooks verifiably do not often have multiple paper sources. They sometimes, occasionally, have multiple paper sources. It is the exception that they have multiple paper sources, and even more the exception that they come from multiple paper editions. 
From gbnewby at pglaf.org Thu Dec 30 15:39:23 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Dec 30 15:39:24 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230210318.GA27098@panix.com>
References: <20041230210318.GA27098@panix.com>
Message-ID: <20041230233923.GA2406@pglaf.org>

On Thu, Dec 30, 2004 at 04:03:18PM -0500, Jim Tinsley wrote:
> On Tue, 28 Dec 2004 03:49:41 -0800 (PST), Michael Hart wrote:
>
> >
> >How many of you have tried Google Print?
> >
> >Have you noticed that the initial offering of eBooks
> >strongly resembles the Project Gutenberg catalogue???
> >
>
> This and some responses made me think that some people are thinking
> along the lines that they are using our texts in some way, so I checked
> it out. I figure that the answer is no, to both the explicit and implied
> questions.
>
> I started by searching for quotes from 20 etexts chosen at random from
> etext99, as follows:
> ...

Fascinating analysis, thanks.

Just a quick note that Google only indexes the first 100 or 150K of eBooks (they didn't give me a firm number, but confirmed there was a limit). This means that quotes from later parts of our eBooks > ~150K won't be found.

  -- Greg

From jtinsley at pobox.com Thu Dec 30 15:49:20 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Thu Dec 30 15:49:32 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230233923.GA2406@pglaf.org>
References: <20041230210318.GA27098@panix.com> <20041230233923.GA2406@pglaf.org>
Message-ID: <20041230234920.GC22506@panix.com>

On Thu, Dec 30, 2004 at 03:39:23PM -0800, Greg Newby wrote:
>
>Just a quick note that Google only indexes the first 100 or 150K
>of eBooks (they didn't give me a firm number, but confirmed
>there was a limit). This means that quotes from later parts of
>our eBooks > ~150K won't be found.

This is true for our books, as searched for by Google in general, like any other page, but it is not true for the Google Print search results; when they search Google Print, they do search the whole text, regardless of length. I did confirm this by searching for quotes that were near the ends of books.

jim

From phil at thalasson.com Thu Dec 30 17:12:41 2004
From: phil at thalasson.com (Philip Baker)
Date: Thu Dec 30 17:23:12 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230210318.GA27098@panix.com>
Message-ID:

Jim Tinsley wrote:
>(Incidentally, for those who want to try, I find that preceding your
>search term with "book" will often produce a Google Print link when
>the bare search term doesn't.)

A few days ago Steve Thomas gave the following link to an article in the San Francisco Chronicle:

http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2004/12/20/BUGROAD6QT1.DTL

In the article it says:

Typing in "book" and any search term within the Google window generates a "Book results" listing if a match of the search term is made within an indexed book. These results can be clicked to read excerpts from the book.

Looks as if this may develop into a 'book: key-words' type search which will only search Google Print.

--
Philip Baker

From hart at pglaf.org Fri Dec 31 10:59:51 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Dec 31 10:59:53 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230234920.GC22506@panix.com>
References: <20041230210318.GA27098@panix.com> <20041230233923.GA2406@pglaf.org> <20041230234920.GC22506@panix.com>
Message-ID:

On Thu, 30 Dec 2004, Jim Tinsley wrote:

> On Thu, Dec 30, 2004 at 03:39:23PM -0800, Greg Newby wrote:
>>
>> Just a quick note that Google only indexes the first 100 or 150K
>> of eBooks (they didn't give me a firm number, but confirmed
>> there was a limit). This means that quotes from later parts of
>> our eBooks > ~150K won't be found.
>
> This is true for our books, as searched for by Google in general,
> like any other page, but it is not true for the Google Print
> search results; when they search Google Print, they do search
> the whole text, regardless of length. I did confirm this by
> searching for quotes that were near the ends of books.
>
> jim

Aren't the PG eBooks already in Google Print?

That's what I heard,
so I would have figured they would have
re-indexed them to make them complete???

I wonder if they left the old files, and just are making new ones,
still from PG eBooks?

If so, how would you tell the difference?

mh

From hart at pglaf.org Fri Dec 31 11:04:35 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Dec 31 11:04:35 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230212536.30DEE4BDAB@ws1-1.us4.outblaze.com>
References: <20041230212536.30DEE4BDAB@ws1-1.us4.outblaze.com>
Message-ID:

On Thu, 30 Dec 2004, D. Starner wrote:

> Michael Hart writes:
>>
>> eBooks often have multiple paper sources.
>
> PG eBooks verifiably do not often have multiple paper sources. They
> sometimes, occasionally, have multiple paper sources. It is the
> exception that they have multiple paper sources, and even more the
> exception that they come from multiple paper editions.

The above might take more than one reading. . . .

In addition, I should add that pretty much ALL the original PG eBooks came from multiple editions, simply to do better error checking.

Michael

From jon at noring.name Fri Dec 31 11:25:27 2004
From: jon at noring.name (Jon Noring)
Date: Fri Dec 31 11:25:43 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To:
References: <20041230212536.30DEE4BDAB@ws1-1.us4.outblaze.com>
Message-ID: <142187819718.20041231122527@noring.name>

Michael Hart wrote:

> In addition, I should add that pretty much ALL the original PG eBooks
> came from multiple editions, simply to do better error checking.

How many of the PG texts fall into the category "the original PG eBooks"?
There is, of course, a difference between consulting other sources to clarify a few things with the text derived from the primary source, and simply kludging together a bunch of different editions to form a "new edition".

An example of how things got out of whack with the "original PG texts" is Mary Shelley's "Frankenstein", where there are two quite different editions, and the version at PG is not even marked as to which edition it conforms with.

It was a mistake to not include source information with the early PG texts (even if the work was a derivative.) Mistakes happen. Some of these mistakes can be corrected after-the-fact. And future works can do it right.

No need to apologize for the past, Michael -- all projects make mistakes. The key is to learn from the mistakes and make the necessary changes in policies and procedures. (Am I correct in that the policy has changed, and all new PG texts are to include the source metadata?)

Jon

From jtinsley at pobox.com Fri Dec 31 12:19:56 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Fri Dec 31 12:20:06 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To:
References: <20041230210318.GA27098@panix.com> <20041230233923.GA2406@pglaf.org> <20041230234920.GC22506@panix.com>
Message-ID: <20041231201956.GA2782@panix.com>

On Fri, 31 Dec 2004 10:59:51 -0800 (PST), Michael Hart wrote:

>Aren't the PG eBooks already in Google Print?

No. Definitively, no. That is one of the things my experiments demonstrated (see "pickle-bottle").

Our texts, or at least, as Greg says, the first 100K or so of them, are indexed in Google, and Yahoo!, and other search engines. But that's Google, not Google Print.

Google Print is a NEW content source. The content for Google Print is not directly available on the web now; it is held internally by Google. I have no inside information, but I think that my reconstruction below, based on my actually trying the thing, is pretty close.

1. Google agree with Penguin Classics, among others, that they can use their publications in Google Print.

2. Penguin Classics, et al., ship Google a copy of every book they currently have in print (which is covered by this agreement -- I imagine there may be some restrictions).

3. Google cut the pages ('cos the scans are just _beautiful_!) and scan the pages of the books into images.

4. Google run OCR on the pages. Along with every word, they store its position in the image. Like: the word "poorer" is on page 62, in a box 1.1 cm wide and 0.4 cm high whose top left corner is 4.2 cm from the top of the page and 3.1 cm from the left margin, . . . except I'm sure they're not using cm as their unit. Abbyy does this in its internal files it saves, so it wouldn't shock me to find that they're using Abbyy for OCR.

5. Google resize and transform the images to JPEG for display. (I can't prove that they didn't start with JPEGs of that size, but I think it's likely that they scanned at 600 dpi or higher initially.)

6. Google store the OCRed text, complete with the co-ordinates of each word on the pages where it appears, and index that OCRed text. They also store the JPEG images. Because they know that all the text in a book is useful (and that a book is of a finite size!) they store _all_ of the text of each book, not just the first 100K.

7. When a Google search is run, not only the main Google index is searched, but also the Google Print OCR text.

8. If the search returns results from Google Print, they are displayed on the search results page, along with the main Google results.

9. If a user clicks on a Google Print result, they are brought to the first page image -- the JPEG file -- where that search term is found in the OCRed text. When the page image is displayed, the search term is highlighted in yellow, using the co-ordinates captured at OCR time. (Actually, what is shown is the page image without the yellow, as I demonstrated by viewing the page images directly, with the HTML created dynamically to overlay yellow at the appropriate co-ordinates.)

10. The user can then browse back and forth, with limitations, through the page images.

11. The text that Google OCRed is never actually displayed as text, or HTML; it is used only to find the right page and highlight the search term.

>
>That's what I heard,

Then I feel quite certain that you heard wrong.

>so I would have figured they would have
>re-indexed them to make them complete???
>
>I wonder if they left the old files, and just are making new ones,
>still from PG eBooks?
>
>If so, how would you tell the difference?

If they were using our texts, which I am quite sure they are not, we could tell the difference by seeing whether their text was the same as our text. I do that quite a lot when checking out corrections to our texts, and I can actually reel off various errors in various editions of e-texts around the web by now.

Their page images, and their search index, do not contain the same words as our texts. My "pickle-bottle" example is the least demonstration of that: many of the Penguin Classics they have in Google Print include introductions that we do not have.

And, remember, they never display text: they _only_ display page images.

No, I conclude that Google Print overlaps not at all with PG, except that we both have (different editions of) a large number of classic books.

jim

From juliet.sutherland at verizon.net Fri Dec 31 20:09:25 2004
From: juliet.sutherland at verizon.net (juliet.sutherland@verizon.net)
Date: Fri Dec 31 20:09:41 2004
Subject: [gutvol-d] !@!Googleberg eBooks
Message-ID: <20050101040925.FZKW17379.out008.verizon.net@outgoing.verizon.net>

>
> From: Jim Tinsley
> Date: 2004/12/31 Fri PM 12:19:56 PST
> To: gutvol-d@lists.pglaf.org
> Subject: Re: [gutvol-d] !@!Googleberg eBooks

> 3. Google cut the pages ('cos the scans are just _beautiful_!) and scan
> the pages of the books into images.

As I've previously noted, destructive scanning of modern reprints is easy and usually results in good images and good OCR.

> 4. Google run OCR on the pages. Along with every word, they store its
> position in the image. Like: the word "poorer" is on page 62, in a box
> 1.1 cm wide and 0.4 cm high whose top left corner is 4.2 cm from the top
> of the page and 3.1 cm from the left margin, . . . except I'm sure
> they're not using cm as their unit. Abbyy does this in its internal
> files it saves, so it wouldn't shock me to find that they're using Abbyy
> for OCR.

The folks at The Million Book Project and The Internet Archive are using something called DjVu that does this. It creates bounding boxes around each word in the image, then stores that information along with the text. The OCR associated with DjVu is not ABBYY but another product that does not work quite as well.

A DP volunteer posted the following in our forums:

----------------------------
Here's an interesting experiment...

Go to http://www.google.com/googleblog/. Under "All booked up" (which talks about the Google/Library project), click on the link labelled "the survival of the fittest". This takes you to a beta of Google Print, for the specific book "Darwin, and After Darwin". Under "Search within this book", type "Darwin" and hit "Go". You'll get a new window with 3 images, showing the first few occurrences of "Darwin" in the book, where "Darwin" is highlighted in yellow.

What's interesting is that in the third image, there are two occurrences of the word "Darwin", but the first is not highlighted. Similarly, if you search for "Berkeley", one occurrence in the second image is missing its highlight. This suggests that their searches are based on unproofed OCR results (where the unhighlighted occurrences correspond to uncorrected scannos).

... searching for "1 arwin" (one, space, arwin) and having it highlight "Darwin". (Try it, it's neat!)
---------------
All of the above would appear to confirm Jim's assessment about what Google has done to date.

JulietS
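
For anyone who wants to play with the idea discussed in this thread (OCR text stored together with a bounding box for each word, then used to pick which regions of the page image to highlight), here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of what Google, Abbyy or DjVu actually do: the WordBox record, the function names, the centimetre units and the sample page data are all hypothetical, and only single-word search terms are handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """One OCRed word plus its position on a page image.

    Hypothetical record; units here are cm purely to mirror the thread.
    """
    text: str
    page: int
    left: float
    top: float
    width: float
    height: float


def build_index(boxes):
    """Map each lower-cased OCR token to every box where it was recognised."""
    index = defaultdict(list)
    for box in boxes:
        index[box.text.lower()].append(box)
    return index


def find_highlights(index, term):
    """Return the boxes that would be overlaid in yellow for a one-word term."""
    return list(index.get(term.lower(), []))


# Made-up OCR output for one page. The second "Darwin" has been misread as
# the two tokens "1" and "arwin", like the scanno in the forum post above.
page = [
    WordBox("Darwin", 62, 3.1, 4.2, 1.3, 0.4),
    WordBox("and", 62, 4.6, 4.2, 0.6, 0.4),
    WordBox("1", 62, 3.1, 5.0, 0.2, 0.4),
    WordBox("arwin", 62, 3.4, 5.0, 1.0, 0.4),
]

index = build_index(page)
darwin_hits = find_highlights(index, "Darwin")
arwin_hits = find_highlights(index, "arwin")

print(len(darwin_hits), "box(es) highlighted for 'Darwin'")
print(len(arwin_hits), "box(es) highlighted for 'arwin'")
```

Because the lookup runs against the raw OCR tokens, the misread occurrence is never returned for "Darwin" and so would not be highlighted, which is consistent with the behaviour the forum post describes for unproofed OCR.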