From hart at pglaf.org Thu Jun 1 06:51:06 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jun 1 06:51:07 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <486.1ad8b05.31afe6bf@aol.com>
References: <486.1ad8b05.31afe6bf@aol.com>
Message-ID:

On Thu, 1 Jun 2006 Bowerbird@aol.com wrote:

> karl said:
>> Sure, these ASCII files are also useful for special purposes,
>> but telling us again and again that's the best solution
>> for all books and all times, is highly arguable.
>
> to my mind, the only problem with the ascii files is
> the absence of book typography -- bold headings,
> justified lines, bottom-balanced pages, pagination!,
> properly rendered footnotes, all the looks-nice stuff,
> leading to a display that is so boring it becomes tedium.

Some interesting points there, particularly that last one,
as I have had multiple comments from our readers that they
LIKE not having such boringly justified right margination,
as it helps them better keep track of what line is next.

As for the footnotes, I still agree with those who want an
appendix containing all of them, rather than having breaks
between pages contain them. I like this with paper books,
and even more with eBooks, as it is trivial to switch from
the text to the footnote and back again.

I'll leave the pagination and margination issues to reader
choice as their own personal decisions, along with fonts.

Michael

From hart at pglaf.org Thu Jun 1 07:24:10 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jun 1 07:24:12 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <44a.28419a8.31af625f@aol.com>
References: <44a.28419a8.31af625f@aol.com>
Message-ID:

On Wed, 31 May 2006 Bowerbird@aol.com wrote:

> sebastien said:
>> Most of the time the original typesetting does not matter much.
>
> different people can disagree on that.

And how!

>> I believe you are missing the point.
>> Michael doesn't care as much about collections of pictures
>> as he does about digitalized text.
>
> different people disagree with michael.

Please stop quoting from what I said about illustrations
when bandwidth was a serious issue. . .not that everyone
has broadband these days. . .or wants pictures; however,
the point was made long ago when making an eBook larger,
often many times larger, would stop people from reading.

Don't forget just how much effort we put into making an
illustrated copy of Alice in Wonderland with the best of
several resolution tests for each illustration, just for
the purpose of making it small enough for more readers.

However, this is all pretty much in the past now for the
people on this list, but we should never forget that the
world at large still may have bandwidth issues, and this
new attention to reading eBooks on cell phones may play
a major role in accentuating this issue.

>> As long as scans and/or OCR technologies are so disappointing,
>> we'll have to rely on higher-level human brains with initiatives
>> such as PGDP or ebooksgratuits.com
>
> or methodologies which are better.

>> Of course having easy access to pictures is useful and
>> much better than nothing and serves you well, but
>> that's not what PG and ebooks are about.
>
> different people can disagree on that too.

>> ebooks are much more than photographs of regular analog books.
>
> yes, but photographs of regular analog books
> _might_ qualify as e-books, for _some_ people.
>
> different people can disagree on that too.

>> 3. is the top we are heading for.
>> 2. is just a step on the way.
>
> but #2 might serve the needs of person x just fine.

>> I did that and got
>> 20845628 bytes for 604 pages.
>
> scans are resource hogs. nobody disagrees about that.
>
> one argument is that since these resources are now plentiful,
> it doesn't matter that scans are resource hogs.
>
> different people can disagree on that too.
>
> as long as we can easily move scan-sets to digitized text,
> i don't see much purpose in continuing to debate these two
> as if they were competitors. they're not. they're complementary.
>
> -bowerbird

From hart at pglaf.org Thu Jun 1 09:09:51 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jun 1 09:09:54 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <44a.28419a8.31af625f@aol.com>
References: <44a.28419a8.31af625f@aol.com>
Message-ID:

On Wed, 31 May 2006 Bowerbird@aol.com wrote:

> sebastien said: [snip] see previous message

>> ebooks are much more than photographs of regular analog books.
>
> yes, but photographs of regular analog books
> _might_ qualify as e-books, for _some_ people.
>
> different people can disagree on that too.

>> 3. is the top we are heading for. 2. is just a step on the way.
>
> but #2 might serve the needs of person x just fine.

>> I did that and got
>> 20845628 bytes for 604 pages.
>
> scans are resource hogs. nobody disagrees about that.
>
> one argument is that since these resources are now plentiful,
> it doesn't matter that scans are resource hogs.
>
> different people can disagree on that too.
>
> as long as we can easily move scan-sets to digitized text,
> i don't see much purpose in continuing to debate these two
> as if they were competitors. they're not. they're complementary.
>
> -bowerbird

Several issues worth thinking about here:

File size, bandwidth, storage: important to whom?
Are all scans good for OCR?
Do raw scans qualify as eBooks?

File size, bandwidth, storage: important to whom?

Perhaps the way to think about this is to consider just how
many more or fewer readers we would get if the file sizes
were that much larger or smaller.

In the end, I think we should provide both.

Are all scans good for OCR?

Some operations deliberately do not put their high resolution
scans online for downloading; rather, an automated process
reduces the resolution, so these scans are no longer suitable
for OCRing. Requests for those higher resolution scans seem
to have a very limited success rate.

The odds of being able to create a complete eBook, using those
scans that are usually made available, are perhaps about 1/4
to 1/3, based on the reports you have probably already seen.

Once you go through the effort of scanning missing pages,
rescanning the pages that did not work with your OCR programs,
etc., it often might seem worth the effort simply to scan the
entire book with the higher resolution scans that you can then
post for others to use.

Do raw scans qualify as eBooks?

Obviously those who would prefer to claim a larger number of
eBooks using a smaller amount of effort would prefer to be
able to claim raw scans = eBooks.

As mentioned in the various steps above, scanning, such as it
is, can be nearly completely automated, to the point of cutting
off book bindings, feeding the pages to the scanner in the same
way as copier machines let you feed in stacks of pages, and then
claiming the result of that minimal labor as eBook output in
the catalog.
This is the "quick and dirty approach" and doesn't cost much in terms of time, effort or money and it does provide a reasonably readable output if pages go through smoothly. Apparently they don't always go so smoothly, as many of the books were reported to have missing pages not to mention pages scanned poorly enough to be a problem; the report I recall mentioned some 30% as being acceptable: but these do not take into account some setups intentionally created to be not suitable for OCR. *** I suppose the real question comes down to purposes for making eBooks. Obviously Google, Yahoo, Amazon, and those Library of Congress projects all have different purposes: and it remains to be seen how much of the purposes will be revealed as they each start to move from a single percentage point of their goals to counting a majority of their collection as completed. The various university projects still seem to be a great deal concerned with keep their eBooks out of the hands of the public, as has Google, though the Google philosophy may be in the process of change. Right now it's hard to tell what Google has chosen as their goal; will they really try to do millions of books in the next 54 months after perhaps stats of .1 million in the first 18 months? Will Google change their philosophy per downloading scans, and or downloading their full text searching database? Until Google decides to actually proofread eBooks, I don't think they will want anyone to see what an eBook from Google looks like in full text: simply because it would be too obvious that proofreading, even on a moderate basis, is not part of the plan. However, I _DO_ think that the "second pass" eBook collection, whether done by Google or others, will be good enough, simply due to advanced technology, someone will do it all over again, 10 times better and 10 times faster and 10 times cheaper. However, I don't predict this before 2020. So, there it is in a nutshell, what eBooks will be in the near and distant future, as I see it. Will raw scans ever be the default? No. Why? Because full text will become easier to and people will keep making more and more full text eBooks in contrast to the raw scans. Obviously raw scans will continue to be cheap/easy for another few years, perhaps long enough for the Google, Yahoo, etc., efforts to claim some success in that area, but by the time they could claim any real success we will find that full text is coming along fast enough that the Google efforts would be lost in the shuffle as better full text emerges. My own goal has always been for the public to have their own home eLibraries, just as they have their own home computers. These eLibraries should be an entirely flexible set of products that can be read in virtually any hardware/software combination for the world at large to use. Such libraries are not dependent on particular search engines, or formats or any other particular product. Everyone will be free to keep their own copies of these libraries-- the number of persons owning libraries from now on will rise on the same order as did people owning a book after the invention of Gutenberg's Press. Thanks!!! Give the world eBooks in 2006!!! Michael S. 
Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From hart at pglaf.org Thu Jun 1 09:16:03 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jun 1 09:16:04 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To:
References: <3b0.31c4827.31adca78@aol.com>
Message-ID:

On Wed, 31 May 2006, Karl Eichwalder wrote:

> Michael Hart writes:
>
>> No one seems to think Gallica is really an eBook collection; raw
>> scans seem to be most of what is available, and even those are a
>> set of low-res versions that is not really suitable for OCRing.
>
> OCRing is important, but OCR without the scans nearby is often not
> enough. I think gallica is one of the best e-book collections. Their
> PDF are very useful (you can download complete books as PDFs pretty
> easily and they are readable)! This way I can access the Bulletin
> Monumental.
>
>> I must admit that I am relying on my friends here, as my Français
>> is not really good enough to know if I didn't miss something that
>> would have provided better results on their site.
>
> Sure, you must know the way to create and download PDFs:

Each .pdf file seemed to just hold a .gif file. . .or is there
something else going on there that was missed?

> www.gallica.fr ->
> Recherche ->
> "Mots du titre" - enter the title, for example "Bulletin Monumental"
> In the "Résultat de la recherche" click on "Bulletin Monumental"
> Select the volume you are interested in, for example "1861 (Sér. 2)"
> Now "Télécharger" and "ok" if you are interested in the complete book
>
> Then wait, PDF preparation takes time. Click
> Vous pouvez le télécharger "en cliquant ici." or use the supplied FTP
> address.

And this is supposed to prepare the book as a single .pdf file?

Searchable?

Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From Bowerbird at aol.com Thu Jun 1 10:03:04 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jun 1 10:03:14 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <388.4a90b23.31b077c8@aol.com>

michael said:
> as I have had multiple comments from our readers that they
> LIKE not having such boringly justified right margination,
> as it helps them better keep track of what line is next.

and the beauty of a good viewer-app is it lets each user decide.

> As for the footnotes, I still agree with those who want an
> appendix containing all of them, rather than having breaks
> between pages contain them.

it doesn't have to be "either/or" with e-books, it can be "both".

> I like this with paper books, and even more with eBooks, as it is
> trivial to switch from the text to the footnote and back again.

i think it's even better to have both displayed at the same time;
no switching required.

> I'll leave the pagination and margination issues to reader
> choice as their own personal decisions, along with fonts.

i agree.

-bowerbird

From Bowerbird at aol.com Thu Jun 1 10:10:36 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jun 1 10:10:46 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <249.b88eb40.31b0798c@aol.com>

michael said:
> Please stop quoting from what I said about illustrations
> when bandwidth was a serious issue. . .
> not that everyone has broadband these days. . .or wants pictures;
> however, the point was made long ago when making an eBook larger,
> often many times larger, would stop people from reading.

i too have changed my position on this only recently, when the
penetration of broadband in homes in the u.s. passed over 50%.

but like you, i am still cognizant that not everyone has broadband,
and that those who don't are on the poor side of the digital divide
and thus must be given priority in our thinking. since this has been
a cornerstone of my thinking all along, i'm sure it will continue
to be. i know -- personally -- many people with hand-me-down
machinery, far too many for me to have this vital issue slip from
my radar-screen...

-bowerbird

p.s. i myself just moved to an ibook g4 with o.s.x.
a little over a year ago...

From nwolcott2ster at gmail.com Thu Jun 1 12:42:42 2006
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Thu Jun 1 12:49:08 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
References: <3b0.31c4827.31adca78@aol.com>
Message-ID: <005e01c685b4$5b75eba0$650fa8c0@gw98>

The gallica pdf's are mostly very low resolution. Where there are
diagrams they hardly come out at all, especially mathematical ones
with small letters on them. It may be helpful to have a copy of the
book nearby. OCR'ing pdf's is not for the faint-hearted, as they are
not designed for this purpose. However they are good for layout of
the original publications and for copyright use, as the date of
publication is usually given. Also shows the title page often omitted
from other pdf files.

I believe some gallica are available in text format if you push the
"text" button.

nwolcott2@post.harvard.edu

----- Original Message -----
From: "Karl Eichwalder"
To:
Sent: Wednesday, May 31, 2006 4:03 PM
Subject: Re: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

[snip] see previous message
From hart at pglaf.org Thu Jun 1 13:32:18 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jun 1 13:32:19 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <005e01c685b4$5b75eba0$650fa8c0@gw98>
References: <3b0.31c4827.31adca78@aol.com> <005e01c685b4$5b75eba0$650fa8c0@gw98>
Message-ID:

On Thu, 1 Jun 2006, Norm Wolcott wrote:

> I believe some gallica are available in text format if you push the
> "text" button.

From what my French friends tell me, this is only around 1% of them,
and that sometimes the full text versions disappear after a while.

mh

From traverso at dm.unipi.it Thu Jun 1 13:40:12 2006
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Thu Jun 1 13:39:12 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <005e01c685b4$5b75eba0$650fa8c0@gw98> (nwolcott2ster@gmail.com)
References: <3b0.31c4827.31adca78@aol.com> <005e01c685b4$5b75eba0$650fa8c0@gw98>
Message-ID: <200606012040.k51KeCF16696@pico.dm.unipi.it>

>>>>> "Norm" == Norm Wolcott writes:

    Norm> The gallica pdf's are mostly very low resolution. Where
    Norm> there are diagrams they hardly come out at all, especially
    Norm> mathematical ones with small letters on them. It may be
    Norm> helpful to have a copy of the book nearby. OCR'ing pdf's is
    Norm> not for the faint-hearted, as they are not designed for this
    Norm> purpose. However they are good for layout of the original
    Norm> publications and for copyright use, as the date of
    Norm> publication is usually given. Also shows the title page
    Norm> often omitted from other pdf files.

But why do you download the pdf from gallica? For OCR you should
download the tiff, which is perfectly suited, and does not pose
conversion problems. The gallica pdf is just a wrapper for the tiff
files (compare a gallica pdf with a gallica tiff: the tiff is
integrally contained in the pdf, with some extra wrapper for every
page).

For example FineReader, if you feed it a pdf, passes it through
ghostscript, substantially "printing" the pdf and converting the
resulting bitmap; if you choose the wrong dpi while converting, you
lose resolution. A tiff file, instead, it uses directly (tiff is the
internal image format in FineReader).

The gallica pdf is OK if you want to read (but a multipage tiff viewer
is even better). But not for OCR. You cannot blame gallica if you do
not tick the correct box when you download.

Carlo Traverso

From Bowerbird at aol.com Thu Jun 1 15:11:26 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jun 1 15:11:36 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <3eb.30681fd.31b0c00e@aol.com>

michael said:
> Perhaps the way to think about this is to consider
> just how many more or fewer readers we would get if
> the file sizes were that much larger or smaller.

there are something like 100,000 books available at google.
d.p. digitizes about 2,000 books a year. they can't keep up.

> In the end, I think we should provide both.

in the end, users will turn exclusively to "digital reprints"
-- digital text that mimics the scans so accurately that
there's really no good reason to consult the scans at all.
after 10 or 20 years of nobody downloading the scans,
we'll be able to feel comfortable taking them offline...

> Some operations deliberately do not put their high
> resolution scans online for downloading; rather, an
> automated process reduces the resolution, so these
> scans are no longer suitable for OCRing.

yeah, that's sad. but what are you gonna do about it?

> The odds of being able to create a complete eBook,
> using those scans that are usually made available,
> are perhaps about 1/4 to 1/3, based on the reports you
> have probably already seen.

yeah, that's sad too. but that's a quality-control issue
that i suspect the scanning operations will solve soon...

> Once you go through the effort of scanning missing
> pages, rescanning the pages that did not work with
> your OCR programs, etc., it often might seem worth
> the effort simply to scan the entire book with the
> higher resolution scans that you can then post for
> others to use.

i don't think -- for most books -- that will be the case.
but perhaps that's because i don't see much use for
high-resolution scans. i am _not_ in love with scans.
like i said above, they will eventually be left behind.

the important point _today_, though, is that we have
a shitload of scan-sets, more than we can process now,
and it's silly to ignore them when we _could_ offer them
for people to _read_ now, even if they aren't digitized...

> Do raw scans qualify as eBooks?

does it matter? they are what they are. no more, no less.
and almost everyone sees them for exactly what they are.

> This is the "quick and dirty approach" and doesn't
> cost much in terms of time, effort or money

um, scanning does indeed take time, effort, and money,
at least if you're doing it on a scale of millions of books...

> I suppose the real question comes down to
> purposes for making eBooks.

i'm not sure of that. we make e-books for people to read,
and so their text can be searched and easily repurposed...
scans get us part of the way. digital text gets us the rest...

> The various university projects still seem to be a
> great deal concerned with keeping their eBooks out of
> the hands of the public, as has Google, though the
> Google philosophy may be in the process of change.

the michigan librarian pledged that all public-domain books
scanned from their library will be made available to the public.
i assume he meant the scan-sets. but from them, we will soon
be able to automatically get digital text, so there's no difference.

> Right now it's hard to tell what Google has chosen
> as their goal; will they really try to do millions
> of books in the next 54 months after perhaps stats
> of .1 million in the first 18 months?

they most certainly will.

> Will Google change their philosophy per downloading scans,

if we open up negotiations with them, _maybe_. we can hope.

> and/or downloading their full text searching database?

they'll never make their text-database public, as that's the
competitive edge for which they are paying many millions...
do you really think they're gonna hand it over to microsoft?

> Until Google decides to actually proofread eBooks,

if you mean "ensure that their digital text is highly accurate"
-- which can be completely orthogonal to "proofreading" --
then you can be certain that they will "decide" to take that step.
inaccurate text gives bad search results; google won't tolerate that.

> My own goal has always been for the public to have their own
> home eLibraries, just as they have their own home computers.

that's the goal for a lot of us.
-bowerbird

From Bowerbird at aol.com Thu Jun 1 15:21:57 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jun 1 15:22:03 2006
Subject: [gutvol-d] roger frank, breakout programming star over at
	distributed proofreaders
Message-ID: <441.2a0911d.31b0c285@aol.com>

recently i mentioned 3 programmers over at d.p.

lately, roger frank is doing some excellent work too.
some of it involves the task of creating a project-specific
dictionary, which is a very powerful tool d.p. has mostly
ignored up to now.

another excellent arena roger is working on involves the
flagging of suspicious words. a brief overview is at:
> http://pgdp.rfrank.net/ruby/dp-view.html

text-to-html conversion routines are another thing that
roger has been working on. all of these ideas are _fine_.

-bowerbird

From hart at pglaf.org Fri Jun 2 09:22:37 2006
From: hart at pglaf.org (Michael Hart)
Date: Fri Jun 2 09:22:39 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries (fwd)
Message-ID:

From one of my French friends much more familiar with Gallica.

On Thu, Jun 01, 2006 at 09:16:03AM -0700, Michael Hart wrote:
> Each .pdf file seemed to just hold a .gif file. . .or is there
> something else going on there that was missed?

Gallica just has _pictures_. It is hardly more than a scanning bank.
They can either serve them as PDF or TIFF files. When they did the
work to make a TXT file (which they did for about 1% of their books),
they give that too. (ex: _L'île des pingouins_, both on Gallica as
text and on PG).

> And this is supposed to prepare the book as a single .pdf file?

Yes.

> Searchable?

No, just a bunch of pictures. The document is just like a document
with a picture of a different painting, or a photograph of a
different landscape, on each page.

It's like HTML: HTML can have text (-> searchable) or display a
sequence of pictures (-> non searchable, even if they are pictures
of pages with text). PDF is more confusing because the layout depends
less on the viewer than with HTML (one can define custom
colors/sizes/margins in HTML with CSS and the like; not so with PDF).

Make the experiment with the ZIP file I pointed to. It contains a PDF
file, small & light & searchable. The PDF file produced by Gallica
(take the example given by the other person) is heavy and not
searchable. I did the tests with xpdf but it should be the same with
Acrobat Reader.

To know whether a PDF file is text or a picture, I'm not sure what
to do. Here are hints:
- pdftotext will only work with a text-PDF
- searching, too
- if the letters look dirty, with noise, or the lines are not quite
  horizontal, it is a picture.
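Those hints are easy to automate. What follows is a minimal sketch in
JavaScript (Node), assuming the pdftotext tool mentioned above (it
ships with xpdf, and later with poppler) is installed and on the PATH;
the filename and the 100-character threshold are illustrative guesses,
not part of anyone's actual workflow:

const { execFileSync } = require("child_process");

// "pdftotext file.pdf -" writes any extractable text to stdout.
// A picture-only PDF, like the Gallica ones described above,
// yields essentially nothing.
function looksLikeTextPdf(path) {
  const text = execFileSync("pdftotext", [path, "-"], { encoding: "utf8" });
  // Ignore whitespace and form feeds; the threshold is a rough heuristic.
  return text.replace(/\s/g, "").length > 100;
}

console.log(looksLikeTextPdf("bulletin-monumental.pdf")
  ? "text-PDF (searchable)"
  : "picture-PDF (scans only)");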
From hart at pglaf.org Fri Jun 2 09:52:55 2006
From: hart at pglaf.org (Michael Hart)
Date: Fri Jun 2 09:52:57 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <3eb.30681fd.31b0c00e@aol.com>
References: <3eb.30681fd.31b0c00e@aol.com>
Message-ID:

On Thu, 1 Jun 2006 Bowerbird@aol.com wrote:

> michael said:
>> Perhaps the way to think about this is to consider
>> just how many more or fewer readers we would get if
>> the file sizes were that much larger or smaller.
>
> there are something like 100,000 books available at google.
> d.p. digitizes about 2,000 books a year. they can't keep up.

We work with all possible sources to get eBooks.

>> In the end, I think we should provide both.
>
> in the end, users will turn exclusively to "digital reprints"
> -- digital text that mimics the scans so accurately that
> there's really no good reason to consult the scans at all.

I seem to get plenty of messages from scholarly types who
think source scans will always be in high demand, at the
ivory tower level, at least.

> after 10 or 20 years of nobody downloading the scans,
> we'll be able to feel comfortable taking them offline...

after 10-20 years the actual hardware requirements will
appear so drastically reduced that the load will be nil.

>> Some operations deliberately do not put their high
>> resolution scans online for downloading; rather, an
>> automated process reduces the resolution, so these
>> scans are no longer suitable for OCRing.
>
> yeah, that's sad. but what are you gonna do about it?

Once you provide a better alternative, you force those
who should have done it originally to do it better too.

>> The odds of being able to create a complete eBook,
>> using those scans that are usually made available,
>> are perhaps about 1/4 to 1/3, based on the reports you
>> have probably already seen.
>
> yeah, that's sad too. but that's a quality-control issue
> that i suspect the scanning operations will solve soon...

I was under the impression that much of this low quality
was intentional, so I don't think those will be improving,
at least until someone provides a better mousetrap.

>> Once you go through the effort of scanning missing
>> pages, rescanning the pages that did not work with
>> your OCR programs, etc., it often might seem worth
>> the effort simply to scan the entire book with the
>> higher resolution scans that you can then post for
>> others to use.
>
> i don't think -- for most books -- that will be the case.

All depends on how much effort it is for the particular person
in question. . .if it's a lot of effort to get the materials,
but low effort to do the scanning, you may as well replace the
entire file with your better examples of what should be done.

> but perhaps that's because i don't see much use for
> high-resolution scans. i am _not_ in love with scans.
> like i said above, they will eventually be left behind.

1. Makes for better OCR

2. The scholarly types, as above.

> the important point _today_, though, is that we have
> a load of scan-sets, more than we can process now,
> and it's silly to ignore them when we _could_ offer them
> for people to _read_ now, even if they aren't digitized...

Yes, and we should.

>> Do raw scans qualify as eBooks?
>
> does it matter? they are what they are. no more, no less.
> and almost everyone sees them for exactly what they are.

It matters to the integrity of the eBook world.

>> This is the "quick and dirty approach" and doesn't
>> cost much in terms of time, effort or money
>
> um, scanning does indeed take time, effort, and money,
> at least if you're doing it on a scale of millions of books...

_I_ have no intention of quitting until I can give away a million
books, and I have about the same intention of spending any real
money on it.

It will be interesting to see who can put a million eBooks online
first, and how good they are.

>> I suppose the real question comes down to
>> purposes for making eBooks.
>
> i'm not sure of that. we make e-books for people to read,
> and so their text can be searched and easily repurposed...

This is obviously NOT the goal of many.
> scans get us part of the way. digital text gets us the rest...

Yep. . .scans are just one step, I say it's the easiest.

>> The various university projects still seem to be a
>> great deal concerned with keeping their eBooks out of
>> the hands of the public, as has Google, though the
>> Google philosophy may be in the process of change.
>
> the michigan librarian pledged that all public-domain books
> scanned from their library will be made available to the public.
> i assume he meant the scan-sets. but from them, we will soon
> be able to automatically get digital text, so there's no difference.

I can only hope he meant something more worthwhile to the masses
than what most of the current scan-sets provide and that he will
be able to find some way to keep the ball rolling.

>> Right now it's hard to tell what Google has chosen
>> as their goal; will they really try to do millions
>> of books in the next 54 months after perhaps stats
>> of .1 million in the first 18 months?
>
> they most certainly will.

We'll see, and I am taking bets.

>> Will Google change their philosophy per downloading scans,
>
> if we open up negotiations with them, _maybe_. we can hope.

What is it that St. Augustine was quoted as saying? A bit like:

"Work as though everything depends on you,
Pray as though everything depends on God."

I think we should work as though it all depends on us,
and hope that Google will get somewhere.

>> and/or downloading their full text searching database?
>
> they'll never make their text-database public, as that's the
> competitive edge for which they are paying many millions...

They claim all those millions are spent on scanning, not OCR.

> do you really think they're gonna hand it over to microsoft?

Or to the world at large?

>> Until Google decides to actually proofread eBooks,
>
> if you mean "ensure that their digital text is highly accurate"
> -- which can be completely orthogonal to "proofreading" --
> then you can be certain that they will "decide" to take that step.
> inaccurate text gives bad search results; google won't tolerate that.

Actually, you have it backwards there. . .think about it. . . .

Google's monster speciality is SEARCH ENGINES!!!

They are MUCH more interested in writing a search engine that will
read fuzzy OCR text than in increasing the accuracy of the text.

>> My own goal has always been for the public to have their own
>> home eLibraries, just as they have their own home computers.
>
> that's the goal for a lot of us.

!!!

> -bowerbird

Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From marcello at perathoner.de Fri Jun 2 10:29:01 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Jun 2 10:29:05 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To:
References: <3eb.30681fd.31b0c00e@aol.com>
Message-ID: <4480755D.3090301@perathoner.de>

Michael Hart wrote:

> Google's monster speciality is SEARCH ENGINES!!!
>
> They are MUCH more interested in writing a search engine that will
> read fuzzy OCR text than in increasing the accuracy of the text.

You mean a search engine that finds "I)arwin" when I search for
"Darwin"?

That search engine would have to automagically decide that "I)" looks
quite a bit the same as "D". But that's the same thing an OCR software
already does: match characters against ink stains. If they come up
with some better algorithm to do that, they would be foolish not to
use it directly on the scanned texts.
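Marcello's "I)arwin" example is easy to make concrete. The sketch
below expands a query over a hand-made table of look-alike glyph
confusions; both the table and the approach are illustrative only,
not anything Google is known to use:

// Common OCR confusions: the left form is the intended glyph,
// the right form is what the OCR often produces instead.
const confusions = [
  ["D", "I)"], ["m", "rn"], ["h", "li"], ["l", "1"], ["O", "0"],
];

// Build a pattern matching the term itself plus each single-confusion
// variant, so a search for "Darwin" also hits "I)arwin".
function fuzzyPattern(term) {
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const alternatives = [escape(term)];
  for (const [good, bad] of confusions) {
    if (term.includes(good)) {
      alternatives.push(escape(term.split(good).join(bad)));
    }
  }
  return new RegExp(alternatives.join("|"), "g");
}

const page = "On the Origin of Species, by Charles I)arwin.";
console.log(page.match(fuzzyPattern("Darwin"))); // [ "I)arwin" ]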
Somewhere they have to keep the OCRed text of their books. It would
take many fewer cycles to clean up the text (once) instead of having
the search engine do a fuzzy match every time a user does a search.

--
Marcello Perathoner
webmaster@gutenberg.org

From hart at pglaf.org Fri Jun 2 10:36:35 2006
From: hart at pglaf.org (Michael Hart)
Date: Fri Jun 2 10:36:36 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <4480755D.3090301@perathoner.de>
References: <3eb.30681fd.31b0c00e@aol.com> <4480755D.3090301@perathoner.de>
Message-ID:

On Fri, 2 Jun 2006, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> Google's monster speciality is SEARCH ENGINES!!!
>>
>> They are MUCH more interested in writing a search engine that will
>> read fuzzy OCR text than in increasing the accuracy of the text.
>
> You mean a search engine that finds "I)arwin" when I search for
> "Darwin"?
>
> That search engine would have to automagically decide that "I)" looks
> quite a bit the same as "D".

Someone posted a number of such examples they found a while back,
and it appeared as if that was the general idea.

> But that's the same thing an OCR software already does: match
> characters against ink stains. If they come up with some better
> algorithm to do that, they would be foolish not to use it directly on
> the scanned texts.

I think they will probably wait several iterations of improvement
before it becomes obvious to them that they should improve the text.

> Somewhere they have to keep the OCRed text of their books. It would
> take many fewer cycles to clean up the text (once) instead of having
> the search engine do a fuzzy match every time a user does a search.

They probably have enough computing power not to be worried about
that, but perhaps eventually they will have a large enough collection
for the thought to come.

Michael

From Bowerbird at aol.com Fri Jun 2 11:31:55 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jun 2 11:32:16 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <3fb.31210cc.31b1de1b@aol.com>

michael said:
> We work with all possible sources to get eBooks.

all of your sources, combined, won't be able to keep up.
not until the digitization becomes nearly-automatic.
which, as i've said, is not that far down the line anyway.

> I seem to get plenty of messages from scholarly types
> who think source scans will always be in high demand,
> at the ivory tower level, at least.

i'm not sure they know what they want.
in fact, i'm almost sure that they don't...

> after 10-20 years the actual hardware requirements will
> appear so drastically reduced that the load will be nil.

it won't be the resources required (or not required) that
makes us take down the scans, it will be the lack of demand.
digital reprints will do the same job, better, with fewer resources.

> Once you provide a better alternative, you force those
> who should have done it originally to do it better too.

if you can scare up the $250-million budget, please do.
as i've said elsewhere, that's what we spend in _two_days_
on the war in iraq, so you wouldn't _think_ it's all that hard
to find the same amount for such a culturally important task.
but i don't see anyone except google stepping up to the plate.

> I was under the impression that much of this low quality
> was intentional, so I don't think those will be improving,
> at least until someone provides a better mousetrap.

that's exactly why i asked "what can we do about it?"
> All depends on how much effort it is for the particular person
> in question. . .if it's a lot of effort to get the materials,
> but low effort to do the scanning, you may as well replace the
> entire file with your better examples of what should be done.

i agree. and in the cases where we can't use google's scans,
that's what we'll have to do. let's just hope, though, that that
won't be the case for the bulk of those 10-million unique titles.

> 1. Makes for better OCR

i'm rooting for better o.c.r. i sincerely hope that it happens,
and i suspect the abbyy folks still have tricks up their sleeves.
and, just to remind everybody here, they have _already_ made
a version of their software that's specially-adapted for old books,
a version that nobody here, to my knowledge, has even _tried_,
so y'all will need to do some convincing in order to convince me
that you're really as concerned with the o.c.r. thing as you claim.

but as for me, i'm not counting on the o.c.r. much at all;
i'll take what is currently available in regard to o.c.r. tech.
my aim is to jack up the post-o.c.r. correction routines,
using a wide array of automagic.

> 2. The scholarly types, as above.

let 'em use their scholar dollars to create whatever they need.
i can't be bothered with their esotericism. i just love the books.

> Yes, and we should.

well, i'm glad i finally got _that_ tooth pulled! ;+)

> It matters to the integrity of the eBook world.

my integrity does not turn on semantics.

> _I_ have no intention of quitting
> until I can give away a million books,

that's what i love about you, big boy, your dedication.

> It will be interesting to see who can put a million eBooks
> online first, and how good they are.

i agree.

> Yep. . .scans are just one step, I say it's the easiest.

depends on how many you do.

> I can only hope he meant something more worthwhile
> to the masses than what most of the current scan-sets provide
> and that he will be able to find some way to keep the ball rolling.

i'll be happy to help him out, just like i'm happy to help you out.

> We'll see, and I am taking bets.

pizza. loser buys in the winner's city. you can fly out to
santa monica and spend some time with me sometime when
it's wintry cold there in illinois.

> I think we should work as though it all depends on us,
> and hope that Google will get somewhere.

you're not the best person to do that negotiation anyway.

> They claim all those millions are spent on scanning, not OCR.

it is. and that's why it will take a _negotiation_ to get them to
release the public-domain scans. they won't do it "just because".
but i think there _are_ some things we can offer in negotiation.

one would be the quality-control that we're willing to do for 'em.
although i think they'll realize soon they need to do this themselves,
at the time when they've still got the book right there by the scanner,
it never hurts to have another entity take a look at your work later...

another would be an offer to serve as their "reading room", which
would mean we'd dish the pages to people for reading, so google
could instead concentrate completely on being "the search engine".
(this might mean we'd have to agree not to furnish our own search
capability, but as long as their engine is nicely integrated into our
presentation regime, i don't think that would be a problem at all;
many websites use google as their search-engine even at present.)
and perhaps most importantly, what we could offer is huge help
in the form of friend-of-the-court briefs that would be supportive
of google's scanning project in facing their various legal challenges.
public opinion will be very important when this comes to judgment,
and a good-faith effort like turning their public-domain scans loose
could go a _long_ way in drumming up public support for their work.
on the other hand, a selfish attitude on google's part would make 'em
look bad, and that appearance could be quite devastating to their case.

i assume all of these points are reasonably apparent to google already,
so the "negotiation" wouldn't have to be antagonistic in nature. indeed,
it might be very short and very sweet, and we could find ourselves with
100,000 scan-sets on our machines before we knew it. that possibility
sounds too good to me to pass up without giving it serious consideration.

> Actually, you have it backwards there. . .think about it. . . .
> Google's monster speciality is SEARCH ENGINES!!!
> They are MUCH more interested in writing a search engine that will
> read fuzzy OCR text than in increasing the accuracy of the text.

if you can search fuzzy text, you can correct fuzzy text. that's the point.
if google lets its text remain fuzzy, it will be because they _decided_to_.
and there are a couple reasons why they might well decide to do that,
but i'd rather not take a chance of making them real by discussing them.
still, as i've said, i myself will show the world how to correct fuzzy text,
within 5 years, assuming that abbyy hasn't already solved the problem.

-bowerbird

From ke at gnu.franken.de Fri Jun 2 11:50:26 2006
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri Jun 2 12:38:18 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries (fwd)
In-Reply-To: (Michael Hart's message of "Fri, 2 Jun 2006 09:22:37 -0700 (PDT)")
References:
Message-ID:

Michael Hart writes:

> Gallica just has _pictures_.

Gallica _has_ pictures and that's very nice.

>> Searchable?
>
> No, just a bunch of pictures.

Searching isn't the only thing that matters. Think about children's
books where pictures are very important. The same is valid for books
about architecture, etc.

As I said earlier, we need both sides of the coin--the pictures and
the text, or the text and the pictures (= scans). Not necessarily
within the same file (PDF, Djvu, or .tar.bz2), but catalogued or
archived in a way that it is possible to download the wanted files
easily.

--
http://www.gnu.franken.de/ke/ | ,__o
                              | _-\_<,
                              | (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

From maitri.vr at gmail.com Fri Jun 2 10:19:06 2006
From: maitri.vr at gmail.com (maitri venkat-ramani)
Date: Sat Jun 3 08:04:12 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To:
References: <3eb.30681fd.31b0c00e@aol.com>
Message-ID: <6ebf94650606021019o20137648s60df33e87b8ebd67@mail.gmail.com>

Page scans are not eBooks. They are not universally searchable,
readable and editable, and have full ability to become proprietary,
i.e. owned or copy-protected.
Maitri

On 6/2/06, Michael Hart wrote:

[snip] see previous message
From Bowerbird at aol.com Sat Jun 3 10:17:07 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Jun 3 10:17:14 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <490.1d7f4ba.31b31e13@aol.com>

maitri said:
> Page scans are not eBooks. They are not universally searchable,
> readable and editable, and have full ability to become proprietary,
> i.e. owned or copy-protected.

thanks for joining the thread. where ya been?

-bowerbird

From prosfilaes at gmail.com Sat Jun 3 10:45:28 2006
From: prosfilaes at gmail.com (David Starner)
Date: Sat Jun 3 10:52:36 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <6ebf94650606021019o20137648s60df33e87b8ebd67@mail.gmail.com>
References: <3eb.30681fd.31b0c00e@aol.com>
	<6ebf94650606021019o20137648s60df33e87b8ebd67@mail.gmail.com>
Message-ID: <6d99d1fd0606031045q69c419f6x90af037faf55cadc@mail.gmail.com>

On 6/2/06, maitri venkat-ramani wrote:
> Page scans are not eBooks. They are not universally searchable,
> readable and editable, and have full ability to become proprietary,
> i.e. owned or copy-protected.

Everything can be taken proprietary; in fact, most proprietary eBook
formats are text based. It's far easier to lock up text than it is to
lock up images; if you display images, at the least you can grab them
with a digital camera.

From cannona at fireantproductions.com Sun Jun 4 10:44:57 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Jun 4 11:04:40 2006
Subject: [gutvol-d] any one know java script?
Message-ID: <7.0.1.0.0.20060604124119.01a19e00@fireantproductions.com>

Hi all. I'm looking for someone who knows JavaScript to help out
with creating a script that will select a specific range of check
boxes. I have a script that will check all check boxes, but I need
one that will, if for example I check the first and then the 19th
check box, check boxes 2-18 as well. This is for the PG CD/DVD
request system.

If anyone knows how to do something like this, or if my above
description wasn't clear, please let me know.

Thanks!

Sincerely
Aaron Cannon

--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.)

From tb at baechler.net Mon Jun 5 00:26:45 2006
From: tb at baechler.net (Tony Baechler)
Date: Mon Jun 5 00:24:12 2006
Subject: [gutvol-d] any one know java script?
In-Reply-To: <7.0.1.0.0.20060604124119.01a19e00@fireantproductions.com>
References: <7.0.1.0.0.20060604124119.01a19e00@fireantproductions.com>
Message-ID: <7.0.1.0.2.20060605002200.02c814b0@baechler.net>

Hi,

On the client side, you can get the GreaseMonkey extension for
Firefox that will do what you want. You still need to know
javascript but it has a lot of sample scripts. On the server side,
I don't know. You would probably have to embed it in the html page.
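What Aaron describes is a shift-click-style range select, and it takes
only a few lines of client-side JavaScript. Here is a minimal sketch;
the onclick hook shown in the closing comment is an assumption about
the form's markup, since the actual PG CD/DVD request page may be
wired differently:

var lastChecked = null;

// Collect every checkbox on the page, in document order.
function allBoxes() {
  var inputs = document.getElementsByTagName("input");
  var boxes = [];
  for (var i = 0; i < inputs.length; i++) {
    if (inputs[i].type === "checkbox") boxes.push(inputs[i]);
  }
  return boxes;
}

// Called from each checkbox; checks everything between the
// previously clicked box and this one.
function rangeCheck(box) {
  var boxes = allBoxes();
  if (lastChecked !== null && box.checked) {
    var a = -1, b = -1;
    for (var i = 0; i < boxes.length; i++) {
      if (boxes[i] === lastChecked) a = i;
      if (boxes[i] === box) b = i;
    }
    if (a >= 0 && b >= 0) {
      for (var j = Math.min(a, b); j <= Math.max(a, b); j++) {
        boxes[j].checked = true;
      }
    }
  }
  lastChecked = box;
}

// Assumed hook in the form's HTML (hypothetical markup):
//   <input type="checkbox" name="cd19" onclick="rangeCheck(this)">

Checking the first box just records it; checking the 19th then fills
in 2-18, which matches the behavior Aaron asked for. A GreaseMonkey
user script, per Tony's suggestion, could attach the same handler
without touching the server-side page.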
For the blind and others prevented from reading by a print
disability, there is a site called http://www.bookshare.org/ . This
site has a $50 annual fee and a $25 set up fee but has many books on
Java, programming, javascript, and other technology related items. I
highly recommend it. Unfortunately it is only open to US citizens.
They have all of the O'Reilly books. O'Reilly Media publishes only
computer and tech books. My point is that I've seen a couple
javascript books there and one of them would probably do what you
want. If you already have many scanned books that you've scanned for
yourself and include the title, author and copyright pages, you can
upload them and get a free or reduced subscription cost.

From cannona at fireantproductions.com Mon Jun 5 08:03:05 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Mon Jun 5 08:27:25 2006
Subject: [gutvol-d] any one know java script?
In-Reply-To: <7.0.1.0.2.20060605002200.02c814b0@baechler.net>
References: <7.0.1.0.0.20060604124119.01a19e00@fireantproductions.com>
	<7.0.1.0.2.20060605002200.02c814b0@baechler.net>
Message-ID: <7.0.1.0.0.20060605100004.01dc07a0@fireantproductions.com>

Yeah, figuring it out on my own is the last-case option. :) I'll
probably turn to bookshare if I don't find someone who knows how to
do it. I'll have to renew though, as my subscription has lapsed. No
big deal though, I was meaning to anyway.

Thanks for the info.

Sincerely
Aaron Cannon

At 02:26 AM 6/5/2006, you wrote:

[snip] see previous message

--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.)

From mlockey at magma.ca Mon Jun 5 11:52:54 2006
From: mlockey at magma.ca (Michael Lockey)
Date: Mon Jun 5 11:53:03 2006
Subject: [gutvol-d] DP-Canada progress report
In-Reply-To: <20060528190002.AE8348CB9B@pglaf.org>
Message-ID: <200606051852.k55IqtwP008277@mail3.magma.ca>

We are experiencing some delays while awaiting the DP-EU source.
They're heavily overworked there, but not a problem...
The forums are up at dp-can.cybernetik.ca/phpbb2, so please feel free to log on and make suggestions as the site becomes more accessible. Content providers may start entering through userid dpscans with password of image$. The files go into the directory \inetpub\dp-uploads. Users can create their own subdirectories, e.g. "\vasa", and thereunder put their projects, e.g. "\vasa\FamousParrotsIhaveKnown". (I might note that I'm not ready to put this up with a proper web address until it's a lot more stable than it is: the situation is only temporary, of course.) Michael Lockey (Hoping to surprize and amuse you once we are flying; many thanks to Don Kretz for all his work.) From traverso at dm.unipi.it Mon Jun 5 12:06:22 2006 From: traverso at dm.unipi.it (Carlo Traverso) Date: Mon Jun 5 12:04:47 2006 Subject: [gutvol-d] DP-Canada progress report In-Reply-To: <200606051852.k55IqtwP008277@mail3.magma.ca> (mlockey@magma.ca) References: <200606051852.k55IqtwP008277@mail3.magma.ca> Message-ID: <200606051906.k55J6M820121@pico.dm.unipi.it> >>>>> "Michael" == Michael Lockey writes: Michael> We are experiencing some delays while awaiting the DP-EU Michael> source. They're heavily overworked there, but not a Michael> problem... Michael> The forums are up at dp-can.cybernetik.ca/phpbb2, so Michael> please feel free to log on and make suggestions as the Michael> site becomes more accessible. Michael> Content providers may start entering through userid Michael> dpscans with password of image$. The files go into the Michael> directory \inetpub\dp-uploads. Users can create their own Michael> subdirectories, e.g. "\vasa", and thereunder put their Michael> projects, e.g. "\vasa\FamousParrotsIhaveKnown". An important piece of info missing: ISO-8859-1 or UTF-8? (a second piece of info: are zip files allowed?) Carlo From squadette at gmail.com Mon Jun 5 12:21:38 2006 From: squadette at gmail.com (Alexey Mahotkin) Date: Tue Jun 6 08:03:38 2006 Subject: [gutvol-d] typo in Leonardo Da Vinci Notebooks Message-ID: hello, a rather common OCR-style typo: http://www.gutenberg.org/dirs/etext04/8ldvc10.txt "zvith", should be "with". Thank you for all your work, --alexm From sly at victoria.tc.ca Tue Jun 6 08:22:21 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Jun 6 08:22:25 2006 Subject: [gutvol-d] typo in Leonardo Da Vinci Notebooks In-Reply-To: References: Message-ID: Greetings Alexey... Thanks for mentioning this. However, gutvol-d is a general discussion mailing list; It is entirely possible that this error will never be dealt with if you mention it here. I've forwarded it to our "errata" email address. Also, see the faq at: http://www.gutenberg.org/faq/R-26 Thanks, Andrew On Mon, 5 Jun 2006, Alexey Mahotkin wrote: > hello, > > a rather common OCR-style typo: > > http://www.gutenberg.org/dirs/etext04/8ldvc10.txt > > "zvith", should be "with". > > > Thank you for all your work, > > --alexm > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From mlockey at magma.ca Tue Jun 6 16:39:45 2006 From: mlockey at magma.ca (Michael Lockey) Date: Tue Jun 6 16:39:57 2006 Subject: [gutvol-d] DP-Canada progress report In-Reply-To: <200606051906.k55J6M820121@pico.dm.unipi.it> Message-ID: <200606062339.k56Ndjuv025949@mail3.magma.ca> >An important piece of info missing: ISO-8859-1 or UTF-8? >(a second piece of info: are zip files allowed?) 
>Carlo That's, of course, dependent on DP-EU's help (for which we are NOT pushing: they're busy, we're happy for what we can get!) Zip files are allowed. Cheers, Michael From marcello at perathoner.de Tue Jun 6 16:52:02 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Jun 6 16:52:06 2006 Subject: [gutvol-d] All people with accounts on ibiblio! PG site moving to wiki Message-ID: <44861522.2050905@perathoner.de> I'm moving the static part of the PG site to a wiki. This will allow more people to participate in the site maintenance and improvement. Everybody who currently has shell access on ibiblio should stop editing the html pages and transfer their content to the wiki instead. The wiki will soon replace most of the PG site, except for the online catalog and a few other pages. The wiki can be reached at: http://www.gutenberg.org/wiki/ 1. Currently all new users have to be added by a sysop (me). If you speak wiki and want an account, mail me your username and initial password. You will be added to the "gutenberg" group. 2. The wiki has a 'private' section. All pages starting with "Gutenberg:" are editable by the "gutenberg" group only. This will be the 'official' PG site. 3. The rest of the wiki works just like wikis are supposed to work. You may put it to any use you like that helps "producing and distributing ebooks". -- Marcello Perathoner webmaster@gutenberg.org From ajhaines at shaw.ca Wed Jun 7 09:12:30 2006 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Wed Jun 7 09:13:38 2006 Subject: [gutvol-d] Dagger/sword symbol, Scandinavian countries Message-ID: <000501c68a4d$2e067d90$6401a8c0@ahainesp2400> A couple of questions, just to satisfy my curiosity: - A book I'm working on has a small dagger or sword symbol (point down, handle up) next to some dates. It looks something like the "dagger" symbol in Windows' Arial font, Unicode U2020. Two examples, with an exclamation mark substituting for the symbol, are "Occam ! c. 1349" and "Colet ! 1519". What does this symbol mean? - On several books' copyright page, I've seen the statement "all rights reserved, including that of translation into foreign languages, including the Scandinavian." Why are Scandinavian languages specially noted like this? Al From dixonm at pobox.com Wed Jun 7 09:53:26 2006 From: dixonm at pobox.com (Meredith Dixon) Date: Wed Jun 7 10:04:44 2006 Subject: [gutvol-d] Dagger/sword symbol, Scandinavian countries In-Reply-To: <000501c68a4d$2e067d90$6401a8c0@ahainesp2400> References: <000501c68a4d$2e067d90$6401a8c0@ahainesp2400> Message-ID: <44870486.2070009@pobox.com> Al Haines (shaw) wrote: > - A book I'm working on has a small dagger or sword symbol (point down, > handle up) next to some dates....What does this symbol mean? Date of death. From hyphen at hyphenologist.co.uk Wed Jun 7 11:24:12 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Wed Jun 7 11:24:23 2006 Subject: [gutvol-d] Dagger/sword symbol, Scandinavian countries In-Reply-To: <44870486.2070009@pobox.com> References: <000501c68a4d$2e067d90$6401a8c0@ahainesp2400> <44870486.2070009@pobox.com> Message-ID: On Wed, 07 Jun 2006 12:53:26 -0400, Meredith Dixon wrote: |Al Haines (shaw) wrote: | |> - A book I'm working on has a small dagger or sword symbol (point down, |> handle up) next to some dates....What does this symbol mean? | |Date of death. ROTFLMAO They murder all authors in Scandinavia ;-) -- Dave Fawthrop "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". 
Sig (C) Copyright Public Domain From sly at victoria.tc.ca Wed Jun 7 18:19:00 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Jun 7 18:19:37 2006 Subject: [gutvol-d] Dagger/sword symbol, Scandinavian countries In-Reply-To: <000501c68a4d$2e067d90$6401a8c0@ahainesp2400> References: <000501c68a4d$2e067d90$6401a8c0@ahainesp2400> Message-ID: On Wed, 7 Jun 2006, Al Haines (shaw) wrote: > A couple of questions, just to satisfy my curiosity: > > - A book I'm working on has a small dagger or sword symbol (point down, > handle up) next to some dates. It looks something like the "dagger" symbol > in Windows' Arial font, Unicode U2020. Two examples, with an exclamation > mark substituting for the symbol, are "Occam ! c. 1349" and "Colet ! 1519". > What does this symbol mean? Yes, as already mentioned, the dagger symbol is used to indicate date of death. You might also occasionally see it used as a footnote marker. And yes, U+2020 is the correct code point for this character. > - On several books' copyright page, I've seen the statement "all rights > reserved, including that of translation into foreign languages, including > the Scandinavian." Why are Scandinavian languages specially noted like > this? > I only have guesses here. It could be something to do with the state of international laws at the time. Or perhaps there had been a significant number of unauthorized Scandinavian translations. I do know that many of the Finnish texts in PG are translations of a surprisingly broad spectrum of works from other languages. For example, Beaumarchais' "Marriage of Figaro", Edward Bellamy's "Looking Backward, 2000 to 1887", Dante's "Divine Comedy", Dickens' "David Copperfield", and works by Epictetus, Gustave Flaubert, Goethe, Henrik Ibsen, Moliere, Nietzsche, Shakespeare, Sir Walter Scott, Tolstoy, Harriet Beecher Stowe, and Jules Verne. Andrew From hyphen at hyphenologist.co.uk Thu Jun 8 02:40:15 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Jun 8 02:40:28 2006 Subject: [gutvol-d] URL for a single author in PG catalogue? Message-ID: I am trying to beg a link to the PG catalogue for John Hartley's books from my local Library. He was a Halifax poet and they specialise in paper copies of his works. Is there a way of giving them a link to the PG catalogue which will go straight to John Hartley's books? This must be quite a common problem, could someone consider including it in the PG FAQ? -- Dave Fawthrop "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". Sig (C) Copyright Public Domain From sly at victoria.tc.ca Thu Jun 8 02:48:41 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Jun 8 02:48:46 2006 Subject: [gutvol-d] URL for a single author in PG catalogue? In-Reply-To: References: Message-ID: According to the explanation at http://www.gutenberg.org/howto-link the best form to use is: http://www.gutenberg.org/author/John_Hartley This has been implemented for quite some time now. For an example of it in use, see the Gutenberg link from: http://en.wikipedia.org/wiki/John_Hartley_%28poet%29 Andrew On Thu, 8 Jun 2006, Dave Fawthrop wrote: > I am trying to beg a link to the PG catalogue for John Hartley's books from > my local Library. He was a Halifax poet and they specialise in paper > copies of his works. > > Is there a way of giving them a link to the PG catalogue which will go > straight to John Hartley's books? > > This must be quite a common problem, could someone consider including it in > the PG FAQ?
> > From marcello at perathoner.de Thu Jun 8 10:40:03 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Jun 8 10:40:07 2006 Subject: [gutvol-d] Works of Bertolt Brecht Message-ID: <448860F3.4020506@perathoner.de> Famous German playwright Bertolt Brecht died 14 Aug. 1956. His works will therefore be in the public domain in life+50 countries by January. I'm a big fan of Brecht so I'm willing to put quite a lot of work into an electronic edition of his works. I have an edition of his "Gesammelte Werke" (collected works, 20 volumes) Copyright Suhrkamp Verlag Frankfurt am Main 1967 that I'm willing to sacrifice to the good cause. Is this edition eligible for processing by DP-EU or any other DP? -- Marcello Perathoner webmaster@gutenberg.org From fvandrog at scripps.edu Thu Jun 8 11:20:13 2006 From: fvandrog at scripps.edu (Frank van Drogen) Date: Thu Jun 8 11:20:13 2006 Subject: [gutvol-d] Works of Bertolt Brecht In-Reply-To: <448860F3.4020506@perathoner.de> References: <448860F3.4020506@perathoner.de> Message-ID: <7.0.1.0.0.20060608111844.01d02908@scripps.edu> At 10:40 AM 6/8/2006, you wrote: >Famous German playwright Bertolt Brecht died 14 Aug. 1956. His works >will therefore be in the public domain in life+50 countries by January. > >I'm a big fan of Brecht so I'm willing to put quite a lot of work into >an electronic edition of his works. > >I have an edition of his "Gesammelte Werke" (collected works, 20 >volumes) Copyright Suhrkamp Verlag Frankfurt am Main 1967 that I'm >willing to sacrifice to the good cause. Is this edition eligible for >processing by DP-EU or any other DP? DP-EU would be perfectly happy to process his works; and the Gesammelte Werke should be fine from copyright perspectives. Frank From Bowerbird at aol.com Thu Jun 8 12:06:24 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 8 12:06:41 2006 Subject: [gutvol-d] Dagger/sword symbol, Scandinavian countries Message-ID: <37c.44730fd.31b9cf30@aol.com> andrew said: > Edward Bellamy's "Looking Backward, 2000 to 1887" gosh i enjoyed that book as a youngster... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060608/ecde0406/attachment.html From hart at pglaf.org Thu Jun 8 13:13:03 2006 From: hart at pglaf.org (Michael Hart) Date: Thu Jun 8 13:13:05 2006 Subject: [gutvol-d] !@! 4 Weeks: The Big Push, Well Not So Big This Time Message-ID: As most of you are aware, it is 4 weeks until we complete our 35th year of Project Gutenberg history, and we have about 380 eBooks left to make it to 20,000. This would be about 95 per week. . .we did 82 this week. So it's not such a Big Push as we did to get to 10,000, but a rather smaller push, which is why you haven't heard me say an awful lot about it. . .things are working out to be a much closer match to reaching 20,000 on our 35th anniversary than anyone, myself included, would likely have predicted. However, especially since I am planning on taking a week off right at July 4th, when I am best man at my best friend's wedding, I am trying to get as much as possible done before I leave, as soon as I can after sending out the Newsletter a week before. I am working on the July 5th Newsletter, and will have it out in a fairly complete manner half a day after the previous one goes out, and am hoping that some of our volunteers will have the wherewithal to update it and send it out July 5th with an entirely up to date revision, that hopefully will hit 20,000.
If you have any books that are near completion, but would not be totally through all the various processes, we can put them in the "PrePrints" section now, where perhaps a few people in the next few weeks can help with them. More later, I'm just trying to make it one day at a time right now. . . . Thanks!!! Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg Blog at http://hart.pglaf.org From cannona at fireantproductions.com Fri Jun 9 07:58:11 2006 From: cannona at fireantproductions.com (Aaron Cannon) Date: Fri Jun 9 08:00:03 2006 Subject: [gutvol-d] All people with accounts on ibiblio! PG site moving to wiki In-Reply-To: <44861522.2050905@perathoner.de> References: <44861522.2050905@perathoner.de> Message-ID: <7.0.1.0.0.20060609095424.0190f0a0@fireantproductions.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello Marcello and all. The idea of a wiki is a great one! The only problem I have with it is that it is really slow. How are you planning on dealing with that? Perhaps a daily static dump of the Gutenberg Namespace could be made and visitors could be referred to that. Updates wouldn't show up immediately, but it might not matter in most cases. Again, thanks for setting this up! It will make updating pages much easier. Sincerely Aaron Cannon At 06:52 PM 6/6/2006, you wrote: >I'm moving the static part of the PG site to a wiki. This will allow >more people to participate in the site maintenance and improvement. > >Everybody who currently has shell access on ibiblio should stop editing >the html pages and transfer their content to the wiki instead. The wiki >will soon replace most of the PG site, except for the online catalog and >a few other pages. > >The wiki can be reached at: > > http://www.gutenberg.org/wiki/ > > >1. Currently all new users have to be added by a sysop (me). If you >speak wiki and want an account, mail me your username and initial >password. You will be added to the "gutenberg" group. > >2. The wiki has a 'private' section. All pages starting with >"Gutenberg:" are editable by the "gutenberg" group only. This will be >the 'official' PG site. > >3. The rest of the wiki works just like wikis are supposed to work. You >may put it to any use you like that helps "producing and distributing >ebooks". > > >-- >Marcello Perathoner >webmaster@gutenberg.org > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d - -- E-mail: cannona@fireantproductions.com Skype: cannona MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFEiY0cI7J99hVZuJcRAtx/AJ4mB2XR+BMwybvzg/Nz+CgIMfyHyQCfW9G4 qByJ8dJRiFwkB1shgl6S5Co= =ck3b -----END PGP SIGNATURE----- From hart at pglaf.org Thu Jun 8 09:36:06 2006 From: hart at pglaf.org (Michael Hart) Date: Fri Jun 9 08:27:54 2006 Subject: [gutvol-d] Thank You for all of your work (fwd) Message-ID: As usual at this time of the year, I will be sending you some "Thank You Notes" from our Project Gutenberg readers. Here is one message, in its entirety, that I hope you enjoy! Thanks!!! Give the world eBooks in 2006!!! Michael S.
Hart Founder Project Gutenberg Blog at http://hart.pglaf.org ---------- Forwarded message ---------- Date: Sat, 03 Jun 2006 17:48:23 +0100 From: Amy To: hart@pobox.com Subject: Thank You for all of your work Dear Project Gutenberg, I don't know if many people take the time to thank you, but I just wanted to express my gratitude for the services you provide. Thank you all for your work and dedication. Your work is profoundly appreciated. I am a Peace Corps Volunteer working deep in rural Namibia. I had always admired Project Gutenberg (even donating some time through Distributed Proofreaders) but I have only begun to realize how truly important it is since I have been here. When I was in America it was nice to have access to books whenever I felt like it, without having to go to the trouble of going to a library or bookshop, but here it is vital. Bookshops are rare in Namibia (the nearest one to my village is over 250 kilometres away) and the books they sell are often very, very expensive (especially considering my limited financial resources). Also, the books they sell are often only in Afrikaans or German, neither of which I understand (Peace Corps taught me the tribal language in my village -- KhoeKhoe -- instead). Libraries are even rarer and often badly understocked. I am trying to build up a school library, but we are dependent on donations and it is much more important to get easy-to-read picture books to help the children with their English than to get classics for my own consumption. Project Gutenberg has become my library. I didn't realize the importance of plain vanilla texts until I got here and realized how slow and expensive internet is. The zipped plain vanilla texts often take less than 5 or 10 minutes to download and provide hours of reading enjoyment. Thank you for being an equalizing force in literacy, allowing books to reach those who would otherwise have a hard time getting them. Your work is thoroughly appreciated. I have shared your site with other volunteers who also enjoy it. Thank you so much. I am immensely grateful. Sincerely, Amy Elizabeth Pedersen From Bowerbird at aol.com Fri Jun 9 09:36:40 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jun 9 09:36:56 2006 Subject: [gutvol-d] All people with accounts on ibiblio! PG site moving to wiki Message-ID: <493.292ea68.31bafd98@aol.com> i see that _marcello_ has blocked _me_ as a "troll". ironic, eh? t.e.i. is lagging, but the smear campaign continues. whatever, i've got better things to think about... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060609/9ff28a67/attachment.html From marcello at perathoner.de Fri Jun 9 12:26:41 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Jun 9 12:26:45 2006 Subject: [gutvol-d] All people with accounts on ibiblio! PG site moving to wiki In-Reply-To: <493.292ea68.31bafd98@aol.com> References: <493.292ea68.31bafd98@aol.com> Message-ID: <4489CB71.6080302@perathoner.de> Bowerbird@aol.com wrote: > i see that _marcello_ has blocked _me_ as a "troll". Proactive conflict management. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Jun 9 12:40:02 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Jun 9 12:40:07 2006 Subject: [gutvol-d] All people with accounts on ibiblio!
PG site moving to wiki In-Reply-To: <7.0.1.0.0.20060609095424.0190f0a0@fireantproductions.com> References: <44861522.2050905@perathoner.de> <7.0.1.0.0.20060609095424.0190f0a0@fireantproductions.com> Message-ID: <4489CE92.1080007@perathoner.de> Aaron Cannon wrote: > Hello Marcello and all. The idea of a wiki is a great one! The only > problem I have with it is that it is really slow. How are you > planning on dealing with that? Perhaps a daily static dump of the > Gutenberg Namespace could be made and visitors could be referred to > that. Updates wouldn't show up immediately, but it might not matter > in most cases. ibiblio is slow currently. They say they are moving to new servers at the end of summer. That may alleviate the problem. Currently only about 4% of requests are accessing the pages we are going to put on the wiki. Also, MediaWiki is slower if you are logged in. If you are not logged in, most pages will come out of the page cache. Of course, the page cache will not fill before the public starts using the wiki. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Fri Jun 9 13:15:19 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jun 9 13:15:25 2006 Subject: [gutvol-d] All people with accounts on ibiblio! PG site moving to wiki Message-ID: <370.4a7ddeb.31bb30d7@aol.com> marcello said: > Proactive conflict management. well, i suppose if you really cannot help yourself from becoming entangled, that's understandable. good luck with your wiki. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060609/a09c6a6c/attachment.html From kreeder at mailsnare.net Sun Jun 11 08:36:37 2006 From: kreeder at mailsnare.net (kreeder@mailsnare.net) Date: Sun Jun 11 08:56:23 2006 Subject: [gutvol-d] Auction of rare books at the end of June Message-ID: <20060611153637.tqof2bgqo0w0c0cs@horde.mailsnare.net> This article appeared in the Cincinnati Enquirer last week, thought I'd share it in case others might also find it interesting: Treasure of rare books on the block Historical Society expects $4.5M+ from auction By Margaret A. McGurk Enquirer staff writer In the middle of the 1920s, a newly married Cornelius J. Hauck began to collect books. At first, he and his wife, Harriet Wesche, looked only for botanical subjects: trees, plants and flowers. In the next 40 years, the hobby blossomed into a passionate love affair with everything rare and glorious in the realm of the written word. The scion of a prominent Cincinnati brewery and banking family, Hauck bought books printed on paper, chiseled in stone, carved into jade, wrapped in leather and silver and jewels. Taken all together, those books form a spectacular treasure that stayed locked in a vault for 40 years, unknown to most outside the Cincinnati Historical Society, to which Hauck donated the collection in 1966, a year before his death. This month, its anonymity ends. On June 27 and 28, Christie's auction house in New York will sell the Hauck collection under the title "The History of the Book." Prices are expected to exceed $4.5 million. "Given the fact that we are really a regional history organization, it doesn't make sense for us to keep them," said Douglass W. McDonald, head of the Cincinnati Museum Center, which includes the historical society. Money from the sale will be used to care for the 50,000-plus books in the historical society collection. 
"We think it is important for these works to be put in the hands of people who can bring them more to the public's attention, and . . . where the world's scholars will be made aware of this collection." Francis Wahlgren, head of the books and manuscripts department at Christie's, said Hauck's books remained largely unknown in part because they were bought with the help of an unusually discreet adviser, Emil Offenbacher of New York. Offenbacher, a book dealer, bought many of the pieces on Hauck's behalf at estate sales and auctions in the '30s and '40s but did not reveal Hauck's identity. "His name is not bantered about the room," Wahlgren said. "Many book dealers would let that out, (that) they had a big client in Cincinnati and so forth. That never happened with Offenbacher." As a result, "There are things in there none of us have seen in 40 or more years," he said. "They are museum pieces in the sense that any examples that have survived tend to be in museums. They're unobtainable." The collection includes 900 items, to be sold in 700 lots, including ancient cuneiform tablets, illuminated manuscripts, rare bindings, sacred texts in Arabic and Hebrew and fragments of Greek papyrus, as well as modern miniatures and first editions. Because of the breadth of the collection, Christie's enlisted specialists in jewelry, silver, Asian art, Islamic artifacts, decorative arts and many other areas to assess and catalog the items. "No book collection has ever required such a team effort," he said. At least one local archivist regrets that the museum center did not make a greater effort to find a way to keep the collection intact, and in Cincinnati. Kevin Grace, University of Cincinnati archivist and head of the rare books department for the UC library system, said: "It's disappointing that they didn't try and get a local buyer first. It's a shame it's going to be dispersed and leave the city." The museum's decision to sell came as a surprise, he said. "We didn't find out about it until Christie's had it listed as an upcoming auction. If we'd known before, it might have given us the time to court somebody to endow the purchase. . . . We already have a very fine rare-book collection, and this would add to it. And since it was a Cincinnati-compiled collection, it would be nice to have it remain in the city." Museum spokesman Rodger Pille said some institutions outside Cincinnati that specialize in rare books were contacted informally about the possibility of buying the entire collection, "but at the end of the day, we determined that the auction provided a way for every one of those institutions to supplement their collections." The collection has never been exhibited in full, although a few items were shown during the museum center's "Prized Possessions" show in 2000. In recent weeks, about 40 pieces were displayed in London, Paris and Munich to entice European buyers, Wahlgren said. "In the book world, it's a huge source of excitement," he said. "This means a major new collector will be brought to light. A book from this collection will be known as the 'Hauck copy.'" * * * * Hauck collection The collection's single most valuable item, with an estimated sale price of $600,00 to $800,000, is "The Book of Friendship", an illuminated manuscript created between 1596 and 1633 to memorialize the crowned heads of Europe. A 20th-century Chinese-Tibetan portable "pocket shrine" carries the catalog's lowest price estimate, at $50-$150. A number of items are listed at less than $500. 
The newest book in the collection is a 1955 limited edition of Surrealist poems by Paul Éluard, listed at $1,500 to $2,000. The oldest is a Mesopotamian cuneiform cone dating to 2250 B.C., being sold with a newer but similar item; estimated price for both is $1,000 to $1,500. Francis Wahlgren, head of the books and manuscripts department at Christie's auction house, said his personal favorite among the 900 items in the Hauck collection is a 17th-century Dutch merchant's book on coins that has its own set of scales. Wahlgren described it as the original owner's "Blackberry, his technology at hand." See some of the rare items and get more info about the collection at Cincinnati.com. Keyword: photos [Note: I think I found the appropriate page at this site, but my browser showed it to be empty.] * * * * "The History of the Book" auction will be at Christie's, 20 Rockefeller Plaza in Manhattan, beginning at 10 a.m. June 27 and 28. Viewing days are June 23-26. The 679-page catalogs are $35 and can be ordered online at www.christies.com or by phone at 800-395-6300. From mattsen at arvig.net Sun Jun 11 09:23:09 2006 From: mattsen at arvig.net (Chuck MATTSEN) Date: Sun Jun 11 09:45:19 2006 Subject: [gutvol-d] Auction of rare books at the end of June In-Reply-To: <20060611153637.tqof2bgqo0w0c0cs@horde.mailsnare.net> References: <20060611153637.tqof2bgqo0w0c0cs@horde.mailsnare.net> Message-ID: On Sun, 11 Jun 2006 10:36:37 -0500, wrote: > See some of the rare items and get more info about the collection at > Cincinnati.com. Keyword: photos [Note: I think I found the appropriate > page > at this site, but my browser showed it to be empty.] Seems okay here: http://news.enquirer.com/apps/pbcs.dll/gallery?Avis=AB&Dato=20060606&Kategori=LIFE&Lopenr=606003&Ref=PH&SectionCat=all or http://tinyurl.com/o3df7 -- Chuck Mattsen (Mahnomen, MN) mattsen@arvig.net From Bowerbird at aol.com Sun Jun 11 10:38:58 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Jun 11 10:39:04 2006 Subject: [gutvol-d] utf8 prototyping Message-ID: <111.5fbb62fa.31bdaf32@aol.com> i've begun prototyping utf8 capability in my apps. if anyone would like to help test that, let me know. i remember getting a bunch of flak back when i advocated stripping a few diacritical marks in english texts for the sake of wide compatibility (since english readers understand it fine anyway). here's a chance for those people to show that they weren't just flapping their yaps... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060611/a21aa8d4/attachment.html From Bowerbird at aol.com Mon Jun 12 11:12:15 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jun 12 11:12:33 2006 Subject: [gutvol-d] Fwd: an open letter to the google book scanning people Message-ID: <438.3673597.31bf087f@aol.com> Skipped content of type multipart/alternative-------------- next part -------------- An embedded message was scrubbed...
From: Bowerbird@aol.com Subject: an open letter to the google book scanning people Date: Mon, 12 Jun 2006 14:11:41 EDT Size: 4837 Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060612/bd86f00e/attachment.mht From Bowerbird at aol.com Mon Jun 12 11:41:00 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jun 12 11:41:09 2006 Subject: [gutvol-d] translucent windows Message-ID: <3f9.47be333.31bf0f3c@aol.com> the mac allows windows to have varying background opacity, from totally opaque through translucent to fully transparent... can anyone think of _any_ possible e-book use for transparent windows? because it looks really cool, and even though it's not cross-plat, i'd _love_ to be able to find _some_ reason to implement it... any reason. but alas, i'm coming up empty... ;+) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060612/fda82cc7/attachment.html From realmjit at yahoo.com Mon Jun 12 12:19:44 2006 From: realmjit at yahoo.com (Mjit Raindancer-Stahl) Date: Mon Jun 12 12:26:27 2006 Subject: [gutvol-d] Re: translucent windows In-Reply-To: <20060612190003.E006C8CBC7@pglaf.org> Message-ID: <20060612191944.83601.qmail@web30210.mail.mud.yahoo.com> > > can anyone think of _any_ possible > e-book use for transparent windows? Anatomy books. My favorite anatomy books allow the reader to view the human body system by system, with each system on a clear overlay. M'jit AIM/Yahoo!IM/Ebay: Realmjit realmjit@yahoo.com | answerwitch@gundo.com __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From Bowerbird at aol.com Tue Jun 13 13:05:51 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Jun 13 13:05:59 2006 Subject: [gutvol-d] viewer-program for p.g. e-texts Message-ID: <40f.39b4f8c.31c0749f@aol.com> one of the best viewer-programs around -- for those of you on the p.c. platform -- is "ybook", by simon hayes. and it's free... it even has a hookup with the p.g. catalog, so you can download e-texts from inside it. ybook also lets you wrap a book you've written in a standalone executable .exe, which is nifty... http://members.iinet.net.au/~simonh/spacejock/yBook.html -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060613/9433cd27/attachment.html From Bowerbird at aol.com Tue Jun 13 13:07:48 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Jun 13 13:07:54 2006 Subject: [gutvol-d] dan poynter and e-books Message-ID: <383.42ff849.31c07514@aol.com> on another listserve, dan poynter -- the guru of self-publishing -- says this: > I have been reading (many) books > on my Pocket PC for years. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060613/a6dd7fb1/attachment.html From Bowerbird at aol.com Wed Jun 14 11:50:10 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 14 11:50:24 2006 Subject: [gutvol-d] annotating the movies Message-ID: <4cd.1556433.31c1b462@aol.com> wanna create your own version of mystery science theater 3000, complete with smart-ass comments coming from the audience members pictured in silhouette down front? then get a mac and run "peanut gallery". 
> http://peanutgallery.kaisakura.com/ and you can be such an audience-member, meaning that "it's ok to talk during the film". as the website says: > Interact with each other via Maya-rendered 30fps* > animated characters, inline real-time text chat, and voice. > Peanut Gallery isn't just a video player ? it's a Shared Media Experience! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060614/c59893a5/attachment.html From sly at victoria.tc.ca Thu Jun 15 12:26:55 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Jun 15 12:26:59 2006 Subject: [gutvol-d] PG text in library catalog Message-ID: Well! I've had my first experience of running into a Project Gutenberg citation in a major "traditional" library catalog. This was through Amicus, a collection of records from Canadian libraries. The only unfortunate thing is that it is presented via NetLibrary, which limits and controls access to its texts. NAME(S):*Burroughs, Edgar Rice, 1875-1950 NetLibrary, Inc TITLE(S): The mucker [electronic resource] / Edgar Rice Burroughs PUBLISHER: Champaign, Ill. (P.O. Box 2782, Champaign 61825) : Project Gutenberg, [199u]. E-LOCATIONS: http://www.netLibrary.com/urlapi.asp?action=summary&v=1 &bookid=1085499 *McMaster only NOTES: Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://www.netLibrary.com/urlapi.asp?action=summary&v= 1&bookid=1085499 Electronic reproduction. Boulder, Colo. : NetLibrary, 2001. Available via World Wide Web. Access may be limited to NetLibrary affiliated libraries. NUMBERS: ISBN: 0585016860 (electronic bk.) : ISBN: 0585016860 (electronic bk.) CLASSIFICATION: LC Call no.: PS3503.U687 .M83 SUBJECTS: Electronic books Science fiction From Bowerbird at aol.com Thu Jun 15 12:43:24 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 15 12:43:35 2006 Subject: [gutvol-d] PG text in library catalog Message-ID: <425.36aa8e5.31c3125c@aol.com> andrew said: > The only unfortunate thing is that it is presented via > NetLibrary, which limits and controls access to its texts. to my eyes, this is starting to look like an i.q. test for librarians. ironic, isn't it? for me, a library is a place where books that normally cost money can be borrowed for free. but with this, a library is becoming a place that pays for books that are free. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060615/64b240e2/attachment.html From desrod at gnu-designs.com Thu Jun 15 13:02:31 2006 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Thu Jun 15 13:09:28 2006 Subject: [gutvol-d] PG text in library catalog In-Reply-To: <425.36aa8e5.31c3125c@aol.com> References: <425.36aa8e5.31c3125c@aol.com> Message-ID: > ironic, isn't it? for me, a library is a place where books that > normally cost money can be borrowed for free. but with this, a > library is becoming a place that pays for books that are free. You must be new to this Internet thing ;) Joking aside, lots of common terms that we are used to are being redefined to mean precisely the exact opposite. "Free membership" (just enter your credit card number or user name here), or "Download these titles now" (as soon as we receive them in stock; 4-6 weeks minimum). Oh, and my favorite recent one... "Net Neutrality". The irony with the Doublespeak never ceases to amaze me. David A. 
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From greg at durendal.org Thu Jun 15 13:01:23 2006 From: greg at durendal.org (Greg Weeks) Date: Thu Jun 15 13:30:04 2006 Subject: [gutvol-d] PG text in library catalog In-Reply-To: References: Message-ID: On Thu, 15 Jun 2006, Andrew Sly wrote: > Well! I've had my first experience of running into a > Project Gutenberg citation in a major "traditional" library catalog. > This was through Amicus, a collection of records from Canadian > libraries. The only unfortunate thing is that it is presented > via NetLibrary, which limits and controls access to its texts. I've run into a number of these citations via NetLibrary from the Carnegie library in Pittsburgh. They don't have the complete Gutenberg catalog. -- Greg Weeks http://durendal.org:8080/greg/ From Bowerbird at aol.com Thu Jun 15 15:03:58 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 15 15:04:08 2006 Subject: [gutvol-d] the newest d.p. iteration Message-ID: <4ad.29fbf48.31c3334e@aol.com> the newest iteration over at distributed proofreaders will be _3_ proofing rounds and 2 formatting rounds, with provisions for skipping some of these rounds... with this new change, i think we can safely say that d.p. has wasted a lot of time studying its workflow and _still_ not come to the point of perfecting it... so it's time for me to once again interject my opinion. 1. pre-proofing clean-up programs could handle _many_ of the problems that are found in your o.c.r. (careful image handling could solve most of the rest.) 2. if d.p. used zen markup, it could save itself from the drudgery of those "formatting rounds". conversion from plain-ascii to html is now routine. (pushing out each page to check its formatting is a tremendous waste of bandwidth. but who cares?) 3. no matter how many rounds you add, it will _still_ be the case that some pages will have needed more. (some _pages_, *not* some _books_; it's silly to treat all of the pages in a book as being of equal difficulty.) d.p. needs to go "roundless", treat pages individually. 4. duplicate proofings by independent proofers can be crosschecked to quickly and easily spot any differences, which can then be dispatched with a minimum of effort. this double-key strategy can be used on individual pages. (see the sketch below.) again, these are all things that i've been saying for years. if all the energy that's been spent on "research" would've been used to implement these recommendations instead, it would've been a lot less work, and d.p. would now have a good workflow. as it is, it will probably take a year or so for the problems in the newest system to reveal themselves, and then more work after that to install all of my suggestions.
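to make point 4 concrete, here's a rough sketch of that double-key crosscheck -- two independent keyings of the same page, compared so a human only has to look at the spots where they disagree. this is just an illustration of the strategy in plain javascript, not code from any actual d.p. system:

// compare two independent keyings of the same page, word by word;
// returns only the positions where the two proofers disagree
function crosscheck(keyA, keyB) {
    var a = keyA.split(/\s+/);
    var b = keyB.split(/\s+/);
    var diffs = [];
    var n = Math.max(a.length, b.length);
    for (var i = 0; i < n; i++) {
        if (a[i] !== b[i]) {
            diffs.push({ position: i, first: a[i], second: b[i] });
        }
    }
    return diffs; // an empty list means the two keyings agree
}

// e.g. crosscheck("it was a dark night", "it was a dork night")
// yields [{ position: 3, first: "dark", second: "dork" }]

a real implementation would first align the two texts, since an inserted or dropped word shifts everything after it, but even this naive word-by-word version shows the core of the idea: two independent proofers rarely make the same mistake in the same place, so the differences are where the errors live.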
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060615/3379ee76/attachment.html From marcello at perathoner.de Thu Jun 15 15:17:55 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Jun 15 15:18:03 2006 Subject: [gutvol-d] the newest d.p. iteration In-Reply-To: <4ad.29fbf48.31c3334e@aol.com> References: <4ad.29fbf48.31c3334e@aol.com> Message-ID: <4491DC93.7070300@perathoner.de> Bowerbird@aol.com wrote: > if all the energy that's been spent on "research" would've > been used to implement these recommendations instead, > it would've been a lot less work, and d.p. would now have > a good workflow. Why don't you start your own distributed proofing project with all those nifty processes and tools you have by now devised? Seeing how superior all your ideas are, you should be able to churn out twice as many books as DP with no effort at all. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Jun 15 17:57:11 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 15 17:57:23 2006 Subject: [gutvol-d] the newest d.p. iteration Message-ID: <31b.502718f.31c35be7@aol.com> marcello said: > Why don't you start your own distributed proofing project because i anticipate that with further research on my part, combined with ever-increasing o.c.r. progress from abbyy, we won't even need much human proofreading in the future. besides, i've already prototyped my "continuous proofreading", and i'll be putting that into place when google hands me their full pre-1923 library of page-scans, which i recently requested. and, to be honest with you, i've become more and more bored with these old books, which -- face it -- we focus on _mostly_ because their copyright has expired. i'd say that 4 out of 5 of the e-texts that are being posted these days are _not_ "classics". (not that nonclassics don't deserve to be preserved as well, but...) further, much of the copyright-constrained stuff of recent decades is merely pap the publishing industry thought might make money. much of it, i couldn't give a shit if it makes it to cyberspace or not... what really excites me now is our new possibility to let _everything_ that _anyone_ might write see the light of day and find its audience. we are finally free of the shackles of the past, meaning that we can free ourselves of the corporate mindset that's blinded us up to now. (and the government one before it, and the religious one before it.) we can now travel far past the edge of the envelope; that's exciting. so rather than converting old books from paper to electronic form, i want to help new born-digital works find their place in cyberspace. i want to encourage writers to see our imaginations can now be free, in a way that has _never_ been true before in all of our long history. in other words, the human race now has a truly unique opportunity! don't get me wrong, i am _really_happy_ old works are being rescued. it's just that, for my own self, the relevance of new works is more juicy. -bowerbird p.s. plus, as voice recognition improves over the next 5 years or so, i expect that o.c.r. will take a back seat to voice-transcribed books... -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060615/3bf97e6c/attachment.html From brad at chenla.org Fri Jun 16 06:58:30 2006 From: brad at chenla.org (Brad Collins) Date: Fri Jun 16 07:05:08 2006 Subject: [gutvol-d] the newest d.p. iteration In-Reply-To: <31b.502718f.31c35be7@aol.com> (Bowerbird@aol.com's message of "Thu, 15 Jun 2006 20:57:11 EDT") References: <31b.502718f.31c35be7@aol.com> Message-ID: Bowerbird@aol.com writes: > p.s. plus, as voice recognition improves over the next 5 years or > so, i expect that o.c.r. will take a back seat to voice-transcribed > books... ROFL !!! -- Brad Collins , Banqwao, Thailand From kth at srv.net Fri Jun 16 08:13:02 2006 From: kth at srv.net (Kevin Handy) Date: Fri Jun 16 08:19:05 2006 Subject: [gutvol-d] the newest d.p. iteration In-Reply-To: References: <31b.502718f.31c35be7@aol.com> Message-ID: <4492CA7E.7020408@srv.net> Brad Collins wrote: >Bowerbird@aol.com writes: > > > >>p.s. plus, as voice recognition improves over the next 5 years or >>so, i expect that o.c.r. will take a back seat to voice-transcribed >>books... >> >> > >ROFL !!! > > > Ewe no, he mite bee rite. Wee maybe waisting oar thyme. This voice recognition get off me you stupid cat. Stuff will obviously get off of me now! Have fewer problems than ow! Ow! OW! Get off me! What we are doing now. Yowl! Snarl! Growl! Ow! OW! OW! From kth at srv.net Fri Jun 16 08:13:02 2006 From: kth at srv.net (Kevin Handy) Date: Fri Jun 16 08:19:06 2006 Subject: [gutvol-d] the newest d.p. iteration In-Reply-To: References: <31b.502718f.31c35be7@aol.com> Message-ID: <4492CA7E.7020408@srv.net> Brad Collins wrote: >Bowerbird@aol.com writes: > > > >>p.s. plus, as voice recognition improves over the next 5 years or >>so, i expect that o.c.r. will take a back seat to voice-transcribed >>books... >> >> > >ROFL !!! > > > Ewe no, he mite bee rite. Wee maybe waisting oar thyme. This voice recognition get off me you stupid cat. Stuff will obviously get off of me now! Have fewer problems than ow! Ow! OW! Get off me! What we are doing now. Yowl! Snarl! Growl! Ow! OW! OW! From hart at pglaf.org Fri Jun 16 09:41:52 2006 From: hart at pglaf.org (Michael Hart) Date: Fri Jun 16 09:41:54 2006 Subject: [gutvol-d] PG text in library catalog In-Reply-To: References: Message-ID: On Thu, 15 Jun 2006, Greg Weeks wrote: > On Thu, 15 Jun 2006, Andrew Sly wrote: > >> Well! I've had my first experience of running into a >> Project Gutenberg citation in a major "traditional" library catalog. >> This was through Amicus, a collection of records from Canadian >> libraries. The only unfortunate thing is that it is presented >> via NetLibrary, which limits and controls access to its texts. > > I've ran into a number of these citations via NetLibrary from the Carnegie > library in Pittsburgh. They don't have the complete Gutenberg catalog. NetLibrary has sold perhaps millions of PG eBooks for ~100 to college libraries. . .libraries, I might add, who wouldn't take them when I offered them free of charge. . . . Including my own local Big 10 University of Illinois. ;-) From Bowerbird at aol.com Fri Jun 16 10:38:41 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jun 16 10:38:50 2006 Subject: [gutvol-d] the newest d.p. iteration Message-ID: <504.251e86.31c446a1@aol.com> i said: > p.s.? plus, as voice recognition improves over the next 5 years or so, > i expect that o.c.r. will take a back seat to voice-transcribed books... then brad said: > ROFL !!! then kevin said: > Ewe no, he mite bee rite. 
Wee maybe waisting oar thyme. ya know, i never know what i'm gonna say that's gonna set people off. (but i should have learned by now that it'll probably be a throwaway line in the p.s. rather than the meat of the substance in the body of the message.) but hey, i don't mind the challenge. it helps me develop some logic that i might not have bothered with otherwise. obviously kevin here has never used voice-recognition, because no system would give us the line he gives us. that's _not_ to say voice-recognition is problem-free. there are a lot of problems with it. a ton of problems. but there used to be a ton of problems with o.c.r. too. and people still slogged through it anyway, didn't they? the reason people will slog through the problems anyway with voice recognition is because it will be a lot more _fun_ and _easy_ to just _read_ a book through rather than to sit inside an editing system, and that will make the difference. over the past 5 years, some 35,000 people signed up at d.p. roughly 10% -- about 3,500 -- were around when d.p. reset its subscription base a while back. those were the top 10%, so that wasn't a bad thing, but it does go to show that the o.c.r. route is just a little bit too trying for the average bear. even when you distribute out the work. but hey, if that other 90% could do their part to help out by simply recording a book -- they _did_ once express enough interest in the cause to sign up, remember -- then maybe they could have been retained as helpers... and maybe a whole order of magnitude of more helpers could be _recruited_ if the means of helping were so fun. with libre vox, people are already recording old books. audiobooks, always popular, are getting even more so. podcasting is growing the base of recording experience (and audience) in the user-population at a _huge_ rate. and, for those of us keeping track, there has already been a message posted on the distributed proofing forums from a person who reported using voice-recognition software _within_the_current_d.p._system_. now that's dedication. and as the form-factors of our machines continue to shrink, voice-recognition will become more and more important, and more ingrained, and some people will rely on it entirely. and speaking of libre vox, it's important to keep in mind that a _recording_ retains value even _after_ it has been turned into digital text. heck, many people will prefer the .mp3 to the .txt. there's sure a lot more player-hardware out here for the .mp3. moreover, when a person creates a recording, that product is _seeped_ with their contribution. with their own _voice_, for crying out loud. can it get much more personal than that? to some people, that will surely be more satisfying than the simple credit-line at the top of a project gutenberg e-text... and hey, it might mean a lot more to the _end-user_ as well! i can tell you that i've looked at a lot of texts from jon ingram. lots and lots of them. and they almost always look very nice. but none of them has had the impact of the bit he recorded for libre vox, where his accent had me muttering to myself, "hey, i forgot, that bloody bloke is from _england_, isn't he?" there's something very endearing and personal about a voice. even one with a heavy english accent. ;+) so a person who records a book is giving us _two_ products; one is a route to obtaining digital text via voice-recognition, and the other is a recording of that book in a human voice. it might be that down the line, the second dwarfs the first. 
it's also quite important to remind ourselves that these two products are complementary, not competing with each other. and it's not hard to imagine that the recording will become _especially_ useful when it gets combined with page-scans. a recording of each page playing when the scan is displayed might become the most typical kind of "book" in the future! likewise, it does _not_ have to be either/or between o.c.r. and voice-recognition; we can instead make the two work together. we could do o.c.r. on the scans, and then cross-check the o.c.r. against the voice-recognition results, then concentrate on the differences to intelligently remove errors from _both_ versions. we would expect homonym problems in the voice-recognition, for instance, and scannos in the o.c.r., so could control for that. anytime you combine two different methods for the same result, they can serve as a useful cross-check on each other. bingo. in case you didn't know, some of the people who are obtaining the highest accuracy in their e-texts use text-to-speech to get it. what i'm talking about here can be viewed as the flip-side of that. so, in summary, if you're "rolling on the floor laughing" about voice-recognition and the possibilities it offers to digitizers, you show your lack of vision. there's no other way to say it... of course, your loss is the lurkers' gain, because it gave me a reason to explain. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060616/23249cc6/attachment.html From greg at durendal.org Fri Jun 16 10:30:59 2006 From: greg at durendal.org (Greg Weeks) Date: Fri Jun 16 11:00:11 2006 Subject: [gutvol-d] PG text in library catalog In-Reply-To: References: Message-ID: On Fri, 16 Jun 2006, Michael Hart wrote: > > On Thu, 15 Jun 2006, Greg Weeks wrote: > >> On Thu, 15 Jun 2006, Andrew Sly wrote: >> >>> Well! I've had my first experience of running into a >>> Project Gutenberg citation in a major "traditional" library catalog. >>> This was through Amicus, a collection of records from Canadian >>> libraries. The only unfortunate thing is that it is presented >>> via NetLibrary, which limits and controls access to its texts. >> >> I've ran into a number of these citations via NetLibrary from the Carnegie >> library in Pittsburgh. They don't have the complete Gutenberg catalog. > > NetLibrary has sold perhaps millions of PG eBooks for ~100 to > college libraries. . .libraries, I might add, who wouldn't take > them when I offered them free of charge. . . . > > Including my own local Big 10 University of Illinois. NetLibrary gives credit also, so I can't claim to be unhappy with them. If that's what it takes to get our books into brick and mortar libraries ok. -- Greg Weeks http://durendal.org:8080/greg/ From marcello at perathoner.de Fri Jun 16 11:14:00 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Jun 16 11:14:03 2006 Subject: [gutvol-d] the newest d.p. iteration In-Reply-To: <31b.502718f.31c35be7@aol.com> References: <31b.502718f.31c35be7@aol.com> Message-ID: <4492F4E8.5000406@perathoner.de> Bowerbird@aol.com wrote: > p.s. plus, as voice recognition improves over the next 5 years or so, > i expect that o.c.r. will take a back seat to voice-transcribed books... It is a troot uneeferselly ecknooledged, thet a seengle-a mun in pussesseeun ooff a guud furtoone-a, moost be-a in vunt ooff a veeffe-a. 
Hooefer leettle-a knoon zee feeleengs oor feeoos ooff sooch a mun mey be-a oon hees furst intereeng a neeeghbuoorhuud, thees troot is su vell feexed in zee meends ooff zee soorruoondeeng femeelies, thet he-a is cunseedered zee reeghtffool pruperty ooff sume-a oone-a oor oozeer ooff zeeur dooghters. -- Marcello Perathoner webmaster@gutenberg.org From dixonm at pobox.com Fri Jun 16 12:39:07 2006 From: dixonm at pobox.com (Meredith Dixon) Date: Fri Jun 16 12:39:04 2006 Subject: [gutvol-d] the newest d.p. iteration In-Reply-To: <504.251e86.31c446a1@aol.com> References: <504.251e86.31c446a1@aol.com> Message-ID: <449308DB.7000505@pobox.com> Bowerbird@aol.com wrote: > the reason people will slog through the problems anyway > with voice recognition is because it will be a lot more _fun_ > and _easy_ to just _read_ a book through rather than to sit > inside an editing system, and that will make the difference. Bowerbird, how often do you read books aloud? My grandmother, who grew up in a time when reading to others was an essential skill, taught me to read aloud as a child, and I spent many hours reading aloud to her and to my mother. I actually enjoyed doing it, and I often wish I had more opportunities to do so now. But reading a book aloud is an extremely slow and inefficient way to get text into electronic form. I could *type* a book in faster than I could read it aloud, much less scan it. Reading out loud is tiring, even when you're used to it. If you have only read, say, picture books to your children, you may not realize this. You need to rest your voice after an hour or so. And it takes hours and hours to read an ordinary book out loud, never mind something like The Lord of the Rings (and, yes, I have read the entire The Lord of the Rings out loud. Twice.). Scanning is boring, yes, but it is also fast. And it doesn't make your throat hurt at the end of a session. > > over the past 5 years, some 35,000 people signed up at d.p. > > roughly 10% -- about 3,500 -- were around when d.p. reset > its subscription base a while back. those were the top 10%, > so that wasn't a bad thing, but it does go to show that the > o.c.r. route is just a little bit too trying for the average bear. > even when you distribute out the work. > > but hey, if that other 90% could do their part to help out > by simply recording a book -- they _did_ once express > enough interest in the cause to sign up, remember -- > then maybe they could have been retained as helpers... > > and maybe a whole order of magnitude of more helpers > could be _recruited_ if the means of helping were so fun. > > with libre vox, people are already recording old books. > audiobooks, always popular, are getting even more so. > podcasting is growing the base of recording experience > (and audience) in the user-population at a _huge_ rate. > > and, for those of us keeping track, there has already been > a message posted on the distributed proofing forums from > a person who reported using voice-recognition software > _within_the_current_d.p._system_. now that's dedication. > > and as the form-factors of our machines continue to shrink, > voice-recognition will become more and more important, > and more ingrained, and some people will rely on it entirely. > > and speaking of libre vox, it's important to keep in mind that > a _recording_ retains value even _after_ it has been turned into > digital text. heck, many people will prefer the .mp3 to the .txt. > there's sure a lot more player-hardware out here for the .mp3. 
> > moreover, when a person creates a recording, that product > is _seeped_ with their contribution. with their own _voice_, > for crying out loud. can it get much more personal than that? > to some people, that will surely be more satisfying than the > simple credit-line at the top of a project gutenberg e-text... > > and hey, it might mean a lot more to the _end-user_ as well! > > i can tell you that i've looked at a lot of texts from jon ingram. > lots and lots of them. and they almost always look very nice. > but none of them has had the impact of the bit he recorded > for libre vox, where his accent had me muttering to myself, > "hey, i forgot, that bloody bloke is from _england_, isn't he?" > > there's something very endearing and personal about a voice. > even one with a heavy english accent. ;+) > > so a person who records a book is giving us _two_ products; > one is a route to obtaining digital text via voice-recognition, > and the other is a recording of that book in a human voice. > it might be that down the line, the second dwarfs the first. > > it's also quite important to remind ourselves that these two > products are complementary, not competing with each other. > > and it's not hard to imagine that the recording will become > _especially_ useful when it gets combined with page-scans. > a recording of each page playing when the scan is displayed > might become the most typical kind of "book" in the future! > > likewise, it does _not_ have to be either/or between o.c.r. and > voice-recognition; we can instead make the two work together. > > we could do o.c.r. on the scans, and then cross-check the o.c.r. > against the voice-recognition results, then concentrate on the > differences to intelligently remove errors from _both_ versions. > > we would expect homonym problems in the voice-recognition, > for instance, and scannos in the o.c.r., so could control for that. > > anytime you combine two different methods for the same result, > they can serve as a useful cross-check on each other. bingo. > > in case you didn't know, some of the people who are obtaining > the highest accuracy in their e-texts use text-to-speech to get it. > what i'm talking about here can be viewed as the flip-side of that. > > so, in summary, if you're "rolling on the floor laughing" about > voice-recognition and the possibilities it offers to digitizers, > you show your lack of vision. there's no other way to say it... > > of course, your loss is the lurkers' gain, > because it gave me a reason to explain. > > -bowerbird > > > ------------------------------------------------------------------------ > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d -- Meredith Dixon Check out *Raven Days* For victims and survivors of bullying at school. And for those who want to help. From Bowerbird at aol.com Fri Jun 16 12:55:15 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jun 16 12:55:29 2006 Subject: [gutvol-d] mark pilgrim Message-ID: <234.bddf23a.31c466a3@aol.com> mark pilgrim, an early open-source person, recently switched from apple over to linux... in a blog entry on this, he talks about archiving, and how it gets complicated by file-formats and _especially_ by d.r.m. (which hobbles it by design), and remarks open source does not always equate to open formats (using "gimp" as an example of it). 
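The o.c.r./voice-recognition cross-check proposed earlier in this thread is straightforward to prototype: align the two transcriptions and keep only the spans where they disagree. A minimal sketch in Python, assuming both inputs are plain-text transcriptions of the same page; the function name and the sample sentences are invented for illustration:

    import difflib

    def cross_check(ocr_text, asr_text):
        # Align two independent transcriptions of the same page and report
        # the spans where they disagree, so later effort can concentrate on
        # just those differences.
        ocr_words = ocr_text.split()
        asr_words = asr_text.split()
        matcher = difflib.SequenceMatcher(a=ocr_words, b=asr_words,
                                          autojunk=False)
        return [(" ".join(ocr_words[i1:i2]), " ".join(asr_words[j1:j2]))
                for op, i1, i2, j1, j2 in matcher.get_opcodes()
                if op != "equal"]

    # A scanno ("rn" read as "m") and a homonym ("their" heard as "there")
    # fail in different places, so each version corrects the other:
    print(cross_check("the modem world is their oyster",
                      "the modern world is there oyster"))
    # -> [('modem', 'modern'), ('their', 'there')]

Because the two methods make largely uncorrelated errors, the disagreement list is a far smaller haystack for a human than either raw transcript.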
even before he mentioned project gutenberg, i was thinking i'd share a pointer, so here it is: > http://diveintomark.org/archives/2006/06/16/juggling-oranges -bowerbird p.s. i highly recommend -- for guys -- pilgrim's blog entry before this, "howto make the perfect fruit salad and get laid." -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060616/e705c6d9/attachment.html From joshua at hutchinson.net Fri Jun 16 13:22:26 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Jun 16 13:22:37 2006 Subject: [gutvol-d] the newest d.p. iteration Message-ID: <20060616202226.BDCD510995B@ws6-4.us4.outblaze.com> > If you have only > read, say, picture books to your children, you > may not realize this. That has to be one of the scariest things I've read lately... bowerbird procreating? *shudder* Other than that, I would add one more reason that OCR is more convenient than Voice Recognition ... I can work on a page of typed text at my computer without annoying the crap out of people around me. Can you imagine trying to read a book while sitting at your local Starbucks? Josh From Bowerbird at aol.com Fri Jun 16 13:24:53 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jun 16 13:24:58 2006 Subject: [gutvol-d] the newest d.p. iteration Message-ID: <4f8.2aa03d.31c46d95@aol.com> meredith said: > But reading a book aloud is an extremely slow > and inefficient way to get text into electronic form. that's the point, though. people won't be doing it "to get text into electronic form". that will just be a pleasant side-effect, a tangent from their real aim, which will be "to share a book with the whole world". they'll be doing it because it's _fun_. sure it's work too. but people will do a whole lot of work if they enjoy what they're doing. you see that all the time. > I could *type* a book in faster than I could read it aloud, > much less scan it. but your typed version will be no different than anyone else's. your _recorded_ version, however, will be _uniquely_ yours, perfectly representing the one-of-a-kind snowflake you are, something that your grandchildren, and _their_ grandchildren, can listen to over and over whenever they want to think of you. don't you wish you could hear your grandmother's voice again? > Reading out loud is tiring, even when you're used to it. i agree, it is. but you also get used to it, the more you do it, until you can do it without straining yourself in the slightest. > If you have only read, say, picture books to your children, you may > not realize this. You need to rest your voice after an hour or so. i do performance poetry, so i'm sharply cognizant of voice training. i'm also acutely aware a large audience provides a lot of motivation. > And it takes hours and hours to read an ordinary book out loud, > never mind something like The Lord of the Rings the market for audiobooks has already asserted itself, quite loudly. i imagine that _free_ audiobooks will provide a _very_ large audience. and thus a lot of motivation. > (and, yes, I have read the entire The Lord of the Rings out loud.? Twice.). then i guess you must have had sufficient motivation of some kind. > Twice. ya know, if you would have recorded yourself the first time you did it, you wouldn't have had to read it out loud again the second time... ;+) > Scanning is boring, yes, but it is also fast.? > And it doesn't make your throat hurt at the end of a session. warm water. 
(for your throat, not for your scanner...) ;+) -bowerbird p.s. i see your signature-block promotes a book you've written. perhaps you heard that an author who was podcasting his novel recently got picked up by one of the major publishing houses? so lots of aspiring authors might think of becoming podcasters. voice training -- it's not just for performance poets any more! > If you ask me what I came to do in this world, > I, an artist, I will answer you: "I am here to live out loud.? > -- Emile Zola -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060616/5f375af8/attachment.html From dixonm at pobox.com Fri Jun 16 16:46:42 2006 From: dixonm at pobox.com (Meredith Dixon) Date: Fri Jun 16 16:46:43 2006 Subject: [gutvol-d] the newest d.p. iteration In-Reply-To: <4f8.2aa03d.31c46d95@aol.com> References: <4f8.2aa03d.31c46d95@aol.com> Message-ID: <449342E2.7070403@pobox.com> Bowerbird@aol.com wrote: > > Reading out loud is tiring, even when you're used to it. > > i agree, it is. but you also get used to it, the more you do it, > until you can do it without straining yourself in the slightest. All I can say is that I never managed to get so used to it that my throat didn't hurt when I'd finished reading for the day, and I read aloud almost every day for at least an hour a day for most of my childhood. Certainly there's a learning curve to learning to read aloud, but that's mostly neurological; you need to learn how to read ahead with your eyes, to plan emphasis, while your mouth is reading an earlier sentence, and to jump back smoothly to your place in time to start your mouth off on the next sentence. But mastering that doesn't help any with tiredness, or with your throat's getting sore. > then i guess you must have had sufficient motivation of some kind. Well, yes, I liked the book well enough to spend time reading it to my mother. > ya know, if you would have recorded yourself the first time you did it, > you wouldn't have had to read it out loud again the second time... I don't think my mother would have stood for listening to a tape recorder instead of listening to me, and I shudder to think how many 45-minutes-on-a-side tapes it would have filled. > p.s. i see your signature-block promotes a book you've written. No, it promotes one of my websites. No book is involved. -- Meredith Dixon Check out *Raven Days* For victims and survivors of bullying at school. And for those who want to help. From Bowerbird at aol.com Sat Jun 17 02:07:13 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Jun 17 02:07:18 2006 Subject: [gutvol-d] "all of them?" Message-ID: <319.50cf8c9.31c52041@aol.com> > http://youtube.com/watch?v=veIU0Jwu54w no comment necessary... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060617/c18d5080/attachment.html From nwolcott2ster at gmail.com Sun Jun 18 09:31:41 2006 From: nwolcott2ster at gmail.com (Norm Wolcott) Date: Sun Jun 18 09:33:45 2006 Subject: [gutvol-d] http://www.ebooksgratuits.com/ Message-ID: <000c01c692f4$daed5420$650fa8c0@gw98> The web site http://www.ebooksgratuits.com/ which provided many pd french texts for PG and also had many other formats, has disappeared. Has anyone archived this site? Internet archive gets lost looking for individual books, although the home page is available until December 2005. 
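When browsing the Internet Archive "gets lost" like this, its index can be asked directly for every capture it holds under a domain, not just the front page. A sketch against the Wayback Machine's CDX query service (a facility archive.org added well after this thread; it is shown purely to illustrate the approach, and the field names are as that service defines them):

    import json
    import urllib.request

    def list_snapshots(domain, limit=25):
        # Ask the Wayback Machine's CDX index for captures anywhere under
        # the domain; the first row of the JSON response is a field header.
        url = ("http://web.archive.org/cdx/search/cdx"
               "?url=%s/*&output=json&fl=timestamp,original&limit=%d"
               % (domain, limit))
        with urllib.request.urlopen(url) as response:
            rows = json.load(response)
        return ["http://web.archive.org/web/%s/%s" % (ts, orig)
                for ts, orig in rows[1:]]

    for snapshot in list_snapshots("ebooksgratuits.com"):
        print(snapshot)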
nwolcott2@post.harvard.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060618/9eb747fa/attachment.html From ajhaines at shaw.ca Sun Jun 18 10:39:41 2006 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Sun Jun 18 10:39:45 2006 Subject: [gutvol-d] http://www.ebooksgratuits.com/ References: <000c01c692f4$daed5420$650fa8c0@gw98> Message-ID: <001c01c692fe$2de6d990$6401a8c0@ahainesp2400> Do a Google on "ebooksgratuits", and work through Google's "cached" links. Maybe you can extract material that way. ----- Original Message ----- From: Norm Wolcott To: 'Project Gutenberg Volunteer Discussion' Sent: Sunday, June 18, 2006 9:31 AM Subject: [gutvol-d] http://www.ebooksgratuits.com/ The web site http://www.ebooksgratuits.com/ which provided many pd french texts for PG and also had many other formats, has disappeared. Has anyone archived this site? Internet archive gets lost looking for individual books, although the home page is available until December 2005. nwolcott2@post.harvard.edu ------------------------------------------------------------------------------ _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060618/b1616513/attachment.html From donovan at abs.net Sun Jun 18 11:13:10 2006 From: donovan at abs.net (D Garcia) Date: Sun Jun 18 11:13:25 2006 Subject: [dp-pg] Re: [gutvol-d] http://www.ebooksgratuits.com/ In-Reply-To: <001c01c692fe$2de6d990$6401a8c0@ahainesp2400> References: <000c01c692f4$daed5420$650fa8c0@gw98> <001c01c692fe$2de6d990$6401a8c0@ahainesp2400> Message-ID: <200606181413.10766.donovan@abs.net> On Sunday 18 June 2006 01:39 pm, Al Haines (shaw) wrote: > Do a Google on "ebooksgratuits", and work through Google's "cached" links. > Maybe you can extract material that way. It looks like the domain expired, it eventually resolves to a placholder page which has a bunch of link junk on it. Google doesn't even appear to have the front page cached. From marcello at perathoner.de Sun Jun 18 11:58:38 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Jun 18 11:58:41 2006 Subject: [dp-pg] Re: [gutvol-d] http://www.ebooksgratuits.com/ In-Reply-To: <200606181413.10766.donovan@abs.net> References: <000c01c692f4$daed5420$650fa8c0@gw98> <001c01c692fe$2de6d990$6401a8c0@ahainesp2400> <200606181413.10766.donovan@abs.net> Message-ID: <4495A25E.20300@perathoner.de> D Garcia wrote: > It looks like the domain expired, it eventually resolves to a placholder page > which has a bunch of link junk on it. $ whois ebooksgratuits.com reveals: Domain Name: EBOOKSGRATUITS.COM Created on: 11-Dec-03 Expires on: 11-Dec-06 Last Updated on: 17-May-06 so the domain has NOT expired. -- Marcello Perathoner webmaster@gutenberg.org From fvandrog at scripps.edu Sun Jun 18 11:58:39 2006 From: fvandrog at scripps.edu (Frank van Drogen) Date: Sun Jun 18 11:58:43 2006 Subject: [gutvol-d] http://www.ebooksgratuits.com/ In-Reply-To: <000c01c692f4$daed5420$650fa8c0@gw98> References: <000c01c692f4$daed5420$650fa8c0@gw98> Message-ID: <7.0.1.0.0.20060618115736.01d21348@scripps.edu> You might try to contact Patrick Merlo. (pmerlo at yahoo dot fr). 
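Marcello's whois check below is also easy to script, which helps if you want to watch an ailing domain rather than re-run the lookup by hand. A minimal sketch that shells out to whois(1) and scrapes the expiry line; the regular expression is a loose heuristic, since registrar output formats vary:

    import re
    import subprocess

    def expires_on(domain):
        # Run the same lookup shown in this thread and pull out the
        # "Expires on:" line (field names differ between registrars).
        out = subprocess.run(["whois", domain],
                             capture_output=True, text=True).stdout
        match = re.search(r"Expir\w*[^:\n]*:\s*(.+)", out, re.IGNORECASE)
        return match.group(1).strip() if match else None

    print(expires_on("ebooksgratuits.com"))  # e.g. "11-Dec-06", as quoted below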
Frank From blondeel at clipper.ens.fr Sun Jun 18 12:56:50 2006 From: blondeel at clipper.ens.fr (Sebastien Blondeel) Date: Sun Jun 18 12:56:54 2006 Subject: [gutvol-d] http://www.ebooksgratuits.com/ In-Reply-To: <000c01c692f4$daed5420$650fa8c0@gw98> References: <000c01c692f4$daed5420$650fa8c0@gw98> Message-ID: <20060618195650.GA26987@clipper.ens.fr> The manager of the project tells me, in a nutshell: . ISP (inspirenetworks.com) has disappeared since Thu at noon . major DNS outage? . mirror site being set up on ebooksgratuits.org, should run as of Wed/Thu . mailing list reporting problems in real time at http://fr.groups.yahoo.com/group/ebooksgratuits/ From prosfilaes at gmail.com Sun Jun 18 22:02:40 2006 From: prosfilaes at gmail.com (David Starner) Date: Sun Jun 18 22:02:42 2006 Subject: [gutvol-d] Deleting Clearances Message-ID: <6d99d1fd0606182202n2f6b15bh89c3099a8617956c@mail.gmail.com> Is there any way we could add a way to delete clearances from the clearance page? I have six clearances on my clearance page I'd like to kill; common reasons were I couldn't get suitable scans from my source or I ceeded it to some other volunteer with their own copy. Besides cluttering up my already cluttered clearance page, it makes some projects look more live than they are. From sly at victoria.tc.ca Sun Jun 18 22:13:44 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Jun 18 22:13:47 2006 Subject: [gutvol-d] Deleting Clearances In-Reply-To: <6d99d1fd0606182202n2f6b15bh89c3099a8617956c@mail.gmail.com> References: <6d99d1fd0606182202n2f6b15bh89c3099a8617956c@mail.gmail.com> Message-ID: Yes, I agree this would be nice. In case there is confusion, what is under discussion is the copyright clearance system as used at: http://copy.pglaf.org/ Looking through my list of items with status "Cleared", I see that I have three clearances which were submitted manually to a white-washer; one which was a small volume of poems which was combined with another similar volume for posting to PG; and two which are duplicates of items already in PG that I didn't check closely enough. I see that there is a "Cancelled" status, which could be suitable for some of these. However, there does not seem to be a way to use it. Andrew On Mon, 19 Jun 2006, David Starner wrote: > Is there any way we could add a way to delete clearances from the > clearance page? I have six clearances on my clearance page I'd like to > kill; common reasons were I couldn't get suitable scans from my source > or I ceeded it to some other volunteer with their own copy. Besides > cluttering up my already cluttered clearance page, it makes some > projects look more live than they are. > _______________________________________________ From prosfilaes at gmail.com Sun Jun 18 22:18:34 2006 From: prosfilaes at gmail.com (David Starner) Date: Sun Jun 18 22:18:40 2006 Subject: [gutvol-d] Deleting Clearances In-Reply-To: References: <6d99d1fd0606182202n2f6b15bh89c3099a8617956c@mail.gmail.com> Message-ID: <6d99d1fd0606182218k7923e196q7242750691d3fadf@mail.gmail.com> On 6/19/06, Andrew Sly wrote: > Yes, I agree this would be nice. 
> > In case there is confusion, what is under discussion is the > copyright clearance system as used at: http://copy.pglaf.org/ > > Looking through my list of items with status "Cleared", I see that I > have three clearances which were submitted manually to a white-washer; > one which was a small volume of poems which was combined with another > similar volume for posting to PG; and two which are duplicates of items > already in PG that I didn't check closely enough. I've got a few that were posted to PG--Widger particularly seems to directly post when PPVing. Those would be better transfered to status Submitted, I would think; they need to stick around in some form. If they aren't getting moved to Submitted, how are they linked to the books behind the scenes? From traverso at dm.unipi.it Mon Jun 19 00:30:03 2006 From: traverso at dm.unipi.it (Carlo Traverso) Date: Mon Jun 19 00:26:39 2006 Subject: [gutvol-d] Deleting Clearances In-Reply-To: <6d99d1fd0606182218k7923e196q7242750691d3fadf@mail.gmail.com> (prosfilaes@gmail.com) References: <6d99d1fd0606182202n2f6b15bh89c3099a8617956c@mail.gmail.com> <6d99d1fd0606182218k7923e196q7242750691d3fadf@mail.gmail.com> Message-ID: <200606190730.k5J7U3F29072@pico.dm.unipi.it> It would also be handy to be able to keep alive a clearance for which a book has been posted, and more will follow. This is mainly for multi-volume works that are submitted one at a time. Carlo From fvandrog at scripps.edu Mon Jun 19 07:20:53 2006 From: fvandrog at scripps.edu (Frank van Drogen) Date: Mon Jun 19 07:20:59 2006 Subject: [gutvol-d] Deleting Clearances In-Reply-To: <6d99d1fd0606182218k7923e196q7242750691d3fadf@mail.gmail.co m> References: <6d99d1fd0606182202n2f6b15bh89c3099a8617956c@mail.gmail.com> <6d99d1fd0606182218k7923e196q7242750691d3fadf@mail.gmail.com> Message-ID: <7.0.1.0.0.20060619071954.0365ccb8@scripps.edu> >I've got a few that were posted to PG--Widger particularly seems to >directly post when PPVing. Those would be better transfered to status >Submitted, I would think; they need to stick around in some form. You can change them to the submitted state by 'previewing' any dummy file under the clearance. Frank From gbnewby at pglaf.org Tue Jun 20 09:34:09 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Tue Jun 20 09:34:11 2006 Subject: [gutvol-d] Fwd: Abbey Library of St. Gall, Switzerland: Online 100 manuscripts (fwd) Message-ID: <20060620163409.GA17431@pglaf.org> This might have some materials suitable for harvesting. -- Greg ----- Forwarded Message ---- From: Christoph Fl??eler To: christophe.flueler@unifr.ch Sent: Tuesday, June 20, 2006 10:10:11 AM Subject: Abbey Library of St. Gall, Switzerland: Online 100 manuscripts Abbey Library of St. Gall, Switzerland online - free access: www.cesg.unifr.ch - high resolution digital images: over 40'000 facsimile pages - regularly updated: now 100 complete manuscripts - manuscript descriptions and many search options - accessible in German, French, English and Italian Please recommend it to your colleagues and put a link to CESG on your homepage. ?? CESG - Codices Electronici Sangallenses ----- End forwarded message ----- From sly at victoria.tc.ca Tue Jun 20 21:28:28 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Jun 20 21:28:34 2006 Subject: [gutvol-d] Fwd: Abbey Library of St. 
Gall, Switzerland: Online 100 manuscripts (fwd) In-Reply-To: <20060620163409.GA17431@pglaf.org> References: <20060620163409.GA17431@pglaf.org> Message-ID: Perhaps on the new wiki, we could try adding a page for a list of proposed sites to harvest material from. I know that I seem to keep finding more than I could ever deal with. Andrew On Tue, 20 Jun 2006, Greg Newby wrote: > This might have some materials suitable for harvesting. > -- Greg > From hart at pglaf.org Wed Jun 21 06:41:30 2006 From: hart at pglaf.org (Michael Hart) Date: Wed Jun 21 06:41:34 2006 Subject: [gutvol-d] !@! Just TWO Books Needed for 20,000!!! Message-ID: Anyone got anything coming in the next THREE hours??? ;-) Thanks!!! Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg Blog at http://hart.pglaf.org From jon at noring.name Wed Jun 21 07:34:02 2006 From: jon at noring.name (Jon Noring) Date: Wed Jun 21 07:34:11 2006 Subject: [gutvol-d] 20000 (decimal) represented in other bases -- Impact on PG In-Reply-To: References: Message-ID: <106176712.20060621083402@noring.name> In reply to Michael's post asking for two more books to reach 20000 (yes, he must be itchy to reach another numerical milestone!), I was curious to see what 20000 (decimal) looks like in other numerical bases from 2-20:

 2: 100111000100000 (binary)
 3: 1000102202
 4: 10320200
 5: 1120000
 6: 232332
 7: 112211
 8: 47040 (octal)
 9: 30382
10: 20000 (decimal)
11: 14032
12: B6A8
13: 9146
14: 7408
15: 5DD5
16: 4E20 (hexadecimal)
17: 4138
18: 37D2
19: 2H7C
20: 2A00

Hmmmm, I am disappointed that 20000 in other bases is nothing special. No cool patterns -- no "Da Vinci" code stuff -- just "ordinary" sequences of numbers. There must be something wrong! 20000 (decimal) must be special in some way! It has to be special! Considering that base 10 (decimal) is also arbitrary in our modern world (why not 9 or 11 or ?), then I guess 20000 is nothing special either. That is, the current number of books, 19998, is only two less than 20000. Why aren't we celebrating over 19998? Why does a 0.01% change all of a sudden start a wild party? (Don't we wish -- It's "par-tay time!") But I guess people like to see the odometer on the ole' car turn over from all 9's back to 0's. It's like a rebirth of sorts. So it is human nature, I suppose, to ascribe special meaning to certain patterns in numbers. Therefore, I recommend to PG that if human nature is important, and bigger is better, then PG should report the number of books it has in a lower base. Now, doesn't 232332 (base 6) sound much more impressive? You can report the number of books in the collection as: "# of books in PG's collection: 232332 [*]" And at the bottom of the page: "[*] Note, this is base 6." Jon Noring From marcello at perathoner.de Wed Jun 21 11:15:09 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Jun 21 11:15:14 2006 Subject: [gutvol-d] 20000 (decimal) represented in other bases -- Impact on PG In-Reply-To: <106176712.20060621083402@noring.name> References: <106176712.20060621083402@noring.name> Message-ID: <44998CAD.1090202@perathoner.de> Jon Noring wrote: > Therefore, I recommend to PG that if human nature is important, and > bigger is better, then PG should report the number of books it has in > a lower base. Wasn't it Donald E. Knuth who celebrated his 1,000,000th birthday?
(base 2) We should count our books in base t where t == 2^(12/18) That would make all those computations about our keeping up with Moore's Law much simpler: If we have to add a new digit each new year, we are on schedule. I hope the advertising industry won't wisen up to this: everything would start to cost $10? and you'll have to read the fine print to find out the number base. ?) base the real price -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Wed Jun 21 11:46:18 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Jun 21 11:46:29 2006 Subject: [gutvol-d] 20000 (decimal) represented in other bases -- Impact on PG Message-ID: <20060621184623.9C68F2F93E@ws6-3.us4.outblaze.com> And below is an example of true geek humor. Us geeks are having a good chuckle. Everyone else is scratching their heads, saying, "What the *bleep* are they talking about!?" Josh PS And the Google nerds are busy searching for the meanings of the esoteric phrases... ;) > ----- Original Message ----- > From: "Marcello Perathoner" > To: "Jon Noring" , "Project Gutenberg Volunteer Discussion" > Subject: Re: [gutvol-d] 20000 (decimal) represented in other bases -- Impact on PG > Date: Wed, 21 Jun 2006 20:15:09 +0200 > > > Jon Noring wrote: > > > Therefore, I recommend to PG that if human nature is important, and > > bigger is better, then PG should report the number of books it has in > > a lower base. > > Wasn't it Donald E. Knuth who celebrated his 1,000,000th birthday? > (base 2) > > > We should count our books in base t where t == 2^(12/18) > > That would make all those computations about our keeping up with Moore's > Law much simpler: If we have to add a new digit each new year, we are on > schedule. > > > I hope the advertising industry won't wisen up to this: everything would > start to cost $10? and you'll have to read the fine print to find out > the number base. > > > ?) base the real price > > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Bowerbird at aol.com Wed Jun 21 13:41:51 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 21 13:42:00 2006 Subject: [gutvol-d] scoo bee doo bee bee doo Message-ID: <522.e90e5d.31cb090f@aol.com> jon said: > Now, doesn't 232332 (base 6) sound much more impressive? yes, it does. especially if you're special enough to know that -- in base 6 lingo -- 3 is articulated as "bee", and 2 is "doo" except when it occurs at the start of a "word" in which case it is pronounced "scoo", meaning this number is vocalized as "scoo bee doo bee bee doo". -bowerbird p.s. personally, i like base 7 -- 112211 -- a lot because it's great to have m.c. palindrome on the ones and twos... p.p.s. there are 10 types of people in this world -- those who understand base 2 and those who don't. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060621/9eed935c/attachment.html From Bowerbird at aol.com Wed Jun 21 14:13:58 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 21 14:14:15 2006 Subject: [gutvol-d] scraping the p.g. default .txt files Message-ID: <4ee.125e96c.31cb1096@aol.com> well, i have scraped the p.g. default .txt files -- http://www.gutenberg.org/files/#####/#####.txt -- from #10000 up, and surprisingly _quickly_. text is indeed compact. even when not zipped. a few notes. 
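The scrape described here takes only a few lines to reproduce: walk the e-text numbers, fetch each default .txt, and chunk the results into folders of 1,000. A sketch under stated assumptions -- the URL template is the one quoted above, the folder scheme follows the post, and failures are skipped silently because a fair fraction of numbers have no default plain-text file:

    import os
    import urllib.request

    URL = "http://www.gutenberg.org/files/%d/%d.txt"

    def scrape(start, stop, root="pg-texts"):
        for n in range(start, stop):
            # Chunk into folders of 1,000, e.g. pg-texts/10000/10500.txt
            folder = os.path.join(root, "%05d" % ((n // 1000) * 1000))
            os.makedirs(folder, exist_ok=True)
            try:
                urllib.request.urlretrieve(
                    URL % (n, n), os.path.join(folder, "%05d.txt" % n))
            except Exception:
                pass  # a.w.o.l.: audio, data files, or no plain-text default

    scrape(10000, 18645)

At the roughly 300 megabytes per 1,000 texts measured in the notes that follow, a full run over 20,000 numbers is indeed the ~6 gigabytes the post arrives at.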
circa #18644 is the most recent? really? i thought we were up close to #20000? i take it .aus and .eur are in that count? please relabel human genome files! not really .txt! out of each 1,000 e-texts, about 150 are a.w.o.l. -- different types (e.g., mp3) or something or other, reducing these 8,644 down to some 7,000 or so. plus before i process further, i will toss out the non-english and other pesky variants... let's get it working on the simple ones first. which might take the 7,000 down to 6,000. i'd thought of it initially as a mere pilot-test, but it's looking more like split-half reliability. (i choose 10,000+ only because filenames were generated with a one-line template.) anyway, i chunked those files into folders of 1,000 e-texts each, because that was the size where my old machine starting choking, but o.s.x. seems to handle folders just fine even when the number of files inside is 5,000+... so i might consolidate the folders further, but in the meantime, the results lead to good news. each set of 1,000 e-texts takes roughly 300 megs, so the entire set of 20,000 would be about 6 gigs, meaning they will fit comfortably on today's dvd. and that is without any compression at all, baby. if we figure in compression, and tomorrow's dvd, we're talking an impressive library on a single disc. and a _huge_ library in a case containing 10 discs... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060621/9b5b32ff/attachment.html From Bowerbird at aol.com Wed Jun 21 15:03:19 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 21 15:03:29 2006 Subject: [gutvol-d] chapter-headings linked to the table-of-contents Message-ID: <26d.b2f3709.31cb1c27@aol.com> i see that carlo, one of the smarter p.g. people, has started doing one of the things i suggested some time back -- having each chapter header link to the table of contents. well-done, carlo... > http://www.gutenberg.org/files/18627/18627-h/18627-h.htm#table chalk up another "i told you so". -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060621/3f33ddb5/attachment.html From Bowerbird at aol.com Wed Jun 21 23:20:38 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 21 23:20:43 2006 Subject: [gutvol-d] scraping the p.g. default .txt files Message-ID: <270.b3ef12d.31cb90b6@aol.com> i said: > well, i have scraped the p.g. default .txt files -- > http://www.gutenberg.org/files/#####/#####.txt > -- from #10000 up, and surprisingly _quickly_. of course, the idea is to rework these e-texts into z.m.l. format. although i won't be able to get started on that for a few weeks, and it will probably take me about 6 months to finish them all, i did have a chance to do just a wee bit of experimentation... so, to see a review of the transformation of one such p.g. e-text: > http://snowy.arsc.alaska.edu/bowerbird/misc/screen1.html -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060622/59203d61/attachment.html From nwolcott2ster at gmail.com Thu Jun 22 09:05:58 2006 From: nwolcott2ster at gmail.com (Norm Wolcott) Date: Thu Jun 22 09:13:48 2006 Subject: [gutvol-d] David's in progress list Message-ID: <001001c69615$e72b6be0$650fa8c0@gw98> I have been unable to download david's in progress list. 
It stops at Abbott and just sits there and never finisihes. Is it posted anywhere else? nwolcott2@post.harvard.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060622/66638951/attachment.html From ajhaines at shaw.ca Thu Jun 22 10:23:58 2006 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Thu Jun 22 10:25:29 2006 Subject: [gutvol-d] David's in progress list References: <001001c69615$e72b6be0$650fa8c0@gw98> Message-ID: <002601c69620$a56f9130$6401a8c0@ahainesp2400> I just tried saving it to my Windows desktop, and checking that it was complete - no problem. Norm - if you want, and your e-mail has no problem with large zip files, I can forward it as a zip file. About 1.25M. Al ----- Original Message ----- From: Norm Wolcott To: 'Project Gutenberg Volunteer Discussion' Cc: harvard.edu@pglaf.org ; N Wolcott Sent: Thursday, June 22, 2006 9:05 AM Subject: [gutvol-d] David's in progress list I have been unable to download david's in progress list. It stops at Abbott and just sits there and never finisihes. Is it posted anywhere else? nwolcott2@post.harvard.edu ------------------------------------------------------------------------------ _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060622/6fbb9b33/attachment.html From greg at durendal.org Thu Jun 22 10:14:14 2006 From: greg at durendal.org (Greg Weeks) Date: Thu Jun 22 10:30:03 2006 Subject: [gutvol-d] David's in progress list In-Reply-To: <001001c69615$e72b6be0$650fa8c0@gw98> References: <001001c69615$e72b6be0$650fa8c0@gw98> Message-ID: On Thu, 22 Jun 2006, Norm Wolcott wrote: > I have been unable to download david's in progress list. It stops at > Abbott and just sits there and never finisihes. Is it posted anywhere > else? I downloaded it ok. I don't know of any backup copies. Let me know and I'll mail you a copy. -- Greg Weeks http://durendal.org:8080/greg/ From nwolcott2ster at gmail.com Thu Jun 22 11:36:24 2006 From: nwolcott2ster at gmail.com (Norm Wolcott) Date: Thu Jun 22 11:36:50 2006 Subject: [gutvol-d] David's in progress list References: <001001c69615$e72b6be0$650fa8c0@gw98> <002601c69620$a56f9130$6401a8c0@ahainesp2400> Message-ID: <002b01c6962a$cf4c1320$650fa8c0@gw98> Thanks--I got it, it was just very slow, itis a 5 meg file I don't think the browser liked it too much. nwolcott2@post.harvard.edu ----- Original Message ----- From: Al Haines (shaw) To: Project Gutenberg Volunteer Discussion Cc: N Wolcott ; harvard.edu@pglaf.org Sent: Thursday, June 22, 2006 1:23 PM Subject: Re: [gutvol-d] David's in progress list I just tried saving it to my Windows desktop, and checking that it was complete - no problem. Norm - if you want, and your e-mail has no problem with large zip files, I can forward it as a zip file. About 1.25M. Al ----- Original Message ----- From: Norm Wolcott To: 'Project Gutenberg Volunteer Discussion' Cc: harvard.edu@pglaf.org ; N Wolcott Sent: Thursday, June 22, 2006 9:05 AM Subject: [gutvol-d] David's in progress list I have been unable to download david's in progress list. It stops at Abbott and just sits there and never finisihes. Is it posted anywhere else? 
nwolcott2@post.harvard.edu ---------------------------------------------------------------------------- _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d ------------------------------------------------------------------------------ _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060622/e624678a/attachment-0001.html From sly at victoria.tc.ca Thu Jun 22 13:27:17 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Jun 22 13:27:22 2006 Subject: [gutvol-d] David's in progress list In-Reply-To: <001001c69615$e72b6be0$650fa8c0@gw98> References: <001001c69615$e72b6be0$650fa8c0@gw98> Message-ID: A while ago I found a page where someone had taken David's in progress list and broken it up into bite-sized pieces. They kept it periodically updated too. I thought I had saved the url, but now I can't find it. Andrew On Thu, 22 Jun 2006, Norm Wolcott wrote: > I have been unable to download david's in progress list. It stops at Abbott and just sits there and never finisihes. Is it posted anywhere else? From malcolm.farmer at gmail.com Thu Jun 22 15:44:45 2006 From: malcolm.farmer at gmail.com (Malcolm Farmer) Date: Thu Jun 22 15:51:55 2006 Subject: [gutvol-d] David's in progress list In-Reply-To: References: <001001c69615$e72b6be0$650fa8c0@gw98> Message-ID: <8baaac1d0606221544w538c851bkec2810cb9b621edf@mail.gmail.com> On 6/22/06, Andrew Sly wrote: > > > A while ago I found a page where someone had taken David's > in progress list and broken it up into bite-sized pieces. > They kept it periodically updated too. I thought I had > saved the url, but now I can't find it. Here's its index page, pointing to the individual lists by letter.: http://www.zuhause.org/dp/GutIP/ Done by Bruce Albrecht, who is user bgalbrecht at DP. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060622/f10c42be/attachment.html From Bowerbird at aol.com Fri Jun 23 13:06:54 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jun 23 13:07:01 2006 Subject: [gutvol-d] the end of the line Message-ID: <360.64af89c.31cda3de@aol.com> as i watch all these p.g. e-texts float across my screen, i just can't help but have some thoughts recur to me... in the old days, when -- for some very good reasons -- a p.g. e-text was considered to be an _amalgamation_ of different versions of a book (even when it really was not, a fiction advised by p.g. legal counsel at that early time), that gave a good reason to remove end-line hyphenation and reflow text (without hyphenation) to p.g. margination. after all, hyphenation mostly causes problems in e-books. in the current era, however, where most p.g. e-texts are pegged to a specific version of a book (and where, for the most part, the scans are now retained to cement this direct correspondence), it no longer makes sense to discard the line-breaks, or even the end-line hyphenation, to be frank. yes, end-line hyphenation should be _marked_ in some way, so it can be automatically eliminated, but the _default_action_ should be to retain it. 
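Once line breaks are retained, eliminating end-line hyphenation automatically is nearly a one-regex job, which is why marking it costs so little. A naive sketch: it drops every end-of-line hyphen, so genuine compounds that happen to break at their hyphen (the "wild-looking" case that comes up later in this thread) would need a dictionary check on top:

    import re

    def reflow(page):
        # Join a word split across a line end, dropping the hyphen...
        text = re.sub(r"-\n", "", page)
        # ...and turn the remaining line breaks into ordinary spaces.
        return re.sub(r"\s*\n\s*", " ", text).strip()

    print(reflow("I re-\nmembered what the\nconductor had said."))
    # -> I remembered what the conductor had said.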
it would defeat the purpose of saving the line-breaks if you didn't also retain end-line hyphenation, because the goal here would be to duplicate the print version. (don't bother arguing that there would never be such a desire; maybe you'd never have any need for it, but _someone_ might. i can think of half-a-dozen such reasons -- want to hear them?) if you want to see what the future of electronic-books looks like, see the "digital reprints" that jose menendez has been producing. > http://www.ibiblio.org/ebooks/Mabie/ > http://www.ibiblio.org/ebooks/Cather/ > http://www.ibiblio.org/ebooks/Einstein/ the deep links to the actual .pdf "digital reprints" are these: > http://www.ibiblio.org/ebooks/Mabie/Books_Culture.pdf > http://www.ibiblio.org/ebooks/Cather/Antonia/Antonia.pdf > http://www.ibiblio.org/ebooks/Einstein/Einstein_Relativity.pdf aside from the unfortunate fact that jose is using the .pdf format (a format which makes it far to difficult to repurpose the content), these "digital reprints" carve out an awesome model for e-books. they replicate the original paper-book to a high degree of fidelity, and do so using a small percentage of the disk-space of the scans. yet because it is an e-book, it gives all the benefits that they give. (at least it _would_, if it wasn't a .pdf. but that part can be fixed.) and the secret of these "digital reprints" is extremely simple, folks; all that jose has done is merely to retain the original line-breaks... so, once again, i recommend and request that you start retaining this valuable information, instead of intentionally tossing it away. (it is very ironic, to me, that distributed proofreaders _retains_ the line-breaks during their proofing -- because it makes that process so much easier -- but then they discard the line-breaks! hey, there might be some end-users out there who need 'em too!) honestly, folks, when i look at your p.g. e-texts, what i see is that they're gonna be thrown on the trashpile one day -- maybe soon. in a world that is awash in scans, and where o.c.r. is a commodity, it'll be trivial to convert those scans to text. so if someone needs to have the ability to duplicate the print version -- i.e., they _need_ to have the line-break information you are routinely discarding -- they'll simply o.c.r. the scans again. they will be required to do that, because your e-texts simply won't do the job that they want done... that's not to say that your p.g. e-texts will be _completely_ worthless. as an independent digitization, they'll go a long way toward helping to move any new o.c.r. effort up to an absurdly high level of accuracy. but since the absurd level of accuracy can be applied to either e-text, and since the new effort will have retained the line-break information, that will be the one that's retained. the p.g. e-text will be thrown away. and it would break my heart to see all your hard work just thrown away. on the other hand, if y'all started retaining that line-break information, then it'd be _your_ version which would be kept (because of its primacy), and the new o.c.r. effort would just be seen as a tool to increase accuracy. if project gutenberg wants to remain as the premiere library in cyberspace, you're going to have to fix this glitch, and do it quickly. mark my words... -bowerbird p.s. at some of you aren't good at reading between the lines, i'll tell you that i intend to mount such a massive o.c.r. effort, so the question about which version, p.g. or not, receives the higher accuracy is a very real one. 
i don't want to challenge the p.g. library, _unless_ you've made it deficient. i'm trying to help you by giving you this advice before it becomes crucial... -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060623/181693a8/attachment.html From hart at pglaf.org Fri Jun 23 14:38:48 2006 From: hart at pglaf.org (Michael Hart) Date: Fri Jun 23 14:38:49 2006 Subject: [gutvol-d] the end of the line In-Reply-To: <360.64af89c.31cda3de@aol.com> References: <360.64af89c.31cda3de@aol.com> Message-ID: Bowerbird's lengthy essay is just one more example of how publishers, editors, etc., put their own needs ahead of those of the readers. While there might be some value in keeping references to arcane modes of pagination and margination for those who actually have reasons for opening books other than simply to read their contents, a certain respect for the reader, ostensibly for whom all this is being done by the publishers and editors, should clearly indicate that there is no longer any need for a slavish mentality to conserve the paper pages by introducing end of line hyphenation, or to create the appearance that there were actually the same number of characters on every line, when it is obvious to anyone who cares to look that there are not. And, as Mr. Bowerbird points out, end of line hyphenation can be a serious pain in the neck, depending on what programs you use to read, search, edit, etc. So, while I obviously agree that there are two camps in his model of a world of eBooks, I disagree as to which is primary. The reader is primary. Any effort to preserve items of interest only to publishers, editors, etc., should be invisible to the naked eye, with the option to bring them into view when desired; the defaults should not make millions of readers strip such items out for the sake of the few who prefer to see them. Thanks!!! Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg Blog at http://hart.pglaf.org From Bowerbird at aol.com Fri Jun 23 16:36:59 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jun 23 16:37:04 2006 Subject: [gutvol-d] the end of the line Message-ID: <516.1689c6b.31cdd51b@aol.com> i said: > (a format which makes it far to difficult to repurpose the content), haha. "far to difficult". i made a boo-boo. > p.s. at some of you aren't good at reading between the lines "at some of you"...? wow. two mistakes (at least) in one post. good thing it's friday... ;+) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060623/0d1bbbd9/attachment.html From jeroen.mailinglist at bohol.ph Fri Jun 23 16:43:12 2006 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Fri Jun 23 16:40:33 2006 Subject: [gutvol-d] the end of the line In-Reply-To: References: <360.64af89c.31cda3de@aol.com> Message-ID: <449C7C90.8040208@bohol.ph> Although I agree with Michael that there is no need to preserve things as linebreaks in most texts -- if you really need to go to that level of detail, there is always the original or the scans to fall back upon -- I want to make a case for preserving page numbers, if not at least as recognisable anchors in text, and only for those books being referenced to regularly by other books.
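Page numbers as "recognisable anchors", as suggested here, have a cheap HTML realization: emit an identified element at each page turn, so that a paper-style citation ("p. 23") resolves to a URL fragment. A sketch assuming a [pg 23]-style marker convention in the master text; the marker syntax and class name are inventions for this example, though visible page-number spans of this general kind do appear in PG HTML editions:

    import re

    def anchor_pages(text):
        # Replace each page marker with a visible, linkable page anchor.
        return re.sub(r"\[pg (\d+)\]",
                      r'<span class="pagenum" id="page\1">[\1]</span>',
                      text)

    html = anchor_pages("webbed to the first knuckle, [pg 23] like a duck's foot")
    print(html)
    # A citation elsewhere can then point at some-text.html#page23.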
This excludes most fiction, but is particularly important for scientific works, which have constructed a kind of paper web with cross references mainly based on page numbers. In the long term, such references of course should give way to proper references to the actual paragraph or sentence being referenced, but as a practical ad-interim solution, staying with page numbers will increase the number of texts we can digitize with our limited means. This leads me to one place where further work could be done on the PG collection: turning it from a collection of static texts into an enriched web of knowledge. I've seen a lot of websites grabbing all of PG, and republishing it in a slightly modified form. I would, however, like to see the collection be incorporated in a kind of wiki-like system, where people can add -- without tampering with the static source texts -- annotations, add tagging and create live cross references: both for own use, smaller dissemination in a group or publicly. I've added a large number of texts related to the Philippines to PG, and many of these texts interact. Some criticise each other, others provide opposing views, and so forth. It would be great to build a system that makes that easy to follow for everybody, such that people can immediately see, when reading a text, where it has been cited or referenced in other works. It would be great also to provide study introductions or synopses, to give users a grasp of the material, and enable them to find what they really need within reasonable time. Search engines are a great tool, but only to a certain extent. Jeroen. From sly at victoria.tc.ca Fri Jun 23 17:16:44 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Fri Jun 23 17:16:46 2006 Subject: [gutvol-d] the end of the line In-Reply-To: <449C7C90.8040208@bohol.ph> References: <360.64af89c.31cda3de@aol.com> <449C7C90.8040208@bohol.ph> Message-ID: There are places such as wikisource.org, where you could add the texts and start providing links such as you mention here immediately. Andrew On Sat, 24 Jun 2006, Jeroen Hellingman (Mailing List Account) wrote: > This leads me to one place where further work could be done on the PG > collection: > turning it from a collection of static texts into an enriched web of > knowledge. > I've seen a lot of websites grabbing all of PG, and republishing it in a > slightly modified > form. I would, however, like to see the collection be incorporated in a > kind of wiki-like > system, where people can add -- without tampering with the static source > texts -- annotations, > add tagging and create live cross references: both for own use, smaller > dissemination in > a group or publicly. > > I've added a large number of texts related to the Philippines to PG, and > many of these > texts interact. Some criticise each other, others provide opposing views, > and so forth. It would > be great to build a system that makes that easy to follow for everybody, > such that > people can immediately see, when reading a text, where it has been cited > or referenced > in other works. It would be great also to provide study introductions or > synopses, to give > users a grasp of the material, and enable them to find what they really > need within > reasonable time. Search engines are a great tool, but only to a certain > extent. > > Jeroen.
> > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jon at noring.name Fri Jun 23 22:12:12 2006 From: jon at noring.name (Jon Noring) Date: Fri Jun 23 22:12:24 2006 Subject: [gutvol-d] the end of the line In-Reply-To: <449C7C90.8040208@bohol.ph> References: <360.64af89c.31cda3de@aol.com> <449C7C90.8040208@bohol.ph> Message-ID: <228656957.20060623231212@noring.name> [cc: Jose Menendez] Jeroen Hellingman wrote: > Although I agree with Michael that there is no need to preserve things > as linebreaks in most texts -- if you really need to go to that level > of detail, there is always the original or the scans to fall back upon > -- I want to make a case for preserving page numbers, if not at least > as recognisable anchors in text, and only for those books being > referenced to regularly by other books. First off, I agree with Bowerbird in the sense that it is a good thing to preserve both the line breaks and page breaks in the master marked- up texts converted from a source book. I assume with the DP work flow that this would not be that difficult of a thing to do, so why not do it if it could be done (mostly) automatically? For the OpenReader Publication Format, which is in an advanced stage of development, we're now putting together an OpenReader namespace set of elements to do various tasks. These elements may be used for all XML content documents which OpenReader now supports (an XHTML subset) and plans to support in the future (such as a subset of TEI). The namespaced elements include (attributes not described here): ... (simple hypertext linking) (embedding images, video and audio) (page break in a paper source) (line break in a paper source) (a generic marker) (both or:hlink and or:object will be defined using XLink.) With the permission of Jose Menendez, he is letting us use his copy of "My Antonia" (which is more accurate than the one I've been working on which hasn't yet been completely proofed), to put it into a demo of the OpenReader format. I've "diffed" it to my version and checked all differences found by consulting the original page scans, and it's been restored to the original 1918 edition (including textual errors -- the errors are specially marked however, including what the text should be based on both the Univ. of Nebraska online edition and Jose's edition), and have added precise line breaks and page breaks. For line breaks, I've placed the line breaks at the precise place of hyphenation. If the broken word does not have a natural hyphen, I use a ­ (a soft hyphen) to indicate that -- if the broken word does have a natural hyphen at the break, the hard hyphen character "-" is used. Here's an example paragraph (the 63rd paragraph in the text) which includes a page break, soft and hard hyphens: ****************************************************************************

The little girl was pretty, but Án-tonia — they accented the name thus, strongly, when they spoke to her — was still prettier. I re­membered what the conductor had said about her eyes. They were big and warm and full of light, like the sun shining on brown pools in the wood. Her skin was brown, too, and in her cheeks she had a glow of rich, dark color. Her brown hair was curly and wild-looking. The little sister, whom they called Yulka (Julka), was fair, and seemed mild and obedient. While I stood awkwardly confront­ing the two girls, Krajiek came up from the barn to see what was going on. With him was another Shimerda son. Even from a distance one could see that there was something strange about this boy. As he approached us, he began to make uncouth noises, and held up his hands to show us his fingers, which were webbed to the first knuckle, like a duck’s foot. When he saw me draw back, he began to crow delight­edly, “Hoo, hoo-hoo, hoo-hoo!” like a rooster. His mother scowled and said sternly, “Ma­rek!” then spoke rapidly to Krajiek in Bo­hemian.

***************************************************************************** If the above is rendered in plain text preserving the line breaks (ignore the page break), we have: (since this is an ASCII text email, I've converted the A-acute in "Antonia" to a unaccented A, em-dashes to "--", and curly quotes/apostrophes to the straight varieties.) ***************************************************************************** The little girl was pretty, but An-tonia -- they accented the name thus, strongly, when they spoke to her -- was still prettier. I re- membered what the conductor had said about her eyes. They were big and warm and full of light, like the sun shining on brown pools in the wood. Her skin was brown, too, and in her cheeks she had a glow of rich, dark color. Her brown hair was curly and wild- looking. The little sister, whom they called Yulka (Julka), was fair, and seemed mild and obedient. While I stood awkwardly confront- ing the two girls, Krajiek came up from the barn to see what was going on. With him was another Shimerda son. Even from a distance one could see that there was something strange about this boy. As he approached us, he began to make uncouth noises, and held up his hands to show us his fingers, which were webbed to the first knuckle, like a duck's foot. When he saw me draw back, he began to crow delight- edly, "Hoo, hoo-hoo, hoo-hoo!" like a rooster. His mother scowled and said sternly, "Ma- rek!" then spoke rapidly to Krajiek in Bo- hemian. ***************************************************************************** Of course, comments welcome on the above! Jon Noring From ke at gnu.franken.de Fri Jun 23 23:39:21 2006 From: ke at gnu.franken.de (Karl Eichwalder) Date: Fri Jun 23 23:39:35 2006 Subject: [gutvol-d] Re: the end of the line In-Reply-To: <228656957.20060623231212@noring.name> (Jon Noring's message of "Fri, 23 Jun 2006 23:12:12 -0600") References: <360.64af89c.31cda3de@aol.com> <449C7C90.8040208@bohol.ph> <228656957.20060623231212@noring.name> Message-ID: Jon Noring writes: > **************************************************************************** >

The little girl was pretty, but Án-tonia > — they accented the name thus, > strongly, when they spoke to her — was still prettier. I > re­membered what the conductor had said about her No result, if you grep for "remember". Consider to encode it as follows: remembered -- http://www.gnu.franken.de/ke/ | ,__o | _-\_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C From marcello at perathoner.de Sat Jun 24 07:10:40 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Jun 24 07:10:53 2006 Subject: [gutvol-d] the end of the line In-Reply-To: <228656957.20060623231212@noring.name> References: <360.64af89c.31cda3de@aol.com> <449C7C90.8040208@bohol.ph> <228656957.20060623231212@noring.name> Message-ID: <449D47E0.1070102@perathoner.de> Jon Noring grudgingly admits: > (page break in a paper source) > (line break in a paper source) > (a generic marker) Why not use , and ? Insisting on making your own when there are perfectly good elements in TEI is just plain ... sub-optimal. > he began to crow delight­edly, Sorry to rain on your parade but your (at best) half-baked proposal has following shortcomings: 1. Non-standard use of ­ The soft-hyphen is a "non-printable" character that may be replaced with a "printable" hyphen by processors before output. Your use is to record the place where an existent hyphen has been stripped. You got it backwards. You confuse the very different stages of text feature recording and text output. 2. Throws off grep An xml-grep could find "delightedly" if searching for "delighted", but it surely won't find "delight­edly". 3. Redundant text feature documentation All you are doing here is repeatedly "documenting" that the character used to hyphenate words in this text is the hyphen. You don't have to repeat that statement through all of your text. A single statement to that effect in the TEI header will suffice. 4. Incompatibility with LOTE Remember that in LOTE you have to deal with cases like the German "ck" and "fff" which got hyphenated this way: dachdecker dachdek-ker Schiffahrt Schiff-fahrt Also remember French and Italian elisions that don't happen at line breaks. 5. Dependance on one edition All those hard-coded ­'s will marry your electronic text to one edition. You have no provision to encode different editions of the very same text like hardcover and paperback (which may very well have different line endings). Conclusion My advice is: forget entirely about line breaks. They are random artefacts introduced by the person operating the typesetting machine and indirectly by the person who chose paper size and font. They have no raison d'?tre once you separate the ebook from the scans, ie. after it left DP. (That this suggestion was by "You Know Who" should have tipped you off immediately.) But if you belong to that fastidious class of people who can't throw away even the most useless random artefact, I suggest doing it this standard way: ... he began to crow delightedly, ... A standard XHTML browser (OpenReader ?) will simply throw away the unknown tags and render the normalized text. A special processor may be used to reconstruct the paper layout of the text. 
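The "throws off grep" objection is the sharpest of these, and the Unicode position (U+00AD is an invisible break opportunity that text processors may ignore) points at the fix: normalize before matching. A one-line sketch of what a search front-end would do:

    SHY = "\u00ad"  # U+00AD soft hyphen

    def searchable(text):
        # The soft hyphen is an invisible break opportunity and should
        # not defeat a search for the unbroken word, so strip it before
        # matching.
        return text.replace(SHY, "")

    sample = "he began to crow delight" + SHY + "edly"
    print("delightedly" in sample)              # False: a naive grep misses it
    print("delightedly" in searchable(sample))  # True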
--
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name Sat Jun 24 09:05:28 2006
From: jon at noring.name (Jon Noring)
Date: Sat Jun 24 09:05:43 2006
Subject: [gutvol-d] the end of the line
In-Reply-To: <449D47E0.1070102@perathoner.de>
References: <360.64af89c.31cda3de@aol.com> <449C7C90.8040208@bohol.ph> <228656957.20060623231212@noring.name> <449D47E0.1070102@perathoner.de>
Message-ID: <1509458751.20060624100528@noring.name>

Marcello wrote:
> Jon Noring grudgingly admits:

>> (page break in a paper source)
>> (line break in a paper source)
>> (a generic marker)

> Why not use <pb/>, <lb/> and <milestone/>? Insisting on
> making your own when there are perfectly good elements in TEI is just
> plain ... sub-optimal.

Actually, a very good idea. We've not finalized the "custom" elements
yet. I'll have to look at the TEI-defined semantics of the use of the
TEI equivalents, but *if* reasonably close to what we need, will
likely embrace them. It will add to the list of namespace
declarations, but that downside is pretty minor. Thanks.

>> he began to crow delight&shy;edly,

> Sorry to rain on your parade but your (at best) half-baked proposal
> has the following shortcomings:

No, I'm submitting the idea for feedback, and your feedback is
valuable.

> 1. Non-standard use of &shy;
>
> The soft-hyphen is a "non-printable" character that may be replaced
> with a "printable" hyphen by processors before output.
>
> Your use is to record the place where an existing hyphen has been
> stripped.

Yes.

> You got it backwards. You confuse the very different stages of text
> feature recording and text output.

Actually, I've been debating whether or not to include the &shy; as
it is used.

> 2. Throws off grep
>
> An xml-grep could find "delightedly" if searching for
> "delighted", but it surely won't find "delight&shy;edly".

Well, with existing toolbases, this might be. I believe, however,
that Unicode itself implies that text processors should ignore &shy;
(U+00AD). One reference is:

http://www.unicode.org/unicode/reports/tr14/#SoftHyphen

In addition HTML discusses the use of the soft hyphen:

http://www.w3.org/TR/html401/struct/text.html#hyphenation

In summary, user agents, such as those doing word searching, should
ignore the soft hyphen character. That some don't is a real-world
issue that unfortunately has to be pragmatically considered.

> 3. Redundant text feature documentation
>
> All you are doing here is repeatedly "documenting" that the character
> used to hyphenate words in this text is the hyphen. You don't have to
> repeat that statement through all of your text. A single statement to
> that effect in the TEI header will suffice.

Two points (based on what I interpret you are saying):

1) We are not focusing on TEI documents, thus many XML documents will
not have a TEI header.

2) The Unicode annex statement on the use of the soft hyphen (see
above link) takes into account other characters used for word
breaking purposes. It does not imply a "hard hyphen", but some
character used for linebreaking depending upon the text's language
and country code (required for all OpenReader Content Documents).

> 4. Incompatibility with LOTE
>
> Remember that in LOTE you have to deal with cases like the German "ck"
> and "fff" which got hyphenated this way:
>
>   dachdecker    ->  dachdek-ker
>   Schiffahrt    ->  Schiff-fahrt
>
> Also remember French and Italian elisions that don't happen at line
> breaks.

Good points. I'll have to check the Unicode annex document (URL
above) to see what it talks about regarding this.
> 5. Dependence on one edition
>
> All those hard-coded &shy;'s will marry your electronic text to one
> edition. You have no provision to encode different editions of the very
> same text like hardcover and paperback (which may very well have
> different line endings).

Yes, this is an issue. I do plan to allow adding an attribute to both
the page break and line break elements pointing (via Binder
identifier) to the source work. So the markup may contain multiple
source works. Things get messy if in two works the same word is
broken, but in different places. But I think my system will work for
this.

Example of the identifier attribute (still using the OR namespace):
in the Binder document, in the "descriptions" section (now being
amended), we might have:

  Second Edition Issued in 1922

> My advice is: forget entirely about line breaks. They are random
> artefacts introduced by the person operating the typesetting machine and
> indirectly by the person who chose paper size and font. They have no
> raison d'être once you separate the ebook from the scans, i.e. after it
> left DP. (That this suggestion was by "You Know Who" should have tipped
> you off immediately.)

Disagreed. There may be a need, for example, to continue proofing
work in the future. Knowing where line breaks occurred makes it
easier with DP and similar processes. It also better correlates to
the "bounding box information" from OCR which is being preserved. And
*someone* may want to know this for formatting purposes. It is
information about the source which by and large is easy for
user-agents to ignore.

Regarding you-know-who, I think you know that I often have profound
disagreements with him, but when I agree with him, I agree. I don't
let personal issues get in the way of acknowledging when I think he
is right. Those who believe in objectivity evaluate what a person
says.

> But if you belong to that fastidious class of people who can't throw
> away even the most useless random artefact, I suggest doing it this
> standard way:
>
>   ...
>   he began to crow delightedly,
>   ...
>
> A standard XHTML browser (OpenReader ?) will simply throw away the
> unknown tags and render the normalized text. A special processor may be
> used to reconstruct the paper layout of the text.

Well, the real issue is dealing with the "fff", etc. issue of LOTE.
I'll have to reread the Unicode annex. In OpenReader we reference
that spec, and recommend user agents follow its guidelines. But it
might not cover the particular LOTE "exceptions" you brought up.

Thanks for your frank feedback. Definitely needed.

Jon Noring

From brad at chenla.org Sat Jun 24 19:06:35 2006
From: brad at chenla.org (Brad Collins)
Date: Sat Jun 24 19:03:57 2006
Subject: [gutvol-d] the end of the line
In-Reply-To: <449D47E0.1070102@perathoner.de> (Marcello Perathoner's message of "Sat, 24 Jun 2006 16:10:40 +0200")
References: <360.64af89c.31cda3de@aol.com> <449C7C90.8040208@bohol.ph> <228656957.20060623231212@noring.name> <449D47E0.1070102@perathoner.de>
Message-ID: 

Marcello Perathoner writes:

> My advice is: forget entirely about line breaks. They are random
> artefacts introduced by the person operating the typesetting machine and
> indirectly by the person who chose paper size and font. They have no
> raison d'être once you separate the ebook from the scans, i.e. after it
> left DP. (That this suggestion was by "You Know Who" should have tipped
> you off immediately.)

I agree.
Before encoding a text you have to decide if you are encoding the
expression of the text or the manifestation of the text.[1]

Marking up an expression is about the structure and the words of the
text. This is what the author has created and has handed over to a
publisher.

Marking up a manifestation is all about layout and presentation. This
is the realm of the publisher and this is where you get into fonts,
line breaks etc.

You can easily mark up a text as either one or the other, but it's
not practical to try to do both in the same markup.

There are a few examples of texts and manuscripts which would be
worth having an expression level markup and a second manifestation
markup, but these will be rare. I seriously doubt that any
manifestation of Willa Cather's work would fall into this category :)

Dead tree books fix a manifestation into a permanent arrangement.
Electronic manifestations, which use systems like CSS to mold the
manifestation to the moment and to the device on the fly, are liquid;
if you try to hold one in your hand it just escapes through your
fingers.

The world of print books puts the publisher and the manifestation at
the center. The manifestation is more important than the author, who
takes a back seat to the glorious manifestation that was made of the
expression of her work.

But when copying and distribution is for all practical purposes free
and the manifestation has been reduced to an algorithm which an
electronic reader interprets, the manifestation itself takes a back
seat to the expression.

The Age of the manifestation and the publisher is drawing to an end
and we are slowly seeing the emergence of the Age of the expression
and the author.

PG is well named. Gutenberg's press was the first instance of fixing
a manifestation so that millions of identical copies could be made.
Before Gutenberg, each copy of a text was a different manifestation.
Being able to make error free copies was a revolution, but came at
the expense of easily being able to mold manifestations for different
uses and environments.

But you can make an exact copy of an electronic text without it
depending on any one manifestation of it. This is just as significant
as Gutenberg's press.

Is it useful to include some information from some manifestations in
an expression level markup? Damn yes -- page breaks are the anchor
and hyperlink in the world of paper. Countless millions of references
to page numbers have been made over the last two centuries.
Preserving page breaks is an essential part of preserving all those
references which use them.

So if you want to create a markup of a text which preserves a
specific manifestation that's fine; there are whole sections of TEI
devoted to allowing you to pick the tiniest bit of navel lint and
preserve it for eternity.

But for most purposes page scans of the original manifestation will
provide enough of this information for most questions about a text,
as well as provide the source material for the lint pickers to encode
away to their heart's content for specific manifestations.

But electronic books will mostly be in the business of preserving the
expression of a work which can then be converted into other markup
languages like XML or OR for dynamically generating flexible,
ephemeral manifestations on the fly.

b/

Footnotes:
[1] I am using work, expression and manifestation as defined in the
FRBR (Functional Requirements for Bibliographic Records).

work :: the concept representing an intellectual or creative
creation.
expression :: includes the specific sequence of words, images and
structure of a work.

manifestation :: includes the specific layout, typography,
pagination, etc., of a specific expression.

--
Brad Collins <brad@chenla.org>, Banqwao, Thailand

From nwolcott2ster at gmail.com Sun Jun 25 06:45:53 2006
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Sun Jun 25 06:59:27 2006
Subject: [gutvol-d] the end of the line
References: <360.64af89c.31cda3de@aol.com> <449C7C90.8040208@bohol.ph> <228656957.20060623231212@noring.name> <449D47E0.1070102@perathoner.de>
Message-ID: <004f01c6985f$8c855060$640fa8c0@gw98>

One could make the argument that the paragraph and perhaps the
chapter are useful tags. Poetry and sidenotes and footnotes seem
fairly established in PG without additional tagging. Also will we be
scanning 20 editions of Dickens, all with different line breaks and
page numbers?

nwolcott2@post.harvard.edu

----- Original Message -----
From: "Brad Collins"
To: "Project Gutenberg Volunteer Discussion"
Sent: Saturday, June 24, 2006 10:06 PM
Subject: Re: [gutvol-d] the end of the line

[snip]
From nwolcott2ster at gmail.com Sun Jun 25 06:58:58 2006
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Sun Jun 25 06:59:31 2006
Subject: [gutvol-d] ebooks libre et gratuits
Message-ID: <005001c6985f$8d3df200$640fa8c0@gw98>

Ebooks libre et gratuits had an arrangement with MH apparently where
their books would appear on PG eventually. Now that the ebooks web
site is no more, what will happen to the ebooksgratuits which did not
make it to PG? Will all of this work have to be repeated by someone
else? Is there an archive anywhere of this enormous quantity of work?
Why did ebooksgratuits disappear? Pressure from Canadian publishers/
government? Is there an unknown story here?

nwolcott2@post.harvard.edu

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060625/33f9bbec/attachment.html

From jmk at his.com Sun Jun 25 07:25:19 2006
From: jmk at his.com (Janet Kegg)
Date: Sun Jun 25 07:36:37 2006
Subject: [gutvol-d] ebooks libre et gratuits
In-Reply-To: <005001c6985f$8d3df200$640fa8c0@gw98>
References: <005001c6985f$8d3df200$640fa8c0@gw98>
Message-ID: 

The site is now available again: http://www.ebooksgratuits.com/

See the front page of the Web site for what I believe (my French is
almost nonexistent) is an explanation of what happened.

On Sun, 25 Jun 2006 09:58:58 -0400, you wrote:

>Ebooks libre et gratuits had an arrangement with MH apparently where
>their books would appear on PG eventually. Now that the ebooks web
>site is no more, what will happen to the ebooksgratuits which did not
>make it to PG? Will all of this work have to be repeated by someone
>else? Is there an archive anywhere of this enormous quantity of work?
>Why did ebooksgratuits disappear? Pressure from Canadian publishers/
>government? Is there an unknown story here?
>
>nwolcott2@post.harvard.edu

From jon at noring.name Sun Jun 25 08:12:40 2006
From: jon at noring.name (Jon Noring)
Date: Sun Jun 25 08:12:50 2006
Subject: [gutvol-d] the end of the line
In-Reply-To: <004f01c6985f$8c855060$640fa8c0@gw98>
References: <360.64af89c.31cda3de@aol.com> <449C7C90.8040208@bohol.ph> <228656957.20060623231212@noring.name> <449D47E0.1070102@perathoner.de> <004f01c6985f$8c855060$640fa8c0@gw98>
Message-ID: <161854616.20060625091240@noring.name>

Norm Wolcott wrote:

> One could make the argument that the paragraph and perhaps the chapter are
> useful tags. Poetry and sidenotes and footnotes seem fairly established in
> PG without additional tagging. Also will we be scanning 20 editions of
> Dickens, all with different line breaks and page numbers?

Well, since I sort of initiated this sub-thread, let me note that the
addition of an optional "page break" element in OpenReader is
instigated mostly by the needs of modern educational books, where
there may be mixed use with co-existing paper and ebook versions.
And, yes, this feature has been asked for by a user agent vendor
working with the educational community.

Of course, this feature may be used to preserve page breaks for other
purposes and sources, such as PG/DP. Do note that there exist lots of
scholarly references which point to particular pages in particular
paper manifestations of a work, so having page break info may
eventually prove useful to interlink all the old stuff (provided, of
course, that the focus is on preserving "manifestation" information
in the master digital documents.)

I don't see as much use for the line break empty tag, but we plan to
include it so it's there for those who wish to use it. In the demo
OpenReader Publication of "My Antonia", the line break element will
be included. I'm still going over Marcello's suggestions, plus
rereading the Unicode annex about line breaking (which *does* cover,
in a general way, the unusual ways line breaks are done in LOTE, such
as older German and Dutch.)

The other part of this sub-thread, the discussion of FRBR, is also
interesting. I discovered the FRBR a few years ago, and find it very
useful to understand how to categorize textual works. I like to refer
to the system it describes as "WEMI", which rhymes with "hemi" (for
you auto buffs out there): Work -- Expression -- Manifestation --
Item

http://www.ifla.org/VII/s13/frbr/frbr.pdf

(WEMI is the mnemonic I use to remember the system!)

Regarding "expression" versus "manifestation" in the digitization of
public domain materials, such as done by DP and PG, I've made my
thoughts known the last couple years, so I'll refrain from getting
into that again at this time!

Jon Noring

From Bowerbird at aol.com Sun Jun 25 10:19:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Jun 25 10:19:28 2006
Subject: [gutvol-d] the end of the line
Message-ID: <309.7649529.31d01f9b@aol.com>

jon said:
> Well, since I sort of initiated this sub-thread

um, you mean since you _hijacked_ the thread...

just had to talk about your shiny markup, didn't you?
what a debilitating distraction...

the need to retain line-breaks has nothing to do with markup.
(and your example, which shows the absurd lengths to which
a markup mentality will drive a person, was very illuminating,
as is all the technoid jargon-jabbering in this "sub-thread".)
p.g. introduces its own linebreaks into its plain-ascii e-texts,
all without ever entering the markup arena.

and i put in my
own line-breaks,
right here in these
posts to this listserve,
again without using
any markup at all,
just the return key.

i'll bring this thread back to relevance starting tomorrow...

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060625/feaeb328/attachment.html

From jon at noring.name Sun Jun 25 12:25:44 2006
From: jon at noring.name (Jon Noring)
Date: Sun Jun 25 12:25:56 2006
Subject: [gutvol-d] the end of the line
In-Reply-To: <309.7649529.31d01f9b@aol.com>
References: <309.7649529.31d01f9b@aol.com>
Message-ID: <121351504.20060625132544@noring.name>

Bowerbird wrote:

> p.g. introduces its own linebreaks into its plain-ascii e-texts,
> all without ever entering the markup arena.
>
> and i put in my
> own line-breaks,
> right here in these
> posts to this listserve,
> again without using
> any markup at all,
> just the return key.
>
> i'll bring this thread back to relevance starting tomorrow...

Yes, if one doesn't care about internal word breaks in the original,
then the markup approach is *equivalent* to the plain text break
approach.

Using your example above, let's suppose we want to preserve internal
word breaks, then we might have (ignore starting spaces, simply used
to shift the left margin inward so it's easier to see in this
message):

  and i put in my
  own line-breaks,
  right here in these
  posts to this list-
  serve, again without using
  any markup at all,
  just the return key.

So the question is, is the word which is broken "listserve" or is it
"list-serve"? Does it matter? Yes, for word searching purposes, and a
few other purposes.

I surmise you don't think that preserving the actual internal word
break in the original is important, just shift the break to the
nearest intra-word break. Well, fine, but some people might want to
have the information preserved. The markup approach I presented gives
the optional capability to mark this up. (I'm evaluating Marcello's
feedback, so there are different markup approaches that may be
taken.)

Btw, if we don't care about internal word breaks, and place the break
at the nearest intra-word break, your original example in markup
becomes (using the or: namespace):

  and i put in my<or:lb/>
  own line-breaks,<or:lb/>
  right here in these<or:lb/>
  posts to this listserve,<or:lb/>
  again without using<or:lb/>
  any markup at all,<or:lb/>
  just the return key.

If the above is rendered in a web browser, and the end-user does not
care about where the line breaks occur and takes no action, the web
browser ignores the tags, and the text is displayed nicely to fit the
browser window parameters. But an ebook reading system, as well as
simple CSS, can be used to "activate" the <or:lb/> at user demand. Or
if there's a conversion script of the markup to plain text, such as
to ZML, then we know where the breaks are. (One advantage to using
<or:lb/> rather than <br />
is that browsers will ignore the tag by default -- it would take CSS
or a special user agent to activate them on demand.)

Another advantage with using <or:lb/> is that the markup document is
not restricted to exact plain text formatting. This allows a lot of
latitude for document authors to do what they want in their text
editor when editing the XML document. For example, the above markup
could be expressed in the document as:

  and i put in my<or:lb/> own line-breaks,<or:lb/> right here in
  these<or:lb/> posts to this listserve,<or:lb/> again without
  using<or:lb/> any markup at all,<or:lb/> just the return key.

Or as:

  and i put in my<or:lb/>
  own line-breaks,<or:lb/> right here in these<or:lb/> posts
  to this listserve,<or:lb/>
  again without using<or:lb/> any markup
  at all,<or:lb/> just the return key.

Same thing... XML parsing user agents normalize all three to the same
thing. But in plain text, if someone happens to edit your text, such
as to

  and i put in my own
  line-breaks, right here
  in these posts to this
  listserve, again without
  using any markup at all,
  just the return key.

The line breaks are changed and the original line breaks lost
forever. What if someone takes a PG text formatted in ZML, and didn't
understand it was ZML (see note below), did some line length
reformatting, and then redistributed that -- especially if it's
Bowerbird poetry?

Jon Noring

(Note: How would the user know the plain text they are working with
is ZML? And how would they know in a particular instance that text
line breaks *are* important? Is there going to be machine-readable
metadata to say that the document is ZML or that text breaks are
important? I recommended that a plain text document which conforms to
ZML should have some message or processing-instruction-like thing at
the beginning saying it is ZML, and which version, and possibly that
line breaks are important in this particular document and why. That's
the purpose of the <?xml ... ?> declaration at the beginning of an
XML document. It identifies it and even assists with determination of
the text encoding. Will ZML require UTF-8 or UTF-16? Or will it stick
to ASCII? Or will it allow ISO 8859-1? Or will it allow all of them?
Will it allow any text encoding? How would a user agent know the text
encoding of the ZML document, especially without having to process
the whole thing?)

From donovan at abs.net Sun Jun 25 13:22:35 2006
From: donovan at abs.net (D Garcia)
Date: Sun Jun 25 13:23:02 2006
Subject: [dp-pg] Re: [gutvol-d] ebooks libre et gratuits
In-Reply-To: 
References: <005001c6985f$8d3df200$640fa8c0@gw98>
Message-ID: <200606251622.35322.donovan@abs.net>

On Sunday 25 June 2006 10:25 am, Janet Kegg wrote:
> The site is now available again: http://www.ebooksgratuits.com/
>
> See the front page of the Web site for what I believe (my French is
> almost nonexistent) is an explanation of what happened.

The news item roughly translated is:

As you probably noted, the site was inaccessible for over a week; the
reason is that our ISP shut down following "a crippling DDOS attack"
which they were not able to successfully block. We changed ISPs, and
the site is once again available. We will take the necessary measures
so that this type of thing cannot happen again; I will say more about
it very soon.

From gbnewby at pglaf.org Sun Jun 25 14:59:23 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Jun 25 14:59:25 2006
Subject: [gutvol-d] Automated readability scores for PG eBooks
Message-ID: <20060625215923.GA18811@pglaf.org>

Feedback/input would be valued. I've been corresponding with Simon
Ronald at RocketReader.com to see about integrating readability
scores into the main PG book catalog.

Because we don't have a lot of subject cataloging, one value of this
is that it does a good job of identifying children's eBooks (they
tend to be "easy"). This is also usable for people seeking to develop
literacy or provide literacy instruction, by providing a way of
reading something "harder" or "easier" as desired.

Take a look at the list below (ten hardest, then ten easiest). The
first score is overall, followed by a set of scores that made it up.
I had provided some earlier feedback on how "hard" books were not
necessarily prose, which is part of what Dr. Ronald is responding to.
If you have feedback on the results, or my idea for adding these
scores as an element of the catalog search results, please chime in!
-- Greg

----- Forwarded message from "Dr. Simon Ronald" -----

Subject: Further Readability Results
Date: Tue, 20 Jun 2006 03:34:26 +0930

Hello Greg,

Here are some further "hardest and easiest" books based on a recent
run. The run required 1 hour and 49 minutes to complete. This run
classified 15,099 books - being a full scan of the English books.

We incorporated an ordered list detection algorithm - some of the
books contained (sometimes very noisy) lists of items - we found 162
books in total that were list based. It should be noted that we
classified the entire book as list or "not list" based on a threshold
-> if the book was a list then each separate line was considered a
sentence for the purposes of readability. In time we will incorporate
intra-book list detection to allow the readability methodology to
vary depending on the context within the book. It should also be
noted that some of the HTML versions may well contain markup hints
such as the use of the <ol>
or <ul>
HTML tag; we could use these and other tags to improve the quality of
sentence chunking.

Each entry has a series of 12 percentiles listed after the main
readability percentile. These percentiles correspond to the 12
readability attributes in this order.

bigword density
short word density (-)
wordsPerSentences
syllablesPerWords
profainwordsPerWords
numbersPerWords
mostCommon1000WordsPerWord (-)
commascharsPerWords
wordsPerParagraphs
letterFrequencyDistributionError
adjacentLetterPairsFrequencyDistributionError
uniqueStemmedWordsPerWord;

99.914 95 90 95 97 0 79 86 84 79 88 85 90 Note on the Resemblances and Differences in the Structure and the Development of the Brain in Man and Apes (etext2354)
99.907 96 93 90 98 0 71 96 94 49 80 70 96 Original Letters and Biographic Epitomes (etext13203)
99.907 89 86 96 88 95 82 71 94 81 80 67 67 The Great Conspiracy, Volume 7 (etext7139)
99.904 85 87 93 86 0 86 97 75 78 88 78 99 A Biography of Edmund Spenser (etext6937)
99.897 92 90 95 92 88 69 76 90 80 48 60 78 Memoirs of the Court of St. Cloud (Being secret letters from a gentleman at Paris to a nobleman in London) -- Volume 1 (etext3892)
99.897 82 92 32 87 88 93 98 87 68 93 83 85 Graf von Loeben and the Legend of Lorelei (etext11066)
99.894 96 93 89 98 80 91 84 97 84 88 36 27 The Modern Regime, Volume 2 (etext2582)
99.887 92 88 73 90 92 82 77 89 45 97 67 76 An Enquiry Concerning the Principles of Taste, and of the Origin of our Ideas of Beauty, etc. (etext13485)
99.887 91 89 88 92 0 64 75 99 96 88 67 96 Giordano Bruno (etext4228)
99.884 99 95 92 99 88 79 84 34 87 66 70 74 Monism as Connecting Religion and Science (etext9199)
99.881 93 91 75 92 94 64 67 77 94 88 67 80 Rise of the Dutch Republic, the -- Volume 22: 1574-76 (etext4824)
99.874 91 92 23 93 80 80 98 95 70 80 91 74 The Principal Navigations, Voyages, Traffiques and Discoveries of the English Nation -- Volume 01 (etext7182)
99.868 97 95 74 99 97 81 91 77 64 27 36 78 Gilbertus Anglicus (etext16155)
99.868 86 95 63 86 0 98 90 89 50 93 89 80 Cessions of Land by Indian Tribes to the United States: Illustrated by Those in the State of Indiana (etext17148)
99.858 87 87 95 87 80 56 79 93 63 66 83 73 An Essay towards Fixing the True Standards of Wit, Humour, Railery, Satire, and Ridicule (1744) (etext16233)
99.858 91 96 71 90 95 99 99 99 3 2 85 65 Noteworthy Families (Modern Science) (etext17128)
99.858 99 99 5 99 95 92 99 99 68 0 81 87 Roget's Thesaurus of English Words and Phrases (etext10681)
99.854 97 92 89 98 0 83 81 79 95 97 60 57 Eighteenth Brumaire of Louis Bonaparte (etext1346)
99.844 99 99 84 99 80 98 93 14 61 88 83 55 Venereal Diseases in New Zealand (1922) (etext15352)
99.844 96 91 98 96 88 80 74 98 97 66 64 9 Act, Declaration, & Testimony for the Whole of our Covenanted Reformation, as Attained to, and Established in Britain and Ireland; Particularly Betwixt the Years 1638 and 1649, Inclusive (etext13200)
99.844 90 89 96 90 0 79 78 69 87 66 70 97 Dr. Bullivant (etext9249)
99.831 90 88 94 89 80 71 61 96 90 66 30 72 Memoirs of the Court of St. Cloud (Being secret letters from a gentleman at Paris to a nobleman in London) -- Volume 7 (etext3898)
99.831 99 99 87 99 99 92 88 27 54 93 94 34 Three Contributions to the Theory of Sex (etext14969)
99.831 95 88 91 92 80 61 67 72 83 93 64 70 Superstition Unveiled (etext15696)
99.831 95 93 45 97 80 89 97 94 43 13 72 80 Aboriginal American Authors (etext9188)
99.824 89 92 88 88 0 90 92 56 54 80 94 84 Transactions of the American Society of Civil Engineers, Vol. LXVIII, Sept. 1910 (etext18012)
99.824 87 99 40 90 0 99 95 94 3 88 92 97 On the Origin of Species (etext8205)
99.798 91 89 93 90 88 69 60 89 86 48 47 74 Memoirs of the Court of St. Cloud (Being secret letters from a gentleman at Paris to a nobleman in London) -- Volume 3 (etext3894)
99.798 90 87 85 88 0 81 64 96 84 48 81 94 The Lives of the Twelve Caesars, Volume 11: Titus (etext6396)
99.798 90 86 97 90 0 81 76 90 94 27 76 82 The evolution of English lexicography (etext11694)
99.798 73 81 96 78 98 56 54 95 72 66 89 96 A Modest Proposal (etext1080)
99.798 95 96 41 97 0 95 96 79 54 98 78 78 Webster's March 7th Speech/Secession (etext1663)
99.798 91 92 47 92 92 50 87 87 87 66 64 89 Rise of the Dutch Republic, the -- Volume 01: Introduction I (etext4801)
99.798 88 86 77 88 95 50 73 73 97 80 72 83 Rise of the Dutch Republic, the -- Volume 26: 1577, part III (etext4828)
99.785 95 85 98 90 88 75 71 92 94 66 56 32 The Auchensaugh Renovation of the National Covenant and (etext12381)
99.785 93 92 81 92 88 92 85 98 84 80 24 16 The Modern Regime, Volume 1 (etext2581)
99.785 70 78 92 75 92 81 79 94 87 48 52 82 The Mayflower and Her Log; July 15, 1620-May 6, 1621 -- Volume 5 (etext4105)

Easiest

4.176 2 1 11 2 0 0 1 36 9 48 78 11 The Song of the Blood-Red Flower (etext12935)
4.176 0 0 11 0 0 0 15 48 8 2 94 6 Six Little Bunkers at Grandma Bell's (etext14623)
4.176 14 8 27 12 0 0 9 15 10 27 41 7 Melbourne House, Volume 2 (etext12964)
4.176 7 3 6 4 0 0 5 3 9 66 86 23 The Romantic (etext13292)
4.176 17 5 14 10 0 0 1 12 48 48 6 23 The Girl from Montana (etext15274)
4.176 7 6 13 7 0 0 11 27 22 27 56 9 Jess of the Rebel Trail (etext15382)
4.176 5 3 8 3 0 0 0 2 45 66 60 25 Stories of American Life and Adventure (etext15597)
4.176 3 8 13 6 0 0 12 3 72 1 78 13 Kazan (etext10084)
4.176 11 5 8 8 0 0 12 1 9 48 94 11 The Second Honeymoon (etext17446)
4.176 13 9 10 12 0 0 13 8 3 48 36 23 The Circus Boys on the Plains : or, the Young Advance Agents Ahead of the Show (etext2478)
4.176 19 11 11 15 0 0 10 26 42 27 9 1 The Captives (etext3601)
4.176 0 1 12 1 0 0 2 3 49 27 90 29 Old Granny Fox (etext4980)
4.176 0 0 6 0 0 0 2 0 19 27 89 54 Sleepy-Time Tales: the Tale of Fatty Coon (etext5701)
4.176 0 0 12 1 0 0 10 6 31 2 93 40 The Adventures of Johnny Chuck (etext5844)
4.176 2 3 11 3 0 0 7 1 12 5 85 54 Tale of Brownie Beaver (etext6754)
4.176 12 7 19 8 0 0 9 7 55 5 52 13 The City of Fire (etext7008)
4.176 23 9 10 15 0 0 13 16 18 13 3 29 The Man with Two Left Feet (etext7471)
4.176 4 5 13 5 0 0 8 15 40 5 80 21 Way of the Lawless (etext9903)
3.931 11 7 11 8 0 0 14 7 36 2 85 9 The Hunted Woman (etext11328)
3.931 4 1 14 2 0 0 1 34 44 27 64 3 Mary Marie (etext11143)
3.931 12 8 22 11 0 0 10 20 15 13 41 13 Contrary Mary (etext17938)
3.931 0 0 3 0 0 0 0 1 0 98 97 29 The New McGuffey First Reader (etext1489)
3.931 4 7 18 6 0 0 7 33 21 13 56 9 Martin Pippin in the Apple Orchard (etext2032)
3.931 8 9 8 8 0 0 1 12 48 13 64 23 Twenty-Two Goblins (etext2290)
3.931 9 8 9 8 0 0 7 5 43 27 72 13 God's Country--And the Woman (etext4585)
3.931 12 11 11 13 0 0 6 12 60 27 13 14 The Valley of Silent Men (etext4707)
3.931 5 2 23 3 0 0 6 16 10 13 80 23 The Boy Scout Camera Club, or, the Confession of a Photograph (etext7356)
3.931 8 9 15 10 0 0 17 4 6 5 81 21 Bob Cook and the German Spy (etext9899)
3.676 16 6 8 10 0 0 25 3 3 13 76 13 The Three Sisters (etext11876)
3.676 7 2 10 3 0 0 3 17 43 27 78 11 His Second Wife (etext17259)
3.676 6 5 17 6 0 0 14 33 24 5 60 3 Michael O'Halloran (etext9489)
3.384 12 7 6 10 0 0 28 0 20 27 13 32 The Sheriff's Son (etext17043)
3.384 1 0 17 1 0 0 9 36 11 5 88 9 The Bobbsey Twins in the Great West (etext5952)
3.384 6 3 7 4 0 0 3 24 20 13 52 34 Pan (etext7214)
3.384 0 0 5 0 0 0 2 1 20 5 93 59 Five Little Friends (etext7801)
3.384 0 1 7 1 0 0 13 0 11 1 88 53 The Tale of Sandy Chipmunk (etext9462)
3.109 0 0 6 0 0 0 0 19 0 2 93 50 Boy Blue and His Friends (etext16046)
3.109 8 3 10 4 0 0 1 30 19 27 75 6 Wanderers (etext7762)
2.874 1 2 56 2 0 0 0 33 32 2 9 5 Twilight Land (etext1751)
2.874 0 0 13 0 0 0 11 38 9 1 91 6 The Curlytops on Star Island (etext5989)
2.666 0 0 10 0 0 0 6 44 9 13 76 7 The Bobbsey Twins at Home (etext18420)
2.666 0 0 34 0 0 0 0 1 43 48 64 5 The King of Ireland's Son (etext3495)
2.460 7 11 12 11 0 0 8 3 70 5 19 16 Baree, Son of Kazan (etext4748)
2.460 10 9 11 10 0 0 2 15 9 13 52 21 Samuel the Seeker (etext5961)
2.255 6 1 2 2 0 0 28 1 7 80 30 14 Plays (etext10623)
2.255 12 13 14 14 0 0 11 4 23 5 30 14 King of the Khyber Rifles (etext6066)
2.255 3 3 19 3 0 0 6 11 13 13 64 23 Riders of the Silences (etext9867)
1.917 4 2 7 3 0 0 11 4 7 48 89 6 Anne Severn and the Fieldings (etext10817)
1.785 0 0 15 0 0 0 0 9 13 5 72 36 Fifty Famous Stories Retold (etext18442)
1.507 8 4 17 6 0 0 21 0 24 13 19 16 The Light in the Clearing (etext14150)
1.507 9 7 21 8 0 0 3 2 30 27 13 14 Voyages of Dr. Dolittle (etext1154)
1.507 0 0 2 0 0 0 43 22 3 27 30 2 Six Plays (etext5618)
1.391 4 6 18 6 0 0 9 0 24 5 67 6 The Secret Garden (etext17396)
1.391 12 9 8 10 0 0 16 7 21 13 2 19 Black Jack (etext9925)
1.080 10 4 14 6 0 0 8 9 14 5 24 19 The Gay Cockade (etext16433)
0.742 4 4 14 4 0 0 5 3 55 2 6 16 Isobel : a Romance of the Northern Trail (etext6715)
0.440 0 0 5 0 0 0 2 2 0 1 97 16 Bunny Rabbit's Diary (etext16982)
0.281 6 3 6 4 0 0 22 2 8 5 19 6 Mary Olivier: a Life (etext9366)

Cheers,

Dr. Simon Ronald
CEO
The Leader in High Performance Reading
Level 2, 25 Gresham Street, Adelaide, SA, Australia, 5000
GPO Box 944, Adelaide SA 5001
Ph. +61 8 8410 2771 Fax. +61 8 8125 6679
1133 Broadway, Suite 706 New York, NY 10010
Ph: (646) 736 7673 (New York)
Ph: (415) 992 5412 (California)
Fax: (877) 731 4410 (toll free)
____________________________________

----- End forwarded message -----

From scott_bulkmail at productarchitect.com Sun Jun 25 18:11:22 2006
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sun Jun 25 18:26:56 2006
Subject: [gutvol-d] Automated readability scores for PG eBooks
In-Reply-To: <20060625215923.GA18811@pglaf.org>
References: <20060625215923.GA18811@pglaf.org>
Message-ID: 

>If you have feedback on the results, or my idea for
>adding these scores as an element of the catalog search
>results, please chime in!

I think that a readability score on every book is a super good idea.
And, the N easiest/hardest would make good lists for the site (well,
formatted as an HTML table, and perhaps including author as well as
title).
And, there's probably no harm in including the sub-scores, though the
overall is certainly most important for public consumption.

Cheers,

Scott S. Lawton
http://blogsearch.com/ - a starting point
http://ProductArchitect.com/ - consulting

From phil at thalasson.com Sun Jun 25 17:35:22 2006
From: phil at thalasson.com (Philip Baker)
Date: Sun Jun 25 18:52:33 2006
Subject: [gutvol-d] ebooks libre et gratuits
In-Reply-To: <200606251622.35322.donovan@abs.net>
Message-ID: 

In article <200606251622.35322.donovan@abs.net>, D Garcia writes
>On Sunday 25 June 2006 10:25 am, Janet Kegg wrote:
>> The site is now available again: http://www.ebooksgratuits.com/
>>
>> See the front page of the Web site for what I believe (my French is
>> almost nonexistent) is an explanation of what happened.
>
>The news item roughly translated is:
>
>As you probably noted, the site was inaccessible for over a week; the reason
>is that our ISP shut down following "a crippling DDOS attack" which they were
>not able to successfully block. We changed ISPs, and the site is once again
>available. We will take the necessary measures so that this type of thing
>cannot happen again; I will say more about it very soon.

They are being rather optimistic but we will have to wait and see if
their "mesures nécessaires" work.

--
Philip Baker

From sly at victoria.tc.ca Sun Jun 25 21:40:55 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Jun 25 21:40:58 2006
Subject: [gutvol-d] ebooks libre et gratuits
In-Reply-To: 
References: 
Message-ID: 

I think that this shows the value of the PG approach (What MH likes
to call "Unlimited distribution"), where the whole collection is
mirrored in many different locations. So if the main server is down,
there are plenty of alternate sites available.

Andrew

On Mon, 26 Jun 2006, Philip Baker wrote:

> In article <200606251622.35322.donovan@abs.net>, D Garcia
> writes
> >As you probably noted, the site was inaccessible for over a week; the reason
> >is that our ISP shut down following "a crippling DDOS attack" which they were
> >not able to successfully block. We changed ISPs, and the site is once again
> >available. We will take the necessary measures so that this type of thing
> >cannot happen again; I will say more about it very soon.
>
> They are being rather optimistic but we will have to wait and see if
> their "mesures nécessaires" work.

From Bowerbird at aol.com Sun Jun 25 23:49:35 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Jun 25 23:49:43 2006
Subject: [gutvol-d] Automated readability scores for PG eBooks
Message-ID: <520.1afbfc5.31d0dd7f@aol.com>

greg said:
> one value of this is that it does
> a good job of identifying children's eBooks
> (they tend to be "easy").

checklist said:
> bigword density
> short word density (-)
> wordsPerSentences
> syllablesPerWords
> profainwordsPerWords
> numbersPerWords
> mostCommon1000WordsPerWord (-)
> commascharsPerWords
> wordsPerParagraphs
> letterFrequencyDistributionError
> adjacentLetterPairsFrequencyDistributionError
> uniqueStemmedWordsPerWord;

aren't scientists silly? :+)

look, greg, if you want a list of children's e-books,
or a list of "easy" e-books, or any kind of list of books,
just ask the distributed proofreaders people for the list...

they'll give you a long list of books, any kind of list you want,
and you won't have to do one little bit of fancy-ass statistics...

i'm serious, they can give a list with p.g. e-text numbers and
meaningful notes, and funny little stories, and _everything_...
much more vivid than your boring-ass statistics... :+)

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060626/4374cb79/attachment-0001.html

From Bowerbird at aol.com Sun Jun 25 23:53:10 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Jun 25 23:53:17 2006
Subject: [gutvol-d] ebooks libre et gratuits
Message-ID: <531.136b248.31d0de56@aol.com>

unlimited distribution rocks big time...

major fucking concept...

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060626/60950a94/attachment.html

From prosfilaes at gmail.com Mon Jun 26 00:03:05 2006
From: prosfilaes at gmail.com (David Starner)
Date: Mon Jun 26 00:32:03 2006
Subject: [gutvol-d] Automated readability scores for PG eBooks
In-Reply-To: <20060625215923.GA18811@pglaf.org>
References: <20060625215923.GA18811@pglaf.org>
Message-ID: <6d99d1fd0606260003v1da9790ep3aed09dd6fc9414@mail.gmail.com>

On 6/25/06, Greg Newby wrote:
> Because we don't have a lot of subject cataloging, one
> value of this is that it does a good job of identifying
> children's eBooks (they tend to be "easy").

If the problem is that we don't have a lot of subject cataloging,
provide more subject cataloging. We could copy the LoC cataloging for
most of the catalog without too much work. If we're going to a
Wiki-type thing, lists of children's books, mysteries, sci-fi, etc.
will be made, and will be superior to this.

> This is also usable for people seeking to develop
> literacy or provide literacy instruction, by providing
> a way of reading something "harder" or "easier" as desired.

If the problem is literacy instruction, then we should work on a list
of books for literacy, not rely on some tool that can't tell the
difference between a 17th century children's book and a 20th century
one, or how much dialect is used. Again, a Wiki-tool is perfect for
this.

> If you have feedback on the results, or my idea for
> adding these scores as an element of the catalog search
> results, please chime in!

I think that these are somewhat interesting, but they are far from
the most interesting factoids. I've been drooling over Amazon's
Statistically Improbable Phrases, personally. I surely wouldn't have
them as prominent as on the search page; I don't think it's the most
important thing that most people look at.

> 0.281 6 3 6 4 0 0 22 2 8 5 19 6 Mary Olivier: a Life
> (etext9366)

This is surely a mistake; the second sentence in the book is "When
old Jenny shook it the wooden rings rattled on the pole and grey men
with pointed heads and squat, bulging bodies came out of the folds on
to the flat green ground." The numbers are too hard to decipher in
this form to really try and understand why.

I also wonder about "profainwordsPerWords"? The profanity of words
has little to do with the readability; they're just adjectives and
nouns from that perspective.
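(For the curious: attributes like wordsPerSentences and bigword
density reduce to simple counting. A rough Python sketch follows --
crude tokenization, a stubbed-in common-word list, and definitely not
RocketReader's actual algorithm:)

    import re

    COMMON = {"the", "of", "and", "a", "to", "in", "is", "it"}  # stub for a 1000-word list

    def readability_attributes(text):
        # Split on sentence-ending punctuation; count word tokens.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text.lower())
        n = max(len(words), 1)
        return {
            "wordsPerSentences": len(words) / max(len(sentences), 1),
            "bigwordDensity": sum(len(w) >= 7 for w in words) / n,
            "mostCommon1000WordsPerWord": sum(w in COMMON for w in words) / n,
        }

    print(readability_attributes(
        "The little girl was pretty. Her brown hair was curly."))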
From traverso at dm.unipi.it Mon Jun 26 00:38:48 2006
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Mon Jun 26 00:34:29 2006
Subject: [gutvol-d] ebooks libre et gratuits
In-Reply-To: (message from Andrew Sly on Sun, 25 Jun 2006 21:40:55 -0700 (PDT))
References: 
Message-ID: <200606260738.k5Q7cm402654@pico.dm.unipi.it>

>>>>> "Andrew" == Andrew Sly writes:

  Andrew> I think that this shows the value of the PG approach (What
  Andrew> MH likes to call "Unlimited distribution"), where the
  Andrew> whole collection is mirrored in many different
  Andrew> locations. So if the main server is down, there are plenty
  Andrew> of alternate sites available.

  Andrew> Andrew

They do allow mirroring the collection, just nobody did. I think that
they cannot afford to pay several sites (but clearly they keep
copies). They work as Life+50, so a mirroring by PG is impossible
(but might be possible by a PG+50)

Carlo

From gbnewby at pglaf.org Mon Jun 26 01:30:40 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Jun 26 01:30:42 2006
Subject: [gutvol-d] ebooks libre et gratuits
In-Reply-To: 
References: 
Message-ID: <20060626083040.GD26556@pglaf.org>

On Sun, Jun 25, 2006 at 09:40:55PM -0700, Andrew Sly wrote:
>
> I think that this shows the value of the PG approach
> (What MH likes to call "Unlimited distribution"), where
> the whole collection is mirrored in many different
> locations. So if the main server is down, there are
> plenty of alternate sites available.
>
> Andrew

I think Michael's approach to unlimited distribution is a little
different, but not that different. What you're actually talking about
is gbn's approach to belt+suspenders when it comes to server
resiliency.

Insert obligatory Linus Torvalds quote about mirroring, here.
  -- Greg

> On Mon, 26 Jun 2006, Philip Baker wrote:
>
> > In article <200606251622.35322.donovan@abs.net>, D Garcia
> > writes
> > >As you probably noted, the site was inaccessible for over a week; the reason
> > >is that our ISP shut down following "a crippling DDOS attack" which they were
> > >not able to successfully block. We changed ISPs, and the site is once again
> > >available. We will take the necessary measures so that this type of thing cannot
> > >happen again; I will say more about it very soon.
> >
> > They are being rather optimistic but we will have to wait and see if
> > their "mesures nécessaires" work.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From walter.van.holst at xs4all.nl Mon Jun 26 02:12:04 2006
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Mon Jun 26 02:12:09 2006
Subject: [gutvol-d] the end of the line
In-Reply-To: <309.7649529.31d01f9b@aol.com>
References: <309.7649529.31d01f9b@aol.com>
Message-ID: <449FA4E4.9010505@xs4all.nl>

Bowerbird@aol.com wrote:
>
> p.g. introduces its own linebreaks into its plain-ascii e-texts,
> all without ever entering the markup arena.
>
> and i put in my own line-breaks, right here in these posts to this
> listserve, again without using any markup at all, just the return key.

Line-breaks are mark-up. They don't add anything whatsoever to the
text itself and are completely arbitrarily decided, usually based on
the technology that is used to display the actual content. You can
deny the difference between structure, content and presentation all
you want, but it is perfectly possible to reformat a book using
columns instead of lines without changing the actual content.
And where will your precious line-breaks go in that case?

Greetings,

Walter

From gbnewby at pglaf.org Mon Jun 26 02:32:37 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Jun 26 02:32:38 2006
Subject: [gutvol-d] New DVD ISO feedback sought
Message-ID: <20060626093237.GA27369@pglaf.org>

I've been working, slowly, on some new CD/DVD images (ISO files) for
our use. As many people know, we've given away many thousands of free
CDs and DVDs, and added the ISO images (along with BitTorrent, RAR
and other formats) to the main PG collection.

You can peruse the images I've been working on here:
http://snowy.arsc.alaska.edu/gbn/pgimages

actual ISOs are at:
ftp://snowy.arsc.alaska.edu/pub/gbn/isos

These are not completed... I'll be adding stuff like GUTINDEX.ALL,
donate-howto.txt, and a README.TXT

You can see the nifty tool for creating such images here:
http://snowy.arsc.alaska.edu/pgiso/

Here are the main two CD/DVD collections for you to consider:

1) "As many titles as possible." In the tool, I specified these
numbers:

1-2199,2225-3500,3525-11774,11800-20000

with "no copyrighted", "txt/zip" format, and any language. The result
is all of the zipped eBooks in plain text format, minus our copies of
the Human Genome. (No, we don't go up to #20000 in the main PG
collection, which the tool uses... only 18683 as of right now. I'm
just using a high enough number that I don't need to look up the
actual number.)

This should be similar to our eBook #11800, the PG 2003 "10k
special." For that, we tried to add as many as possible, resulting in
~9300 titles including .txt and .html (also Genome), all zipped.

Surprisingly, we can fit *all* 17454 of our non-copyrighted text/zip
titles with space to spare in a DVD: about 3.5GB. In case you're
wondering (I was!), including as many HTML titles as possible
(including their images) in html/zip, then filling in the rest with
text/zip, yields about 3.25 DVDs (14.5GB).

2) "Best of Redux." Our Best Of CD image was made by human selection
(on this list!), resulting in just under 600 titles. Many are HTML.
Since #11220, we've added lots of great stuff. So, what would go on
today's "Best Of"? I went ahead and recreated the image in the new
tool, and also made one emphasizing HTML (since some titles have been
moved to HTML that were previously just text). I've uploaded (to the
/pgimages URL) the list of the "Best Of" eBook numbers, as well as
the list of "best of" public domain that Amazon did last year
(remember that?).

GOALS:

- confirm viability/suitability of the "allzipnohgp" collection (#1
above); make any suggestions. This is basically the densest way of
getting people all of the PG collection, fitting easily on a single
DVD. (Yes, I plan on filling it up with some of our nice HTML &
multimedia. Your ideas are solicited.)

- consider ways forward for a new "Best of" - either CD or DVD. The
only thing I feel strongly about is showcasing some of our beautiful
HTML titles with nice images. Yes to all the classics, and yes to
plain text or HTML... but consider "best of" in terms of PG's best
work, not just the classic titles.

If anyone would really like to run with these ideas and create some
new images, go for it! The snowy tool makes it easy to share your own
collections, and we have many places to distribute ISOs you create. I
do think it's time to create some new "primary" giveaway images,
though, and appreciate any ideas you might have.
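For anyone who wants to script against the same selections: a
number-range spec like the one above expands the obvious way. A small
Python sketch (an assumption about the input format, not the pgiso
tool's actual code):

    def parse_ranges(spec):
        # Expand "1-2199,2225-3500,..." into the full set of
        # candidate ebook numbers.
        chosen = set()
        for part in spec.split(","):
            if "-" in part:
                lo, hi = part.split("-", 1)
                chosen.update(range(int(lo), int(hi) + 1))
            else:
                chosen.add(int(part))
        return chosen

    nums = parse_ranges("1-2199,2225-3500,3525-11774,11800-20000")
    print(len(nums))  # how many numbers the selection covers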
-- Greg

From scott_bulkmail at productarchitect.com Mon Jun 26 04:19:07 2006
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Mon Jun 26 04:19:42 2006
Subject: [gutvol-d] Automated readability scores for PG eBooks
In-Reply-To: <6d99d1fd0606260003v1da9790ep3aed09dd6fc9414@mail.gmail.com>
References: <20060625215923.GA18811@pglaf.org> <6d99d1fd0606260003v1da9790ep3aed09dd6fc9414@mail.gmail.com>
Message-ID: 

>If the problem is that we don't have a lot of subject cataloging,
>provide more subject cataloging. We could copy the LoC cataloging for
>most of the catalog without too much work.

>If the problem is literacy instruction, then we should work on a list
>of books for literacy, not rely on some tool that can't tell the
>difference between a 17th century children's book and a 20th century
>one, or how much dialect is used.

While I agree that it would not be worth adding readability scores if
it had much impact on these and other worthy goals, I really don't
see it as either/or. Granting of course that adding scores will take
some time away from other projects (and, that it's not my personal
time at stake here), I still see this as relatively high gain for
relatively low investment. There are lots and lots of cool things
that could be done with the catalog.

And, any relatively "easy" (i.e. automated) method of adding
readability scores will inevitably miscategorize a whole bunch of
books. But, I think the 'signal' will far outweigh the 'noise'. Even
in the context of the above, the scores would provide a great
starting point for being improved with manual cataloging and literacy
labeling. Don't let the perfect stand in the way of the good.

Plus, I think the scores (and miscategorizations) are interesting in
and of themselves for those of us interested in words and language.

Cheers,

Scott S. Lawton
http://blogsearch.com/ - a starting point
http://ProductArchitect.com/ - consulting

From schultzk at uni-trier.de Mon Jun 26 03:04:18 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon Jun 26 04:19:50 2006
Subject: [gutvol-d] the end of the line
In-Reply-To: <449FA4E4.9010505@xs4all.nl>
References: <309.7649529.31d01f9b@aol.com> <449FA4E4.9010505@xs4all.nl>
Message-ID: <2E862222-A145-4583-AA4C-261DCD0C39B6@uni-trier.de>

Hi All,

On 26.06.2006 at 11:12, Walter van Holst wrote:

> Bowerbird@aol.com wrote:
>>
>> p.g. introduces its own linebreaks into its plain-ascii e-texts,
>> all without ever entering the markup arena.
>>
>> and i put in my own line-breaks, right here in these posts to this
>> listserve, again without using any markup at all, just the return
>> key.
>
> Line-breaks are mark-up. They don't add anything whatsoever to the
> text itself and are completely arbitrarily decided, usually based
> on the technology that is used to display the actual content. You
> can deny the difference between structure, content and presentation
> all you want, but it is perfectly possible to reformat a book using
> columns instead of lines without changing the actual content. And
> where will your precious line-breaks go in that case?

In normal prose line breaks generally do not affect the actual
content, but in poetry it may be very meaningful. Especially in works
where the form of the text is important!!

But do not take my word for it.

Keith.
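One way to act on Keith's point in software is to guess, before
reflowing anything, whether the breaks carry meaning. A crude,
illustrative Python heuristic -- the 45-character threshold is an
arbitrary assumption, not a rule from this thread:

    def breaks_look_significant(lines):
        # Short, ragged lines suggest verse, where re-wrapping would
        # destroy the form; long, even lines suggest ordinary prose.
        text = [ln.rstrip() for ln in lines if ln.strip()]
        if not text:
            return False
        avg = sum(len(ln) for ln in text) / len(text)
        return avg < 45  # prose lines usually run longer

    poem = ["and i put in my", "own line-breaks,", "right here in these"]
    print(breaks_look_significant(poem))  # True: keep these breaks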
From walter.van.holst at xs4all.nl Mon Jun 26 05:37:34 2006
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Mon Jun 26 05:37:37 2006
Subject: [gutvol-d] the end of the line
In-Reply-To: <2E862222-A145-4583-AA4C-261DCD0C39B6@uni-trier.de>
References: <309.7649529.31d01f9b@aol.com> <449FA4E4.9010505@xs4all.nl> <2E862222-A145-4583-AA4C-261DCD0C39B6@uni-trier.de>
Message-ID: <449FD50E.90002@xs4all.nl>

Schultz Keith J. wrote:
>
> In normal prose line breaks generally do not affect the actual
> content, but in poetry it may be very meaningful. Especially in
> works where the form of the text is important!!
>
> But do not take my word for it.

I will take your word for it. In some poetry even the typeface is
part of the poem. Think about Paul van Ostaijen's Boem Paukenslag!

http://users.pandora.be/gaston.d.haese/paukenslag.html

Nonetheless, I wouldn't dare to call each and every e-mail I write
poetry.

Regards,

Walter

From jon at noring.name Mon Jun 26 06:15:28 2006
From: jon at noring.name (Jon Noring)
Date: Mon Jun 26 06:15:40 2006
Subject: [gutvol-d] the end of the line
In-Reply-To: <2E862222-A145-4583-AA4C-261DCD0C39B6@uni-trier.de>
References: <309.7649529.31d01f9b@aol.com> <449FA4E4.9010505@xs4all.nl> <2E862222-A145-4583-AA4C-261DCD0C39B6@uni-trier.de>
Message-ID: <1307373850.20060626071528@noring.name>

Walter van Holst wrote:
> Bowerbird@aol.com wrote:

>> p.g. introduces its own linebreaks into its plain-ascii e-texts,
>> all without ever entering the markup arena.
>>
>> and i put in my own line-breaks, right here in these posts to this
>> listserve, again without using any markup at all, just the return
>> key.

> Line-breaks are mark-up. They don't add anything whatsoever to the
> text itself and are completely arbitrarily decided, usually based
> on the technology that is used to display the actual content. You
> can deny the difference between structure, content and presentation
> all you want, but it is perfectly possible to reformat a book using
> columns instead of lines without changing the actual content. And
> where will your precious line-breaks go in that case?

Yes, line-breaks (CR/LF, etc.) are markup. They are text characters
used to communicate something besides the content. Paper books don't
need to include these characters (they'd be invisible anyway), thus
they are characters not part of the content, i.e. markup. Also, using
* and _ for highlighting purposes is also markup.

Of course, what Bowerbird means by markup is formalized and
comprehensive text markup systems such as TeX, SGML/XML, troff, etc.,
but then his ZML system is another markup system that has kept markup
characters to a minimum.

This brings up an interesting observation in that using line-breaks
in plain text has variable importance, from mildly important
(arbitrarily used in paragraphs simply to trim line lengths to
something manageable), to quite important (preserving poetry lines,
and as Bowerbird would attest, everything he writes, even when
prose-like in meaning.)

The problem is knowing the relative importance of line-breaks in a
plain text document, especially if one does not understand the
language to ascertain context. ZML tries to tackle this issue, and I
think somewhat succeeds, albeit at a loss of richness, like the Model
T Ford and black paint. And as noted previously, *how* does one know
a particular plain text is ZML and thus falls under strict and
unambiguous plain text formatting rules?
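By way of illustration only, such a sniffing pass could look like the
following Python sketch. The header syntax here is invented for the
example; ZML defines no such declaration:

    def sniff_declaration(first_line):
        # Hypothetical marker, e.g.:
        #   ~~ zml 1.0 encoding=utf-8 breaks=significant ~~
        # Analogous to an XML declaration: it identifies the format,
        # version, and encoding without reading the whole file.
        if not first_line.startswith("~~ zml"):
            return None
        fields = first_line.strip("~ \n").split()
        info = {"format": fields[0], "version": fields[1]}
        info.update(f.split("=", 1) for f in fields[2:] if "=" in f)
        return info

    print(sniff_declaration("~~ zml 1.0 encoding=utf-8 breaks=significant ~~\n"))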
Jon Noring From Bowerbird at aol.com Mon Jun 26 10:18:11 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jun 26 10:18:19 2006 Subject: [gutvol-d] the end of the line Message-ID: <38f.548879d.31d170d3@aol.com> Line-breaks are mark-up. They don't add anything whatsoever to the text itself and are completely arbitrarily decided, usually based on the technology that is used to display the actual content. You can deny the difference between structure, content and presentation all you want, but it is perfectly possible to reformat a book using columns instead of lines without changing the actual content. And where will your precious line-breaks go in that case? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060626/9cd35cfc/attachment.html From gbnewby at pglaf.org Mon Jun 26 11:09:27 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Jun 26 11:09:29 2006 Subject: !@!Re: [gutvol-d] ebooks libre et gratuits (fwd) In-Reply-To: References: Message-ID: <20060626180927.GB4897@pglaf.org> We could probably run a mirror of this... is anyone in touch with the folks (perhaps in French)? It would take some cooperation from their end (such as an rsync server) to run a good mirror. -- Greg > ---------- Forwarded message ---------- > Date: Sun, 25 Jun 2006 10:25:19 -0400 > From: Janet Kegg > To: Project Gutenberg Volunteer Discussion > Subject: Re: [gutvol-d] ebooks libre et gratuits > > > The site is now available again: http://www.ebooksgratuits.com/ > > See the front page of the Web site for what I believe (my French is > almost nonexistent) is an explanation of what happened. > > On Sun, 25 Jun 2006 09:58:58 -0400, you wrote: > > >Ebooks libre et gratuits had an arrangement with MH apparently where their > >books would appear on PG eventually. Now that the ebooks web site is no > >more, what will happen to the ebooksgratuits which did not make it to PG? > >Will all of this work have to be repeated by someone else? Is there an > >archive anywhere of this enormous quantitiiy of work? Why did > >ebooksgratuits disappear? Pressure from Canadian publishers/ government? > >Is there an unknown story here? > > > >nwolcott2@post.harvard.edu > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From desrod at gnu-designs.com Mon Jun 26 11:38:20 2006 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Mon Jun 26 11:39:17 2006 Subject: !@!Re: [gutvol-d] ebooks libre et gratuits (fwd) In-Reply-To: <20060626180927.GB4897@pglaf.org> References: <20060626180927.GB4897@pglaf.org> Message-ID: > We could probably run a mirror of this... is anyone in touch with > the folks (perhaps in French)? It would take some cooperation from > their end (such as an rsync server) to run a good mirror. I'd be more than happy to mirror it, if I knew what it was I was supposed to fetch and mirror ;) (Yes, my last name is Desrosiers, a French name, but my French is so rusty, you don't want me to translate that for you ;) David A. 
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Mon Jun 26 12:07:24 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jun 26 12:07:45 2006 Subject: [gutvol-d] the end of the line Message-ID: <37c.5655d2e.31d18a6c@aol.com> jeroen said: > Although I agree with Michael that there is no need > to preserve things as linebreaks in most texts -- ok, well you and michael agree. that's good. :+) but what do you say to end-users who want that info? somehow, "tough luck, kid, _we_ don't think it's necessary" doesn't sound like the kind of thing _i_ want to tell people. because that's the type of statement that makes people go off to a different cyberlibrary. that's my whole point. (and to all of the other people who responded on similar "theoretical" grounds, i'm truly sorry you missed the point.) > if you really need to go to that level of detail, there > is always the original or the scans to fall back upon well, neither of those gives you the flexibility of digital text. but yes, a tight coupling of the two forms is the best method. you will note that those "digital reprints" from jose menendez allow a reader to summon the scan of the page with one click. (since the page already looks like the scan anyway, there might be little reason to do it, though, except to verify that similarity. but this constant willingness to demonstrate the verisimilitude will be the proof that makes people comfortable with the use of the smaller-sized digital reprint, with its expanded functionality, as opposed to the bigger, slower, dumber collection of scans. anyone who has proofread a scan against reflowed text knows the reflowing makes that task immensely more difficult though, so you'll never attain the same confidence in the text's accuracy.) > I want to make a case for preserving page numbers, > if not at least as recognisable anchors in text, and only for > those books being referenced to regularly by other books. page-numbers are retained in many e-texts these days... but i'm sure you remember we all had this same argument about page-numbers. i'm confident that -- down the line -- sentiment will similarly change to be in favor of line-breaks. in general, i've just been content to wait it out until the change; but seeing all the e-texts as they cross my screen downloading made me realize again the sadness of the discarded line-breaks. > This excludes most fiction, but is particularly important for > scientific works, which have constructed a kind of paper web > with cross references mainly based on page numbers. there are plenty of cross-references made to works of fiction. and the concept of "books reading each other" would require that _all_ of our books are brought under the same umbrella... > In long term, such references of course should give way to proper > references to the actual paragraph or sentence being referenced good! you recognize the need for a finer-grained pointer than the page. because that's the kind of thinking that leads to line-break retention. you can narrow things down rather specifically when you point to the range that's represented from page-19-line-7 to page-21-line-14, or from page-87-line-6 to page-87-line-8, can't you? not only that, this kind of reference also works for the person who only has the paper copy of the book, not the e-book, if the two are duplicates of each other. and that's precisely the type of capability i'll have in my viewer-program. 
even in a traditional browser, it wouldn't be hard to implement something roughly equivalent, though. the user could specify some text with a link, and after going to the precise point of the link, the browser could then execute a "find" command for the specified text. it wouldn't be hard at all, and would seem to give a rather exact form of pointing to a specific place. it has the benefit of being implemented entirely outside of the document, as well, which i see as being tremendously important. if all our links need markup in the original document to be implemented, as is the present case, we're _never_ going to be able to quickly get to a point of profuse interlinks. we'll get thoroughly bogged down in the quicksand of heavy markup first... (for an example of that, take a look at the markup which jon noring posted, and then read through that particular diversion of this thread. the horrors!) > but as a practical ad-interim solution, staying with page numbers will > increase the number of texts we can digitize with our limited means. it doesn't cost anything to retain the line-break information. > I would however, like to see the collection be incorporated in a kind of > wiki-like system, where people can add -- without tampering with the static > source texts -- annotations, add tagging and create live cross references i've had a demo up for some time now showing "continuous proofreading". > http://users.aol.com/bowerbird/proof_wiki.html i also used a similar template in these demo-books: > http://www.greatamericannovel.com/mabie/mabiep001.html > http://www.greatamericannovel.com/myant/myantc001.html > http://www.greatamericannovel.com/ahmmw/ahmmwc001.html > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html this system could easily be elaborated upon to build what you requested here. indeed, i will be pouring all of the p.g. texts that i'll be handling -- perhaps some 5000-6000, as near as i can tell -- into just this type of system, within the next 6 months, and i would be open to any ideas that you might have... heck, design a webpage to do what you want, and i will use it as the template. you know me, i don't even care if it "validates", as long as it's easy and it works. *** andrew said: > There are places such as wikisource.org, where you could add the texts > and start providing links such as you mention here immediately. i'll check out wikisource.org to see what kind of capabilities they offer. in the past, when i've looked at existing sites, it has seemed that wikis aren't geared to do things -- like populate pages -- on a massive scale. even rather fundamental things like batch f.t.p. are sometimes missing. and when you're dealing with thousands, or tens of thousands, of files, it becomes absolutely necessary to deal with them in a template fashion. i also think there's a good reason jeroen asked for a "wiki-like system", and not a wiki per se, as indicated by his concern about "tampering" with the static source texts. the thought is that the original source -- and indeed, the string of comments as well -- must be inviolate. that's because the idea is to build a body of thought around a text, of which links -- intrasystem, and outgoing and incoming -- are a very crucial aspect. and it's not possible to link into a wiki proper, because what was there yesterday might well be gone today, only to reappear in different form tomorrow. you can't link into a pile of sand. oh sure, you could instruct users to leave link markup untouched. 
and they might even follow your instructions. (yeah, right.) still, that will interfere with refactoring, and get very crufty before long. besides, a good part of the give-and-take of this kind of conversation involves letting all of the arguments stand, rather than editing them. (and especially rather than "editing them by deletion".) let the future examine all the arguments, and see which ones stand the test of time. so you need to have stability for the process itself, not just for the links. -bowerbird p.s. jeroen, if you want to provide me a template, i could use it sooner rather than later, the better to architect it into my overall work-flow... -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060626/042334d3/attachment.html From prosfilaes at gmail.com Mon Jun 26 12:08:33 2006 From: prosfilaes at gmail.com (David Starner) Date: Mon Jun 26 12:08:37 2006 Subject: [gutvol-d] Automated readability scores for PG eBooks In-Reply-To: References: <20060625215923.GA18811@pglaf.org> <6d99d1fd0606260003v1da9790ep3aed09dd6fc9414@mail.gmail.com> Message-ID: <6d99d1fd0606261208ya731c40q665b5226b05359bc@mail.gmail.com> On 6/26/06, Scott Lawton wrote: > While I agree that it would not be worth adding readability score if it had much > impact on these and other worthy goals, But if it doesn't, then those goals aren't reasons _for_ adding it. > There are lots and lots of cool things that could be done with the catalog. We could start with the results of stripping the header and running wc on it. That strikes me at least as useful as this result. Also, the ten or twelve most common words in the book after stripping the ten or twelve most common words in the English language. > Even in the context of the above, the scores would provide a great starting point for > being improved with manual cataloging and literacy labeling. I don't think so. It's downright useless for manual cataloging, as it only handles that one dimension. I don't think it will help literacy labeling much, either, which is best done manually. > Don't let the perfect stand in the way of the good. But I don't think having these numbers anywhere prominent is good. Right now our pages only have a few pieces of important information; minutia like this should go to a page linked to a page linked only from the book page, which we can fill with various stats to our hearts content. It also seems a little weird to have some proprietary reading level numbers on the system, instead of the Fog index or the Flesch-Kincaid Readability tests. It feels like an advertisement. From tony at baechler.net Mon Jun 26 11:47:17 2006 From: tony at baechler.net (Tony Baechler) Date: Mon Jun 26 12:08:40 2006 Subject: [gutvol-d] ftp.archive.org Message-ID: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net> Hi list, I know that often ftp.archive.org is down for a few days at a time, but it has been down now for almost all of June. Is this permanent? Is ftp access via ftp.archive.org ended? I prefer it over ftp.ibiblio.org for PG files because it is significantly faster. If ftp access is no longer available, can anyone recommend a fast mirror that is kept frequently up to date? I tried snowy.arsc.alaska.edu but it wasn't as current as I would like. I'm planning to download several thousand zip files so a fast mirror is appreciated. I'm sure http is faster but I would prefer ftp if possible. 
Besides http://www.gutenberg.org/dirs/ isn't really much faster than metalab.unc.edu, AKA ftp.ibiblio.org. Is there a chance that ftp.archive.org has moved to a different host or ip address? I'm running ncftp for Windows so I don't think it's a caching or dns problem. I think I tried under Linux as well with similar results. It tries for about a minute and times out. I tried with and without passive mode but it doesn't matter since I can't connect. I am in California, US. -- No virus found in this outgoing message. Checked by AVG Anti-Virus. Version: 7.1.394 / Virus Database: 268.9.4/375 - Release Date: 6/25/06 From traverso at dm.unipi.it Mon Jun 26 13:10:34 2006 From: traverso at dm.unipi.it (Carlo Traverso) Date: Mon Jun 26 13:06:09 2006 Subject: !@!Re: [gutvol-d] ebooks libre et gratuits (fwd) In-Reply-To: <20060626180927.GB4897@pglaf.org> (message from Greg Newby on Mon, 26 Jun 2006 11:09:27 -0700) References: <20060626180927.GB4897@pglaf.org> Message-ID: <200606262010.k5QKAYR09348@pico.dm.unipi.it> >>>>> "Greg" == Greg Newby writes: Greg> We could probably run a mirror of this... is anyone in touch Greg> with the folks (perhaps in French)? It would take some Greg> cooperation from their end (such as an rsync server) to run Greg> a good mirror. -- Greg >> ---------- Forwarded message ---------- Date: Sun, 25 Jun 2006 >> 10:25:19 -0400 From: Janet Kegg To: Project >> Gutenberg Volunteer Discussion Subject: >> Re: [gutvol-d] ebooks libre et gratuits >> >> >> The site is now available again: http://www.ebooksgratuits.com/ >> I have written to coolmicro, if I don't hear back shortly I'll ask to common friends that work with him. Carlo From Bowerbird at aol.com Mon Jun 26 13:31:43 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jun 26 13:31:56 2006 Subject: [gutvol-d] Automated readability scores for PG eBooks Message-ID: <55d.27a30.31d19e2f@aol.com> david said: > We could start with the results of stripping the header and the "footer", where most of the legalese is these days. does anyone here know the best way to strip both of them? > Also, the ten or twelve most common words in the book after > stripping the ten or twelve most common words in the English language. you'd need to strip more than a dozen. below is a list from wikipedia. there's a strong power-law in word usage. unless you strip 200-500 common words, it probably won't reveal anything very interesting... -bowerbird > http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists Here are the top 100 words (from Project Gutenberg texts) in alphabetical order: a about after all an and any are as at be been before but by can could did do down first for from good great had has have he her him his I if in into is it its know like little made man may me men more mr much must my no not now of on one only or other our out over said see she should so some such than that the their them then there these they this time to two up upon us very was we were what when which who will with would you your -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060626/f5d86011/attachment-0001.html From scott_bulkmail at productarchitect.com Mon Jun 26 13:36:13 2006 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Mon Jun 26 13:53:03 2006 Subject: [gutvol-d] Automated readability scores for PG eBooks In-Reply-To: <6d99d1fd0606261208ya731c40q665b5226b05359bc@mail.gmail.com> References: <20060625215923.GA18811@pglaf.org> <6d99d1fd0606260003v1da9790ep3aed09dd6fc9414@mail.gmail.com> <6d99d1fd0606261208ya731c40q665b5226b05359bc@mail.gmail.com> Message-ID: >>Even in the context of the above, the scores would provide a great starting point for >>being improved with manual cataloging and literacy labeling. > >I don't think so. It's downright useless for manual cataloging, as it >only handles that one dimension. Isn't "useless" a bit strong? Sure, it's only one dimension; that's true of any single piece of information. Right now, a manual cataloger looking for children's books would probably look for known titles and authors, search for some likely keywords ... and then what? How will they surface children's books that they don't already know about? A list of the "most readable" (no matter how flawed the metric) is a MUCH better starting point than the complete list of books at PG. >I don't think it will help literacy >labeling much, either, which is best done manually. Actually, readability scores are widely used in education. I'm sure they have their detractors, but that's true of almost anything. Even with manual labelling (which hasn't been done to date and therefore I don't see how it's an argument against an automated solution), scores are also useful. >It also seems a little weird to have some proprietary reading level >numbers on the system, instead of the Fog index or the Flesch-Kincaid >Readability tests. It feels like an advertisement. I'm in favor of any and all readability scores. If these existing scores were already in place, I probably wouldn't have bothered to comment. Or, if the choice was Fog + F-K vs. some other score, I would choose the most common score. But I haven't seen anyone offer to add Fog or F-K, so I welcome useful info from any source. Just so it's clear: I have no connection with Rocket Reader. I'm not even sure if I ever heard of them before Greg's note. I've thought for a long time that it would be useful to include readability scores. Scott From JBuck814366460 at aol.com Mon Jun 26 14:12:17 2006 From: JBuck814366460 at aol.com (Jared Buck) Date: Mon Jun 26 14:12:25 2006 Subject: [gutvol-d] ftp.archive.org In-Reply-To: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net> References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net> Message-ID: <44A04DB1.6070608@aol.com> Hi Tony, I am in California too - southern california to be exact. I don't know why it's not working for you because it works fine for me. Maybe your FTP program is not connecting correctly. Me, i use wget (avilable for windows as well as a default on Linux) for my Gutenberg downloading needs. I plan to get an external hard drive (preferably an Iomega drive) later, probably for my birthday next Thursday, which i can then use to store the Gutenberg etexts and save me some disk space on my current drive. 
I would be using rsync to do that (check the Mirroring FAQ on PG if you don't know what that is), apparently it's much faster than wget or even FTP because it doesn't check every single file for hours to find updates, it keeps a list of all files and only downloads the ones that specifically need updating, saves you a couple of hours of time. Or at least that's what Aaron (Cannon) told me. Jared Tony Baechler wrote on 26/06/2006, 11:47 AM: > Hi list, > > I know that often ftp.archive.org is down for a few days at a time, > but it has been down now for almost all of June. Is this > permanent? Is ftp access via ftp.archive.org ended? I prefer it > over ftp.ibiblio.org for PG files because it is significantly > faster. If ftp access is no longer available, can anyone recommend a > fast mirror that is kept frequently up to date? I tried > snowy.arsc.alaska.edu but it wasn't as current as I would like. I'm > planning to download several thousand zip files so a fast mirror is > appreciated. I'm sure http is faster but I would prefer ftp if > possible. Besides http://www.gutenberg.org/dirs/ isn't really much > faster than metalab.unc.edu, AKA ftp.ibiblio.org. Is there a chance > that ftp.archive.org has moved to a different host or ip address? > > I'm running ncftp for Windows so I don't think it's a caching or dns > problem. I think I tried under Linux as well with similar > results. It tries for about a minute and times out. I tried with > and without passive mode but it doesn't matter since I can't > connect. I am in California, US. > > > -- > No virus found in this outgoing message. > Checked by AVG Anti-Virus. > Version: 7.1.394 / Virus Database: 268.9.4/375 - Release Date: 6/25/06 > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > -- . .:. .:::. .:::::. ***.:::::::.*** *******.:::::::::.******* Dmitri Yalovsky ********.:::::::::::.******** ********.:::::::::::::.******** USS Authority *******.::::::'***`::::.******* ******.::::'*********`::.****** Asst. Chief of Engineering ****.:::'*************`:.**** *.::'*****************`.* .:' *************** . . From traverso at dm.unipi.it Mon Jun 26 14:16:54 2006 From: traverso at dm.unipi.it (Carlo Traverso) Date: Mon Jun 26 14:12:29 2006 Subject: !@!Re: [gutvol-d] ebooks libre et gratuits (fwd) In-Reply-To: <200606262010.k5QKAYR09348@pico.dm.unipi.it> (message from Carlo Traverso on Mon, 26 Jun 2006 22:10:34 +0200) References: <20060626180927.GB4897@pglaf.org> <200606262010.k5QKAYR09348@pico.dm.unipi.it> Message-ID: <200606262116.k5QLGsq10323@pico.dm.unipi.it> >>>>> "Carlo" == Carlo Traverso writes: >>>>> "Greg" == Greg Newby writes: Greg> We could probably run a mirror of this... is anyone in touch Greg> with the folks (perhaps in French)? It would take some Greg> cooperation from their end (such as an rsync server) to run Greg> a good mirror. -- Greg >>> ---------- Forwarded message ---------- Date: Sun, 25 Jun 2006 >>> 10:25:19 -0400 From: Janet Kegg To: Project >>> Gutenberg Volunteer Discussion Subject: >>> Re: [gutvol-d] ebooks libre et gratuits >>> >>> >>> The site is now available again: >>> http://www.ebooksgratuits.com/ >>> Carlo> I have written to coolmicro, if I don't hear back shortly Carlo> I'll ask to common friends that work with him. Carlo> Carlo Coolmicro answered, who thanks very much for our interest, but they have already planned a mirror, so a second one is not critical. I send to Greg an adddress for further contacts. 
Carlo From gbnewby at pglaf.org Mon Jun 26 15:35:14 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Jun 26 15:35:15 2006 Subject: [gutvol-d] ftp.archive.org In-Reply-To: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net> References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net> Message-ID: <20060626223514.GA11041@pglaf.org> On Mon, Jun 26, 2006 at 11:47:17AM -0700, Tony Baechler wrote: > Hi list, > > I know that often ftp.archive.org is down for a few days at a time, > but it has been down now for almost all of June. Is this > permanent? Is ftp access via ftp.archive.org ended? I prefer it > over ftp.ibiblio.org for PG files because it is significantly > faster. If ftp access is no longer available, can anyone recommend a > fast mirror that is kept frequently up to date? I tried > snowy.arsc.alaska.edu but it wasn't as current as I would like. I'm > planning to download several thousand zip files so a fast mirror is > appreciated. I'm sure http is faster but I would prefer ftp if > possible. Besides http://www.gutenberg.org/dirs/ isn't really much > faster than metalab.unc.edu, AKA ftp.ibiblio.org. Is there a chance > that ftp.archive.org has moved to a different host or ip address? I'm surprised you can connect to ftp.archive.org. I can't. We stopped pushing the collection to them several weeks ago. They had a hardware failure, and were unresponsive. Today, there are three master collections where new eBooks are pushed: http://www.gutenberg.org on iBiblio....see this for direct access to the raw files: ftp://ftp.ibiblio.org/pub/docs/books/gutenberg http://gutenberg.readingroo.ms same as ftp://readingroo.ms/gutenberg http://snowy.arsc.alaska.edu/gutenberg same as ftp://snowy.arsc.alaska.edu/mirrors/gutenberg They all get new files immediately. The catalog at gutenberg.org is only updated daily, and of course mirrors have their own schedule. You can check "gutenberg.dcs" in the top-level mirror directory to see if they have updated in the past week (we update gutenberg.dcs Sunday mornings EST). I hope this helps. My guess is the readingroo.ms server will give you the best throughput (though it will have some brief downtime, then possibly be heavily loaded during the world ebook fair, http://www.worldebookfair.com). Are there any Debian whizzes on this list who might want to help look after the readingroo.ms server with me? -- Greg > I'm running ncftp for Windows so I don't think it's a caching or dns > problem. I think I tried under Linux as well with similar > results. It tries for about a minute and times out. I tried with > and without passive mode but it doesn't matter since I can't > connect. I am in California, US. > > > -- > No virus found in this outgoing message. > Checked by AVG Anti-Virus. > Version: 7.1.394 / Virus Database: 268.9.4/375 - Release Date: 6/25/06 > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From joey at joeysmith.com Mon Jun 26 23:31:47 2006 From: joey at joeysmith.com (joey) Date: Mon Jun 26 23:47:34 2006 Subject: [gutvol-d] ftp.archive.org In-Reply-To: <20060626223514.GA11041@pglaf.org> References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net> <20060626223514.GA11041@pglaf.org> Message-ID: <20060627063147.GB2650@joeysmith.com> On Mon, Jun 26, 2006 at 03:35:14PM -0700, Greg Newby wrote: > > Are there any Debian whizzes on this list who might want to help look > after the readingroo.ms server with me? How can I help? 
I've been a Debian admin for going on 6 years now. From pm40fr at yahoo.fr Tue Jun 27 01:21:17 2006 From: pm40fr at yahoo.fr (pat) Date: Tue Jun 27 01:28:00 2006 Subject: [gutvol-d] EbooksGratuits online Message-ID: <20060627082117.59885.qmail@web26809.mail.ukl.yahoo.com> Hi, I am Patrick from ebooksgratuits 1/ As Carlo told you, thank you very much for being concerned by what happened to us. Now, we have moved www.ebooksgratuits.com to a very robust provider, and we will have a mirror soon at www.ebooksgratuits.org , so that it does not happen again. We have now collected some funds through a paypal button and can afford to secure our website. Such a predicament is indeed quite painful. 2/ Moreover,as our clearance process (to use a PG word) is quite light, we are not in a position to justify the life +50 rule on some of our books (some translators especially are hard to find out), which is not something that PG would not want, to mirror help 3/ We have now transferred around 160 books to PG thanks to huge recent help (Chuck Greif) and the enduring patience of Tonya. We are now going on more slowly, as the problem is to find out the sources that can be cleared. 4/ Should we disappear again, be aware that you can also have access plenty of our files through P2P (edonkey/emule), search on "ebooksgratuits". --------------------------------- Yahoo! Mail r?invente le mail ! D?couvrez le nouveau Yahoo! Mail et son interface r?volutionnaire. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060627/e9659641/attachment.html From schultzk at uni-trier.de Tue Jun 27 03:42:19 2006 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Tue Jun 27 03:42:25 2006 Subject: [gutvol-d] the end of the line In-Reply-To: <37c.5655d2e.31d18a6c@aol.com> References: <37c.5655d2e.31d18a6c@aol.com> Message-ID: Hi Everybody, I have to admit I have not followed this thread fully, believe I understand the arguements well enough. It all boils down to what you want and what is practical and practically possible. Linking and references are a problem of syncronisation. Using hard copies you always give the reference author, publisher, year, edition and page (optional line) when a reference is to another book, article. All this information is absolute necessary. The layout of the publication could change or even the text itsself!! The other aspect is that a reference is always made to text and not lines or pages ( blanks, and punctuation is also text in the wider sense)! What is needed is a method to keep all this information syncronized. For e-text(books) you need mark-up in one form or another and a system that keeps track of everything. That is all changes, links, references, changes in text and its position. As mentioned you need an umbrella to keep everything under control. In other words a sub set in which everthing is syncronized. There is no one method that is fool proof and many systems out there. As sugested one could use a method in which the critical edition are marked up and the user can state what he wants to see. That makes the files very large and sometimes difficult to find what you want. I will go with bowerbirds umbrella and take what I can get. To me you can not have your cake and eat it too. That is easy mark- up, easy to read without preprocessing the text or using a viewer ! Regards Keith. 
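[A minimal sketch, in Python, of the "reference the text, not the page
or line" idea in the last few messages: given a quoted passage, find it
in whatever copy of the e-text is at hand, ignoring how that copy
happens to be wrapped, and report a line/column in that copy. The
function name and behavior are illustrative, not part of any proposed
system.]

import re

def resolve_quote(text, quote):
    # Collapse whitespace differences so the same quote matches a
    # reflowed or re-wrapped copy of the text.
    pattern = r"\s+".join(re.escape(word) for word in quote.split())
    m = re.search(pattern, text)
    if m is None:
        return None
    # Translate the match back into a 1-based (line, column)
    # position in this particular copy.
    before = text[:m.start()]
    line = before.count("\n") + 1
    col = m.start() - (before.rfind("\n") + 1) + 1
    return line, col

text = "it was the best\nof times, it was\nthe worst of times"
print(resolve_quote(text, "best of times"))   # -> (1, 12)

[The same quote resolves correctly in a copy wrapped at a different
column, which is the point: the anchor travels with the text rather
than with any one layout.]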
From hart at pglaf.org Tue Jun 27 07:20:05 2006 From: hart at pglaf.org (Michael Hart) Date: Tue Jun 27 07:20:08 2006 Subject: [gutvol-d] ftp.archive.org In-Reply-To: <20060627063147.GB2650@joeysmith.com> References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net> <20060626223514.GA11041@pglaf.org> <20060627063147.GB2650@joeysmith.com> Message-ID: I just wanted to add my personal thanks! Thanks!!! Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg Blog at http://hart.pglaf.org From Bowerbird at aol.com Tue Jun 27 14:41:27 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Jun 27 14:41:40 2006 Subject: [gutvol-d] a tool for your grandma to download p.g. e-texts en masse Message-ID: <2c5.a8525c6.31d30007@aol.com> what are the feelings here on releasing a tool for your grandma to download p.g. e-texts en masse? although it seems in line with "unlimited distribution", it will also mean people scraping texts indiscriminately. would someone who has a stake in the bandwidth used please give me a definite answer? thanks. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060627/e51ce705/attachment.html From brad at chenla.org Tue Jun 27 17:34:59 2006 From: brad at chenla.org (Brad Collins) Date: Tue Jun 27 17:32:26 2006 Subject: [gutvol-d] the end of the line In-Reply-To: <38f.548879d.31d170d3@aol.com> (Bowerbird@aol.com's message of "Mon, 26 Jun 2006 13:18:11 EDT") References: <38f.548879d.31d170d3@aol.com> Message-ID: Bowerbird@aol.com writes: > Line-breaks are mark-up. They don't add anything whatsoever to the text > itself and are completely arbitrarily decided, usually based on the > technology that is used to display the actual content. You can deny the > difference between structure, content and presentation all you want, but > it is perfectly possible to reformat a book using columns instead of > lines without changing the actual content. And where will your precious > line-breaks go in that case? Perhaps it's better to think of line-breaks as an arbitrary part of layout, rather than as mark-up. In a markup language you can specify if the value of an element ignores whitespace and line breaks (like html
<p>) or preserves them (like html <pre>).
      
      But line breaks are treated very differently by text editors.
      
Some text editors and email clients will auto-insert soft line breaks
at column markers.  This gives the user the illusion of having line
breaks, but if they send the text to someone who doesn't have this
feature, the person on the other side will just see extremely long
lines which scroll faaaaar off the screen.
      
Older editors like Emacs allow you to auto-insert hard line-breaks as
you type.  Then, when you edit text or cut and paste, you use a
command to "re-fill" the line or paragraph, reformatting the text to
break lines at a defined column marker.
      
Different programming languages treat whitespace and line breaks
      completely differently.  Some languages require you to explicitly
      indicate line breaks with markup like "\n" or ";".
      
      Since everyone seems to have a different opinion on how to treat
      whitespace and line breaks, it's best to specify very clearly how your
      language or markup treats them.
      
But it would be foolish to rely on literal line-breaks as markup for
preserving line breaks, for the simple reason that a lot of software
out there will simply not respect them as such.
      
      b/
      
      -- 
      Brad Collins , Banqwao, Thailand
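[Brad's "re-fill" operation is easy to make concrete. A minimal Python
sketch using only the standard library: the old hard breaks are thrown
away and new ones inserted at the chosen column, which is exactly why
software cannot be relied on to preserve them.]

import textwrap

hard_wrapped = """Older editors like Emacs let you insert hard
line-breaks as you type, then re-fill a paragraph after editing
so that it breaks at a chosen column again."""

# Re-filling: join everything back into one long line, then
# re-break at the new width. The breaks are presentation, not content.
unwrapped = " ".join(hard_wrapped.split())
print(textwrap.fill(unwrapped, width=40))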
      From j.hagerson at comcast.net  Tue Jun 27 18:34:56 2006
      From: j.hagerson at comcast.net (John Hagerson)
      Date: Tue Jun 27 18:45:30 2006
      Subject: [gutvol-d] Daily progress reports missing from [posted]?
      Message-ID: <007401c69a53$1318e700$0200a8c0@sarek>
      
      I have not received a daily progress report through [posted] since 22-JUN.
      Have they been temporarily suspended or discontinued? Have they been moved
      to another list?
      
      Thank you.
      
      
      From Bowerbird at aol.com  Tue Jun 27 19:13:26 2006
      From: Bowerbird at aol.com (Bowerbird@aol.com)
      Date: Tue Jun 27 19:13:33 2006
      Subject: [gutvol-d] the end of the line
      Message-ID: <250.cbe5e9b.31d33fc6@aol.com>
      
      brad-
      
      the post you quoted, from me, was an errant send.
      it was actually a message posted by someone else.
      
      in general, i don't think much about how existing
      software will treat my files, because i consider it
      my job to deliver software that does what i want.
      
      non-programmers have to live within their apps,
      but as a programmer, i create the worlds i want...
      
      the question of how to "mark up" line-breaks is a
      non-starter for me.   the plain-ascii p.g. e-texts
      already have hard line-endings in them indicating
      line-breaks.   those are the ones inserted by p.g.
      my suggestion was that the original line-breaks
      should be used instead.
      
      i think the discussion has run its course this time.
      maybe it will come up again.   or maybe it will not.
      
      and maybe there will be demand from users for
      e-texts that mimic their hard-copy counterparts,
      or maybe there will not be.
      
      maybe jeroen will work up a template i can use.
      or maybe he won't.
      
      time will tell on all these things.   or maybe it won't.
      
      -bowerbird
      -------------- next part --------------
      An HTML attachment was scrubbed...
      URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060627/d9e49c3b/attachment.html
      From michael.p.may at earthlink.net  Tue Jun 27 21:07:59 2006
      From: michael.p.may at earthlink.net (Michael May)
      Date: Tue Jun 27 21:08:02 2006
      Subject: [gutvol-d] How to digitize SRR's Five Laws?
      Message-ID: <31763220.1151467680194.JavaMail.root@elwamui-royal.atl.sa.earthlink.net>
      
      Hi all,
      
      I am Michael May, new "Classics Editor" at dLIST, the Digital Library of Information Science and Technology: http://dlist.sir.arizona.edu/
      
dLIST has received written permission from the copyright owner of works by S.R. Ranganathan to post electronic copies of several of SRR's books at the dLIST site, including the original 1931 edition of The Five Laws of Library Science, the main premise of which is "Books are for use!" Despite being out of print (a reprint is planned for later this year by Ess Ess Publications of India), Five Laws is arguably the most important work in library science to date.
      
      We have experimented with PDF by posting the prefatory pages and Chapter 1 here:
      http://genie.sir.arizona.edu/1115/
      
      However, Five Laws is over 500 pages and includes numerous illustrations. I believe a text or html version would be much easier to access and preserve.
      
      What advice do you have about how to proceed? I was thinking about starting by recruiting volunteers from the LIS community to transcribe the text. What should I think about or plan for before asking people to help? Does Project Gutenberg already have resources available that could help us?
      
      I'd very much appreciate any suggestions or advice.
      
      Thanks.
      
      Mike
      From hyphen at hyphenologist.co.uk  Tue Jun 27 23:08:58 2006
      From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
      Date: Tue Jun 27 23:09:10 2006
      Subject: [gutvol-d] a tool for your grandma to download p.g. e-texts en
      	masse
      In-Reply-To: <2c5.a8525c6.31d30007@aol.com>
      References: <2c5.a8525c6.31d30007@aol.com>
      Message-ID: 
      
      On Tue, 27 Jun 2006 17:41:27 EDT,  Bowerbird@aol.com wrote:
      
      |what are the feelings here on releasing a tool for 
      |your grandma to download p.g. e-texts en masse?
      
      What have you against Grandparents?
They/I, as grandparents, use the same tools that you do.
      
      -- 
      Dave Fawthrop  
      "Intelligent Design?" my knees say *not*. 
      "Intelligent Design?" my back says *not*.
      More like "Incompetent design". Sig (C) Copyright Public Domain
      
      From sly at victoria.tc.ca  Tue Jun 27 23:39:02 2006
      From: sly at victoria.tc.ca (Andrew Sly)
      Date: Tue Jun 27 23:39:05 2006
      Subject: [gutvol-d] How to digitize SRR's Five Laws?
      In-Reply-To: <31763220.1151467680194.JavaMail.root@elwamui-royal.atl.sa.earthlink.net>
      References: <31763220.1151467680194.JavaMail.root@elwamui-royal.atl.sa.earthlink.net>
      Message-ID: 
      
      
      Michael,
      
      Thanks for your message.
      
      Disclaimer: These comments are just my personal opinion,
      based on what I've seen from being involved with PG for
      a decent number of years.
      
      Yes, PG volunteers have found that, for many purposes,
      a text or html file can be preferable to a pdf.
      To start with, you have a smaller file size, which makes
      the file more accessible over slow connections.
      You can also run into extra difficulties if you
      want to update the file, or correct some errors that
      are found in a year or two's time.
      
In my own experience, lots of illustrations certainly do
add to the complexity of the task.
      
      One point to consider about looking for volunteers from the
LIS community is that you might be getting yourself into a
      big discussion of markup, encoding process, documentation,
      etc. before you get going.
      
      Have you done much work transcribing books before?
      If you were a new PG volunteer, I would gently suggest
      that a project of this nature is too much to tackle,
      and point you towards www.pgdp.net to start with some
      easy pages there.
      
      You ask "Does Project Gutenberg already have resources available
      that could help us?" Interesting question. By far the biggest
resource PG has is its many volunteers who directly (or indirectly)
      contribute to it. If you have any specific requests or problems,
      we could probably direct you to someone who has dealt with it
      before. (With 18,000 books, we've had plenty of issues to
      deal with.)
      For a general overview, you could try reading:
      http://www.gutenberg.org/faq/
      although some of the material there is slightly outdated now.
      
Of course the tempting possibility I could mention is requesting
      non-exclusive permission for PG to distribute this text, and then
      we could run it through Distributed Proofreaders.
      
      Andrew
      
      On Tue, 27 Jun 2006, Michael May wrote:
      
      > Hi all,
      >
      > I am Michael May, new "Classics Editor" at dLIST, the Digital Library of Information Science and Technology: http://dlist.sir.arizona.edu/
      >
      > dLIST has received written permission from the copyright owner of works by S.R. Ranganathan to post electronic copies of several of SRR's books at the dLIST site, including the original 1931 edition of The Five Laws of Library Science, the main premise of which is "Books are for use!" Despite being out of print (a reprint is planned for later this year by Ess Ess Publications of India ), Five Laws is arguably the most important work in library science to date.
      >
      > We have experimented with PDF by posting the prefatory pages and Chapter 1 here:
      > http://genie.sir.arizona.edu/1115/
      >
      > However, Five Laws is over 500 pages and includes numerous illustrations. I believe a text or html version would be much easier to access and preserve.
      >
      > What advice do you have about how to proceed? I was thinking about starting by recruiting volunteers from the LIS community to transcribe the text. What should I think about or plan for before asking people to help? Does Project Gutenberg already have resources available that could help us?
      >
      > I'd very much appreciate any suggestions or advice.
      >
      > Thanks.
      >
      > Mike
      From tony at baechler.net  Wed Jun 28 00:40:54 2006
      From: tony at baechler.net (Tony Baechler)
      Date: Wed Jun 28 00:40:52 2006
      Subject: [gutvol-d] ftp.archive.org
      In-Reply-To: <44A04DB1.6070608@aol.com>
      References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
      	<44A04DB1.6070608@aol.com>
      Message-ID: <7.0.1.0.2.20060628003711.03fdd800@baechler.net>
      
      Hi.  Yes, I'm vaguely familiar with rsync.  The problem is that I 
      don't want each and every file posted.  I don't download html and 
      8-bit files for example.  I only download the zipped plain text 
      files.  Also I don't want some religious works.  Therefore rsync 
      won't help me.  As far as the external drive, that's not a bad idea 
      but I think I prefer DVD instead.  Finally, http://www.archive.org/ 
      is fine, just ftp doesn't work.  I tried on two different computers 
      so I don't think it's my settings.  I also have wget but prefer ncftp 
      as it's a dedicated ftp client.  I am near San Diego, CA.
      
      At 02:12 PM 6/26/06 -0700, you wrote:
      >Hi Tony, I am in California too - southern california to be exact.  I
      >don't know why it's not working for you because it works fine for me.
      >Maybe your FTP program is not connecting correctly.  Me, i use wget
      >(avilable for windows as well as a default on Linux) for my Gutenberg
      >downloading needs.
      >
      >I plan to get an external hard drive (preferably an Iomega drive) later,
      >probably for my birthday next Thursday, which i can then use to store
      >the Gutenberg etexts and save me some disk space on my current drive.  I
      >would be using rsync to do that (check the Mirroring FAQ on PG if you
      >don't know what that is), apparently it's much faster than wget or even
      >FTP because it doesn't check every single file for hours to find
      >updates, it keeps a list of all files and only downloads the ones that
      >specifically need updating, saves you a couple of hours of time.  Or at
      >least that's what Aaron (Cannon) told me.
      
      
      
      
      From tony at baechler.net  Wed Jun 28 00:52:27 2006
      From: tony at baechler.net (Tony Baechler)
      Date: Wed Jun 28 00:52:24 2006
      Subject: [gutvol-d] ftp.archive.org
      In-Reply-To: <20060626223514.GA11041@pglaf.org>
      References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
      	<20060626223514.GA11041@pglaf.org>
      Message-ID: <7.0.1.0.2.20060628004644.03fd5a20@baechler.net>
      
      Hi.  Thanks very much, the readingroo.ms server seems much 
      faster.  When I checked last, snowy.arsc.alaska.edu seemed to be a 
      few hours behind the other master sites.  I am no longer able to 
      connect to ftp.archive.org, it just times out.  I am not a Debian 
      expert but I do run a Debian server and know a reasonable amount 
      about it.  What needs doing?  I am not really a programmer but I know 
      how to install packages and set up things for the most part.  If 
      there is something that needs to be done, let me know and I'll see.
      
      At 03:35 PM 6/26/06 -0700, you wrote:
      
      >I hope this helps.  My guess is the readingroo.ms server will
      >give you the best throughput (though it will have some
      >brief downtime, then possibly be heavily loaded during the
      >world ebook fair, http://www.worldebookfair.com).
      >
      >Are there any Debian whizzes on this list who might want to help look
      >after the readingroo.ms server with me?
      >
      >   -- Greg
      
      
      
      
      From Bowerbird at aol.com  Wed Jun 28 01:43:45 2006
      From: Bowerbird at aol.com (Bowerbird@aol.com)
      Date: Wed Jun 28 01:43:55 2006
      Subject: [gutvol-d] a tool for your grandma to download p.g. e-texts en
      	masse
      Message-ID: <256.cf31f48.31d39b41@aol.com>
      
      dave said:
      >    What have you against Grandparents?
      
      it's just an expression, dave.
      it indicates "not technically inclined".
      
      and, like most such stereotypical shorthand,
      it's got a grain of truth, and not much more.
      
      hey, two of my best online e-book buddies
      -- nicholas hodson and meyer moldeven --
      are technically astute (nicholas especially),
      and they're both into their eighties now...
      
      i'm old enough to be a grandparent myself.         :+)
      
      -bowerbird
      -------------- next part --------------
      An HTML attachment was scrubbed...
      URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060628/dc2b03e8/attachment-0001.html
      From Bowerbird at aol.com  Wed Jun 28 02:36:08 2006
      From: Bowerbird at aol.com (Bowerbird@aol.com)
      Date: Wed Jun 28 02:36:13 2006
      Subject: [gutvol-d] How to digitize SRR's Five Laws?
      Message-ID: <2fd.7a682d2.31d3a788@aol.com>
      
      mike said:
      >    Despite being out of print 
      >    (a reprint is planned for later this year 
      >    by Ess Ess Publications of India 
      >    ), 
>    Five Laws is arguably the most important work in library science to date.
      
      that's quite a sad commentary, isn't it, that
      what is arguably _the_ most important work
      in library science to date is _out_of_print_...
      
      so congratulations on bringing it back to life.
      
      the .pdf versions you've made are not as useful
      as they could be, however, because you've just
      wrapped the scans into a .pdf.   that means that
      the text cannot be searched or copied out of it,
      and those are two of the big benefits of e-books.
      
      so yes, you are right they would be better with
      digital text.   but there's no need to transcribe.
      instead, o.c.r. the scans, correct the results, and
      then wrap that digital text into several formats:
      plain text could be one, .html could be another,
      and even .pdf (except this time with text that is
      searchable and could be copied out of the .pdf).
      further, the scans could also be used themselves.
      (but you should strive for higher-quality scans.)
      
      here's a rough sketch of how to proceed:
      1.   scan the book's pages.
      2.   clean up the scans.   (straighten, crop, etc.)
      3.   perform the o.c.r.
      4.   clean up the text.
      5.   proofread the text against the scans.
      6.   auto-convert the text to .html.
      7.   auto-convert the text to .pdf.
      
      i'd be willing to help guide you on any of the steps.
      (especially the last couple, which some people might
      try and tell you are "impossible".   don't believe them.)
      
      distributed proofreaders would probably also help
      if you could donate the text to project gutenberg.
      since there will be one copy in cyberspace anyway,
      tell the publisher there might as well be lots of 'em.
      (online copies don't really cannibalize print sales;
      indeed, there are some indications they feed 'em.)
      besides, _books_are_for_use_, are they not?          :+)
      
      for some pretty "digital reprint" examples to look at, see:
      >   http://www.ibiblio.org/ebooks/Mabie/Books_Culture.pdf
      >   http://www.ibiblio.org/ebooks/Cather/Antonia/Antonia.pdf
      >   http://www.ibiblio.org/ebooks/Einstein/Einstein_Relativity.pdf
      
      for a look at a system that enables volunteers to proofread, see:
      >   http://www.greatamericannovel.com/mabie/mabiep001.html
      >   http://www.greatamericannovel.com/myant/myantc001.html
      >   http://www.greatamericannovel.com/ahmmw/ahmmwc001.html
      >    http://www.greatamericannovel.com/sgfhb/sgfhbc001.html
      
      -bowerbird
      -------------- next part --------------
      An HTML attachment was scrubbed...
      URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060628/8a73521e/attachment.html
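[Step 6 of the sketch above -- "auto-convert the text to .html" -- can
be surprisingly small for plainly formatted prose. A minimal Python
illustration, treating blank lines as paragraph breaks; a real
conversion of Five Laws would also need handling for headings, tables,
and the illustrations. The sample text and title here are made up for
the example.]

import html

def text_to_html(text, title):
    # Blank lines separate paragraphs; line-breaks inside a
    # paragraph collapse to spaces, as in ordinary prose.
    paras = [p for p in text.split("\n\n") if p.strip()]
    body = "\n".join("<p>%s</p>" % html.escape(" ".join(p.split()))
                     for p in paras)
    return ("<html><head><title>%s</title></head>\n<body>\n%s\n"
            "</body></html>" % (html.escape(title), body))

sample = "CHAPTER I\n\nBooks are for use!\nThat is the first law."
print(text_to_html(sample, title="The Five Laws of Library Science"))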
      From desrod at gnu-designs.com  Wed Jun 28 04:10:43 2006
      From: desrod at gnu-designs.com (David A. Desrosiers)
      Date: Wed Jun 28 04:11:43 2006
      Subject: [gutvol-d] ftp.archive.org
      In-Reply-To: <7.0.1.0.2.20060628003711.03fdd800@baechler.net>
      References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
      	<44A04DB1.6070608@aol.com>
      	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
      Message-ID: 
      
      
      > Hi.  Yes, I'm vaguely familiar with rsync.  The problem is that I 
      > don't want each and every file posted.  I don't download html and 
      > 8-bit files for example.  I only download the zipped plain text 
      > files.  Also I don't want some religious works.  Therefore rsync 
      > won't help me.
      
       	I'm sorry... what? You can rsync exactly what files you wish, 
      recursively or not, pick and choose, with rsync... using the right 
      options. I mirror Gutenberg here with rsync, skipping the DVD files, 
      .mp3 files, .rar files and a few others, getting only the useful 
      copies of books.
      
       	What part of rsync's usage is confusing you?
      
      
      David A. Desrosiers
      desrod@gnu-designs.com
      http://gnu-designs.com
      From tony at baechler.net  Wed Jun 28 08:53:34 2006
      From: tony at baechler.net (Tony Baechler)
      Date: Wed Jun 28 08:53:31 2006
      Subject: [gutvol-d] New DVD ISO feedback sought
      In-Reply-To: <20060626093237.GA27369@pglaf.org>
      References: <20060626093237.GA27369@pglaf.org>
      Message-ID: <7.0.1.0.2.20060628084333.0426a7b0@baechler.net>
      
      Hi Greg,
      
      At the risk of sounding uninformed, why not include the copyrighted 
      books on the first DVD with as many titles as possible?  My 
      understanding is that PG must be allowed to at least noncommercially 
      distribute copyrighted works before they are added.  You wouldn't be 
selling the DVDs so I don't see a problem.  Most CC licenses
allow at least free noncommercial distribution anyway.  Is it just a
      matter of not enough space after the 3.5 GB of public domain titles?
      
      What about a DVD of only html books and no plain text?  The books 
      could directly be viewed in a browser.  Maybe a "best of" collection 
      but only with uncompressed html files and illustrations and on a DVD 
      instead of a CD.
      
      As far as PG's best work in terms of illustrations, I suggest 
      searching through the "posted" list archives for the word 
      "illustration."  I've noticed that David W and Joe sometimes comment 
      on images which stand out.  This might be a good basis for the best 
      of DVD described above.  Also, what about including musical scores in 
      one of these sets?
      
      I'm unfamiliar with Amazon's best of public domain list so I can't 
      comment on that.  One slight concern I would have with showing off 
      PG's best work is that some people might not be interested.  For 
      example, David W just posted five volumes on the life of George 
      Washington.  I'm sure it's interesting (I haven't looked at it yet) 
      but might not interest non-US readers and might be advanced for some 
      people.  It isn't exactly light reading.  I'm sure the text has few 
      errors and the html looks good but maybe it isn't of interest to 
      many.  This could be where the readability scores come in useful 
though.  Pick the best PG books with the nicest html and images that
are the easiest to read.
      
      Those are my thoughts.  Another possibility in the future would be a 
      CD or DVD with Braille files.  National Braille Press in the US is 
      selling such a CD but it's expensive.  It would make more sens to 
      give it away.  The majority of blind people are unemployed so paying 
      for such a CD set is out of the reach of most of them, at least in the US.
      
      
      
      
      From nwolcott2ster at gmail.com  Tue Jun 27 06:43:03 2006
      From: nwolcott2ster at gmail.com (Norm Wolcott)
      Date: Wed Jun 28 09:48:31 2006
      Subject: [gutvol-d] ebooks libre et gratuits
      References: 
      Message-ID: <003301c69ad2$a66849e0$640fa8c0@gw98>
      
They are now instituting a "quota" system, apparently to avoid wholesale
downloads of their site. Interestingly, the Internet Archive does not get any of
their books, only the front page, and nothing since December 2005!
      
      The limit is a daily one, and you are invited to return tomorrow, for
      another quota apparently.
      nwolcott2@post.harvard.edu
      ----- Original Message -----
      From: "Philip Baker" 
      To: 
      Sent: Sunday, June 25, 2006 8:35 PM
      Subject: [gutvol-d] ebooks libre et gratuits
      
      
      > In article <200606251622.35322.donovan@abs.net>, D Garcia
      >  writes
      > >On Sunday 25 June 2006 10:25 am, Janet Kegg wrote:
      > >> The site is now available again: http://www.ebooksgratuits.com/
      > >>
      > >> See the front page of the Web site  for what I believe (my French is
      > >> almost nonexistent) is an explanation of what happened.
      > >
      > >The news item roughly translated is:
      > >
> >As you probably noted, the site was inaccessible for over a week; the reason
> >is that our ISP shut down following "A crippling DDOS attack" which they were
> >not able to successfully block. We changed ISPs, and the site is once again
> >available. We will take the necessary means so that this type of thing cannot
> >happen again; I will speak about it again very soon.
      >
      >
> They are being rather optimistic but we will have to wait and see if
> their "mesures nécessaires" (necessary measures) work.
      > --
      > Philip Baker
      
      From joey at joeysmith.com  Wed Jun 28 16:19:00 2006
      From: joey at joeysmith.com (joey)
      Date: Wed Jun 28 16:35:03 2006
      Subject: [gutvol-d] ftp.archive.org
      In-Reply-To: 
      References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
      	<44A04DB1.6070608@aol.com>
      	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
      	
      Message-ID: <20060628231900.GD2650@joeysmith.com>
      
      On Wed, Jun 28, 2006 at 07:10:43AM -0400, David A. Desrosiers wrote:
      > 
      > >Hi.  Yes, I'm vaguely familiar with rsync.  The problem is that I 
      > >don't want each and every file posted.  I don't download html and 
      > >8-bit files for example.  I only download the zipped plain text 
      > >files.  Also I don't want some religious works.  Therefore rsync 
      > >won't help me.
      
I have to echo what David said. Rather than chaining yourself to FTP,
you should look more deeply at what rsync is capable of. If you need it,
I could probably help you define an rsync line that gets what you want
and ONLY what you want (I already have one myself that pulls ONLY the
zip files).
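
For instance, a minimal sketch of such a line (the module name is the 
one used elsewhere in this thread; rsync applies the first matching 
filter, so the includes must come before the catch-all exclude):

 	rsync -av --include='*/' --include='*.zip' --exclude='*' \
 		ftp@ftp.ibiblio.org::gutenberg Gutenberg-zips
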
      From hart at pglaf.org  Wed Jun 28 17:35:30 2006
      From: hart at pglaf.org (Michael Hart)
      Date: Wed Jun 28 17:35:31 2006
      Subject: [gutvol-d] Michael Hart is on the Road
      Message-ID: 
      
      
I will be rather slow with my email responses for the next month,
and I presumed, as has already correctly been shown, that some
messages of the somewhat negative kind would come at such a time.
I appreciate the way our list members have allowed me the opportunity
for an easier time with such messages, as I don't have to respond alone.
      
      I really can't tell you how much I appreciate all the support for
      the work I have been doing, and hope will continue, with the very
      wonderful help of perhaps as many as 50,000 volunteers.
      
      
      Thank you!
      
      Thank you!
      
      Thank you!
      
      
      Give the world eBooks in 2006!!!
      
      Michael S. Hart
      Founder
      Project Gutenberg
      
      Blog at http://hart.pglaf.org
      
      From hart at pglaf.org  Wed Jun 28 17:43:15 2006
      From: hart at pglaf.org (Michael Hart)
      Date: Wed Jun 28 17:43:17 2006
      Subject: [gutvol-d] !@!Re: [BP] Re: EXTRA! Project Gutenberg Weekly
      	Newsletter 
      Message-ID: 
      
      
By the way, as we have continually offered, if anyone would like
to write up a different catalogue, counting system, or whatever,
we would be only too happy to include it in the Newsletters, and in
the various archives.
      
      I am sure that people could come up with counts both higher, and
      lower, than whatever method is chosen, and we would be very glad
      to repost those counts each week, each month, or even each year,
      if anyone would be willing to put them together in pretty much a
      free fashion, as long as there was some internal consistency.
      
Nothing like 100% accuracy would be required, and we should have
the capability of averaging all such counts, hopefully arriving at
an average count that reflects what people are used to seeing in
library catalogues.
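
The averaging step itself would be trivial; as a sketch, assuming one 
count per line in a plain text file:

 	# print the mean of a column of e-book counts
 	awk '{ s += $1; n++ } END { if (n) printf "average: %.0f\n", s/n }' counts.txt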
      
      As I will be away for a month, now would be a perfect time to do
      this sort of thing, and if it catches on, perhaps I won't have a
      Newsletter that I have to do so much of personally when I return
      after this trip. . .perhaps I won't have to do it at all. . . .
      
      
      Thanks!!!
      
      Give the world eBooks in 2006!!!
      
      Michael S. Hart
      Founder
      Project Gutenberg
      
      Blog at http://hart.pglaf.org
      From desrod at gnu-designs.com  Wed Jun 28 17:51:40 2006
      From: desrod at gnu-designs.com (David A. Desrosiers)
      Date: Wed Jun 28 17:52:45 2006
      Subject: [gutvol-d] ftp.archive.org
      In-Reply-To: <20060628231900.GD2650@joeysmith.com>
      References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
      	<44A04DB1.6070608@aol.com>
      	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
      	
      	<20060628231900.GD2650@joeysmith.com>
      Message-ID: 
      
      
> I have to echo what David said. Rather than chaining yourself to 
> FTP, you should look more deeply at what rsync is capable of. If you 
> need it, I could probably help you define an rsync line that gets 
> what you want and ONLY what you want (I already have one myself that 
> pulls ONLY the zip files).
      
       	Here's mine...
      
 	rsync -avzprlHtPS --delete --exclude='[0-9]*.txt'	\
 		--exclude='*.iso' --exclude='*.rar' --exclude='*.ISO'	\
 		--exclude='*.mp3' --exclude='pgdvd*'		\
 		ftp@ftp.ibiblio.org::gutenberg Gutenberg
      
       	This gives me ~34GiB of data... enough for me to use as a 
      viable mirror.
      
      
      David A. Desrosiers
      desrod@gnu-designs.com
      http://gnu-designs.com
      From jeroen.mailinglist at bohol.ph  Thu Jun 29 14:06:00 2006
      From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
      Date: Thu Jun 29 14:03:21 2006
      Subject: [gutvol-d] a tool for your grandma to download p.g. e-texts en
      	masse
      In-Reply-To: <256.cf31f48.31d39b41@aol.com>
      References: <256.cf31f48.31d39b41@aol.com>
      Message-ID: <44A440B8.9090801@bohol.ph>
      
Bowerbird@aol.com wrote:
> dave said:
>> What have you against Grandparents?
> it's just an expression, dave.
> it indicates "not technically inclined".
>
My grandfather worked with one of the first computers to be
      installed here in the Netherlands in the fifties. Last year he bought a
      new PC, at ninety years old, and is still using it regularly to stay in
      touch with relatives who have settled down all across the globe.
      Although not a nerd, he certainly knows how to use the machine...
      
      Jeroen.
      
      From Bowerbird at aol.com  Thu Jun 29 14:14:12 2006
      From: Bowerbird at aol.com (Bowerbird@aol.com)
      Date: Thu Jun 29 14:14:24 2006
      Subject: [gutvol-d] a tool for your grandma to download p.g. e-texts en
      	masse
      Message-ID: <51e.25e11f5.31d59ca4@aol.com>
      
      jeroen said:
      >    Although not a nerd, he certainly knows how to use the machine...
      
      so, have you got him hard at work proofing for you?           ;+)
      
      -bowerbird
      From tony at baechler.net  Fri Jun 30 00:58:52 2006
      From: tony at baechler.net (Tony Baechler)
      Date: Fri Jun 30 00:58:46 2006
      Subject: [gutvol-d] ftp.archive.org
      In-Reply-To: 
      References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
      	<44A04DB1.6070608@aol.com>
      	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
      	
      Message-ID: <7.0.1.0.2.20060630005304.03354d60@baechler.net>
      
My understanding of rsync was that you had to mirror the entire PG 
archive.  That was based on the PG FAQ and my attempts to read the 
help and man page.  I couldn't figure out the command line options 
and the experiments I tried gave me errors.  I think the PG FAQ gives 
a sample command line but that's for everything, which isn't what I 
want.  Besides, it's nice to manually look at and download each 
file.  I often like to stop every few files and look at a book of 
interest.  So, to answer your question, all of rsync confuses me, 
since I never got it to work.
      
Also, another problem might be that I'm primarily on Windows.  I know 
rsync is common in Linux and I have it installed on the Debian server 
that I run but I'm not sure if it's available for Windows or not.  I 
have Cygwin so I might have it, but again I have no idea how to get 
it to only get the files I want.  That's what's nice about getting them 
manually: I can skip those I don't want as I see them in the newsletters.
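
That browse-then-fetch habit maps onto rsync too, at least as a 
sketch (the module name comes from David's line; the directory and 
file names below are only hypothetical examples of the layout):

 	# list a directory on the server without downloading anything
 	rsync ftp@ftp.ibiblio.org::gutenberg/etext05/ | less

 	# then pull just the one zip you decide you want
 	rsync -av ftp@ftp.ibiblio.org::gutenberg/etext05/somebook.zip .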
      
      At 07:10 AM 6/28/06 -0400, you wrote:
      >         I'm sorry... what? You can rsync exactly what files you wish,
      >recursively or not, pick and choose, with rsync... using the right
      >options. I mirror Gutenberg here with rsync, skipping the DVD files,
      >.mp3 files, .rar files and a few others, getting only the useful
      >copies of books.
      >
      >         What part of rsync's usage is confusing you?
      
      
      
      
      From JBuck814366460 at aol.com  Fri Jun 30 16:37:07 2006
      From: JBuck814366460 at aol.com (Jared Buck)
      Date: Fri Jun 30 16:37:19 2006
      Subject: [gutvol-d] ftp.archive.org
      In-Reply-To: <7.0.1.0.2.20060630005304.03354d60@baechler.net>
      References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
      	<44A04DB1.6070608@aol.com>
      	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
      	
      	<7.0.1.0.2.20060630005304.03354d60@baechler.net>
      Message-ID: <44A5B5A3.1010703@aol.com>
      
Rsync's available for Windows as part of the cygwin package.  Just like 
FTP or wget you can tell rsync to get only the stuff you want, and 
unlike FTP or wget it will only download the files that need updating, 
without you having to wait several hours for it to skip over every file 
that hasn't changed.

I admit it can be confusing since it's a very powerful tool.  I was 
talking about it with Aaron Cannon and he says it's a better way to make 
a "mirror" of PG (with or without specific files that you want).
      
      My suggestion for people who want to use rsync?  Have someone write a 
      more detailed FAQ on it, explain it in non-technical terms, and provide 
      some examples (using the PG archive) of commands you can run with it, 
      especially sample rsync lines like David has, explaining all the '-' 
      tags and what they mean in context with the line and what they will make 
      rsync do to the files you download/mirror.
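
As a first pass at what I mean, here is David's line from earlier in 
the thread with each switch glossed (meanings per the rsync man page; 
the redundant switches are folded into -a, which already implies 
-rlpt):

 	# -a  archive mode: recurse and preserve times, perms, links, etc.
 	# -v  verbose; -z  compress file data in transit
 	# -H  preserve hard links; -P  show progress and keep partial files
 	# -S  handle sparse files efficiently
 	# --delete  remove local files that have vanished upstream
 	rsync -avzHPS --delete \
 		--exclude='[0-9]*.txt' --exclude='*.iso' --exclude='*.ISO' \
 		--exclude='*.rar' --exclude='*.mp3' --exclude='pgdvd*' \
 		ftp@ftp.ibiblio.org::gutenberg Gutenberg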
      
      Jared
      
      Tony Baechler wrote on 30/06/2006, 12:58 AM:
      
 > My understanding of rsync was that you had to mirror the entire PG
 > archive.  That was based on the PG FAQ and my attempts to read the
 > help and man page.  I couldn't figure out the command line options
 > and the experiments I tried gave me errors.  I think the PG FAQ gives
 > a sample command line but that's for everything, which isn't what I
 > want.  Besides, it's nice to manually look at and download each
 > file.  I often like to stop every few files and look at a book of
 > interest.  So, to answer your question, all of rsync confuses me,
 > since I never got it to work.
       >
       > Also, another problem might be that I'm primarily on Windows.  I know
       > rsync is common in Linux and I have it installed on the Debian server
       > that I run but I'm not sure if it's available for Windows or not.  I
       > have Cygwin so I might have it, but again I have no idea how to get
 > it to only get the files I want.  That's what's nice about getting
 > them manually: I can skip those I don't want as I see them in the
 > newsletters.
       >
       > At 07:10 AM 6/28/06 -0400, you wrote:
       > >         I'm sorry... what? You can rsync exactly what files you wish,
       > >recursively or not, pick and choose, with rsync... using the right
       > >options. I mirror Gutenberg here with rsync, skipping the DVD files,
       > >.mp3 files, .rar files and a few others, getting only the useful
       > >copies of books.
       > >
       > >         What part of rsync's usage is confusing you?
      
-- 
Dmitri Yalovsky
USS Authority
Asst. Chief of Engineering
      
      From desrod at gnu-designs.com  Fri Jun 30 16:56:58 2006
      From: desrod at gnu-designs.com (David A. Desrosiers)
      Date: Fri Jun 30 16:58:01 2006
      Subject: [gutvol-d] ftp.archive.org
      In-Reply-To: <44A5B5A3.1010703@aol.com>
      References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
      	<44A04DB1.6070608@aol.com>
      	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
      	
      	<7.0.1.0.2.20060630005304.03354d60@baechler.net>
      	<44A5B5A3.1010703@aol.com>
      Message-ID: 
      
      
      > My suggestion for people who want to use rsync?  Have someone write 
      > a more detailed FAQ on it, explain it in non-technical terms, and 
      > provide some examples (using the PG archive) of commands you can run 
      > with it, especially sample rsync lines like David has, explaining 
      > all the '-' tags and what they mean in context with the line and 
      > what they will make rsync do to the files you download/mirror.
      
       	How about using Unison?
      
       	http://www.cis.upenn.edu/~bcpierce/unison/
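
 	A hedged sketch of what that might look like (the host and 
paths are purely hypothetical, and you'd be syncing against a copy 
you control rather than an official PG endpoint):

 	# two-way sync between a local tree and one on a machine you own
 	unison ~/Gutenberg ssh://mirror.example.net//srv/gutenberg -batch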
      
      
      David A. Desrosiers
      desrod@gnu-designs.com
      http://gnu-designs.com