From radicks at bellsouth.net Fri May 5 07:11:39 2006
From: radicks at bellsouth.net (Dick Adicks)
Date: Fri May 5 07:11:43 2006
Subject: [gutvol-d] PG texts in South Africa
In-Reply-To: 
Message-ID: 

The Open Book venture is indeed one to be applauded, Andrew. I became a PG volunteer after teaching in Zimbabwe in 1997-98 and perceiving the need for inexpensive books throughout Africa (less so in South Africa). Regrettably, I have been unsuccessful in enlisting Zimbabwean faculty in a project to print selected PG texts. But the DVD adaptation will be a boon for teachers and students whose access to the Internet is undependable.

Obviously Michael has found the OpenLab International e-mail address: the website quotes his enthusiastic welcome of Open Book. In order to cooperate with them, what about asking OpenLab International to transmit to PG a list of titles needed by their academic clientele? Such a list could be circulated so that PG volunteers can supply the books needed.

A Nigerian friend told me that many a student in his country has to walk for miles to borrow a book, take it home, copy it overnight by hand, and return it the next day. PG volunteers can help young people whose learning depends on overcoming that kind of hardship.

Dick Adicks

. . . if vicious people are united and form a power, honest people must do the same. --Leo Tolstoy

> From: Michael Hart
> Reply-To: "Michael S. Hart" , Project Gutenberg Volunteer
> Discussion
> Date: Thu, 13 Apr 2006 09:25:19 -0700 (PDT)
> To: Project Gutenberg Volunteer Discussion
> Subject: Re: [gutvol-d] PG texts in South Africa
>
> I can't seem to find an email address that works for them.
>
> Any help?
>
> Thanks!
>
> Michael
>
> On Wed, 12 Apr 2006, Andrew Sly wrote:
>
>> Here's another example of PG texts being used.
>>
>> http://www.tectonic.co.za/view.php?id=961
>>
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From jon at noring.name Fri May 5 08:29:38 2006
From: jon at noring.name (Jon Noring)
Date: Fri May 5 08:36:04 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
Message-ID: <1052830729.20060505092938@noring.name>

[It's sort of odd that Michael posted this "news release" to Book People, but not here to gutvol-d. So I'm reposting it here (at least I don't recall it being posted here -- I checked the gutvol-d archive for April and May.)

David Rothman, in his TeleRead blog, posted an article covering this news release:

http://www.teleread.org/blog/?p=4798

Is this PG II resurrected? Anyway, great work, Michael, at right justifying the text!]

2 Months To 1/3 Million eBooks

Press Release: May 4, 2006

See: http://www.gutenberg.org [Use any favorite browser]
     http://www.worldebookfair.com [Best With MS Explorer]

Contact: Michael S. Hart
Phone: 217-344-6623

* 1/3 Million eBooks Free from July 4 Through August 4

1/3 of a million books, or 10 times the number found in the average public library, will be available for free downloading via the Internet and World Wide Web beginning July 4, as Project Gutenberg and the World eBook Library act on their dreams of increased world literacy and education. Such a collection, if printed out in standard format, would be large enough to outweigh elephant herds and to cover the sidelines at all 40 Super Bowl games.
Each year, for one month, The World eBook Library and Project Gutenberg will team up as major sponsors, planning to make ONE MILLION eBooks available in the World eBook Fair of 2009.

"It has been our goal since the dawn of the Internet to break down the bars of ignorance and illiteracy," says Michael Hart, who founded the Project Gutenberg effort by placing the first permanent text online on July 4, 1971, 35 years ago.

"Our projects are based on the premise that everyone in the world could have access to a free worldwide public library," says John Guagliardo, founder of The World eBook Library. For him, this event marks the fruition of years of hard labor.

Over 100 languages will be represented for worldwide readers, and that total is expected to increase even as the preparations are underway.

The books are the permanent property of whoever downloads them, but a warning is included to check your local copyrights, as the books are provided under U.S. copyright law, and other nations have different copyrights. eBooks still under U.S. copyright have been donated with the permission of the copyright holders.

Project Gutenberg, perhaps the oldest Internet site, and The World eBook Library, perhaps the largest of the growing number of eBook libraries, have joined for the purpose of "bringing the most eBooks to the most people in the world."

"This is the fulfillment of a lifetime of dreams," a sentiment shared by Greg Newby, Project Gutenberg's CEO, who hopes his Library & Information Science PhD will become the last of the olde library worlde, and perhaps the first of a new world of library science. "We can only hope that Google, Yahoo, and the others can also achieve their goals in the next few years -- as we hope they will each reach for a million eBooks before the decade ends. Our own goals, with them or without them, are to bring the world 1/3 million eBooks this year, 1/2 million next year, 3/4 million in 2008, and to reach a grand total of a one-million-volume World eBook Fair on July 4, 2009."

***

Additional facts and figures:

Proposed World eBook Fair totals for upcoming years:

July 4, 2006, 1/3 Million
July 4, 2007, 1/2 Million
July 4, 2008, 3/4 Million
July 4, 2009, ONE Million

* By 2009, the terabyte [one thousand gigabytes] boxes we have seen enter the consumer marketplace in 2006, now priced as low as $500, will be commonplace on the average computer on the shelf and will easily hold a million volumes of a million characters each for the price of just one semester's books at a university.

* Project Gutenberg and The World eBook Library are examples of 501(c)(3) corporations that operate on a non-profit basis to improve the world.

* Project Gutenberg eBooks are all free of charge from http://www.gutenberg.org or http://gutenberg.net.au, the home of Project Gutenberg of Australia, and also http://pge.rastko.net, Project Gutenberg of Europe. In addition, The World eBook Library sponsors Project Gutenberg's Consortia Center, where collection providers around the world make many eBook libraries available in their entirety: http://www.gutenberg.cc

The World eBook Library, a member-supported service, offers unrestricted access to its collection of over 250,000 eBooks, documents, and articles. Individual membership is only $8.95 a year, discounted to $1 per student or FTE [full-time equivalent] for the various educational groups or institutions. An even greater discount is available for public libraries.
For those who cannot afford a membership, or who may be experiencing hardships, these World eBook Library services are provided as complimentary subscriptions via Natural Disaster Relief Programs and/or Economic Hardship Relief Subscriptions. No one who has qualified for a complimentary subscription under either of these programs has ever been denied.

The World eBook Fair effort at worldebookfair.com or via www.gutenberg.org is a cooperative effort by the World eBook Library Consortia and Project Gutenberg, representing the largest and oldest eBook libraries.

1/3 of a million eBooks will be made available for a period from July 4 to August 4 this year in honor of the date of the first steps taken to make eLibraries on July 4, 1971, on what was to become the Internet, via what was to become Project Gutenberg. This celebration is to promote book awareness and to assist in efforts to increase literacy and education all over the world.

We hope you enjoy some of your favorite books in the editions presented here, and that you will pass them on to others who, we hope, will enjoy them as much as you do. Please feel free to send in requests for books to be included in future World eBook Fairs.

*

We hope you will give this press release, and future similar releases, your consideration, and that you will see fit to pass them on to others. If you have any favorite media people, encouraging them to use this would be wonderful!

Thank you so much!

Michael S. Hart
John Guagliardo
Gregory Newby

Visit Project Gutenberg sites at:

http://gutenberg.org ~50 languages The original PG site
http://gutenberg.net.au Project Gutenberg of Australia
http://pgdp.net Original Distributed Proofreaders Site
http://gutenberg.cc ~100 languages PG Consortia Center
http://pge.rastko.net ~65 languages PG of Europe
http://dp.rastko.net Distributed Proofreaders Europe

Visit The World eBook Library at:

http://www.netlibrary.net
http://public-library.net

* David vs. Googliath

From Bowerbird at aol.com Fri May 5 11:43:51 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri May 5 11:44:02 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
Message-ID: <1ed.4f34cd80.318cf6e7@aol.com>

michael said:
> 2 Months To 1/3 Million eBooks

last i remember reading
-- six months back? --
the million book project
said they had 600,000
books already scanned.

but they conceded that
not all were online yet...

ok, here's the reference:
> http://www.library.cmu.edu/Libraries/MBP_FAQ.html#current

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060505/b148faed/attachment.html

From ke at gnu.franken.de Fri May 5 10:41:47 2006
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri May 5 11:52:00 2006
Subject: [gutvol-d] Re: PG texts in South Africa
In-Reply-To: (Dick Adicks's message of "Fri, 05 May 2006 10:11:39 -0400")
References: 
Message-ID: 

Dick Adicks writes:

> PG volunteers can help young people whose learning depends
> on overcoming that kind of hardship.

Unfortunately, we cannot. Our old books are nice, but they surely mostly need books from the last decade.
-- http://www.gnu.franken.de/ke/ | ,__o | _-\_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C From hart at pglaf.org Mon May 8 08:43:17 2006 From: hart at pglaf.org (Michael Hart) Date: Mon May 8 08:43:19 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks In-Reply-To: <1ed.4f34cd80.318cf6e7@aol.com> References: <1ed.4f34cd80.318cf6e7@aol.com> Message-ID: My recollection of the latest word from Brewster is that they were just about to pass 10,000 full text eBooks, though not all had been proofread and edited to a 99.95% level of accuracy. So I am presuming they did actually pass 10,000 in the last month or so, though I haven't seen any official announcements. It would appear that one of the hardest things to find from Yahoo or Google eLibraries is the number of well finished eBooks. mh On Fri, 5 May 2006 Bowerbird@aol.com wrote: > michael said: >> 2 Months To 1/3 Million eBooks > > last i remember reading > -- six months back? -- > the million book project > said they had 600,000 > books already scanned. > > but they conceded that > not all were online yet... > > ok, here's the reference: >> http://www.library.cmu.edu/Libraries/MBP_FAQ.html#current > > -bowerbird > From jon at noring.name Mon May 8 13:40:20 2006 From: jon at noring.name (Jon Noring) Date: Mon May 8 13:40:30 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks In-Reply-To: References: <1ed.4f34cd80.318cf6e7@aol.com> Message-ID: <166276077.20060508144020@noring.name> Michael Hart wrote: > My recollection of the latest word from Brewster is that they were > just about to pass 10,000 full text eBooks, though not all had been > proofread and edited to a 99.95% level of accuracy. > > So I am presuming they did actually pass 10,000 in the last month > or so, though I haven't seen any official announcements. Is Brewster's effort (I assume you mean OCA?) doing actual proofing (which I always interpret to mean human proofing)? I thought all they were doing is scanning books and producing raw, unproofed text by OCR. Jon From Morasch at aol.com Mon May 8 13:47:09 2006 From: Morasch at aol.com (Morasch@aol.com) Date: Mon May 8 13:47:14 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks Message-ID: <433.49089d.3191084d@aol.com> michael said: > My recollection of the latest word from Brewster is that > they were just about to pass 10,000 full text eBooks, > though not all had been proofread and edited > to a 99.95% level of accuracy. so it's 600,000 scanned versus 10,000 proofed... meaning that any people who want one of those 590,000 that have been scanned but not proofed will need to do the o.c.r. and proofing themselves, providing they can locate the scan-set online... sounds fair to me. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060508/5eb8a60d/attachment.html From gbnewby at pglaf.org Mon May 8 14:36:29 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon May 8 14:36:32 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks In-Reply-To: <166276077.20060508144020@noring.name> References: <1ed.4f34cd80.318cf6e7@aol.com> <166276077.20060508144020@noring.name> Message-ID: <20060508213629.GA31020@pglaf.org> On Mon, May 08, 2006 at 02:40:20PM -0600, Jon Noring wrote: > Michael Hart wrote: > > > My recollection of the latest word from Brewster is that they were > > just about to pass 10,000 full text eBooks, though not all had been > > proofread and edited to a 99.95% level of accuracy. > > > > So I am presuming they did actually pass 10,000 in the last month > > or so, though I haven't seen any official announcements. > > Is Brewster's effort (I assume you mean OCA?) doing actual proofing > (which I always interpret to mean human proofing)? I thought all they > were doing is scanning books and producing raw, unproofed text by OCR. > > Jon You could probably call them and ask for details. When I was there in January, they were talking about doing some automated and semi-automated quality control (like looking for missing pages, and aligning pages that didn't scan straight). I don't think they're doing any human proofreading or markup at all -- instead, they are looking to Distributed Proofreaders to take that step (or anyone else interested). -- Greg From Bowerbird at aol.com Mon May 8 16:14:52 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon May 8 16:14:59 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks Message-ID: <2ad.4346291.31912aec@aol.com> > raw, unproofed text by OCR. i love the way this sounds -- "raw unproofed text by ocr". the implication is that the results are rather disastrous... and sometimes, granted, they can be. the chain is only as strong as its weakest link. however, if the paper-book is in fairly good shape, and its text is rather straightforward, and a best-of-breed scanner is used, and the scans are done carefully, and then properly treated (e.g., deskewed and regularized), and o.c.r. is done with a best-of-breed program, then auto-clean-up tools combined with normal spell-check will produce quite accurate text, thank you very much... and given advances in o.c.r. technology, and other tricks, a need for "human" proofreading of every word on a page could be eliminated for all but the most difficult of books. and my bet is that that day will come long before the one when we have machine-generated language-translations. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060508/1d5b2c11/attachment.html From bruce at zuhause.org Mon May 8 18:04:22 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Mon May 8 18:04:30 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks In-Reply-To: References: <1ed.4f34cd80.318cf6e7@aol.com> Message-ID: <17503.60054.882510.205276@celery.zuhause.org> Michael Hart writes: > It would appear that one of the hardest things to find from Yahoo > or Google eLibraries is the number of well finished eBooks. My searches at Google have found about 50K books, with another 42k+ books that ought to be available but are not because Google appears to be not making books available if there's a publication date and no copyright date. 
I don't know how to find books from the OCA, or Yahoo. From greg at durendal.org Tue May 9 04:44:48 2006 From: greg at durendal.org (Greg Weeks) Date: Tue May 9 05:00:04 2006 Subject: [gutvol-d] index entry for 18346 Message-ID: It looks like etext 18346 didn't index properly. There is an entry for it here: http://www.gutenberg.org/browse/authors/m#a7954 but no entry here: http://www.gutenberg.org/browse/authors/p#a7662 and this: http://www.gutenberg.org/etext/18346 results in a No etext no. 18346 error. -- Greg Weeks http://durendal.org:8080/greg/ From marcello at perathoner.de Tue May 9 05:30:59 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue May 9 05:31:02 2006 Subject: [gutvol-d] index entry for 18346 In-Reply-To: References: Message-ID: <44608B83.6030208@perathoner.de> Greg Weeks wrote: > > It looks like etext 18346 didn't index properly. There is an entry for > it here: > > http://www.gutenberg.org/browse/authors/m#a7954 > > but no entry here: > > http://www.gutenberg.org/browse/authors/p#a7662 > > and this: > > http://www.gutenberg.org/etext/18346 > > results in a No etext no. 18346 error. > Works for me. Most probably you (or somebody along the way) have cached the error page and some of the authors pages. Hit the refresh button on your browser. The automagic cataloger starts running at 02:00 EST and may run for a couple of hours depending on fileserver load. -- Marcello Perathoner webmaster@gutenberg.org From greg at durendal.org Tue May 9 06:46:44 2006 From: greg at durendal.org (Greg Weeks) Date: Tue May 9 07:30:05 2006 Subject: [gutvol-d] index entry for 18346 In-Reply-To: <44608B83.6030208@perathoner.de> References: <44608B83.6030208@perathoner.de> Message-ID: On Tue, 9 May 2006, Marcello Perathoner wrote: > Works for me. > > Most probably you (or somebody along the way) have cached the error page > and some of the authors pages. Hit the refresh button on your browser. It worked this time. I've had this problem before, where things were incorrectly cached. Thanks. -- Greg Weeks http://durendal.org:8080/greg/ From hart at pglaf.org Tue May 9 07:33:55 2006 From: hart at pglaf.org (Michael Hart) Date: Tue May 9 07:33:57 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks In-Reply-To: <17503.60054.882510.205276@celery.zuhause.org> References: <1ed.4f34cd80.318cf6e7@aol.com> <17503.60054.882510.205276@celery.zuhause.org> Message-ID: On Mon, 8 May 2006, Bruce Albrecht wrote: > Michael Hart writes: > > It would appear that one of the hardest things to find from Yahoo > > or Google eLibraries is the number of well finished eBooks. > > My searches at Google have found about 50K books, with another 42k+ > books that ought to be available but are not because Google appears to > be not making books available if there's a publication date and no > copyright date. I don't know how to find books from the OCA, or > Yahoo. Is this the kind of search we were discussing before, searching for commonplace words in the Google Book Search area, or have you found a better way? Perhaps you would be willing to post a list, or send it to me privately? Thanks!!! Give the world eBooks in 2006!!! Michael S. 
Hart Founder Project Gutenberg From hart at pglaf.org Tue May 9 07:35:33 2006 From: hart at pglaf.org (Michael Hart) Date: Tue May 9 07:35:35 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks In-Reply-To: <17503.60054.882510.205276@celery.zuhause.org> References: <1ed.4f34cd80.318cf6e7@aol.com> <17503.60054.882510.205276@celery.zuhause.org> Message-ID: One more question, did you figure out any estimate of how many of those 50,000 books your search found could actually be downloaded? More thanks! Michael From bruce at zuhause.org Tue May 9 19:54:38 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Tue May 9 19:54:42 2006 Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks In-Reply-To: References: <1ed.4f34cd80.318cf6e7@aol.com> <17503.60054.882510.205276@celery.zuhause.org> Message-ID: <17505.21998.788219.627950@celery.zuhause.org> Michael Hart writes: > On Mon, 8 May 2006, Bruce Albrecht wrote: > > My searches at Google have found about 50K books, with another 42k+ > > books that ought to be available but are not because Google appears to > > be not making books available if there's a publication date and no > > copyright date. I don't know how to find books from the OCA, or > > Yahoo. > > Is this the kind of search we were discussing before, searching for > commonplace words in the Google Book Search area, or have you found > a better way? Perhaps you would be willing to post a list, or send > it to me privately? They were found by doing keyword searches. Google Books now makes it easier to determine whether the book can be viewed in full. From the Google Book search page, they now indicate whether the book is either full view, snippet view, or no view. I'm not making a full list available anymore, at least not as a single download, because it took several minutes to download it from my site, and I was getting too many download requests from people who were downloading it because it was showing up at web search engines. I am currently working on loading MARC entries from several libraries for the books I've found, and will be supporting standard typical MARC tag searches (subject, author, publisher, language, etc.). My long term goal is to do the same for as many of the public domain image archives as I can. I still need to clean up the MARC entries, and the searches have not been implemented, but the website and displays of the Google Book entries and the MARC entries are at http://pdbooks.zuhause.org/ From hart at pglaf.org Wed May 10 07:36:22 2006 From: hart at pglaf.org (Michael Hart) Date: Wed May 10 07:36:23 2006 Subject: !@!re: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks In-Reply-To: <17505.21998.788219.627950@celery.zuhause.org> References: <1ed.4f34cd80.318cf6e7@aol.com> <17503.60054.882510.205276@celery.zuhause.org> <17505.21998.788219.627950@celery.zuhause.org> Message-ID: On Tue, 9 May 2006, Bruce Albrecht wrote: > Michael Hart writes: > > On Mon, 8 May 2006, Bruce Albrecht wrote: > > > My searches at Google have found about 50K books, with another 42k+ > > > books that ought to be available but are not because Google appears to > > > be not making books available if there's a publication date and no > > > copyright date. I don't know how to find books from the OCA, or > > > Yahoo. > > > > Is this the kind of search we were discussing before, searching for > > commonplace words in the Google Book Search area, or have you found > > a better way? Perhaps you would be willing to post a list, or send > > it to me privately? 
>
> They were found by doing keyword searches. Google Books now makes it
> easier to determine whether the book can be viewed in full. From the
> Google Book search page, they now indicate whether the book is either
> full view, snippet view, or no view.
>
> I'm not making a full list available anymore, at least not as a single
> download, because it took several minutes to download it from my site,
> and I was getting too many download requests from people who were
> downloading it because it was showing up at web search engines.

Would you be willing to let pglaf.org handle those download problems?

> I am currently working on loading MARC entries from several libraries for the
> books I've found, and will be supporting standard typical MARC tag searches
> (subject, author, publisher, language, etc.). My long term goal is to do the
> same for as many of the public domain image archives as I can.

Wonderful!!!

> I still need to clean up the MARC entries, and the searches have not
> been implemented, but the website and displays of the Google Book
> entries and the MARC entries are at http://pdbooks.zuhause.org/

MARC listings for eBooks are obviously going to be one of the "next big things" for eLibraries!

Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

From hyphen at hyphenologist.co.uk Wed May 10 23:28:41 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Wed May 10 23:28:53 2006
Subject: [gutvol-d] Please how do I get my etexts displayed like this?
Message-ID: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>

My etexts are individual stories and/or poems which are known individually and were often published several times in their own right. See etext no. 18175.

from weekly newsletter
>>>
|Emerson's Wife and Other Western Stories, by Florence Finch Kelly 18309
| [Illus.: Stanley L. Wood]
| [Contents: Emerson's Wife]
| [ Colonel Kate's Protégée]
| [ The Kid of Apache Teju]
| [ A Blaze on Pard Huff]
| [ How Colonel Kate Won Her Spurs]
| [ Hollyhocks]
| [ The Rise, Fall, and Redemption of Johnson Sides]
| [ A Piece of Wreckage]
| [ The Story of a Chinee Kid]
| [ Out of Sympathy]
| [ An Old Roman of Mariposa]
| [ Out of the Mouth of Babes]
| [ Posey]
| [ A Case of the Inner Imperative]
| [Link: http://www.gutenberg.org/1/8/3/0/18309 ]
| [Files: 18309.txt; 18309-8.txt; 18309-h.htm; ]
<<<

-- 
Dave Fawthrop

"Intelligent Design?" my knees say *not*.
"Intelligent Design?" my back says *not*.
More like "Incompetent design".
Sig (C) Copyright Public Domain

From sly at victoria.tc.ca Wed May 10 23:35:31 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed May 10 23:35:37 2006
Subject: [gutvol-d] Please how do I get my etexts displayed like this?
In-Reply-To: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
References: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
Message-ID: 

What is it you want to have "displayed like this"? The posted notes and gutindex listings, the online catalog record?

I'll take a look at PG#18175

Andrew

On Thu, 11 May 2006, Dave Fawthrop wrote:

> My etexts are individual stories and/or poems which are known individually
> and were often published several times in their own right. See etext no.
> 18175.
>
> from weekly newsletter
> >>>
> |Emerson's Wife and Other Western Stories, by Florence Finch Kelly 18309
> | [Illus.: Stanley L. Wood]
> | [Contents: Emerson's Wife]
> | [ Colonel Kate's Protégée]
> | [ The Kid of Apache Teju]
> | [ A Blaze on Pard Huff]
> | [ How Colonel Kate Won Her Spurs]
> | [ Hollyhocks]
> | [ The Rise, Fall, and Redemption of Johnson Sides]
> | [ A Piece of Wreckage]
> | [ The Story of a Chinee Kid]
> | [ Out of Sympathy]
> | [ An Old Roman of Mariposa]
> | [ Out of the Mouth of Babes]
> | [ Posey]
> | [ A Case of the Inner Imperative]
> | [Link: http://www.gutenberg.org/1/8/3/0/18309 ]
> | [Files: 18309.txt; 18309-8.txt; 18309-h.htm; ]
> <<<

From sly at victoria.tc.ca Wed May 10 23:45:20 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed May 10 23:45:24 2006
Subject: [gutvol-d] Please how do I get my etexts displayed like this?
In-Reply-To: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
References: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
Message-ID: 

Ok, it's longer than I usually do it with, but I've added a "formatted contents" field for 18175. What do you think?

Also, remember that unlike traditional library catalogs, our full texts are also parsed. Search engines will pick up all of the items listed in a table of contents near the beginning of a text.

Andrew

On Thu, 11 May 2006, Dave Fawthrop wrote:

> My etexts are individual stories and/or poems which are known individually
> and were often published several times in their own right. See etext no.
> 18175.
>
> from weekly newsletter
> >>>
> |Emerson's Wife and Other Western Stories, by Florence Finch Kelly 18309
> | [Illus.: Stanley L. Wood]
> | [Contents: Emerson's Wife]
> | [ Colonel Kate's Protégée]
> | [ The Kid of Apache Teju]
> | [ A Blaze on Pard Huff]
> | [ How Colonel Kate Won Her Spurs]
> | [ Hollyhocks]
> | [ The Rise, Fall, and Redemption of Johnson Sides]
> | [ A Piece of Wreckage]
> | [ The Story of a Chinee Kid]
> | [ Out of Sympathy]
> | [ An Old Roman of Mariposa]
> | [ Out of the Mouth of Babes]
> | [ Posey]
> | [ A Case of the Inner Imperative]
> | [Link: http://www.gutenberg.org/1/8/3/0/18309 ]
> | [Files: 18309.txt; 18309-8.txt; 18309-h.htm; ]
> <<<

From Bowerbird at aol.com Sun May 14 10:21:59 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun May 14 10:22:10 2006
Subject: [gutvol-d] on the will to scan books and digitize their text
Message-ID: <3f4.26f0823.3198c137@aol.com>

i said:
> however, if the paper-book is in fairly good shape, and
> its text is rather straightforward, and a best-of-breed
> scanner is used, and the scans are done carefully, and
> then properly treated (e.g., deskewed and regularized),
> and o.c.r. is done with a best-of-breed program, then
> auto-clean-up tools combined with normal spell-check
> will produce quite accurate text, thank you very much...

...and...

> so it's 600,000 scanned versus 10,000 proofed...
> meaning that any people who want one of those
> 590,000 that have been scanned but not proofed
> will need to do the o.c.r. and proofing themselves,
> providing they can locate the scan-set online...
> sounds fair to me.

of course, whether or not people will deem it necessary or even desirable to _do_the_work_ to get digital text is another question entirely.

branko ran a poll at the teleread site and in the forums at distributed proofreaders, and the results indicate that people have little interest in digitizing their home library, the books they have sitting as paper-copies in their homes.

over _half_ say they'd digitize them only if it could be done with _less_than_one_hour_per_book_.
over one-quarter say they'd do it only if it took _less_than_ten_minutes_per_book_. a not-insignificant number want it to happen almost _magically_, having it take _less_than_one_minute_per_book_. (telepathy?)

since teleread specializes in creating unrealistic expectations, it would be tempting to chalk these poll results up to that, but alas, some respondents are people who actually digitize books. (but yes, the teleread respondents are even more out of touch.)

it's somewhat shocking to understand that even people from distributed proofreaders say this, some of whom have likely spent more than ten minutes proofing _a_couple_pages_, so they have to know that time-frame is completely unrealistic. so this isn't just massive ignorance about the time required.

the results tell us that _if_ they have a paper copy of a book, people seem to feel little need for a digital copy of the text. i know that i often tend to think from a mindset that posits that digital text has many advantages over the printed page, but people seem not to consider those advantages important. at least not enough to merit a non-trivial amount of their time. it seems only natural to extend the results, that if people have the scan-set of a book, they'd have little need for digital text...

***

meanwhile, an article by kevin kelly in the new york times:
> http://www.nytimes.com/2006/05/12/us/12vote.html?ex=1305086400
> &en=5b3554a76aad524a&ei=5090&partner=rssuserland&emc=rss
informs us a company in china has scanned 1.3 million unique titles in chinese, which it estimates is about half of the books published in the chinese language since 1949.

that's right: _1.3_million_. already. and still going strong...

while we americans can't even get to a paperless office, and publishers sue the daylights out of the one and only company in this country who is willing to scan our libraries, china is moving quickly to becoming a paperless country...

so michael, in spite of the flak that people want to give you, it looks like you've been _undercounting_, by a wide margin... and maybe just maybe you're holding your "world e-book fair" on the wrong side of the globe...

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060514/1a09d237/attachment.html

From gbnewby at pglaf.org Sun May 14 15:58:24 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun May 14 15:58:25 2006
Subject: [gutvol-d] WAP books
Message-ID: <20060514225824.GA21841@pglaf.org>

I remember Marcello had done some work to allow www.gutenberg.org to work on mobile phones via the WAP method.

We're working with someone to get a bunch more reformatted PG eBooks for mobile phones...copying them to wap.readingroo.ms (not yet running) from wap.mobilebooks.org

Question: has anyone had success with their WAP-enabled phone via wap.mobilebooks.org ?

Feedback or input would be valuable! Keep in mind, there are a *lot* of people with a *lot* of mobile phones. Making our eBooks "phone-friendly" will be a great accomplishment.
-- Greg

From sly at victoria.tc.ca Sun May 14 17:27:04 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun May 14 17:27:07 2006
Subject: [gutvol-d] Michael Geist lecture
In-Reply-To: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
References: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
Message-ID: 

Some here may be interested to read "Our Own Creative Land: Cultural Monopoly and the Trouble With Copyright," by Michael Geist.
It is an exploration of issues of copyright, politics, and digital content from a Canadian perspective.

http://www.p2pnet.net/story/8776

Andrew

From bruce at zuhause.org Sun May 14 20:15:45 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Sun May 14 20:15:49 2006
Subject: [gutvol-d] on the will to scan books and digitize their text
In-Reply-To: <3f4.26f0823.3198c137@aol.com>
References: <3f4.26f0823.3198c137@aol.com>
Message-ID: <17511.62049.910097.161473@celery.zuhause.org>

Bowerbird@aol.com writes:
> of course, whether or not people will deem it necessary
> or even desirable to _do_the_work_ to get digital text is
> another question entirely.

Well, I think this really depends on the answers to questions like "Do I have an electronic reader I would prefer to read over a paperback?", and "Am I likely to reread this book often enough that spending an additional couple hours converting it to something usable on my reader is worth my time?" At least with producing texts for Project Gutenberg, even if you never read it again, presumably others will.

From brad at chenla.org Mon May 15 04:43:55 2006
From: brad at chenla.org (Brad Collins)
Date: Mon May 15 05:38:14 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: 

If you haven't seen this, it's well worth the read. For those of you who don't know, the author, Kevin Kelly, used to be the chief editor of Wired Magazine back in its glory days during the great Bubble.

http://www.nytimes.com/2006/05/14/magazine/14publishing.html?_r=1&oref=slogin&pagewanted=all

b/

-- 
Brad Collins , Banqwao, Thailand

From Bowerbird at aol.com Mon May 15 10:38:47 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 15 10:39:01 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <417.1c4ad86.319a16a7@aol.com>

brad said:
> If you haven't seen this, it's well worth the read.

kevin kelly is always good for a sweeping overview.

in spite of the tone of the article, the only "newish" idea in it is the notion that "books will read each other" and become synergistically interlinked, and that idea is one that is both interesting and perplexing at the same time.

how -- _exactly_ -- is this supposed to happen?

neither links nor tags, in their current form anyway, indicate an association between two external entities.

even the most basic of building blocks in that regard -- a clean "a.p.i." into the cyberlibrary -- is absent...

heck, the official policy at project gutenberg is that people must _not_ "deep-link" into the _content_ of your books per se, but rather only to a catalog page.

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060515/b4f93d2d/attachment.html

From traverso at dm.unipi.it Mon May 15 12:41:40 2006
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Mon May 15 12:31:53 2006
Subject: [gutvol-d] on the will to scan books and digitize their text
In-Reply-To: <17511.62049.910097.161473@celery.zuhause.org> (message from Bruce Albrecht on Sun, 14 May 2006 22:15:45 -0500)
References: <3f4.26f0823.3198c137@aol.com> <17511.62049.910097.161473@celery.zuhause.org>
Message-ID: <200605151941.k4FJfen18024@pico.dm.unipi.it>

>>>>> "Bruce" == Bruce Albrecht writes:

Bruce> Bowerbird@aol.com writes:
>> of course, whether or not people will deem it necessary or even
>> desirable to _do_the_work_ to get digital text is another
>> question entirely.
Bruce> Well, I think this really depends on the answers to
Bruce> questions like "Do I have an electronic reader I would
Bruce> prefer to read over a paperback?", and "Am I likely to
Bruce> reread this book often enough that spending an additional
Bruce> couple hours converting it to something usable on my reader
Bruce> is worth my time?" At least with producing texts for Project
Bruce> Gutenberg, even if you never read it again, presumably
Bruce> others will.

As the question was posed, the answer also depends on how many books you own. Having several thousand books, even at one minute per book it is a huge amount of work, especially if you are not going to read many of them, and you are not allowed to share them because of copyright reasons (and even the copying might be illegal).

From what I understood, however, the point of the poll was to measure the difference in results between DP and TeleRead users.

Carlo

From hyphen at hyphenologist.co.uk Thu May 18 09:04:24 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu May 18 09:11:15 2006
Subject: [gutvol-d] UniPad 1.20 has been released
Message-ID: 

Just got this. UniPad is a programmer's text editor which works in Unicode. I have 1.0, which works well with its internal huge font. Someone here may be interested.

|We are glad to inform you about the release of UniPad 1.20.
|This upgrade is free of charge for all registered users.
|
|Changes, corrections and new features:
|
|Upgrade to Unicode 4.1:
|
|o This version of UniPad incorporates over 1270 new characters that
|  have been added since Unicode 4.0.
|
|o New complete scripts are:
|  New Tai Lue, Buginese, Glagolitic, Coptic, Tifinagh, Syloti Nagri,
|  Old Persian, Kharoshthi.
|
|o Some scripts have been extended by adding new character blocks:
|  Arabic with Arabic Supplement, Georgian with Georgian Supplement,
|  Ethiopic with Ethiopic Supplement and Ethiopic Extended.
|
|o Some supplemental character blocks have been added:
|  Phonetic Extensions Supplement, Combining Diacritical Marks Supplement,
|  Supplemental Punctuation.
|
|o Other new blocks are: CJK Strokes, Modifier Tone Letters, Vertical Forms,
|  Ancient Greek Numbers, Ancient Greek Musical Notation.
|
|o Over 700 new characters have been added to the following existing character
|  blocks and scripts:
|  Latin Extended-B, Combining Diacritical Marks, Greek and Coptic, Cyrillic,
|  Hebrew, Arabic, Devanagari, Bengali, Tamil, Tibetan, Georgian, Ethiopic,
|  Phonetic Extensions, General Punctuation, Currency Symbols, Combining
|  Diacritical Marks for Symbols, Letterlike Symbols, Miscellaneous Technical,
|  Miscellaneous Symbols, Miscellaneous Mathematical Symbols-A, Miscellaneous
|  Symbols and Arrows, Enclosed CJK Letters and Months, CJK Unified Ideographs,
|  CJK Compatibility Ideographs, Mathematical Alphanumeric Symbols.
|
|o For more information please visit the Unicode 4.1 page at the official web
|  site of the Unicode Consortium: .
|
|New keyboards:
|
|o Dzongkha, Uyghur, Polish (Programmer), Kannada, Dari, Pashto, Uzbek (Southern).
|
|UniPad 1.20 is trialware. It runs in either unregistered or registered mode.
|Running UniPad in unregistered mode is free for anyone. Running UniPad in
|unregistered mode is "Session-Limited". After the session time you can save
|your work and restart UniPad for a new session.
|
|If you register, you will be able to run UniPad in registered mode and this
|limitation will be removed.
|Download:
|  http:/www.unipad.org/download
|
|Registration:
|  http:/www.unipad.org/register

-- 
Dave Fawthrop

From brad at chenla.org Fri May 19 05:06:11 2006
From: brad at chenla.org (Brad Collins)
Date: Fri May 19 05:03:34 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <417.1c4ad86.319a16a7@aol.com> (Bowerbird@aol.com's message of "Mon, 15 May 2006 13:38:47 EDT")
References: <417.1c4ad86.319a16a7@aol.com>
Message-ID: 

Bowerbird@aol.com writes:

> in spite of the tone of the article, the only "newish" idea
> in it is the notion that "books will read each other" and
> become synergistically interlinked, and that idea is one
> that is both interesting and perplexing at the same time.
>
> how -- _exactly_ -- is this supposed to happen?
>
> neither links nor tags, in their current form anyway,
> indicate an association between two external entities.
>
> even the most basic of building blocks in that regard
> -- a clean "a.p.i." into the cyberlibrary -- is absent...

You're quite right. The article talks of scanning, which is the first stage. DP/PG takes it to the next stage by turning scans into electronic texts, but there hasn't been anything that has stepped up to bat to take on the stage after that.

This is exactly what I've been working on these past few years. We'll be launching the spec (open and free) at the Extreme Markup Language conference in Montreal in August. At the same time we will provide an AJAX Web application, an Emacs-based development environment, a set of XSLT style sheets for converting into common formats, complete documentation, and a set of test data which will provide examples and a data set for developing applications.

I am now revising the paper which we will introduce at the conference and will send it out to anyone who is interested for feedback. When it's ready I'll send a blurb to the list with a brief description of the framework and see if anyone is interested in taking a look.

b/

-- 
Brad Collins , Banqwao, Thailand

From Bowerbird at aol.com Fri May 19 11:53:50 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri May 19 11:53:59 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <36f.41eaefe.319f6e3e@aol.com>

brad said:
> This is exactly what I've been working on these past few years.
> We'll be launching the spec (open and free) at the

great! i look forward to playing with your stuff.

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060519/90ad9c61/attachment.html

From Bowerbird at aol.com Sat May 20 14:24:59 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat May 20 14:25:11 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <370.3e9dbdc.31a0e32b@aol.com>

carlo said:
> Just a possibility:
> if a book quotes another,
> and both are available
> a link can be created
> from one to the other.

yes, an exact quote is pretty easy to locate. and of course, in most books, an exact quote will already have been credited explicitly there, so there's not even a need to do the search digitally (although it might be easier to do it programmatically than to try to locate all of these references "manually", at least if the false alarms from common expressions were manageable).

> Easy if an exact quotation is available,
> more intriguing if the reference is vague.
the recent cases of plagiarism in books coming out of major publishing houses come to mind here, don't they? it shouldn't be too hard to write a routine that would do a relatively good job rounding up "fuzzy" versions of quotes, as long as they had some substantial similarity... (see the sketch below.)

> Another could be to find similar content
> even in absence of an explicit quotation,
> and have a link from a section of a book to
> a list of sections of other books. This might be
> a detection of keywords coupled with a keyword search.

well yes, this is what i would think most people have in mind when they talk about "the pages of books reading each other". and since we will be indexing each book anyway, it would be rather easy to set up a "semantic profile" of each book detailing any idiosyncratic words that are fairly common within its pages. books that have similar profiles could be linked to each other, and then pages or sections within one book that are similar to those in the other book could be linked together too, certainly.

amazon already does this in a fashion with their listings of the "statistically improbable phrases" -- sips -- within each book, and the "capitalized phrases" -- caps -- so that's fairly obvious. amazon's "sips" and "caps" are interesting because you can click on them, and see a list of other books where they also appear...

amazon also does an overall concordance, but i'd guess that that isn't as useful for earmarking a book in a set of its peers. (well, it looks like they formulate what's called a "tag cloud", with links to the actual occurrences, so that could be useful. still, i think the "sips" and "caps" would be more meaningful.)

lastly, amazon also lists "books on related topics" for each book. i don't know if the quality of these associations is up to snuff -- amazon's version of "collaborative filtering" is a very bad joke -- but i'd imagine that it has utility for a range of book-buyers... (ok, more exploration tells me they do indeed use an overlap of "sips" as their main tool in discerning "books on related topics".)

for those people who've never fooled around over at amazon, i've appended a sample of some of their info for a specific book.

of course, what we _really_ want is not just to be _informed_ about similar books, but to have actual, honest-to-goodness hyperlinks between 'em, so we can point-and-click at our desks, rather than just order paper-copies to be delivered to our desks.

> Both are already possible at the present state of technology.

these and more, absolutely. tagging and annotation are other options that get thrown around a lot. the idea here would be that interested users would form a "folksonomy" that would link related books, perhaps with a commentary of their own. this would give us the type of "intelligence" that can only be exercised by actual human brains, and which might complement and/or supersede the "brute force" approach of automatic computerized semantic analysis.

in another vein, the cats at the institute for the future of the book seem fond of author/reader interaction in the actual _writing_ of the book, in a process where a book grows "organically", against the backdrop of the cyberlibrary. in this approach, links might _predate_ the content -- in essence be a "cause" of the content, rather than merely an "effect" -- which is an interesting view...

likewise, david rothman's hobbyhorse these days is "blogs inside of books".
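[Editor's sketch: the "fuzzy quotes" routine suggested above. This is a minimal illustration, assuming a brute-force sliding-window comparison scored with difflib from the Python standard library; the 0.8 threshold, the window step, and the file name in the usage lines are arbitrary choices for the example, not anything specified in the thread.

    import difflib

    def find_fuzzy_quotes(quote, book_text, threshold=0.8):
        """Return (score, offset, passage) tuples for windows of
        book_text that approximately match quote."""
        n = len(quote)
        step = max(1, n // 4)  # overlap windows so a match can't straddle two of them
        matcher = difflib.SequenceMatcher(b=quote.lower())  # cache the quote once
        hits = []
        for i in range(0, max(1, len(book_text) - n + 1), step):
            window = book_text[i:i + n]
            matcher.set_seq1(window.lower())
            score = matcher.ratio()
            if score >= threshold:
                hits.append((score, i, window))
        hits.sort(reverse=True)  # best matches first
        return hits

    # usage against a hypothetical plain-text ebook file:
    text = open("12345.txt", encoding="latin-1").read()
    for score, pos, passage in find_fuzzy_quotes(
            "it is a truth universally acknowledged", text)[:5]:
        print(round(score, 2), pos, passage)

A quadratic scan like this is fine for checking one quote against one book; linking whole libraries would want n-gram shingling and an inverted index instead.]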
the initial version of an openreader viewer-app will support this blog-in-a-book capability, so david has been raving about this feature like it's some kind of epiphany. he is even of the opinion that amazon -- which announced such a feature will be available soon in mobipocket -- could save themselves "a fortune in development costs" by using openreader instead of mobipocket. given that it's relatively trivial to embed this capability, i'm not sure what he's thinking. i would go over to his blog and ask him, but i've been semi-officially banned -- i'm not banned, but many of my posts have now permanently disappeared -- because i have this annoying habit of saying things that do not go along with the official spin that he likes to hype over there. so i would certainly hope that his put-a-blog-in-your-book capability allows an author to ban any "trolls", because we wouldn't want to experience any disagreement now, would we?

either way, amazon seems to have had no trouble implementing a "discussion" section feature -- currently labeled as "beta" -- on its webpage for each book. this is in addition to the "wiki" which it already had, the purpose of which i'm not all that sure about, and haven't investigated, because the overwhelming nature of all of the _stuff_ on each amazon page gives me bad information overload, and after a while i just feel a strong need to get the heck out of there! :+)

***

at any rate, these are some of the ideas that are bubbling at the surface. and though some of them sound interesting, to be sure, i also find that i am left wondering if all of this "books reading each other" stuff is gonna lead to something immense that leapfrogs us to the next level of super-intelligence, or whether it's all much ado about not too much...

time will tell, i guess...

-bowerbird

p.s. here is some of the information that amazon gives for a 1997 book...
Internet Dreams: Archetypes, Myths, and Metaphors (Paperback)
edited by Mark Stefik, with Introduction by Vinton Cerf

First Sentence: We are born into a world rich in art, invention, and knowledge

Statistically Improbable Phrases (SIPs): electronic mail metaphor, digital library metaphor, electronic sketch book, digital tickets, electronic brokerage effect, digital property rights, networked libraries, digital works, marketplace metaphor, superhighway metaphor, digital library system, trusted systems, digital book, dream session, editing test, digital reality, new design methods, usage rights, digital library project, electronic hierarchies, virtual rape, warrior archetype, digital publishing, fire bringer, electronic mail address

Capitalized Phrases (CAPs): Library of Congress, Jeremy Taylor, United States, Gutenberg Bible, America Online, British Library, New York, Vannevar Bush, World Wide Web, Boston Spa, Joshua Lederberg, Lynn Conway, Palo Alto, San Francisco, Turing Test, Bungle Affair, Carver Mead, Challenging Assumptions, The Machine Stops, Yellow Pages, Alexander Eliot, Civil War, Digital Property Trust, Internet Companion, Libraries of the Future

Concordance -- These are the 100 most frequently used words in this book: access another article available between book case changes come communication community computer copy costs course design different digital dream dreammc electronic even example experience first form get good group however idea information internet joannel2 journals kinds know knowledge large library life market may means meeting members message metaphor methods might mud need network new now number often others own paper part participants people place players problem process project provide public publishers publishing read real repositories research rights room say see sense several should social society system take technology text things thus time two use used users virtual without work world

Text Stats -- These statistics are computed from the text of this book.

Readability -- Compared with books in All Categories
Fog Index: 15.9 -- 75% are easier, 25% are harder
Flesch Index: 40.0 -- 69% are easier, 31% are harder
Flesch-Kincaid Index: 13.0 -- 76% are easier, 24% are harder

Complexity
Complex Words: 18% -- 66% have fewer, 34% have more
Syllables per Word: 1.7 -- 65% have fewer, 35% have more
Words per Sentence: 21.3 -- 77% have fewer, 23% have more

Number of:
Characters: 804,122 -- 83% have fewer, 17% have more
Words: 129,664 -- 85% have fewer, 15% have more
Sentences: 6,078 -- 73% have fewer, 27% have more

Fun stats
Words per Dollar: 4,987
Words per Ounce: 5,332

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060520/a226f373/attachment.html

From hart at pglaf.org Mon May 22 07:33:29 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon May 22 07:33:31 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: 
References: <417.1c4ad86.319a16a7@aol.com>
Message-ID: 

On Fri, 19 May 2006, Brad Collins wrote:

> Bowerbird@aol.com writes:
>
>> in spite of the tone of the article, the only "newish" idea
>> in it is the notion that "books will read each other" and
>> become synergistically interlinked, and that idea is one
>> that is both interesting and perplexing at the same time.
>>
>> how -- _exactly_ -- is this supposed to happen?
>>
>> neither links nor tags, in their current form anyway,
>> indicate an association between two external entities.
>>
>> even the most basic of building blocks in that regard
>> -- a clean "a.p.i." into the cyberlibrary -- is absent...
>
> You're quite right. The article talks of scanning which is the first
> stage. DP/PG takes it to the next stage by turning scans into
> electronic texts, but there hasn't been anything that has stepped up
> to bat to take on the next stage.

It is quite interesting that Kevin Kelly has ignored Project Gutenberg so thoroughly since Condé Nast has taken over WIRED; WIRED used to mention PG several times a year, sometimes even in cover stories or in their timeline of the greatest Millennium events.

A friend asked him why. . . he said it was due to space limitations.

~8,000 words?

No problem mentioning The Billionaire Boys Club eBooks projects.

;-)

From davidrothman at pobox.com Mon May 22 07:21:25 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Mon May 22 08:11:04 2006
Subject: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]
In-Reply-To: <20060522102055.1501635176.davidrothman@pobox.com>
References: <370.3e9dbdc.31a0e32b@aol.com>
Message-ID: <20060522102125.1706105918.davidrothman@pobox.com>

A feathery troll with competing business interests is once again abusing this PG list to smear the OpenReader e-book standard and the TeleRead Web Log. The normal rule is, "Don't feed the troll." But every now and then, as cofounder of OpenReader and moderator of the TeleBlog, I just may pop up with the facts for the benefit of newbies who don't yet know what's going on.

Yes, e-book software from OSoft, our first implementer, has been available for several years now to do SHARED annotations. Embedded forums and even blogs inside books will be on the way. Imagine how this could help such wonderful activities as collaborative learning in K-12. We're talking here about dotReader, the new name for OSoft's ThoutReader, except it'll work with the OpenReader format. dotReader is a real app for real users--developed by a real company, as opposed to a troll in the basement, so to speak.

dotReader is named for Dorothy Thompson, an early foe of fascism and the leading female news commentator of the 30s and 40s. Miss Thompson also happened to be married to Sinclair Lewis, one of my favorites. She loved to annotate her books and share her literary enthusiasm with friends. Fittingly, then, an early book for dotReader will be Peter Kurth's "American Cassandra: The Life of Dorothy Thompson," which the L.A. Times hailed as "exemplary biography, thoroughly researched and entertainingly written." The next will very likely be a food guide, also the subject of stellar reviews. Readers will be able to use dotReader's interactive capabilities to help the coauthors of the food book stay up to date. Needless to say, OSoft is particularly keen on seeing public domain literature in its format.

Anyway, I hope I've made the case for the R Word--Real. The real company behind dotReader consists of two super-hardworking guys who've bet their assets in a serious way and hired other programmers. This is American business and technology at its best. I'm reminded of Preston Tucker, who was to Ford, GM and the others what OSoft is to Adobe, Microsoft, Amazon and the rest (http://en.wikipedia.org/wiki/Preston_Tucker).
It's a shame that the troll is less interested in disc brakes than in
badmouthing the Tuckers of e-book software.

See the true details for yourself at http://www.dotreader.org. The site
for the OpenReader standard is at http://www.openreader.org. The
TeleBlog, where I've often discussed related issues, is at
http://www.teleread.org/blog. You can reach OSoft CEO Mark Carey and
CTO Gary Varnell through the info at
http://www.dotreader.com/site/?q=node/21. Sign up for progress reports
via dotReader's home page.

Like the troll, the proprietary formatters aren't too happy. But
OpenReader and dotReader are happening anyway; and I'm expecting an
outcome much happier than Tucker's. OpenReader and dotReader have just
returned from a highly successful visit to BookExpo America. Some major
publishers have expressed serious interest. It's just a matter of
getting one of them to be the first to benefit from OpenReader and
dotReader--a major but not insurmountable challenge in a conservative
industry such as publishing. Gary and Mark are also keen on hearing
from smaller publishers, whose use of the format and compatible
software will help with the larger houses. More importantly, the cause
of small publishers is worthy in itself.

The first version of dotReader will hit the Net for free this summer,
thereby driving the troll crazy, since all along he kept claiming that
the OpenReader standard would be just vaporware. Might there be delays?
Of course, as with any quality-minded effort. But yes, the launch will
happen soon, and it would be good if the troll surpassed low
expectations and apologized for his persistent falsehoods and abuse of
the PG list.

> ... amazon -- which announced such a feature will be available soon
> in mobipocket -- could save themselves "a fortune in development
> costs" by using openreader instead of mobipocket. given that it's
> relatively trivial to embed this capability, i'm not sure what he's
> thinking.

Amazon could not give me a date when Mobipocket would have the shared
annotations feature, so, in a Mobi context, we're not necessarily
talking "soon" at all. The shared annotations will be available for now
only via a Web browser--an e-book museum kind of deal--rather than in
downloadable e-book files.

Besides, whether the issue is standards-compliance or user
customization, dotReader does indeed leave Mobipocket behind in the
dust. Those guys either don't get standards or bungled Mobipocket in
certain ways. I love Mobipocket compared to Adobe, but OSoft's
dotReader will be much better than either, and Amazon and rivals would
be damn foolish not to use this rather affordable technology, which
they would be free to rebrand.

The main credit for dotReader, of course, goes to Gary Varnell and Mark
Carey at OSoft, but Jon and I have contributed hundreds of suggestions
to OpenReader's first implementer. What's more, another terrific
implementation, FBReader, is on the way; and you can bet I'll be
bragging about FBReader, too, going by the high quality of the work
(I'm basing this statement on what friends have told me). Other fine
implementers cherished!

Catch up with jon@openreader.org for OpenReader's preliminary specs and
also to provide him your own feedback. Moreover, if you're a publisher,
including the public domain variety, give us your thoughts on the
traits you'd like for authoring and translation tools. Feedback from
small guys especially cherished!
I'd love for Gutenberg itself to offer OpenReader format and for, say,
Michael Hart or Greg Newby to participate constructively in the
standards-setting process. We want the OpenReader process moved to a
group such as OASIS so we link up with established standards-setting.
Yes, Michael, OpenReader-to-ASCII conversion will be trivial. Contrast
that with Amazon's proprietary approach.

Like the troll, by the way, Amazon reads my TeleRead blog. I hear you
do, too. That's great. Despite my disagreements over various issues,
such as the best way to achieve true QC, I continue to wish for Project
Gutenberg's success, since I don't want governments, library
bureaucrats and big publishers or even small ones to dictate which
books survive. We need all kinds of approaches. It is unfortunate that
the troll is casting issues in terms of one vs. another. I'd love to
see my own library efforts reinforce PG's and vice versa.

As for the troll's competing business interests vs. OpenReader's:

1. Jon Noring tells me that the feathery troll has given his
ZML-related app a $50K price tag. True? Or is the troll just doing his
reader as part of an Albert Schweitzer act for the good of humanity? If
the price is a mere $50K, I'm amazed. Isn't the app worth more? Poor
creature. The many hours spent trolling against OpenReader are
seemingly part of the development costs of his ZML work. I'm flattered
that we're such a high priority. Before the troll gave up, he'd made
hundreds of posts on the TeleBlog, and many and perhaps even most of
them were diatribes against either OpenReader or those involved.

2. From what I hear--true?--the troll is refusing to open source his
app. By contrast, OSoft, despite having bet hundreds of thousands of
dollars on development costs, is open-sourcing everything but the
rather optional DRM that it added only at the vehement insistence of
publishers.

3. If the troll's ZML surprised the cosmos and caught on, it would be
trivial for the free dotReader to read it, since it'll work with almost
any XMLish format despite the focus on OpenReader, the new standard.
Whoops. There goes the business model of either the troll or whoever
buys his app.

On a very related topic, no, the troll was not banned from the
TeleBlog, but I do agree with him that it would be a big time-waster
for him to return to the area and continue his trolling. Then he'd go
on moderation, waste a lot of people's time and perhaps end up after
all being the first of many hundreds of commenters to be banned--well,
with the exception of the less subtle spammers. For the troll's
acknowledgment that he didn't want to play by the TeleBlog's rules,
check out the message below. You'll see that when our spam filters kept
eating up the troll's remarks, Jon Noring offered to work with him to
solve the problem for both him and the TeleBlog; we don't like any
legit comments to vanish. Alas, however, the troll turned down the
suggestion. All we wanted in return was civil conduct.

For those who don't know, Jon is a TeleBlog participant and the main
founder of OpenReader--someone who, unlike the troll, has been highly
active in mainstream e-book standards-setting efforts with both small
and large publishers involved.

Significantly, the TeleBlog is Usenet NOT. We are a community where we
compare notes on e-book technology and news, and where we welcome
constructive disagreement rather than the trollish kind so well defined
in the Wikipedia (http://en.wikipedia.org/wiki/Internet_troll).
If our ill-feathered friend changes his ways and acts civilly, rather
than doing troll spams, of course he'll be welcome back--with the
expectation that he'll be a full community member as opposed to working
constantly for his $50K or whatever he wants for his app. Along with
the other TeleBlog regulars, I love Usenet as Usenet, but we'll not
alter the community nature of the TeleBlog to make it troll-friendly.

As for the PG list, which the troll has used to disparage not just me
and OpenReader but also the whole TeleBlog community (supposedly we're
all "unrealistic" or whatever the adjective--not just the evil
moderator), it's the decision of Michael and Greg. If they want to keep
the PG list unmoderated, that's fine. At the same time they would do
well to remember how Joseph McCarthy took advantage of the "objective"
nature of the press to engage in never-ending character assassination
against opponents, just as the troll is doing against those he sees as
business rivals standing in the way of his $50K and his pride. See
http://en.wikipedia.org/wiki/Mccarthyism for more on McCarthy's lies,
which branded a number of people as un-American, ranging from the
composer Aaron Copland to the novelist Dashiell Hammett. The troll's
McCarthy-style posts are a vicious and highly disreputable use of the
PG list, and it's high time that more responsible members of the list
asked the troll to stop them. As Joseph Welch, a U.S. Army lawyer, told
the late Senator from Wisconsin: "You've done enough. Have you no sense
of decency, sir, at long last? Have you left no sense of decency?"

The troll is not just an innocent clown. While he delights in jousting
with those he perceives as enemies and is motivated by his
product-related pride as much as anything else, he also wants cash for
his competing reader app; and he doesn't care whom he smears along the
way. Michael Hart, whom the troll admires, as do I, regardless of my
disagreements on certain matters, perhaps can tell the troll to stop
doing his McCarthyism act. The troll's Internet McCarthyism--this
repeated use of outright lies in a forum to which he has easy
access--reflects badly on the PG list and lowers newbies' impression of
a valuable group like PG.

Thanks,
David Rothman
Cofounder of the OpenReader Consortium
http://www.openreader.org
Moderator of the TeleBlog
http://www.teleread.org/blog
davidrothman@openreader.org | dr@teleblog.org | 703-370-6540

P.S. This list is normally devoted to PG, not TeleRead or OpenReader. I
wouldn't be commenting here except that the troll keeps gratuitously
introducing these topics in a negative and misleading way.

================================================
------------Original Message------------
From: Bowerbird@aol.com
To: gutvol-d@lists.pglaf.org, Bowerbird@aol.com
Date: Thu, Apr-13-2006 7:22 PM
Subject: re: [gutvol-d] don't believe everything you read on the internets

like i said, it doesn't matter to me if i'm banned or not, because i
want to stop wasting time over there anyway. and since i am certainly
not about to change what i say _or_ how i say it, it would only be a
matter of time before i was banned eventually, "for the good of the
community".

besides, jon, with adobe plotting against openreader, you've got much
bigger fish to fry than my fat old ass...
-bowerbird

From Bowerbird at aol.com Mon May 22 09:02:13 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 09:02:27 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <424.1cc65e1.31a33a85@aol.com>

michael said:
> A friend asked him why. . .
> he said it was due to space limitations.

well, my one quibble with the article was when he mentioned the google
library scanning as the main impetus for resurging interest about a
cyberlibrary. while that is undoubtedly true, i thought that he could
have mentioned project gutenberg as well. it would've been a nod to
your historic presence.

however, if you want your library to move forward into the future that
is being discussed, you'll need to consider the constructive criticisms
i have issued. because if "books reading other books" _does_ take off,
your library of standalone files, which cannot do so, will languish.

you have the advantage now that most all of your files are pure ascii,
so it behooves you to leverage that advantage...

-bowerbird

From Bowerbird at aol.com Mon May 22 09:11:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 09:11:48 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <1bc.5089b96.31a33cbb@aol.com>

geez, david, you'll have to do _something_ more original than calling
me a "troll" here; that "strategy" was worn out years ago...

as for your fantasy that i "object" to openreader because i have
"competitive business interests", what exactly would those "business
interests" be? my source code is available for $200,000 (not $50K), but
i will tell anyone who wants to buy it not to bother, that they should
just go off and write it themselves, so i don't think you could call me
a very good businessman. as for my app itself, i've always said it's
available for free. again, not a good businessman. what can i say? i'm
a poet.

a good businessman would get a booth at book expo, and hawk the product
there. just like you're doing now.

for the record, i wish the osoft people all the best. you were very
lucky they came along when they did, and i'll be glad when they finally
turn your precious "openreader" format into something besides vapor. at
that point, we will be able to measure its benefits and its costs, to
decide how worthwhile its cost-benefit will be.

considering the costs involve application of heavy markup, and similar
benefits can be delivered with lighter markup, i'm not sure how you
will ever be able to convince anyone that your cost-benefit ratio will
be the best one available. but hey, microsoft has been able to do that
for _years_, so don't give up...

-bowerbird

p.s. evidently, book expo ain't keeping you very busy. kinda slow
around the openreader booth, is it? :+)

From hart at pglaf.org Mon May 22 10:54:42 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon May 22 10:54:44 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <424.1cc65e1.31a33a85@aol.com>
References: <424.1cc65e1.31a33a85@aol.com>
Message-ID:

On Mon, 22 May 2006 Bowerbird@aol.com wrote:

> michael said:
>> A friend asked him why. . .
>> he said it was due to space limitations.
> > well, my one quibble with the article was when he
> > mentioned the google library scanning as the main
> > impetus for resurging interest about a cyberlibrary.

Obviously the press coverage about "Google library scanning" has done
more "as the main impetus for resurg[ent] interest in a cyberlibrary"
than the actual scanning itself.

We are coming up on the 18 month anniversary of the monster press blitz
that announced, "This is the day the world changes."

And the latest estimates I have received show that Google's total
number of books has just recently passed 50,000; then again, similar
reports say that 88% are neither downloadable nor proofread to any
particular level of accuracy.

If we double that number to 100,000, we could pretend these results
indicated that Google had accomplished 1% of a goal of 10,000,000
books, in 25% of their 6 year plan.

> > while that is undoubtedly true, i thought that he
> > could have mentioned project gutenberg as well.
> > it would've been a nod to your historic presence.

Somehow I don't think this was accidental. . . .

Same with WIRED's approach since Condé Nast. . . .

mh

From jmdyck at ibiblio.org Mon May 22 12:26:12 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Mon May 22 12:26:15 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To:
References: <424.1cc65e1.31a33a85@aol.com>
Message-ID: <44721054.3000104@ibiblio.org>

Michael Hart wrote:
>
> If we double that number to 100,000, we could pretend these
> results indicated that Google had accomplished 1% of a goal
> of 10,000,000 books, in 25% of their 6 year plan.

In 1993, PG had accomplished 1% of its goal of 10,000, in about 70% of
the total time.

-Michael

From Bowerbird at aol.com Mon May 22 12:55:40 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 12:55:53 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <439.1c91c11.31a3713c@aol.com>

michael said:
> Obviously the press coverage about "Google library scanning"
> has done more "as the main impetus for resurg[ent] interest
> in a cyberlibrary" than the actual scanning itself.

well, d'uh, of course. that's how it always is.

> And the latest estimates I have received show that Google's
> total number of books has just recently passed 50,000

i do believe you misread that. 50,000 public-domain titles, with
another 42,000 under copyright, for a total of 92,000.

but even if it is just 50,000 total, they're still on my schedule: i
predicted 10,000 after one year, 100,000 after two years, 1 million
after three years, and 10 million after four years...

> similar reports say that 88% are neither downloadable
> nor proofread to any particular level of accuracy.

except it's not google's job to make them downloadable, not in
convenient form, nor to proofread the digitized text. it is _our_ job
to grab the scans (as nicely and neatly as possible, courteous and
respectful of the cost they entailed by scanning), and to make them
available in a convenient format for reading, as well as to formulate
automatic procedures to digitize the text and take it to a very high
degree of accuracy. even if google did do these jobs for us, i would
still replicate it, because i don't want to have to be dependent on
google forever.

> Somehow I don't think this was accidental. . . .

the point is, if your books were _already_ "reading each other", people
would have been talking about it long before this article.

-bowerbird

p.s. i see you're one of those old-fashioned people who refuse to
recognize "resurging" as an adjective. it's ok. hopefully, if i keep
using it that way, i'll win. (i'm trying to change the usage of
"hopefully" with the same strategy.) :+)
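[A quick check, in Python, of the percentages Michael Hart and Michael
Dyck trade above. The Google figures are as framed in the thread
(100,000 of 10,000,000 books, 18 months into a 6-year plan); the PG
figures use the thread's own 1%-of-10,000-by-1993 claim plus the
well-documented 1971 start and eBook #10,000 in 2003.

    # Back-of-the-envelope check of the progress figures above.
    def progress(done, goal, elapsed, total):
        """Return (fraction of goal met, fraction of schedule used)."""
        return done / goal, elapsed / total

    # Google, as Michael Hart frames it: 18 of 72 months elapsed.
    g = progress(100000, 10000000, 18, 72)
    print("Google: {:.0%} of goal in {:.0%} of the schedule".format(*g))

    # PG, as Michael Dyck frames it: ~100 books by 1993, 10,000 by 2003.
    p = progress(100, 10000, 1993 - 1971, 2003 - 1971)
    print("PG:     {:.0%} of goal in {:.0%} of the time".format(*p))

The second line prints 1% of goal in 69% of the time, matching the
"about 70%" above.]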
From fvandrog at scripps.edu Mon May 22 13:32:04 2006
From: fvandrog at scripps.edu (Frank van Drogen)
Date: Mon May 22 13:47:05 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <439.1c91c11.31a3713c@aol.com>
References: <439.1c91c11.31a3713c@aol.com>
Message-ID: <6.2.0.8.0.20060522132732.02eaa4c8@mail.scripps.edu>

> > And the latest estimates I have received show that Google's
> > total number of books has just recently passed 50,000
>
> i do believe you misread that. 50,000 public-domain titles,
> with another 42,000 under copyright, for a total of 92,000.

Even that number is a misinterpretation. There are at the moment 92,000
pre-1923 books available from Google Print. The 50,000 that Google has
made fully downloadable have a clear pre-1923 copyright statement; the
42,000 don't have a clearcut copyright statement and thus Google only
gives the snippet option. I've never read numbers about the post-1923
books available; Bruce doesn't look for those in his various searches,
as far as I am aware.

Frank

From Bowerbird at aol.com Mon May 22 13:57:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 13:57:45 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <48c.693268.31a37fc3@aol.com>

frank said:
> Even that number is a misinterpretation

thanks for clearing that up for us, frank...

it's clear that google has gotten their legs under them in regard to
doing the scanning. let's hope that they'll get their quality-control
under control very soon too...

it is important to keep in mind that 100,000 books is <1% of the 10.5
million (or more) they'll do eventually; it's understandable if the
process isn't up to speed yet.

-bowerbird

From Bowerbird at aol.com Mon May 22 14:01:14 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 14:01:22 2006
Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]
Message-ID: <48b.696020.31a3809a@aol.com>

david said:
> You gratuitously attacked OpenReader out of the blue.

no, i didn't. but i'm sure you'd like to spin it that way.

what i said was that you were making a big deal about "putting a blog
inside an e-book", when that is actually a somewhat trivial thing to
do. look, i've inserted a blog inside of this e-mail:

> http://www.buzzmachine.com/index.php/2006/05/19/the-book-is-dead-long-live-the-book/

-bowerbird
From fvandrog at scripps.edu Mon May 22 14:19:48 2006
From: fvandrog at scripps.edu (Frank van Drogen)
Date: Mon May 22 14:19:42 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <48c.693268.31a37fc3@aol.com>
References: <48c.693268.31a37fc3@aol.com>
Message-ID: <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>

> it's clear that google has gotten their legs under them
> in regard to doing the scanning. let's hope that they'll
> get their quality-control under control very soon too...

I have found fewer missing pages and other problems in books from
Google than in those from the MBP and Canadian/IA. They are, however,
still far from perfect. When they get a report regarding a missing or
wrongly scanned page in a PD book, it is apparently up to the providing
library to get the problem sorted out. I've heard reports of complete
books being rescanned (with the risk of having another page missing in
the end ;) ). I've also heard somebody mention that the full rescanned
book was stuck behind the existing one (rather space consuming, but for
DP purposes a lot safer).

What worries me in this is that Google doesn't seem to care whether
pages are missing or not... as long as they get 99% of the pages from a
book stored, chances are most search terms pointing to the particular
book will be identified. Their interest lies in people purchasing the
book via Amazon, Abe etc. after identifying them via book.google.com.

The best quality control I have encountered so far is on Gallica,
where, apart from pages missing in the original scanned manuscript,
I've not encountered incomplete books. It would actually be interesting
to see how they perform their quality control.

Frank

From marcello at perathoner.de Mon May 22 14:37:28 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon May 22 14:37:31 2006
Subject: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]
In-Reply-To: <20060522102125.1706105918.davidrothman@pobox.com>
References: <370.3e9dbdc.31a0e32b@aol.com> <20060522102125.1706105918.davidrothman@pobox.com>
Message-ID: <44722F18.4080204@perathoner.de>

David H. Rothman wrote:

> See the true details for yourself at http://www.dotreader.org.

It says:

    www.dotreader.org
    This page is parked free, courtesy of GoDaddy.com

Did the .reader bubble already burst?

> I'd love for Gutenberg itself to offer OpenReader format

We already offer most books in plucker.
That's because plucker is an open format, widely deployed, and offers
an open toolchain.

We served 89504 plucker books in May 2006.

We'll see about OpenReader once you've widely deployed your .reader and
made available an open toolchain.

--
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com Mon May 22 14:41:25 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 14:41:34 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <436.1cc35e4.31a38a05@aol.com>

frank said:
> It would actually be interesting to see how they perform their
> quality control.

must be something i'm missing, because quality control seems _easy_ to
me. you step through the scan-set making sure you've got every
page-number, and then you step through it again making sure every scan
was a clean one that captured all of the text on the entire page
without any blurs anywhere.

if a page is bad, you redo the page, while the book is still right in
your hands. on the other hand, getting a report of a bad page later
means that you must go to all of the difficulty of fetching the book
again, which is a pain in the ass.

so, to my mind, the "learning curve" on any scanning project is
learning to do it right the first time, so you don't have to re-do it.

-bowerbird

p.s. the idea that google rescans the whole book if a page is reported
missing makes them seem downright stupid. if they keep that up, it'll
take 'em 20 years.
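[A minimal sketch of the first QC pass bowerbird describes above --
checking a scan-set for gaps before the book leaves your hands. The
file-naming convention is an assumption for illustration; the blur
check of the second pass is image analysis and out of scope here.

    # First QC pass: verify the scan-set has one scan per expected page.
    # Assumes scans named page_0001.png ... page_0419.png (hypothetical).
    import re
    from pathlib import Path

    def missing_pages(scan_dir, last_page):
        found = set()
        for f in Path(scan_dir).glob("page_*.png"):
            m = re.fullmatch(r"page_(\d+)\.png", f.name)
            if m:
                found.add(int(m.group(1)))
        return sorted(set(range(1, last_page + 1)) - found)

    gaps = missing_pages("scans/my_antonia", 419)
    if gaps:
        print("redo these pages while the book is still in hand:", gaps)]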
From gbnewby at pglaf.org Mon May 22 14:47:26 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon May 22 14:47:27 2006
Subject: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]
In-Reply-To: <44722F18.4080204@perathoner.de>
References: <20060522102125.1706105918.davidrothman@pobox.com> <44722F18.4080204@perathoner.de>
Message-ID: <20060522214726.GD4985@pglaf.org>

On Mon, May 22, 2006 at 11:37:28PM +0200, Marcello Perathoner wrote:
> David H. Rothman wrote:
> > See the true details for yourself at http://www.dotreader.org.
>
> It says:
>
>     www.dotreader.org
>     This page is parked free, courtesy of GoDaddy.com
>
> Did the .reader bubble already burst?

That's the funniest thing I've read all month. Thanks!

dotreader.com seems a better source to start. I'm not sure if it has
true details or not, though.
  -- Greg

From jeroen.mailinglist at bohol.ph Mon May 22 14:46:04 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Mon May 22 15:15:03 2006
Subject: [gutvol-d] Outsourcing scanning
In-Reply-To: <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
References: <48c.693268.31a37fc3@aol.com> <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
Message-ID: <4472311C.3020408@bohol.ph>

After demonstrating PGDP to some people, I got in touch with an NGO
that would like to scan their entire holdings and make them available
on the web. Has anybody on this list experience with outsourcing
scanning jobs (on a larger scale)? I am looking at a project which
includes about half a million pages that need to be digitized. Of
course I am not going to scan that much myself, and I heard prices at
that scale can be as low as a few cents per page when done in the
Philippines. Has anybody prepared documents describing quality control
processes, etc., for such a bulk process? Hopefully, much of the
material will be made available on-line, and although it will not be
copyright-cleared through PG, I don't expect issues with copyright. I
may even set up a 'Distributed Proofreading' system for it.

Jeroen.

From davidrothman at yahoo.com Mon May 22 15:39:53 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Mon May 22 15:47:09 2006
Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
Message-ID: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>

Actually that's http://www.dotreader.com , not .org --mea culpa--and if
PG really cares about open source, then it should encourage strong open
source efforts of the OSoft variety rather than just wait until they
catch on.

Here's a little two-man software house in Tacoma, Washington, gambling
hundreds of thousands of dollars on an open-source reader that can do
far more than Plucker, allowing blogs and forums to be embedded inside
books.

Plucker has many appreciative users, but dotReader/OpenReader will be
of far greater importance to commercial publishers, who are already
starting to show interest.

In turn, that'll be wonderful for PG works and other public domain
books. The dotReader reader can work with many kinds of books while
improving the user experience.

dotReader uses a turbocharged version of existing e-book standards that
techies and publishers have thrashed around for years.

It's the best of all worlds: open source for programmers and a powerful
free reader for users--and e-book standards similar to existing ones
for publishers. Plus, dotReader can handle other XML/CSS-related
formats as well.

> We served 89504 plucker books in May 2006.

I think you'll do much better with OpenReader available as well.
OSoft's e-reader for the format is a thing of beauty, and, as noted,
it'll be free to download.

Plus, another awesome implementation is planned via the FBReader,
which, according to the Wikipedia, is catching on among Nokia 770
users. See http://only.mawhrin.net/fbreader/plans.html and
http://en.wikipedia.org/wiki/Plucker.

Thanks,
David

David Rothman | davidrothman@openreader.org | 703-370-6540
TeleRead: http://www.teleread.org/blog

On 5/22/06, Marcello Perathoner wrote:
>
> We'll see about OpenReader once you've widely deployed your .reader
> and made available an open toolchain.
From davidrothman at yahoo.com Mon May 22 16:04:52 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Mon May 22 16:04:56 2006
Subject: dotReader ][Re: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
Message-ID: <5eff08fa0605221604v474771cey34c15e144a55c9b@mail.gmail.com>

Hi, Greg. Actually dotReader is ANTI-bubble. E-books won't catch on big
until the technology improves, among other things; and the
dotReader/OpenReader combo can offer major interactivity in a big way.
I've already mentioned blogs and forums embedded inside e-books (and
available even for off-line reading).

As for the desirability of interactivity and multimedia for
consumers--that's no small factor, according to the esteemed Greg
Newby. dotReader/OpenReader will oblige in both areas. Meanwhile see
below from USA Today, especially the last paragraph: "To be compelling
enough to trigger any kind of mass migration away from paper books,
e-books will need to have compelling characteristics regular books
don't, such as interactivity and mixed-media capabilities, Newby and
others said."

Can we really trust this guy? ;-) I'd like to think so. I hope he and
others in PG will be open-minded about both the format and the
implementations, as opposed to letting the trolls and their buddies set
the tone for PG.

Cheers,
David

http://thelifeofbooks.blogspot.com/2006/04/whats-trouble-with-ebooks.html

"We don't see a lot of resistance to electronic books per se," said
Gregory Newby, director of Project Gutenberg, the first electronic
library, which offers 20,000 titles for free. "What we see are limiting
factors in specialized readers and difficulty in finding good stuff to
read." Plus, "publishers are charging the same amount for an electronic
book as for a paper book."

There are other challenges too. With e-book readers, people may be able
to store numerous texts in one small device and do things to make
reading easier, such as changing type size, something that's impossible
with print. But people also like to share books with others, resell
them and hand them down to their children, he said.

"When you buy a book, you have it forever," Newby said. "With these
electronic books, you often are prevented from doing those things that
you can do with regular books. What happens when my device
breaks?...Books aren't just words on a page. They are things you can
trade, share and store for later."

To be compelling enough to trigger any kind of mass migration away from
paper books, e-books will need to have compelling characteristics
regular books don't, such as interactivity and mixed-media
capabilities, Newby and others said.

On 5/22/06, Greg Newby wrote:
> That's the funniest thing I've read all month. Thanks!
>
> dotreader.com seems a better source to start.
> I'm not sure if it has true details or not, though.
>   -- Greg

From joshua at hutchinson.net Mon May 22 16:04:51 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon May 22 16:26:37 2006
Subject: [gutvol-d] Re: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re: [gutvo
Message-ID: <20060522230451.E698AEE422@ws6-1.us4.outblaze.com>

The show-stopper isn't the relative popularity (people have
successfully argued lots of things in PG's history that didn't seem too
popular at first blush). What we *need* is a conversion utility (the
"open tool-chain" that Marcello refers to).

I don't expect a conversion tool that can take our somewhat sloppy (at
times) encoded plain text. While wonderful, that is asking too much.
But I *do* want a conversion tool that can run from PGTEI to OpenReader
format. If the tool chain is open source and runs on Linux, we can
probably have a quick and dirty converter up relatively quickly.

As always, I'd be happy to test the heck out of it.

Josh

> ----- Original Message -----
> From: "David H. Rothman"
> To: "Project Gutenberg Volunteer Discussion"
> Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
> Date: Mon, 22 May 2006 18:39:53 -0400
>
> Actually that's http://www.dotreader.com , not .org --mea culpa--and if PG
> really cares about open source, then it should encourage strong open source
> efforts of the OSoft variety rather than just wait until they catch on.
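[A minimal sketch of the quick-and-dirty converter Josh asks for above,
as a single XSLT step driven from Python with lxml. The stylesheet name
is hypothetical -- no PGTEI-to-OpenReader stylesheet existed at this
point -- so this only shows the shape of the tool-chain:

    # Quick-and-dirty PGTEI -> OpenReader conversion step.
    # pgtei-to-openreader.xsl is a placeholder for a stylesheet that
    # would map PGTEI elements onto the OpenReader content vocabulary.
    import sys
    from lxml import etree

    def convert(pgtei_path, xslt_path="pgtei-to-openreader.xsl"):
        transform = etree.XSLT(etree.parse(xslt_path))
        result = transform(etree.parse(pgtei_path))
        return etree.tostring(result, pretty_print=True,
                              xml_declaration=True, encoding="utf-8")

    if __name__ == "__main__":
        sys.stdout.buffer.write(convert(sys.argv[1]))

Being plain lxml/libxslt, the same step runs unchanged on Linux, which
is the constraint Josh names.]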
From vze3rknp at verizon.net Mon May 22 16:34:02 2006
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Mon May 22 16:33:50 2006
Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
In-Reply-To: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
References: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
Message-ID: <44724A6A.2030107@verizon.net>

David H. Rothman wrote:

> It's the best of all worlds: open source for programmers and a
> powerful free reader for users--and e-book standards similar to
> existing ones for publishers. Plus, dotReader can handle other
> XML/CSS-related formats as well.

If dotReader can handle XML/CSS-related formats, then many of the more
recent PG books are already available for it, since they have been
produced in xhtml. Most of the output from DP these days comes with an
xhtml version.

JulietS

From davidrothman at yahoo.com Mon May 22 16:34:02 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Mon May 22 16:34:04 2006
Subject: OpenReader vs. the troll in the basement [Re: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]]
Message-ID: <5eff08fa0605221634r3c2ea804l4753d5d0142badd1@mail.gmail.com>

BIRD: In the snippet below, you simply LINKED to a blog. dotReader,
OpenReader's first implementation, will CONTAIN blogs and forums and
make them readable to users even when they're offline. They'll be
embedded INSIDE the books, as you well know from the TeleRead Web log.
It's one thing to talk about embedded blogs and forums. It's another
thing to offer them in a viable reader, as OSoft will be doing via
dotReader.

GREG: Well, as I said, beware of trolls and friends setting the tone
for PG. I trust you'll be more open-minded in choosing technologies and
formats. dotReader will offer the interactivity and multimedia
capabilities you were talking up in USA Today. Call me at 703-370-6540
if you want to begin some friendly and constructive dialogue.

Thanks,
David

Bowerbird wrote:
>
> what i said was that you were making a big deal about
> "putting a blog inside an e-book", when that is actually
> a somewhat trivial thing to do.
>
> look, i've inserted a blog inside of this e-mail:
>
> http://www.buzzmachine.com/index.php/2006/05/19/the-book-is-dead-long-live-the-book/

From Bowerbird at aol.com Mon May 22 17:05:22 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 17:05:30 2006
Subject: [gutvol-d] re: blogs in e-books
Message-ID: <491.561e9f.31a3abc2@aol.com>

david said:
> BIRD: In the snippet below, you simply LINKED to a blog.

and in most cases, that will be quite good enough, thanks, because
people can then read the blog there, and comment.
but since you've brought it up, will these "blogs and forums" that are
"contained" within openreader e-books from osoft be addressable by the
general public using web-browsers? or, like the current osoft
thoutreader, will people need to use that particular piece of software
in order to view the comments? a clear answer will tell us a lot about
your attitude on "lock-in".

> dotReader, OpenReader's first implementation, will
> CONTAIN blogs and forums and make them readable
> to users even when they're offline.

it's not hard to implement that... the app just downloads the content
and saves it for offline presentation, then uploads what is to be
posted -- capabilities already included in many r.s.s. readers and
blogging software...

depending on whether people actually use it, it could end up being a
neat technology, much like all the other instantiations of it i
discussed in my earlier post in this thread.

my point was that it's not difficult to implement. it's not. so why are
you hyping it like it's such a big deal?

-bowerbird
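[A minimal sketch of the download-and-cache pattern bowerbird describes
above: pull the embedded blog/forum content down for offline reading,
queue comments written offline, and push them on the next sync. The
URLs and file layout are illustrative assumptions, not any real
reader's API.

    # Offline blogs/forums, as sketched above: cache content locally,
    # queue outgoing comments, sync both ways when a connection exists.
    import json
    import urllib.request
    from pathlib import Path

    CACHE = Path("book_cache")

    def queue_comment(text):
        # called while offline; held until the next sync()
        CACHE.mkdir(exist_ok=True)
        outbox = CACHE / "outbox.json"
        queued = json.loads(outbox.read_text()) if outbox.exists() else []
        queued.append({"text": text})
        outbox.write_text(json.dumps(queued))

    def sync(feed_url, post_url):
        CACHE.mkdir(exist_ok=True)
        # 1. refresh the local copy of the embedded blog/forum
        with urllib.request.urlopen(feed_url) as r:
            (CACHE / "feed.xml").write_bytes(r.read())
        # 2. push any comments queued while offline
        outbox = CACHE / "outbox.json"
        if outbox.exists():
            for comment in json.loads(outbox.read_text()):
                req = urllib.request.Request(
                    post_url, data=json.dumps(comment).encode(),
                    headers={"Content-Type": "application/json"})
                urllib.request.urlopen(req)
            outbox.unlink()]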
From davidrothman at yahoo.com Mon May 22 17:10:42 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Mon May 22 17:17:10 2006
Subject: [gutvol-d] Re: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re: [gutvo
In-Reply-To: <20060522230451.E698AEE422@ws6-1.us4.outblaze.com>
References: <20060522230451.E698AEE422@ws6-1.us4.outblaze.com>
Message-ID: <5eff08fa0605221710j10fadbfah391bc01a642adbf7@mail.gmail.com>

Thanks to both Juliet and Josh for their useful thoughts. Let me add
that OpenReader files generated from one XML/CSS-related format or
another would be highly desirable if one cares about format standards
as well as reader standards. Format standards would be one way to help
public domain books get closer to the mainstream of publishing.

Many public libraries dispense e-books only in a few formats. I hate
the idea of their paying for public domain books, and format
standardization could help. I want libraries to be able to give away
public domain books for keeps rather than just loan them.

While I hope that dotReader will catch on, let's think format as well,
to be safe. Either way, though, with or without OpenReader, dotReader
could be very good for PG and DP alike. Sooner or later OpenReader will
catch on through other means, and the readers (both human and software)
will be ready.

David

David Rothman
davidrothman@openreader.org
dr@teleread.org
http://www.teleread.org/blog
703-370-6540

JulietS wrote: If dotReader can handle XML/CSS-related formats, then
many of the more recent PG books are already available for it, since
they have been produced in xhtml. Most of the output from DP these days
comes with an xhtml version. JulietS

From Bowerbird at aol.com Mon May 22 17:21:56 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 17:22:03 2006
Subject: [gutvol-d] re: blogs in e-books
Message-ID: <45a.12dcc5d.31a3afa4@aol.com>

by the way, if anyone wants to see an example of an experiment aimed at
eliciting reader interaction at the stage of a "finished first draft"
of a book, see:

> http://www.futureofthebook.org/gamertheory/

this project of "the institute for the future of the book" just went up
today. in addition to an ability to comment on any _paragraph_ of the
book, there is a general forum.

i'm of the opinion that most books probably will not be able to find a
sufficiently large number of commenters to warrant the work that an
author will have to do to open up the process of writing to such
interaction. but it's an interesting experiment.

and then of course there will always be some _major_ exceptions, on the
order of chris anderson and the blog he has been keeping while writing
his "long tail" book. because of the great exposure the idea got from
its description in a "wired" cover story last year, and because
anderson just happens to be the editor of "wired", and because -- let's
face it -- the idea is a _very_ compelling one that has subsequently
been written up all over the place, anderson's "long tail" blog has
been a tremendously exciting space.

but my sense is that this will be the _exception_, rather than the rule.

-bowerbird

From Bowerbird at aol.com Mon May 22 17:32:53 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 17:32:58 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <30d.5ad89a0.31a3b235@aol.com>

another comment on "the pages of books reading each other"...

> http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-05-22,1

-bowerbird

From bruce at zuhause.org Mon May 22 19:11:48 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Mon May 22 19:11:51 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <439.1c91c11.31a3713c@aol.com>
References: <439.1c91c11.31a3713c@aol.com>
Message-ID: <17522.28516.463762.723548@celery.zuhause.org>

Bowerbird@aol.com writes:
> > And the latest estimates I have received show that Google's
> > total number of books has just recently passed 50,000
>
> i do believe you misread that. 50,000 public-domain titles,
> with another 42,000 under copyright, for a total of 92,000.

My searching found 50,000 public domain titles available as complete
books, and another 42,000 that should have been available as complete
books because they were published prior to 1923, but were only visible
in snippet view. I have no idea how many books Google scanned that were
published after 1922 but are probably PD because the copyright was
apparently not renewed, nor how many books were scanned even though the
book is still under copyright.

From Bowerbird at aol.com Mon May 22 20:46:11 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 20:46:19 2006
Subject: [gutvol-d] re: blogs in e-books
Message-ID: <4a2.583199.31a3df83@aol.com>

david said:
> You unwittingly made the point in linking via e-mail
> to a browser-readable blog. Same blog could appear
> in dotReader or another OpenReader implementation.

that doesn't answer the question. will each and every blog/forum inside
an openreader book be accessible with a general web-browser?

because as far as i know, i can't put a link in this e-mail that would
take the user to a comment in a thoutreader e-book. but if it can be
done, then by all means, please show us.

-bowerbird
From Bowerbird at aol.com Mon May 22 20:52:42 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 20:52:48 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <45d.13253d6.31a3e10a@aol.com>

bruce said:
> I have no idea how many books Google scanned that were
> published after 1922 but are probably PD
> because the copyright was apparently not renewed

it would be foolish for google to take the risk of showing books
published after 1922. if even one had been renewed, it would become
ammunition for the other side.

> nor how many books were scanned even though
> the book is still under copyright.

we could extrapolate from the ratio of public-domain to copyrighted
titles in the libraries, but that would assume that they aren't taking
that into consideration, and they might well be. i believe they had
said that they would concentrate on public-domain titles first. (or
maybe that's just what michael _said_ they said.)

at any rate, i'm happy to sit back and wait while they continue to do
more scanning...

-bowerbird

From Bowerbird at aol.com Mon May 22 23:51:57 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 23:52:09 2006
Subject: [gutvol-d] re: blogs in e-books
Message-ID: <43c.17978a8.31a40b0d@aol.com>

david said:
> You're extrapolating from a specific setup
> at a specific company--one that will change in time.

i have nothing else to go on. and i have no indication of the way -- if
any -- that it "will change in time"... nothing technical about the app
has been written up. unlike many open-source programs, this one would
appear to be being developed in secret.

and while i'm quite happy to wait until the capability actually appears
in an app that runs on my machine, so i can see what it does and
exactly how it does it, you meanwhile are busy spewing glowing
p.r.-speak. which, when you "answer" a hard question, turns to mush.

if you can put a link in an e-mail here that takes a user to a comment
that has been made in an osoft e-book -- let's say the one that i made
in the "my antonia" demo -- then do it. otherwise, admit that -- at
this point in time, with that e-book, anyway -- that capability is not
present. or not. the answer here is of little or no consequence,
because whether the annotations are viewable with an ordinary
web-browser or not, the capability to code it (either way) is rather
elementary to implement, which is the point that i made at the outset.

if you'd like to see a demo program demonstrating the ease with which
this can be done, i would be willing to code one up for you... but you
don't really want me to steal dotreader's thunder, do you? why not let
mark carey roll out his version first?

-bowerbird

From davidrothman at yahoo.com Tue May 23 01:07:06 2006
From: davidrothman at yahoo.com (David H.
Rothman)
Date: Tue May 23 01:07:09 2006
Subject: Bowerbird's development schedule and the $200K he's demanding [Re: [gutvol-d] re: blogs in e-books]
Message-ID: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com>

As for your doing a demo, hey, be my guest. OSoft and others have long
since carried out the basic concept of shared annotation, and the
SA-capable dotReader is on the way from OSoft. Of course I'm actually
concerned that your demo would hurt the cause of shared annotations by
showing it off less than optimally, whether it was browser- or
reader-based or both. And you're not going to have the related
standards infrastructure that dotReader will. Who knows, you might even
want to give us, er, "lock-in."

Now, back to some grubby details from the ZML world, such as the rival
reader app that you've spent so many hours trolling for--against
OpenReader. What's your development schedule for your reader, so we can
guard against "hype" and "vaporware"? Isn't it true you've taken
forever to get your reader out? And beyond people not paying you $200K
or whatever, how come you won't share the source code for your rival
reader? Are you ashamed of it? I still don't have a satisfactory
answer. Code can be dear to one's heart, but still is a long way from
poetic musings. Why must you keep your brilliance to yourself? Don't
you believe in open source?

Answer those questions, and then I suggest that we wind down this
thread in the interest of bandwidth and time--both mine and others'. PG
people are very welcome to write me privately or phone me--especially
Greg, if he's really serious about the comments he made to USA Today
extolling interactivity. Here's PG's chance to adopt a powerful format
(OpenReader) and enjoy readers worthy of it (dotReader and in the
future FBReader). I'm all ears as far as suggestions from Greg or
anyone else, and I know others will be as well.

David Rothman | davidrothman@openreader.org | 703-370-6540
OpenReader: http://www.openreader.org
OR's first implementer: http://www.dotreader.org
TeleBlog: http://www.teleread.org/blog

From hart at pglaf.org Tue May 23 06:54:02 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 06:54:05 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <45d.13253d6.31a3e10a@aol.com>
References: <45d.13253d6.31a3e10a@aol.com>
Message-ID:

On Mon, 22 May 2006 Bowerbird@aol.com wrote:

> bruce said:
>> I have no idea how many books Google scanned that were
>> published after 1922 but are probably PD
>> because the copyright was apparently not renewed

I seem to recall an earlier report from someone who did lots of
searches for Google books and determined that 88% of them were
published after 1922. Or at least were being treated as copyrighted
books.

mh

From hart at pglaf.org Tue May 23 06:56:53 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 06:56:55 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <45d.13253d6.31a3e10a@aol.com>
References: <45d.13253d6.31a3e10a@aol.com>
Message-ID:

More. . .as for that 88% figure, I think that may have alluded to the
number of books Google has at their potential disposal, rather than the
number of those that have been scanned yet, or it may also take
duplications into account.

Sorry, been really busy, can't recall all the details. . . .
mh

From hart at pglaf.org Tue May 23 07:03:40 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 07:03:41 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <17522.28516.463762.723548@celery.zuhause.org>
References: <439.1c91c11.31a3713c@aol.com> <17522.28516.463762.723548@celery.zuhause.org>
Message-ID:

On Mon, 22 May 2006, Bruce Albrecht wrote:

> Bowerbird@aol.com writes:
> > > And the latest estimates I have received show that Google's
> > > total number of books has just recently passed 50,000
> >
> > i do believe you misread that. 50,000 public-domain titles,
> > with another 42,000 under copyright, for a total of 92,000.

Then I was probably right to count Google's total as ~100,000 in my own
public estimations, though I would prefer counts of downloadable books,
to avoid Google's new policy of:

"Google Book Search is a means for helping users discover books, not to
read them online and/or download them."

> My searching found 50,000 public domain titles available as complete
> books, and another 42,000 that should have been available as complete
> books because they were published prior to 1923, but were only visible
> in snippet view. I have no idea how many books Google scanned that were
> published after 1922 but are probably PD because the copyright was
> apparently not renewed, nor how many books were scanned even though
> the book is still under copyright.

Are you saying that there are actually 50,000 downloadable full text
Google eBooks? Any idea of their level of accuracy?

Please allow me to renew the request from myself and LIS PhD Greg
Newby, CEO of Project Gutenberg, for a copy of the list we can look
over, even if we cannot make it public.

Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg
Blog at http://hart.pglaf.org

From jon at noring.name Tue May 23 07:24:03 2006
From: jon at noring.name (Jon Noring)
Date: Tue May 23 07:24:09 2006
Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
In-Reply-To: <44724A6A.2030107@verizon.net>
References: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com> <44724A6A.2030107@verizon.net>
Message-ID: <1858102527.20060523082403@noring.name>

Juliet wrote:
> David H. Rothman wrote:

>> It's the best of all worlds: open source for programmers and a
>> powerful free reader for users--and e-book standards similar to
>> existing ones for publishers. Plus, dotReader can handle other
>> XML/CSS-related formats as well.

> If dotReader can handle XML/CSS-related formats, then many of the more
> recent PG books are already available for it, since they have been
> produced in xhtml. Most of the output from DP these days comes with an
> xhtml version.

The first vocabulary supported by OpenReader, called the "Basic Content
Document 1.0" (BCD), is a structurally-oriented subset of XHTML 1.0,
and compatible, as best as possible, with XHTML 2.0 currently being
developed by W3C. It is also quite compatible with OEBPS 1.2. The draft
BCD spec is located at:

http://openreader.org/spec/bcd10.html

We plan to create an "Extended Content Document" vocabulary by simply
adding XLink support plus some OpenReader namespace tags to mark up
important things that XHTML does not natively support, such as page
breaks and boundaries and numbering (e.g., for preserving where page
breaks occurred in the original paper book), line breaks as occurred in
the original (<br/> is not sufficient for this, as I could talk about
another time), other noteworthy "mile markers", inline indexing
information (so OpenReader "readers" can assemble a people-authored
index on the fly), etc., etc.

Anyway. Feedback on BCD from those in DP who produce XHTML versions of
books is more than welcome! Of course, we're looking for those willing
to do a careful vetting of the BCD spec (and anyone who does becomes a
contributor to be added to the list of contributors in the spec.)

Thanks.

Jon Noring
OpenReader Consortium
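[An illustration of the kind of "mile marker" markup Jon describes
above: a namespaced page-break element recording where a page began in
the paper original, and a few lines of Python recovering those
boundaries. The element and namespace names are hypothetical, not the
actual BCD/ECD vocabulary.

    # Hypothetical OpenReader-style page-break markers in an
    # XHTML-like content document, and how a reader might find them.
    import xml.etree.ElementTree as ET

    NS = "http://openreader.org/hypothetical-ecd"
    SAMPLE = """<doc xmlns:or="%s">
      <p>...the last words of one printed page,
         <or:pagebreak n="42"/>and the first of the next...</p>
    </doc>""" % NS

    for pb in ET.fromstring(SAMPLE).iter("{%s}pagebreak" % NS):
        print("paper page", pb.get("n"), "begins here")]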
<br /> is not sufficient for this, as I could talk about another time), other noteworthy "mile markers", inline indexing information (so OpenReader "readers" can assemble a people-authored index on the fly), etc., etc.

Anyway. Feedback on BCD from those in DP who produce XHTML versions of books is more than welcome! Of course, we are also looking for those willing to do a careful vetting of the BCD spec (anyone who does becomes a contributor, to be added to the list of contributors in the spec). Thanks.

Jon Noring
OpenReader Consortium

From gbnewby at pglaf.org Tue May 23 07:51:10 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue May 23 07:51:12 2006
Subject: [gutvol-d] USA Today;
In-Reply-To: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com>
References: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com>
Message-ID: <20060523145110.GA21391@pglaf.org>

On Tue, May 23, 2006 at 04:07:06AM -0400, David H. Rothman wrote:
> ....
> PG people are very welcome to write me privately or phone
> me--especially Greg, if he's really serious about the comments he made
> to U.S. Today extolling interactivity. Here's PG's chance to adopt a
> powerful format (OpenReader) and enjoy readers worthy of it (dotReader
> and in the future FBReader). I'm all ears as far as suggestions from
> Greg or anyone else, and I know others will be as well.

I enjoyed reading those quotes, and they're pretty accurate from an interview I did a few weeks ago concerning launch of the newest Sony eBook reader with electronic ink.

(I was just in Tokyo two weeks ago, and was unable to find one of these units for sale. I didn't look all that hard, but peered closely in the PDA section of Bic Camera which is a huge electronics chain store).

They somehow recycled the article for USA Today -- nice to see. Of course I'm serious about limitations of eBook readers, and am against any format that is one-way, closed, non-fixable/editable, etc. This is a thread in the "about" essays Michael and I worked on: http://www.gutenberg.org/about , with a key theme being "unlimited distribution."

For the OpenReader format, as Marcello said there is no conceptual resistance to using this as a "convert to" format at gutenberg.org, just as plucker is. All we need is a clear and preferably open source processing chain that we can insert into the ibiblio.org site. Also, of course, a reasonable support community so that PG help staff (me, George & Marcello) don't end up being too challenged in supporting the format.

In short, as you've heard before, you should feel encouraged to "go for it."
-- Greg

From hart at pglaf.org Tue May 23 07:58:36 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 07:58:37 2006
Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
In-Reply-To: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
References: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
Message-ID:

On Mon, 22 May 2006, David H. Rothman wrote:
> Actually that's http://www.dotreader.com , not .org --mea culpa--and if PG
> really cares about open source, then it should encourage strong open source
> efforts of the OSoft variety rather than just wait until they catch on.

However, Mr. Rothman does not take his own advice here, but only supports the particular open source projects that support him.
> Here's a little two-man software house in Tacoma, Washington, gambling
> hundreds of thousands of dollars on an open-source reader that can do far
> more than Plucker, allowing blogs and forums to be embedded inside books.

Again, Mr. Rothman should take his own advice. . .if he were really doing his bit to support this "little two-man software house in Tacoma" he would not be mentioning them as anonymous creatures sitting behind keyboards.

> Plucker has many appreciative users, but dotReader/OpenReader will be of far
> greater importance to commercial publishers, who are already starting to
> show interest.

This is what everyone says about every project. Let's not confuse the press releases with reality.

BTW, some people are totally amazed at how many Plucker files we send out. I got an independent comment on that earlier this week.

However, to address Mr. Rothman's point that we should promote this one particular piece of open source programming, with or without the hundreds of thousands of dollars he mentioned, with or without programmers' names, Project Gutenberg is not in the business of establishing businesses.

However, on the other hand, if Mr. Rothman were to read the Newsletters, he would know that it takes only one step to put in an announcement.

"Better to light a single candle, than to curse the darkness."

> In turn, that'll be wonderful for PG works and other public domain books.
> The dotReader reader can work with many kinds of books while improving the user
> experience.
>
> dotReader uses a turbocharged version of existing e-book standards that
> techies and publishers have thrashed around for years.

And this means. . . ?

> It's the best of all worlds: open source for programmers and a powerful free
> reader for users--and e-book standards similar to existing ones for publishers.
> Plus, dotReader can handle other XML/CSS-related formats as well.

Similar? Will it be able to use these similar files?

>> We served 89504 plucker books in May 2006.
>
> I think you'll do much better with OpenReader available as well. OSoft's
> e-reader for the format is a thing of beauty, and, as noted, it'll be free
> to download.

So is Adobe Acrobat Reader.

From hart at pglaf.org Tue May 23 08:03:21 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 08:03:22 2006
Subject: [gutvol-d] Outsourcing scanning
In-Reply-To: <4472311C.3020408@bohol.ph>
References: <48c.693268.31a37fc3@aol.com> <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu> <4472311C.3020408@bohol.ph>
Message-ID:

Several people I know have tried outsourcing scanning, OCR, etc., but all with disappointing results.

Sorry,

mh

On Mon, 22 May 2006, Jeroen Hellingman (Mailing List Account) wrote:
>
> After demonstrating PGDP to some people, I got in touch with an NGO that
> would like to scan its entire holdings, and make them available on the web.
>
> Has anybody on this list experience with outsourcing scanning jobs (on a
> larger scale)? I am looking at a project which includes about half a million
> pages that need to be digitized. Of course I am not going to scan that much
> myself, and I heard prices at that scale can be as low as a few cents per
> page when done in the Philippines. Has anybody prepared documents describing
> quality control processes, etc., for such a bulk process? Hopefully, much of
> the material will be made available on-line (although it will not be
> copyright-cleared with PG, I don't expect issues with copyright). I may even
> set up a 'Distributed Proofreading' system for it.
>
> Jeroen.
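[To make the quality-control question in Jeroen's message concrete, here is a minimal sketch of one check a bulk-scanning contract could specify: page-number continuity over the delivered image files. It is Python, and the "p0001.png" naming scheme and per-book directory layout are assumptions for illustration only, not any project's actual standard.]

    import os
    import re

    PAGE_RE = re.compile(r"p(\d+)\.(?:png|jpg|tif)$")

    def find_missing_pages(scan_dir):
        """Report gaps in the page numbering of scanned image files."""
        pages = []
        for name in os.listdir(scan_dir):
            m = PAGE_RE.match(name)
            if m:
                pages.append(int(m.group(1)))
        if not pages:
            return []
        pages.sort()
        # Any number absent from the observed range is a suspected missing scan.
        return sorted(set(range(pages[0], pages[-1] + 1)) - set(pages))

[Run over each book directory, this flags candidate gaps for a human to verify against the paper copy; it cannot catch pages skipped at the very start or end of a book, which is why spot-checks against the physical volume would still matter.]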
> > _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From davidrothman at pobox.com Mon May 22 13:41:32 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Tue May 23 08:06:44 2006
Subject: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]
In-Reply-To: <20060522164050.939444851.davidrothman@pobox.com>
References: <1bc.5089b96.31a33cbb@aol.com> <20060522150104.1080651974.davidrothman@pobox.com> <20060522150149.325320550.davidrothman@pobox.com> <20060522150332.110483112.davidrothman@pobox.com> <20060522151120.333718414.davidrothman@pobox.com> <20060522151447.591623163.davidrothman@pobox.com> <20060522151743.130547599.davidrothman@pobox.com> <20060522151814.1734039935.davidrothman@pobox.com> <20060522152006.1760790518.davidrothman@pobox.com> <20060522152114.1379513363.davidrothman@pobox.com> <20060522152413.1383410050.davidrothman@pobox.com> <20060522152716.66366918.davidrothman@pobox.com> <20060522153443.1213060337.davidrothman@pobox.com> <20060522153518.272947584.davidrothman@pobox.com> <20060522153553.2079047670.davidrothman@pobox.com> <20060522153641.560443431.davidrothman@pobox.com> <20060522155003.2005252714.davidrothman@pobox.com> <20060522160310.484640607.davidrothman@pobox.com> <20060522160741.417495629.davidrothman@pobox.com> <20060522161008.1626692064.davidrothman@pobox.com> <20060522161155.1281312296.davidrothman@pobox.com> <20060522161648.620467403.davidrothman@pobox.com> <20060522163407.1897590220.davidrothman@pobox.com> <20060522164024.966193868.davidrothman@pobox.com> <20060522164045.1760429003.davidrothman@pobox.com> <20060522164050.939444851.davidrothman@pobox.com>
Message-ID: <20060522164132.1934380741.davidrothman@pobox.com>

> that "strategy" was worn out years ago here..

But you're STILL a troll ;-) And a censor, too. You reverted to the old subject line without the T Word, and I don't mean "TeleRead." Doesn't this suggest that lists should be run with a little bit of order in mind? Well, blog areas, too, including the TeleBlog.

Here's the deal. You gratuitously attacked OpenReader out of the blue. After my present message, we'll both have had our say; and now I think the PG list should get back to being the PG list. Meanwhile thanks for documenting that your source code is not open, and that you're really after 200K rather than 50K, if you're serious about selling the code.

> as for my app itself, i've always said it's available for free.

but not disclosure of the source code? this is bizarre. evil corporate two-guy osoft will offer downloads of dotreader for free and publicly reveal the source code of the basic reader. and yet the e.e. cummings of the performance poetry circuit wants 200K for his code. i am worried. surely we are all doomed to the maw of mammon if even poets are demanding 200k.

of course, the real reason could be that you're ashamed of the source code--hence, the 200K price, if financial gain isn't the object. sure you don't want to share your app's code?

> they should just go off and write it themselves, so
> i don't think you could call me a very good businessman.

Oh, well, so much for your added value.

> p.s. evidently, book expo ain't keeping you very busy. kinda slow around the openreader booth, is it? :+)

ROFL. Um, the show ended yesterday. I drove Jon Noring out to see Mt. Vernon (almost all work up until now), and, after a TeleBlog post, I'm gonna return to follow-up. Wait.
Might do a few of the just-received notes first. Not sure. > considering the costs involve application of heavy markup... > i'm not sure how you will ever be able to convince anyone > that your cost-benefit ratio will be the best one available. Hey, we care very much about creation tools and the like to simplify things. Your own format isn't up to the range of content that OpenReader can handle, thereby reducing the chances of collecting your $200K, if that's what you want. Good luck at it, however. And now I suggest that you do the Netiquette routine and avoid a reply, now that we've both had our say. Remember, you were the one who broached these issues. Again, best of luck. Despite more than a little provocation, my big interest is in advancing OpenReader rather than harming ZML. Thanks, David David Rothman | davidrothman@openreader.org | 703-370-6540 http://www.openreader.org http://www.teleread.org/blog ------------Original Message------------ From: Bowerbird@aol.com To: gutvol-d@lists.pglaf.org, Bowerbird@aol.com Date: Mon, May-22-2006 12:11 PM Subject: re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries geez, david, you'll have to do _something_ more original than calling me a "troll" here; that "strategy" was worn out years ago here... as for your fantasy that i "object" to openreader because i have "competitive business interests", what exactly would those "business interests" be? my source code is available for $200,000, (not $50k), but i will tell anyone who wants to buy it not to bother, that they should just go off and write it themselves, so i don't think you could call me a very good businessman. as for my app itself, i've always said it's available for free. again, not a good businessman. what can i say? i'm a poet. a good businessman would get a booth at book expo, and hawk the product there. just like you're doing now. for the record, i wish the osoft people all the best. you were very lucky they came along when they did, and i'll be glad when they finally turn your precious "openreader" format into something besides vapor. at that point, we will be able to measure its benefits and its costs, to decide how worthwhile its cost-benefit will be. considering the costs involve application of heavy markup, and similar benefits can be delivered with lighter markup, i'm not sure how you will ever be able to convince anyone that your cost-benefit ratio will be the best one available. but hey, microsoft has been able to do that for _years_, so don't give up... -bowerbird p.s. evidently, book expo ain't keeping you very busy. kinda slow around the openreader booth, is it? :+) _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From davidrothman at pobox.com Mon May 22 17:38:39 2006 From: davidrothman at pobox.com (David H. Rothman) Date: Tue May 23 08:06:45 2006 Subject: dotReader [Re: [gutvol-d] re: blogs in e-books] Message-ID: <5eff08fa0605221738w772f25d0oe789fe9838507cc8@mail.gmail.com> > so why are you hyping it like it's such a big deal? But here we're talking about these capabilities skillfully integrated in a reader about to hit the market--and especially a standards-based one. > > but since you've brought it up, will these "blogs and forums" > that are "contained" within openreader e-books from osoft > be addressable by the general public using web-browsers? Yes. You can still have blogs and forums accessible through general Web browsers. 
Jon would be a better one to discuss this, but obviously it's a server issue. You unwittingly made the point in linking via e-mail to a browser-readable blog. Same blog could appear in dotReader or another OpenReader implementation.

>
> or, like the current osoft thoutreader, will people need to use
> that particular piece of software in order to view the comments?

Thanks for helping me make the case for meaningful e-book standards ;-)))))))))

This is exactly why OSoft is so keen on the OpenReader format. I myself don't want OSoft-format editions of PG books. I want OpenReader-format books, and OSoft agrees--hence, dotReader's use of OpenReader as the featured format. Beyond that, I'll be disgusted if dotReader and OSoft are the only ones able to do justice to the format. Jon's itching to work with FBReader (a planned implementer) and others.

>
> a clear answer will tell us a lot about your attitude on "lock-in".

Well, I don't see how the answer could be any clearer than that. If PG wants to free its books from lock-in and add new capabilities, especially interactivity, then a dotReader/OpenReader approach would be the way to go. At the same time PG could still offer other formats, including, yes, ZML, which would be trivial for dotReader to read. Let the marketplace decide.

Elsewhere you write:

> i'm of the opinion that most books probably will not be able
> to find a sufficiently large number of commenters to warrant
> the work that an author will have to do to open up the process
> of writing to such interaction. but it's an interesting experiment

It'll happen if e-books are easier for schools and libraries to use. Tearing down the Tower of eBabel would be a start. Plus, major publishers are talking to us about commercial uses of the interactivity, such as for book clubs. Popularity of a book, by the way, is only one determinant of whether there'll be commenters. The fervor of the participants matters, and that could happen with Long Tail books, not just best-sellers.

As for if:book's cool work, the example you gave is Web based. dotReader and hopefully Sophie will go beyond that. A toast to both!

David

On 5/22/06, Bowerbird@aol.com wrote:
> david said:
> > BIRD: In the snippet below, you simply LINKED to a blog.
>
> and in most cases, that will be quite good enough, thanks,
> because people can then read the blog there, and comment.
>
> but since you've brought it up, will these "blogs and forums"
> that are "contained" within openreader e-books from osoft
> be addressable by the general public using web-browsers?
>
> or, like the current osoft thoutreader, will people need to use
> that particular piece of software in order to view the comments?
>
> a clear answer will tell us a lot about your attitude on "lock-in".
>
> > dotReader, OpenReader's first implementation, will
> > CONTAIN blogs and forums and make them readable
> > to users even when they're offline.
>
> it's not hard to implement that...
>
> the app just downloads the content,
> and saves it for offline presentation,
> then uploading what is to be posted,
> capabilities already included in many
> r.s.s. readers and blogging software...
>
> depending on if people actually use it,
> it could end up being a neat technology,
> much like all the other instantiations of it
> i discussed in my earlier post in this thread.
>
> my point was that it's not difficult to implement.
>
> it's not.
>
> so why are you hyping it like it's such a big deal?
> > -bowerbird
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From davidrothman at pobox.com Mon May 22 22:05:42 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Tue May 23 08:06:45 2006
Subject: [gutvol-d] re: blogs in e-books
In-Reply-To: <4a2.583199.31a3df83@aol.com>
References: <4a2.583199.31a3df83@aol.com>
Message-ID: <5eff08fa0605222205k6e9acf5bm1f40f09c6e5c5fc9@mail.gmail.com>

> will each and every blog/forum inside an openreader book
> be accessible with a general web-browser

I've already answered, and you're ill-serving PG members by trimming out my response. The publisher could arrange for the same material to be accessible via a server. As for "each and every"--well, that's up to the publisher. But technically, it certainly would appear to be possible. I'll cc Jon in case I'm somehow wrong on a nuance.

What's more, if the reader has the free reader and is using it as a plug-in, he/she could go directly to the book from the Web and, if I'm not mistaken, even reach an anchor. So why not a blog? In other words, the Web and the book converge. Remember, the goal is for the reader to reflect a STANDARD. So we're not talking about the proprietary act. Like the browser, the reader will not be limited to a proprietary format if it honors the standard.

> because as far as i know, i can't put a link in this e-mail that
> would take the user to a comment in a thoutreader e-book.
> but if it can be done, then by all means, please show us.

Nice going, Bowerbird. I've already said this is a server thing re the general-purpose browser. OSoft did not set up its server that way, but that's hardly an indication it can't be done. What's more, I've already described the plug-in approach. You're extrapolating from a specific setup at a specific company--one that will change in time.

David

------------Original Message------------
From: Bowerbird@aol.com
To: davidrothman@pobox.com, gutvol-d@lists.pglaf.org
Cc: Bowerbird@aol.com
Date: Mon, May-22-2006 11:46 PM
Subject: re: [gutvol-d] re: blogs in e-books

david said:
> You unwittingly made the point in linking via e-mail
> to a browser-readable blog. Same blog could appear
> in dotReader or another OpenReader implementation.

that doesn't answer the question.

will each and every blog/forum inside an openreader book
be accessible with a general web-browser?

because as far as i know, i can't put a link in this e-mail that
would take the user to a comment in a thoutreader e-book.

but if it can be done, then by all means, please show us.

-bowerbird

From davidrothman at pobox.com Mon May 22 22:52:08 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Tue May 23 08:06:46 2006
Subject: [gutvol-d] re: blogs in e-books
In-Reply-To: <5eff08fa0605222205k6e9acf5bm1f40f09c6e5c5fc9@mail.gmail.com>
References: <4a2.583199.31a3df83@aol.com> <5eff08fa0605222205k6e9acf5bm1f40f09c6e5c5fc9@mail.gmail.com>
Message-ID: <20060523015208.1501928951.davidrothman@pobox.com>

> > will each and every blog/forum inside an openreader book
> > be accessible with a general web-browser
>
> I've already answered, and you're ill-serving PG members by trimming
> out my response. The publisher could arrange for the same material to
> be accessible via a server. As for "each and every"--well, that's up
> to the publisher. But technically, it certainly would appear to be
> possible. I'll cc Jon in case I'm somehow wrong on a nuance.

I'll hasten to add--just so it's clear even to newbies--a WEB server. The Web and the book-based blogs/forums converge. A general browser can work just as it would on other blogs/forums reachable through the Web.

David

From hart at pglaf.org Tue May 23 08:14:34 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 08:14:35 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
References: <48c.693268.31a37fc3@aol.com> <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
Message-ID:

On Mon, 22 May 2006, Frank van Drogen wrote:
>
>> it's clear that google has gotten their legs under them
>> in regard to doing the scanning. let's hope that they'll
>> get their quality-control under control very soon too...

After about 25% of their 6 year schedule to 10 million books, it would appear they are approaching 1% or 100,000 total books, with perhaps half of those easily downloadable, but in varying states of completion and accuracy.

If you presume they keep up with Moore's Law, 6 years looks like:

      Totals   Dates           Doublings   Years
          00   Dec 14, 2004        0         0
      50,000   Jun 14, 2006        1         1.5
     100,000   Dec 14, 2007        2         3
     200,000   Jun 14, 2009        3         4.5
     400,000   Dec 14, 2010        4         6

which continues as

     800,000   Jun 14, 2012        5         7.5
   1,600,000   Dec 14, 2013        6         9
   3,200,000   Jun 14, 2015        7        10.5
   6,400,000   Dec 14, 2016        8        12
  12,800,000   Jun 14, 2018        9        13.5

which would put them at over 12 years to their 10 million books in terms of downloadable eBooks. However, if you presume they have 100,000 by June 14, 2006, this would take 18 months off their total time, by counting non-downloadable and non-readable books.

> I have found fewer missing pages and other problems in books from Google than
> in those from the MBP and Canadian/IA. They are, however, still far from
> perfect. When they get a report regarding a missing or wrongly scanned page
> in a PD book, it is apparently up to the providing library to get the problem
> sorted out. I've heard reports of complete books being rescanned (with the
> risk of having another page missing in the end ;) ). I've also heard somebody
> mentioning that the full rescanned book was stuck behind the existing one
> (rather space-consuming, but for DP purposes a lot safer).
>
> What worries me in this is that Google doesn't seem to care whether pages are
> missing or not... as long as they get 99% of the pages from a book stored,
> chances are that most search terms pointing to the particular book will be
> identified. Their interest lies in people purchasing the book via Amazon, Abe
> etc. after identifying them via book.google.com.

When your goal is simply the appearance of having a lot of books, 99% is a perfectly good business plan.
And if your goal is to get people to BUY the books from your other business partners, then there is even less reason for moving to 99+%.

> The best quality control I have encountered so far is on Gallica, where
> apart from missing pages due to those pages missing in the original scanned
> manuscript, I've not encountered incomplete books. I'd actually be
> interested to see how they perform their quality control.

If you can give me any contact info on Gallica, I will see if I can find out for you.

Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From Bowerbird at aol.com Tue May 23 08:14:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue May 23 08:14:43 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <49c.66a23e.31a480df@aol.com>

michael said:
> I seem to recall an earlier report from someone
> who did lots of searches for Google books and
> determined that 88% of them were published after 1922.

i've posted this before, taken from lorcan dempsey's weblog, summarizing an article in d-lib. i always find it again easily by searching his site for "anatomy".

> http://orweblog.oclc.org/archives/000800.html
> The anatomy of an aggregate collection
> September 17, 2005
> Approximately half of the print books
> in the combined Google 5 collection
> were published after 1974.
> Almost three-quarters were published
> after the Second World War.
> Using the year 1923 as a rough break-off
> point between materials that are
> out of copyright and materials that are
> in copyright [16], more than 80 percent
> of the materials in the Google 5 collections are
> still in copyright (this is of course an upper bound).

if google has scanned roughly 100,000 pre-1923 items, and they were taking books off the shelves randomly, then we could assume they scanned 400,000 post-1923. but if we assume they were doing the pre-1923 items first, 100,000 pre-1923 scanned means 100,000 total scanned. seems to me assuming things does us absolutely no good.

but google is _going_ to scan 10+ million books, eventually, so i'm not sure what difference it makes _how_many_ they've done "so far". are we really questioning their _resolve_ here? seems to me that they've proven they are dedicated to this... so attempts to figure out "how many books so far?" are silly.

especially since we know that many of the post-1923 items did not have their copyrights renewed -- except that we do _not_ know what percentage, and thus cannot even _assume_ the answer to that important question, not with any certainty.

if we say that half of the post-1923 books were not renewed, then that means that 60% (20% plus 40%) are not in copyright. if we say that 1/3 of the post-1923 books were not renewed, then that means that roughly 47% (20% plus 27%) are not in copyright. if we say that 2/3 of the post-1923 books were not renewed, then that means that roughly 73% (20% plus 53%) are not in copyright.

not that the answer would matter any, because due to the litigious arena into which we have allowed the project to be thrown, there's probably no way google would be likely to take the risk of showing _any_ of the orphaned material. so we're back to the original 20% that is pre-1923 and clear.

of course, the answer to this is to give google an immunity, to let them serve as the "test-bed" that will act to bring out any claims of copyrighted material that might be lurking...
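[The renewal arithmetic in the scenarios above reduces to a single expression: the share not in copyright is the roughly 20% published before 1923 plus whatever fraction of the remaining 80% lapsed through non-renewal. A few lines of Python make the three scenarios checkable; the 80/20 split is the d-lib article's upper-bound estimate, and the renewal fractions are, as stated, pure guesses.]

    PRE_1923 = 0.20   # out of copyright by age (upper-bound estimate)
    POST_1923 = 0.80  # the rest of the Google 5 collection

    for lapsed in (1/2, 1/3, 2/3):
        free = PRE_1923 + lapsed * POST_1923
        print(f"{lapsed:.0%} unrenewed -> {free:.0%} not in copyright")
    # prints: 50% -> 60%, 33% -> 47%, 67% -> 73%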
in other words, let google show each book, in full, _until_ some _proof_ of copyright is rendered by another party. (and i do mean proof, and not just some bullshit claim...) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/03351ed8/attachment.html From hart at pglaf.org Tue May 23 08:18:24 2006 From: hart at pglaf.org (Michael Hart) Date: Tue May 23 08:18:25 2006 Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries] In-Reply-To: <48b.696020.31a3809a@aol.com> References: <48b.696020.31a3809a@aol.com> Message-ID: It is quite obvious that many people view both Bowerbird's and Mr. Rothman's comments as attacks, even Bowerbird and Mr. Rothman, though rarely would the mention include self-reflection on this matter. Honey versus vinegar? The real question is whether either of them, or the issues they promote, bring any real advances to the world of eBooks. On Mon, 22 May 2006 Bowerbird@aol.com wrote: > david said: >> You gratuitously attacked OpenReader out of the blue. > > no, i didn't. > > but i'm sure you'd like to spin it that way. > > what i said was that you were making a big deal about > "putting a blog inside an e-book", when that is actually > a somewhat trivial thing to do. > > look, i've inserted a blog inside of this e-mail: >> > http://www.buzzmachine.com/index.php/2006/05/19/the-book-is-dead-long-live-the-book/ > > -bowerbird > From hart at pglaf.org Tue May 23 08:20:51 2006 From: hart at pglaf.org (Michael Hart) Date: Tue May 23 08:20:53 2006 Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries In-Reply-To: <48c.693268.31a37fc3@aol.com> References: <48c.693268.31a37fc3@aol.com> Message-ID: On Mon, 22 May 2006 Bowerbird@aol.com wrote: > frank said: >> Even that number is a misinterpretation > > thanks for clearing that up for us, frank... > > it's clear that google has gotten their legs under them > in regard to doing the scanning. let's hope that they'll > get their quality-control under control very soon too... > > it is important to keep in mind that 100,000 books is > <1% of the 10.5 million (or more) they'll do eventually; > it's understandable if the process isn't up to speed yet. I wonder how great a percentage of Google's six year plan will have to expire before Mr. Bowerbird will admit that it doesn't look as if Google is even trying to make it to 10 million in 6 years. My own projections show it taking about twice that long, if Mr. Bowerbird is correct, and they have indeed gotten their feet under them already. mh From davidrothman at yahoo.com Tue May 23 08:37:53 2006 From: davidrothman at yahoo.com (David H. Rothman) Date: Tue May 23 08:37:56 2006 Subject: [gutvol-d] USA Today; In-Reply-To: <20060523145110.GA21391@pglaf.org> References: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com> <20060523145110.GA21391@pglaf.org> Message-ID: <5eff08fa0605230837n4b7272dcm917d71ec7cb1b216@mail.gmail.com> Many thanks, Greg. Those are all extremely reasonable conditions, and I'll forward this to the appropriate folks, so they can be in direct touch with you. We're eager to work with PG/DP and blend in well with everyone's workflow. I also agree with you on the need for thinking through the support issues. You could share with us the lessons you've learned from Plucker. 
- David > For the OpenReader format, as Marcello said there is > no conceptual resistance to using this as a "convert > to" format at gutenberg.org, just as plucker is. All > we need is a clear and preferably open source processing > chain that we can insert into the ibiblio.org site. Also, > of course, a reasonable support community so that PG > help staff (me, George & Marcello) don't end up being > too challenged in supporting the format. > > In short, as you've heard before, you should feel encouraged > to "go for it." > -- Greg On 5/23/06, Greg Newby wrote: > On Tue, May 23, 2006 at 04:07:06AM -0400, David H. Rothman wrote: > > .... > > PG people are very welcome to write me privately or phone > > me--especially Greg, if he's really serious about the comments he made > > to U.S. Today extolling interactivity. Here's PG's chance to adopt a > > powerful format (OpenReader) and enjoy readers worthy of it (dotReader > > and in the future FBReader). I'm all ears as far as suggestions from > > Greg or anyone else, and I know others will be as well. > > I enjoyed reading those quotes, and they're pretty > accurate from an interview I did a few weeks ago > concerning launch of the newest Sony eBook reader > with electronic ink. > > (I was just in Tokyo two weeks ago, and was unable > to find one of these units for sale. I didn't look > all that hard, but peered closely in the PDA section > of Bic Camera which is a huge electronics chain store). > > They somehow recycled the article for USA Today -- > nice to see. Of course I'm serious about limitations > of eBook readers, and am against any format that > is one-way, closed, non-fixable/editable, etc. This > is a thread in the "about" essays Michael and I worked > on: http://www.gutenberg.org/about , with a key theme > being "unlimited distribution." > > For the OpenReader format, as Marcello said there is > no conceptual resistance to using this as a "convert > to" format at gutenberg.org, just as plucker is. All > we need is a clear and preferably open source processing > chain that we can insert into the ibiblio.org site. Also, > of course, a reasonable support community so that PG > help staff (me, George & Marcello) don't end up being > too challenged in supporting the format. > > In short, as you've heard before, you should feel encouraged > to "go for it." > -- Greg > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Bowerbird at aol.com Tue May 23 08:43:27 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue May 23 08:43:38 2006 Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries] Message-ID: <368.52ae186.31a4879f@aol.com> michael said: > It is quite obvious that many people view > both Bowerbird's and Mr. Rothman's comments as attacks, > even Bowerbird and Mr. Rothman, though rarely would the > mention include self-reflection on this matter. for the record, i do not feel i have made _any_ "attacks". and i firmly believe if you look at what i have actually said -- as opposed to how david has _characterized_ what i've said -- you will see that that is so. indeed, if you find otherwise, i'd be happy for you to draw attention to it, so i can explain. i have said _unflattering_ things. but with good evidence. i _never_ resort to spin-doctoring, such as his name-calling. he flames me, and then tries to blame it on me by calling me "a troll". don't tell me you can't see through that transparency. 
and really, take a look at the latest thread. i made a simple point -- which is that it is relatively easy for a programmer to "embed" shared annotations into an e-book -- and he eventually ended up dragging in a _myriad_ of unrelated charges, some of them _silly_. meanwhile, it still remains easy to embed annotations in an e-book.

contrary to what a naive observer might be led to believe by teleblog -- or even his recent posts here -- openreader is _not_ the only way to provide "interactivity" to electronic-books. not even close. all these other issues that he is throwing up are a _smokescreen_ intended to divert your attention from that very simple fact, and if anyone reading these posts doesn't realize that, then i _fear_ for their reading comprehension.

of course, david doesn't _want_ people to read these posts, that's why he's throwing up a barrage, hoping that lurkers will just view the subject-headers (where he promulgates his smear job by prominently using the "troll" word).

is what i'm saying here unflattering? you betcha. is it true? you betcha. but i don't do "attacks". i do hard-headed analysis with a focus on fact. and i don't get emotionally involved -- it interferes badly with the logic. it's a lot smarter to stay on-point.

i do a lot of self-reflection. i can look at myself in a mirror just fine...

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/c3c641d4/attachment.html

From Bowerbird at aol.com Tue May 23 09:05:19 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue May 23 09:05:30 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <483.8864e2.31a48cbf@aol.com>

michael said:
> I wonder how great a percentage of Google's six year plan
> will have to expire before Mr. Bowerbird will admit that
> it doesn't look as if Google is even trying to make it to
> 10 million in 6 years.

michael, it certainly isn't necessary to call me "mr." bowerbird. but hey, it sounds kinda funny and cute, so please, be my guest.

as for google's plan, i laid out my prediction last december:

december 14, 2004 -- 0 books
december 14, 2005 -- 10,000 books
december 14, 2006 -- 100,000 books
december 14, 2007 -- 1,000,000 books
december 14, 2008 -- 10,000,000 books

so not only do i think they are still on-track, and doing well, i actually think they'll wrap it up by the end of 2008, michael, _if_ they stop at the 10.5 million unique titles they have now.

but i think the courts will clear a path for them by that time, and more libraries will come on-board, and their focus will expand from books into the wide variety of _other_ content commonly found in libraries, including much of the local stuff found in libraries nationwide, so that by december 14, 2012, they will have scanned a grand total of some 100 million items, at a cost of $10 billion. (all the local stuff will jack up the cost, from $1 billion to $10 billion. but by this time, the google boys will be worth $25 billion each, and google itself $75 billion, so this will just be a cost of doing business written off their taxes.)

see, when you've got a ton of money, moore's law becomes your _bottom_ bound, not your top one. need to go faster? all you need to do is buy more scanners and hire more people.

> My own projections show it taking about twice that long,

so what if it does take "twice as long"? or 3 or 4 times as long? really, so what?
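[Both schedules traded in this exchange are easy to recompute. Hart's table earlier in the thread doubles a 50,000-book count every 18 months; bowerbird's prediction above multiplies by ten each December 14. A short Python sketch under exactly those stated assumptions; the numbers are theirs, the code is only illustrative.]

    from datetime import date, timedelta

    # Hart: 50,000 downloadable books on June 14, 2006, doubling
    # every 18 months (Moore's Law as a growth-rate stand-in).
    books, when = 50_000, date(2006, 6, 14)
    while books < 10_000_000:
        books *= 2
        when += timedelta(days=548)  # roughly 18 months
        print(when, f"~{books:,} books")  # crosses 10 million in mid-2018

    # bowerbird: tenfold growth each year from December 14, 2005.
    books = 10_000
    for year in range(2005, 2009):
        print(date(year, 12, 14), f"~{books:,} books")  # hits 10 million in 2008
        books *= 10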
your own e-library took 35 years to get to 20,000 items, and i think it's one of the best things in all of cyberspace... well, except for some of those videos over on youtube. people are really funny and creative, know what i mean? ;+)

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/52784213/attachment.html

From hart at pglaf.org Tue May 23 09:12:09 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 09:12:10 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <44721054.3000104@ibiblio.org>
References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org>
Message-ID:

On Mon, 22 May 2006, Michael Dyck wrote:
> Michael Hart wrote:
>>
>> If we double that number to 100,000, we could pretend these
>> results indicated that Google had accomplished 1% of a goal
>> of 10,000,000 books, in 25% of their 6 year plan.
>
> In 1993, PG had accomplished 1% of its goal of 10,000,
> in about 70% of the total time.

Only if you keep refusing to acknowledge that there was not an ordinary production schedule until 1991. . . .

Mr. Dyck has been refusing to acknowledge for some time that an ordinary growth curve is impossible to create when the growth is linear. . .i.e. one per year. . . .

And sometimes it was even less, as when the copyright law fell out of favor in 1976, and was replaced with a longer one, thus ruining our first efforts at a Complete Shakespeare.

Project Gutenberg growth curves have always been presented for dates starting in January of 1991, the first year of scheduled production rates:

 1 per month in 1991
 2 per month in 1992
 4 per month in 1993
 8 per month in 1994
16 per month in 1995
32 per month in 1996, 97, 98, 99
   [Survived a big financial loss]
   [A big reason we don't let money rule Project Gutenberg]

Obviously, doubling the total number of books from 1971-90 in the single year of 1991, and then again and again in the next few years, eliminates any usefulness of incorporating the linear, or slower, growth rate that previously existed.

Here is the current graph.
12341234123412341234123412341234123412341234123412341234123412341234
-90--91--92--93--94--95--96--97--98--99--00--01--02--03--04--05--06-
*Perhaps 20K                                             >07/06 20K
*Estimated 19.5K                                         >05/06 19.5K
19,020 on March 31, 2006                       19K       > 03/06 19K
18,500 on February 13, 2006                    18.5K     > 02/06 18.5K
Added ~216 from PG of Europe January 01, 2006  18K       > 01/06 18K
Added 1 from PG PrePrint Site, January, 2006   17.5K     > 11/05 17.5K
17K    > 08/05 17K
16.5K  > 06/05 16.5K
16K    > 04/05 16K
15.5K  > 02/05 15.5K
15K    > 01/05 15K
14.5K  > 11/04 14.5K
14K    > 10/04 14K
13.5K  > 08/04 13.5K
13K    > 06/04 13K
12.5K  > 04/04 12.5K
12K    > 03/04 12K
11.5K  > 02/04 11.5K
11K    > 01/04 11K
10.5K  > 11/03 10.5K
>>> October 15, 2003 >>> 10K > 10/03 *10K*
9,500  > 9/03 9,500
9,000  > 8/03 9,000
8,500  > 7/03 8,500
8,000  > 5/03 8,000
7,500  > 3/03 7,500    Note this graph is in 1/4 years
7,000  > 1/03 7,000
6,500  > 12/02 6,500
6,000  > 9/02 6,000
5,500  > 7/02 5,500
5,000  > 4/02 5,000
4,500  > 2/02 4,500    Added PG Australia in August, 2001
4,000  > 10/01 4,000
3,500  > 5/01 3,500
3,000  > 12/00 3,000
2,500  > 8/00 2,500
2,000  > 12/99 2,000
1,500  > 10/98 1,500
1,000  > 8/97 1,000
500    > 4/96 500
100    > 12/93
<< 12/90 10
-90--91--92--93--94--95--96--97--98--99--00--01--02--03--04--05--06- YEARS
12341234123412341234123412341234123412341234123412341234123412341234 QUARTERS

From gbnewby at pglaf.org Tue May 23 09:21:58 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue May 23 09:22:00 2006
Subject: [gutvol-d] USA Today;
In-Reply-To: <5eff08fa0605230837n4b7272dcm917d71ec7cb1b216@mail.gmail.com>
References: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com> <20060523145110.GA21391@pglaf.org> <5eff08fa0605230837n4b7272dcm917d71ec7cb1b216@mail.gmail.com>
Message-ID: <20060523162158.GA24437@pglaf.org>

On Tue, May 23, 2006 at 11:37:53AM -0400, David H. Rothman wrote:
> Many thanks, Greg. Those are all extremely reasonable conditions, and
> I'll forward this to the appropriate folks, so they can be in direct
> touch with you. We're eager to work with PG/DP and blend in well with
> everyone's workflow. I also agree with you on the need for thinking
> through the support issues. You could share with us the lessons you've
> learned from Plucker.
- David

I haven't learned any particular lesson from Plucker, which is probably good... Marcello might have some views on how things integrate.

Mail to help@pglaf.org goes to me & George Davis...George answers most of them. We get 5 or so inquiries per day. Frustrations come from our .lit, .pdb and .mp3 files which, when broken or outdated, cannot be easily fixed.

We get frequent requests to submit this and that format, including some people who do the work of conversion then send me files. Lots of PDF, but pretty well any format you can think of (.doc, etc.). For the most part I don't want to add such formats in static files to the PG collection, instead preferring conversion on the fly.

The goal, as oft stated, is automated conversion to many formats from XML or HTML input. Several people have made great progress on this, and the XML production chain at DP is in pretty good shape....but we're not there yet.

The current catalog/download interface at gutenberg.org is close to the ideal: just a few static files, then a selection of conversion options. Today, Plucker is the only one Marcello has available, but more can be added. Conversion to PDF, MP3 & Braille are at the top of my personal list.

Not all input books or types can be reasonably accurately converted to any possible format, especially for the older titles with no well-formed & valid HTML version. (David Widger converts several dozen eBooks per week, minimum, to current standards.)
-- Greg

> >For the OpenReader format, as Marcello said there is
> >no conceptual resistance to using this as a "convert
> >to" format at gutenberg.org, just as plucker is. All
> >we need is a clear and preferably open source processing
> >chain that we can insert into the ibiblio.org site. Also,
> >of course, a reasonable support community so that PG
> >help staff (me, George & Marcello) don't end up being
> >too challenged in supporting the format.
> >
> >In short, as you've heard before, you should feel encouraged
> >to "go for it."
> > -- Greg

> On 5/23/06, Greg Newby wrote:
> >On Tue, May 23, 2006 at 04:07:06AM -0400, David H. Rothman wrote:
> >> ....
> >> PG people are very welcome to write me privately or phone
> >> me--especially Greg, if he's really serious about the comments he made
> >> to U.S. Today extolling interactivity. Here's PG's chance to adopt a
> >> powerful format (OpenReader) and enjoy readers worthy of it (dotReader
> >> and in the future FBReader). I'm all ears as far as suggestions from
> >> Greg or anyone else, and I know others will be as well.
> >
> >I enjoyed reading those quotes, and they're pretty
> >accurate from an interview I did a few weeks ago
> >concerning launch of the newest Sony eBook reader
> >with electronic ink.
> >
> >(I was just in Tokyo two weeks ago, and was unable
> >to find one of these units for sale. I didn't look
> >all that hard, but peered closely in the PDA section
> >of Bic Camera which is a huge electronics chain store).
> >
> >They somehow recycled the article for USA Today --
> >nice to see. Of course I'm serious about limitations
> >of eBook readers, and am against any format that
> >is one-way, closed, non-fixable/editable, etc. This
> >is a thread in the "about" essays Michael and I worked
> >on: http://www.gutenberg.org/about , with a key theme
> >being "unlimited distribution."
> >
> >For the OpenReader format, as Marcello said there is
> >no conceptual resistance to using this as a "convert
> >to" format at gutenberg.org, just as plucker is. 
All > >we need is a clear and preferably open source processing > >chain that we can insert into the ibiblio.org site. Also, > >of course, a reasonable support community so that PG > >help staff (me, George & Marcello) don't end up being > >too challenged in supporting the format. > > > >In short, as you've heard before, you should feel encouraged > >to "go for it." > > -- Greg > > > > > > > >_______________________________________________ > >gutvol-d mailing list > >gutvol-d@lists.pglaf.org > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > From Bowerbird at aol.com Tue May 23 09:51:27 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue May 23 09:51:37 2006 Subject: [gutvol-d] re: accurately converted to any possible format Message-ID: <43c.18826a3.31a4978f@aol.com> greg said: > Not all input books or types can be reasonably > accurately converted to any possible format, especially > for the older titles with no well-formed & valid HTML version. as i've said for years now, with a small commitment from you to consistent formatting, i could take plain-ascii files as input, automatically apply the typographic niceties that are expected, and output the results to .pdf and to .html, such that the .html can be converted to a large number of other auxiliary formats. of course, i'm not unique. david moynihan has done it for years. david was willing to make a small commitment to edit the files himself so as to obtain that consistent formatting. i think it is more important to teach you how to fish than to give you fish. check with 3 tool-makers from distributed proofreaders -- thundergnat, donovan, and bill flis -- and they'll confirm that a clear path for ascii-to-(x)html conversion is quite workable -- due to the fact that d.p. now has the required consistency -- even with their current programs, and that if they worked on it a bit more, they could make it into a regular part of the workflow. there is no need for the more-complex switch to a .tei workflow. -bowerbird p.s. if you only would have accepted moynihan's offer of his files when he made it to you, you'd already _have_ a consistent library. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/aa153d27/attachment.html From marcello at perathoner.de Tue May 23 13:10:32 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue May 23 13:10:37 2006 Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries] In-Reply-To: <368.52ae186.31a4879f@aol.com> References: <368.52ae186.31a4879f@aol.com> Message-ID: <44736C38.104@perathoner.de> Bowerbird@aol.com wrote: > it is relatively easy for a programmer to "embed" > shared annotations into an e-book How would you know? > i do a lot of self-reflection. i can look at myself in a mirror just > fine... Snip. Another one for "The Showcase of Pudd'nhead Bowerbird" at: http://www.gnutenberg.de/bowerbird/ -- Marcello Perathoner webmaster@gutenberg.org From walter.van.holst at xs4all.nl Tue May 23 14:39:01 2006 From: walter.van.holst at xs4all.nl (Walter van Holst) Date: Tue May 23 14:44:08 2006 Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries] In-Reply-To: <44736C38.104@perathoner.de> References: <368.52ae186.31a4879f@aol.com> <44736C38.104@perathoner.de> Message-ID: <447380F5.6020204@xs4all.nl> Marcello Perathoner wrote: > >> i do a lot of self-reflection. i can look at myself in a mirror just >> fine... >> > > Snip. 
Another one for "The Showcase of Pudd'nhead Bowerbird" at:
>
> http://www.gnutenberg.de/bowerbird/

Aren't you honouring this, ehm, character a bit too much by putting this much effort into having a collection of his delusions online? I am not one to judge your pastimes, but personally I would have preferred a thorough discussion on the number of angels that fit on the tip of a needle instead.

Regards,

Walter

From Bowerbird at aol.com Tue May 23 16:37:51 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue May 23 16:50:05 2006
Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]
Message-ID: <429.203f0c1.31a4f6cf@aol.com>

i said:
> > it is relatively easy for a programmer to "embed"
> > shared annotations into an e-book

marcello said:
> > How would you know?

um, because i've built a number of prototypes that do it. of course. before i make bold assertions, i've done years of research...

didn't i post a message saying i could do a demo-app for david, to show how simple it is? and didn't david say, "ok, go ahead", and didn't i say "ok, i will, you can expect it by thursday, i'd say?"

just a minute, let me check... ok, the first two messages did indeed post ("i can do it", "ok, do it"), but i never got around to sending the third one this morning, sorry... i'll send it first thing tomorrow, since i think we've had enough for today...

still, this is fairly easy to program. here's the code that sends out a note:

> dim abcd as dictionary
> dim socket1 as new httpsocket
> dim lec as integer
> abcd=new dictionary
> abcd.value("myname")=editfield1.text
> abcd.value("myemail")=editfield2.text
> abcd.value("mycomment")=editfield3.text
> socket1.setformdata abcd
> socket1.post "http://users.aol.com/cgi-bin/guestbook/bowerbird/bbb.html"

that's it. all the code you need.

that's a real-live form, on the web right now, found at:

> http://users.aol.com/bowerbird/bbb.html

you can post from the page itself, filling out the form right there, using any web-browser on any internet-connected machine... or you can compose your comment on your own machine, and, using the program that contains the code listed above, post that comment to the website. (the downstream app is interacting with the code that runs the guestbook script on the .html page, so you will find the same variable names as above if you look at the code that creates the .html form on that page. fairly easy to figure out.)

***

and here's the code that fetches the text of the webpage and puts it in an editfield. this is realbasic source-code, by the way.

> dim http as new httpsocket
> readit.text=http.get("users.aol.com/bowerbird/bbb.html",30)

open another window in the app on your machine, and the two-line function above loads and displays the comments from the webpage. well, _that_ was certainly simple, wasn't it? and those are our two functions, one to post, the other to read.

***

ok, wrap a g.u.i. around it, and you've got the demo app, pronto. with what?, a dozen lines of code?, all copied from the manual? like i said, pretty elementary. realbasic does all the heavy lifting. your mileage, using your language and your compiler, may vary. so, from my vantage-point, yes, this is dirt-simple.

do you have any questions?

-bowerbird

p.s. marcello, thanks for creating my shrine. i can point to it in the future as holding a good number of my prognostications, not to mention my phat and sassy attitude, donchajustadoreit? thank goodness for the internet archive, right? brewster rocks!
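[For anyone without RealBasic, the demo above translates directly: posting an annotation is a form POST, and reading the shared set back is a page GET. A sketch in Python using only the standard library; the URL and field names simply mirror the guestbook form quoted above, and whether that form still answers is another matter.]

    from urllib.parse import urlencode
    from urllib.request import urlopen

    FORM_URL = "http://users.aol.com/cgi-bin/guestbook/bowerbird/bbb.html"
    PAGE_URL = "http://users.aol.com/bowerbird/bbb.html"

    def post_comment(name, email, comment):
        # Same three fields the RealBasic snippet fills in.
        data = urlencode({"myname": name,
                          "myemail": email,
                          "mycomment": comment}).encode("ascii")
        return urlopen(FORM_URL, data)  # supplying data makes this a POST

    def fetch_comments():
        # Equivalent of the two-line httpsocket.get call: grab the page
        # the guestbook script rewrites, for display or offline caching.
        return urlopen(PAGE_URL, timeout=30).read().decode("latin-1")

[Either way the point stands: a web browser and a reading application can hit the same two URLs, so shared annotations come down to a form POST plus a page GET, and the rest is user interface.]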
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/fd3d0e68/attachment.html

From gbnewby at pglaf.org Tue May 23 23:08:08 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue May 23 23:08:11 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <43c.18826a3.31a4978f@aol.com>
References: <43c.18826a3.31a4978f@aol.com>
Message-ID: <20060524060808.GD5644@pglaf.org>

On Tue, May 23, 2006 at 12:51:27PM -0400, Bowerbird@aol.com wrote:
> greg said:
> > Not all input books or types can be reasonably
> > accurately converted to any possible format, especially
> > for the older titles with no well-formed & valid HTML version.
>
> as i've said for years now, with a small commitment from you
> to consistent formatting, i could take plain-ascii files as input,
> automatically apply the typographic niceties that are expected,
> and output the results to .pdf and to .html, such that the .html
> can be converted to a large number of other auxiliary formats.

Here's an eBook that should meet your requirements:
http://www.gutenberg.org/etext/18257

You already have server space, to provide a conversion utility. Looking forward to the pudding...

> of course, i'm not unique. david moynihan has done it for years.
>
> david was willing to make a small commitment to edit the files
> himself so as to obtain that consistent formatting. i think it is
> more important to teach you how to fish than to give you fish.
>
> check with 3 tool-makers from distributed proofreaders --
> thundergnat, donovan, and bill flis -- and they'll confirm that
> a clear path for ascii-to-(x)html conversion is quite workable
> -- due to the fact that d.p. now has the required consistency --
> even with their current programs, and that if they worked on it
> a bit more, they could make it into a regular part of the workflow.

You make it sound like I'm saying "no," when all I've ever said is "yes."

If people are put off by trying to get things to "fit" in the existing www.gutenberg.org infrastructure, I have two other servers (snowy.arsc.alaska.edu and readingroo.ms) with complete copies of the PG collection for development. Currently, two different people are pursuing their dreams/ideas on the readingroo.ms server. You're already on snowy, and there is room for more.

> there is no need for the more-complex switch to a .tei workflow.

To each their own pudding. If I'm not saying "no" to you, why would I say "no" to someone with a different approach? If people don't like the way DP does things, they can start their own DP (they can even get help!!). If people don't like the way PG postprocesses & posts eBooks, they can grab 'em and do their own postings. As long as there's a reasonable adherence to the principle of unlimited distribution etc. (http://www.gutenberg.org/about), we'll even link to 'em!

> -bowerbird
>
> p.s. if you only would have accepted moynihan's offer of his files
> when he made it to you, you'd already _have_ a consistent library.

I don't recall turning down David, but might have on the grounds of being unable to effectively ingest & manage the files he was producing. Today, I'd offer him his own server space to help him do things his way.
-- Greg
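[As a concrete reference point for this conversion-chain argument, here is a deliberately small sketch of the plain-text-to-HTML step being debated. The blank-line paragraph rule and the quote/dash substitutions are stand-in assumptions about what "consistent formatting" means; they are not PG's or DP's actual posting rules.]

    import html
    import re

    def plain_to_html(text, title="Untitled"):
        """Convert consistently formatted plain text to minimal HTML,
        applying a few typographic niceties along the way."""
        def typog(p):
            p = re.sub(r'"([^"]*)"', "\u201c\\1\u201d", p)  # paired double quotes
            p = re.sub(r"(\w)'(\w)", "\\1\u2019\\2", p)     # apostrophes
            return p.replace("--", "\u2014")                # dashes

        paras = []
        for chunk in re.split(r"\n\s*\n", text):
            if chunk.strip():
                # Escape &, <, > first; straight quotes are left for typog.
                paras.append(typog(html.escape(chunk.strip(), quote=False)))
        body = "\n".join("<p>%s</p>" % p for p in paras)
        return ("<html><head><title>%s</title></head>\n<body>\n%s\n</body></html>"
                % (html.escape(title), body))

[Wired into the download path, this is the shape of "conversion on the fly": the stored master stays plain text, and each derived format is generated on request instead of being archived as yet another static file.]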
-- Greg

From nwolcott2ster at gmail.com Tue May 23 09:05:45 2006
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Tue May 23 23:23:28 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
References: <439.1c91c11.31a3713c@aol.com><17522.28516.463762.723548@celery.zuhause.org>
Message-ID: <00d901c67e83$e42e6860$650fa8c0@gw98>

There is no doubt that the Open Book Project of Brewster Kahle has
the most accurate books online. However they have only 1000 books.
The million books project I would rate just barely above google books
in quality and completeness. It also helps if the page is not turned
before the scan is complete. The US is doing very well in providing a
large number of useless images online.

Does anyone know how to get a book into the Open Book Project? Do you
have to be a library?

nwolcott2@post.harvard.edu

----- Original Message -----
From: "Michael Hart"
To: "Project Gutenberg Volunteer Discussion"
Sent: Tuesday, May 23, 2006 10:03 AM
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

>
> On Mon, 22 May 2006, Bruce Albrecht wrote:
>
> > Bowerbird@aol.com writes:
> > > > And the latest estimates I have received show that Google's
> > > > total number of books has just recently passed 50,000
> > >
> > > i do believe you misread that. 50,000 public-domain titles,
> > > with another 42,000 under copyright, for a total of 92,000.
>
> Then I was probably right to count Google's total as ~100,000
> in my own public estimations, though I would prefer counts of
> downloadable books to avoid Google's new policy of:
>
> "Google Book Search is a means for helping users discover
> books, not to read them online and/or download them."
>
> > My searching found 50,000 public domain titles available as complete
> > books, and another 42,000 that should have been available as complete
> > books because they were published prior to 1923, but were only visible
> > in snippet view. I have no idea how many books Google scanned
> > published after 1922 which are probably PD because the copyright was
> > apparently not renewed, nor the number of books scanned even though
> > the book is still under copyright.
>
> Are you saying that there are actually 50,000 downloadable
> full text Google eBooks?
>
> Any idea of their level of accuracy?
>
> Please allow me to renew the request from myself and LIS PhD Greg Newby,
> CEO of Project Gutenberg, for a copy of the list we can look over,
> even if we cannot make it public.
>
> Thanks!!!
>
> Give the world eBooks in 2006!!!
>
> Michael S. Hart
> Founder
> Project Gutenberg
>
> Blog at http://hart.pglaf.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
I have sent many errors in to google and get a nice canned reply, but
no improvement in the output is visible nor further feedback. I have
found these books most useful when I already have a copy of the book,
and can use the google scan to help speed up the scanning/ocr
process.

In fact I don't see how DP is coping with these google texts given
their now stricter requirements that a perfect scan of every page and
illustration must be provided before the book can even get into their
processing queue. A missing part of a page or illegible word cannot
be corrected from another edition, due to their high standard of
perfection. With the average book now requiring 2 years to go through
their four levels of proofreading, one does wonder.

nwolcott2@post.harvard.edu

----- Original Message -----
From: "Frank van Drogen"
To: "Project Gutenberg Volunteer Discussion"
Sent: Monday, May 22, 2006 5:19 PM
Subject: re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries

>
> >it's clear that google has gotten their legs under them
> >in regard to doing the scanning. let's hope that they'll
> >get their quality-control under control very soon too...
>
> I have found fewer missing pages and other problems in books from Google
> than in those from the MBP and Canadian/IA. They are, however, still far
> from perfect. When they get a report regarding a missing or wrongly scanned
> page in a PD book, it is apparently up to the providing library to get the
> problem sorted out. I've heard reports of complete books being rescanned
> (with the risk of having another page missing in the end ;) ). I've also
> heard somebody mentioning that the full rescanned book was stuck behind the
> existing one (rather space consuming, but for DP purposes a lot safer).
>
> What worries me in this is that Google doesn't seem to care whether pages
> are missing or not... as long as they get 99% of the pages from a book
> stored, chances are most search terms pointing to the particular book will
> be identified. Their interest lies in people purchasing the book via
> Amazon, Abe etc. after identifying them via book.google.com.
>
> The best quality control I have encountered so far is on Gallica, where
> apart from missing pages due to those pages missing in the original
> scanned manuscript, I've not encountered incomplete books. It'd actually be
> interesting to see how they perform their quality control.
>
> Frank
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From nwolcott2ster at gmail.com Tue May 23 19:40:24 2006
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Tue May 23 23:23:30 2006
Subject: [gutvol-d] changing email addresses
Message-ID: <007a01c67edb$d9ef15e0$650fa8c0@gw98>

Can whoever reads this please arrange or tell me how to get
reconnected with gutvol-d

Thanks
nwolcott2@post.harvard.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/b5e8f790/attachment-0001.html

From gbnewby at pglaf.org Tue May 23 23:32:09 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue May 23 23:32:10 2006
Subject: [gutvol-d] changing email addresses
In-Reply-To: <007a01c67edb$d9ef15e0$650fa8c0@gw98>
References: <007a01c67edb$d9ef15e0$650fa8c0@gw98>
Message-ID: <20060524063209.GC6248@pglaf.org>

On Tue, May 23, 2006 at 10:40:24PM -0400, Norm Wolcott wrote:
> Can whoever reads this please arrange or tell me how to get reconnected with gutvol-d

(done)

> Thanks
> nwolcott2@post.harvard.edu
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From jon.ingram at gmail.com Wed May 24 01:22:01 2006
From: jon.ingram at gmail.com (Jon Ingram)
Date: Wed May 24 01:28:42 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <00d901c67e83$e42e6860$650fa8c0@gw98>
References: <439.1c91c11.31a3713c@aol.com> <17522.28516.463762.723548@celery.zuhause.org> <00d901c67e83$e42e6860$650fa8c0@gw98>
Message-ID: <4baf53720605240122j7cc1aaf2l3ea01f691f2820a5@mail.gmail.com>

On 5/23/06, Norm Wolcott wrote:
> There is no doubt that the Open Book Project of Brewster Kahle has the most
> accurate books online. However they have only 1000 books. The million books
> project I would rate just barely above google books in quality and
> completeness. It also helps if the page is not turned before the scan is
> complete. The US is doing very well in providing a large number of useless
> images online.
>
> Does anyone know how to get a book into the Open Book Project? Do you have
> to be a library?

I wish I knew. I've scanned almost a thousand books for Distributed
Proofreaders, and the Internet Archive would be a great place to
permanently store the images. Every time I've asked them on their
website, however, they either haven't replied, or have said that
letting outside people contribute material is something that they're
planning on setting up, but with no firm date.

--
Jon Ingram

From Bowerbird at aol.com Wed May 24 03:03:30 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed May 24 03:03:46 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <42a.2124045.31a58972@aol.com>

norm said:
> My success with google pd books is about 30%.
and
> The US is doing very well in providing a large number of useless
> images online.

see, now _that_ is the shame. _that_ is what the complainers
should be complaining about. bad scans do _nobody_ any good.

***

jon ingram said:
> I've scanned almost a thousand books for Distributed Proofreaders,
> and the Internet Archive would be a great place to permanently store
> the images. Every time I've asked them on their website, however,
> they either haven't replied, or have said that letting outside people
> contribute material is something that they're planning on setting up,
> but with no firm date.

see, this is bad too. this needs to be fixed.

when you people are willing to do this work,
something like _diskspace_ needs to become
a solved problem, not a recurring nightmare.

so, who can solve this problem for you guys?
what could i do to help you guys get it solved?

amazon just announced a new storage system.
the rates seemed pretty low to me, but i'd guess
we're looking for so much space that it'd add up.
especially since they charge you for pushing it in.
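(Some back-of-the-envelope arithmetic on that amazon service. Every
number here is an assumption: the 2006 launch rates were about $0.15
per gigabyte-month stored and $0.20 per gigabyte transferred, and 50
megabytes per scanned book is a plain guess.)

    # rough cost of parking scan-sets on the new amazon storage;
    # all three constants are assumptions, not quoted figures
    STORE_PER_GB_MONTH = 0.15   # assumed 2006 storage rate, $/GB-month
    XFER_PER_GB = 0.20          # assumed 2006 transfer rate, $/GB
    MB_PER_BOOK = 50            # guessed average size of one scan-set

    def cost(books):
        gb = books * MB_PER_BOOK / 1024.0
        return gb * STORE_PER_GB_MONTH, gb * XFER_PER_GB

    monthly, upload = cost(1000)   # jon's almost-a-thousand books
    print("$%.2f/month after a one-time $%.2f upload" % (monthly, upload))
    # -> about $7.32/month after a $9.77 upload, at these guesses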
we need some concrete figures to discern pricing,
could you give us a ballpark number on that, jon?

another alternative would be to store it distributedly.
we could chop it up into a thousand pieces and have
a network of two thousand people storing it at home.
michael keeps telling us how cheap terabyte disks are.
maybe we can recreate fidonet with terabytes and d.s.l.

but face facts, if we've got a complete scan-set,
it has to be saved. it has to. and saved without
the waste of even a second thought about it.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060524/c6abed44/attachment.html

From Bowerbird at aol.com Wed May 24 03:16:44 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed May 24 03:16:53 2006
Subject: [gutvol-d] re: accurately converted to any possible format
Message-ID: <365.53a95d7.31a58c8c@aol.com>

i said:
> > as i've said for years now, with a small commitment from you
> > to consistent formatting, i could take plain-ascii files as input,
> > automatically apply the typographic niceties that are expected,
> > and output the results to .pdf and to .html, such that the .html
> > can be converted to a large number of other auxiliary formats.

greg said:
> Here's an eBook that should meet your requirements:
> http://www.gutenberg.org/etext/18257

well yes, that one works just fine. already did it for that one. :+)

> You already have server space,

yes, thank you very much, snowy has been very kind to me...

> to provide a conversion utility.
> Looking forward to the pudding...

well, but first there is that minor matter of the small commitment
to ensure that all future files will conform to consistency as well...

you know, first you plug the leak, and only then clean up the mess.

and that's not a new request. that was the original precondition.

but perhaps there's another way out of this impasse.

> To each their own pudding. If I'm not saying "no" to you,
> why would I say "no" to someone with a different approach?

it would be silly of me to expect _you_ to tell them "no",
as if they couldn't bloody well do it themselves anyway.

no, what _i_ am doing is telling them that .tei would be
a waste of their time. i'm giving them a friendly notice,
a heads-up, saying "hey i got it covered, you go on", but
they don't take it that way, they get all pissed off at me...
so, ok, you want to be gruff, i can be gruff, no problem.

oh yes, greg, _you_ are one of "them". so you would be
telling yourself "no". and i don't expect you to do that, no.
indeed, i don't even expect you to listen to me telling you
that .tei is a waste of your time. that's fine. you'll learn.

> I have two other servers (snowy.arsc.alaska.edu and readingroo.ms)
> with complete copies of the PG collection for development.

ok, now we're getting somewhere. this could break through.

> If people don't like the way PG postprocesses & posts eBooks,
> they can grab 'em and do their own postings. As long as there's
> a reasonable adherence to the principle of unlimited distribution
> etc. (http://www.gutenberg.org/about), we'll even link to 'em!

i have no stealaway desires, i'm happy to do things under your wing,
my intention is to try and show you -- project gutenberg -- how you
can save yourself a lot of work, and increase your unlimited
distribution. i want to help the best cyberlibrary get better,
i don't want to tear it apart.
i would definitely agree to demonstrate some automatic transformations
of your e-texts on a library-wide basis that could show you some shit...

at first this would just be for experimental purposes. no promises.

however -- if the effort continues, and it should ever come to that --
my "shaping" of the files by progressive transformations would result
in a substantial fork of the library, but once you have satisfied
yourself that it has retained all important data, so the integrity of
the books is intact and only inconsequential inconsistencies have been
removed, you will be amenable to _considering_ a wholesale replacement
in one fell swoop.

the benefits will be quite obvious, though.
it won't be hard for you to be the decider...

> I don't recall turning down David, but might have on the grounds of
> being unable to effectively ingest & manage the files he was producing.
> Today, I'd offer him his own server space to help him do things his way.

ok, now you're _really_ talking.

because david already knows how to do this.

unfortunately, as we all know, he's kinda busy right now.

with any luck, though, he's bored and restless because the website
that has been his business and his life, and has taken up a lot of
his time these last many years, is shut down. with any luck he's
itching for some diskspace to play with. he might be jonesing for
an ftp-interaction...

but you know, honestly, all _i_ would really need from him
is his ascii files. he's already made all of them consistent...

i've never asked him for them before. but maybe now's the time?
and if you and i ask him together, so i can work on the files for you?

i've never seen his ascii files for sale, but i'd be willing to pay some.
after all, the reason i want those files is they'd save me a lot of time.
it seems quite reasonable to reimburse him for a little of that time.
especially because he's got lawyer bills, i'm sure. (actually, i'd hope
a lawyer has taken this case pro bono for the great exposure, but
you know there are always lawsuit-related bills that have to be paid.
i consider him a trooper, and i do believe in supporting our troops!)

anyway...

with _consistent_ files, i can start turning neat tricks right soon now.
i might have to reshape david-consistency into zml-consistency, but
that'll be a lot easier than reshaping p.g.-inconsistency into anything.
so even if it's not immediate, it would be soon.

of course, i'm not taking david's files for granted, no sir,
as we haven't even yet asked. and he _is_ busy these days.
he might even say yes, but have no time to fill the request.

but, to sum up, i would be most grateful for:
1. on snowy, a copy of the p.g. library that i can start "shaping".
2. also on snowy, a copy of moynihan's ascii files for experiments?
3. diskspace for david to play, now if he wants, or sometime later.

the understanding is i'm just playing with your files.
no promises, no expectations, no guarantees. ok?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060524/8c1991b9/attachment.html

From bruce at zuhause.org Wed May 24 07:54:21 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Wed May 24 07:54:28 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <00d901c67e83$e42e6860$650fa8c0@gw98>
References: <439.1c91c11.31a3713c@aol.com> <17522.28516.463762.723548@celery.zuhause.org> <00d901c67e83$e42e6860$650fa8c0@gw98>
Message-ID: <17524.29597.975166.45531@celery.zuhause.org>

Norm Wolcott writes:
> There is no doubt that the Open Book Project of Brewster Kahle has the most
> accurate books online. However they have only 1000 books. The million books
> project I would rate just barely above google books in quality and
> completeness. It also helps if the page is not turned before the scan is
> complete. The US is doing very well in providing a large number of useless
> images online.

What is the URL for this archive? When I searched Google, I found an
"Open Book Project" at ibiblio, but it seemed to have almost nothing
there, and it looked like it wasn't from scans anyway. If you're
referring to the "Open Content Alliance", I'd love to see a URL to
find its archives. Internet Archive doesn't seem to have a category
for OCA texts yet.

From hart at pglaf.org Wed May 24 08:08:19 2006
From: hart at pglaf.org (Michael Hart)
Date: Wed May 24 08:08:21 2006
Subject: [gutvol-d] changing email addresses
In-Reply-To: <007a01c67edb$d9ef15e0$650fa8c0@gw98>
References: <007a01c67edb$d9ef15e0$650fa8c0@gw98>
Message-ID:

On Tue, 23 May 2006, Norm Wolcott wrote:
> Can whoever reads this please arrange or tell me how to get reconnected with gutvol-d
> Thanks
> nwolcott2@post.harvard.edu

You asked about subscribing or unsubscribing from one of the Project
Gutenberg Newsletters. Please save for reference:

This is the information from: www.gutenberg.org/howto/subscribe-howto

Please check this site once in a while for updates:

Mailing Lists

Various mailing lists for Project Gutenberg exist. A brief description
of each follows, along with a link to visit or subscribe (or
unsubscribe). All lists live at http://lists.pglaf.org, and are
moderated except for the discussion lists:

* Newsletters, with new eBook listings, calls for assistance, general
  information, and announcements:
  + gweekly: Project Gutenberg Weekly Newsletter. Traffic consists
    mostly of one weekly newsletter.
  + gmonthly: Project Gutenberg Monthly newsletter. Traffic consists
    mostly of one monthly newsletter.
* Notification as new eBooks are posted:
  + posted: receive book postings as they happen, along with other PG
    related internally-focused discussion (high traffic, over 10
    postings per day)
* Discussion for active volunteers:
  + gutvol-d: general unmoderated volunteer discussion (moderate traffic)
  + gutvol-p: programming volunteers, for software development (light traffic)
  + gutvol-w: website volunteers, for website development (new list)
  + glibrary: library help, for physically tracking down books and
    copyright research. Low traffic, with occasional requests.
* Other lists:
  + gutvol-l: moderated volunteer announcements (light traffic)

If you would like to subscribe to a mailing list simply select a
mailing list name above. All lists require a password and email
confirmation to subscribe as part of the Lyris anti-spam measures.

Copyright © 1971-2004 Project Gutenberg -- All Rights Reserved.
Most recently updated: 2004-08-07 16:33:32.
From hart at pglaf.org Wed May 24 09:05:16 2006
From: hart at pglaf.org (Michael Hart)
Date: Wed May 24 09:05:18 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <483.8864e2.31a48cbf@aol.com>
References: <483.8864e2.31a48cbf@aol.com>
Message-ID:

On Tue, 23 May 2006 Bowerbird@aol.com wrote:
> michael said:
>> I wonder how great a percentage of Google's six year plan
>> will have to expire before Mr. Bowerbird will admit that
>> it doesn't look as if Google is even trying to make it to
>> 10 million in 6 years.
>
> michael, it certainly isn't necessary to call me "mr." bowerbird.
> but hey, it sounds kinda funny and cute, so please, be my guest.
>
> as for google's plan, i laid out my prediction last december:
> december 14, 2004 -- 0 books
> december 14, 2005 -- 10,000 books
> december 14, 2006 -- 100,000 books
> december 14, 2007 -- 1,000,000 books
> december 14, 2008 -- 10,000,000 books
>
> so not only do i think they are still on-track, and doing well,
> i actually think they'll wrap it up by the end of 2008, michael,
> _if_ they stop at the 10.5 million unique titles they have now.

Of course a lot of this depends on what you think an eBook is.

The Library Of Congress set their own standard a decade ago as a
99.95% accurate full text.

I don't think Google is even trying to get close to this.

Then again, some people think pictures of pages are as good as full
text, but that would entail a different definition.

1. Making scans is trivial
2. Making raw OCR is trivial
3. Making a 99.95% accurate full text eBook from those is not.

1 and 2 are quick and dirty to most people, and the results make that
all too obvious, with so many errors and missing pages.

Not to mention that the "books reading each other" requires full text
at a reasonable level of accuracy.

Google is just getting to 1% of their goal, and they seem to be
heading in directions other than these systems and standards would
require to be defined as eBooks.

Of course, as you mention, when you have over 100 billion dollars,
some people are willing to fudge things for you.

"Ceci n'est pas une pipe." ["This is not a pipe."] Rene Magritte

"Don't confuse the map with the territory."

However, if Google manages to put 10 million books online, even if
just searchable in snippets, by December 14, 2010, I will be glad to
buy you dinner.

And it's a bet that would help the world at large!

Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From jmdyck at ibiblio.org Wed May 24 12:15:15 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed May 24 12:15:18 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To:
References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org>
Message-ID: <4474B0C3.1040906@ibiblio.org>

Michael Hart wrote:
>
> On Mon, 22 May 2006, Michael Dyck wrote:
>
>> Michael Hart wrote:
>>>
>>> If we double that number to 100,000, we could pretend these
>>> results indicated that Google had accomplished 1% of a goal
>>> of 10,000,000 books, in 25% of their 6 year plan.
>>
>> In 1993, PG had accomplished 1% of its goal of 10,000,
>> in about 70% of the total time.
>
> Only if you keep refusing to acknowledge that there was not an
> ordinary production schedule until 1991. . . .

Hm. I made my statement based on this data:
1971: 1 ebook
1993: 100 ebooks
2003: 10,000 ebooks

So it took about 22 years to do the first 100, and about 32 years to
do the first 10,000.
That is, 22/32 of the time to do 100/10,000 of the books, or about
70% of the time to do the first 1%.

Another way to look at it is that the average production up to 1993
was 4.5 books per year, and after 1993 was 90 books per year, 20
times faster. I.e., the production schedule for the first two decades
was significantly slower than that of the subsequent decade.

So, far from "refusing to acknowledge that there was not an ordinary
production schedule until 1991", this data (and my statement)
actually *support* the claim of a radical change in the production
schedule in the early 90's.

But if you like, we can ignore the pre-1991 data:
1991 Jan: 10 ebooks
1994 Jan: 110 ebooks
2003 Oct: 10010 ebooks

So it took 3 years to do the "first" 100, and about 13 years to do
the "first" 10,000. That is, 3/13 of the time to do 100/10,000 of the
books, or around 23% of the time to do the first 1%. Which is
remarkably close to the "25% of the time to do the first 1%" that you
gave for Google, above.

> Mr. Dyck has been refusing to acknowledge for some time that an
> ordinary growth curve is impossible to create when the growth
> is linear. . .i.e. one per year. . . .

Huh? I'm pretty sure I've never refused to acknowledge that. I think
you have me confused with someone else, possibly Marcello. See, e.g.
http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001262.html
(which had to do with picking a reference point for "Moore's Law"
growth). You should be more careful before casting aspersions.

However, if it makes things easier, I'll gladly refuse to acknowledge
that statement *now*. A linear growth curve *is* a growth curve, and
a pretty ordinary one at that. It certainly doesn't make it
impossible to create an ordinary growth curve. But that's just
disagreeing with what you said, rather than what you meant. I think
what you meant is something more like "a period of linear growth
makes it impossible to fit an exponential growth curve".

My response depends on how you interpret "fit". If, by "fit a curve",
you mean "rigidly conform to a curve", then I agree: you can't make
linear data conform to an exponential curve. But then no real-world
phenomenon will rigidly conform to an exponential curve; there will
always be some deviation. So alternatively, if by "fit a curve", you
mean "approximate with a curve" or "model with a curve", then I
disagree: you can certainly approximate linear data with an
exponential curve. Whether it's useful depends on what you're trying
to accomplish, but it's certainly not impossible.

-Michael Dyck

From hart at pglaf.org Wed May 24 13:07:53 2006
From: hart at pglaf.org (Michael Hart)
Date: Wed May 24 13:07:54 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <4474B0C3.1040906@ibiblio.org>
References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org> <4474B0C3.1040906@ibiblio.org>
Message-ID:

On Wed, 24 May 2006, Michael Dyck wrote:

> However, if it makes things easier, I'll gladly refuse to acknowledge
> that statement *now*. A linear growth curve *is* a growth curve, and a
> pretty ordinary one at that. It certainly doesn't make it impossible to
> create an ordinary growth curve. But that's just disagreeing with what
> you said, rather than what you meant.

No, a line is not a curve.

Linear growth would have just been

100 eBooks in 100 years.
1,000 eBooks in 1,000 years.
10,000 eBooks in 10,000 years.
20,000 eBooks in 20,000 years.

This is not a growth curve, it is a growth line.
However, the case is even more drastic than that, as there was a
period of over a decade when 0 eBooks were added, due to the hassles
of the US Copyright Act of 1976, which took us forever to find out
about, and the truth is that we would probably NEVER have figured
them out without the help of one of the top dozen copyright lawyers
in the US.

Thus, if you INSIST on talking about curves, the curve was downward.

Just one more obvious reason why you can't talk about growth curves
for this period of Project Gutenberg's history.

*

When There ARE Growth Curves:

You also mentioned that you can't fit real world items into such
growth curves, but you never mentioned that the graph I included is a
remarkably good overall fit, with deviations so small it is hard to
see them at all on such a graph representing eBooks at a range of 500
eBook increments.

It's a much more impressive growth curve than anyone predicted--
except for some of us crazy people.

From joshua at hutchinson.net Wed May 24 13:23:43 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed May 24 13:23:58 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <20060524202343.70C88DA5F8@ws6-6.us4.outblaze.com>

> ----- Original Message -----
> From: "Michael Hart"
> On Wed, 24 May 2006, Michael Dyck wrote:
>
> > However, if it makes things easier, I'll gladly refuse to acknowledge
> > that statement *now*. A linear growth curve *is* a growth curve, and a
> > pretty ordinary one at that. It certainly doesn't make it impossible to
> > create an ordinary growth curve. But that's just disagreeing with what
> > you said, rather than what you meant.
>
> No, a line is not a curve.
>

Here we go again.

1 - A growth curve doesn't necessarily curve nor does it necessarily
grow. A growth curve can be flat (no change), it can be negative
(lose value like my stocks recently), it can be linear (grows at a
constant rate) or it can be exponential (what we typically think of
as a growth curve).

2 - Just because you don't WANT to include certain periods of PG
history in your growth curve analysis doesn't mean no one else can or
that they are wrong for doing so. If someone wants to plot growth
from the start of PG to present (which certainly seems reasonable to
me), then that is a valid growth curve plot. If you want to plot it
in the "modern PG era" from circa 1993 to present, then that is
valid, too. Both plots need to specify the time frame they are
plotting.

Since we argued this to death already (as Michael Dyck so nicely
linked to previously), can we let it drop now?

Josh

From grythumn at gmail.com Wed May 24 13:26:38 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Wed May 24 13:26:41 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To:
References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org> <4474B0C3.1040906@ibiblio.org>
Message-ID: <15cfa2a50605241326s5caa3578lbad2d63cb1eda73e@mail.gmail.com>

Right. I think what needs to be reiterated here is that any
extrapolation is just that, an extrapolation, and that even a very
complex growth curve fitted to PG's output will vary greatly from
reality based on various environmental factors (how many volunteers
are available, free time available, major reorgs at DP, etc) and the
biases built into the model. Taking a standard exponential growth
curve and extrapolating it is not feasible for the long term. (Extend
it a few hundred years out..
baring a major population explosion, major AI improvements, or other unforeseen circumstance, it's plainly not tenable. You run out of PD works and human beings very quickly.) It may be accurate enough for the short term, however. I know a statician on another forum.. perhaps he can explain it more clearly. R C On 5/24/06, Michael Hart wrote: > > > On Wed, 24 May 2006, Michael Dyck wrote: > > > However, if it makes things easier, I'll gladly refuse to acknowlege > > that statement *now*. A linear growth curve *is* a growth curve, and a > > pretty ordinary one at that. It certainly doesn't make it impossible to > > create an ordinary growth curve. But that's just disagreeing with what > > you said, rather than what you meant. > > No, a line is not a curve. > > Linear growth would have just been > > 100 eBooks in 100 years. > 1,000 eBooks in 1,000 years. > 10,000 eBooks in 10,000 years. > 20,000 eBooks in 20,000 years. > > This is not a growth curve, it is a growth line. > > > However, the case is even more drastic than that, as there was a period > of over a decade when 0 eBooks were added, due to the hassles of the US > Copyright Act of 1976, which took us forever to find out about, and the > truth is that we would probably NEVER have figured them out without the > help of one of the top dozen copyright lawyers in the US. > > Thus, if you INSIST on talking about curves, the curve was downward. > > Just one more obvious reason why you can't talk about growth curves > for this period of Project Gutenberg's history. > > * > > When There ARE Growth Curves: > > > You also mentioned that you can't fit real world items into such > growth curves, but you never mentioned that the graph I included > is a remarkably good overall fit, with deviations so small it is > hard to see them at all on such a graph representing eBooks at a > range of 500 eBook increments. > > It's a much more impressive growth curve than anyone predicted-- > except for some of us crazy people. > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060524/45c19288/attachment.html From Bowerbird at aol.com Wed May 24 13:33:14 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed May 24 13:33:20 2006 Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries Message-ID: <489.b8cd35.31a61d0a@aol.com> michael said: > Then again, some people > think pictures of pages are > as good as full text those are probably the people who want to "just" _read_ the words of the book, and don't want to copy out its text. > but that would entail a different definition. yes, what a pity that their "definition" is so constrained. when black ink is splashed onto a white page of paper, the result is nothing more than a "picture" of the book. but somehow, for over 500 years, that has been enough. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060524/2e7198c3/attachment.html

From jmdyck at ibiblio.org Wed May 24 14:38:22 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed May 24 14:38:27 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To:
References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org> <4474B0C3.1040906@ibiblio.org>
Message-ID: <4474D24E.2050409@ibiblio.org>

Michael Hart wrote:
>
> On Wed, 24 May 2006, Michael Dyck wrote:
>
>> However, if it makes things easier, I'll gladly refuse to acknowledge
>> that statement *now*. A linear growth curve *is* a growth curve, and a
>> pretty ordinary one at that. It certainly doesn't make it impossible to
>> create an ordinary growth curve. But that's just disagreeing with what
>> you said, rather than what you meant.
>
> No, a line is not a curve.

Well, it is to a mathematician. But if you want to use that
definition of "curve", fine, I agree, linear growth is straight, not
bendy. I'm *quite* positive that I never refused to acknowledge such
a thing.

(If you look close enough, PG's ebook count is actually a step
function, piecewise constant: after a book is posted, the number of
books is constant until the next one is posted. Dunno if that
satisfies your definition of "curve".)

> However, the case is even more drastic than that, as there was a period
> of over a decade when 0 eBooks were added, due to the hassles of the US
> Copyright Act of 1976, which took us forever to find out about, and the
> truth is that we would probably NEVER have figured them out without the
> help of one of the top dozen copyright lawyers in the US.
>
> Thus, if you INSIST on talking about curves, the curve was downward.

Uh, if no books are added, the number is constant, which I would
consider "flat" rather than "downward". But if you want to define it
as "downward", fine.

> Just one more obvious reason why you can't talk about growth curves
> for this period of Project Gutenberg's history.

A downward curve is still a curve. Though if you want to say it isn't
a "growth curve", fine. (Mind you, it's not hard to find talk of
"negative growth".)

> You also mentioned that you can't fit real world items into such
> growth curves, but you never mentioned that the graph I included
> is a remarkably good overall fit,

(Well, now you're blurring a distinction I made between two meanings
of "fit": real world data doesn't "fit" (= rigidly conform to)
exponential curves, but you can "fit" (= approximate) it to an
exponential curve. But anyway.)

And in fact, I HAVE mentioned how closely the PG numbers are
approximated by an exponential curve. See, e.g.,
http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001263.html
and
http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001456.html

But now we (well, you, really) have strayed from the topic that
brought me in, the comparison between Google's progress and PG's (and
dang, I wish I'd changed the subject line at that point), so my
interest in this discussion is probably fading.

-Michael

From sly at victoria.tc.ca Wed May 24 22:00:07 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed May 24 22:00:12 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <20060524060808.GD5644@pglaf.org>
References: <43c.18826a3.31a4978f@aol.com> <20060524060808.GD5644@pglaf.org>
Message-ID:

On Tue, 23 May 2006, Greg Newby wrote:

> > p.s.
if you only would have accepted moynihan's offer of his files
> > when he made it to you, you'd already _have_ a consistent library.
>
> I don't recall turning down David, but might have on the
> grounds of being unable to effectively ingest & manage the files
> he was producing. Today, I'd offer him his own server space
> to help him do things his way.

If I remember correctly, a big issue with the David Moynihan files
was that many of them were not copyright-cleared for PG, and
incorporating them into the PG collection would have taken a lot of
effort.

Andrew

From gbnewby at pglaf.org Wed May 24 22:02:50 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed May 24 22:02:51 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <365.53a95d7.31a58c8c@aol.com>
References: <365.53a95d7.31a58c8c@aol.com>
Message-ID: <20060525050250.GG6694@pglaf.org>

On Wed, May 24, 2006 at 06:16:44AM -0400, Bowerbird@aol.com wrote:
> i said:
> > > as i've said for years now, with a small commitment from you
> > > to consistent formatting, i could take plain-ascii files as input,
> > > automatically apply the typographic niceties that are expected,
> > > and output the results to .pdf and to .html, such that the .html
> > > can be converted to a large number of other auxiliary formats.
>
> greg said:
> > Here's an eBook that should meet your requirements:
> > http://www.gutenberg.org/etext/18257
>
> well yes, that one works just fine. already did it for that one. :+)

I don't understand. Where is the URL that does conversion on the fly
from that file to arbitrary formats? PDF, HTML...others...with
user-specified settings.

More response, way at the bottom:

> > You already have server space,
>
> yes, thank you very much, snowy has been very kind to me...
>
> > to provide a conversion utility.
> > Looking forward to the pudding...
>
> well, but first there is that minor matter of the small commitment
> to ensure that all future files will conform to consistency as well...
>
> you know, first you plug the leak, and only then clean up the mess.
>
> and that's not a new request. that was the original precondition.
>
> but perhaps there's another way out of this impasse.
>
> > To each their own pudding. If I'm not saying "no" to you,
> > why would I say "no" to someone with a different approach?
>
> it would be silly of me to expect _you_ to tell them "no",
> as if they couldn't bloody well do it themselves anyway.
>
> no, what _i_ am doing is telling them that .tei would be
> a waste of their time. i'm giving them a friendly notice,
> a heads-up, saying "hey i got it covered, you go on", but
> they don't take it that way, they get all pissed off at me...
> so, ok, you want to be gruff, i can be gruff, no problem.
>
> oh yes, greg, _you_ are one of "them". so you would be
> telling yourself "no". and i don't expect you to do that, no.
> indeed, i don't even expect you to listen to me telling you
> that .tei is a waste of your time. that's fine. you'll learn.
>
> > I have two other servers (snowy.arsc.alaska.edu and readingroo.ms)
> > with complete copies of the PG collection for development.
>
> ok, now we're getting somewhere. this could break through.
>
> > If people don't like the way PG postprocesses & posts eBooks,
> > they can grab 'em and do their own postings. As long as there's
> > a reasonable adherence to the principle of unlimited distribution
> > etc. (http://www.gutenberg.org/about), we'll even link to 'em!
> i have no stealaway desires, i'm happy to do things under your wing,
> my intention is to try and show you -- project gutenberg -- how you
> can save yourself a lot of work, and increase your unlimited
> distribution. i want to help the best cyberlibrary get better,
> i don't want to tear it apart.
>
> i would definitely agree to demonstrate some automatic transformations
> of your e-texts on a library-wide basis that could show you some shit...
>
> at first this would just be for experimental purposes. no promises.
>
> however -- if the effort continues, and it should ever come to that --
> my "shaping" of the files by progressive transformations would result
> in a substantial fork of the library, but once you have satisfied
> yourself that it has retained all important data, so the integrity of
> the books is intact and only inconsequential inconsistencies have been
> removed, you will be amenable to _considering_ a wholesale replacement
> in one fell swoop.
>
> the benefits will be quite obvious, though.
> it won't be hard for you to be the decider...
>
> > I don't recall turning down David, but might have on the grounds of
> > being unable to effectively ingest & manage the files he was producing.
> > Today, I'd offer him his own server space to help him do things his way.
>
> ok, now you're _really_ talking.
>
> because david already knows how to do this.
>
> unfortunately, as we all know, he's kinda busy right now.
>
> with any luck, though, he's bored and restless because the website
> that has been his business and his life, and has taken up a lot of
> his time these last many years, is shut down. with any luck he's
> itching for some diskspace to play with. he might be jonesing for
> an ftp-interaction...
>
> but you know, honestly, all _i_ would really need from him
> is his ascii files. he's already made all of them consistent...
>
> i've never asked him for them before. but maybe now's the time?
> and if you and i ask him together, so i can work on the files for you?
>
> i've never seen his ascii files for sale, but i'd be willing to pay some.
> after all, the reason i want those files is they'd save me a lot of time.
> it seems quite reasonable to reimburse him for a little of that time.
> especially because he's got lawyer bills, i'm sure. (actually, i'd hope
> a lawyer has taken this case pro bono for the great exposure, but
> you know there are always lawsuit-related bills that have to be paid.
> i consider him a trooper, and i do believe in supporting our troops!)
>
> anyway...
>
> with _consistent_ files, i can start turning neat tricks right soon now.
> i might have to reshape david-consistency into zml-consistency, but
> that'll be a lot easier than reshaping p.g.-inconsistency into anything.
> so even if it's not immediate, it would be soon.
>
> of course, i'm not taking david's files for granted, no sir,
> as we haven't even yet asked. and he _is_ busy these days.
> he might even say yes, but have no time to fill the request.
>
> but, to sum up, i would be most grateful for:
> 1. on snowy, a copy of the p.g. library that i can start "shaping".

?? It's an official mirror, and you have a server login. Help yourself.

/data1/ftp/mirrors/gutenberg
http://snowy.arsc.alaska.edu/gutenberg
ftp://snowy.arsc.alaska.edu/

There's enough space on /data1, where your home directory is, to make
your own copy of as much of the collection as you want in your own
directory.

> 2. also on snowy, a copy of moynihan's ascii files for experiments?
I just bought his PDF files, and would buy ASCII if they're for sale.
Or, let's ask.

> 3. diskspace for david to play, now if he wants, or sometime later.

Of course.

> the understanding is i'm just playing with your files.
> no promises, no expectations, no guarantees. ok?

Have fun :)
-- Greg

> -bowerbird
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From gbnewby at pglaf.org Wed May 24 22:10:24 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed May 24 22:10:26 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <42a.2124045.31a58972@aol.com>
References: <42a.2124045.31a58972@aol.com>
Message-ID: <20060525051024.GH6694@pglaf.org>

> jon ingram said:
> > I've scanned almost a thousand books for Distributed Proofreaders,
> > and the Internet Archive would be a great place to permanently store
> > the images. Every time I've asked them on their website, however,
> > they either haven't replied, or have said that letting outside people
> > contribute material is something that they're planning on setting up,
> > but with no firm date.

Woah there, cowboy.

I've been waiting for DP to provide raw page scans for *years*. This
is something I discussed with Charles & Juliet years ago. The
whitewashers are ready. iBiblio is ready. We have other servers if
growth is too fast. Yes, that includes the Internet Archive, where we
have several usernames...plus our official backup mirror.

I've also been pressing to get preprints from DP...scans before the
postprocessing is done, to release "to the wild" before they're quite
ready. (Last count there are over 800 of these.) There's even a new
preprints section (though this might not be the way we'd do DP
preprints) at http://preprints.readingroo.ms

If you could help to move things forward on either scans or
preprints, I'd be very grateful! (Ditto for anyone else reading.)
-- Greg

From gbnewby at pglaf.org Wed May 24 22:12:27 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed May 24 22:12:29 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To:
References: <43c.18826a3.31a4978f@aol.com> <20060524060808.GD5644@pglaf.org>
Message-ID: <20060525051227.GA7507@pglaf.org>

On Wed, May 24, 2006 at 10:00:07PM -0700, Andrew Sly wrote:
>
> On Tue, 23 May 2006, Greg Newby wrote:
>
> > > p.s. if you only would have accepted moynihan's offer of his files
> > > when he made it to you, you'd already _have_ a consistent library.
> >
> > I don't recall turning down David, but might have on the
> > grounds of being unable to effectively ingest & manage the files
> > he was producing. Today, I'd offer him his own server space
> > to help him do things his way.
>
> If I remember correctly, a big issue with the David Moynihan
> files was that many of them were not copyright-cleared for
> PG, and incorporating them into the PG collection would have
> taken a lot of effort.
>
> Andrew

That's true. Today, we have preprints.readingroo.ms, and
www.gutenberg.us that have easier procedures for copyright clearance.
But very many of David's eBooks are PG titles, just reformatted, and
with the PG header/footer/license stripped.
-- Greg

From hyphen at hyphenologist.co.uk Wed May 24 23:16:58 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Wed May 24 23:17:11 2006
Subject: [gutvol-d] Subject: Kingkong was: [gweekly] PT1b Weekly Project Gutenberg Newsletter
Message-ID:

On Wed, 24 May 2006 09:50:50 -0700 (PDT), Michael Hart wrote:

|General Catalog of Old Books and Authors Including Dates of Death.!!!!!

For the rest of the world on Life+70 or Life+50, i.e. non USA, date
of death is crucial for copyright.

|http://www.kingkong.demon.co.uk/ngcoba/ngcoba.htm
|
|which now indexes 24,000 books available free online, including all
|PG(US) & PG(Aus)'s books, along with some basic date information
|about them and their authors where you can find more.

IME a great site. He missed a couple of 'my' authors and added them
quickly when I informed him. Check that he has your favourite
authors.

--
Dave Fawthrop
"Intelligent Design?" my knees say *not*. "Intelligent Design?" my
back says *not*. More like "Incompetent design".
Sig (C) Copyright Public Domain

From Bowerbird at aol.com Thu May 25 01:34:46 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu May 25 01:34:54 2006
Subject: [gutvol-d] re: accurately converted to any possible format
Message-ID: <42f.2268306.31a6c626@aol.com>

greg said:
> I don't understand. Where is the URL that
> does conversion on the fly from that file
> to arbitrary formats? PDF, HTML...others...
> with user-specified settings.

there is no u.r.l.
conversions are done by my viewer-app.
you don't need to be involved, since your
users can do it all by themselves.

> There's enough space on /data1,
> where your home directory is,
> to make your own copy of
> as much of the collection as you want
> in your own directory.

great! i'll figure it out. thank you!

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/2d3acbbe/attachment.html

From greg at durendal.org Thu May 25 04:39:10 2006
From: greg at durendal.org (Greg Weeks)
Date: Thu May 25 05:00:03 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <20060525051024.GH6694@pglaf.org>
References: <42a.2124045.31a58972@aol.com> <20060525051024.GH6694@pglaf.org>
Message-ID:

On Wed, 24 May 2006, Greg Newby wrote:

> If you could help to move things forward on either scans or preprints,
> I'd be very grateful! (Ditto for anyone else reading.)

I don't have everything on DP, but I have personal copies of
everything I've ever scanned. What format do you want them in and
where do you want them uploaded to? There are a number of other
people who would do this also, even if it's not an official DP thing.

-- Greg Weeks
http://durendal.org:8080/greg/

From joshua at hutchinson.net Thu May 25 05:26:22 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu May 25 05:26:23 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <20060525122622.C6712DA5C3@ws6-6.us4.outblaze.com>

> ----- Original Message -----
> From: "Greg Newby"
>
> Woah there, cowboy.
>
> I've been waiting for DP to provide raw page scans for *years*. This is
> something I discussed with Charles & Juliet years ago. The whitewashers
> are ready. iBiblio is ready.
>

Have we ironed out HOW the images should be organized? I remember
Marcello put forth a pretty good proposal for image
naming/organization, but the response was luke-warm at best.
I don't have nearly the backlog of images available that Jon does
(the man is a scanning machine) ... but I'd be happy to start
submitting images with my books from here on out if we've got the
organization worked out.

Josh

From hart at pglaf.org Thu May 25 10:17:33 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu May 25 10:17:36 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <4474D24E.2050409@ibiblio.org>
References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org> <4474B0C3.1040906@ibiblio.org> <4474D24E.2050409@ibiblio.org>
Message-ID:

On Wed, 24 May 2006, Michael Dyck wrote:

> Michael Hart wrote:
>>
>> On Wed, 24 May 2006, Michael Dyck wrote:
>>
>>> However, if it makes things easier, I'll gladly refuse to acknowledge
>>> that statement *now*. A linear growth curve *is* a growth curve, and a
>>> pretty ordinary one at that. It certainly doesn't make it impossible to
>>> create an ordinary growth curve. But that's just disagreeing with what
>>> you said, rather than what you meant.
>>
>> No, a line is not a curve.
>
> Well, it is to a mathematician.

OK, back to basics. I have consulted with some mathematicians, not
that I think you didn't know this, but you are pressuring me to make
the point, so I will, as gently as possible, mostly with the
presentation aids provided by the mathematicians.

Presumed givens:

A 9 year period in which 1 title was added each year to the index of
what would later become known as Project Gutenberg. 1971-1979

A ~15 year period of increasing growth as represented by the graph
previously presented. 1991-2006

Results:

First for:

A 9 year period in which 1 title was added each year to the index of
what would later become known as Project Gutenberg. 1971-1979

The equation that would describe this is known as a "Linear Equation"
of the variety y = mx + b

When plotted on the normal "x,y" plane this would be a straight line,
no matter what numeric variables you plugged into the equation.

When describing such results the term "line" is used to represent a
straight line of this nature, while various terms describing curves
are used in higher order equations.

Very often the term "exponential" is used in common parlance to
describe what is really just a "multiplicative" or "geometric" growth
pattern, but there is no need to go into that here.

When describing such a linear equation in opposition to curved
equations, the usual terms are "line" and "curve." We would normally
say that a line "intersects" with a curve, in such a case where the
equations have a common solution.

Trying to fit a straight line into an equation for a curve has been
one of the mathematical problems of the ages. Look up "squaring the
circle" for some history on this.

However, generally speaking, a curve could not contain a portion of a
graph that was identical to the portion of a straight line.

In this case it has been speculated that the growth of Project
Gutenberg listings could be approximated via a curve, and in
particular that the end result should be in some total comparison to
the curve known as Moore's Law, which specifies that x should double
every 18 months.

Obviously it would be very rare indeed for real world curves to
exactly match mathematical equations in the sense of human endeavors,
but a quick look at what we have seen as the report of the dates of
each 500 book level passed by Project Gutenberg would indicate the
approximate match to several well known curves.

I'll leave it to you to choose which is the best fit.
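(As an illustration of such a fit, a sketch only: fitting
y = a * 2^(t/T) by least squares on the logarithms of the three
post-1991 milestones cited earlier in this thread gives a doubling
time T in the neighbourhood of Moore's 18 months. This uses just
those three points, not the 500-book-increment table behind the
graph.)

    # fit y = a * 2^(t/T) to the three post-1991 milestones cited in
    # this thread; the slope of log2(count) vs. time is doublings/year
    import math

    data = [(1991.0, 10), (1994.0, 110), (2003.75, 10010)]  # (year, eBooks)

    ts = [t for t, _ in data]
    ys = [math.log2(n) for _, n in data]
    tbar = sum(ts) / len(ts)
    ybar = sum(ys) / len(ys)
    slope = (sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
             / sum((t - tbar) ** 2 for t in ts))
    print("doubling time: about %.0f months" % (12 / slope))
    # -> about 16 months for these three points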
> But if you want to use that definition of "curve",
> fine, I agree, linear growth is straight, not bendy.
> I'm *quite* positive that I never refused to acknowledge such a thing.

You certainly seemed to be yesterday, apparently demanding the above
explanations of the difference between lines and curves when
describing graphs, intersections, etc.

> (If you look close enough, PG's ebook count is actually a step function,
> piecewise constant: after a book is posted, the number of books is
> constant until the next one is posted. Dunno if that satisfies your
> definition of "curve".)

I think that is why our mathematical friends above said
"approximates" a curve. . .since we are only using the "counting
numbers." A true curve would include many other kinds of numbers.

However, in this case, using counting numbers as input, and 1/4 year
increments on the graph, you do get graphs that would normally be
described as curves. Growth curves is the term normally used.

In this case the growth "line" does not fit the growth "curve."

>> However, the case is even more drastic than that, as there was a period
>> of over a decade when 0 eBooks were added, due to the hassles of the US
>> Copyright Act of 1976, which took us forever to find out about, and the
>> truth is that we would probably NEVER have figured them out without the
>> help of one of the top dozen copyright lawyers in the US.
>>
>> Thus, if you INSIST on talking about curves, the curve was downward.
>
> Uh, if no books are added, the number is constant, which I would
> consider "flat" rather than "downward". But if you want to define it as
> "downward", fine.

The number used in the equation to create a graph in approximation to
the performance would decrease, hence the term "downward" might be
applicable; a line with no growth lies "downward" of lines that
represent growth statistics.

>> Just one more obvious reason why you can't talk about growth curves
>> for this period of Project Gutenberg's history.
>
> A downward curve is still a curve. Though if you want to say it isn't a
> "growth curve", fine. (Mind you, it's not hard to find talk of "negative
> growth".)

In this case it would literally be a "negative growth" of the slope
of the line. Technically the second order derivative, which is where
these terms probably go beyond what is appropriate here.

>> You also mentioned that you can't fit real world items into such
>> growth curves, but you never mentioned that the graph I included
>> is a remarkably good overall fit,
>
> (Well, now you're blurring a distinction I made between two meanings of
> "fit": real world data doesn't "fit" (= rigidly conform to) exponential
> curves, but you can "fit" (= approximate) it to an exponential curve. But
> anyway.)

I think all this was anticipated in the help I received above. If
not, ask for more detailed explanations.

> And in fact, I HAVE mentioned how closely the PG numbers are
> approximated by an exponential curve. See, e.g.,
> http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001263.html
> and
> http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001456.html

This was also apparently anticipated by our mathematical friends,
when they mentioned "approximate match to several well known curves"
and "I'll leave it to you to choose which is the best fit."

Obviously there are a number of equations that make approximate fits,
obvious even to someone who hadn't seen your example in the URLs.
> But now we (well, you, really) have strayed from the topic that brought me > in, the comparison between Google's progress and PG's (and dang, I wish I'd > changed the subject line at that point), so my interest in this discussion is > probably fading. Ah, it would appear that you already knew you were painting us into a corner. Then I hope that the great effort spent in replying to your messages was not a total waste for either yourself or the rest of us. It is only a waste to me if no one gains an appreciation of how seriously I take your messages, and of my willingness to provide the best answers. However, as I stated in my opening paragraph, I presumed you already knew all of this and thus presumed you were only asking the question for other reasons. May I ask what those reasons were? > > -Michael Thanks!!! Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg Blog at http://hart.pglaf.org From hart at pglaf.org Thu May 25 10:51:59 2006 From: hart at pglaf.org (Michael Hart) Date: Thu May 25 10:52:00 2006 Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries In-Reply-To: <489.b8cd35.31a61d0a@aol.com> References: <489.b8cd35.31a61d0a@aol.com> Message-ID: It's so simple that even Mr. Bowerbird's rhetoric cannot confuse the issue: A picture of the pages of a book, even if complete and OCRable, is simply not a full text eBook. 1. It takes many times the drive space. 2. It takes much more download wire time. 3. You can't do ANY of the things you can do with full text, EXCEPT THE MOST IMPORTANT. . .YOU CAN READ IT. But the expense in time and money is much larger, and it's much harder to write research papers. By Mr. Bowerbird's logic, a pre-Gutenberg book would be as useful as a post-Gutenberg book. On Wed, 24 May 2006 Bowerbird@aol.com wrote: > michael said: >> Then again, some people >> think pictures of pages are >> as good as full text > > those are probably the people who > want to "just" _read_ the words of the book, > and don't want to copy out its text. > > >> but that would entail a different definition. > > yes, what a pity that their "definition" is so constrained. > > when black ink is splashed onto a white page of paper, > the result is nothing more than a "picture" of the book. > but somehow, for over 500 years, that has been enough. > > -bowerbird > From hart at pglaf.org Thu May 25 10:53:39 2006 From: hart at pglaf.org (Michael Hart) Date: Thu May 25 10:53:40 2006 Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries In-Reply-To: <15cfa2a50605241326s5caa3578lbad2d63cb1eda73e@mail.gmail.com> References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org> <4474B0C3.1040906@ibiblio.org> <15cfa2a50605241326s5caa3578lbad2d63cb1eda73e@mail.gmail.com> Message-ID: My own predictions have always shifted from eBook production to eBook translation at about the 10 million eBook mark. If you haven't seen those predictions, I will repost on request. mh On Wed, 24 May 2006, Robert Cicconetti wrote: > Right. I think what needs to be reiterated here is that any extrapolation is > just that, an extrapolation, and that even a very complex growth curve > fitted to PG's output will vary greatly from reality based on various > environmental factors (how many volunteers are available, free time > available, major reorgs at DP, etc) and the biases built into the model. > Taking a standard exponential growth curve and extrapolating it is not > feasible for the long term. (Extend it a few hundred years out..
barring a > major population explosion, major AI improvements, or other unforeseen > circumstance, it's plainly not tenable. You run out of PD works and human > beings very quickly.) It may be accurate enough for the short term, > however. > > I know a statistician on another forum.. perhaps he can explain it more > clearly. > > R C > > On 5/24/06, Michael Hart wrote: >> >> >> On Wed, 24 May 2006, Michael Dyck wrote: >> >> > However, if it makes things easier, I'll gladly refuse to acknowledge >> > that statement *now*. A linear growth curve *is* a growth curve, and a >> > pretty ordinary one at that. It certainly doesn't make it impossible to >> > create an ordinary growth curve. But that's just disagreeing with what >> > you said, rather than what you meant. >> >> No, a line is not a curve. >> >> Linear growth would have just been >> >> 100 eBooks in 100 years. >> 1,000 eBooks in 1,000 years. >> 10,000 eBooks in 10,000 years. >> 20,000 eBooks in 20,000 years. >> >> This is not a growth curve, it is a growth line. >> >> >> However, the case is even more drastic than that, as there was a period >> of over a decade when 0 eBooks were added, due to the hassles of the US >> Copyright Act of 1976, which took us forever to find out about, and the >> truth is that we would probably NEVER have figured them out without the >> help of one of the top dozen copyright lawyers in the US. >> >> Thus, if you INSIST on talking about curves, the curve was downward. >> >> Just one more obvious reason why you can't talk about growth curves >> for this period of Project Gutenberg's history. >> >> * >> >> When There ARE Growth Curves: >> >> >> You also mentioned that you can't fit real world items into such >> growth curves, but you never mentioned that the graph I included >> is a remarkably good overall fit, with deviations so small it is >> hard to see them at all on such a graph representing eBooks at a >> range of 500 eBook increments. >> >> It's a much more impressive growth curve than anyone predicted-- >> except for some of us crazy people. >> >> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d@lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d >> > From jon at noring.name Thu May 25 11:17:32 2006 From: jon at noring.name (Jon Noring) Date: Thu May 25 11:17:37 2006 Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries In-Reply-To: References: <489.b8cd35.31a61d0a@aol.com> Message-ID: <1779074412.20060525121732@noring.name> Michael Hart wrote: > Bowerbird wrote: >> when black ink is splashed onto a white page of paper, >> the result is nothing more than a "picture" of the book. >> but somehow, for over 500 years, that has been enough. > It's so simple that even Mr. Bowerbird's rhetoric cannot confuse the > issue: > > A picture of the pages of a book, even if complete and OCRable, is > simply not a full text eBook. > > 1. It takes many times the drive space. > > 2. It takes much more download wire time. > > 3. You can't do ANY of the things you can do with full text, > > EXCEPT THE MOST IMPORTANT. . .YOU CAN READ IT. > > But the expense in time and money is much larger, and it's much > harder to write research papers. > > By Mr. Bowerbird's logic, a pre-Gutenberg book would be as useful as > a post-Gutenberg book. Michael hits the nail on the head. The important thing is that in the digital realm, we are no longer constrained by the physical limitations of ink on pressed sheets of pulped cellulosic materials (also known as paper).
Thus, it makes no sense to be constrained in our thinking to the pre-digital world. Nor should we be satisfied with only trying to mimic that world. That is, we need to think of what digital texts could be, and all the various things that they may accomplish, when not constrained as paper books have to be constrained. Therefore, the most important question we should ask is: "What are ALL the things we'd like digitized books to enable?" The full answer to this question establishes a clear list of requirements that our digitizing processes, formats, metadata, and reading systems need to meet. Of course, what I just said is patently obvious. But many of these discussions tend to digress back to "how to emulate paper books" rather than "how to surpass paper books." I'm happy to see Michael try to push the discussion back into "what can digital books do that paper books cannot do." Jon Noring OpenReader Consortium From bruce at zuhause.org Thu May 25 11:30:12 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Thu May 25 11:30:15 2006 Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries In-Reply-To: References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org> <4474B0C3.1040906@ibiblio.org> <4474D24E.2050409@ibiblio.org> Message-ID: <17525.63412.804984.36605@celery.zuhause.org> My biggest problem with your growth extrapolations, Michael, is that in the last few years, Distributed Proofreaders has been the primary source of new Project Gutenberg books, and it's clear from their statistics that they're not putting out an exponentially increasing number of books. It may well be that PG had an exponential growth rate in the past (which is easy to do when you're starting out with production in the single digits per year and growing to thousands per year), but that doesn't mean it's sustainable. Explain to us how you're going to get the labor to validate the ebooks produced, or how the OCR will become reliable enough to skip the validation process, and then we'll believe that you can sustain the growth rates seen when Project Gutenberg's main source of new books went from a few hundred (if that many) dedicated individuals to about 5,000 Distributed Proofreaders. From gbnewby at pglaf.org Thu May 25 11:56:51 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Thu May 25 11:56:52 2006 Subject: [gutvol-d] re: accurately converted to any possible format In-Reply-To: <42f.2268306.31a6c626@aol.com> References: <42f.2268306.31a6c626@aol.com> Message-ID: <20060525185651.GC20994@pglaf.org> On Thu, May 25, 2006 at 04:34:46AM -0400, Bowerbird@aol.com wrote: > greg said: > > I don't understand. Where is the URL that > > does conversion on the fly from that file > > to arbitrary formats? PDF, HTML...others... > > with user-specified settings. > > there is no u.r.l. > conversions are done by my viewer-app. > you don't need to be involved, since > your users can do it all by themselves. If it's not online and on the fly, then it's not what I've been talking about. Sorry. You seem to be saying that there is exactly one application in the world that can change a ZML-formatted eBook into HTML, PDF and a variety of other formats. That application is your viewer application, which has already been discussed in gutvol-d. If/when there is such an application that can run on our Unix/Linux servers, operate on the fly, and integrate with the Web back end, it will be great to provide access to PG readers. We've done this for plucker.
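To make "online and on the fly" concrete, here is a bare-bones sketch of the idea -- a tiny web application that reads a stored plain-text eBook and wraps it as HTML at request time. The directory, URL scheme, and file naming here are invented for illustration; the actual gutenberg.org/plucker setup is more involved.

    # Hypothetical on-the-fly converter: GET /12345 returns eBook 12345
    # rendered as minimal HTML. Paths and URL scheme are invented.
    import html
    from pathlib import Path
    from wsgiref.simple_server import make_server

    BOOK_DIR = Path("/data/pg-texts")  # assumed location of nnnnn.txt files

    def app(environ, start_response):
        book_id = environ.get("PATH_INFO", "").strip("/")
        source = BOOK_DIR / f"{book_id}.txt"
        if not book_id.isdigit() or not source.is_file():
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"no such ebook"]
        text = source.read_text(errors="replace")
        page = "<html><body><pre>%s</pre></body></html>" % html.escape(text)
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [page.encode("utf-8")]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()

The point of doing it server-side is exactly what is argued above: the reader needs nothing but a browser, no download of a converter application.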
We have not done it for, for example, text-to-speech because nearly all of the products are for standalone WinPCs. (I've wrestled with Festival somewhat and know it can do the job, but can't figure it out myself. Help invited!). I do think we can do it for Braille with nfbtrans, and I've been negligent in helping Marcello to set it up. Where can users who might be interested download your viewer app from? I don't see it on the snowy site. If it's out there, you could write a little blurb for the PG newsletter inviting people to try it. -- Greg From Bowerbird at aol.com Thu May 25 13:01:24 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu May 25 13:01:42 2006 Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries Message-ID: <37a.30fde4a.31a76714@aol.com> michael said: > It's so simple that even Mr. Bowerbird's rhetoric > cannot confuse the issue: it's even so simple that mr. hart's rhetoric cannot confuse it. :+) > A picture of the pages of a book, even if complete and OCRable, > is simply not a full text eBook. a picture, by definition, is not text. whether or not a _scan-set_ qualifies as an _e-book_, however, is more of a semantic issue than anything else. > 1. It takes many times the drive space. there's no question about that. and no need to reiterate it. > 2. It takes much more download wire time. there's no question about that. and no need to reiterate it. > 3. You can't do ANY of the things you can do with full text, except... > EXCEPT THE MOST IMPORTANT. . .YOU CAN READ IT. and here we have the most important concession, finally -- that a person can indeed _read_ an e-book that is a scan-set. so, for the person who _only_ wants to _read_ a book, a scan-set of that book is all that that person needs... nobody, least of all me, is going to argue with the position that digital text is _better_ than a scan-set in a multitude of ways... so if that's what you think this is about, michael, you're wrong. nobody, least of all me, is saying that people should _settle_ for a scan-set instead of digital text, especially for our cyberlibrary. so if that's what you think this is about, michael, you're wrong. > But the expense in time and money is much larger, i can figure out some interpretations of this that make sense, thinking along the lines of file-storage and bandwidth costs. but both of those things are cheap now, and getting cheaper. and a look at the whole picture shows that _scanning_ incurs the biggest cost, in terms of human labor and machine-costs. so it's a good thing a rich company like google is doing _that_. as for the next step -- the o.c.r. followed by the proofing -- that incurs more cost, mostly in the form of human labor costs. so -- actually -- it would _cost_ less to just "settle" for the scans. that would be a false economy, though, because the extra time that it takes to convert a scan-set into digital-text is _worth_it_ (i.e., the benefits of having digital text _and_ the scan-set are significantly greater than those of having _only_ the scan-set, and the increase in benefits is greater than the digitization costs, and this will become increasingly so as we automate the proofing.) so there's no question that we should keep doing the digitization. so if that's what you think this is about, michael, you're wrong.
it is important to keep in mind, though, that we have no funds for doing this digitization, so we are relying on _volunteers_, and the number of volunteers we have now, and can reasonably anticipate having in the near future, is not even _close_ to being enough to keep up with the rate at which google is now scanning. and when google kicks up their rate, and the new scanning projects get going as well, and the ones currently operating accumulate their results, the number of undigitized scan-sets will become _huge_... so it's counterproductive -- to say the least -- to continue to foster some kind of unrealistic attitude about these scan-sets as being totally without value. we need to see they are of _immense_ value. to continue to throw rocks at them because "they can't be searched" or "you can't copy-and-paste their text" is silly to the point of stupid. people lived with paper-books -- which cannot be searched and from which you cannot copy-and-paste text -- for 500 years, man! and maybe it's good training for me as an anarchist to have my hero say something silly to the point of stupid. but it sure is disillusioning. the task before us now is to find a way to make use of the millions of scan-sets that are out there, standing in line, waiting to be digitized. > and it's much harder to write research papers. i'll let the researchers worry about writing their papers. after all, that's why they get paid the big bucks, right? > By Mr. Bowerbird's logic, a pre-Gutenberg book > would be as useful as a post-Gutenberg book. only if you try and twist my logic to say things that i don't. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/d5273add/attachment.html From Bowerbird at aol.com Thu May 25 13:04:11 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu May 25 13:04:28 2006 Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries Message-ID: <46b.180c14d.31a767bb@aol.com> jon said: > Of course, what I just said is patently obvious. of course it is. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/c9e7058e/attachment.html From Bowerbird at aol.com Thu May 25 13:46:56 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu May 25 13:47:05 2006 Subject: [gutvol-d] re: accurately converted to any possible format Message-ID: <328.526c6b7.31a771c0@aol.com> greg said: > If it's not online and on the fly, then it's > not what I've been talking about. your model and my model differ. my model calls for online maintenance of one file per book -- the master version -- in z.m.l. format. once a person has downloaded that master version, a program running on their personal machine is used to convert that master version into auxiliary versions. i can certainly port my source-code over to perl, so it would run on the web. but that's not what my model is. my model is to put the power into the hands of the user. i don't like methods that _require_ access to the internet. > You seem to be saying that there is exactly > one application in the world that can change > a ZML-formatted eBook into HTML, PDF and > a variety of other formats. one so far, yep. but i don't see any reason why a multitude of other programmers couldn't build their own programs that would do the same thing. how many such programs do you think people need?
it is also of importance to keep in mind my main orientation, which is to provide a reader-program that is so superior that nobody even _wants_ to do a conversion to another format... realistically, i can't see anyone ever using my app to make a .pdf, because why would they want to use acrobat as their viewer-app? and they won't convert to .html to read the e-book in a _browser_, that's for sure. the only conversion i can see them doing is .html so they can put it on a rocketbook or one of the other handhelds, and those machines will all be on the scrapheap before too long... > If/when there is such an application that can > run on our Unix/Linux servers, operate on the fly, > and integrate with the Web back end, > it will be great to provide access to PG readers. i don't program much in any of the scripting web-languages, so you'd have better luck trying those three d.p. programmers that i pointed you to -- thundergnat, donovan, and bill flis. as i said, their tools are already doing the vast majority of the work involved in ascii-to-html conversions for post-processors. and it would be _great_ to standardize your .html versions. the one-off nature of your current .html versions means that all of them will have to be replaced eventually, which is gonna break the hearts of the post-processors who worked so hard on them and expected that hard work to last many decades... i would be happy to consult with anyone who wants to do an open-source version of these converters. i could certainly offer pseudo-code (and even realbasic code, if it helps) that would serve as a guide in programming routines. for the most part, however, i think the translation of z.m.l. structures to (x).h.t.m.l. should be rather obvious and totally straightforward. i surely have encountered no difficulties in doing exactly what is needed. examples of some .zml files with .html conversions can be seen here: > http://www.greatamericannovel.com/ahmmw/ahmmw.zml > http://www.greatamericannovel.com/ahmmw/ahmmwc001.html > http://www.greatamericannovel.com/mabie/mabie.zml > http://www.greatamericannovel.com/mabie/mabiep001.html > http://www.greatamericannovel.com/myant/myant.zml > http://www.greatamericannovel.com/myant/myantc001.html > http://www.greatamericannovel.com/sgfhb/sgfhb.zml > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html > Where can users who might be interested > download your viewer app from? they can get a beta version by signing on to the beta-test listserve: > zml_talk-subscribe@yahoogroups.com that beta-version is very old, though, and doesn't have the converter routines that we have been talking about here... i'll be bringing the program out of beta in the next month or two; people will be able to download a copy from the z.m.l. website (new!): > http://www.z-m-l.com i now have one brave linux alpha-tester, so watch for a linux version! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/0a7df199/attachment.html From gbnewby at pglaf.org Thu May 25 14:39:45 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Thu May 25 14:39:46 2006 Subject: [gutvol-d] re: accurately converted to any possible format In-Reply-To: <328.526c6b7.31a771c0@aol.com> References: <328.526c6b7.31a771c0@aol.com> Message-ID: <20060525213945.GB24081@pglaf.org> On Thu, May 25, 2006 at 04:46:56PM -0400, Bowerbird@aol.com wrote: > greg said: > > If it's not online and on the fly, then it's > > not what I've been talking about.
Sorry. > > your model and my model differ. I guess so. The main PG server has 280,000 files in dozens of formats and languages, with millions of hits per month. You're not offering anything that can help them, except for a mailing list subscription for an outdated beta. Your earlier message said you could offer reformatting into any reasonable format, as long as the input was sufficiently well-formed to your standards. Sounds like you don't actually have any such thing. Put it up for free public download, and I'll change my tune in a heartbeat. > my model calls for online maintenance of one file > per book -- the master version -- in z.m.l. format. > > once a person has downloaded that master version, > a program running on their personal machine is used > to convert that master version into auxiliary versions. So you're against storing (or creating) WAP versions, Braille versions and MP3 versions on our server? That's cutting out a whole lot of potential readers I would like to support. Nobody's stopping anyone from running a program on their own computer to do whatever conversion they like. Why are you trying to stop me from enabling various conversions on the server, for people who want or need to get conversion done there? -- Greg > i can certainly port my source-code over to perl, so it > would run on the web. but that's not what my model is. > > my model is to put the power into the hands of the user. > i don't like methods that _require_ access to the internet. > > > > You seem to be saying that there is exactly > > one application in the world that can change > > a ZML-formatted eBook into HTML, PDF and > > a variety of other formats. > > one so far, yep. > > but i don't see any reason why a multitude of other programmers > couldn't build their own programs that would do the same thing. > > how many such programs do you think people need? > > it is also of importance to keep in mind my main orientation, > which is to provide a reader-program that is so superior that > nobody even _wants_ to do a conversion to another format... > > realistically, i can't see anyone ever using my app to make a .pdf, > because why would they want to use acrobat as their viewer-app? > and they won't convert to .html to read the e-book in a _browser_, > that's for sure. the only conversion i can see them doing is .html > so they can put it on a rocketbook or one of the other handhelds, > and those machines will all be on the scrapheap before too long... > > > > If/when there is such an application that can > > run on our Unix/Linux servers, operate on the fly, > > and integrate with the Web back end, > > it will be great to provide access to PG readers. > > i don't program much in any of the scripting web-languages, > so you'd have better luck trying those three d.p. programmers > that i pointed you to -- thundergnat, donovan, and bill flis. > > as i said, their tools are already doing the vast majority of the > work involved in ascii-to-html conversions for post-processors. > > and it would be _great_ to standardize your .html versions. > the one-off nature of your current .html versions means that > all of them will have to be replaced eventually, which is gonna > break the hearts of the post-processors who worked so hard > on them and expected that hard work to last many decades... > > i would be happy to consult with anyone who wants to do an > open-source version of these converters.
i could certainly offer > pseudo-code (and even realbasic code, if it helps) that would > serve as a guide in programming routines. for the most part, > however, i think the translation of z.m.l. structures to (x).h.t.m.l. > should be rather obvious and totally straightforward. i surely > have encountered no difficulties in doing exactly what is needed. > > examples of some .zml files with .html conversions can be seen here: > > > http://www.greatamericannovel.com/ahmmw/ahmmw.zml > > http://www.greatamericannovel.com/ahmmw/ahmmwc001.html > > > http://www.greatamericannovel.com/mabie/mabie.zml > > http://www.greatamericannovel.com/mabie/mabiep001.html > > > http://www.greatamericannovel.com/myant/myant.zml > > http://www.greatamericannovel.com/myant/myantc001.html > > > http://www.greatamericannovel.com/sgfhb/sgfhb.zml > > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html > > > > Where can users who might be interested > > download your viewer app from? > > they can get a beta version by signing on to the beta-test listserve: > > zml_talk-subscribe@yahoogroups.com > > that beta-version is very old, though, and doesn't have the > converter routines that we have been talking about here... > > i'll be bringing the program out of beta in the next month or two; > people will be able to download a copy from the z.m.l. website (new!): > > http://www.z-m-l.com > > i now have one brave linux alpha-tester, so watch for a linux version! > > -bowerbird > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Thu May 25 15:28:01 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu May 25 15:28:09 2006 Subject: [gutvol-d] re: accurately converted to any possible format Message-ID: <219.16282fa8.31a78971@aol.com> Skipped content of type multipart/alternative-------------- next part -------------- An embedded message was scrubbed... From: Greg Newby Subject: Re: [gutvol-d] re: accurately converted to any possible format Date: Thu, 25 May 2006 14:39:45 -0700 Size: 7655 Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/1fad2692/attachment.mht From donovan at abs.net Thu May 25 15:18:33 2006 From: donovan at abs.net (D Garcia) Date: Thu May 25 15:41:31 2006 Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...] In-Reply-To: <20060525051024.GH6694@pglaf.org> References: <42a.2124045.31a58972@aol.com> <20060525051024.GH6694@pglaf.org> Message-ID: <200605251818.34015.donovan@abs.net> By way of forking the discussion, on Thursday 25 May 2006 at 01:10 am, Greg Newby responded to Jon Ingram with: > Woah there, cowboy. > > I've been waiting for DP to provide raw page scans for *years*. This is > something I discussed with Charles & Juliet years ago. The whitewashers > are ready. iBiblio is ready. And the volunteer is ready. I volunteered nearly two months ago to take up this task and am simply waiting on various action items from a few people. Charles always intended to have the scans from DP available to the general public whenever possible. > I've also been pressing to get preprints from DP...scans before the > postprocessing is done, to release "to the wild" before they're quite > ready. (Last count there are over 800 of these.) It's an interesting idea, but initially I'd like to focus on getting the existing projects in order. :) > If you could help to move things forward on either scans or preprints, > I'd be very grateful! 
(Ditto for anyone else reading.) > -- Greg -- David From Bowerbird at aol.com Thu May 25 15:47:21 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu May 25 15:47:29 2006 Subject: [gutvol-d] re: accurately converted to any possible format Message-ID: <439.23d6dc6.31a78df9@aol.com> sorry, i screwed up that last post, and forgot to include this: greg said: > Your earlier message said you could offer > reformatting into any reasonable format, > as long as the input was sufficiently > well-formed to your standards. actually, as the header shows, i said "any possible format", which is kind of ludicrous in retrospect, isn't it? so let's be precise about exactly _what_ formats we mean, in the future, and let us further provide _samples_ so that people can evaluate the _quality_ of the conversions we do. otherwise, it's just a hype war, and that does no one any good. > Sounds like you don't actually have any such thing. it's not released yet, no. but i have pointed to samples. > Put it up for free public download, > and I'll change my tune in a heartbeat. would that you were so demanding of the .tei folks. *** now, let me restate, just to remind everybody again, i have no objection to the .tei folks, or the .xml folks. i don't even have an objection that .tei is the "official" position on how project gutenberg moves to the future. i merely wish to assert _my_ opinion, which i will back up with solid evidence, that a much simpler methodology will give substantially similar (if not better) benefits, at a cost (both initial and maintenance) that is _significantly_ lower. even then, if people want to stick with the .tei/.xml method, that's fine with me, as it is no skin off my nose. comprenez? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/a168da38/attachment.html From Bowerbird at aol.com Thu May 25 15:56:18 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu May 25 15:56:23 2006 Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...] Message-ID: <3b6.2f08aab.31a79012@aol.com> donovan said: > It's an interesting idea, but initially I'd like to focus > on getting the existing projects in order. :) the volunteer who does the work sets the agenda. :+) but putting up existing scan-sets, so people could read them while they are standing in line waiting to be digitized, seems to me to be the best possible way to put them to use _now_... and if anyone needs another good idea, i think it would be a good idea to round up the scan-sets from google that represent books that are already in the p.g. library... a list of such books has been compiled on the d.p. forums. the focus of that list is now "...so don't bother with these...", but i think they could instead represent a rare opportunity... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/77ac427f/attachment.html From jmdyck at ibiblio.org Thu May 25 16:18:43 2006 From: jmdyck at ibiblio.org (Michael Dyck) Date: Thu May 25 16:18:51 2006 Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries In-Reply-To: References: <424.1cc65e1.31a33a85@aol.com> <44721054.3000104@ibiblio.org> <4474B0C3.1040906@ibiblio.org> <4474D24E.2050409@ibiblio.org> Message-ID: <44763B53.5080801@ibiblio.org> Michael Hart wrote: > > OK, back to basics, I have consulted with some mathematicians, > not that I think you didn't know this, but you are pressuring > me to make the point, I'm not pressuring you to make any point about lines and curves. If you feel pressure to do so, that's unfortunate. If you feel I demanded it, you're mistaken. In your first reply to my posting, you made a statement about "an ordinarly [sic] growth curve", assuming a particular definition for "curve". I disagreed, based on a different definition of "curve". But I also (in the same message!) agreed with you, based on my best guess at what you actually *meant*. (And also disagreed again, based on another possibility for what you meant.) So it's just a little confusion over one's choice of terms. E.g., if you had instead said "an exponential growth curve" (or "geometric growth curve" or "Moore's Law curve"), there wouldn't have been that confusion. (Of course, there still would have been another problem, namely that you ascribed to me a position I had never taken. It would be nice if you apologized for that.) >> But now we (well, you, really) have strayed from the topic that >> brought me in, the comparison between Google's progress and PG's, >> so my interest in this discussion is probably fading. > > Ah, it would appear that you already knew you were painting us into a > corner. I disagree that we're painted into a corner. There's still a chance that this could go in a useful direction, though it does seem slim. > Then I hope that the great effort spent in replying to your messages was > not a total waste for either yourself or the rest of us. For myself, the replies (yours and mine) feel like mostly a waste so far. (Although if people gained some insight into Google's progress by my comparing it with PG's, then that would be a bright spot.) For the rest, I cannot say. > However, as I stated in my opening paragraph, I presumed you already > knew all of this Correctly presumed. > and thus presumed you were only asking the question for other reasons. > > May I ask what those reasons were? Sure, but what question are you talking about? I looked over my last three messages for a question, and the only one I found was (and I quote) "Huh?". (If you're talking about a question in which I ask you to explain lines and curves, that question does not exist.) -Michael From ricardofdiogo at gmail.com Thu May 25 21:42:06 2006 From: ricardofdiogo at gmail.com (Ricardo F Diogo) Date: Thu May 25 21:48:48 2006 Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...] In-Reply-To: <3b6.2f08aab.31a79012@aol.com> References: <3b6.2f08aab.31a79012@aol.com> Message-ID: <9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com> > but putting up existing scan-sets, so people could read them > while they are standing in line waiting to be digitized, seems > to me to be the best possible way to put them to use _now_... > In this case, all DP's Content Providers must be explicitly warned that ALL scans CAN be released to the public. 
Some foreign legislation may allow people to scan a book and release it to DP (since it's a "closed" website), but may forbid them to allow those scans to be released to the general public. Some volunteers may be scanning books at this moment expecting that only the final E-text will be released. (According to my national law (EU), for instance, I can theoretically scan a 1960's edition (respecting the life+70 rule) and upload it to PGDP/DPE. But having the images freely available online can be very compromising because editors may still have typographic copyright.) I don't know how it works all around the world... That's why some special warning would be advisable. 2006/5/25, Bowerbird@aol.com : > > donovan said: > > It's an interesting idea, but initially I'd like to focus > > on getting the existing projects in order. :) > > the volunteer who does the work sets the agenda. :+) > > but putting up existing scan-sets, so people could read them > while they are standing in line waiting to be digitized, seems > to me to be the best possible way to put them to use _now_... > > and if anyone needs another good idea, i think it would be > a good idea to round up the scan-sets from google that > represent books that are already in the p.g. library... > > a list of such books has been compiled on the d.p. forums. > the focus of that list is now "...so don't bother with these...", > but i think they could instead represent a rare opportunity... > > -bowerbird > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > -- "I have seen of what night the light of day is made!" (Antero de Quental) Give electronic books to the World. Help at http://www.pgdp.net and at http://dp.rastko.net From prosfilaes at gmail.com Thu May 25 22:22:49 2006 From: prosfilaes at gmail.com (David Starner) Date: Thu May 25 22:29:31 2006 Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...] In-Reply-To: <9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com> References: <3b6.2f08aab.31a79012@aol.com> <9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com> Message-ID: <6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com> On 5/25/06, Ricardo F Diogo wrote: > In this case, all DP's Content Providers must be explicitly warned > that ALL scans CAN be released to the public. That's always been the assumption, as far as I know. > Some foreign legislation may allow people to scan a book and release > it to DP (since it's a "closed" website), but may forbid them to allow > those scans to be released to the general public. DP lets anyone sign up and download the complete scans for a book. I wouldn't be too trusting that that would cover you under any legal system.
Note that DP-EU is different; they don't let just anyone look at more pages than they need to proof, which they expect will cover them for offering US-cleared works from a life+50 server. From gbnewby at pglaf.org Thu May 25 22:33:57 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Thu May 25 22:33:59 2006 Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...] In-Reply-To: <6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com> References: <3b6.2f08aab.31a79012@aol.com> <9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com> <6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com> Message-ID: <20060526053357.GA31126@pglaf.org> On Fri, May 26, 2006 at 12:22:49AM -0500, David Starner wrote: > On 5/25/06, Ricardo F Diogo wrote: > >In this case, all DP's Content Providers must be explicitly warned > >that ALL scans CAN be released to the public. > > That's always been the assumption, as far as I know. > > >Some foreign legislation may allow people to scan a book and release > >it to DP (since it's a "closed" website), but may forbid them to allow > >those scans to be released to the general public. > > DP lets anyone sign up and download the complete scans for a book. I > wouldn't be too trusting that that would cover you under any legal > system. If an eBook is public domain in the US, then the scans are too. Even people outside of the US cannot (or at least, not successfully) claim PG can't redistribute them...or anyone else for that matter. I've saved some of our escapades on such issues: http://cand.pglaf.org/ If DP agrees to not redistribute, that's another matter... this is sometimes done for particular sources of content, even if it's public domain. I think it's in everyone's interest to not violate such agreements. We have a little about this in the "Harvesting" section at www.gutenberg.org/howto -- Greg From mlockey at magma.ca Thu May 25 22:22:43 2006 From: mlockey at magma.ca (Michael Lockey) Date: Thu May 25 22:35:35 2006 Subject: [gutvol-d] Distributed Proofreaders Canada Message-ID: <200605260522.k4Q5Mfox010349@mail2.magma.ca> First, let me apologize for the delays; in my case, the word 'health' has been a singleton oxymoron. Despite all this, DP-CAN should be up in a week. It is currently a very small organization, and there's a whole buncha positions available. Anyone who wants to moderate a forum, help in an administrative position, or is willing to help CP is welcome and needed. (Please note that anyone involved in helping us with the generation or processing of Life+50 work must be legal to do so.) Michael Lockey (note new email) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060526/f392c895/attachment.html From tb at baechler.net Thu May 25 22:53:59 2006 From: tb at baechler.net (Tony Baechler) Date: Thu May 25 22:58:19 2006 Subject: [gutvol-d] re: accurately converted to any possible format In-Reply-To: <20060525185651.GC20994@pglaf.org> References: <42f.2268306.31a6c626@aol.com> <20060525185651.GC20994@pglaf.org> Message-ID: <7.0.1.0.2.20060525224931.03e641d0@baechler.net> Greg wrote: >invited!). I do think we can do it for Braille >with nfbtrans, and I've been negligent in helping >Marcello to set it up. Yes, NFBTrans will do conversion to Braille on the fly.
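(A rough sketch of the automation described below -- concatenating a file of formatting dot commands with the PG text and feeding the result to nfbtrans. The file names are hypothetical, and the bare invocation is an assumption; the real command-line options need to come from the nfbtrans documentation.)

    # Sketch only: prepend a formatting-commands file to the PG text and
    # run nfbtrans on the result. File names are made up, and the exact
    # nfbtrans command line is an assumption -- check its documentation.
    import subprocess

    def text_to_braille(dot_commands: str, pg_text: str, combined: str) -> None:
        with open(combined, "w") as out:
            for name in (dot_commands, pg_text):
                with open(name) as src:
                    out.write(src.read())
        subprocess.run(["nfbtrans", combined], check=True)

    # Example (hypothetical file names):
    # text_to_braille("format-header.dot", "12345.txt", "12345-combined.txt")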
I did try to get volunteers to check the output but got no takers. The problem is trying to guess at what formatting commands to send. You can format the Braille files a certain way but it requires trial and error. I'm not a programmer and know little about setting it up. It may be that you have to cat a file of dot commands to set formatting with the PG text to get useful output. There are examples to produce textbooks and other embosser files, but that kind of formatting is not helpful for people using PDAs etc. Yes, this can most definitely be done automatically, though. From prosfilaes at gmail.com Thu May 25 23:03:50 2006 From: prosfilaes at gmail.com (David Starner) Date: Thu May 25 23:03:53 2006 Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...] In-Reply-To: <20060526053357.GA31126@pglaf.org> References: <3b6.2f08aab.31a79012@aol.com> <9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com> <6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com> <20060526053357.GA31126@pglaf.org> Message-ID: <6d99d1fd0605252303w711d67d0j5b6a3440c21c1e9a@mail.gmail.com> On 5/26/06, Greg Newby wrote: > If an eBook is public domain in the US, then the scans are too. > Even people outside of the US cannot (or at least, not successfully) > claim PG can't redistribute them...or anyone else for that matter. > I've saved some of our escapades on such issues: > http://cand.pglaf.org/ I think he was more worried about the content provider's liability. I suspect a copyright holder could get very annoyed about one or two of the books I have provided to DP-EU, but I've personally chosen to have no public connection to those books. However > We are unaware of any case where copyright laws of another country impacts public domain status in the US. is not 100% true. To be pedantic, there are rule 6 clearances possible based on a non-US work not being in copyright in its home country when the URAA was passed. From gbnewby at pglaf.org Thu May 25 23:07:46 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Thu May 25 23:07:47 2006 Subject: [gutvol-d] Re: DPF images archives [Was: Re: Kevin Kelly ...] In-Reply-To: <200605251818.34015.donovan@abs.net> References: <42a.2124045.31a58972@aol.com> <20060525051024.GH6694@pglaf.org> <200605251818.34015.donovan@abs.net> Message-ID: <20060526060746.GA31780@pglaf.org> On Thu, May 25, 2006 at 06:18:33PM -0400, D Garcia wrote: > By way of forking the discussion, on Thursday 25 May 2006 at 01:10 am, Greg > Newby responded to Jon Ingram with: > > Woah there, cowboy. > > > > I've been waiting for DP to provide raw page scans for *years*. This is > > something I discussed with Charles & Juliet years ago. The whitewashers > > are ready. iBiblio is ready. > > And the volunteer is ready. I volunteered nearly two months ago to take up > this task and am simply waiting on various action items from a few people. > Charles always intended to have the scans from DP available to the general > public whenever possible. Responding to Joshua's point about the desired format, as well as Greg W's inquiry. There were several messages and some proposals about the details of how to handle page scans. Stuff like whether individual pages should each have their own file, and what format... I will forward a message from Jim Tinsley about that in a moment, from July 2004. There was subsequent discussion. I don't think we quite got closure, but will ask the WWs if they remember anything specific. My suggestion is to do a few dozen of these, and work out the workflow as we go.
If you can upload a .zip or .tar or somesuch to the pglaf server via FTP (not via http://upload.pglaf.org), then email me and I'll push them to the archive. Let me know if you don't have the (non-anonymous) upload/outgoing password for pglaf.org. Ideally, zipped with the eBook #, and with everything in an xxxxx-page-images/ subdir, e.g. 12345/12345-page-images/ -- that will allow our automated "push" script to put it in the right place. If things seem to work OK, I'll set things up so I won't need to intervene. I think it's fine to experiment with different ways of doing the images -- that will help us to know what's workable for our readers, and useful for other purposes. Rather than rehashing all of the questions, options and issues, I'd just as soon see some stuff get posted so we can invite folks to try it. (I'm not trying to quell discussion, just trying to avoid the discussion getting in the way of the work.) Thanks for stepping up and trying this! We do want to make images part of the regular workflow, but because the whitewashers tend to download the eBooks to their home/office systems for final processing, we'll probably want to have the page scans flow somewhat separately from everything else. Whoopee, this is great!! Yippee-ei-ayyyyyyyy!! -- Greg > > I've also been pressing to get preprints from DP...scans before the > > postprocessing is done, to release "to the wild" before they're quite > > ready. (Last count there are over 800 of these.) > > It's an interesting idea, but initially I'd like to focus on getting the > existing projects in order. :) > > > If you could help to move things forward on either scans or preprints, > > I'd be very grateful! (Ditto for anyone else reading.) > > -- Greg > > -- David From gbnewby at pglaf.org Thu May 25 23:13:40 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Thu May 25 23:13:43 2006 Subject: [gutvol-d] Page scans draft policy Message-ID: <20060526061340.GA32022@pglaf.org> As I said, there was subsequent discussion about the details of formatting... Here's some info about page scan formats. I note that point 4 is somewhat different than what I just typed in my other message, and seems a whole lot smarter. -- Greg ----- Forwarded message from Jim Tinsley ----- From: "Jim Tinsley" To: "Posted Etexts for Project Gutenberg" Subject: [posted] Posted (#12973, Butler) ! Date: Tue, 20 Jul 2004 20:24:32 -0700 (PDT) Personal Recollections of Pardee Butler, by Pardee Butler 12973 [Editor: Mrs. Rosetta B. Hastings] [Contributor: Mrs. Rosetta B. Hastings] [Contributor: Elder John Boggs] [Contributor: Elder J. B. McCleery] [Link: http://www.gutenberg.net/1/2/9/7/12973 ] [Files: 12973.txt; 12973-h.htm; 12973-page-images] Thanks to Roger for finding and scanning this book. This is the first PG book to be posted with page images. We are now beginning to accept page images along with the regular postings. Of course, DP has always preserved its page images, and those will eventually be uploaded in a big batch, or series of batches, but non-DP contributions may now begin adding page images. For now, we're setting the following guidelines for page image postings: 1. PG is now accepting page images of books posted. Page images will be posted _only_ as an addition to an etext posted in the normal way -- we will not post page images without plain text. 2. Page images are an option; they are not and will not be required for the posting of a text. 3.
All page images should be good enough to work reasonably well with OCR packages, up to 600 dpi, and should be stored as black-and-white TIFFs with CCITT-4 (aka ITU-G4 or Fax Group 4) compression. This is important, so that we keep the overall file size down to a sustainable level. With this compression, a typical 600dpi page can be stored for about 40KB. Our ability to post these images depends on the file sizes staying fairly reasonable. Pages such as color pictures or greyscale photos that cannot reasonably be stored as black-and-white only should be stored as TIFF or JPEG with the best compression you can get for that image. (Note: Irfanview for Windows does this nicely individually or in batch. ImageMagick v 6.x: convert myimage.png -compress group4 myimage.tif ) 4. Each page image should be a separate file and named with the page number within the set; e.g. 001.tif, 002.tif, etc. Separate, non-page images, such as covers or color images scanned separately from the pages, should have suitable names, such as "cover.jpg" or "072-image.tif" All page images for the book will be zipped into one file, to be called FILENUMBER-page-images, e.g. 12345-page-images.piz (reverse the extension) for etext #12345, and stored in the main directory for that etext. It will unzip to a subdirectory ./page-images, but we will not post separate page images in that directory, since that would double the space used, and we believe that people who want to consult the images will probably want them all. So, for now at least, if you want the images, you download the PIZ (backwards again) file. jim ----- End forwarded message ----- From gbnewby at pglaf.org Thu May 25 23:19:08 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Thu May 25 23:19:09 2006 Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...] In-Reply-To: <6d99d1fd0605252303w711d67d0j5b6a3440c21c1e9a@mail.gmail.com> References: <3b6.2f08aab.31a79012@aol.com> <9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com> <6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com> <20060526053357.GA31126@pglaf.org> <6d99d1fd0605252303w711d67d0j5b6a3440c21c1e9a@mail.gmail.com> Message-ID: <20060526061908.GA32191@pglaf.org> On Fri, May 26, 2006 at 01:03:50AM -0500, David Starner wrote: > On 5/26/06, Greg Newby wrote: > >If an eBooks is public domain in the US, then the scans are too. > >Even people outside of the US cannot (or at least, not successfully) > >claim PG can't redistribute them...or anyone else for that matter. > >I've saved some of our escapades on such issues: > > http://cand.pglaf.org/ > > I think he was more worried about the content provider's liability. I > suspect a copyright holder could get very annoyed about one or two of > the books I have provided to DP-EU, but I've personally chosen to have > no public connection to those books. Understood, and that's a good approach. Laws in other countries (France & Germany leap right to mind) can be pretty different than the US about their approach to such things. > However > > >We are unaware of any case where copyright laws of another country impacts > >public domain status in the US. > > is not 100% true. To be pedantic, there are rule 6 clearances possible > based on a non-US work not being in copyright in its home country when > the URAA was passed. Sure, I understand about that, and didn't type a very precise message. 
A more precise attempt: If a book is public domain in the US (including under GATT/URAA etc.), then we (PG and US-based persons) get to treat it as public domain. This is regardless of whether it might still be copyrighted elsewhere. This is upsetting to many copyright holders (including those holding US copyrights on items that are public domain elsewhere). See http://cand.pglaf.org for a few examples. -- Greg From Bowerbird at aol.com Fri May 26 00:43:00 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri May 26 00:43:08 2006 Subject: [gutvol-d] Page scans draft policy Message-ID: <37c.416bf76.31a80b84@aol.com> someone said: > 4. Each page image should be a separate file > and named with the page number within the set; > e.g. 001.tif, 002.tif, etc. Separate, non-page images, > such as covers or color images scanned separately > from the pages, should have suitable names, > such as "cover.jpg" or "072-image.tif" this is a bad policy. a wonky naming convention screws everything up. (and an inconsistent convention is a wonky one.) also, it's absolutely imperative that a library have _unique_filenames_ for every single file within it. naming files "001.tif" and expecting their _folder_ to differentiate them is a disaster-in-the-making. for a better way of doing things, check any of the links to the .html files i gave in an earlier message. all of them incorporate the image-scans. you want an alphabetical sort of the filenames to give you the _exact_ order in which the p-book pages were bound...
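here's a minimal sketch of one way to get that (the directory and etext number are made up for illustration) -- prefix every scan with its etext number, so each filename is unique across the whole library, and a plain alphabetical sort still reproduces the binding order:

    # sketch: "001.tif" in book 12345 becomes "12345-p0001.tif" --
    # unique library-wide, and alphabetical order == binding order.
    from pathlib import Path

    def rename_pages(book_dir: str, etext_number: int) -> None:
        for old in sorted(Path(book_dir).glob("*.tif")):
            new_name = f"{etext_number}-p{old.stem.zfill(4)}.tif"
            old.rename(old.with_name(new_name))

    # example (hypothetical paths):
    # rename_pages("12345/12345-page-images", 12345)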
> http://www.greatamericannovel.com/ahmmw/ahmmw.zml > http://www.greatamericannovel.com/ahmmw/ahmmw.html > http://www.greatamericannovel.com/mabie/mabie.zml > http://www.greatamericannovel.com/mabie/mabiep001.html > http://www.greatamericannovel.com/myant/myant.zml > http://www.greatamericannovel.com/myant/myantc001.html > http://www.greatamericannovel.com/sgfhb/sgfhb.zml > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060526/4aa6344c/attachment.html