From ke at gnu.franken.de Sat Dec 1 00:17:55 2007 From: ke at gnu.franken.de (Karl Eichwalder) Date: Sat, 1 Dec 2007 09:17:55 +0100 (CET) Subject: [gutvol-d] The Advent Calendar will be up tomorrow In-Reply-To: <2vt1l3dhg8beqp31guqq8aaltc06198r91@4ax.com> References: <748ba8e50711300336g6066d752h6ba65f375fdeb3ed@mail.gmail.com> <3n60l3devcg49a88g6h5gu6m46gedtt5mb@4ax.com> <47509F69.20301@netronome.com> <2vt1l3dhg8beqp31guqq8aaltc06198r91@4ax.com> Message-ID: <51940.83.171.144.232.1196497075.squirrel@www.franken.de> > On Fri, 30 Nov 2007 18:40:25 -0500, you wrote: > >>Will the links be enabled separately day by day? > > No. Just like a chocolate calendar you should be able to abuse it. This is a good one :-D But those who behave that badly have to be punished somehow... From robert_marquardt at gmx.de Sat Dec 1 03:24:52 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Sat, 01 Dec 2007 12:24:52 +0100 Subject: [gutvol-d] The Advent Calendar will be up tomorrow In-Reply-To: <51940.83.171.144.232.1196497075.squirrel@www.franken.de> References: <748ba8e50711300336g6066d752h6ba65f375fdeb3ed@mail.gmail.com> <3n60l3devcg49a88g6h5gu6m46gedtt5mb@4ax.com> <47509F69.20301@netronome.com> <2vt1l3dhg8beqp31guqq8aaltc06198r91@4ax.com> <51940.83.171.144.232.1196497075.squirrel@www.franken.de> Message-ID: <53h2l3hh7lkh41m8eqnl494qc148t4utus@4ax.com> On Sat, 1 Dec 2007 09:17:55 +0100 (CET), you wrote: >This is a good one :-D But those who behave that badly have >to be punished somehow... Punishment in Christmas time? That is the job of Santa. 
-- Robert Marquardt (Team JEDI) http://delphi-jedi.org From ralf at ark.in-berlin.de Sun Dec 2 04:01:03 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Sun, 2 Dec 2007 13:01:03 +0100 Subject: [gutvol-d] The Advent Calendar will be up tomorrow In-Reply-To: <51940.83.171.144.232.1196497075.squirrel@www.franken.de> References: <748ba8e50711300336g6066d752h6ba65f375fdeb3ed@mail.gmail.com> <3n60l3devcg49a88g6h5gu6m46gedtt5mb@4ax.com> <47509F69.20301@netronome.com> <2vt1l3dhg8beqp31guqq8aaltc06198r91@4ax.com> <51940.83.171.144.232.1196497075.squirrel@www.franken.de> Message-ID: <20071202120103.GB14608@ark.in-berlin.de> > >>Will the links be enabled separately day by day? > > > > No. Just like a chocolate calendar you should be able to abuse it. > > This is a good one :-D But those who behave that badly have > to be punished somehow... Intellectual obstipation? Happens often enough ... ralf From Bowerbird at aol.com Mon Dec 3 09:35:56 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 3 Dec 2007 12:35:56 EST Subject: [gutvol-d] still waiting Message-ID: still waiting for carlo to weigh in with that wdiff tutorial... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071203/d25b1b0e/attachment.htm From Bowerbird at aol.com Tue Dec 4 09:44:39 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 4 Dec 2007 12:44:39 EST Subject: [gutvol-d] the cost of an e-book authoring system Message-ID: some people have taken issue with the price i put on the _source_code_ for my e-book applications. (the programs themselves are free of cost.) evidently, they don't know what the market will bear...
here's the latest piece of information they might want to consider: > http://www.futureofthebook.org/blog/archives/2007/12/a_grant_for_sophie_from_the_ma.html sophie, an authoring-tool from the institute for the future of the book, just received a grant of $400,000, which will "ensure" that v1.0 gets out the door. note that this is on top of gobs of other cash sophie has received in the past, not to mention the money that went toward earlier iterations (notably tk3)... -bowerbird From Bowerbird at aol.com Fri Dec 7 16:01:26 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 7 Dec 2007 19:01:26 EST Subject: [gutvol-d] 2000 books digitized Message-ID: distributed proofreaders has passed the 2000-book milestone for 2007... congratulations to the hard-working volunteers for their accomplishment. -bowerbird From Bowerbird at aol.com Mon Dec 10 12:19:38 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 10 Dec 2007 15:19:38 EST Subject: [gutvol-d] that proofing challenge Message-ID: i've made headway on that proofing challenge.
to refresh your memory: > http://z-m-l.com/misc/thechallenge.png > http://z-m-l.com/misc/thechallenge2.png this table depicts one way to unravel the thing: > http://z-m-l.com/misc/thechallenge3.html if anyone here is actually interested enough to ask questions about it, i'll be happy to answer... otherwise, i won't bore you with implementation. -bowerbird From radicks at bellsouth.net Mon Dec 10 18:34:12 2007 From: radicks at bellsouth.net (Dick Adicks) Date: Mon, 10 Dec 2007 21:34:12 -0500 Subject: [gutvol-d] Amazon's Kindle Message-ID: Does anybody know if Amazon's new Kindle is capable of downloading PG books? The Amazon website does not mention that applicability, so I assume that the device is limited only to what Amazon sells. Dick Adicks From Bowerbird at aol.com Mon Dec 10 20:30:09 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 10 Dec 2007 23:30:09 EST Subject: [gutvol-d] Amazon's Kindle Message-ID: you can send a file -- text, .html, .rtf -- to amazon to have it kindle-ized, yes. but probably the best bet is to get the 10000-book d.v.d. from silkpagoda.com (formerly blackmask.com) wherein much of the p.g. library has been converted. -bowerbird From desrod at gnu-designs.com Mon Dec 10 21:33:57 2007 From: desrod at gnu-designs.com (David A.
Desrosiers) Date: Tue, 11 Dec 2007 00:33:57 -0500 (EST) Subject: [gutvol-d] Amazon's Kindle In-Reply-To: <20071211035609.GN5916@localhost> References: <20071211035609.GN5916@localhost> Message-ID: > The device certainly is not limited to Amazon's offerings. The > Kindle reads natively Mobibook (no-drm) *.mobi and *.pdb and *.txt > formats. > It permits you to install such files free via USB. (Or you can send > them wireless to the device for 10 cents a file). And Kindle > supports SD cards up to 4GB. It still submits the titles, length, bookmarks, annotation, etc. of ANY title you happen to be reading, back upstream (their TOS is clear on this point). From greg at durendal.org Thu Dec 13 11:22:50 2007 From: greg at durendal.org (Greg Weeks) Date: Thu, 13 Dec 2007 14:22:50 -0500 (EST) Subject: [gutvol-d] pre-press Message-ID: What was the address for the pre-press site? Posting the texts before they are ready for full posting has come up in a discussion on DP. -- Greg Weeks http://durendal.org:8080/greg/ From hart at pglaf.org Thu Dec 13 12:22:36 2007 From: hart at pglaf.org (Michael Hart) Date: Thu, 13 Dec 2007 12:22:36 -0800 (PST) Subject: [gutvol-d] pre-press In-Reply-To: References: Message-ID: You want to SEND to that site, or RECEIVE from that side??? mh On Thu, 13 Dec 2007, Greg Weeks wrote: > > What was the address for the pre-press site? Posting the texts before they > are ready for full posting has come up in a discussion on DP. > > -- > Greg Weeks > http://durendal.org:8080/greg/ > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From greg at durendal.org Thu Dec 13 12:27:04 2007 From: greg at durendal.org (Greg Weeks) Date: Thu, 13 Dec 2007 15:27:04 -0500 (EST) Subject: [gutvol-d] pre-press In-Reply-To: References: Message-ID: On Thu, 13 Dec 2007, Michael Hart wrote: > > You want to SEND to that site, or RECEIVE from that side??? Receive from it. 
I just wanted to show some people at DP what PG is doing with material from other sources that is not quite ready for full release. Greg Weeks > On Thu, 13 Dec 2007, Greg Weeks wrote: > >> >> What was the address for the pre-press site? Posting the texts before they >> are ready for full posting has come up in a discussion on DP. >> >> -- >> Greg Weeks >> http://durendal.org:8080/greg/ > -- Greg Weeks http://durendal.org:8080/greg/ From gbnewby at pglaf.org Thu Dec 13 12:39:33 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Thu, 13 Dec 2007 12:39:33 -0800 Subject: [gutvol-d] pre-press In-Reply-To: References: Message-ID: <20071213203932.GA24302@mail.pglaf.org> On Thu, Dec 13, 2007 at 03:27:04PM -0500, Greg Weeks wrote: > On Thu, 13 Dec 2007, Michael Hart wrote: > > > > > You want to SEND to that site, or RECEIVE from that side??? > > Receive from it. I just wanted to show some people at DP what PG is doing > with material from other sources that is not quite ready for full release. > > Greg Weeks http://preprints.readingroo.ms Enjoy :) > > > >> > >> What was the address for the pre-press site? Posting the texts before they > >> are ready for full posting has come up in a discussion on DP.
> >> > >> -- > >> Greg Weeks > >> http://durendal.org:8080/greg/ > > -- > Greg Weeks > http://durendal.org:8080/greg/ From jon at noring.name Thu Dec 13 13:07:20 2007 From: jon at noring.name (Jon Noring) Date: Thu, 13 Dec 2007 14:07:20 -0700 Subject: [gutvol-d] Announcing: The Digital Text Community mailing list Message-ID: <522962738.20071213140720@noring.name> Everyone, This is the second and final formal announcement in other forums on the launch of "The Digital Text Community" (DTC), a public mailing list (run on YahooGroups) devoted to serious discussion of digitizing "ink-on-paper" publications. The full group description is found at the group's "home page" at: http://groups.yahoo.com/group/digital-text/ The group was launched a month ago, and already has over 170 subscribers, including notables from a number of text digitization projects. (The subscriber list includes, of course, a few people involved with Project Gutenberg and Distributed Proofreaders.) Discussion is beginning to become quite active. Anyone interested in any aspect of the digitization of texts is invited to subscribe and participate. I will be happy to manually subscribe those who don't want to go through the process of getting Yahoo accounts to subscribe -- just let me know in private email and I'll add you to the list, no muss, no fuss!
Note that posted messages to the group are lightly moderated, intended only to remove spam, off-topic messages, and messages whose tone or substance violates the group's "Prime Directive" of cordiality and respect towards others (refer to the group description for further information on the moderation policy.) ***** (Further info taken from the first announcement) The primary reason why we started DTC is that there is, surprisingly, no independent, cross-project forum to discuss the various technical and non-technical issues of digitizing "ink-on-paper" publications. Current discussion on digitizing paper publications is scattered across various nooks and crannies of the Internet. For example, there are forums for particular digitization projects such as those run by Project Gutenberg (e.g. "gutvol-d") and Distributed Proofreaders (they maintain a web-based forum.) And then there are forums which touch upon various issues of text digitization, though it is not their main focus. Examples include The eBook Community (TeBC) and Book Futures (BF; note that I am a moderator for both of these Yahoo-based groups.) The summary purpose of DTC is given in the last paragraph of the DTC group description: "This group is not affiliated with any particular project or organization, but rather is independent. It is hoped this group will be a bridge between the various text digitization projects, enabling information exchange for everyone's benefit." Do consider subscribing to DTC. If you need any help with subscribing to the group, let me know. Look forward to seeing you there! Jon Noring The Digital Text Community Administrator From lee at novomail.net Fri Dec 14 13:12:53 2007 From: lee at novomail.net (Lee Passey) Date: Fri, 14 Dec 2007 14:12:53 -0700 Subject: [gutvol-d] Normalizing XML and text files.
Message-ID: <4762F1D5.4000705@novomail.net> A couple of weeks ago I suggested that normalizing different files is the required first step in any automated attempt to correct them. On the off chance that anyone is interested, I have now written a program which will normalize XML and non-XML ASCII files without losing critical markup. I now have it working to a point where, although it is not complete or error-free, I feel comfortable sharing it with others. If anyone is interested in obtaining a copy of the source code and a Windows executable, contact me back-channel and I'll send you a copy. To compile and link, the program requires the domcapi source (http://sourceforge.net/projects/domcapi/) and the expat source (http://sourceforge.net/projects/expat/). From Bowerbird at aol.com Fri Dec 14 16:47:30 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 14 Dec 2007 19:47:30 EST Subject: [gutvol-d] the word for the weekend is "widget" Message-ID: there are a lot of funny ideas floating around, about systems of proofing, and tools to do it... some are funny "ha-ha", some are funny "stupid", and some are funny "ha-ha, that's _really_ stupid". but no matter... because the word for the weekend is "widget". that's right, a widget, which grabs the page-scan for a page, and the o.c.r. text for that page, and presents them to users, and lets them fix errors in the text _or_ confirm it as correct; the widget then scoots the results back up to your website... a simple widget. that's all your proofer ever needs to see... you know what they say -- "a page a day". but i betcha can't eat just one... w-i-d-g-e-t. -bowerbird
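Lee's own program, which per his message builds against domcapi and expat, is not reproduced here. As a rough illustration of what "normalizing" two plain-text versions before comparing them can involve, here is a minimal Python sketch. The character mapping, the rule that rejoins words hyphenated across line breaks, and the name `normalize_text` are all assumptions made for illustration. The sketch ignores XML markup entirely, whereas Lee's tool is said to preserve it.

```python
import re
import unicodedata

# Hypothetical mapping of typographic characters to plain-ASCII stand-ins.
TYPOGRAPHIC = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
}


def normalize_text(text: str) -> str:
    """Reduce a plain-text version of a book to a canonical form for comparison."""
    # Compatibility normalization: ligatures such as U+FB01 become "fi",
    # and a no-break space becomes an ordinary space.
    text = unicodedata.normalize("NFKC", text)
    for fancy, plain in TYPOGRAPHIC.items():
        text = text.replace(fancy, plain)
    # Rejoin words split across a line break ("implemen-\ntation").
    # Caveat: this also joins genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse every run of whitespace, newlines included, to one space.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    scan_a = "\u201cThe \ufb01rst implemen-\ntation,\u201d  he said."
    scan_b = '"The first implementation," he said.'
    print(normalize_text(scan_a) == normalize_text(scan_b))  # True
```

Once two files are normalized this way, a word-by-word diff flags only substantive differences rather than quote styles or line breaks. Keeping markup intact, as Lee describes, would need a real XML parser rather than regular expressions.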
From richfield at telkomsa.net Fri Dec 14 00:12:18 2007 From: richfield at telkomsa.net (Jon Richfield) Date: Fri, 14 Dec 2007 10:12:18 +0200 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? Message-ID: <47623AE2.1010501@telkomsa.net> I have several books that to my mind are well worth preserving, but still in copyright and unlikely to be conserved by less eccentric spirits (not in time, anyway! Have you noticed the totally unnecessarily transient nature of many books nowadays?) Some paperbacks that are still well in copyright, are almost unscannable already. Now, I realise that there are good reasons for observing copyright rules, but on the also reasonable assumption that as amorphous a structure as PG should survive the time of many of us, what is the view on accumulating books or other publications that only emerge from protection in a decade or two? At present, should one submit a candidate that dates from too close to 1922, be it never so out of print, all you get is "Not OK". Then all one can do is to sit on the product for years or till one loses one's own collection or marbles. (I tried to contact PG Aus some months ago, but got no response at all. I might try again soon. Does anyone have a sure-fire address for getting their attention?) However, it seems to me that if we established a repository, a sort of PG Purgatory, or at least PG Limbo, in which properly prepared books could rest for a few years or decades, hibernating till copyright lapsed, this would be harmless at worst. If a list of dormant titles were available, that would enable persons in a position to waive copyright, to contact PG to authorise expedited eclosion. Given the current price of bulk data storage, it sounds doable to me. How about "Mr Belloc Objects" By Wells, eg? (1926) How about "Comic and curious verse" by J.M. Cohen 1952?
Limbo's full of such, many of them sliding towards oblivion, and sliding much faster than old copies of Punch or Scientific American. Have I missed anything in the faqs, that deals with this point? Cheers, Jon From Bowerbird at aol.com Sat Dec 15 00:12:20 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 15 Dec 2007 03:12:20 EST Subject: [gutvol-d] Why wait till we have to work from bookworm frass? Message-ID: the supreme court _did_ rule that "time-shifting" _is_ legal, after all... ;+) -bowerbird From johnson.leonard at gmail.com Sat Dec 15 08:25:34 2007 From: johnson.leonard at gmail.com (Leonard Johnson) Date: Sat, 15 Dec 2007 11:25:34 -0500 Subject: [gutvol-d] Publishing founding father papers held up Message-ID: <748ba8e50712150825o67e29e6dr6cef0828ed8c6925@mail.gmail.com> This seems like a project that Project Gutenberg should get involved in. Excerpted from the "Washington Post" article _In the Course of Human Events, Still Unpublished_. http://www.washingtonpost.com/wp-dyn/content/article/2007/12/14/AR2007121402119.html?wpisrc=newsletter "More than 200 years after they were written, huge portions of the papers of America's founding fathers are still decades away from being published, prompting a distinguished group of scholars and federal officials to pressure Congress to speed the process along." "But the Pew-led lobbyists are not satisfied that enough has been accomplished, especially McCullough (Pulitzer Prize-winning author of _John Adams_), who does not believe that a quicker completion would sacrifice quality.
Instead, he blames the slow progress on "the little fiefdoms of each project, which have been working in their own way in their world for over two generations." Note: "Little fiefdoms", I think, refers to various institutions of higher learning that hold the copies of this material. While we, PG, would be unable to annotate the papers, if they were scanned and made publicly available, we could digitize what is there, and a larger number of scholars could be invited to help with the additional annotation necessary to more fully understand the papers in the context in which they were written. Even unannotated, these papers must be very valuable. Anyway, this is an interesting article. Len Johnson -- http://members.cox.net/leaonarddjohnson/ From jeroen.mailinglist at bohol.ph Sat Dec 15 09:14:34 2007 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Sat, 15 Dec 2007 18:14:34 +0100 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <47623AE2.1010501@telkomsa.net> References: <47623AE2.1010501@telkomsa.net> Message-ID: <47640B7A.60601@bohol.ph> Well, I think Google and a couple of others are scanning entire libraries wholesale, and I guess they too will sit on their non-PD scans until freedom comes. Others can do similar things.... On the other hand, we still have so many things in the PD to do, that even with a concentrated effort, when finally the next year of stuff becomes available in the US in 2018 or so, we probably will not have eaten through it all.... In my opinion, PG is less about preservation, but more about accessibility (although to keep works a part of living culture also requires accessibility) Jeroen. Jon Richfield wrote: > I have several books that to my mind are well worth preserving, but > still in copyright and unlikely to be conserved by less eccentric > spirits (not in time, anyway! Have you noticed the totally unnecessarily > transient nature of many books nowadays?)
Some paperbacks that are > still well in copyright, are almost unscannable already. > > Now, I realise that there are good reasons for observing copyright > rules, but on the also reasonable assumption that as amorphous a > structure as PG should survive the time of many of us, what is the view > on accumulating books or other publications that only emerge from > protection in a decade or two? At present, should one submit a > candidate that dates from too close to 1922, be it never so out of > print, all you get is "Not OK". Then all one can do is to sit on the > product for years or till one loses one's own collection or marbles. (I > tried to contact PG Aus some months ago, but got no response at all. I > might try again soon. Does anyone have a sure-fire address for getting > their attention?) > > However, it seems to me that if we established a repository, a sort of > PG Purgatory, or at least PG Limbo, in which properly prepared books > could rest for a few years or decades, hibernating till copyright > lapsed, this would be harmless at worst. If a list of dormant titles > were available, that would enable persons in a position to waive > copyright, to contact PG to authorise expedited eclosion. Given the > current price of bulk data storage, it sounds doable to me. How about > "Mr Belloc Objects" By Wells, eg? (1926) How about "Comic and curious > verse" by J.M. Cohen 1952? Limbo's full of such, many of them sliding > towards oblivion, and sliding much faster than old copies of Punch or > Scientific American. > > Have I missed anything in the faqs, that deals with this point? 
> > Cheers, > > Jon From Bowerbird at aol.com Sat Dec 15 12:34:17 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 15 Dec 2007 15:34:17 EST Subject: [gutvol-d] Publishing founding father papers held up Message-ID: len said: > Anyway, this is an interesting article. sure is. when the library of congress accuses you of being slow, that means you're being _very_ slow... this is just a cozy cash cow for a small group of scholars... and as such, yes, it'd be great fun to rip it from their hands. unfortunately, libraries all over the place are starting to view their unique volumes as something they can withhold for cash in the coming digital cyberlibrary, an attitude that is only gonna bankrupt all of them in the long run, and shortchange our access to the cultural heritage that we _thought_ they were saving for us... i'm convinced the only way we're gonna shake some sense into them is to declare eminent domain on the institutions that are now holding our cultural heritage for ransom, starting with the academic journals... -bowerbird From Bowerbird at aol.com Sat Dec 15 12:38:03 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 15 Dec 2007 15:38:03 EST Subject: [gutvol-d] Why wait till we have to work from bookworm frass? Message-ID: jeroen said: > PG is less about preservation, but more about accessibility that's a false dichotomy we should remove from our thinking.
-bowerbird From jon at noring.name Sat Dec 15 12:41:41 2007 From: jon at noring.name (Jon Noring) Date: Sat, 15 Dec 2007 13:41:41 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: References: Message-ID: <01165190.20071215134141@noring.name> Bowerbird wrote: > jeroen said: >> PG is less about preservation, but more about accessibility > that's a false dichotomy we should remove from our thinking. Agreed. Jon Noring From Bowerbird at aol.com Sat Dec 15 13:29:30 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 15 Dec 2007 16:29:30 EST Subject: [gutvol-d] two overarching thoughts on a roundless system of proofing Message-ID: i'll have a lot more later -- it's written already, but i think i will wait until monday to send it to this list -- but here are two overarching thoughts about implementing a _roundless_ system of proofing... (in case you're wondering why, this is a topic that is being discussed over on the d.p. forums, presently, and often over the past few years. and it's a shame it never moves past the discussion phase, since the current system -- where _every_page_of_every_book_ is slated to go through a specific number of rounds -- is grossly inefficient, and has led to a huge waste of time and energy, plus endless discussions and a wide array of experiments to overcome its obvious shortcomings. however, the discussion is marred by a bunch of people who simply don't know what they're talking about, and by the fact that no one over there seems to be able to separate the wheat from the chaff...) anyway, here are those two overarching thoughts. 1.
it's unnecessary to "formulate some kind of metric" to inform you when a specific page can be considered "finished". it is _done_ when a certain number of people -- say 2 to 4 -- can't find any errors in it. at that point, even if there _are_ still errors in it, it has simply become unproductive to schedule yet _another_ set of eyes to look for them... but, for the vast majority of pages, there just won't be any errors left. you don't have to believe me. just try it -- as the simplest thing that _might_ work -- and you will happily discover it does indeed work... 2. it's unnecessary to "formulate some kind of metric" to inform you about the proofing skills of each volunteer. it's easy enough to use the obvious measures to determine a score, but it's unnecessary to _use_ that score in order to assign pages to the proofer, since the measure of whether a page is "finished" or not is impervious to the skill levels of the proofers. if 2-4 "average" proofers find no errors left on a page, then the odds are that a "great" proofer won't either. and -- once again -- you don't have to believe me that this is true; try it -- as the simplest thing that _might_ work -- and find it does... in other words, don't make it more complicated that it has to be... -bowerbird From julio.reis at tintazul.com.pt Sun Dec 16 07:36:56 2007 From: julio.reis at tintazul.com.pt (Júlio Reis) Date: Sun, 16 Dec 2007 15:36:56 +0000 Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: References: Message-ID: <47654618.4050900@tintazul.com.pt> > Now, I realise that there are good reasons for observing copyright > rules, but on the also reasonable assumption that as amorphous a > structure as PG should survive the time of many of us, what is the > view on accumulating books or other publications that only emerge from > protection in a decade or two? Well, PG should survive the time of many of us... really? Only if they stick to the laws. If PG starts saving copies of copyrighted books in 2008, then I'm afraid it might not see the light of 2009. Scanning a book is copying it. Proof? If you destroy the paper book, you can still read the scans. Therefore scanning is making a copy, and therefore not legal. Supposing it were legal to scan a book and keep the copy for personal use, then perhaps Gutenberg could scan its own paper books and not show them to anyone else; when the time came they could release them in the rounds. Not before, as I think making the scans available in the rounds would effectively allow anyone who bothers to register at PGDP to read the scans. Supposing PG or PGDP could keep copies of books waiting for their copyright to expire. What would you do if the copyright term were extended in the USA? Frustrating. So in these terms what can be done is to campaign for more public domain and less profiteering over works which should be freely given over to Humanity. I even find the concept that an author should be able to live his whole life off the earnings of one book very strange; let alone that his grandchildren, 69 years after his death, would still be living off it. People who write books -- can't they have 50 years of rights from first publication, and save money for old age should they live beyond that period, like the rest of us? Wouldn't it be nice to pay homage to an 80-year-old writer who wrote a great book 50 years ago, a book which is passing to the whole of Humanity right now? Wouldn't she enjoy it?
Seeing a spurt of publication and of reading / use of her work? But I'm straying from the topic. More food for thought: if you don't draw the line at legality, where do you draw the line? I liked the movie /Eragon/ -- shouldn't we scan Christopher Paolini's book now while we can? And what about some of the nice books which are being sold this month? I say we have our plates full when it comes to releasing works in the public domain. And getting back to lobbying for shorter copyright terms -- I see my part in it as being a contributor to PG, to make it more and more relevant. I urge you to do the same. So we can say in a few years "200 thousand books says people CARE for public domain!" And the Electronic Frontier Foundation et al. will do the legal and campaigning stuff. > I tried to contact PG Aus some months ago, but got no response at > all. I might try again soon. Does anyone have a sure-fire address > for getting their attention? That's the way forward, I think. Publish where it's legal to. If it's not legal, then don't publish. My opinion. Júlio aka Tintazul. From Bowerbird at aol.com Sun Dec 16 10:52:49 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 16 Dec 2007 13:52:49 EST Subject: [gutvol-d] more complicated that Message-ID: i said: > in other words, don't make it more complicated that it has to be... ha-ha! :+) note to self: in your text clean-up programs, insert a check for "more complicated that" as an error for "more complicated than". *** along with the not/now confusion -- the most pernicious of all, since it changes the meaning, and is often hard to diagnose -- and he/be (which always gives o.c.r. apps the heebie-jeebies), than/that is a mix-up where both options are high-frequency... so, in general, it might be good to program some checks for it...
(although i don't remember how often it shows up empirically...) *** but let's play with google to check the specifics of _my_ goof... > 0,081,200 = "more complicated that" > 3,950,000 = "more complicated than" > 4,031,200 = total cases > 0,031,200 = estimated cases where "that" was correct > 4,000,000 = cases where "that" was incorrect for "than" so, roughly: > 80,000/4,000,000 > 8,000/400,000 > 800/40,000 > 80/4,000 > 8/400 > 2/100 = 2% *** i looked at roughly 100 hits, to see how often they were indeed errors... (and the vast majority of them were.) first, it happens to the best of us, like jon udell, writing for o'reilly: > Capturing a screencast needn't be much more complicated that > capturing static screenshots. and second of all, there's this page: > OK, the real world is more complicated that > what is envisioned in the Heckscher-Ohlin model, > but it is also more complicated that > what is envisioned in the Ricardian model as well. where the error appears twice within a single sentence... or this interesting variant: > You're more complicated that that. which also appears here: > It can get more complicated that that, so when you're ready, you can and, altogether, represents 20% (16,300) of the total 81,200 cases... the phrase _can_ and _does_ appear in a correct form, of course: > It is a little more complicated that way (you have to back track). or as in this particularly compelling quote: > This is, effectively, a slam on Microsoft for making everything so much > more complicated that with thousands of times more capability it still > takes the same amount of time to do things. Moore's law marches on, > but the amount of time you spend waiting on your computer remains static. but in almost all of the _correct_ cases that google located, it was because the phrase contained some punctuation between "complicated" and "that": > other means would have been more complicated.
That is what he is saying (google's blindness to punctuation often makes it inappropriate for this task.) and, in one case out of the 100 i looked at, the "correct" form was vague: > Many problems that occur with lawn mowers are > more complicated that fixing it yourself may be > too much trouble. (in this passage, "more" should probably have been "so", but so be it.) *** curiously, > "less complicated that that" = 578 (roughly 5% of the total "that" cases) > "less complicated that" = 10,300 > "less complicated than" = 129,000 it seems to be less prevalent that things are "less complicated than" -- as opposed to "more" -- but it's also the case that, when they are, it's less likely that the phrase will be used correctly. the numbers say the glitch occurs in 10,000 out of 130,000 cases, a whopping 7.7%... *** it's also interesting to look at "complex", as opposed to "complicated": > 0,186,000 = "more complex that" > 4,050,000 = "more complex than" > 4,236,000 = total cases > 0,236,000 = estimated cases where "that" was correct > 4,000,000 = cases where "that" was incorrect for "than" so, roughly: > 186,000/4,000,000 > 186/4,000 > 200/4000 > 100/2000 > 10/200 > 1/20 = 5% so "complex" looks to be twice as error-prone as "complicated"... even though things _generally_ seem "less complex" more rarely, glitch-wise it doesn't seem to matter whether it's "more" or "less"... > "less complex that that" = 1,400 (about 12% of the overall total) > "less complex that" = 11,400 > "less complex than" = 190,000 so, roughly: > 10,000/200,000 > 10/200 > 1/20 or 5%, the same as "more complex that" *** it sure is fun to play with words, isn't it? :+) -bowerbird ************************************** See AOL's top rated recipes (http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004) -------------- next part -------------- An HTML attachment was scrubbed...
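The note-to-self in this message (a clean-up check for "more complicated that") and the rate arithmetic can be sketched in a few lines of Python. This is a hypothetical helper, not one of bowerbird's actual programs. The regex only matches when whitespace alone separates the adjective from "that", so, unlike google's punctuation-blind phrase search, "more complicated. That is..." is not flagged. Every hit is a candidate for a human to check, since correct uses such as "more complicated that way" still match. error_rate() treats every "that" hit as wrong, so it gives an upper bound.

```python
import re

# Hypothetical clean-up check: "more/less complicated/complex that" where only
# whitespace separates the adjective from "that". Punctuation breaks the match,
# so "more complicated. That is..." is not flagged.
SUSPECT = re.compile(r"\b(?:more|less)\s+(?:complicated|complex)\s+that\b",
                     re.IGNORECASE)

def flag_than_that(text):
    """Return (offset, phrase) pairs for a human to review; not every hit is an error."""
    return [(m.start(), m.group(0)) for m in SUSPECT.finditer(text)]

def error_rate(wrong, right):
    """Share of all occurrences that use the wrong word."""
    return wrong / (wrong + right)

print(flag_than_that("You're more complicated that that."))
# [(7, 'more complicated that')]
print(round(error_rate(81_200, 3_950_000), 3))  # 0.02
```

With the google counts quoted above, error_rate(81_200, 3_950_000) is about 0.020, matching the 2% estimate, while error_rate(186_000, 4_050_000) comes out near 0.044, which the message rounds up to 5%.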
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071216/f2c4c438/attachment-0001.htm From lee at novomail.net Sun Dec 16 11:17:24 2007 From: lee at novomail.net (Lee Passey) Date: Sun, 16 Dec 2007 12:17:24 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <01165190.20071215134141@noring.name> References: <01165190.20071215134141@noring.name> Message-ID: <476579C4.9000200@novomail.net> Jon Noring wrote: > Bowerbird wrote: >> jeroen said: > >>> PG is less about preservation, but more about accessibility > >> that's a false dichotomy we should remove from our thinking. > > Agreed. Disagreed. The distinction between preservation and accessibility (in the sense of making a work available and usable) is very real, and has a serious impact on processes. Both Google and the Open Content Alliance are engaged in efforts to "digitize" books, many of which are in the public domain. If I understand correctly, these efforts consist of making scan sets of the books, then doing an uncorrected OCR stage which references points in the image files. This makes it possible to search for "text" to the extent that the OCR is correct, but all you get back is an image of the page. In other words, the Open Content Alliance is about as useful as microfilm. Preservation: 10 - Accessibility: 2. At the other end of the spectrum we have Project Gutenberg, which records nothing about the provenance of a work, strips public domain works of all but the alphabetic text, silently corrects "obvious" typographical errors, and occasionally creates hybrid works by combining various editions without comment or justification. Preservation: 2 - Accessibility: 6 (it would get higher marks for accessibility were it not for the strong bias against markup). The obvious differences between these two projects are due to their underlying priorities; the dichotomy between accessibility and preservation is very real, and has profound practical consequences.
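As a toy illustration of the arrangement Lee describes, uncorrected OCR text can be kept only as an index that points back into the page images. The record layout here (word, page-image file name, bounding box) is an assumption made for illustration. It is not how Google or the Open Content Alliance actually store their data. The sketch shows both halves of the point: a search hit gives you a location in a picture rather than usable text, and a word the OCR misread (here "tbe" for "the") cannot be found at all.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OcrWord:
    text: str        # uncorrected OCR output for one word
    page_image: str  # hypothetical file name of the scanned page
    bbox: tuple      # (left, top, right, bottom) in image pixels

def build_index(words):
    """Map each lower-cased OCR token to every place it occurs."""
    index = {}
    for w in words:
        index.setdefault(w.text.lower(), []).append(w)
    return index

def search(index, term):
    """Return (page image, bounding box) hits: a place in a picture, not text."""
    return [(w.page_image, w.bbox) for w in index.get(term.lower(), [])]

words = [
    OcrWord("Preservation", "p001.png", (10, 10, 90, 30)),
    OcrWord("tbe", "p001.png", (100, 10, 130, 30)),  # OCR misread of "the"
    OcrWord("the", "p002.png", (5, 5, 30, 20)),
]
index = build_index(words)
print(search(index, "the"))  # only the correctly recognized occurrence on p002.png
```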
Now this does not mean that I think that the two philosophies are fundamentally irreconcilable. I think that thorough TEI encoding of a text can capture the meaning of the physical artifact almost as well as a high-resolution image. And TEI is not so esoteric that it cannot easily be used as a presentation format using CSS, as a conversion format using XSLT or perl, or for other automated data processing functions which have nothing to do with presentation. However, I do think there may be a point where the goal of "warts and all" is so fundamentally inconsistent with "no warts at all" that some choice between the two may be inevitable. From jon at noring.name Sun Dec 16 11:23:23 2007 From: jon at noring.name (Jon Noring) Date: Sun, 16 Dec 2007 12:23:23 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <476579C4.9000200@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> Message-ID: <796087170.20071216122323@noring.name> Lee wrote: > Jon Noring wrote: >> Bowerbird wrote: >>> jeroen said: >>>> PG is less about preservation, but more about accessibility >>> that's a false dichotomy we should remove from our thinking. >> Agreed. > Disagreed. > > The distinction between preservation and accessibility (in the sense of > making a work available and usable) is very real, and has a serious > impact on processes. Er, ok. I interpreted Bowerbird's comment, maybe incorrectly, as saying that one really can't separate preservation from access. One preserves in order to make something accessible at a future date. And PG is certainly about both. (Whether they do it effectively is another matter, but I'm talking about the intent.) Certainly, digital preservation of content (making something accessible in the distant future) adds requirements to those for making something accessible just for today.
For example, there have to be requirements to ensure that the digital content even makes it to the future and, if it does, that it is accessible to the degree desired. Hopefully that clarifies my "agreed". Jon From jeroen.mailinglist at bohol.ph Sun Dec 16 14:24:08 2007 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Sun, 16 Dec 2007 23:24:08 +0100 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <476579C4.9000200@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> Message-ID: <4765A588.3000003@bohol.ph> Lee, you nicely phrased a line of thinking I follow, and even when I say PG is more about accessibility, I fully agree with your 6... improving metadata, cataloging, and tagging could increase that to a 7, and adding an integrated library/reading system around it may raise it to a 10. Accessibility is important for preservation. For only those works that are part of a vibrant culture are ultimately preserved, as part of our cultural heritage, and not just collecting dust on shelves. Only works that can be freely cited, re-used, and re-purposed will survive. That is why copyright has become the biggest enemy of cultural heritage, as it starves works to death in a prison, only to release them when they've starved and have become disconnected from cultural life after having been forgotten for three or four generations. You can't preserve what you do not love, and you do not love what you do not know. I have been a "believer" in TEI for over 10 years now... For me it works, and helps me to produce ebooks in large numbers, without having to dive into HTML trouble all of the time. Jeroen. Lee Passey wrote: > Jon Noring wrote: > >> Bowerbird wrote: >> >>> jeroen said: >>> >>>> PG is less about preservation, but more about accessibility >>>> >>> that's a false dichotomy we should remove from our thinking. >>> >> Agreed.
>> > > Disagreed. > > The distinction between preservation and accessibility (in the sense of > making a work available and usable) is very real, and has a serious > impact on processes. > From prosfilaes at gmail.com Sun Dec 16 17:41:58 2007 From: prosfilaes at gmail.com (David Starner) Date: Sun, 16 Dec 2007 20:41:58 -0500 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <476579C4.9000200@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> Message-ID: <6d99d1fd0712161741l2d8f85c0waeb187b8712e80f@mail.gmail.com> On Dec 16, 2007 2:17 PM, Lee Passey wrote: > At the other end of the spectrum we have Project Gutenberg, which > records nothing about the provenance of a work, ... > it would get higher marks for accessibility were it > not for the strong bias against markup). I think Michael Hart has a point that WordStar files may not be readable in a few years... oh, wait, did you mean now? The way you were talking, I thought you were talking about the 20th century. From Bowerbird at aol.com Mon Dec 17 00:23:04 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 17 Dec 2007 03:23:04 EST Subject: [gutvol-d] Why wait till we have to work from bookworm frass? Message-ID: if you do not have _both_ "access" and "preservation", somebody has cheated you along the line somewhere, to the point where you'll have absolutely nothing at all. so beware the mindset that tries to make you _choose_. it is little more than a trick designed to make you _lose_. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071217/84d76b73/attachment.htm From ralf at ark.in-berlin.de Mon Dec 17 02:45:48 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Mon, 17 Dec 2007 11:45:48 +0100 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <476579C4.9000200@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> Message-ID: <20071217104548.GB7788@ark.in-berlin.de> > At the other end of the spectrum we have Project Gutenberg, which > records nothing about the provenance of a work, strips public domain > works of all but the alphabetic text, silently corrects "obvious" > typographical errors, and occasionally creates hybrid works by combining > various editions without comment or justification. Preservation: 2 - > Accessibility: 6 (it would get higher marks for accessibility were it > not for the strong bias against markup). The strong bias of proofreaders and uploaders, you should say, because that's where the stats are made. But! The line between preservation and accessibility will be redrawn when PG's relatively accessible works get bibliographic metadata AND this metadata is referred to by library projects like http://firstsearch.oclc.org or http://www.eromm.org I think that, in addition to scans, these projects will at some time index text versions, too. And PG/pglaf should be prepared, then, to provide metadata of the books of which the texts are made. ralf From lee at novomail.net Mon Dec 17 09:49:29 2007 From: lee at novomail.net (Lee Passey) Date: Mon, 17 Dec 2007 10:49:29 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? 
In-Reply-To: <20071217104548.GB7788@ark.in-berlin.de> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> Message-ID: <4766B6A9.5080400@novomail.net> Ralf Stephan wrote: >> At the other end of the spectrum we have Project Gutenberg, which >> records nothing about the provenance of a work, strips public domain >> works of all but the alphabetic text, silently corrects "obvious" >> typographical errors, and occasionally creates hybrid works by combining >> various editions without comment or justification. Preservation: 2 - >> Accessibility: 6 (it would get higher marks for accessibility were it >> not for the strong bias against markup). > > The strong bias of proofreaders and uploaders, you should say, > because that's where the stats are made. Nice spin! I see a bright future for you as a commentator for Fox News. I am under the impression from your comments that you view enhanced, accurate metadata, including accurate historical data, as A Good Thing. It is not quite as clear, but it seems that you also view the increased use of markup as A Good Thing. I find it slightly offensive that you would attempt to blame the PG volunteers for the inadequacies of the PG corpus. I believe that if proofreaders and uploaders have been contributing inadequate work product which has become part of the PG corpus, it is either because they have been encouraged to do things that way by the PG FAQ and culture, or because they have not been educated as to best known practices for digitization of printed works. As I see it, the ultimate cause of the current state of the PG corpus is either bad leadership or the failure of leadership. But, wherever you choose to assign blame, that does nothing to change the state of the corpus. 
If PG gets a rating of 6 out of 10 for accessibility, the practices that have caused it to get only 6 out of 10 are pretty much irrelevant, unless there is some commitment to change those practices--and I don't see /that/ happening any time soon. > But! The line between preservation and accessibility will be > redrawn when PG's relatively accessible works get bibliographic > metadata AND this metadata is referred to by library projects > like http://firstsearch.oclc.org or http://www.eromm.org > I think that, in addition to scans, these projects will at some > time index text versions, too. And PG/pglaf should be prepared, then, > to provide metadata of the books of which the texts are made. In my earlier message I tried to make the distinction between the ability to obtain a particular work and the ability to make use of a particular work once it had been obtained. Both of these components make up the overall principle of accessibility. If the ability to find and download a work were the only consideration, Google and OCA would both get a higher score than Project Gutenberg. But having downloaded a book from Google, one is pretty much limited to having a photo album of pictures of pages. Given the need for a relatively powerful computer with a reasonably large display to see these pictures, these Google photo albums are actually less useful than a paper book. Easier to get, perhaps, but less useful once obtained. Project Gutenberg's e-texts, on the other hand, can be viewed on just about every piece of equipment ever manufactured, including S-100 bus machines running CP/M attached to VT-52 dumb terminals. The pure ASCII representation (well, not so pure now that LATIN-1 encoding is permitted) can easily be repurposed for other applications not envisioned by the original creators, such as defeating spam filters. (I don't see this as a bad thing; rather, it is a real-life example of the usefulness of flexible, repurposable text.)
This commitment to the least common denominator, however, has also led to documents which are not pleasing to read on modern equipment, and which, in many ways, prevent other useful repurposing, such as the automated creation of catalogs. Because these documents are less useful than they easily could be, they are also less accessible than they could be. Even if PG could accurately recapture much of the metadata which it has lost, and even if it could become better referenced by certain library projects, that would not do much to increase the usability or repurposability of the existing e-texts; there would be improvements to their accessibility, but they would not be large-scale improvements. -- From piggy at netronome.com Mon Dec 17 11:47:12 2007 From: piggy at netronome.com (La Monte H.P. Yarroll) Date: Mon, 17 Dec 2007 14:47:12 -0500 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <47623AE2.1010501@telkomsa.net> References: <47623AE2.1010501@telkomsa.net> Message-ID: <4766D240.8000801@netronome.com> Jon Richfield wrote: > I have several books that to my mind are well worth preserving, but > still in copyright and unlikely to be conserved by less eccentric > spirits (not in time, anyway! Have you noticed the totally unnecessarily > transient nature of many books nowadays?) Some paperbacks that are > still well in copyright, are almost unscannable already. > Scan them now and keep them on your own machine. You are unlikely to find yourself prosecuted over it. Depending on how rational the laws in your country are, it might even be legal. I've had moderate success looking up the authors of such books and asking them either to make their book available under a Creative Commons license or to grant a license especially for PG. In principle it is possible to get books added to PG under these conditions. I'm still waiting on action from PGLAF for my most recent such effort.
I included a scan of the letter from the author granting a CC license. The clearance inspector told me that we needed a letter from the author. I pointed out that there WAS a letter from the author in the clearance request. I don't know what the problem is at this point. From hart at pglaf.org Mon Dec 17 12:00:47 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 17 Dec 2007 12:00:47 -0800 (PST) Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <4766B6A9.5080400@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> Message-ID: Nobody forces anybody to do eBooks any particular way at PG. If you would like to write your own FAQs about how you think eBooks should be done, please do so, and we will try to find as many volunteers for your methodology as possible, and 90% of all our volunteers just might go that way, who knows? We have found that the best way to gain such volunteers is a set of examples made to your own specifications and posted; then you can just point to them and ask people whether they don't like this kind better than what else is available. We'd be only too glad to post your FAQ in various locations, including the Newsletters, etc. Thanks!!! Michael S. Hart Founder Project Gutenberg Recommended Books: Dandelion Wine, by Ray Bradbury: For The Right Brain Atlas Shrugged, by Ayn Rand: For The Left Brain [or both] Diamond Age, by Neal Stephenson: To Understand The Internet The Phantom Tollbooth, by Norton Juster: Lesson of Life. . . From gbnewby at pglaf.org Mon Dec 17 13:51:36 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 17 Dec 2007 13:51:36 -0800 Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <4766D240.8000801@netronome.com> References: <47623AE2.1010501@telkomsa.net> <4766D240.8000801@netronome.com> Message-ID: <20071217215136.GA9598@mail.pglaf.org> > Jon Richfield wrote: > > I have several books that to my mind are well worth preserving, but > > still in copyright and unlikely to be conserved by less eccentric > > spirits (not in time, anyway! Have you noticed the totally unnecessarily > > transient nature of many books nowadays?) Some paperbacks that are > > still well in copyright, are almost unscannable already. Project Gutenberg regularly receives such items (sometimes in the hopes that they'll be judged as public domain in the US, under our copyright clearance procedures). As a library (under the US's tax ruling), we are legally able to archive such items indefinitely, but not redistribute them. In short, you can send such items to me (or to Michael Hart) and we'll do our best to hold them until they become public domain in the US. -- Greg From Bowerbird at aol.com Mon Dec 17 15:08:54 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 17 Dec 2007 18:08:54 EST Subject: [gutvol-d] a rare breath of fresh air Message-ID: one of my most important messages to the people over at d.p. was that preprocessing their text would save _lots_ of proofer time. they didn't want to hear it, but i nonetheless told them over and over... so it's nice when the topic re-emerges, and a new preprocessing tool, no matter how simple, is truly a welcome and rare breath of fresh air: > http://www.pgdp.net/phpBB2/viewtopic.php?t=30903&start=0 -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071217/ea666e99/attachment.htm From Bowerbird at aol.com Tue Dec 18 02:56:16 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 18 Dec 2007 05:56:16 EST Subject: [gutvol-d] stop it Message-ID: stop taking bits and pieces of my messages here and posting them to other lists. stop it right now. you know who you are, and so do i, so stop it now. i've stopped talking to you here because "dialog" with you is worthless. so stop exporting my posts. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/db93a585/attachment.htm From bzg at altern.org Tue Dec 18 03:33:09 2007 From: bzg at altern.org (Bastien) Date: Tue, 18 Dec 2007 12:33:09 +0100 Subject: [gutvol-d] stop it In-Reply-To: (Bowerbird@aol.com's message of "Tue, 18 Dec 2007 05:56:16 EST") References: Message-ID: <874pegb5l6.fsf@bzg.ath.cx> Bowerbird at aol.com writes: > stop taking bits and pieces of my messages here > and posting them to other lists. stop it right now. > you know who you are, and so do i, so stop it now. So why don't you just send him a message? Please don't use the list to put pressure on people that you're not directly naming; it looks suspicious and it's worth nothing. -- Bastien From Bowerbird at aol.com Tue Dec 18 08:10:10 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 18 Dec 2007 11:10:10 EST Subject: [gutvol-d] stop it Message-ID: it's jon noring and lee passey. and -- as i said -- they know who they are. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/aa68fc04/attachment.htm From Bowerbird at aol.com Tue Dec 18 10:12:51 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 18 Dec 2007 13:12:51 EST Subject: [gutvol-d] quick note on "roundless proofing" Message-ID: i'll be posting a longish message later today summarizing my approach to a methodology of "roundless proofing". although i will not bother to accommodate the whole of the baggage over at distributed proofreaders -- because, frankly, most of it is unnecessary -- my post will nonetheless be enlightening to anyone from there whose head is not stuck in the sand. of course, this is just one more step along a path where i've been creating all the tools to put together a system that's integrated across the workflow. one of the common excuses given over at d.p. for not moving forward on a roundless system is that their developers are overloaded. that might be, but it also indicates they don't have a clue how simple this programming is. at $100/hour, which they've acknowledged as the going rate, my estimate is that i could code a working demo in approximately 50 hours (roughly $5,000), meaning that the expense would be within the range of reason. of course, to anyone who knows our happy little history, the notion that d.p. would pay me to design a system for them is cause for chuckles. nonetheless, the offer is open... :+) -bowerbird p.s. here's another way for d.p. to proceed on this, offered _free_of_charge:_ put every page of the book into a wiki, and turn your volunteers loose on it... -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/ad2a48ba/attachment.htm From Bowerbird at aol.com Tue Dec 18 10:16:31 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 18 Dec 2007 13:16:31 EST Subject: [gutvol-d] thoughts on a wiki Message-ID: i said: > p.s. here's another way for d.p. to proceed on this, offered _free_of_charge:_ > put every page of the book into a wiki, and turn your volunteers loose on it... oh yeah, one of the things on my list for 2008 is to wikify the entire p.g. library. let's invite the general public in and see if they can help clean up those e-texts... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/0ec7c57e/attachment.htm From Bowerbird at aol.com Tue Dec 18 15:27:22 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 18 Dec 2007 18:27:22 EST Subject: [gutvol-d] stop it Message-ID: uh, lee, don't bother posting _here_ to my threads either... i don't read your posts; i do not care what you have to say. i do not want to have dialog with you, none, on any topic... i don't want you taking my fragments from here elsewhere; that's all that needs to be said in this thread, and i've said it. rewrite your posts for other listservs, and leave me out of it. you and noring just confuse the issues. so leave me out of it. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/34c58d87/attachment.htm From Bowerbird at aol.com Tue Dec 18 16:58:47 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 18 Dec 2007 19:58:47 EST Subject: [gutvol-d] my thoughts on a roundless system of proofing Message-ID: here are those notes on a "roundless" system of proofreading... i've done a lot of research that indicates to me unequivocally that the "old" style of proofreading -- verifying each word in each line against a scan -- will come to be seen as _hopelessly_inefficient_, so keep that in mind as you read this. but i'll proceed nonetheless. oh, and i don't care to hear how this and/or that will or will not fit into the d.p. system, since i don't care about the d.p. system. this is _my_ system, which you might not care about either... :+) *** i'm just going to talk about this from the standpoint of the proofer. there _is_ back-end work involved (e.g., keeping track of the files), and a lot of front-end peripheral stuff as well, but i'll ignore all that. we'll just look at it in the same way that the proofers will look at it. there are several "selection criteria" that each proofer would specify, including what they would proof -- which languages, genres, etc. -- but let's also assume that we've surmounted all of those obstacles... the final selection criterion you would have to specify is whether you want to proof pages that are "raw", "better", or "done". from this point on, you can read this more as a list of instructions, and a description for the proofer of how the system works as a whole. *** you'll be presented with a text-field, and the scan-image of the page. fix all the errors you can find, then say if the page is "better" or it's "done". if -- after the last change is made on a page -- 2-4 proofers in a row _confirm_ the page as being "done", then it is considered "finished".
but any _change_ automatically resets the confirmation counter to zero. (the question as to whether the pages need 2 or 3 or 4 _confirmations_ is one that will need to be subjected to some real-world empirical testing, balancing the benefits against the costs of doing the extra confirmations. we'd assume, of course, that more confirmations mean greater accuracy, but 4 confirmations is _twice_ as costly as 2 confirmations, so we need to gauge whether the extra accuracy is worth the cost of extra confirmations. we'd start out requiring 4 confirmations, and measure the number of errors that are discovered after 2 and 3 confirmations, and their _nature_. after some time, we should be able to rationally assess the cost-benefit ratio. i have a guess about what's optimal, but there's no need for guesswork. note that it's also the case that people who want "higher-quality books" can choose to work at the 3- and 4-confirmation levels to ensure that, whereas people who think the 2-confirmation level is fine can do _that_. and the people who like to tackle raw o.c.r. can do that totally guilt-free.) ok, back to the instructions now... you will be informed of all pages that are changed after you proofed them. you can _dispute_ a change that's made after you, or _confirm_ that it's correct. any pages in dispute can be specifically requested by any other proofer, so disputes will be resolved quickly, and proofers can come to full agreement. (after all, the point of disputes is to answer the question of "best practice".) if you've marked a page as "done", and someone later makes a change to it, and that correction is valid (i.e., it is later confirmed by 2-4 other proofers), your _accuracy_score_ goes down. take pride in your accuracy score, folks! if you have marked a page as "done", and it goes on to become "finished" too, that will raise your _accuracy_score_. again, take pride in your accuracy score!
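The confirmation rule described above (any change resets the counter, and a page is "finished" after 2-4 "done" verdicts in a row) amounts to a small state machine. Here is a minimal Python sketch of it, assuming one object per page. The Page class and its field names are hypothetical. Treating a pass that is marked only "better" as breaking the run is an assumption drawn from "in a row", not something the message spells out. Disputes and notifications are left out.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    text: str
    required: int = 4   # 2, 3 or 4; start at 4 and tune from measured errors
    streak: int = 0     # consecutive "done" verdicts since the last change
    history: list = field(default_factory=list)  # wiki-style (proofer, old_text) revisions

    @property
    def finished(self) -> bool:
        return self.streak >= self.required

    def submit(self, proofer: str, new_text: str, done: bool) -> None:
        """Record one proofer's pass over the page."""
        if new_text != self.text:
            self.history.append((proofer, self.text))  # save the previous revision
            self.text = new_text
            self.streak = 0                             # any change resets the counter
        # A "done" verdict extends the run. Assumption: a pass marked only
        # "better" breaks it, since confirmations must come "in a row".
        self.streak = self.streak + 1 if done else 0

page = Page("tbe cat", required=2)
page.submit("ann", "the cat", done=True)  # fixes a scanno and declares "done"
page.submit("bob", "the cat", done=True)  # confirms without changes
print(page.finished)  # True
```

Lowering the requirement from 4 to 2 after measuring is a change to one field, and the history list is what would let proofers step through earlier revisions or be told about later changes to pages they proofed.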
at the same time, if you never declare a page as "done", any remaining errors will _not_ be held against you. so if you'd rather be safe than sorry, _be_safe_. confirmations _raise_ your accuracy score, as long as they are _valid_ ones... inappropriate confirmations, on the other hand, _lower_ your accuracy score. the specific point-values for all the relevant actions haven't been decided, but correcting the last error on a page is the best way to raise your accuracy score! here are some rough approximations, though:
7 points -- making the last change to a page (i.e., none made after you)
6 points -- declaring a page "done" and having it confirmed as "finished"
5 points -- being the 2nd confirmation that a page is actually "finished"
4 points -- being the 3rd confirmation that a page is actually "finished"
3 points -- being the 4th confirmation that a page is actually "finished"
2 points -- making a page better by correcting one or more errors
1 point -- proofing a page, but not finding errors, because it has none
0 points -- not finding any errors, even though there are one or more
-1 points -- declaring a page as "done" when it still contains an error
-2 points -- 2nd confirmation as "done" when the page contains an error
-3 points -- 3rd confirmation as "done" when the page contains an error
-4 points -- 4th confirmation as "done" when the page contains an error
-5 points -- introducing two or more errors onto a page
the main idea here is that a person will get points for sticking their neck out, _providing_ that they were right. but if they're wrong, they get decapitated... your accuracy score is _public_, as is the number of pages you have proofed. (but neither of the variables has an effect on the pages you are given to proof! our initial assumption is that everyone is capable of proofing every single page.
at the same time, however, no one is _forced_ to do any page either, so if _you_ decide any one page is "too difficult" for you at that moment, just don't proof it.) again, that was just a rough pass at the point-values, and nobody should get too obsessed with the points, because they're just there for "bragging rights", and to remind people that the _quality_ is just as meaningful as the _quantity_. you make _your_own_judgment_ on the importance of "quantity" and "quality". it's not necessary that you move a page to "done", just that you make it "better". (but if you _do_ call a page "done", then you had better make sure it really _is_.) differing priorities on the value of "quality" will not adversely affect the output, because every page will keep getting proofed until a consensus emerges on it. even if you do make a goof on a page, it's no big deal, because _every_ change needs to be confirmed by other eyes, so there's no danger of errors persisting. although we call everyone "proofers", a page should _not_ be marked as "done" until it is free of all scanning and printer errors, _and_ properly formatted too! (so, again, if all you want to do is _proof_, then just mark the page as "better". any action improving the page is something you can, and should, be proud of. you don't have to do formatting if you don't want, and you're not "penalized".) formatting is done with z.m.l., so once a page is formatted, it's completely done, and when all of the pages in the book are done, the _book_ is finished as well... (we'll still want one person to finalize it by checking it at the "whole-book level".) a book's pages are structured like a wiki, so every revision to each page is saved. it's possible for any proofer to "step through" the consecutive edits of each page. again, when 2-4 people call a page "done" -- with no changes -- it's "finished". but any time another change is made, the confirmation counter is re-set to zero.
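The rough point values listed in this message can be expressed as a small scoring function. This is a sketch only: the event names are hypothetical, the values are the author's explicitly provisional ones, and reading the endorsement rows as "7 minus position" for a correct endorsement and "minus position" for a wrong one is an observation about that table rather than a rule the author states.

```python
# Point values copied from the rough table; the event names are hypothetical.
POINTS = {
    "last_change": 7,         # made the last change to a page
    "improved": 2,            # corrected one or more errors
    "clean_pass": 1,          # proofed an error-free page and found nothing
    "missed_errors": 0,       # found nothing, although errors remained
    "introduced_errors": -5,  # introduced two or more errors
}


def endorsement_points(position: int, correct: bool) -> int:
    """Points for endorsing a page as "done".

    Position 1 is the proofer who declared the page "done"; positions 2-4
    are the later confirmations. Correct endorsements earn 6, 5, 4, 3
    points; endorsing a page that still has an error costs 1 to 4 points.
    """
    if not 1 <= position <= 4:
        raise ValueError("position must be between 1 and 4")
    return 7 - position if correct else -position


def accuracy_score(events) -> int:
    """Total a proofer's events: POINTS keys, or (position, correct) tuples."""
    total = 0
    for event in events:
        if isinstance(event, tuple):
            total += endorsement_points(*event)
        else:
            total += POINTS[event]
    return total
```

For instance, a proofer who made one last change (7), was a correct 2nd confirmation (5), wrongly declared a page "done" (-1) and introduced errors elsewhere (-5) would net 6 points.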
when a page is displayed, any questionable aspects about it are _flagged_in_red_. if the page is marked as "done" while it still contains questionable _words_ on it, those words are automatically added to the _vocabulary_ for that book, which is the text-file used to determine which words will be flagged on subsequent pages. this means that the next time that page is displayed, the word will not be flagged. however, proofers are able to constantly view the vocabulary for a specific book, to check if a questionable word has been added to it, and those proofers can then specifically call up the page containing that word, so as to _verify_ that it's correct. proofers can also do a "find" on a keyword/phrase, and proof pages containing it. if a "badword" is corrected, you can search for other instances of "badword", and view them on a book-wide basis, then approve the correction on those other pages. if you find and correct a scanno that appears on 20 different pages, you'll get credit for making every one of those 20 pages "better". but check each of them carefully, because you'll also get docked if one of those changes shouldn't have been made! when you mark a page as "better", it means that it's not "done" yet, and needs either (1) more eyes to look at it, and/or (2) specialized treatment from a certain "expert". the specific type of "expert" that is required can be noted in a series of checkboxes, like "greek", "tables", "index", "ads", and so on. specialists can request such pages... proofers can jump around in the book randomly, or step through it page-by-page. a page will move from "raw" to "finished" over the course of hours, maybe _minutes_. so the viewing of "diffs" will be quite _immediate_, to get everyone on the same page. one point of a roundless system is to move a book from start to finish within _days_. let the books jostle for entrance into the system; but once in, get 'em out right away! 
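The per-book vocabulary described here (flag unknown words in red, add still-flagged words to the vocabulary when a page is marked "done", and let proofers find every page containing a given word) could work roughly as follows. This is a sketch under assumptions the proposal does not state: words are split with a deliberately simple regular expression, comparison is case-insensitive, and all function names are hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z']+")  # simplistic tokenizer; an assumption of this sketch


def questionable_words(page_text: str, vocabulary: set) -> list:
    """Words on the page not yet in the book's vocabulary, in order of first appearance."""
    flagged = []
    for token in WORD.findall(page_text):
        word = token.lower()
        if word not in vocabulary and word not in flagged:
            flagged.append(word)
    return flagged


def mark_done(page_text: str, vocabulary: set) -> list:
    """Marking a page "done" adds its still-flagged words to the vocabulary.

    Returns the newly added words, so other proofers can review them.
    """
    added = questionable_words(page_text, vocabulary)
    vocabulary.update(added)
    return added


def pages_containing(word: str, pages: dict) -> list:
    """Numbers of the pages whose text contains the word, for a book-wide review."""
    target = word.lower()
    return sorted(
        number for number, text in pages.items()
        if target in (token.lower() for token in WORD.findall(text))
    )
```

Returning the added words from `mark_done` mirrors the idea that proofers can watch the vocabulary grow and then call up the page that introduced a suspicious entry.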
since a book will come to "completion" so quickly, proofers interested in their "diffs" should plan on examining them within 24 hours, while their experience is still fresh. as for a "meaningless diff", there's no such thing. there's one-and-only-one _correct_ way to proof and format a page, and our sole object is to move the page to that state... that doesn't mean that any one particular proofer will be _able_ to attain that goal, and there's no shame in being unable to do so. just mark the page as "better", and go on... once all pages in a book are considered "finished", the book moves into the stage of "continuous proofreading", where it essentially sits in the wiki for about 3-6 months, during which time anyone can "smooth-read" it, and make any necessary corrections. (and yes, any later corrections ramify back on your "accuracy score" too, so be sharp!) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/9542842e/attachment.htm From ralf at ark.in-berlin.de Wed Dec 19 01:34:16 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Wed, 19 Dec 2007 10:34:16 +0100 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <4766B6A9.5080400@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> Message-ID: <20071219093416.GA29329@ark.in-berlin.de> > I am under the impression from your comments that you view enhanced, > accurate metadata, including accurate historical data, as A Good Thing. > It is not quite as clear, but it seems that you also view the increased > use of markup as A Good Thing. That is so.
However, catalog.rdf is a start and 'just' needs to be augmented with
1) original publisher,
2) publishing year,
3) publishing place,
4) possible edition and series information,
5) possible editor/translator info, and
6) a working link to the etext page instead of simply 'etext21990'.
In short, metadata about the etext should 'link' to that info which is given to pglaf when a clearance is requested, the metadata of the book. Now, it's possible there is/will be a format tailored for holding both kinds of info (MODS?). OTOH, Dublin Core RDF should be able to do that, too. So, where can pglaf metadata be accessed? > Project Gutenberg's e-texts, on the other hand, can be viewed on just > about every piece of equipment ever manufactured, including S-100 bus > machines running CP/M attached to VT-52 dumb terminals. The pure ASCII > representation (well, not so pure now that LATIN-1 encoding is > permitted) can easily be repurposed for other applications not > envisioned by the original creators, such as defeating spam filters. (I > don't see this as a bad thing, rather it is a real-life example of the > usefulness of flexible, repurposable text). And just *how* repurposable it is you can see from the fact that all those old ASCII-only versions can be potentially transformed into texts that are 'pleasing to read' as you say, by any volunteer who wants to upload an HTML or even TEI version with bibliographic markup of it. You write in another mail: > At the other end of the spectrum we have Project Gutenberg, which records > nothing about the provenance of a work, strips public domain works of all but > the alphabetic text, silently corrects "obvious" typographical errors, and > occasionally creates hybrid works by combining various editions without > comment or justification. I take this as hyperbole, as quite a few people now upload HTML and include metadata in the title page. People using TEI use markup for corrections from which original spelling can be reconstructed.
And the fingers of two hands suffice to count hybrid works. Ah, but now I see, you talk about the backlog! Well, backlog *always* is something to whine about, isn't it? ralf From schultzk at uni-trier.de Wed Dec 19 01:00:45 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Wed, 19 Dec 2007 10:00:45 +0100 Subject: [gutvol-d] my thoughts on a roundless system of proofing In-Reply-To: References: Message-ID: <550CCD3D-E029-4A1A-AC5A-809FC7F61AD0@uni-trier.de> Hi Bowerbird, An interesting concept. My gut feeling is that there are still many questions that should/need to be worked out. On 19.12.2007, at 01:58, Bowerbird at aol.com wrote: > here are those notes on a "roundless" system of proofreading... > > [Snip, Snip] > you'll be presented with a text-field, and the scan-image of the page. > > fix all errors you can find, then say if the page is "better" or > it's "done". > > if -- after the last change is made on a page -- 2-4 proofers in a row > _confirm_ the page as being "done", then it is considered as > "finished". > but any _change_ automatically resets the confirmation counter to > zero. This assumes you can get several proofers for a particular page/text. What do you do then? Do you set a time limit? Also, how do you deal with the fact that 2/several proofers battle it out and keep re-correcting to what they feel is correct? The page/text may never get done! > > > (the question as to whether the pages need 2 or 3 or 4 _confirmations_ > is one that will need to be subjected to some real-world empirical > testing, > balancing the benefits against the costs of doing the extra > confirmations.
> we'd assume, of course, that more confirmations mean greater accuracy, > [] [snip, snip] > > here are some rough approximations, though: > 7 points -- making the last change to a page (i.e., none made after > you) > 6 points -- declaring a page "done" and having it confirmed as > "finished" > 5 points -- being the 2nd confirmation that a page is actually > "finished" > 4 points -- being the 3rd confirmation that a page is actually > "finished" > 3 points -- being the 4th confirmation that a page is actually > "finished" > 2 points -- making a page better by correcting one or more errors > 1 point -- proofing a page, but not finding errors, because it has > none > 0 points -- not finding any errors, even though there are one or more > -1 points -- declaring a page as "done" when it still contains an > error > -2 points -- 2nd confirmation as "done" when the page contains an > error > -3 points -- 3rd confirmation as "done" when the page contains an > error > -4 points -- 4th confirmation as "done" when the page contains an > error OOOOppppps!!!! After a page has been confirmed as DONE, who catches the error??? > > -5 points -- introducing two or more errors onto a page How do you determine this? If a correction is made and others consider it not correct, is that an introduced error? What if the original has a typo in it and is corrected, yet others say no way jose!! How do you detect this and deal with it? regards Keith. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071219/d4af9629/attachment.htm From Bowerbird at aol.com Wed Dec 19 04:59:26 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 19 Dec 2007 07:59:26 EST Subject: [gutvol-d] my thoughts on a roundless system of proofing Message-ID: keith said: > Hi Bowerbird, and hello to you, mr. keith. i've been wondering where you were at... > An interresting concept.
My gut feeling that there are > still many questions that should/need be worked out. that could be... but i do believe i've got all of the main bases covered... > This assumes you can get several proofers for a particular page/text. oh yeah. i expect that a half-dozen or more people might read each page. my system makes the perusal of pages quite swift, which is very important. i will encourage my proofers to go at large consecutive swatches of pages, and to _read_for_content_, because that's a good way to find subtle errors. (i realize some people disagree with that. fine. but let's not have that talk, because it is an _empirical_ question, not a matter of _opinion_, thank you. i expect the quality coming out of my system to be the proof of its pudding.) > What do you do then? Do you set a time limit? well, as i said, i expect to move every page in a book from raw to "finished" in a matter of a day or two, and perhaps a mere matter of a couple hours... it depends on how many proofers you have, of course. but it also depends on how long you _decide_ to let a book linger before you introduce another. d.p. right now is charting that a good many of its books might take _years_... we can debate whether or not that's a good form of architecture. i wouldn't _necessarily_ say that it's a bad way of proceeding, but still, i wouldn't do it. my modus operandi would be to get a book through as quickly as possible. on the other hand, as i also said, i'd leave a book to "simmer" for 3-6 months, where the _only_ way for end-users to read it is the hybrid text/image mode of "continuous proofreading", a not-so-subtle "invitation" for 'em to proof it, one that makes it dirt-simple to report an error on-the-spot, if encountered... > Also, how do you deal with the fact that 2/several proofers battle it out > and keep recorrecting to what they feel is correct? The page/text may > never get done! i should have made clear that they can't "battle it out". 
all they can do is to create a situation of "dispute", which other people must come in and settle. this is _not_ wikipedia. neither proofer would benefit from "a revision war". it will help neither their _quantity_ -- i.e., their pages proofed -- nor their "quality" -- i.e., their accuracy score -- to go "back and forth" over a point. if there's a disagreement, it's simply _policy_ to settle it one way or the other. however, if you start with the notion that there's one-and-only-one "correct" way to do the page, you won't have much argument about _how_ to obtain it. some of my policies could well be ones that might cause volunteers to decide _not_ to proof on my site. bully for them, i say, standing up for their opinion. let them go to -- or start up -- another site that does things how _they_ want. there's plenty of room for everyone. for instance, my philosophical position is to correct mistakes, and do it silently. this fits with my vision of myself as a _republisher_, since this is almost always what publishers have done in the past, and continue to do up to this very day... but, you know, if someone else wants to elaborately annotate every correction that they've made, saving information on what it was, and the change they made, i'd wish them good luck and godspeed. like i said, there's plenty of room for all... but, given a firm orientation, there'll be no need for a "revision war" in this task... > OOOOppppps!!!! After a page has be confirmed as DONE who catches the error??? a page starts out as "raw". everyone who changes it after that marks it as "better", until _someone_ sticks their neck out and courageously declares it as being "done". at that point, the page would be served up to any proofers who have indicated they want to handle "confirmations", and they would try their best to find errors. if they do _not_ find an error, they will issue a "confirmation" the page is "done". 
depending upon how quickly they get their confirmation in, they might receive anywhere from 3-5 points. however, if they _do_ find an error, they will _correct_ it, and -- since their proof is the latest to make a change to the page -- they'd get _7_ points for their action. so even though "confirmations" are good, a negative-confirmation is even better. that's what these eagle-eyes are _hoping_ to find, a page that was so good that _someone_ thought it was worth sticking their neck out on, but oops! -- gotcha! the positive points assume that "confirmations" of the page are indeed _correct_. if _incorrect_ "confirmations" are issued -- that is, if someone comes along and finds an error _after_ you had declared a confirmation -- you will _lose_ points... of course, we won't be confident that the "error" they "corrected" _was_ an error, at least not until _their_ version of the page receives 2-4 confirmations itself... so -- from the standpoint of the system -- all we do is just sit back and _wait_ until a specific version of a page has received 2-4 confirmations, and then we trace its history. anyone who endorsed the page in that form gets positive points, and anyone who endorsed _another_ version of the page (as "done" or "confirmed") gets _negative_ points. this even applies during the 3- to 6-month "simmer" period. think of it as a parlor game, where you have fun with your friends by "beating" them. the slow and steady route is to collect small amounts of points making pages better. the only way to win big is to declare pages as "done" and/or do some confirmations, but that route also exposes you to a risk of _losing_ points big-time. it's a gamble... > -5 points -- introducing two or more errors onto a page > How do you determine this. the same way you determine whether the page is "correct" or not, i.e., whether it receives 2-4 confirmations by other proofers. 
so if i introduce errors onto a page, and another person corrects 'em, and 2-4 other people confirm those corrections, then i'm charged with negative points for introducing the errors. it's quite simple. > If a correction is made and others consider it not correct > is that introduced error. What if the original has a typo in it > and is corrected, yet others say no way jose!! well, since you've said straight out that "the original has a typo in it", and i've said straight out that "the official policy is we correct typos", then there is no question that this typo _should_ have been corrected, and therefore there is no way you can say no way jose! dispute solved. the more interesting question, of course, is when it is _unclear_ whether there is -- or is not -- a typo. if you feel that a specific word _is_ a typo, then you would stick your neck out and correct it. and if the next person agrees with you, they will stick their neck out and declare the page "done". and if the person after that agrees with you both, they'll stick their neck out and issue a "confirmation". and when you finally get enough confirmations, the page will be marked as "finished". and that settles the whole question... on the other hand, if someone disagrees with you, they will revert your edit, so then you'll challenge their reversion, and the page will become a "dispute". and everyone will waste gobs of time fighting about what it _should_ be, until they all realize that until they settle this dispute, none of them are improving _either_ their quantity (number of pages proofed) or quality (accuracy score), so they're all losing ground to other proofers smart enough to avoid disputes. so, as proofers grow into adults, and formulate solid policies, disputes go away. > How do detect this and deal with it? well, it's quite easy to _detect_ if a page has reverted to an earlier version -- you just compare every "new" version to each of the previous saved versions. 
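Comparing every new version of a page against each of its earlier saved versions, as described just above, can be sketched like this. It assumes a revision is stored only when the text actually changes, so any match with an earlier revision means an edit was undone. Fingerprinting each revision with SHA-256 (Python's standard `hashlib`) is a choice of this example rather than part of the proposal; it lets each check be a dictionary lookup instead of a comparison against every stored page text. The function names are hypothetical.

```python
import hashlib


def digest(text: str) -> str:
    """Fingerprint of one saved revision of a page."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_reverts(revisions: list) -> list:
    """Return (later, earlier) index pairs where a revision restores an older text.

    Assumes revisions are stored only when the page text changes, so a match
    against any earlier revision means someone undid an intervening edit.
    """
    first_seen = {}  # digest -> index of the earliest revision with that text
    reverts = []
    for index, text in enumerate(revisions):
        fingerprint = digest(text)
        if fingerprint in first_seen:
            reverts.append((index, first_seen[fingerprint]))
        else:
            first_seen[fingerprint] = index
    return reverts
```

For the history `["teh cat", "the cat", "teh cat", "the cat.", "the cat"]`, revision 2 restores revision 0 and revision 4 restores revision 1, the back-and-forth pattern that would mark the page as a candidate "dispute".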
as for _dealing_ with it, i just gave the reasoning how the problem solves itself. at some point, people realize that they're not getting anything done while they are engaged in a dispute, so they settle it and move on. i mean, _realistically_, the number of cases that are both _vague_ enough and _important_ enough that people will engage in a long-running dispute becomes vanishingly small. (we have ink-on-paper, for one, and a whole raft of grammar rules as guides.) this becomes _especially_ true when there are clear guiding policies in place... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071219/bb79caff/attachment-0001.htm From lee at novomail.net Wed Dec 19 15:22:34 2007 From: lee at novomail.net (Lee Passey) Date: Wed, 19 Dec 2007 16:22:34 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <20071219093416.GA29329@ark.in-berlin.de> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> Message-ID: <4769A7BA.1070102@novomail.net> Ralf Stephan wrote: > So, where can pglaf metadata be accessed? /Can/ pglaf metadata be accessed? Has it been preserved in any form? If it is recreated from other sources, is it sufficiently reliable? >> Project Gutenberg's e-texts, on the other hand, can be viewed on >> just about every piece of equipment ever manufactured, including >> S-100 bus machines running CP/M attached to VT-52 dumb terminals. >> The pure ASCII representation (well, not so pure now that LATIN-1 >> encoding is permitted) can easily be repurposed for other >> applications not envisioned by the original creators, such as >> defeating spam filters.
(I don't see this as a bad thing, rather it >> is a real-life example of the usefulness of flexible, repurposable >> text). > > And just *how* repurposable it is you can see from the fact that all > those old ASCII-only versions can be potentially transformed into > texts that are 'pleasing to read' as you say, by any volunteer who > wants to upload an HTML or even TEI version with bibliographic markup > of it. What you're talking about here is not repurposability, it is re-creatability. Of course, the old ASCII-only versions can be edited, but any electronic document can be edited; even old WordStar documents can still be converted and edited. But can they be used for unintended purposes /without/ editing? A few examples, based primarily, this time, on navigation: Let's suppose I wrote the killer e-reader application. It can, of course, display PG e-texts in their native state, with the exception that it will ignore single newline sequences, and double newline sequences will be converted to a single newline and 5 spaces of indentation. What I want to do is auto-generate a table of contents for my users. How can I do this reliably? If every PG text had the chapters labeled with HTML heading tags,
I could search for these headers. If every PG text were encoded in TEI I could scan for the tags and create an outline table of contents from there. If every PG text were marked in z.m.l. I could search for the 3 or 4 or 5 blank lines (I have no recollection of what the exact number really is) that indicate a chapter break and select the first following block as the chapter title. And while there are individual instances of each of these files in the PG corpus, 1. they are a vast minority (is that an oxymoron?) 2. they are not reliably marked in a way that allows me to distinguish them from other files which have no markup, and 3. they are not even internally consistent (most of the TEI files available from PG use the <p>
tag to mark chapter headers, making them indistinguishable from real paragraphs). Let's suppose that I am an English teacher in Kenya. We are studying George Eliot's book, _Silas Marner_. Half of my class has tattered copies of the various editions of the book, and half of my class has a new OLPC laptop, some set to various font sizes because of uncorrected vision problems. I want to call my students' attention to a particular passage in the middle of the 5th paragraph in chapter 3. How do I do that for the portion of the class that is using a PG e-text on their laptops? The only way I can think of is to expose enough of the text that the students can do a word search for the phrase; and hope that they don't mistype any part of it, which could be a lengthy passage, including punctuation, or that the phrase is unique enough that they won't be led to some other point in the text, or that the edition I'm using matches the (unknown, unspecified) edition(s) used to create the PG edition. This method would probably also be required for PG-TEI texts: because /every/ block of text frequently seems to be marked as a <p> paragraph, I can't even find the start of a chapter and count paragraphs from there. Let's suppose I'm a researcher doing research in American Naval History. I have obtained a PG version of a 19th-century American History textbook. I want to find all the references to the warship USS Constitution ("Old Ironsides") but not any references to the Constitution of the United States. Now this last one is a bit of a stretch; I would expect to be able to do this with a text marked up with a level-three (complete) version of TEI, but probably not with basic-level markup. But it /is/ one more example of repurposing. This is what I mean by repurposing: taking an existing text, without manual edits (conversions using automated methods are acceptable), and using it for a purpose other than that for which it was originally intended. Some of these purposes may have been foreseen, some of them may have been foreseeable if unforeseen, and some of them may have been practically unforeseeable. But even in this last category, if we are committed to saving as much data as possible, and doing so in a structured way, making it amenable to automated data processing, we might be able to satisfy even some of those unforeseeable purposes. It seems to me that the vast majority of PG e-texts were designed to be read by scrolling down a screen having a fixed, immutable font of 80 characters per line. These files really are inadequate for just about any other purpose. They have to be edited just to make them pleasing to read on hand-held devices (just look at how many suggestions appear from time to time on how to edit PG e-texts so that they word-wrap properly). These files are editable, but they are not repurposable. I have redone, or am in the process of redoing, in highly structured markup, two or three texts that also exist in the PG corpus. I have found that trying to add markup and metadata to existing PG texts is more difficult than simply starting from scratch.
So while PG texts are clearly editable, there's no reason you would even want to upgrade them in that fashion. I /have/ found that having OCRed a new scan, the PG e-texts /are/ useful in helping me find scan errors; but even then I have to edit the text by normalizing it to match it against the scanned text. > You write in another mail: >> At the other end of the spectrum we have Project Gutenberg, which >> records nothing about the provenance of a work, strips public >> domain works of all but the alphabetic text, silently corrects >> "obvious" typographical errors, and occasionally creates hybrid >> works by combining various editions without comment or >> justification. > > I take this as hyperbole as quite a few people now upload HTML and > include metadata in the title page. People using TEI use markup for > corrections from which original spelling can be reconstructed. And > the fingers of two hands suffice to count hybrid works. Well, about 20% of the PG e-texts I have examined are hybrid works -- but then I've only looked at about 5 e-texts closely enough to make a determination one way or the other. So I suppose my conclusion that 20% of PG e-texts are hybrids is about as accurate as your conclusion that less than 10 e-texts are hybrids. In reality, we just don't know, do we? No one has kept any records of where any particular e-text came from, or what changes have been made over time (except for some rare acknowledgments of the sort of "John Doe was responsible for chapters 1-10 and Jane Roe added chapters 11-33.") We /do/ know that some percentage of the files are hybrids or otherwise altered (not necessarily intentionally or maliciously) but lacking that evidence I believe the percentage is significant enough that I am unwilling to give the e-texts of unknown provenance the benefit of the doubt. > Ah, but now I see, you talk about the backlog! Well, backlog *always* > is something to whine about, isn't it? 
Yes, and this may be approaching the nub of the whole issue. I have no doubt that there are gems amidst the dross, and that the percentage of gems is increasing in later submissions; this is probably due to the increasing sophistication of the volunteers who are producing e-texts. Distributed Proofreaders has obviously had a significant impact, but this impact tends to be restricted to the accuracy of the text rather than the overall quality or repurposability of the structure of the texts. My particular management bias is that quality control is not so much an issue of quality workers as it is an issue of quality processes. Thus, while I think that the PG corpus, as a whole, is quite poorly done, I don't think that's the fault of the volunteers who have been contributing the work. Rather, it is a failure of the processes that are, or should be, in place to /encourage/ and /enable/ quality work. In an earlier response to this thread, Michael Hart said: > Nobody forces anybody to do eBooks any particular way at PG. He could just as easily have said: "Nobody requires anybody to do eBooks in any particular way at PG." or "Nobody recommends that anybody do eBooks in any particular way at PG." or "Nobody encourages anybody to do eBooks in any particular way at PG." or "Nobody suggests that anybody do eBooks in any particular way at PG." This is what I was referring to when I mentioned a failure of leadership. There are /no/ explicit processes in place at PG to ensure quality, and no indications that any will be forthcoming. The quality of texts being included in the PG corpus has been slowly increasing over the years, I believe, as I stated earlier, because volunteers contributing to PG are becoming more sophisticated in their knowledge of how to create e-books. I also believe that this increasing quality is occurring in spite of the efforts of The Powers That Be at PG, not because of them. In a later portion of his message, Mr.
Hart suggested that > If you would like to write your own FAQs about how you think > eBooks should be done, please do so, and we will try to find > as many volunteers for your methodology as possible, and 90% > of all our volunteers just might go that way, who knows? In other words, Mr. Hart's approach to the problem of quality lies exclusively within the province of the volunteers. If you want to improve the quality of the PG corpus, just go out and find every volunteer who contributes to PG and try to convince him or her individually of the value of a quality product, and explain to him or her how that can be achieved. Then, to the extent you do succeed, PG will take that quality product, degrade it to a feature-less plain text edition, and then throw both products into the bin with the other rotten apples. The quality edition may be there, but you won't know it until you run across it. But for those who want to see PG e-texts, as a whole, improve in quality, I don't think that relying on the increasing sophistication of the contributors is the right way to go. Rather, PG should, as an organization not as a diffuse group of contributors, adopt some practices and guidelines which will tend to increase quality. Albert Einstein is credited with defining insanity as "doing the same thing over and over again and expecting different results." If PG continues to operate as it has over the past 15 years, I don't see any reason to believe that there will be any significant change in the overall quality of the corpus, despite the efforts of a few highly competent and highly motivated individuals. From Bowerbird at aol.com Wed Dec 19 16:47:34 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 19 Dec 2007 19:47:34 EST Subject: [gutvol-d] Why wait till we have to work from bookworm frass? 
Message-ID: ralf said: > catalog.rdf is a start and 'just' needs to be augmented with > 1) original publisher, > 2) publishing year, > 3) publishing place > 4) possible edition and series information, > 5) possible editor/translator info, and > 6) a working link to the etext page instead of simply 'etext21990'. > In short, metadata about the etext should 'link' to that info which is > given to pglaf when a clearance is requested, the metadata of the book. at last, somebody gets down to the _specifics_, instead of simply uttering the supposedly-magical "metadata" word and expecting everyone to genuflect... but -- as usual -- once we get down to the specifics, we _do_ have clarity. the "original publisher" for a p.g. e-text is -- ta-da! -- project gutenberg. the "publishing year" is whatever year the e-text is posted. the "publishing place" is cyberspace. the "possible edition and series information" is carried in the name. and the "possible editor/translator info" is usually contained in the text. _that_ is your metadata. and you had it all along. you had it all along... *** now, of course, _none_ of this will make the anal-compulsives happy. they want to match up that e-text with some long-ago p-book twin... (and then they want to _criticize_ it because it's not an _exact_ match!) but that misses the point that this _electronic_ book is a _new_ edition... there is no external requirement imposed that it has to match a p-book. it simply must match itself, and it's even free to morph to something else. and that's the very _essence_ of what it means to "be in the public domain". the public-domain isn't a long-dead creature you've stuffed and mounted... it's a living, breathing animal, as up-to-date as today (without any hyphen), and as fully capable of ensnaring the mind as it has done for many decades. the only question remaining is "why do these critics continue their efforts to change p.g. into something which it has never been, and will never become?"
why don't they go off and create the kind of library that _they_ want instead? the answer is obvious, of course. they have no volunteers, and it takes them _years_ -- evidently -- to create just a _handful_ of books in their manner... -bowerbird From jon at noring.name Wed Dec 19 17:40:18 2007 From: jon at noring.name (Jon Noring) Date: Wed, 19 Dec 2007 18:40:18 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: References: Message-ID: <1933820418.20071219184018@noring.name> Bowerbird wrote: > now, of course, _none_ of this will make the anal-complusives happy. > they want to match up that e-text with some long-ago p-book twin... The term "anal-compulsive" is uncalled for. About the rest of Bowerbird's comments, it is clear that PG is really a repository of whatever anyone wants to put into it. The "YouTube" of the public domain text world. About the only rigor exercised by PG is its copyright clearance procedure. PG is simply a kind of barely-organized anarchy, for better and for worse. So calling PG a republisher is a real stretch, because PG does not do anything editorial, or selective, or anything else even remotely like what a "publisher" does. (And of course, note that officially PG, via its non-profit organization for donation purposes, calls itself a "literary archive," not a publisher.) So PG is simply a public repository of "stuff". (O.k., I'll accept "literary archive" as a descriptor even if that is also a stretch based on my personal views of what a literary archive should do.)
Now in the past, both Michael and Greg have raised the flag of "republisher", and it is possible they may reassert that claim in response to this message. Anyone can claim anything they want. I'm simply calling PG as I see it today, and I'll let the silent majority reading this message decide for themselves what they believe PG is. About metadata issues, well maybe for another time... Jon Noring From schultzk at uni-trier.de Thu Dec 20 00:56:09 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Thu, 20 Dec 2007 09:56:09 +0100 Subject: [gutvol-d] my thoughts on a roundless system of proofing In-Reply-To: References: Message-ID: <0E23DA80-78B8-4D25-A389-5138F66B6A5C@uni-trier.de> Hi Bowerbird, On 19.12.2007 at 13:59, Bowerbird at aol.com wrote: > keith said: > > Hi Bowerbird, > > and hello to you, mr. keith. i've been wondering where you were at... Thanks for the mr., but Keith will do just fine. Actually, Schultz is the name, but I am mostly known as Keith, or Kies as the Germans pronounce it. Life has been giving me a rough time, so I could only linger. > > > > > An interesting concept. My gut feeling is that there are > > still many questions that should/need be worked out. > > that could be... but i do believe i've got all of the main bases > covered... From your other remarks, I can see that there is a lot that you have not said. Let's see if I understand things now: 1) Your time line is quite short, but that is not a real problem. 2) Work on the page is kind of anarchic in the sense that the proofers must somehow agree or solve their disputes 3) "Confirmers" seem to have the deciding vote, yet are not almighty since another "confirmer" can refute 4) Disputes have no mediator: If the disputees do not agree they may simply leave the page better. Others may step in and get the job "done" Proofers are expected to grow up ???? This list is proof that adults enjoy and indulge in kindergarten games.
5) I see no one who is there to enforce the rules, except peer pressure. I see that you are basing the system on the psychological game that is very popular. I forgot its name. The problem is the game is not realistic. Its rules and parameters are far too restricted and do not reflect human nature. Yours is a nice implementation and allows more flexibility. Also, serious proofers do not want to play games and may well be turned off by overly enthusiastic ones. There has to be some kind of authority. Like I said, it seems O.K., but there are still some rough edges. Yet, as someone you know quite well said, the proof is in eating the pudding. regards Keith. P.S. Season's Greetings and to all a good night > > > > This assumes you can get several proofers for a particular page/ > text. > > oh yeah. i expect that a half-dozen or more people might read each > page. > my system makes the perusal of pages quite swift, which is very > important. > i will encourage my proofers to go at large consecutive swatches of > pages, > and to _read_for_content_, because that's a good way to find subtle > errors. > (i realize some people disagree with that. fine. but let's not > have that talk, > because it is an _empirical_ question, not a matter of _opinion_, > thank you. > i expect the quality coming out of my system to be the proof of its > pudding.) > > > > What do you do then? Do you set a time limit? > > well, as i said, i expect to move every page in a book from raw to > "finished" > in a matter of a day or two, and perhaps a mere matter of a couple > hours... > > it depends on how many proofers you have, of course. but it also > depends > on how long you _decide_ to let a book linger before you introduce > another. > d.p. right now is charting that a good many of its books might take > _years_... > we can debate whether or not that's a good form of architecture. i > wouldn't > _necessarily_ say that it's a bad way of proceeding, but still, i > wouldn't do it.
> my modus operandi would be to get a book through as quickly as > possible. > > on the other hand, as i also said, i'd leave a book to "simmer" for > 3-6 months, > where the _only_ way for end-users to read it is the hybrid text/ > image mode > of "continuous proofreading", a not-so-subtle "invitation" for 'em > to proof it, > one that makes it dirt-simple to report an error on-the-spot, if > encountered... > > > > Also, how do you deal with the fact that 2/several proofers > battle it out > > and keep recorrecting to what they feel is correct? The page/ > text may > > never get done! > > i should have made clear that they can't "battle it out". all they > can do is to > create a situation of "dispute", which other people must come in > and settle. > > this is _not_ wikipedia. neither proofer would benefit from "a > revision war". > it will help neither their _quantity_ -- i.e., their pages proofed > -- nor their > "quality" -- i.e., their accuracy score -- to go "back and forth" > over a point. > > if there's a disagreement, it's simply _policy_ to settle it one > way or the other. > > however, if you start with the notion that there's one-and-only-one > "correct" > way to do the page, you won't have much argument about _how_ to > obtain it. > > some of my policies could well be ones that might cause volunteers > to decide > _not_ to proof on my site. bully for them, i say, standing up for > their opinion. > let them go to -- or start up -- another site that does things how > _they_ want. > there's plenty of room for everyone. > > for instance, my philosophical position is to correct mistakes, and > do it silently. > this fits with my vision of myself as a _republisher_, since this > is almost always > what publishers have done in the past, and continue to do up to > this very day... 
> > but, you know, if someone else wants to elaborately annotate every > correction > that they've made, saving information on what it was, and the > change they made, > i'd wish them good luck and godspeed. like i said, there's plenty > of room for all... > > but, given a firm orientation, there'll be no need for a "revision > war" in this task... > > > > OOOOppppps!!!! After a page has be confirmed as DONE who > catches the error??? > > a page starts out as "raw". everyone who changes it after that > marks it as "better", > until _someone_ sticks their neck out and courageously declares it > as being "done". > > at that point, the page would be served up to any proofers who have > indicated > they want to handle "confirmations", and they would try their best > to find errors. > > if they do _not_ find an error, they will issue a "confirmation" > the page is "done". > depending upon how quickly they get their confirmation in, they > might receive > anywhere from 3-5 points. > > however, if they _do_ find an error, they will _correct_ it, and -- > since their proof > is the latest to make a change to the page -- they'd get _7_ points > for their action. > so even though "confirmations" are good, a negative-confirmation is > even better. > that's what these eagle-eyes are _hoping_ to find, a page that was > so good that > _someone_ thought it was worth sticking their neck out on, but > oops! -- gotcha! > > the positive points assume that "confirmations" of the page are > indeed _correct_. > > if _incorrect_ "confirmations" are issued -- that is, if someone > comes along and > finds an error _after_ you had declared a confirmation -- you will > _lose_ points... > > of course, we won't be confident that the "error" they "corrected" > _was_ an error, > at least not until _their_ version of the page receives 2-4 > confirmations itself... 
> > so -- from the standpoint of the system -- all we do is just sit > back and _wait_ > until a specific version of a page has received 2-4 confirmations, > and then we > trace its history. anyone who endorsed the page in that form gets > positive points, > and anyone who endorsed _another_ version of the page (as "done" or > "confirmed") > gets _negative_ points. this even applies during the 3- to 6-month > "simmer" period. > > think of it as a parlor game, where you have fun with your friends > by "beating" them. > the slow and steady route is to collect small amounts of points > making pages better. > the only way to win big is to declare pages as "done" and/or do > some confirmations, > but that route also exposes you to a risk of _losing_ points big- > time. it's a gamble... > > > > -5 points -- introducing two or more errors onto a page > > How do you determine this. > > the same way you determine whether the page is "correct" or not, > i.e., whether it > receives 2-4 confirmations by other proofers. so if i introduce > errors onto a page, > and another person corrects 'em, and 2-4 other people confirm those > corrections, > then i'm charged with negative points for introducing the errors. > it's quite simple. > > > > If a correction is made and others consider it not correct > > is that introduced error. What if the original has a typo in it > > and is corrected, yet others say no way jose!! > > well, since you've said straight out that "the original has a typo > in it", > and i've said straight out that "the official policy is we correct > typos", > then there is no question that this typo _should_ have been corrected, > and therefore there is no way you can say no way jose! dispute > solved. > > the more interesting question, of course, is when it is _unclear_ > whether > there is -- or is not -- a typo. if you feel that a specific word > _is_ a typo, > then you would stick your neck out and correct it. 
and if the next > person > agrees with you, they will stick their neck out and declare the > page "done". > and if the person after that agrees with you both, they'll stick > their neck out > and issue a "confirmation". and when you finally get enough > confirmations, > the page will be marked as "finished". and that settles the whole > question... > > on the other hand, if someone disagrees with you, they will revert > your edit, > so then you'll challenge their reversion, and the page will become > a "dispute". > and everyone will waste gobs of time fighting about what it > _should_ be, until > they all realize that until they settle this dispute, none of them > are improving > _either_ their quantity (number of pages proofed) or quality > (accuracy score), > so they're all losing ground to other proofers smart enough to > avoid disputes. > > so, as proofers grow into adults, and formulate solid policies, > disputes go away. > > > > How do you detect this and deal with it? > > well, it's quite easy to _detect_ if a page has reverted to an > earlier version -- > you just compare every "new" version to each of the previous saved > versions. > > as for _dealing_ with it, i just gave the reasoning how the problem > solves itself. > at some point, people realize that they're not getting anything > done while they > are engaged in a dispute, so they settle it and move on. i mean, > _realistically_, > the number of cases that are both _vague_ enough and _important_ > enough > that people will engage in a long-running dispute becomes > vanishingly small. > (we have ink-on-paper, for one, and a whole raft of grammar rules > as guides.) > this becomes _especially_ true when there are clear guiding > policies in place...
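The page life cycle bowerbird lays out in the reply quoted above (raw, then "better" edits, then one proofer declaring the page "done", then confirmations until it is "finished", with a revert turning the page into a "dispute", and endorsers of the winning version rewarded while endorsers of any other version lose points) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not his implementation: the class and method names are hypothetical, the confirmation count and reward are placeholders picked from the ranges he quotes (2-4 confirmations, 3-5 points), the penalty size is invented because the post gives none, and the small per-edit points for making a page "better" are left out.

```python
from dataclasses import dataclass, field

# Hypothetical parameters: the post allows 2-4 confirmations and 3-5 points
# for a confirmation; these values are placeholders, and the penalty size
# for endorsing a version that loses is invented for illustration.
CONFIRMATIONS_NEEDED = 3
ENDORSE_POINTS = 3
WRONG_ENDORSE_POINTS = -3


@dataclass
class Page:
    versions: list[str]  # saved texts, oldest first; versions[0] is the raw OCR
    endorsements: list[tuple[str, str]] = field(default_factory=list)  # (proofer, text endorsed)
    status: str = "raw"
    disputed: bool = False

    def edit(self, text: str) -> None:
        if text == self.versions[-1]:
            return  # nothing changed
        # Compare the new text with every earlier saved version: a match
        # means someone reverted an edit, so the page becomes a "dispute".
        if text in self.versions[:-1]:
            self.disputed = True
        self.versions.append(text)
        self.status = "better"

    def endorse(self, proofer: str) -> None:
        """The first endorsement of a text declares it "done"; later ones confirm it."""
        current = self.versions[-1]
        self.endorsements.append((proofer, current))
        count = sum(1 for _, text in self.endorsements if text == current)
        self.status = "finished" if count - 1 >= CONFIRMATIONS_NEEDED else "done"

    def settle(self) -> dict[str, int]:
        """Reward everyone who endorsed the final text and penalize
        everyone who endorsed some other version of the page."""
        final = self.versions[-1]
        scores: dict[str, int] = {}
        for proofer, text in self.endorsements:
            delta = ENDORSE_POINTS if text == final else WRONG_ENDORSE_POINTS
            scores[proofer] = scores.get(proofer, 0) + delta
        return scores
```

Endorsements are keyed by the text itself rather than by a version number, so if a page is reverted to an earlier form, whoever endorsed that form is still credited, in line with "anyone who endorsed the page in that form gets positive points."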
From ralf at ark.in-berlin.de Thu Dec 20 03:40:33 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Thu, 20 Dec 2007 12:40:33 +0100 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <4769A7BA.1070102@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> <4769A7BA.1070102@novomail.net> Message-ID: <20071220114033.GA31581@ark.in-berlin.de> > > So, where can pglaf metadata be accessed? Is it possible to get official info on this, Mister Hart? Regards, R Stephan From ralf at ark.in-berlin.de Thu Dec 20 08:08:25 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Thu, 20 Dec 2007 17:08:25 +0100 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <4769A7BA.1070102@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> <4769A7BA.1070102@novomail.net> Message-ID: <20071220160825.GA32015@ark.in-berlin.de> You wrote > No one has kept any records of where any particular e-text came from, or > what changes have been made over time (except for some rare > acknowledgments of the sort of "John Doe was responsible for chapters In an etext with number as low as 4080 I found this snippet: Corrected EDITIONS of our etexts get a new NUMBER, 8gyge11.txt VERSIONS based on separate sources get new LETTER, 8gyge10a.txt So, you're saying changes to texts aren't documented--I find this a bit strong. You should give evidence that this happens frequently if you don't want to be called a liar.
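The naming rule in the snippet Ralf quotes above (a new NUMBER for a corrected edition, a new LETTER for a version made from a separate source) is mechanical enough to check in code. A minimal Python sketch, assuming old-style file names shaped like 8gyge10a.txt: the regular expression, the function names, and the reading of the optional leading digit as a character-set marker are all assumptions for illustration, not an official PG specification.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for old-style PG file names such as "8gyge10a.txt".
# Assumption: an optional leading digit (e.g. "8") marks the character set,
# followed by a lowercase stem, a two-digit edition number, and an optional
# letter marking a version made from a separate source.
_NAME = re.compile(
    r"^(?P<charset>\d)?(?P<stem>[a-z]+)(?P<edition>\d{2})(?P<source>[a-z])?\.txt$"
)


class PGName(NamedTuple):
    charset: Optional[str]
    stem: str
    edition: int
    source: Optional[str]


def parse_name(filename: str) -> PGName:
    m = _NAME.match(filename)
    if m is None:
        raise ValueError(f"not an old-style PG file name: {filename!r}")
    return PGName(m["charset"], m["stem"], int(m["edition"]), m["source"])


def relation(a: str, b: str) -> str:
    """Describe how two file names relate under the quoted rule."""
    pa, pb = parse_name(a), parse_name(b)
    if pa.stem != pb.stem:
        return "different works"
    if pa.edition != pb.edition:
        return "corrected edition"  # new NUMBER
    if pa.source != pb.source:
        return "separate source"  # new LETTER
    return "same file"
```

Under these assumptions, comparing 8gyge10.txt with 8gyge11.txt reports a corrected edition, while 8gyge10.txt against 8gyge10a.txt reports a version from a separate source.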
However, even if there were such cases, how often do you think they will lead to a hybrid because both the corrector and the WW don't realize that there are two significantly different versions? Most correctors, I'd assume, are after typos they found while reading. Why would someone submit half a text, anyway? ralf From Bowerbird at aol.com Thu Dec 20 12:56:47 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 20 Dec 2007 15:56:47 EST Subject: [gutvol-d] Why wait till we have to work from bookworm frass? Message-ID: i said: > now, of course, _none_ of this will make the anal-complusives happy. i'm sorry. i shouldn't have said that. because, of course, it's actually spelled like this: > now, of course, _none_ of this will make the anal-compulsives happy. so let's see what google says about this: > compulsive: 12,000,000 hits > complusive: 57,600 hits -bowerbird From hart at pglaf.org Thu Dec 20 13:34:49 2007 From: hart at pglaf.org (Michael Hart) Date: Thu, 20 Dec 2007 13:34:49 -0800 (PST) Subject: [gutvol-d] Why wait till we have to work from bookworm frass? Message-ID: On Wed, 19 Dec 2007, Lee Passey wrote: [snip] > In an earlier response to this thread, Michael Hart said: > >> Nobody forces anybody to do eBooks any particular way at PG. > > He could just as easily have said: Trying to force words into my mouth isn't getting you anywhere, it just makes you look silly in the eyes of YOUR volunteers, or potential volunteers. Have you considered writing as though someone might be reading what you are saying 10, 20, 30 years from now? > "Nobody requires anybody to do eBooks in any particular way at > PG."
> > or > > "Nobody recommends that anybody do eBooks in any particular way > at PG." > > or > > "Nobody encourages anybody to do eBooks in any particular way at > PG." > > or > > "Nobody suggests that anybody do eBooks in any particular way at > PG." > > This is what I was referring to when I mentioned a failure of > leadership. There are /no/ explicit processes in place at PG to > ensure quality, and no indications that any will be forthcoming. Different people view leadership in different ways, and you are totally correct that the kind of leadership you promote here in this thread, and that others, including Jon Noring, promoted in past threads, will not "be forthcoming." The answer is, as it always has been, that YOU, the volunteers, provide that kind of leadership when and if you want it, but WE do not. . .we leave the doors open to all volunteers: not just those who toe YOUR political party line. Everyone is welcome to lead, or to follow, and you have to make your own leaders and followers, if you want them to exist. Greg Newby, The Board of Directors, and I will not do that kind of thing for you. . .you want to be a boss, find/create an army of followers on your own. . .by example. . .we will not specify YOUR plans as the "official" plans of Project Gutenberg. This has been tried time and time again, and all our support consists of the constant offer to help YOU get YOUR ideas into practice. The results are YOURS, the volunteers'. If YOU don't like those results, YOU do it the way YOU want it. We will gladly help. But if all you want to do is stand on the sidelines yelling the instructions for everyone else to follow, I cannot say you have a great chance of success, given the years of history of persons, much like yourself, trying this kind of thing over and over. "Just DO it!". . .the way YOU want it, and see who follows. If no one follows, try again, and again, and again. If no one EVER follows, perhaps you are just not a leader.
WE are not going to just CALL you a leader because you ask. You seem to think that because there is a vacuum in the kind of leadership YOU think there should be that it would be easy from YOUR point of view to step in and take over. Volunteers don't respond to leadership the same way as others. Get used to it, and learn how to lead them. Run your ideas/ideals up the flagpole and see who salutes. That's the way Project Gutenberg has always worked. Work with it, or work against it, the choices are ALL YOURS. No one chooses for YOU, no one chooses for THEM. Each volunteer can do what s/he wants to do. > The quality of texts being included in the PG corpus is slowly > increasing over the years, I believe, as I stated earlier, > because volunteers contributing to PG are becoming more > sophisticated in their knowledge of how to create e-books. I > also believe that this increasing quality is occurring in spite > of the efforts of The Powers That Be at PG, not because of them. It is the freedoms listed above that allowed for such growth. If we had held the reins as tightly as others would have done. . . who knows if our volunteers would have felt free enough to do what you admire, or anything else, for that matter. It's hard, if not impossible, to do something "in spite of the efforts of The Powers That Be at PG, not because of them," when "The Powers That Be at PG" encourage everyone to do what their own desires direct them to do. Your cutting off your nose to "spite" your face here in public does little to increase your image with the volunteers. If you really want to be one of "The Powers That Be at PG," all you have to do is pick up "The Power" and run with it. It's just lying there for ANYONE to pick up and run with it. And _I_, personally, think THAT is what you complain about!!! _I_ think YOU do NOT want PG to be so grassroots. . . YOU want some kind of CONTROL. . .that isn't there, never has been. BUT!!! If you REALLY want to DO something, the door is always open.
If you just want to kibitz, the breezes just pass you thru. > In a later portion of his message, Mr. Hart suggested that > >> If you would like to write your own FAQs about how you think >> eBooks should be done, please do so, and we will try to find >> as many volunteers for your methodology as possible, and 90% >> of all our volunteers just might go that way, who knows? > > In other words, Mr. Hart's approach to the problem of quality > lies exclusively within the province of the volunteers. This is the way Project Gutenberg always has been. Ruled by the volunteers. If 90% of them like YOUR idea/ideal YOU become "The Power." Just look at "Distributed Proofreaders" as an example. Or any of the other national or regional Gutenberg sites. No one "knighted" them and proclaimed them "The Powers." "Just DO It!" was their motto, and they just did it. Yes, we helped, but we help everyone. . . . > If you want to improve the quality of the PG corpus, just go out > and find every volunteer who contributes to PG and try to > convince him or her individually of the value of a quality > product, and explain to him or her how that can be achieved. You are welcome to use our own "Powers That Be" tools to do it, via the Newsletter, or what have you. However, it is obvious from the way you state your case that it is more of a sarcastic comment than anything real, though we of "The Powers That Be" would certainly hope otherwise. . . . > Then, to the extent you do succeed, PG will take that quality > product, degrade it to a feature-less plain text edition, and > then throw both products into the bin with the other rotten > apples. The quality edition may be there, but you won't know it > until you run across it. More sarcasm, which makes me wonder if I just wasted an hour-- perhaps "the thin veneer" has worn off, and it was only such a sarcastic message all along. . . . 
> But for those who want to see PG e-texts, as a whole, improve in > quality, I don't think that relying on the increasing > sophistication of the contributors is the right way to go. Don't forget that the tools at our disposal are also increasing in sophistication, and it is much easier to increase quality or quantity than ever before, even with the same volunteers. All YOU have to do is LEAD THE WAY!!! > Rather, PG should, as an organization not as a diffuse group of > contributors, adopt some practices and guidelines which will > tend to increase quality. "PG" "as an organziation" should not adopt YOUR "practices and guidlines" and more than anyone elses. . .YOU have to convince "a diffuse group of contributors" just as we all had to do. "I am not a number, I am a free man!" > Albert Einstein is credited with defining insanity as "doing the > same thing over and over again and expecting different results." > If PG continues to operate as it has over the past 15 years, I > don't see any reason to believe that there will be any > significant change in the overall quality of the corpus, despite > the efforts of a few highly competent and highly motivated > individuals. Perhaps you would like to make a wager based on that??? ;-) Thank You!!! Give the world eBooks for 2008!!! Michael S.
Hart Founder Project Gutenberg 100,000 eBooks easy to download at: http://www.gutenberg.org [already passed 25,500 eBooks] http://www.gutenberg.cc [already passed 75,000 eBooks] http://gutenberg.net.au Project Gutenberg of Australia 1570+ http://pge.rastko.net 65 languages PG of Europe ~500 http://gutenberg.ca Project Gutenberg of Canada http://preprints.readingroo.ms Not Primetime Ready ~400 >>> Your Project Gutenberg Site Could Be Listed Here <<< Blog at http://hart.pglaf.org From gbnewby at pglaf.org Thu Dec 20 15:27:49 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Thu, 20 Dec 2007 15:27:49 -0800 Subject: [gutvol-d] PGLAF metadata In-Reply-To: <20071220114033.GA31581@ark.in-berlin.de> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> <4769A7BA.1070102@novomail.net> <20071220114033.GA31581@ark.in-berlin.de> Message-ID: <20071220232748.GC20405@mail.pglaf.org> On Thu, Dec 20, 2007 at 12:40:33PM +0100, Ralf Stephan wrote: > > > So, where can pglaf metadata be accessed? > > Is it possible to get official info on this, Mister Hart? > > > Regards, > R Stephan (I changed the subject line because I had been mostly ignoring the thread) What official PGLAF metadata do you want to access?
If you're just looking for copyright clearance info that identifies print volumes, David Price's list is a good place to start: http://www.dprice48.freeserve.co.uk/GutIP.html -- Greg From ralf at ark.in-berlin.de Fri Dec 21 00:12:36 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Fri, 21 Dec 2007 09:12:36 +0100 Subject: [gutvol-d] PGLAF metadata In-Reply-To: <20071220232748.GC20405@mail.pglaf.org> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> <4769A7BA.1070102@novomail.net> <20071220114033.GA31581@ark.in-berlin.de> <20071220232748.GC20405@mail.pglaf.org> Message-ID: <20071221081236.GA782@ark.in-berlin.de> > What official PGLAF metadata do you want to access? If > you're just looking for copyright clearance info that identifies > print volumes, David Price's list is a good place to start: > > http://www.dprice48.freeserve.co.uk/GutIP.html I'm sorry to say that the info does not identify print volumes. Especially the well known books have several editions. So, what's missing is - original publishing place - original publishing year Let's say we don't need the publisher because it's highly unlikely different editions have the same place and year. No one would need this info if we could access the cleared title pages, however, from the etext page, for example. So, is it possible to access place/year for a work? If not, is it possible to get at the title scan? Thanks for your time, R Stephan From richfield at telkomsa.net Fri Dec 21 00:57:48 2007 From: richfield at telkomsa.net (Jon Richfield) Date: Fri, 21 Dec 2007 10:57:48 +0200 Subject: [gutvol-d] Bookworm frass Message-ID: <476B800C.3020700@telkomsa.net> To Greg in particular, who said: > Project Gutenberg regularly receives such items (sometimes in the hopes that they'll be judged as public domain in the US, under our copyright clearance procedures). 
As a library (under the US's tax ruling), we are legally able to archive such items indefinitely, but not redistribute them. In short, you can send such items to me (or to Michael Hart) and we'll do our best to hold them until they become public domain in the US. < Thanks Greg, that was the definitive answer to my question. I'll be back. Meanwhile, greetings (seasonal and otherwise) and thanks to everyone else who responded. Cheers, Jon From Bowerbird at aol.com Fri Dec 21 09:06:03 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 21 Dec 2007 12:06:03 EST Subject: [gutvol-d] chinese Message-ID: love to see all the books in chinese being posted! -bowerbird From Bowerbird at aol.com Fri Dec 21 09:19:48 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 21 Dec 2007 12:19:48 EST Subject: [gutvol-d] my thoughts on a roundless system of proofing Message-ID: keith said: > 3) "Confirmers" seem to have the deciding vote, yet > are not almighty since another "confirmer" can refute "consensus" is the operative concept here... the question in a roundless system is "when is a page done?" in a system that consists of a known number of rounds, a page is considered "finished" when it goes through all those rounds. even though it still might contain errors, it's considered "done". in a roundless system, you need to have some mechanism that tells you that the page -- though it might still contain errors -- is nonetheless to be considered "done".
the answer that's been in the forefront at distributed proofreaders -- for some time now, with apparently no one to challenge it -- is some combination of the measures of the difficulty of the page and the competence of the various proofers who have attacked it. as you can imagine, it's pretty difficult to _obtain_ those measures. what i propose, in contrast, is a measure that is starkly simple... and -- perhaps more importantly -- will work as well, or better. > 4) Disputes have no mediator: > If the disputees do not agree they may simply leave the page better. > Others may step in and get the job "done" disputes _do_ have a mediator. that mediator is _consensus_. i didn't detail the processes that might lead to a consensus when we are in a "dispute" situation, but they might take _many_ forms, assuming that the two sides of the dispute couldn't work it all out. at the most basic, you could have people vote on the two options. or the person running the site could decide on the official policy... (over at d.p. presently, the "match-the-page" motto can trump all.) however, the fact of the matter is that there are very few "issues" that lead to people digging in their heels for an extended conflict. in deciding what those ink-marks _mean_ on the page of a book, it's not all that convoluted. if it was, readers would have rebelled. i've followed the forums over at distributed proofreaders for years, literally, and i cannot say that i remember even one such situation. there are a lot of fights about the system, but not the basic task... in most situations that come up, there is _uncertainty_ about them -- is that a comma or a semicolon?, is this a printer-error or not? -- which is quite different from two sides locked in an intractable fight. > Proofers are expected to grow up ???? > This list is proof that adults enjoy and indulge in kindergarten games. the "job" of this list isn't straightforward, like digitizing a page of a book. 
> 5) I see no one who is there to enforce the rules, except peer pressure. it's my system, so i'll enforce the rules. :+) -bowerbird From lee at novomail.net Fri Dec 21 10:29:43 2007 From: lee at novomail.net (Lee Passey) Date: Fri, 21 Dec 2007 11:29:43 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: References: Message-ID: <476C0617.1000503@novomail.net> Michael Hart wrote: [snip] > Different people view leadership in different ways, and you are > totally correct that the kind of leadership you promote here in > this thread, and that others, including Jon Noring, promoted in > past threads, will not "be forthcoming." [snip] > Greg Newby, The Board of Directors, and I will not do that kind > of thing for you. . .you want to be a boss, find/create an army > of followers on your own. . .by example. . .we will not specify > YOUR plans as the "official" plans of Project Gutenberg. [snip] > Each volunteer can do what s/he wants to do. [snip] > It's hard, if not impossible, to do something "in spite of the > efforts of The Powers That Be at PG, not because of them," when > "The Powers That Be at PG" encourage everyone to do what their > own desires direct them to do. [snip] >> In a later portion of his message, Mr. Hart suggested that >> >>> If you would like to write your own FAQs about how you think >>> eBooks should be done, please do so, and we will try to find >>> as many volunteers for your methodology as possible, and 90% >>> of all our volunteers just might go that way, who knows? >> In other words, Mr. Hart's approach to the problem of quality >> lies exclusively within the province of the volunteers.
> > This is the way Project Gutenberg always has been. [snip] > "PG" "as an organziation" [sic] should not adopt YOUR "practices and > guidlines" [sic] and [sic] more than anyone elses [sic] . . .YOU have to convince > "a diffuse group of contributors" just as we all had to do. For the record, I have absolutely no desire to become one of "The Powers That Be" at Project Gutenberg. And I certainly haven't ever advocated the adoption by PG of any particular standard or guideline, let alone my own. I /do/ believe that the existence of well-publicized standards and guidelines is a necessary prerequisite to effect quality and control, and I believe that PG would benefit from the adoption of such standards and guidelines, no matter what they may be, but then I'm not particularly interested in improving Project Gutenberg, either. I long ago realized the futility of any such attempt. Like BowerBird, I /would/ like the PG corpus (body of works stored in the PG database) to be internally consistent; that way I can steal some of its components and write software to convert them into something more useful. But internal consistency would require standards, and we should all recognize by now that /that/ ain't gonna happen. [In a message posted this last October Mr. Hutchinson suggested that the PG white-washers would reject any submission that was not accompanied by a degraded text version, or at least something that could be confused for a degraded text version (I'm assuming something using z.m.l. or reStructured text would probably pass muster). If this is true, PG does, in fact, have some sort of standards, it's just that no one in the PG organization is willing to admit it and those wielding the power haven't been identified.]
My comments are mostly intended to reinforce the message of Michael Hart: PG has no standards, PG will never have standards, PG won't even make suggestions to help the volunteers learn how to create e-texts for fear that they might be misconstrued as organizational standards. If you are a volunteer who feels s/he could work better in an organization that provides guidance and quality control, you should find some other organization to work with. (Any suggestions as to what other organizations meet these requirements would be welcome; Distributed Proofreaders is an obvious option). If you are a volunteer who does not want the quality work you have performed to be subverted and degraded, you should try to find a more appropriate repository for your work. (Internet Archive?) If you are a volunteer who has ideas about how e-texts can be improved, or how the process of creating e-texts can be improved, you are welcome and encouraged to drag your soap box to any forum on the internet you can find (including this one) but don't expect any support from PG beyond the obvious "we support your right to express your opinion." >> Albert Einstein is credited with defining insanity as "doing the >> same thing over and over again and expecting different results." >> If PG continues to operate as it has over the past 15 years, I >> don't see any reason to believe that there will be any >> significant change in the overall quality of the corpus, despite >> the efforts of a few highly competent and highly motivated >> individuals. > > Perhaps you would like to make a wager based on that??? Sure. All we have to do is agree on the standards by which the perceived increase in quality will be judged. > ;-) From Bowerbird at aol.com Fri Dec 21 11:59:47 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 21 Dec 2007 14:59:47 EST Subject: [gutvol-d] obsessive-compulsive delightful Message-ID: i am informed that jon noring said: > The term "anal-compulsive" is uncalled for. 
first of all, _i_ can decide for myself what words are "called for" to describe my positions. but hey jon, thanks for the feedback. alas, jon noring has used the term "anal" himself, in a way that one can only come to believe he thinks it applies _to_ himself, so i'm not sure why he would make a fuss when i use the term. (and note that i didn't point any fingers when i used the term.) for example, here: > http://groups.yahoo.com/group/distscan/message/31 jon noring said this: > There are lots of people who love to scan, and who are > super-meticulous and into high quality -- some would > call them anal. I'd rather have five people who are anal and here: > http://groups.yahoo.com/group/distscan/message/33 jon noring said this: > some volunteer in this group who has a lot of experience > with scanning, and preferably who describes themselves > as a quality fanatic (if they admit they are "anal" about > doing it right, that's the person I want) and here: > http://groups.yahoo.com/group/distscan/message/35 jon noring said this: > This is another reason why the scanner needs to be thorough > almost to an anal level it's obvious that jon noring considers "anal" as a compliment... yet when i use the same term, he considers it to be an insult... and such, my good friends, is the beauty of a rorschach blot: by telling you what _it_ is, a person tells you who _they_ are... or perhaps, since jon noring considers "anal" as a compliment, then it must be that "compulsive" is the "uncalled for" word here. but, you know what they say: the first step in solving a problem is admitting that you _have_ a problem... happy solstice, folks! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/f44f33a8/attachment.htm From lee at novomail.net Fri Dec 21 12:05:29 2007 From: lee at novomail.net (Lee Passey) Date: Fri, 21 Dec 2007 13:05:29 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <20071220160825.GA32015@ark.in-berlin.de> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> <4769A7BA.1070102@novomail.net> <20071220160825.GA32015@ark.in-berlin.de> Message-ID: <476C1C89.7000809@novomail.net> Ralf Stephan wrote: > You wrote > >> No one has kept any records of where any particular e-text came from, or >> what changes have been made over time (except for some rare >> acknowledgments of the sort of "John Doe was responsible for chapters > > In an etext with number as low as 4080 I found this snippet: > > Corrected EDITIONS of our etexts get a new NUMBER, 8gyge11.txt > VERSIONS based on separate sources get new LETTER, 8gyge10a.txt > > So, you're saying changes to texts aren't documented--I find this > a bit strong. You should give evidence that this happens frequently > if you don't want to be named liar. No, all I should have to do is give evidence that it happens occasionally. If it happens occasionally, and there is no way of knowing when it did or did not happen, then nothing can be relied on. Let me be clear, here, that I'm not saying reliability is necessary. Most of the e-texts at PG can carry the same amount of enjoyment whether they are reliable or not, and in most cases the lack of reliability is just not that important. It's only if reliability is important to you that PG e-texts fall short (at least in this regard). There is a saying in the Intelligence community that "knowing there is a secret is half the secret." 
So, if you find a text whose name is a string of characters followed by a number greater than 10 (I believe 10 represents version 1.0) you have discovered half of the secret: you now know that there /is/ a secret. (I have no idea how you would discover this secret for the newer e-texts above 10000 which are named by number alone.) But while you have discovered that there is a secret, you haven't discovered what the secret actually is. What are the changes, who made them, and why were they made? > However, even if there were such cases, how often do you think will > they lead to a hybrid because both the corrector and the WW don't > get it that there are two significantly different versions? > Most correctors, I'd assume, are after typos they found while reading. > Why would someone submit half a text, anyway. I dunno, but it happens. Consider the infamous case of _Frankenstein_. In the header of the most recent PG edition (15) it states: Release Date: October 31, 1993 [eBook #84] [Most recently updated: May 30, 2005] ... [Chapters 1-6: mostly scanned by David Meltzer, Meltzer at ..., proofread, partially typed and submitted by Christy Phillips, Caphilli at ..., submitted on 9/24/93. Proofread by Lynn Hanninen, submitted 10/93.] Frankenstein, continued (Chapters 20-24) Scanned by Judy Boss (boss at ...) Proofread by Christy Phillips (caphilli at ...) Reproofed by Lynn Hanninen (leh1 at ...) Margination and last proofing by anonymous volunteers It would appear from this entry that Mr. Meltzer and colleagues scanned and submitted the first 6 chapters in the fall of '93. It also appears that sometime thereafter the project was taken up again by Ms. Boss and colleagues, at least for chapters 20-24. How chapters 7 through 19 got into the text is a mystery.
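The file-naming rule quoted earlier in this thread (corrected editions get a new number, versions based on separate sources get a new letter) can at least be checked mechanically. What follows is a minimal Python sketch, not anything PG itself publishes: the regular expression is inferred only from the two quoted example names, 8gyge11.txt and 8gyge10a.txt; treating 10 as the first release is Lee's stated belief rather than a confirmed rule; and the names parse_old_name and was_changed are hypothetical. The newer purely numeric names carry no such signal, so the sketch simply rejects them.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the older PG file names, inferred only from the two
# examples quoted in the thread ("8gyge11.txt", "8gyge10a.txt"); not an official spec.
OLD_NAME = re.compile(r"^(?P<stem>\d*[a-z]+)(?P<edition>\d{2})(?P<letter>[a-z]?)\.txt$")


class PGName(NamedTuple):
    stem: str              # title abbreviation, e.g. "8gyge"
    edition: int           # two-digit edition number; 10 assumed to be the first release
    letter: Optional[str]  # present when a version is based on a separate source


def parse_old_name(filename: str) -> Optional[PGName]:
    """Split an old-style name into its parts; None if it doesn't fit the pattern."""
    m = OLD_NAME.match(filename.lower())
    if m is None:
        return None  # e.g. newer purely numeric names like "12345.txt"
    return PGName(m["stem"], int(m["edition"]), m["letter"] or None)


def was_changed(filename: str) -> bool:
    """True if the name signals a corrected edition or an alternate-source version."""
    name = parse_old_name(filename)
    return name is not None and (name.edition > 10 or name.letter is not None)


if __name__ == "__main__":
    for f in ["8gyge10.txt", "8gyge11.txt", "8gyge10a.txt", "12345.txt"]:
        print(f, parse_old_name(f), was_changed(f))
```

As Lee points out, a positive result only reveals that a secret exists; it says nothing about what changed, who changed it, or why.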
In February 2005 it was reported that by empirical investigation, the PG edition of _Frankenstein_ differed, mostly in punctuation, from most paper editions of _Frankenstein_, but it was virtually identical to the 1981 Penguin Classics edition of that work, which had "modernized" both the punctuation and spelling. (http://groups.yahoo.com/group/ebook-community/message/22105). Later, it was discovered (I thought the message was posted here, but I can't seem to find it in the archives) that at some point the PG texts began to exhibit the more archaic spelling and punctuation styles. It could have been as early as Chapter 7 in the book, but may have been as late as chapter 24. At any rate, it is by now fairly clear that the PG e-text of _Frankenstein_ started as a transcription of the 1981 Penguin Classics edition, and was completed using some other, more archaic, edition -- the classic definition of a hybrid work. The earliest version of _Frankenstein_ I was able to find on the PG website was version 10, claimed to have been released on October 31, 1993. It was not until version 14 that the texts even contained any mention that edits had occurred (version 14 indicates "[Date last updated: May 15, 2004]"). So, knowing that there is a secret (the numbers changed), and knowing how to find older versions of texts in the PG archive, we could probably normalize and "diff" the texts and find out what changes were made between any two versions. But even that will only tell us /what/ has changed; it won't tell us /why/ it was changed, /who/ made the change, or what the justification for the change was.
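The "normalize and diff" step Lee describes can be sketched with Python's standard difflib module. This is an illustrative sketch under stated assumptions, not a tool anyone on the list has published: both versions are assumed to be already downloaded as plain-text strings, "normalizing" here means only collapsing whitespace (so re-wrapped lines are not reported as edits), and the comparison is word by word so that punctuation changes of the kind found in the _Frankenstein_ case stay visible. The function names are hypothetical, and the sample sentences are made up.

```python
import difflib


def normalize(text: str) -> list[str]:
    # Collapse all runs of whitespace (including line breaks) and split into
    # words; punctuation stays attached so comma/semicolon swaps still show up.
    return text.split()


def word_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, old words, new words) for every span that differs."""
    a, b = normalize(old), normalize(new)
    # autojunk=False: don't let difflib treat common words as "junk" in long texts.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is "replace", "delete", or "insert"
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


if __name__ == "__main__":
    old = "It was a dreary night, and the rain fell."
    new = "It was a dreary night; and the rain\nfell."
    for op, before, after in word_changes(old, new):
        print(f"{op}: {before!r} -> {after!r}")
```

Here the re-wrapped line is ignored and only the comma-to-semicolon swap is reported. In practice the PG header and footer would probably need stripping before comparing (not shown), and the output still says only what changed, never who changed it or why.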
In May of 2005, on this listserv, Michael Hart stated: > Recently just such a discussion ocurred [sic] about Frankenstein and > about The Memoirs of Sherlock Holmes, and new editions have already > appeared for each of these, and yet another new edition is already in > progress for each of them, each with significan [sic] improvements > from different sources, as well as improvements we tend to make along > the way. While I have not done the research necessary to verify this, it would appear from Mr. Hart's comments that the PG edition of _The Memoirs of Sherlock Holmes_ is yet another of these hybrid editions. Lastly, in August of this year Mr. Newby wrote: > We don't enforce any adherance to a particular printed edition,...and > do have a number of frankentexts that have benefitted from different > sources. I conclude that there are an unknown but significant number of these hybrid texts included in the PG corpus, for whatever that is worth. Again, I want to point out that I do not necessarily think that these hybrid texts are a bad thing. The Penguin Classics edition of _Frankenstein_ was clearly an edited, modernized version, but there is no acknowledgment of that in their book, nor any identification of /their/ source materials. I think that BowerBird was absolutely correct when he replied that the metadata for any PG text should be: Publisher: Project Gutenberg Publishing year: whenever you downloaded it Publishing place: http://www.gutenberg.org Possible edition and series information: the file name Possible editor/translator info: Anonymous Project Gutenberg volunteers. This is as good as you're going to get from any print publisher; why should you expect better metadata from e-texts published by Project Gutenberg? Now it may be that you have misconstrued Project Gutenberg as an electronic archive or electronic library.
Rather than rendering my own opinion, let me just encourage you to consider the hallmarks of an electronic archive, an electronic library, and an electronic publisher, and come to your own conclusions. From hart at pglaf.org Fri Dec 21 12:33:18 2007 From: hart at pglaf.org (Michael Hart) Date: Fri, 21 Dec 2007 12:33:18 -0800 (PST) Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <476C1C89.7000809@novomail.net> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> <4769A7BA.1070102@novomail.net> <20071220160825.GA32015@ark.in-berlin.de> <476C1C89.7000809@novomail.net> Message-ID: As far as I know, every version of every Project Gutenberg eBook is still in the Project Gutenberg archives. In cases where I have found missing eBooks right after the books were posted even, I have been able to retrieve these missing eBooks from various overnight archives. While I am sure someday someone MIGHT find a missing one-- I can also tell you that NOT ONCE has ONE person asked for a 1988 copy of Alice In Wonderland, etc. While I do not engage in dancing on pinheads, I am not the kind of person who forbids it, either. There is a better record of the history of our eBooks than in any other eLibrary, once completed. . .but as for those who did which pages, the answer is simple. . .a very large amount of the early work was done by. . .anonymous. In the particular case mentioned, I may even know who that anonymous volunteer was. But I'm not going to ever violate that confidence in these current times, or perhaps even after. mh From hart at pglaf.org Fri Dec 21 12:36:54 2007 From: hart at pglaf.org (Michael Hart) Date: Fri, 21 Dec 2007 12:36:54 -0800 (PST) Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass?
Message-ID: On Fri, 21 Dec 2007, Lee Passey wrote: > For the record, I have absolutely no desire to become one of > "The Powers That Be" at Project Gutenberg. And I certainly > haven't ever advocated the adoption by PG any particular > standard or guideline, let alone my own. You certainly want Project Gutenberg to be more like YOUR image of what Project Gutenberg SHOULD be. . .period. However, you think that you can hide behind NOT making an effective statement of your own. . .thus simply adding an assortment of verbiage to the noise level, no signal just more noise. . . . Just because your statements of desire are fuzzy does not make them non-existent. > I /do/ believe that the existence of well-publicized standards > and guidelines is a necessary prerequisite to effect quality and > control, This is the advocation of a standard, however fuzzy those advocations may be. . . . > and I believe that PG would benefit from the adoption of such > standards and guidelines, no matter what they may be, Everyone could run Project Gutenberg better than we do. Obviously. And we invite you to do it. However, if fuzzy woolgathering is all you have in mind, I think we have better things to do. > but then I'm not particularly interested in improving Project > Gutenberg, either. Then what are you doing here? Is naysaying all you have in mind? If you are "not particularly interested in improving Project Gutenberg, either," then all you are doing is making noise. > I long ago realize the futility of any such attempt. This presumes that you were here "long ago," but I don't see any books contributed from the direction you are proposing-- or indeed anything with your name on it--for the past year. Should I presume you may have contributed anonymously? Or is your resistance really totally futile?
> Like BowerBird, I /would/ like the PG corpus (body of works > stored in the PG database) to be internally consistent; that way > I can steal some of its components and write software to convert > them into something more useful. Then you /should/ do something to exemplify your desires. How can anyone evaluate what you say without such examples? Perhaps that is your reason for avoiding being specific? > But internal consistency would require standards, and we should > all recognize by now that /that/ ain't gonna happen. It certainly won't happen if you won't even create examples. > [In a message posted this last October Mr. Hutchinson suggested > that the PG white-washers would reject any submission that was > not accompanied by a degraded text version, or at least > something that could be confused for a degraded text version > (I'm assuming something using z.m.l. or reStructured text would > probably pass muster). If this is true, PG does, in fact, have > some sort of standards, it's just that no one in the PG > organization is willing to admit it and those wielding the power > haven't been identified.] The WhiteWashers are not the only way eBooks get donated. Need examples? Just look at all the books coming out this week. . . . > My comments are mostly intended to reinforce the message of > Michael Hart: OH OH!!! I sense more words being stuffed into my mouth. > > PG has no standards, PG will never have standards, Not standards that can be forced on volunteers, only standards that are more on the order of suggestions, excepting legal and other standards of a minimal nature. > PG won't even make suggestions to help the volunteers learn how > to create e-texts for fear that they might be misconstrued as > organizational standards. What Mr. Passey fears is that there are too many standards, not that there aren't any at all. . . . Any volunteer can simply pick out a favorite book and decide, for him/herself, that this is how they want THEIR books to be.
But no one else gets to decide for them, unless they WANT to be a member of a certain group. . .still voluntary. > If you are a volunteer who feels s/he could work better in an > organization that provides guidance and quality control, you > should find some other organization to work with. Actually, you should just form your own group, if none of the already existing sub-groups of Project Gutenberg suit you. Mr. Passey is confusing freedom and independence with the lack of any standards whatsoever. . .I think this happened before. > (Any suggestions as to what other organizations meet these > requirements would be welcome; Distributed Proofreaders is an > obvious option). Distributed Proofreaders came about in exactly the manner _I_ have been describing, from WITHIN Project Gutenberg and quite WITHOUT any need for such nasty commentaries. Distributed Proofreaders is a GREAT example of how volunteers create their own standards, their own groups, etc., with lots of help from "The Powers That Be." > If you are a volunteer who does not want the quality work you > have performed to be subverted and degraded, you should try to > find a more appropriate repository for your work. All any volunteer has to do is SAY inside their eBook that they don't want it changed. . .period. Mr. Passey is sooo off base here, it's not even a red herring. > (Internet Archive?) Project Gutenberg has worked with The Internet Archive all along, but their goals are not nearly identical. Project Gutenberg is much more about full text documents-- well proofread--with user generated error correction. Mr. Passey will not find this in Google, Yahoo, Amazon, or even Sony, or too many other eBook sources. Just WHY is Mr. Passey saying all this, anyway? He SAYS he has no hope, wants to make no contribution. Is his ONLY goal the destruction of Project Gutenberg by getting all the volunteers to desert?
> If you are a volunteer who has ideas about how e-texts can be > improved, or how the process of creating e-texts can be > improved, you are welcome and encouraged to drag your soap box > to any forum on the internet you can find (including this one) > but don't expect any support from PG beyond the obvious "we > support your right to express your opinion." At least you won't get censored here, as Mr. Noring proposed, as Mr. Ockerbloom was famous for. . .we even let Mr. Passey rant and rave to his heart's content. Why? We don't believe in censorship. Censorship of you. Censorship of him. Personally, I think he is complaining because we don't pick one standard and censor all the others. Personally, I don't think THE standard for eBooks exists yet. But I think it will become obvious when it does. I was unwilling to force HTML on our volunteers and readers when Sir Tim Berners-Lee invited me, and Project Gutenberg, to be one of the charter members of The World Wide Web, and I stand behind that decision today, simply because I am not a purveyor of standards, and neither is Project Gutenberg. Whatever standards emerge from the real world are just fine. If Mr. Passey is unwilling to provide examples of standards, then it is highly unlikely that he will ever get exactly the standards he wants, or anything close to it. I was involved with Unicode people when it was being set up, and I can tell you it was a zoo, same with TEI, ZML, ZML, and all the rest. The time wasted was enormous. I'm glad I didn't get involved more than I did. When I realized NONE of my suggestions would be taken I just decided to do it on my own, which is exactly what Mr. Passey and Mr. Noring should be doing. . .should have done long ago. If YOU won't put YOUR time, effort, and money where your mouth has put in so much mileage. . .why should anyone else? The reason The Web succeeded, browsers succeeded, and eBooks, is that the people doing them didn't wait for approval. . . . "Just DO It!"
From vze3rknp at verizon.net Fri Dec 21 13:23:28 2007 From: vze3rknp at verizon.net (Juliet Sutherland) Date: Fri, 21 Dec 2007 16:23:28 -0500 Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? In-Reply-To: References: Message-ID: <476C2ED0.5030406@verizon.net> Michael Hart wrote: > Distributed Proofreaders came about in exactly the manner _I_ > have been describing, from WITHIN Project Gutenberg and quite > WITHOUT any need for such nasty commentaries. > > Distributed Proofreaders is a GREAT example of how volunteers > create their own standards, their own groups, etc., with lots > of help from "The Powers That Be." > Well, that's one version of history. Charles Franks (founder of DP) was a PG volunteer who decided to try to make a better way of proofreading books for PG. Aside from the usual "let lots of flowers bloom" statements from PG there was NO early support. Charles did all the coding, ran the software on his own server, etc. When I went looking for DP in April of 2002, having remembered that it had been mentioned on gutvol-d several years before (2000), I could find no mention of it at all on the PG website. From PG it sure looked like DP didn't exist. In the public interviews and other publicity that Michael Hart did for PG in 2002 and 2003, at least those that I was aware of, Michael never once mentioned DP. However, in the summer of 2002, PGLAF did provide a high-speed, destructive scanner setup for Charles Franks. And at the beginning of 2003 provided a second setup. The Internet Archive provided our next server, around Sept. 2002 or so. When we could no longer afford "free" service from IA, PGLAF bought DP a server and paid for hosting. So I'm not saying that PG has been unsupportive of DP. Once it was apparent that DP was thriving and would produce lots of material for PG, there was plenty of support. But for those first 2.5 years it was different. 
Today, most of the DP volunteers still arrive via the banner at PG and most are strongly supportive of PG and its mission. JulietS Distributed Proofreaders From jon at noring.name Fri Dec 21 13:44:02 2007 From: jon at noring.name (Jon Noring) Date: Fri, 21 Dec 2007 14:44:02 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <476C0617.1000503@novomail.net> References: <476C0617.1000503@novomail.net> Message-ID: <191898394.20071221144402@noring.name> Lee wrote: > I /do/ believe that the existence of well-publicized standards and > guidelines is a necessary prerequisite to effect quality and > control, and I believe that PG would benefit from the adoption of > such standards and guidelines, no matter what they may be, but then > I'm not particularly interested in improving Project Gutenberg, > either. I long ago realize the futility of any such attempt. Lee summarizes the two approaches to running a "movement" like PG: 1) It is Not Good for a "movement" like PG to require any standards/ guidelines beyond the absolute minimum necessary to define the project and to protect it from any legal problems. In the case of PG, the absolute minimum necessary is a) to receive and make available for free download textual content in human readable plain text form. (This defines the historical core of PG's vision.) b) to copyright clear all texts submitted to the archive. (This protects PG from its enemies, real and imagined.) From what Michael has said over the years, I have the impression, right or wrong, that he believes *any* organizational standard/ requirement beyond the above will harm the vision and goals of PG. 2) It is Good for a "movement" like PG to have a clear-cut set of standards/guidelines necessary to assure collection uniformity, consumer quality, and to enhance the collection's repurposeability. 
> Like BowerBird, I /would/ like the PG corpus (body of works stored > in the PG database) to be internally consistent; that way I can > steal some of its components and write software to convert them into > something more useful. But internal consistency would require > standards, and we should all recognize by now that /that/ ain't > gonna happen. It's interesting that I think everyone, including Michael, would love for the PG corpus to be internally consistent to *something*. However, I would have to assume that Michael believes it is preferable for PG to have major inconsistencies in the collection rather than for PG to lay down a set of requirements (beyond the current minimalist ones) which would have been necessary to greatly improve the internal consistency of the collection. Now the supporters of PG's minimalist approach will point to the collection and say "look at the size of it! The minimalist approach works!" Those who believe in having at least a few requirements to meet the goal of consistency will point to the collection and say "Look at the size of this mess! The minimalist approach has failed." Since we can't rewind the clock to the early 90's and restart a project like PG with a few more requirements, all we can do is speculate where the collection would have been today -- alternative history sort of thing. However, we can certainly lend data to the speculation by looking at Distributed Proofreaders. DP has by now passed all other sources of PG texts, and they collectively work under a set of guidelines that are stricter than PG's itself. (Now we can argue whether or not their work product is consistent enough, but that's not germane to this discussion -- they clearly have a few requirements beyond PG's, and they clearly are producing a hell of a lot of texts at a pace that is far outstripping everyone else. And, I don't see a problem with DP from an organizational sense -- they are not moving to the "dark side" or anything.) 
So several of us believe that PG's corpus could have been just as large today, yet be at a higher level of consistency, usability, and repurposeability, had PG from the beginning issued a few more requirements for the texts submitted to it. Since hindsight is 20-20, and we can't rewind the clock, there's no need to beat a dead horse. We can only look to the future and decide what is best to do. I see two options for PG: 1) No change in the collection requirements and keep collecting it from anywhere and everywhere. 2) Require all new texts to meet a few new requirements, and then encourage the older texts to be reworked as necessary to meet the new requirements. The problem I see in all the discussions the last few years is that we tend to confuse "option #1 vs. option #2" with "if option #2, what should be the requirements." Until Michael and Greg decide that they are serious *to do something* so as to improve the PG collection's long-term consistency, we can talk all we want about what #2 should entail. But it is sort of useless (from the perspective of improving PG at least) until those who control access to the archive (Michael and Greg) get serious and clear as to what they really want. By their seeming silence on what they want I can only assume that either they are satisfied with the way things are, or they are hoping someone will come along (e.g. Bowerbird) and wave a magic wand without them having to establish any more "requirements", and all will be made well. ***** As an aside, maybe what is needed is a new text archive (and Michael and Greg enthusiastically support others to do this!) which sets a few more collection requirements, and invites contributions. 
So those who now contribute to PG can consider striving to make the texts they produce *also* meet the requirements of the new archive (and I think the requirements will not be that difficult to meet nor conflict with PG's minimal requirements), and submit their texts to *both* PG and to the new archive. PG gets what they want, and those of us who believe a text archive should meet a set of certain requirements beyond what PG sets, get what we want. This is really not a competition, but rather should be something where everybody wins. > PG has no standards, PG will never have standards, PG won't even > make suggestions to help the volunteers learn how to create e-texts > for fear that they might be misconstrued as organizational standards. This is my conclusion, too. > If you are a volunteer who feels s/he could work better in an > organization that provides guidance and quality control, you should > find some other organization to work with. (Any suggestions as to > what other organizations meet these requirements would be welcome; > Distributed Proofreaders is an obvious option). You know, just as I have been advocating for a while that we separate the digital text master format from the delivery format(s), maybe we need to separate in our mind "digital text archive" from "digital text production". In essence, PG is evolving towards this model where it simply is the YouTube equivalent for textual content: "just dump your stuff here!". DP has played a major role in this paradigm shift. > If you are a volunteer who has ideas about how e-texts can be > improved, or how the process of creating e-texts can be improved, > you are welcome and encouraged to drag your soap box to any forum on > the internet you can find (including this one) but don't expect any > support from PG beyond the obvious "we support your right to express > your opinion." 
Of course, I suggest the "Digital Text Community" as the place to discuss new ideas for the digitization of "ink-on-paper" texts. Here's the URL to the home page: http://groups.yahoo.com/group/digital-text/ The gutvol-* groups are really for internal discussions relating to the operation of PG itself. Overall, they support a specific project, and are not neutral meeting places for other projects which digitize and archive texts totally apart from PG. It's amazing how many projects have joined DTC. There are people from PG and DP there, of course. Jon Noring From jon at noring.name Fri Dec 21 14:01:09 2007 From: jon at noring.name (Jon Noring) Date: Fri, 21 Dec 2007 15:01:09 -0700 Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? In-Reply-To: References: Message-ID: <1103068917.20071221150109@noring.name> [posted publicly to gutvol-d and cc: to Michael] Michael wrote: > Lee Passey wrote: >> (Any suggestions as to what other organizations meet these >> requirements would be welcome; Distributed Proofreaders is an >> obvious option). > Distributed Proofreaders came about in exactly the manner _I_ > have been describing, from WITHIN Project Gutenberg and quite > WITHOUT any need for such nasty commentaries. > > Distributed Proofreaders is a GREAT example of how volunteers > create their own standards, their own groups, etc., with lots > of help from "The Powers That Be."
Michael and Greg: Supposing someone were to: 1) set up an *online archive*, totally independent from PG, and invited submissions of digital texts which meet a certain set of guidelines the new archive has established, and 2) Personally contacted many of the current contributors of texts to PG, asking them to contribute their texts to the new and independent archive *in addition* to contributing them to PG, and 3) The new archive will NEVER ever mention PG nor mention that some of the texts may also have been contributed to PG, Would you make a request to the volunteer contributors not to contribute to the new archive so long as they contribute to PG? [Note, it would be made TOTALLY clear to the volunteer contributors that the new archive has no ties to PG in any manner whatsoever, so don't worry about that issue in your reply to my question.] Jon Noring From jon at noring.name Fri Dec 21 14:17:09 2007 From: jon at noring.name (Jon Noring) Date: Fri, 21 Dec 2007 15:17:09 -0700 Subject: [gutvol-d] obsessive-compulsive delightful In-Reply-To: References: Message-ID: <114738108.20071221151709@noring.name> Bowerbird wrote: > Jon Nirng wrote: > i am informed that jon noring said: >> The term "anal-compulsive" is uncalled for. > first of all, _i_ can decide for myself what words are "called for" > to describe my positions. but hey jon, thanks for the feedback. Well, ultimately it is not up to you or me to decide whether the disparaging use of the phrase "anal-compulsive" is uncalled for. It is up to the rest of the readers here to decide that for themselves. All I know from running a lot of lists over the years is that those who oppose hostile-tone speech (which is what I believe your speech amounted to) far outnumber those who believe it to be acceptable (and some believe it to even be useful.)
Since those who run this list apparently believe it's alright for people to express themselves in a quite hostile manner (so long as they don't go way off the deep end), I don't plan to email them and ask them to do something. I don't administer or moderate this list, I'm simply a guest in their house as we all are... Jon Noring From gbnewby at pglaf.org Fri Dec 21 14:24:38 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Fri, 21 Dec 2007 14:24:38 -0800 Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? In-Reply-To: <1103068917.20071221150109@noring.name> References: <1103068917.20071221150109@noring.name> Message-ID: <20071221222438.GA14265@mail.pglaf.org> On Fri, Dec 21, 2007 at 03:01:09PM -0700, Jon Noring wrote: > [posted publicly to gutvol-d and cc: to Michael] > > > Michael wrote: > > Lee Passey wrote: > > >> (Any suggestions as to what other organizations meet these > >> requirements would be welcome; Distributed Proofreaders is an > >> obvious option). > > > Distributed Proofreaders came about in exactly the manner _I_ > > have been describing, from WITHIN Project Gutenberg and quite > > WITHOUT any need for such nasty commentaries. > > > > Distributed Proofreaders is a GREAT example of how volunteers > > create their own standards, their own groups, etc., with lots > > of help from "The Powers That Be." 
> > Michael and Greg: > > Supposing someone were to: > > 1) set up an *online archive*, totally independent from PG, and > invited submissions of digital texts which meet a certain set of > guidelines the new archive has established, and > > 2) Personally contacted many of the current contributors of texts to > PG, asking them to contribute their texts to the new and > independent archive *in addition* to contributing them to PG, and > > 3) The new archive will NEVER ever mention PG nor mention that some of > the texts may also have been contributed to PG, > > Would you make a request to the volunteer contributors not to > contribute to the new archive so long as they contribute to PG? Of course not. * We'll even help someone do #s 1-3, if desired * How much more encouragement do you expect to get? It's surprising to see these questions asked, given that over and over, everyone has been encouraged to do it his or her own way. The encouragement hasn't included stipulations or limitations. -- Greg > [Note, it would be made TOTALLY clear to the volunteer contributors > that the new archive has no ties to PG in any manner whatsoever, so > don't worry about that issue in your reply to my question.] > > > Jon Noring > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Fri Dec 21 14:49:13 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 21 Dec 2007 17:49:13 EST Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? Message-ID: juliet said: > Once it was apparent that DP was thriving > and would produce lots of material for PG, > there was plenty of support. "plenty of support" from an organization that has _zero_budget_, and hasn't paid its own founder much (if anything?) in several years is probably not something that should be sneezed at... > But for those first 2.5 years it was different.
how many books did you digitize in those first 2.5 years? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/57fab6be/attachment.htm From Bowerbird at aol.com Fri Dec 21 14:59:09 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 21 Dec 2007 17:59:09 EST Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? Message-ID: greg said: > It's surprising to see this questions asked, given that over and over, > everyone has been encouraged to do it his or her own way. and i've noted here, several times, that this offer is quite genuine. i've given project gutenberg tons and tons of constructive criticism, and they've responded by offering me free webspace and bandwidth. this is even after i've explicitly said that it is my fullest expectation and specific intention that -- upon my "repurposing" of the p.g. e-texts -- my cyberlibrary will usurp project gutenberg as the most useful one... and they're, like, "great! how can we help?" i know, i know, it seems strange to me too. but what can you say? :+) > The encouragement hasn't included stipulations or limitations. nope, it hasn't. -bowerbird p.s. besides, it's a stupid question because everyone knows that you can repurpose _any_ public-domain e-text from project gutenberg, simply by stripping off the p.g. legalese. and every e-text tells you so. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/67a8d42d/attachment.htm From gbnewby at pglaf.org Fri Dec 21 14:59:17 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Fri, 21 Dec 2007 14:59:17 -0800 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <191898394.20071221144402@noring.name> References: <476C0617.1000503@novomail.net> <191898394.20071221144402@noring.name> Message-ID: <20071221225917.GB14265@mail.pglaf.org> I'm just responding to a few points, but leaving in the whole context. Skip down 100 lines for my responses: On Fri, Dec 21, 2007 at 02:44:02PM -0700, Jon Noring wrote: > Lee wrote: > > > I /do/ believe that the existence of well-publicized standards and > > guidelines is a necessary prerequisite to effect quality and > > control, and I believe that PG would benefit from the adoption of > > such standards and guidelines, no matter what they may be, but then > > I'm not particularly interested in improving Project Gutenberg, > > either. I long ago realized the futility of any such attempt. > > Lee summarizes the two approaches to running a "movement" like PG: > > 1) It is Not Good for a "movement" like PG to require any standards/ > guidelines beyond the absolute minimum necessary to define the > project and to protect it from any legal problems. > > In the case of PG, the absolute minimum necessary is > > a) to receive and make available for free download textual > content in human readable plain text form. (This defines > the historical core of PG's vision.) > > b) to copyright clear all texts submitted to the archive. > (This protects PG from its enemies, real and imagined.) > > From what Michael has said over the years, I have the impression, > right or wrong, that he believes *any* organizational standard/ > requirement beyond the above will harm the vision and goals of PG.
> > 2) It is Good for a "movement" like PG to have a clear-cut set of > standards/guidelines necessary to assure collection uniformity, > consumer quality, and to enhance the collection's repurposeability. > > > > Like BowerBird, I /would/ like the PG corpus (body of works stored > > in the PG database) to be internally consistent; that way I can > > steal some of its components and write software to convert them into > > something more useful. But internal consistency would require > > standards, and we should all recognize by now that /that/ ain't > > gonna happen. > > It's interesting that I think everyone, including Michael, would love > for the PG corpus to be internally consistent to *something*. > > However, I would have to assume that Michael believes it is preferable > for PG to have major inconsistencies in the collection rather than for > PG to lay down a set of requirements (beyond the current minimalist > ones) which would have been necessary to greatly improve the internal > consistency of the collection. > > Now the supporters of PG's minimalist approach will point to the > collection and say "look at the size of it! The minimalist approach > works!" > > Those who believe in having at least a few requirements to meet the > goal of consistency will point to the collection and say "Look at the > size of this mess! The minimalist approach has failed." > > Since we can't rewind the clock to the early 90's and restart a > project like PG with a few more requirements, all we can do is > speculate where the collection would have been today -- alternative > history sort of thing. > > However, we can certainly lend data to the speculation by looking at > Distributed Proofreaders. DP has by now passed all other sources of PG > texts, and they collectively work under a set of guidelines that are > stricter than PG's itself. 
(Now we can argue whether or not their work > product is consistent enough, but that's not germane to this > discussion -- they clearly have a few requirements beyond PG's, and > they clearly are producing a hell of a lot of texts at a pace that is > far outstripping everyone else. And, I don't see a problem with DP > from an organizational sense -- they are not moving to the "dark side" > or anything.) > > So several of us believe that PG's corpus could have been just as > large today, yet be at a higher level of consistency, usability, and > repurposeability, had PG from the beginning issued a few more > requirements for the texts submitted to it. > > Since hindsight is 20-20, and we can't rewind the clock, there's no > need to beat a dead horse. We can only look to the future and decide > what is best to do. I see two options for PG: > > 1) No change in the collection requirements and keep collecting it > from anywhere and everywhere. > > 2) Require all new texts to meet a few new requirements, and then > encourage the older texts to be reworked as necessary to meet the > new requirements. > > The problem I see in all the discussions the last few years is that we > tend to confuse "option #1 vs. option #2" with "if option #2, what > should be the requirements." You're mischaracterizing #1. #2 has happened, and continues to happen, in many ways. You're also not distinguishing a few important things: a. collection development policy versus technical requirements for submissions b. within-eBook quality versus cross-collection consistency > Until Michael and Greg decide that they are serious *to do something* > so as to improve the PG collection's long-term consistency, we can > talk all we want about what #2 should entail. I think you mostly care about consistency. Sorry, but Michael and I really don't, in terms of within-eBook content. (We do have consistent headers, filenames, etc., as mentioned below.) 
> But it is sort of useless (from the perspective of improving PG at > least) until those who control access to the archive (Michael and > Greg) get serious and clear as to what they really want. By their > seeming silence on what they want I can only assume that either they > are satisfied with the way things are, or they are hoping someone will > come along (e.g. Bowerbird) and wave a magic wand without them having > to establish any more "requirements", and all will be made well. Please don't interpret my silence on this thread, or others, as meaningful. As mentioned frequently, I almost never read anything authored by you, or by a few other people who post to gutvol-d. This is my choice, and means I'm sometimes not tuned into whatever discussion is going on. Even when I'm following a thread, I sometimes stop myself from responding...mostly because I don't want to write something that will be interpreted as policy, when it was really just an opinion. More on this theme: > As an aside, maybe what is needed is a new text archive (and Michael > and Greg enthusiastically support others to do this!) which sets a few > more collection requirements, and invites contributions. You keep asking, and we keep saying, "yes," then the cycle repeats of expressing dissatisfaction with the way things are. There has never yet been substantial action. We've been doing this for years. This is not positive reinforcement for me to continue to engage in the discussion. If you were DOING something, you'd get a lot more of my attention (for whatever that might be worth). (Yes, I know you've DONE a few things! But mostly it's just talk...and essentially the same talk, over and over.) A fact to consider is that gutvol-d only has a few hundred people on it. The subscribership is relatively flat (a few people coming & going over time, but basically the same # of subscribers since the start of the list 7ish years ago).
That's fewer than the unique # of individuals who have submitted eBooks, just in 2007 (around 300). Outside of those contexts, Michael and I are in ongoing discussions with any number of people who want to do interesting stuff with the PG content, or start their own affiliated site, or have other ideas. Recycling the same tired discussions on gutvol-d is just not very compelling to me, and it has not helped PG achieve much. > So those who now contribute to PG can consider striving to make the > texts they produce *also* meet the requirements of the new archive > (and I think the requirements will not be that difficult to meet nor > conflict with PG's minimal requirements), and submit their texts to > *both* PG and to the new archive. PG gets what they want, and those of > us who believe a text archive should meet a set of certain > requirements beyond what PG sets, get what we want. So do it!!!! > This is really not a competition, but rather should be something where > everybody wins. Are you sure you believe that? It seems above that you're implying that everyone LOSES if PG doesn't follow the types of guidelines you've advocated. > > PG has no standards, PG will never have standards, PG won't even > > make suggestions to help the volunteers learn how to create e-texts > > for fear that they might be misconstrued as organizational standards. > > This is my conclusion, too. Your definition of standards is not my definition of standards. PG has quite a few. But they don't cover a variety of items you seem to be interested in, such as: - maintaining provenance of sources - particular markup / layout standards And are very permissive for things you would like less permissive: - various formats accepted - various content types - and yes, varying quality in presentation, proofreading, etc. 
And yet: - we have a fixed format & set of rules for copyright clearances - we have gutcheck and a variety of other automated programs - we have a set of file naming procedures - we have a unified catalog - we produce valid HTML, reasonably sized images (in subdirectories), provide conversion on the fly to various formats and lots more. "No standards" and "will never have standards" is presumably a reference to some sorts of standards you care about. I don't accept it as accurate for the PG collection. As you said, DP is even stricter in what they produce...standards for items to pass the final PPV check, as well as per-book standards that producers apply. > > If you are a volunteer who feels s/he could work better in an > > organization that provides guidance and quality control, you should > > find some other organization to work with. (Any suggestions as to > > what other organizations meet these requirements would be welcome; > > Distributed Proofreaders is an obvious option). > > You know, just as I have been advocating for a while that we separate > the digital text master format from the delivery format(s), maybe we > need to separate in our mind "digital text archive" from "digital text > production". So do it!!! > In essence, PG is evolving towards this model where it simply is the > YouTube equivalent for textual content: "just dump your stuff here!". > > DP has played a major role in this paradigm shift. I don't think it's fair or accurate to say that DP has helped PG shift towards a "just dump your stuff here" model, nor that it is the model PG has. I don't really know what you're talking about with this, actually. The "just dump your stuff here" model is, as far as I can tell, being pursued by eBook initiatives from Google, Yahoo, the IA and spinoffs (including OCA), where the emphasis has been on mass scanning of entire shelves, then raw OCR + page scans are presented as an eBook. "Dump" is an overstatement, but not much of one. 
> > If you are a volunteer who has ideas about how e-texts can be > > improved, or how the process of creating e-texts can be improved, > > you are welcome and encouraged to drag your soap box to any forum on > > the internet you can find (including this one) but don't expect any > > support from PG beyond the obvious "we support your right to express > > your opinion." > > Of course, I suggest the "Digital Text Community" to be the place to > discuss new ideas for the digitization of "ink-on-paper" texts. Here's > the URL to the home page: > > http://groups.yahoo.com/group/digital-text/ > > The gutvol-* groups are really for internal discussions relating to > the operation of PG itself. They overall support a specific project, > and are not neutral meeting places for other projects which digitize > and archive texts totally apart from PG. > > It's amazing how many projects have joined DTC. There are people from > PG and DP there, of course. > > Jon Noring -- Greg From jon at noring.name Fri Dec 21 16:35:05 2007 From: jon at noring.name (Jon Noring) Date: Fri, 21 Dec 2007 17:35:05 -0700 Subject: [gutvol-d] Why wait till we have to work from bookworm frass? In-Reply-To: <20071221225917.GB14265@mail.pglaf.org> References: <476C0617.1000503@novomail.net> <191898394.20071221144402@noring.name> <20071221225917.GB14265@mail.pglaf.org> Message-ID: <659760181.20071221173505@noring.name> Greg wrote: > Jon wrote: First, I appreciate your reply, Greg! > You're also not distinguishing a few important things: > > a. collection development policy versus technical requirements for > submissions > b. within-eBook quality versus cross-collection consistency Well, everything gets even finer than this, but I did not want to delve into the specifics since my discussion wanted to look at the more global aspects. For example, some aspects of "within e-book quality" definitely have an impact on "cross-collection consistency." 
(Of course, I can already see that we can have different definitions as to what "cross-collection consistency" means. I think this aspect of the discussion should probably not continue.) I would like to understand PG's "official collection development policy." If this is spelled out at the PG site (a Google search turned up nothing using that phrase), a link to it would be appreciated. I have an idea what it is, but since a collection development policy is clearly an organizational policy, the official policy has to originate from PGLAF. >> As an aside, maybe what is needed is a new text archive (and >> Michael and Greg enthusiastically support others to do this!) which >> sets a few more collection requirements, and invites contributions. > You keep asking, and we keep saying, "yes," > then the cycle repeats of expressing dissatisfaction with the > way things are. There has never yet been substantial action. > We've been doing this for years. Well, I did say above "and Michael and Greg enthusiastically support others to do this!", so I got the message. My other message, where I ask a couple hypotheticals about starting a project and inviting contributions of texts which are also being contributed to PG, was asked to absolutely clarify your "yes" because of concerns shared with me in private by a couple other individuals. So just because you say "yes" does not mean everyone interprets "yes" in the same way. I hope that the positive answer you provided to me in private will be also posted to gutvol-d since there are a number of people who like to hear it as a sort of "official" statement. I request that it be written and posted to the PG site. A simple paragraph would suffice. Of course, I have my own suggested wording, but whatever is put down in writing has to come from PGLAF: "An important mission of Project Gutenberg is to assure the Public Domain is completely digitized and freely available to all.
Thus, Project Gutenberg fully and enthusiastically encourages others to start independent archives which freely distribute public domain texts. PG invites all who contribute public domain texts to other archives, to also contribute them to Project Gutenberg's archive. This assures wider distribution and archival redundancy. Likewise, PG also supports contributors to PG's archive to also consider donating their texts to other archives. PG has minimal format requirements, but does require copyright clearance." Or something like that. (I'm sure the intent can be written much more cleanly and succinctly than I wrote it.) > And are very permissive for things you would like less permissive: > > - various formats accepted > - various content types > - and yes, varying quality in presentation, proofreading, etc. Well, since we are trying to understand each other, I have no difficulty with multiple formats. In fact the more derivatives the better. > I don't think it's fair or accurate to say that DP has helped PG > shift towards a "just dump your stuff here" model, nor that it is > the model PG has. The word "dump" is probably harsh, but it cannot be denied that as an organization, PG is not itself digitizing texts. At the most, on the text production end, it is encouraging people to digitize texts and then asking them to donate the texts so long as they meet some minimal requirements of format, plus pass copyright clearance. Thus, I group PG along with IA/OCA, Google Books and YouTube, among other content archives. It is interesting that YouTube is, in some respects, not much different from PG in terms of how they collect and distribute content. By its marketing, YouTube is saying "please give us your video!" 
and seems to have pretty minimal requirements regarding format (video is much more repurposeable than texts, though), and to meet copyright law (if PG is rigorous on anything, though, it is with copyright clearance, and this is one thing which PG does very very well, at least as seen from my vantage point.) PG, by its well-understood mission, is asking people to donate texts to it for archiving and distribution. In fact, the lack of anything other than minimal text format requirements only reinforces the view that PG is primarily a text archive. Even PGLAF, by its name "literary text archive" acknowledges that the real focus of PG is on archival and delivery of texts, and not on production of texts itself. Now there is NOTHING wrong with PG being considered primarily a text archive. It's a good thing. It is through the archive that the texts make it to the world. Jon Noring From Bowerbird at aol.com Fri Dec 21 17:18:30 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 21 Dec 2007 20:18:30 EST Subject: [gutvol-d] is it december 10th yet? Message-ID: hmm... it appears i missed my annual december 10th post to michael hart. darn! so... sorry this is late, michael, i guess i was busy; you know how time flies when you're having fun... anyway... michael hart, thank you for bringing project gutenberg into the world! you birthed electronic-books, michael, and -- along with them -- boosted the very concept of _unlimited_distribution_via_cyberspace_. let the record show -- clearly -- that many of the earliest e-texts were created when you -- michael hart, a semi-dyslexic? -- _typed_them_in._ and the first 1,000 e-texts from others were _proofed_ by you too, until you simply could not keep up with the increasing pace of submissions... why? because you assimilated a community of people around you who were eager to share, and help you create, your envisioned cyberlibrary. you built it. and they came. 
and is it really only 4 years ago we were celebrating number 10,000? when -- this year -- you are celebrating (give or take some) 25,000? not that numbers matter anymore. it's the idea, and the idea is loose, and the reason the idea is loose is that you put the idea into motion... while lots of people might have been talking about electronic-libraries way back when -- i know _i_ was -- _you_ sat down and set to typing, and you and your typing made all the difference, my friend, _all_ of it... google may have more books than you, but they got the idea from you. lots of people have a finger in the pie now -- the google boys, brewster, bezos, adobe's sharks, you name it -- but you _baked_ that pie, michael. so god bless you, michael hart. god bless you for what you've done. i love you, man. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/5abdbdd1/attachment.htm From lee at novomail.net Fri Dec 21 18:04:07 2007 From: lee at novomail.net (Lee Passey) Date: Fri, 21 Dec 2007 19:04:07 -0700 Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? In-Reply-To: References: Message-ID: <476C7097.9070106@novomail.net> Michael Hart wrote: > On Fri, 21 Dec 2007, Lee Passey wrote: > >> For the record, I have absolutely no desire to become one of >> "The Powers That Be" at Project Gutenberg. And I certainly >> haven't ever advocated the adoption by PG of any particular >> standard or guideline, let alone my own. > > You certainly want Project Gutenberg to be more like YOUR > image of what Project Gutenberg SHOULD be . . . period. Well, in the sense of "it sure would be nice if Project Gutenberg could furnish e-texts that I might find useful," that might be true.
In the sense of "Project Gutenberg should remake itself in whatever image it thinks I'm advocating," certainly not. The success or failure of Project Gutenberg is of virtually no consequence to me, and I certainly wouldn't presume to substitute my judgment for that of those to whom Project Gutenberg /is/ of consequence. No, really. I'm not suggesting you do anything at all. I may express opinions as to the likelihood of certain consequences following from certain behaviors, but I have no opinion at all as to whether those consequences are desirable or not, or should be sought or avoided. Those kinds of judgments are yours, and yours alone. [snip] >> I /do/ believe that the existence of well-publicized standards >> and guidelines is a necessary prerequisite to effect quality and >> control, > > This is the advocation of a standard, however fuzzy those > advocations may be. . . . No, it is the statement of a causal relationship. I make no value judgments. >> and I believe that PG would benefit from the adoption of such >> standards and guidelines, no matter what they may be, Ok, /this/ is a value judgment, but it's only my values. I'm not in charge, you are, and I respect your right to run Project Gutenberg according to your values. [snip] >> but then I'm not particularly interested in improving Project >> Gutenberg, either. > > Then what are you doing here? An interesting question. I am deeply committed to e-books, and making quality digital versions of important literature available to the public. And to be honest, I'm getting tired of hearing things in the press like "I downloaded an e-book from Project Gutenberg, and obviously e-books will never be anywhere near as good as paper books." So one thing I want to do is speak out whenever I see the emperor without clothes. I want to be sure everyone understands what Project Gutenberg is, and is not, and if I can accomplish that in your own words, so much the better.
I don't want people coming to Project Gutenberg and thinking that the quality of its work product is indicative of e-books in general. And of course, just about anyone who is interested in e-books is going to pass through here, so poking my head up here from time to time is a way to do a little social networking, and maybe make some contacts with like-minded individuals. If I can find some individual who is interested in producing quality e-books, and I can help him or her understand how to build a better mousetrap, then I have accomplished something. > Should I presume you may have contributed anonymously? No, you should presume that I'm not particularly interested in throwing my work product into the PG mill. My stuff is out on the internet, and if it interests you enough to find it you're welcome to it. >> Like BowerBird, I /would/ like the PG corpus (body of works >> stored in the PG database) to be internally consistent; that way >> I can steal some of its components and write software to convert >> them into something more useful. > > Then you /should/ do something to exemplify your desires. I have, and I continue to do so. I have frequently offered markup advice to people who are faced with particularly knotty problems, and in general it has been well received. I have written all sorts of software designed to facilitate the creation of quality e-books, and have offered it here on the mailing list. I've recreated several works that PG also has in its corpus and placed them on my web site for critique and evaluation. The fact that I'm not doing any of the things /you/ want me to be doing does not mean I'm doing nothing. [snip] > The WhiteWashers are not the only way eBooks get donated. > > Need examples? > > Just look at all the books coming out this week. . . . Rather than pointing to examples of books which have managed to avoid the white-washer gauntlet, it would be more useful to explain the process used to avoid that gauntlet.
I'm sure I'm not the only one interested in the answer to /that/ question. >> My comments are mostly intended to reinforce the message of >> Michael Hart: > > OH OH!!! > > I sense more words being stuffed into my mouth. > >> PG has no standards, PG will never have standards, > > Not standards that can be forced on volunteers, only standards > that are more on the order of suggestions, excepting legal and > other standards of a minimal nature. > >> PG won't even make suggestions to help the volunteers learn how >> to create e-texts for fear that they might be misconstrued as >> organizational standards. > > What Mr. Passey fears is that there are too many standards, > not that there aren't any at all. . . . Now who's putting words into whose mouth? :-) But although you still seem a little unclear on the concept, you are mostly right. A standard has to be explicitly definable, if not defined, and can be defined as well by what it is not as by what it is. A "standard" that includes everything is, patently, no standard at all. Now what I really would like (just to be clear, I'm speaking in general; I'm not asking for PG to do anything to satisfy this desire) is a corpus of works which are susceptible to repurposing via automated data processing. Works which are usable /by/ computers, not just by humans who happen to be using computers. Not only is the concept of thousands of "standards" meaningless, it's virtually impossible to do anything with. If I want to automatically extract the author and title from every work in my mythical corpus, and every one of them follows a different standard in identifying those two data points, it's impractical, if not impossible, to accomplish my desired task. Even if I /were/ able to write a thousand programs to match my thousand "standards" I would need some mechanism to know which text follows which standard so I know which program to apply to the particular file.
In other words, I would need one, or some limited number, of meta-standards. Project Gutenberg is not the corpus I need. I'm not saying that it should be; I'm just saying that it's not. And anyone who comes to PG thinking that PG e-texts /can/ be used by computers, and not just humans, should be rapidly disabused of that notion. [snip] > Mr. Passey is confusing freedom and independence with the lack > of any standards whatsoever . . . I don't think so. If a "standard" isn't published it can't be a standard, and if everyone is not only free to ignore a published "standard", and does, in fact, ignore that "standard", then in fact there are no standards. Anarchy and conformism are not necessarily diametrically opposed, but they're pretty darn close. [snip] > Project Gutenberg has worked with The Internet Archive all along, > but their goals are not nearly identical. Indeed. That is why it may be a more appropriate repository for those people interested in preserving the Public Domain. I was proposing the Internet Archive as an alternative to Project Gutenberg, not as a companion. [snip] > Is his ONLY goal the destruction of Project Gutenberg > by getting all the volunteers to desert? I certainly would not encourage /anyone/ committed to digitizing paper books to abandon Project Gutenberg unless they had found some other organization which better suited their own preferences. > Personally, I think he is complaining because we don't pick > one standard and censor all the others. I'm sorry, but I can't help thinking here about BowerBird's Rorschach blots ... > Personally, I don't think THE standard for eBooks exists yet. Nor do I. I /do/ think that several GOOD standards for e-books exist, any one of which would be reasonable to adopt. I even think it would be acceptable to choose a half-dozen of them with the requirement that whichever one you choose you clearly identify your choice and then use it exclusively. > But I think it will become obvious when it does. I disagree.
I think e-book standards will continue to evolve as hardware, software, and our understanding of natural languages improve. > I was unwilling to force HTML on our volunteers and readers > when Sir Tim Berners-Lee invited me, and Project Gutenberg, > to be one of the charter members of The World Wide Web, and > I stand behind that decision today, simply because I am not > a purveyor of standards, and neither is Project Gutenberg. Hopefully, you've made that perfectly clear. I understand this, and I hope everyone else does as well. > Whatever standards emerge from the real world are just fine. > > If Mr. Passey is unwilling to provide examples of standards, > then it is highly unlikely that he will ever get exactly the > standards he wants, or anything close to it. XHTML, TEI, DocBook, z.m.l., OEB ... take your pick. > I was involved with Unicode people when it was being set up, > and I can tell you it was a zoo, same with TEI, ZML, and > all the rest. > > The time wasted was enormous. And yet, some pretty good standards emerged. It was obviously not a waste of time for /everyone/ involved. And now I can leverage all the good work they did! Time well spent, if you ask me. [snip] > The reason The Web succeeded, browsers succeeded, and eBooks, > is that the people doing them didn't wait for approval. . . . The reason the web and browsers succeeded is because Sir Tim Berners-Lee invented the HyperText Markup Language and the HyperText Transfer Protocol, and everyone agreed to use them. The reason e-books /haven't/ succeeded is because everyone insists on doing things their own way. > "Just DO It!" Sounds like good advice to me. From hart at pglaf.org Fri Dec 21 20:02:43 2007 From: hart at pglaf.org (Michael Hart) Date: Fri, 21 Dec 2007 20:02:43 -0800 (PST) Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass?
In-Reply-To: <476C7097.9070106@novomail.net> References: <476C7097.9070106@novomail.net> Message-ID: On Fri, 21 Dec 2007, Lee Passey wrote: > Michael Hart wrote: > >> On Fri, 21 Dec 2007, Lee Passey wrote: >> >>> For the record, I have absolutely no desire to become one of >>> "The Powers That Be" at Project Gutenberg. And I certainly >>> haven't ever advocated the adoption by PG of any particular >>> standard or guideline, let alone my own. >> >> You certainly want Project Gutenberg to be more like YOUR >> image of what Project Gutenberg SHOULD be . . . period. > > Well, in the sense of "it sure would be nice if Project > Gutenberg could furnish e-texts that I might find useful," > that might be true. In the sense of "Project Gutenberg should > remake itself in whatever image it thinks I'm advocating," > certainly not. The success or failure of Project Gutenberg is > of virtually no consequence to me, and I certainly wouldn't > presume to substitute my judgment for that of those to whom > Project Gutenberg /is/ of consequence. Well said! > No, really. I'm not suggesting you do anything at all. I may > express opinions as to the likelihood of certain consequences > following from certain behaviors, but I have no opinion at > all as to whether those consequences are desirable or not, or > should be sought or avoided. Those kinds of judgments are > yours, and yours alone. Well. . .I try to leave that to the judgement of those who do the actual choosing and working to create the books. I would not be comfortable telling someone how to prepare the favorite books of an entire lifetime; it should be "a labor of love," so to speak, not merely following a checklist. If making eBooks were only following a standards checklist for xxx number of pages, then it could easily be done by programs and we could just dump all the text in those programs, but the result would be much like artificial reading voices.
[Footnote: as many of you know, _I_ don't personally care in any way about the "appearance" or "look and feel" of eBooks-- I tend to SEE the book as a vision rather than as words on the page or screen. . .hence my inhibitions, as it were, about telling people what/how to do are more along the lines of personal respect for individual human beings. Perhaps that last part said it best after all these years.] > [snip] > >>> I /do/ believe that the existence of well-publicized >>> standards and guidelines is a necessary prerequisite to >>> effect quality and control, >> >> This is the advocation of a standard, however fuzzy those >> advocations may be. . . . > > No, it is the statement of a causal relationship. I make no > value judgments. Some people make statements they do not view as judgemental in cases where others see them as VERY judgemental. That is also perhaps better said than I managed before, but I hesitate to include every case in that statement. >>> and I believe that PG would benefit from the adoption of >>> such standards and guidelines, no matter what they may be, > > Ok, /this/ is a value judgment, but it's only my values. That's all I was trying to say. . . . Sorry it took so much time and effort for us to manage it. > I'm not in charge, you are, and I respect your right to run > Project Gutenberg according to your values. I still don't think I am running Project Gutenberg according, as you say, to my values, other than that my values are to value the people who volunteer to help Project Gutenberg. After all, I have nothing to offer them but my thanks and my respect, along with a chance to perhaps change the world in a way not seen since The Gutenberg Press. > > [snip] > >>> but then I'm not particularly interested in improving >>> Project Gutenberg, either. >> >> Then what are you doing here? > > An interesting question. I am deeply committed to e-books, > and making quality digital versions of important literature > available to the public.
And to be honest, I'm getting tired > of hearing things in the press like "I downloaded an e-book > from Project Gutenberg, and obviously e-books will never be > anywhere near as good as paper books." You'll probably have plenty of time to get even more tired of comments such as those, as it appears to me that most of such comments come from people who WANT eBooks to fail. . . ! I'm absolutely sure the same comments were made during shifts in paradigms from stone to parchment to papyrus to paper, and from tablets to scrolls to books to eBooks. Naysaying is just part of their tactics and strategies. The answer lies, as always, in the cost/benefit ratio. The reason Project Gutenberg has survived better than eBooks from commercial sources is just that. . .cost benefit ratio. Project Gutenberg can't fail. . .at least until someone else figures out a way to make eBooks ubiquitous. > So one thing I want to do is speak out whenever I see the > emperor without clothes. I want to be sure everyone > understands what Project Gutenberg is, and is not, and if I > can accomplish that in your own words, so much the better. I > don't want people coming to Project Gutenberg and thinking > that the quality of its work product is indicative of e-books > in general. I wouldn't either. . .but for the opposite reasons. I don't like the other eBooks as well. Tell me, does anyone else hand out 35 million eBooks per year? Even those who claim to have millions to hand out? Not to mention that those 35 million are from 25,000 titles. > And of course, just about anyone who is interested in e-books > is going to pass through so poking my head up here from time > to time is a way to do a little social networking, and maybe > make some contacts with like-minded individuals. If I can > find some individual who is interested in producing quality > e-books, and I can help him or her understand how to build a > better mousetrap, then I have accomplished something. 
Everyone has a different idea of the ideal mouse trap, so I am trying to leave the door open for all of them. I certainly to not expect eBooks to LOOK the same in 100 years or perhaps even in 10 years. But I should hope the underlying text would still be 99.99 the same characters that the original sources had. >> Should I presume you may have contributed anonymously? > > No, you should presume that I'm not particularly interested > in throwing my work product into the PG mill. My stuff is out > on the internet, and if it interests you enough to find it > you're welcome to it. OK, I'll see what I can do. Any particular credit line you would like? >>> Like BowerBird, I /would/ like the PG corpus (body of works >>> stored in the PG database) to be internally consistent; >>> that way I can steal some of its components and write >>> software to convert them into something more useful. If it's TOO consistent, then it's just preprogrammed output. I was hoping for something better than a Xerox machine. However, I would actually accept eBooks made by machine as long as they were 99.99% accurate. After all, it's the books that matter, not our pride. >> Then you /should/ do something to exemplify your desires. > > I have, and I continue to do so. I have frequently offered > markup advice to people who are faced with particularly > knotty problems, and in general it has been well received. There is one place we differ, I don't care much about markup and I apologize if that causes a rift between us or others-- I hate to say it in at least one manner. . .but some friends I really like are VERY into markup and appearance, but I may be just "old school" enough NOT to want to judge a book by a collection of appearance variables. . .rather than content. What I see everywhere are comments on FORM, APPEARANCE, that stuff that goes down to JUDGING A BOOK BY ITS COVER. I'm just not that sort of person. . . . 
> I have written all sorts of software designed to facilitate > the creation of quality e-books, and have offered it here on the > mailing list. > > I've recreated several works that PG also has in its corpus > and placed them on my web site for critique and evaluation. > > The fact that I'm not doing any of the things /you/ want me > to be doing does not mean I'm doing nothing. Again, my apologies; those kinds of standards are just not the part of books that interests me. I hope you realize I'm not being judgemental here; I just see something other than that stuff when I read a book. > [snip] > >> The WhiteWashers are not the only way eBooks get donated. >> >> Need examples? >> >> Just look at all the books coming out this week. . . . > > Rather than pointing to examples of books which have managed > to avoid the white-washer gauntlet, it would be more useful > to explain the process used to avoid that gauntlet. I'm sure > I'm not the only one interested in the answer to /that/ > question. The simple answer, as always, is just to contact Newby or myself. >>> My comments are mostly intended to reinforce the message of >>> Michael Hart: >> >> OH OH!!! >> >> I sense more words being stuffed into my mouth. >> >>> PG has no standards, PG will never have standards, >> >> Not standards that can be forced on volunteers, only >> standards that are more on the order of suggestions, >> excepting legal and other standards of a minimal nature. >> >>> PG won't even make suggestions to help the volunteers learn >>> how to create e-texts for fear that they might be >>> misconstrued as organizational standards. >> >> What Mr. Passey fears is that there are too many standards, >> not that there aren't any at all. . . . > > Now who's putting words into whose mouth? :-) I am interpreting what you have said as best I know now, but I am the first to admit that the speaker is the ONLY one who can know exactly what they meant. However, when I disagree, I do ask for literal translations.
;-) > But although you still seem a little unclear on the concept, > you are mostly right. Thank you for being so kind as to say that so publicly. More thanks! > A standard has to be explicitly definable, if not defined, > and can be defined as well by what it is not as by what it > is. A "standard" that includes everything is, patently, no > standard at all. As you may know, I tried to keep the standards obvious so it would be possible for anyone on any hardware/software combo-- past, present or future--to create PG eBooks. I wasn't about to rule out whole portions of the population. > Now what I really would like (just to be clear, I'm speaking > in general; I'm not asking for PG to do anything to satisfy > this desire) is a corpus of works which are susceptible to > repurposing via automated data processing. Works which are > usable /by/ computers, not just by humans who happen to be > using computers. Actually, I agree here more than you might think. However, I think what Bowerbird, you, and others wanted was something that could be totally automated. I wanted something that required the preservation of just an exemplary touch of humanity. . .not 100% automation. However, you will probably get your wish in your lifetime. MY wish was simply to break down the bars of ignorance and of illiteracy for the world at large, not for computers. Even so, I was adamant that computers without any special software or hardware commitment should be able to read them. > Not only is the concept of thousands of "standards" > meaningless, it's virtually impossible to do anything with. Well, you might think that "thousands of `standards'" might be enough of a "reductio ad absurdum/infinitum" to ward off any possible contradiction. . . . . .the truth is that with minimal standards you can never rule out how many thousands of standards might fit in a way that both people and machines can easily read.
> If I want to automatically extract the author and title from > every work in my mythical corpus, and every one of them > follows a different standard in identifying those two data > points, it's impractical, if not impossible, to accomplish my > desired task. Even if I /were/ able to write a thousand > programs to match my thousand "standards" I would need some > mechanism to know which text follows which standard so I know > which program to apply to the particular file. In other > words, I would need one, or some limited number, of > meta-standards. Sorry, you lost me there. I'm just talking about reading the books. > Project Gutenberg is not the corpus I need. I'm not saying > that it should be, I'm just saying that it's not. And anyone > who comes to PG thinking that PG e-texts /can/ be used by > computers, and not just humans, should be rapidly disabused > of that notion. I disagree, as do many programmers who use our eBooks. > > [snip] > >> Mr. Passey is confusing freedom and independence with the >> lack of any standards whatsoever . . . > > I don't think so. If a "standard" isn't published it can't be > a standard, and if everyone is not only free to ignore a > published "standard", and does, in fact, ignore that > "standard" then in fact there are no standards. Anarchy and > conformism are not necessarily diametrically opposed, but > they're pretty darn close. It's just that the standards are so simple, not that they were never published. . .and that we don't force them on volunteers. > > [snip] > >> Project Gutenberg has worked with The Internet Archive all >> along, but their goals are not nearly identical. > > Indeed. That is why it may be a more appropriate repository > for those people interested in preserving the Public Domain. > I was proposing the Internet Archive as an alternative to > Project Gutenberg, not as a companion. Well, we've been companions since before The Internet Archive even got famous, so it's a little late for that.
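Lee's argument that a pile of incompatible header "standards" still needs a meta-standard before any automated extraction is possible can be sketched in Python. Everything below is hypothetical and invented for illustration: the `Convention:` first line, the two header conventions ("labeled" and "byline"), and all function names. None of it is a Project Gutenberg, ZML, or other real format. The sketch only shows that a program can choose the right parser when each file declares its convention, and has to give up when it does not.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# (author, title); either may be None if a parser cannot find it.
Meta = Tuple[Optional[str], Optional[str]]


def parse_labeled(body: str) -> Meta:
    """Hypothetical convention "labeled": 'Title: ...' and 'Author: ...' lines."""
    title = re.search(r"^Title:[ \t]*(.+)$", body, re.MULTILINE)
    author = re.search(r"^Author:[ \t]*(.+)$", body, re.MULTILINE)
    return (author.group(1).strip() if author else None,
            title.group(1).strip() if title else None)


def parse_byline(body: str) -> Meta:
    """Hypothetical convention "byline": title on the first non-blank line,
    'by <author>' on the next non-blank line."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    if len(lines) < 2 or not lines[1].lower().startswith("by "):
        return (None, None)
    return (lines[1][3:].strip(), lines[0])


# One parser per convention: the "thousand programs", cut down to two.
PARSERS: Dict[str, Callable[[str], Meta]] = {
    "labeled": parse_labeled,
    "byline": parse_byline,
}


def extract_metadata(text: str) -> Meta:
    """The meta-standard: line 1 must name the convention the rest follows.
    Without that declaration there is no reliable way to pick a parser."""
    first, _, body = text.partition("\n")
    m = re.fullmatch(r"Convention:\s*(\w+)\s*", first)
    if m is None or m.group(1) not in PARSERS:
        raise ValueError("file does not declare a known convention")
    return PARSERS[m.group(1)](body)


if __name__ == "__main__":
    a = "Convention: labeled\nTitle: Pride and Prejudice\nAuthor: Jane Austen\n"
    b = "Convention: byline\n\nMoby Dick\nby Herman Melville\n"
    print(extract_metadata(a))  # ('Jane Austen', 'Pride and Prejudice')
    print(extract_metadata(b))  # ('Herman Melville', 'Moby Dick')
```

The registry keyed by a declared name is one simple way to express "some limited number of meta-standards"; a file that skips the declaration raises an error instead of being guessed at, which is exactly the failure Lee describes for an undeclared corpus.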
> > [snip] > >> Is his ONLY goal the destruction of Project Gutenberg by >> getting all the volunteers to desert? > > I certainly would not encourage /anyone/ committed to > digitizing paper books to abandon Project Gutenberg unless > they had found some other organization which better suited > their own preferences. Well said! >> Personally, I think he is complaining because we don't pick >> one standard and censor all the others. > > I'm sorry, but I can't help thinking here about BowerBird's > Rorschach blots ... Even Bowerbird would prefer just one standard, though he will work with the simple standards mentioned above, and gives the credit for them where credit is due. >> Personally, I don't think THE standard for eBooks exists >> yet. > > Nor do I. I /do/ think that several GOOD standards for > e-books exist, any one of which would be reasonable to adopt. > I even think it would be acceptable to choose a half-dozen of > them with the requirement that whichever one you choose you > clearly identify your choice then use it exclusively. I think one has to be VERY careful when assigning standards. VERY careful. More than I could do with anything that wasn't VERY simple. And don't forget the time factor. . . . >> But I think it will become obvious when it does. > > I disagree. I think e-book standards will continue to evolve > as hardware, software, and our understanding of natural > languages improve. I think that eventually eBooks will settle into patterns much the way paper books did. Look at the early ones, all over the place, in size, paper and binding, fonts, inks, and everything else. That's the way pioneers are. Later on comes the pressure for everyone to be alike. . . . And the pioneers either die out or move on.
>> I was unwilling to force HTML on our volunteers and readers >> when Sir Tim Berners-Lee invited me, and Project Gutenberg, >> to be one of the charter members of The World Wide Web, and >> I stand behind that decision today, simply because I am not >> a purveyor of standards, and neither is Project Gutenberg. > > Hopefully, you've made that perfectly clear. I understand > this, and I hope everyone else does as well. So glad. >> Whatever standards emerge from the real world are just fine. >> >> If Mr. Passey is unwilling to provide examples of standards, >> then it is highly unlikely that he will ever get exactly the >> standards he wants, or anything close to it. > > XHTML, TEI, DocBook, z.m.l., OEB ... take your pick. Sad to say, at least SOME of the people behind those WANT the standards THEY developed to ELIMINATE all other standards. I've asked them in person. . . . >> I was involved with Unicode people when it was being set up, >> and I can tell you it was a zoo, same with TEI, ZML, and >> all the rest. >> >> The time wasted was enormous. > > And yet, some pretty good standards emerged. It was obviously > not a waste of time for /everyone/ involved. And now I can > leverage all the good work they did! Time well spent, if you > ask me. If only you said the same about Project Gutenberg. . . eh? ;-) > > [snip] > >> The reason The Web succeeded, browsers succeeded, and >> eBooks, is that the people doing them didn't wait for >> approval. . . . > > The reason the web and browsers succeeded is because Sir Tim > Berners-Lee invented the HyperText Markup Language and the > HyperText Transfer Protocol, and everyone agreed to use them. > The reason e-books /haven't/ succeeded is because everyone > insists on doing things their own way. Actually, I think it was as much the invention of browsers, search engines, etc., that did it. . . . It could have been ANY markup system. . .well, not ANY, but MANY. . . . > >> "Just DO It!" > > Sounds like good advice to me.
I would be more than happy to assist you in doing it, if you would allow me. . . . Thanks!!! Michael S. Hart Founder Project Gutenberg Recommended Books: Dandelion Wine, by Ray Bradbury: For The Right Brain Atlas Shrugged, by Ayn Rand: For The Left Brain [or both] Diamond Age, by Neal Stephenson: To Understand The Internet The Phantom Tollbooth, by Norton Juster: Lesson of Life. . . From hart at pglaf.org Fri Dec 21 20:16:07 2007 From: hart at pglaf.org (Michael Hart) Date: Fri, 21 Dec 2007 20:16:07 -0800 (PST) Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? In-Reply-To: <476C2ED0.5030406@verizon.net> References: <476C2ED0.5030406@verizon.net> Message-ID: Sorry, but Juliet wasn't there when I went to see Charles in Las Vegas [where he lived then, but has since moved] to see about building foundations for Distributed Proofreaders. I asked Charles on any number of occasions what he wanted from both myself and Project Gutenberg, and my silence was at his request or I would have said much more. I asked again and again, and cannot recall even one time a request made by Charles was not honored. The truth is that he wanted DP to appear to stand alone-- as much as possible--and Greg and I did as requested to a point of keeping much quieter than at least _I_ would. Personally, I am very proud of Distributed Proofreaders-- more than anyone but myself could possibly know, and I do mention DP just about every time I give a presentation. However, I was asked NOT to say much in the Newsletter or other similar media. . .so I didn't. . . . THAT is how independent we allow the volunteers to be. Something Mr. Noring may have just read in a reply PG CEO Greg Newby just made to his remarks of today. We are MORE than glad to help such projects behind the scenes, or to give them maximum PR, or anywhere in between. Thanks!!! Michael S.
Hart Founder Project Gutenberg On Fri, 21 Dec 2007, Juliet Sutherland wrote: > > > Michael Hart wrote: >> Distributed Proofreaders came about in exactly the manner _I_ >> have been describing, from WITHIN Project Gutenberg and quite >> WITHOUT any need for such nasty commentaries. >> >> Distributed Proofreaders is a GREAT example of how volunteers >> create their own standards, their own groups, etc., with lots >> of help from "The Powers That Be." >> > Well, that's one version of history. > > Charles Franks (founder of DP) was a PG volunteer who decided to try to > make a better way of proofreading books for PG. Aside from the usual > "let lots of flowers bloom" statements from PG there was NO early > support. Charles did all the coding, ran the software on his own server, > etc. When I went looking for DP in April of 2002, having remembered that > it had been mentioned on gutvol-d several years before (2000), I could > find no mention of it at all on the PG website. From PG it sure looked > like DP didn't exist. In the public interviews and other publicity that > Michael Hart did for PG in 2002 and 2003, at least those that I was > aware of, Michael never once mentioned DP. > > However, in the summer of 2002, PGLAF did provide a high-speed, > destructive scanner setup for Charles Franks. And at the beginning of > 2003 provided a second setup. The Internet Archive provided our next > server, around Sept. 2002 or so. When we could no longer afford "free" > service from IA, PGLAF bought DP a server and paid for hosting. So I'm > not saying that PG has been unsupportive of DP. Once it was apparent > that DP was thriving and would produce lots of material for PG, there > was plenty of support.
But for those first 2.5 years it was different. > Today, most of the DP volunteers still arrive via the banner at PG and > most are strongly supportive of PG and its mission. > > JulietS > Distributed Proofreaders > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Fri Dec 21 20:22:12 2007 From: hart at pglaf.org (Michael Hart) Date: Fri, 21 Dec 2007 20:22:12 -0800 (PST) Subject: [gutvol-d] obsessive-compulsive delightful In-Reply-To: <114738108.20071221151709@noring.name> References: <114738108.20071221151709@noring.name> Message-ID: On Fri, 21 Dec 2007, Jon Noring wrote: > Bowerbird wrote: >> Jon Noring wrote: > >> i am informed that jon noring said: > >>> The term "anal-compulsive" is uncalled for. > >> first of all, _i_ can decide for myself what words are >> "called for" >> to describe my positions. but hey jon, thanks for the >> feedback. > > Well, ultimately it is not up to you or me to decide whether > the disparaging use of the phrase "anal-compulsive" is > uncalled for. It is up to the rest of the readers here to > decide that for themselves. > > All I know from running a lot of lists over the years is that > those who oppose hostile-tone speech (which is what I believe > your speech amounted to) far outnumber those who believe it > to be acceptable (and some believe it to even be useful.) > > Since those who run this list apparently believe it's alright > for people to express themselves in a quite hostile manner > (so long as they don't go way off the deep end), I don't plan > to email them and ask them to do something. I don't > administer or moderate this list, I'm simply a guest in their > house as we all are... > > > Jon Noring Jon Noring is as hostile as anyone on this list, yet still welcome, after all these years of hostility, mixed with other things. Mr.
Noring was responsible for the ONLY "moderation" of anyone, ever, on this list, and yet is still welcome. Greg Newby and I still hope he will take that energy and focus, so we can assist him in something more productive. Michael From hart at pglaf.org Fri Dec 21 20:29:57 2007 From: hart at pglaf.org (Michael Hart) Date: Fri, 21 Dec 2007 20:29:57 -0800 (PST) Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? In-Reply-To: <1103068917.20071221150109@noring.name> References: <1103068917.20071221150109@noring.name> Message-ID: I would LOVE it if Mr. Noring would make such an effort. I would LOVE to help, and for PG to help. I wouldn't want to be called on to lie about helping him. In fact, after Juliet's remarks earlier today about early support of Distributed Proofreaders being lacking, I must state that I feel I should be cautious in the future. However, as far as I am concerned, and from what I read a moment ago from Dr. Newby, there is no question that a project, or projects, would be Mr. Noring's, and he will be welcome to the lion's share of the credit. Again, I don't want to actually LIE and say I didn't help in any way whatsoever, but, as with Juliet's example, I am willing to stand so far in the background that I might be accused by those who come later of not helping. Michael On Fri, 21 Dec 2007, Jon Noring wrote: > [posted publicly to gutvol-d and cc: to Michael] > > > Michael wrote: >> Lee Passey wrote: > >>> (Any suggestions as to what other organizations meet these >>> requirements would be welcome; Distributed Proofreaders is an >>> obvious option). > >> Distributed Proofreaders came about in exactly the manner _I_ >> have been describing, from WITHIN Project Gutenberg and quite >> WITHOUT any need for such nasty commentaries. >> >> Distributed Proofreaders is a GREAT example of how volunteers >> create their own standards, their own groups, etc., with lots >> of help from "The Powers That Be."
> > Michael and Greg: > > Supposing someone were to: > > 1) set up an *online archive*, totally independent from PG, and > invited submissions of digital texts which meet a certain set of > guidelines the new archive has established, and > > 2) Personally contacted many of the current contributors of texts to > PG, asking them to contribute their texts to the new and > independent archive *in addition* to contributing them to PG, and > > 3) The new archive will NEVER ever mention PG nor mention that some of > the texts may also have been contributed to PG, > > Would you make a request to the volunteer contributors not to > contribute to the new archive so long as they contribute to PG? > > [Note, it would be made TOTALLY clear to the volunteer contributors > that the new archive has no ties to PG in any manner whatsoever, so > don't worry about that issue in your reply to my question.] > > > Jon Noring > > From jon at noring.name Fri Dec 21 22:06:35 2007 From: jon at noring.name (Jon Noring) Date: Fri, 21 Dec 2007 23:06:35 -0700 Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? In-Reply-To: References: <476C7097.9070106@novomail.net> Message-ID: <15110491966.20071221230635@noring.name> Michael wrote in reply to Lee: > There is one place we differ, I don't care much about markup > and I apologize if that causes a rift between us or others-- > I hate to say it in at least one manner. . .but some friends > I really like are VERY into markup and appearance, but I may > be just "old school" enough NOT to want to judge a book by a > collection of appearance variables. . .rather than content. Well, the markup proponents here on gutvol-d, in general, fall into two camps as Joshua has observed: 1) Use of markup in a presentational sense to emulate a particular look/feel in presentation, usually some sort of reproduction of the original. 2) Use of markup to specify the various document structures and important inline semantics. 
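Jon's distinction between the two markup camps can be made concrete with a short Python sketch using the standard library's `xml.etree.ElementTree`. The element names (`chapter`, `title`, `para`, `emph`) and both output styles are assumptions chosen for illustration; they are not TEI, DocBook, or ZML rules. The point is only that a source recording *what* each piece is (camp #2) can be turned into different presentations mechanically, whereas a source recording only appearance (say, "bold, 14pt") leaves a program guessing which runs are headings and which are merely emphasized.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Callable

# Structural ("camp #2") source: it says WHAT each piece is, not how it looks.
# The element names are invented for this sketch; they are not TEI or DocBook.
SOURCE = """<chapter>
  <title>Chapter I</title>
  <para>It was a <emph>dark</emph> night.</para>
</chapter>"""


def inline(elem: ET.Element, open_mark: str, close_mark: str,
           esc: Callable[[str], str]) -> str:
    """Flatten mixed content, wrapping <emph> children in the given marks."""
    parts = [esc(elem.text or "")]
    for child in elem:
        inner = esc(child.text or "")
        parts.append(open_mark + inner + close_mark if child.tag == "emph" else inner)
        parts.append(esc(child.tail or ""))
    return "".join(parts)


def to_html(root: ET.Element) -> str:
    """One derivative: HTML for a screen."""
    out = []
    for child in root:
        if child.tag == "title":
            out.append("<h2>" + escape(child.text or "") + "</h2>")
        elif child.tag == "para":
            out.append("<p>" + inline(child, "<em>", "</em>", escape) + "</p>")
    return "\n".join(out)


def to_plain(root: ET.Element) -> str:
    """Another derivative: plain text, heading upper-cased, emphasis as _x_."""
    out = []
    for child in root:
        if child.tag == "title":
            out.append((child.text or "").upper())
        elif child.tag == "para":
            out.append(inline(child, "_", "_", lambda s: s))
    return "\n\n".join(out)


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(to_html(root))
    print(to_plain(root))
```

Both outputs come from the same source without any hand editing; adding a third derivative (for instance, cues for a text-to-speech engine) would mean writing one more small function rather than re-marking the text.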
#2 is what makes the text highly repurposeable when done right. Even Bowerbird's ZML falls into #2. His rules for formatting the plain text using white space characters are intended to communicate certain document structures, so ZML is classifiable as "markup camp #2". I fall into camp #2 quite firmly, as everyone knows by now. > What I see everywhere are comments on FORM, APPEARANCE, that > stuff that goes down to JUDGING A BOOK BY ITS COVER. > > I'm just not that sort of person. . . . Actually, Michael, in that regard you, Lee and I think much alike. (And so does Bowerbird with his ZML, where nearly all original typography is stripped out, leaving only content and document structure.) Give me the content (structured is better) and then I can repurpose it any way I want, including viewing it (or listening to it) formatted the way *I want*. And imagine being blind. What value does typography have for books where typography itself is only there for visual presentational purposes? I've actually described a test to show the real importance of typography to content comprehension. Read the book to your child, or to a blind person. How often do you have to explain the typography itself in order to communicate the content of the book? How often do you say "oh, and this paragraph is in 12-pt Garamond"? Or, "and this paragraph is indented 1.0 em"? This is a very useful test for helping make a number of decisions when massaging digital texts, such as markup. (Yes, one role of visual typography is to communicate structure by understood conventions. But communicating structure is NOT content; it describes what the content is, and that can be communicated in other ways depending upon the milieu.) Now I understand the love some people have for the typography of ye olde books. I've been planning for a while the typography for the limited print edition of the "1001 Arabian Nights" project I've been working on for a number of years (I've been playing with InDesign, for example.)
So I understand the joy of visual typography, and of beautifully crafted paper books. It's a lot of fun... Btw, I probably will use Adobe Garamond for the font of this book. :^) As a final note, and one which Lee explained a while back quite convincingly: when we have properly structured text, it is possible to repurpose it into an exact or near-exact digital reproduction of the original typography -- if one wants. And without a lot of work. But if one starts with highly-presentational markup, the text is NOT very repurposeable without having to expend a lot of work. (E.g., the HTML version of Burton's Arabian Nights in the PG collection is a nightmare to repurpose and I gave up using it as my starting point. Since then I've learned that the source used for the PG version is not the original source, and it made significant amendments to the original text, such as removal of non-Latin characters in footnotes!, etc.) Now it is possible to mix both presentational markup and structural markup, and seemingly have it both ways. In the "master-derivative" approach later described, embedding some original typography is alright, but I tend to be ambivalent about it, mainly because I think that those interested in a true reproduction are in the minority of users, so I question the extra work of adding presentational stuff during the mastering process. Enuf said for the moment...) >> If I want to automatically extract the author and title from >> every work in my mythical corpus, and every one of them >> follows a different standard in identifying those two data >> points, it's impractical, if not impossible, to accomplish my >> desired task. Even if I /were/ able to write a thousand >> programs to match my thousand "standards" I would need some >> mechanism to know which text follows which standard so I know >> which program to apply to the particular file. In other >> words, I would need one, or some limited number, of >> meta-standards. > Sorry, you lost me there.
> > I'm just talking about reading the books. Well, this is an important topic. I think most everyone producing e-books, even non-programmers, should understand what Lee is saying -- and it is especially important for long-term digital preservation of e-books. But it is also a topic that would take a whole long message in itself to explain (I started writing up something and realized it was getting very long very quickly, and I'm not sure how to restate it in a brief paragraph. Maybe Lee can recast what he said without using too many paragraphs.) > Even Bowerbird would prefer just one standard, though he will > work with the simple standards mentioned above, and gives the > credit for them where credit is due. With regard to Bowerbird's ZML, I've probably been the #1 supporter of his ZML (to his chagrin), since it is an attempt at regularizing plain text. (Where we differ is that he envisions it being *the* format for nearly all texts under the sun -- to be both master and derivative, etc. -- and I see it as being insufficient for that purpose.) But if PG is to continue to include a plain text version for each book where a plain text is even possible (and that's nearly all books), I do think it a good idea if the plain text is regularized in some fashion. And ZML is the only candidate out there at this time. (There are problems with relying upon any text regularization that Lee elaborated upon in the past, but I won't repeat them here. But if the regularized plain text is intended just for direct reading purposes, and not as a critical component in machine repurposing, I don't have any problems.) > Sadly to say, at least SOME of the people behind those WANT the > standards THEY developed to ELIMINATE all other standards. > > I've asked them in person. . . . Well, I hope you don't think I'm in that group. I take a different tack.
I'd like to see the "master-derivative" approach taken, where the book is mastered in TEI (and properly done per camp #2), then use that to auto-convert to a wide-range of end-user derivative formats for reading purposes. Certainly we could and should, as a matter of policy, create one or more permanent derivative versions to sit side-by-side with the master, such as regularized plain text. In fact, I've been intrigued with the "raw text master" approach, something I may discuss here another time. Jon Noring From ralf at ark.in-berlin.de Fri Dec 21 23:45:14 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Sat, 22 Dec 2007 08:45:14 +0100 Subject: [gutvol-d] TEI corpora with public collaboration In-Reply-To: <476C0617.1000503@novomail.net> References: <476C0617.1000503@novomail.net> Message-ID: <20071222074514.GA26939@ark.in-berlin.de> Lee writes: > If you are a volunteer who feels s/he could work better in an > organization that provides guidance and quality control, you should find > some other organization to work with. (Any suggestions as to what other > organizations meet these requirements would be welcome; There may even be one that meets your standards, if one is interested in Irish history, literature, or politics: http://celt.ucc.ie It combines proofreading with TEI SGML formatting. ralf From ralf at ark.in-berlin.de Fri Dec 21 23:53:17 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Sat, 22 Dec 2007 08:53:17 +0100 Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> <4769A7BA.1070102@novomail.net> <20071220160825.GA32015@ark.in-berlin.de> <476C1C89.7000809@novomail.net> Message-ID: <20071222075317.GB26939@ark.in-berlin.de> Michael Hart wrote > There is a better record of the history of our eBooks than > in any other eLibrary, once completed. Not so. While I won't state that history info can't be researched with PG, such info can be accessed much more easily with Wikisource texts. They have other problems, however. ralf From Bowerbird at aol.com Sat Dec 22 11:06:21 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 22 Dec 2007 14:06:21 EST Subject: [gutvol-d] some things never change (winter solstice edition) Message-ID: some things never change... david rothman has returned to labeling the o.l.p.c. machine as "the $100 laptop" in the headline on his latest blog entry on it > http://www.teleread.org/blog/2007/12/21/needed-asap-on-the-100-laptop-fbreader-and-easy-opera-installation/ because, as he "explains" in the text, that is the price that he "hopes" that it'll reach "some day". some people never learn. wayan vota, from the unofficial o.l.p.c. news site, said this recently: > A lot of OLPC's problems today date back to the original buzz > about the "$100 laptop," in Vota's opinion. With the price point > capturing attention, OLPC didn't speak to the concept of this > being a revolution in education, Vota said. He contends that > now that OLPC has successfully created interest in the technology, > it should focus on empowering education with technology. you can find that quote at: > http://www.pcworld.com/article/id,140698-c,notebooks/article.html so it seems that vota agrees with me that david's price-spin is a bad idea.
*** meanwhile, jon noring is inventing a new need for a new format, this one involving annotations. find the write-up on the teleblawg: > http://www.teleread.org/blog/2007/12/21/be-my-pal-call-for-annotationlinking-open-standard/ (the long u.r.l. titles are a cheap way to enhance david's google-juice.) and no, of course we don't need a new format for this kind of thing. we can use the existing infrastructure to make it happen quite easily, if we ignore format junkies and have programmers make the tools... meanwhile, the clumsy kids over at the teleblawg enact a funny skit that specifically involves _annotations_ (and yes, that _is_ so ironic), in that -- after robert nagle and jon noring did an "upgrade" to the wordpress templates -- the blawg lost its comment-summary page. these guys -- who are constantly telling us that an x.m.l./c.s.s. setup will solve all our problems -- can't even make the _templates_ work! and then they have the gall to tell us that this stuff is not complicated. it's like a keystone-cops routine over there. man, you would think that as much as these guys _hate_my_sarcasm_, they wouldn't give me such ample opportunity to use it, wouldn't you? well, after a time, they've finally got the comment-summary page back: > http://www.teleread.org/blog/wp-stats.php but they still haven't gotten it to work correctly. if you click on a link, you are taken to the _page_ that contains the entry, but _not_ to the specific _comment_ on that page. and, as if that wasn't bad enough, that has the terrible side-effect that once you have clicked on the link to go to _one_ comment on a page, then all of the subsequent links -- on new comments that are made -- appear as "previously-visited", meaning you have to _manually_ track if you've viewed that comment by going to view it again. it's such a mess that i rarely view comments any more after scanning the headlines, and the comments have been the only worthwhile thing on the teleblawg for quite some time now.
(you already know what david is gonna blog, as he's said it all before; it used to be fun to see just how he'd work in the same old point again -- especially in the "openreader" epoch (you remember that, right?) -- but lately even _that_ particular tack has become boring. c'est la vie...) so even though noring can't do the simple hacking that's necessary to make comments -- which are "annotations to the original blog entry" -- work correctly on the teleblawg, he nonetheless wants to "assemble" the "brain power" needed to foist a new "annotation standard" on all of us... yeah, right... -bowerbird From jon at noring.name Sat Dec 22 11:43:10 2007 From: jon at noring.name (Jon Noring) Date: Sat, 22 Dec 2007 12:43:10 -0700 Subject: [gutvol-d] some things never change (winter solstice edition) In-Reply-To: References: Message-ID: <18610032348.20071222124310@noring.name> Bowerbird wrote: > meanwhile, jon noring is inventing a new need for a new format, > this one involving annotations. find the write-up on the teleblawg: >> http://www.teleread.org/blog/2007/12/21/be-my-pal-call-for-annotationlinking-open-standard/ Thanks for cross-posting the URL here! > that specifically involves _annotations_ (and yes, that _is_ so ironic), > in that -- after robert nagle and jon noring did an "upgrade" to the > wordpress templates -- the blawg lost its comment-summary page. I was not involved in any way with the "upgrade". The last time I helped out with the TeleRead blog was with some CSS during the template change a number of weeks ago. Otherwise I have no idea what they are doing. Again, thanks for posting the link to my annotation article to gutvol-d.
Jon Noring From richfield at telkomsa.net Sat Dec 22 13:31:36 2007 From: richfield at telkomsa.net (Jon Richfield) Date: Sat, 22 Dec 2007 23:31:36 +0200 Subject: [gutvol-d] December the how-manyth??? DUCK everybody!!!! Message-ID: <476D8238.4050100@telkomsa.net> Ruin! Destruction! Run for your sanity! Guys are being nice to each other! In gutvol-d yet! What to do? What to DOOOOO??? Oh no! It is overtaking me too! I feel it coming over me and I'll never enjoy Swift again! I can't resist it! Bowerbird? Next you will be building ornate nests to pull the chicks, you traitor! Still, while the spirit is on me, you lot may be variously maniacal, backbiting, opinionated, logical... errr... illogical... err... make that incomprehensible, people who come up with such nonsense that it sometimes seems alarmingly like sense, but somehow some good things happen around and among you. Some of you have been very nice about my efforts to produce things, and PG has enabled me to download large volumes of materials sometimes very valuable to me. (That is why I am grateful for the opportunity to contribute; one-way benefits bother me!) So, please do not get so teed off with each other as to eject dissenters or waste time on them. Just carry on and go on working miracles (those of you that do, anyway.) As for MH, welll... sainthood is not very profitable this side of the grave anyway, and sometimes I think that is a pity. But sometimes I wonder how long his name will be inflicted on the high-school and sociology and IT historians of the future. I won't say compliments of the season, because many of us probably take the season very lightly, but I might as well follow BB's dating: Compliments of 12/10 all round! Cheers, Jon From davidrothman at pobox.com Sat Dec 22 15:21:31 2007 From: davidrothman at pobox.com (David H. Rothman) Date: Sat, 22 Dec 2007 18:21:31 -0500 Subject: [gutvol-d] Troll Bird's $350K ZML goal vs. OLPC and e-book standards [Re: some things...] 
Message-ID: <5eff08fa0712221521s372e7ea8j55e2a56763b00be6@mail.gmail.com> As the main perp of the TeleRead blog, I'm always amused by negative PR from Bowerbird---considering his famous usefulness as a contrary indicator. It takes a very special kind of humanitarian poet to attack people for sticking up for the One Laptop Per Child Project. To quote a joke from a poet friend of mine, "Poets make the best liars," and the Bird's trollish post is rather misleading at best. BOTH Wayan and I have vigorously criticized the business side of OLPC and other current aspects of it, but long term, we're less excited over the immediate details than over the attention that the project is focusing on needs and solutions for developing countries, not to mention the awesome low-cost display technology. Yes, it's great that the OLPC example has gotten others to come up with their own solutions. Here's to diversity! But Wayan and I are still cheering on OLPC while asking the necessary questions. We can't wait for our own XO laptops---and those of the recipient children in the Give One Get One program---to arrive from OLPC. Besides, I doubt that Wayan would challenge my belief that XO-style machines will eventually go for less than $100. Remember when electronic calculators used to cost hundreds or thousands of dollars? Today the basic models are given away to help move magazine subscriptions. The attack on me for my long-term optimism on OLPC is really part of Bowerbird's campaign to discredit critics of his ZML standard. More on this later. Meanwhile I'm growing nostalgic for the time when the Bird and some others dissed the OLPC machine as mere vaporware. My vaporware will probably be arriving in the next week or so--I won't have a heart attack if it's longer--and along the way I'll be helping a child in a developing country get his/her XO laptop from OLPC. Wayan, too, can't wait for his XO. 
We'll enjoy near-E Ink screen quality in the reflective mode, and I wouldn't be surprised if FBReader were eventually ported over to the XO, making it possible for me to read books in the standard .epub format that Hachette and other publishing giants will be creating, not to mention the existing .epub public domain efforts of Feedbooks (time for PG to catch up, especially since an open source validation project has been started for .epub?). What's really charming is the way the humanitarian poet loves to beat up on a linux-powered open-source machine whose creators want to spread around free public domain text. Not the wisest or most consistent strategy for someone associated with Project Gutenberg. Especially since the XO has a better screen for e-reading than rivals do and still costs less. Of course, as noted, the Bird's real goal isn't to harm the XO, but rather to discredit Jon Noring, me and other advocates of .epub, a nonproprietary e-book standard--while the humanitarian poet is hoping that someone will pay him $350,000 for "full rights" to HIS ZML efforts ("just 10% of what amazon paid for mobipocket, it's a fair price"). See a Bird quote below. Of course, somewhat more than greed, the real driver here is Birdish vanity. As for the TeleRead blog, it's doing great despite little flaws that are inevitable. Our traffic often exceeds that of libraryjournal.com. Robert Nagle is busy with his latest gig, and I'm busy posting at TeleRead and elsewhere; and if anyone wants to pitch in on wp-stats, we'll relish the assistance. Unlike the humanitarian poet, we're not holding ourselves out as real programmers. But like Jon Noring, we have a damned good idea of what we want in an e-publishing standard, and unfortunately Bowerbird's ZML would be a DISASTER for many kinds of publications. I don't see HarperCollins, Simon & Schuster or other .epub supporters begging the Bird for ZML rights. ZML is Kiddiesville.
Laughable for advanced scientific publishing, for example. This time I won't directly repeat TeleRead's URL even though Bowerbird was kind enough to give an address with our domain. Instead, for genuinely humanitarian PG folks, here's the address to visit to buy an XO for yourself and a child in a developing country: http://www.laptopgiving.org/en/index.php Phone number for Give One Get One is 1-877-70-LAPTOP (1-877-705-2786), and the deadline for orders is the end of the year. A relevant AP story appears at: http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story?ctrack=2&cset=true Lead is, "Doubts about whether poor, rural children really can benefit from quirky little computers evaporate as quickly as the morning dew in this hilltop Andean village, where 50 primary school children got machines from the One Laptop Per Child project six months ago." If anything, the AP story is much gentler on OLPC than Wayan and I have been, but we still appreciate the immense good that the project has done. Perhaps it's time for Bowerbird to shut his beak about OLPC and squawk instead about Iraq. The United States is torturing and killing innocent people by the thousands, but the humanitarian poet can find nothing better to do than to try to beat up on me for observing the obvious---that the price of laptops for kids will be coming down. Pathetic. Looking beyond the Bird's latest message, it might be time for PG to examine the real purpose of this list. Is it to further PG's goal of promoting genuine mass literacy or to provide the humanitarian poet with a platform for performance trolling? I'd love to see PG involved with .epub and checking up on every line of code going into the validation tool--it IS important to monitor the International Digital Publishing Forum and ask skeptical questions. Pure .epub could be a real blessing for PG. But Bowerbird doesn't want you to care about .epub or the XO project; ZML must be the real show.
And who cares if the Bird makes it impossible for list participants to stand up for standards without getting flamed? ZML, vanity and the $350K first! David Rothman for TeleRead ================================= From: Bowerbird at aol.com Date: Oct 22, 2007 11:44 PM Subject: [gutvol-d] nice weekend To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com [...] for a long time now, the price for anyone wanting to buy the rights to z.m.l. has been "six figures". and i think it was a couple years back that i raised it to $200,000 minimum. now, with fairly solid conversions to .html and .pdf, an offline-standalone authoring tool, and a to-be-announced-quite-soon web-based authoring tool, plus viewer-apps, i'll be raising the price again... as of november 1, 2007, the price for full rights to z.m.l. will be $350,000. since this is just 10% of what amazon paid for mobipocket, it's a fair price... preference will be given to buyers who will make the package open-source, and such buyers can negotiate for a substantial discount, maybe up to 50%... of course, you know, you could just figure it all out for yourself. it's simple... -bowerbird From gbnewby at pglaf.org Sat Dec 22 18:35:13 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Sat, 22 Dec 2007 18:35:13 -0800 Subject: [gutvol-d] Master format In-Reply-To: <15110491966.20071221230635@noring.name> References: <476C7097.9070106@novomail.net> <15110491966.20071221230635@noring.name> Message-ID: <20071223023513.GA3659@mail.pglaf.org> On Fri, Dec 21, 2007 at 11:06:35PM -0700, Jon Noring wrote: > ... > I take a different tack. I'd like to see the "master-derivative" > approach taken, where the book is mastered in TEI (and properly done > per camp #2), then use that to auto-convert to a wide-range of > end-user derivative formats for reading purposes.
That's been a long-term goal. There are a lot of capable TEI tools that were rolled out by a number of PG volunteers, and we've posted a few. While there are a variety of reasons why this isn't the default now for submissions from DP and other sources, I do expect the transformation to a TEI master to take place. (I think most of the people on gutvol-d already know this, but just in case... also, it's an opportunity for anyone who cares to drill deeper into the state of the art and outstanding "to do" items in this effort. I don't have such a list.) -- Greg From lee at novomail.net Sat Dec 22 21:30:45 2007 From: lee at novomail.net (Lee Passey) Date: Sat, 22 Dec 2007 22:30:45 -0700 Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass? In-Reply-To: References: <476C7097.9070106@novomail.net> Message-ID: <476DF285.8030503@novomail.net> Michael Hart wrote: [snip] > Any particular credit line you would like? If "Project Gutenberg" is going to go to the trouble to seek these things out and republish them, then I think "Project Gutenberg" should get the credit. I certainly /wouldn't/ want it implied that I did the work for or on behalf of PG. [snip] > What I see everywhere are comments on FORM, APPEARANCE, that > stuff that goes down to JUDGING A BOOK BY ITS COVER. At dinner last week, a contractor friend of mine mentioned that in interviewing two candidates for a job as a framing carpenter, all else being equal he would hire the one with a college degree, even in a totally unrelated field, because it indicated that s/he was someone who could follow instructions and carry through to the end. One of my supervisors once pointed out to me that while clothes do not make the man, they do announce him.
When I see a text file that looks like the kind of automated OCR texts we see from OCA or Google, I think to myself, "If they couldn't figure out how to put pages together, or make a file that word wraps (or if they weren't interested in taking the time to do so) what else have they missed? Am I going to get halfway through and discover a missing page? Will the scannos be so distracting that I won't be able to concentrate on the content? I'm not going to bother with this." If PG e-texts /claim/ to be word perfect, but are uncomfortable to read due to their inability to word wrap, or unexplained _, * or # characters, I get much the same feeling. Am I petty to judge the quality of a work by the quality of its presentation? Maybe. But I do it anyway. > I'm just not that sort of person. . . . And I guess I am. I'm comfortable with that. [snip] > Again my apologies, those kinds of standards are just not the > part of books that interest me. Understandable. But why would you assume that they don't interest anyone else? [snip] >> Rather than pointing to examples of books which have managed >> to avoid the white-washer gauntlet, it would be more useful >> to explain the process used to avoid that gauntlet. I'm sure >> I'm not the only one interested in the answer to /that/ >> question. > > The simple answer, as always, is just contact Newby or myself. Wouldn't it just be easier to instruct the white-washers to not reject texts that aren't accompanied by markup-free, simple ASCII texts? [snip] > Sorry, you lost me there. > > I'm just talking about reading the books. And I'm talking about everything else. [snip] > I disagree, as do many programmers who use our eBooks. Well, would you have some of them contact me, because /I/ can't figure it out. [snip] > It's just that the standards are so simple, not that they were > never published. . .and that we don't force them on volunteers. Now /this/ I understand. My wife's always expecting me to read /her/ mind too.
"But honey, it was so /obvious/!" [snip] > I think that eventually eBooks will settle into patterns quite > much the way paper books did. I agree. I would hope it would happen sooner rather than later, but I'm sure it will happen. Like Mr. Newby, I'm thinking the short-term standard for digital text masters will probably end up being TEI. The long term standard will probably be something that hasn't been invented yet, but there will certainly be an automated migration path from TEI to that new standard. > Look at the early ones, all over the place, in size, paper and > binding, fonts, inks, and everything else. > > That's the way pioneers are. > > Later on comes the pressure for everyone to be alike. . . . > > And the pioneers either die out or move on. And everything the pioneers did that can't be converted to the new system dies out with them. [snip] > Sadly to say, at least SOME of the people behind those WANT the > standards THEY developed to ELIMINATE all other standards. > > I've asked them in person. . . . This sounds just so bizarre to me that I have a hard time relating to it. I suppose that sort of megalomania exists, if you say so, but I can't imagine anyone taking them seriously. >> The reason the web and browsers succeeded is because Sir Tim >> Berners-Lee invented the HyperText Markup Language and the >> HyperText Transfer Protocol, and everyone agreed to use it. >> The reason e-books /haven't/ succeeded is because everyone >> insists on doing things their own way. > > Actually, I think it was as much the invention of browsers, > search engines, etc., that did it. . . . > > It could have been ANY markup system. . .well not ANY, but, > > MANY. . . . Sorry, I just don't buy it. It /could/ have been any markup system (at least any markup system that met the requirements -- text reflow, alternate presentations, hyperlinking, etc.) but it could be only /one/. The victorious system may have been arbitrary, but it was victorious. >>> "Just DO It!" 
>> Sounds like good advice to me. > > > I would be more than happy to assist you in doing it, > if you would allow me. . . . Thanks . . . but I think I'll try to muddle along on my own -- or with anyone else who would care to join me. From Bowerbird at aol.com Sat Dec 22 23:24:07 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 23 Dec 2007 02:24:07 EST Subject: [gutvol-d] realism Message-ID: gee, i thought i had david in my spam folder. he will be from now on. but hey, no reason for all of you not to enjoy his (inadvertent) humor. for the record, i _love_ the o.l.p.c. project. even ordered one myself, first day i could (heck, i waited a _month_ before buying the iphone), and will be contributing some software to the kids if i'm able to do so. negroponte himself told me that he'd love to have some of my apps... i know tech. and i know you don't do it any favors by spinning vapor, and hype that can't be attained, since that leads to false expectations, which only come to bite you in the butt and rob you of your credibility. rothman raved about openreader for years... literally years (no joke!)... and where is it now? on the scrapheap. along with rothman's credibility. and it didn't get there because of anything _i_ said. it's his _own_ fault... it's enough to make me chuckle. -bowerbird p.s. look for the next update in my "some things never change" series on the spring equinox... it'll be the same story. because (wait for it...) some things never change... From ralf at ark.in-berlin.de Sun Dec 23 00:05:33 2007 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Sun, 23 Dec 2007 09:05:33 +0100 Subject: [gutvol-d] Troll Bird's $350K ZML goal vs.
OLPC and e-book standards [Re: some things...] In-Reply-To: <5eff08fa0712221521s372e7ea8j55e2a56763b00be6@mail.gmail.com> References: <5eff08fa0712221521s372e7ea8j55e2a56763b00be6@mail.gmail.com> Message-ID: <20071223080533.GA16701@ark.in-berlin.de> > A relevant AP story appears at: > > http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story?ctrack=2&cset=true Thanks for the link but this here is registration-free: http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story ralf From davidrothman at pobox.com Sun Dec 23 02:47:04 2007 From: davidrothman at pobox.com (David H. Rothman) Date: Sun, 23 Dec 2007 05:47:04 -0500 Subject: [gutvol-d] Troll Bird's $350K ZML goal vs. OLPC and e-book standards [Re: some things...] In-Reply-To: <20071223080533.GA16701@ark.in-berlin.de> References: <5eff08fa0712221521s372e7ea8j55e2a56763b00be6@mail.gmail.com> <20071223080533.GA16701@ark.in-berlin.de> Message-ID: <5eff08fa0712230247n4e2934cew3a3249c08a273a07@mail.gmail.com> Ralf: Thanks! Within the bounds of fair use, I'll repro a little more of the AP piece below from the Chicago Tribune--just in case the Trib later yanks the reg-free version. A very quick Google act doesn't seem to point me to other full texts right now. Bowerbird: It's great you see positives in the OLPC project. Too bad they were lost in your eagerness to carry on your jihad against e-book standards, as part of your effort to market your $350,000 project by trying to discredit the skeptics. As for the OpenReader standard, we got co-opted---by the IDPF, which, after years of dragging its feet, finally woke up as a result of our efforts, as Adobe's Bill McCoy acknowledged. Not the worst tragedy. While I had serious issues with the implementation side of OpenReader, I don't think we did too badly in the end.
The big lesson I learned is that publishers want to deal with the IDPF, so that's where I'm focusing my efforts--while encouraging people to monitor the group's standards initiative for purity. It really pains me to see newbies--potential PG readers, kids included!--so confused by the Tower of eBabel of 20+ warring e-book formats. And that's just part of the damage from eBabel. Well, enough. In honor of the holidays, during which Bird supposedly was taking a break from his troll act (he's the one who started the round and intends to continue his jihad next year), I'll stop after this paragraph. May 2008 be the year when this list finally stands up against Bird-style trolling! Can you imagine a potential PG funder tuning in? It's great to question other people's statements to arrive at the truth while being civil about it; but there's a difference between that and sustained personal attacks that go on for years. While people can flame Troll Bird right back, as they do in self-defense, it gets rather boring. It's a little like burning up dry bamboo--not to mention the time stolen from the advancement of PG's goals. HH and thanks, David On Dec 23, 2007 3:05 AM, Ralf Stephan wrote: > > A relevant AP story appears at: > > > > > http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story?ctrack=2&cset=true > > Thanks for the link but this here is registration-free: > > > http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story > > ======================== These offspring of peasant families whose monthly earnings rarely exceed the cost of one of the $188 laptops ? people who can ill afford pencil and paper much less books ? can't get enough of their "XO" laptops. At breakfast, they're already powering up the combination library/videocam/audio recorder/music maker/drawing kits. At night, they're dozing off in front of them ? if they've managed to keep older siblings from waylaying the coveted machines. 
"It's really the kind of conditions that we designed for," Walter Bender, president of the Massachusetts Institute of Technology spinoff, said of this agrarian backwater up a precarious dirt road. Founded in 2005 by former MIT Media Lab director Nicholas Negroponte, the One Laptop program has retreated from early boasts that developing-world governments would snap up millions of the pint-sized laptops at $100 each. In a backhanded tribute, One Laptop now faces homegrown competitors everywhere from Brazil to India -- and a full-court press from Intel Corp.'s more power-hungry Classmate. But no competitor approaches the XO in innovation. It is hard drive-free, runs on the Linux operating system and stretches wireless networks with "mesh" technology that lets each computer in a village relay data to the others. Mass production began last month and Negroponte says he expects at least 1.5 million machines to be sold by next November. Even that would be far less than Negroponte originally envisioned. The higher-than-initially-advertised price and a lack of the Windows operating system, still being tested for the XO, have dissuaded many potential government buyers. Peru made the single biggest order to date -- more than 272,000 machines -- in its quest to turn around a primary education system that the World Economic Forum recently ranked last among 131 countries. [...] "Some tell me that they don't want to be like their parents, working in the fields," first-grade teacher Erica Velasco says of her pupils. She had just sent them to the Internet to seek out photos of invertebrates -- animals without backbones. Antony, 12, wants to become an accountant. Alex, 7, aspires to be a lawyer. Kevin, 9, wants to play trumpet. Saida, 10, is already a promising videographer, judging from her artful recording of the town's recent Fiesta de la Virgen. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071223/7c797d54/attachment.htm From jon at noring.name Sun Dec 23 07:53:47 2007 From: jon at noring.name (Jon Noring) Date: Sun, 23 Dec 2007 08:53:47 -0700 Subject: [gutvol-d] Update on OpenReader... In-Reply-To: References: Message-ID: <1116604583.20071223085347@noring.name> Bowerbird wrote: > rothman raved about openreader for years... literally years (no joke!)... > and where is it now? on the scrapheap. along with rothman's credibility. > and it didn't get there because of anything _i_ said. it's his _own_ fault... > > it's enough to make me chuckle. Thanks for opening this opportunity to provide an update of OpenReader to gutvol-d! I am always glad when Bowerbird brings up new topics we can discuss that might be of interest to some in the gutvol community. To start off this update, it's been said "I'd rather see a man try and fail, than not try anything at all and fail by default." For example, I believe ZML will fail for the purpose it is being designed for, but despite that belief I admire Bowerbird for trying and for his general persistence and chutzpah (some would call what he says about ZML hype, but he really believes in what he is doing so I call it enthusiastic promotion.) And Bowerbird's efforts will not go for naught -- good things will come of them to benefit e-books, but what those benefits will be is unknown at this time. We'll only know after a few years... And note when ZML does not develop as Bowerbird hopes it will, I will NOT point my finger at Bowerbird and say he lost credibility or he failed. Rather, I will admire what he tried to do, say "job well done!" and focus on the benefits which come out of that effort. ***** Regarding OpenReader, the OR web page starkly says that OpenReader was a rousing success and a complete, utter failure. And you know, I'm proud of that failure, because *we tried*.
Draft specs were written which are still being mined by others for the ideas they contain (I spent about two man-months of almost full time effort hammering down these drafts, and I've had a lot of praise for their thoroughness, quality and consistency.) Also, untold hours were spent communicating with publishers (large and small), technology developers, building the consortium, etc. We lobbied hard, fought hard, and spec'd hard -- and we fought some power players and I believe we came close to reaching critical mass despite it being quite an uphill battle: http://www.openreader.org/ Bill McCoy at Adobe, in public, stated that OpenReader was a success since it spurred IDPF to take action and do *exactly* what I lobbied it should do for several years (more on this right below): http://www.teleread.org/blog/2007/01/05/openreader-victorious/ (Now OpenReader is not yet dead, but still has innovations I believe the next version of OPS/EPub should incorporate. We'll see...) I started OpenReader (with the help of David Rothman who brought in a lot of energy -- and David came up with the name OpenReader) because I was frustrated with IDPF not taking action to develop a universal consumer e-book format per the requirements of the article I wrote back in 2003: http://www.teleread.org/blog/2007/08/29/e-book-standards-article-redux-a-comparison-between-2003-dreams-and-2007-reality/ (Let's say I spent a few years trying to work in the "system" lobbying for change -- finally I had to do something myself to force the issue.) Much of IDPF's non-responsiveness was due to the power players in IDPF not wanting this -- each had their own proprietary format solution they wanted to dominate in the marketplace. 
But shortly after OpenReader was announced, two major events occurred: 1) IDPF had a fairly major power realignment (e.g., Microsoft left) which left a vacuum, and 2) Adobe (the only remaining "power player" in IDPF) had a 180 degree turnaround (which surprised me), through the efforts of Bill McCoy, and decided there was a pressing need for a reflowable, open standards e-book format. (Up to then, Adobe believed PDF to be the solution.) This led to IDPF making a fast 180 degree turn, and put renewed energy into implementing the *exact* things my 2003 article recommended. And they worked *very* fast -- too fast in my opinion, but nevertheless ETI and Adobe drove the wagon very hard. And I proudly contributed to the new "EPub" standard that resulted. ("EPub" is not yet an official name -- the standards that underlie "EPub" are OPS, OPF, and OCF -- sorry for the acronym soup.) Anyway, I'll let others decide whether or not OpenReader was a success, or failure, or something else. Bill McCoy, who is a General Manager of ePublishing at Adobe, has weighed in his thoughts: success. Bowerbird said it was an utter failure and destroyed my credibility in the eyes of the world (btw, does he speak for the world?) What are your thoughts, dear reader? Jon Noring From Bowerbird at aol.com Sun Dec 23 10:31:45 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 23 Dec 2007 13:31:45 EST Subject: [gutvol-d] ample opportunities Message-ID: greg said: > I do expect the transformation to a TEI master to take place. as i said earlier, it's nice of all you people to just _hand_me_ such ample opportunities to employ my infamous sarcasm... t.e.i. has been "the official policy" of project gutenberg for years. it had already been "the official policy" for a few years in _2003_, when i first came to this listserve. and now, 4 years later, it still is. but yet, look for yourself how many e-texts are marked up in t.e.i. 
(and then look at the "derivative formats", and laugh at this "future".) moreover, since the library has grown 150% in those last 4 years, the "backlog" of e-texts needing markup has grown considerably. the whitewashers can't even maintain the library in its current form; look at how long it's taking to move to the "new" directory structure. and you're saying that backlog will be marked up in t.e.i. soon? ha! but hey, bring on "the official policy". the quicker the p.g. library gets its t.e.i. combover, the quicker it will start to deteriorate from the lack of maintenance which will inevitably follow, and therefore the quicker which my z.m.l. mirror will supplant the t.e.i. mutant... *** and over on the teleblawg, david rothman makes it easy to mock him by talking out of every side of his mouth. he's against d.r.m., but he's _for_ "social" d.r.m., but he "recognizes" that publishers will "require" some form of d.r.m., so "grudgingly" accepts it, but he demands a type of d.r.m. that would be "interoperable", which he evidently believes is _possible_, which means he must know more than steve jobs, who has said (in his famous letter) that it's basically impossible, because once the "secret sauce" has to be shared, you just can't keep it in the bottle, but meanwhile rothman wants "open-source" solutions which _require_ that the "secret sauce" be not just _shared_, but openly available to all... david is also busy being a lap-dog for the publishing houses other ways. he acts like it's important that they be "on-board" for all "his" initiatives... listen up, people. the publishing houses are idiots. and they're dinosaurs. they're idiots because they are actually trying to _follow_the_footsteps_ of recording companies, who, as you know, are now waist-deep in quicksand. that's right, they think that by ignoring digital distribution, it will "go away". yeah, that's really smart. meanwhile, you can get nearly any book you want from "pirate" networks. 
if you didn't know this, you probably haven't tried, which is no surprise. very few people even _want_ to read books anymore. and if you take a good look at the "bestseller lists", it's very easy to see why, because 8 out of the top 10 books (and 34 of the top 40) are pure garbage. word up, it's _the_publishing_industry_ that ruined the publishing industry. the recording industry has a teensy bit of success "blaming" p2p networks, even though their demise has been largely their own fault. but publishers? they've got no one to blame but themselves. no one even wants to _steal_ 8/10ths of the garbage they put out for sale. maybe you think some segments are still worthy. like perhaps textbooks? yeah, right, they've been gouging school districts and college students for so long, and in such an obvious way, that they no longer get any respect... academic journals? even worse. and libraries have started to _stand_up_ against them, and inform them in no uncertain terms they will crush them. so who cares what format the publishing industry decides upon? not me! because the future of books involves artists going directly to their audience. the publishing houses have been disintermediated, and that's a good thing. i'm interested in a format that's simple enough that you don't have to hire a "consultant" to negotiate the technoid obstacle-course to create an e-book. a format that gives readers the text -- "out in the clear", as the saying goes, a format that also gives them the ability to set options the way they want 'em -- and do that quickly and easily -- and remix text to their heart's content, including repurposing it into any other format that they desire at any time... that's what my format does. that's why the authors of tomorrow will use it. -bowerbird p.s. 
here's a little present for you, some interview segments with david byrne: > http://www.wired.com/entertainment/music/magazine/16-01/ff_byrne?currentPage=all#s -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071223/88b1a3c9/attachment-0001.htm From gbnewby at pglaf.org Sun Dec 23 23:21:58 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Sun, 23 Dec 2007 23:21:58 -0800 Subject: [gutvol-d] Whitewashers (was Re: !@!Re: Why wait till we have to work from bookworm frass?) In-Reply-To: <476DF285.8030503@novomail.net> References: <476C7097.9070106@novomail.net> <476DF285.8030503@novomail.net> Message-ID: <20071224072158.GB25293@mail.pglaf.org> On Sat, Dec 22, 2007 at 10:30:45PM -0700, Lee Passey wrote: > >> Rather than pointing to examples of books which have managed > >> to avoid the white-washer gauntlet, it would be more useful > >> to explain the process used to avoid that gauntlet. I'm sure > >> I'm not the only one interested in the answer to /that/ > >> question. > > > > The simple answer, as always, is just contact Newby or myself. > > Wouldn't it just be easier to instruct the white-washers to not reject > texts that aren't accompanied by markup-free, simple ASCII texts? There seems to be a mysticism about the whitewashers. Drop it. They're volunteers, like pretty well everyone else who ever does anything for PG. Those very few individuals do not "reject texts that aren't accompanied by markup-free, simple ASCII texts." Since Michael already told you a way to get around any possible "gauntlet" you might see before you, quit yer bitchin'. Anyway, how would you know, Lee?
I can't find any record of you submitting any etexts, or even a copyright clearance request, or having ever communicated with the whitewashers via their mail list, or ever emailing me with a text you want to submit or a question about one. You're an outsider who is not only casting criticism without construct, but also besmirching the efforts of those who actually *do* the work. I'm leaping to the WWers' defense because they don't subscribe to this list. Meanwhile, I'll go back to ignoring your prattle. -- Greg From lee at novomail.net Mon Dec 24 08:44:10 2007 From: lee at novomail.net (Lee Passey) Date: Mon, 24 Dec 2007 09:44:10 -0700 Subject: [gutvol-d] Whitewashers In-Reply-To: <20071224072158.GB25293@mail.pglaf.org> References: <476C7097.9070106@novomail.net> <476DF285.8030503@novomail.net> <20071224072158.GB25293@mail.pglaf.org> Message-ID: <476FE1DA.9030603@novomail.net> On 8 Oct 2007 Ralf Stephan wrote: > This may be a FAQ. Would PG accept files that are a superset of PGTEI > and a subset of TEI? If so, which ending should the file have to not > confuse it with a possible PGTEI file? On 8 Oct 2007 I wrote: > It should, by now, be well established that Mr. Hart and the PGPTB > are strongly opposed to the establishment of any file format as the > "preferred" format, regardless of its capabilities. If you look > carefully at the PG FAQ you will note that while an ASCII text > version is requested, it is not required. Thus, you should be able to > submit a valid TEI file to PG, and no other format. On 8 Oct 2007 joshua at hutchinson.net wrote: > Well, yes and no. The FAQ does not say it is required ... but none > of the whitewashers will post it without a text file. You'd have to > go through Greg Newby and get a special dispensation from on high. > :) And there has to be a "real good reason" to not post a text > version. On 23 Dec 2007 Greg Newby wrote: [snip] > There seems to be a mysticism about the whitewashers. Drop it.
> They're volunteers, like pretty well everyone else who ever does > anything for PG. > > Those very few individuals do not "reject texts that aren't > accompanied by markup-free, simple ASCII texts." Since Michael > already told you a way to get around any possible "gauntlet" you > might see before you, quit yer bitchin'. So we now have two conflicting answers to the question, one of which I would deem authoritative, the other of which I would deem speculative. Since no one had challenged Mr. Hutchinson's earlier answer, I was just trying to clear up any confusion on this matter. From lopez2 at netscorp.net Mon Dec 24 09:11:49 2007 From: lopez2 at netscorp.net (Kevin Edward Lopez) Date: Mon, 24 Dec 2007 11:11:49 -0600 (CST) Subject: [gutvol-d] ample opportunities In-Reply-To: References: Message-ID: <3184.216.150.45.176.1198516309.squirrel@216.150.45.176> what kind of opportunities? Ed From lopez2 at netscorp.net Mon Dec 24 10:32:44 2007 From: lopez2 at netscorp.net (Kevin Edward Jordan) Date: Mon, 24 Dec 2007 12:32:44 -0600 (CST) Subject: [gutvol-d] ample opportunities In-Reply-To: References: Message-ID: <3951.216.150.45.189.1198521164.squirrel@216.150.45.189> what kind of opportunities? From lee at novomail.net Mon Dec 24 10:34:30 2007 From: lee at novomail.net (Lee Passey) Date: Mon, 24 Dec 2007 11:34:30 -0700 Subject: [gutvol-d] ample opportunities In-Reply-To: References: Message-ID: <476FFBB6.7090800@novomail.net> Bowerbird at aol.com wrote: > greg said: > >> I do expect the transformation to a TEI master to take place. > > as i said earlier, it's nice of all you people to just _hand_me_ > such ample opportunities to employ my infamous sarcasm... > > t.e.i. has been "the official policy" of project gutenberg for years. > it had already been "the official policy" for a few years in _2003_, > when i first came to this listserve. and now, 4 years later, it still is. > but yet, look for yourself how many e-texts are marked up in t.e.i. > (and then look at the "derivative formats", and laugh at this "future".) I'm sorry, didn't you get the memo? Project Gutenberg has no "official policies," except as it involves copyright clearances, and certainly none that involve file formatting or markup. While TEI files are permitted, they are not required, nor even encouraged beyond the generic "give us whatever you want." And as near as I can tell, TEI files are not even preferred over, say, z.m.l. Mr. Hart has noted earlier that, "Whatever standards emerge from the real world are just fine." I seem to recall you saying something to the effect that most of the files in the PG corpus already conform to z.m.l., so if you are correct it would appear that z.m.l.
is as close to an "official policy" as we're going to get. So hop on it, and get us that tool that will distinguish valid z.m.l. from invalid z.m.l. so we can start making a list of that which does, and does not, conform to the "official policy." From hart at pglaf.org Mon Dec 24 10:40:15 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 24 Dec 2007 10:40:15 -0800 (PST) Subject: [gutvol-d] Whitewashers (was Re: !@!Re: Why wait till we have to work from bookworm frass?) In-Reply-To: <20071224072158.GB25293@mail.pglaf.org> References: <476C7097.9070106@novomail.net> <476DF285.8030503@novomail.net> <20071224072158.GB25293@mail.pglaf.org> Message-ID: This is the opposite of the way things usually are here. Usually _I_ am the one to respond so strongly to persons who are so obviously trying to push the buttons of PG, & Greg is the one who then responds more calmly later. Yes it is obvious that Mr. Passey is pushing PG buttons. So what? If Mr. Passey really has a goal in mind, it will become, eventually, pretty clear to us all, even if that goal is merely to muddy the PG waters and waste our time. However, I would like to think better of Mr. Passey, and of ourselves, and hope this will all lead us to stronger positions in the future, as we have seen from the others who have taken similar positions in the past. As I have so often said, everyone can run PG better than I/we do, which is why we run it so very little. When it comes to eBooks, to each his own, even Passey or Noring or anyone else. . . . Thanks!!! Michael S. Hart Founder Project Gutenberg Recommended Books: Dandelion Wine, by Ray Bradbury: For The Right Brain Atlas Shrugged, by Ayn Rand: For The Left Brain [or both] Diamond Age, by Neal Stephenson: To Understand The Internet The Phantom Tollbooth, by Norton Juster: Lesson of Life. . .
From Bowerbird at aol.com Mon Dec 24 12:10:01 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 24 Dec 2007 15:10:01 EST Subject: [gutvol-d] ample opportunities Message-ID: well, an opportunity to mock t.e.i.
as a needlessly complicated form of markup, which hasn't been implemented even though it's been "official policy" for years, for instance... or the opportunity to mock rothman's mushy-mouthed proclamations where he tries to honor the win-friends-and-influence-people doctrine which says you should always give your opponents some wiggle-room to "come around", which essentially means then that you cannot take an unequivocal stand, even against something that's purely evil, like putting locks on our cultural heritage, or trying to turn e-books into cash-registers that ka-ching on every page-turn. it's important to mock such things. humor -- even the sarcasm form, which allows your antagonists to _spin_ your criticism as being "mean-spirited" -- is one of the best ways of pointing out failures of rhetoric when they occur, so it's extremely kind of my sparring partners to spout their various silliness, because it gives me _ample_opportunities_ for humor. i mean, otherwise, i might have to get _serious_, and what fun is that? :+) people like to laugh. so it's good when my opponents say ridiculous things... *** heck, it's even funny when my _friends_ say ridiculous things, like michael here: > If Mr. Passey really has a goal in mind, it will become, > eventually, pretty clear to us all, even if that goal is > merely to muddy the PG waters and waste our time. for whom is this still unclear? :+) and if it hasn't become clear to you _yet_, then what -- pray tell -- would mr. passey have to do in order to _make_ it "become clear"? (this is a serious question. i mean, it might be funny, but it's serious too.) michael continues: > However, I would like to think better of Mr. Passey, > and of ourselves, and hope this will all lead us to > stronger positions in the future, as we have seen from > the others who have taken similar positions in the past. it appears michael is filled with the holiday spirit. or perhaps he has merely been drinking too many of those holiday spirits. 
whatever the case, it's certainly a nice change of pace... and i, too, am filled with _joy_ today. because santa claus -- cleverly disguised as a fed-ex delivery person, sneaky! -- just brought an o.l.p.c. machine to my front door, oh yes... my girlfriend, bless her heart, is gift-wrapping it for me, and i'm just like a little kid looking forward to opening it! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071224/e988b1f0/attachment.htm From gbnewby at pglaf.org Mon Dec 24 14:05:54 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 24 Dec 2007 14:05:54 -0800 Subject: [gutvol-d] Collection development policy (Re: Why wait till we have to work from bookworm frass?) In-Reply-To: <659760181.20071221173505@noring.name> References: <476C0617.1000503@novomail.net> <191898394.20071221144402@noring.name> <20071221225917.GB14265@mail.pglaf.org> <659760181.20071221173505@noring.name> Message-ID: <20071224220554.GA8269@mail.pglaf.org> Apologies to those who get annoyed by the changing subject line, but I find it easier to track the different asynchronous discussion themes. On Fri, Dec 21, 2007 at 05:35:05PM -0700, Jon Noring wrote: >. > I would like to understand PG's "official collection development > policy." If this is spelled out at the PG site (a Google search turned > up nothing using that phrase), a link to it would be appreciated. I > have an idea what it is, but since a collection development policy is > clearly an organizational policy, the official policy has to originate > from PGLAF. (For those who don't know, "collection development" is a term from librarianship...you can even take full graduate level courses in it! A google search turned up lots of examples.) There is none. There won't be one any time soon.
When I started writing one, two years ago, I discovered that the various "FAQ" items that Michael and I wrote obviate the need for a separate collection development policy. I'll paste in FAQ #0 below, from http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Mission_Statement_by_Michael_Hart Some of the other essays in the set (under "About Us" from the main page at www.gutenberg.org) reinforce the idea that PG is quite open about what materials are added. There are some things we choose not to add to the collection... and sometimes those simply go to other collections (such as preprints.readingroo.ms or gutenberg.us). Sometimes that's due to format, or not wanting to be a vanity press, or a few other reasons that are scattered in the HOWTO section of www.gutenberg.org. Assembling those reasons together would serve to itemize the few things we generally don't add to the collection...but the better "collection development policy" is below: The mission of Project Gutenberg is simple: To encourage the creation and distribution of eBooks. This mission is, as much as possible, to encourage all those who are interested in making eBooks and helping to give them away. In fact, Project Gutenberg approves about 99% of all requests from those who would like to make our eBooks and give them away, within their various local copyright limitations. Project Gutenberg is powered by ideas, ideals, and by idealism. Project Gutenberg is not powered by financial or political power. Therefore Project Gutenberg is powered totally by volunteers. Because we are totally powered by volunteers we are hesitant to be very bossy about what our volunteers should do, or how to do it. We offer as many freedoms to our volunteers as possible, in choices of what books to do, what formats to do them in, or any other ideas they may have concerning "the creation and distribution of eBooks." Project Gutenberg is not in the business of establishing standards.
If we were, we would have gladly accepted the request to convert an exemplary portion of our eBooks into HTML when the World Wide Web was a brand new idea in 1993; we are happy to bring eBooks to our readers in as many formats as our volunteers wish to make. In addition, we do not provide standards of accuracy above those recommended by institutions such as the U.S. Library of Congress at the level of 99.95%. While most of our eBooks exceed these standards and are presented in the most common formats, this is not a requirement; people are still encouraged to send us eBooks in any format and at any accuracy level and we will ask for volunteers to convert them to other formats, and to incrementally correct errors as time goes on. Many of our most popular eBooks started out with huge error levels--only later did they come to the more polished levels seen today. In fact, many of our eBooks were done totally without any supervision--by people who had never heard of Project Gutenberg--and only sent to us after the fact. We want to continue to encourage everyone to send us eBooks, even if they have already created some without any knowledge of who we were, what we were doing, or how we were doing it. Everyone is welcome to contribute to Project Gutenberg. Thus, there are no dues, no membership requirements: and still only the most general guidelines to making eBooks for Project Gutenberg. We want to provide as many eBooks in as many formats as possible for the entire world to read in as many languages as possible. Thus, we are continually seeking new volunteers, whether to make one single favorite book available or to make one new language available or to help us with book after book. Everyone is welcome here at Project Gutenberg. Everyone is free to do their own eBooks their own way. Written by Michael S. Hart June 20, 2004. Updated October 23, 2004.
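The 99.95% accuracy level quoted in the mission statement is easier to weigh as an error budget. Below is a minimal Python sketch, assuming the percentage is counted per character (the statement does not say whether characters, words, or lines are meant, and a per-word reading gives different numbers); `allowed_errors` is a hypothetical helper, not part of any Project Gutenberg tool.

```python
def allowed_errors(char_count: int, accuracy_bp: int = 9995) -> int:
    """Maximum character errors a text may contain and still meet a target accuracy.

    accuracy_bp is the target in basis points (hundredths of a percent),
    so the default 9995 means 99.95%. Integer arithmetic avoids the
    floating-point rounding that computing 1 - 0.9995 would introduce.
    Assumption: accuracy is measured per character.
    """
    if char_count < 0 or not 0 <= accuracy_bp <= 10_000:
        raise ValueError("char_count must be >= 0 and accuracy_bp in 0..10000")
    # Round down: a partial error does not count toward the allowance.
    return char_count * (10_000 - accuracy_bp) // 10_000


# Error budget for a hypothetical one-million-character book at 99.95%.
print(allowed_errors(1_000_000))
```

Under that assumption, 99.95% works out to at most one error per 2,000 characters, so a one-million-character book could contain up to 500 errors and still meet the level.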
From gbnewby at pglaf.org Mon Dec 24 14:17:52 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 24 Dec 2007 14:17:52 -0800 Subject: [gutvol-d] PGLAF metadata In-Reply-To: <20071221081236.GA782@ark.in-berlin.de> References: <01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net> <20071217104548.GB7788@ark.in-berlin.de> <4766B6A9.5080400@novomail.net> <20071219093416.GA29329@ark.in-berlin.de> <4769A7BA.1070102@novomail.net> <20071220114033.GA31581@ark.in-berlin.de> <20071220232748.GC20405@mail.pglaf.org> <20071221081236.GA782@ark.in-berlin.de> Message-ID: <20071224221752.GF8269@mail.pglaf.org> On Fri, Dec 21, 2007 at 09:12:36AM +0100, Ralf Stephan wrote: > > What official PGLAF metadata do you want to access? If > > you're just looking for copyright clearance info that identifies > > print volumes, David Price's list is a good place to start: > > > > http://www.dprice48.freeserve.co.uk/GutIP.html > > I'm sorry to say that the info does not identify print volumes. > Especially the well known books have several editions. So, what's > missing is > > - original publishing place > - original publishing year We keep that info in the copyright clearance system (it's part of a clearance request). We do not redistribute it with eBooks or put it in our catalog, because we're not creating analogs to particular print editions, and don't claim that our eBooks match the particular print source(s) they were derived from. > Let's say we don't need the publisher because it's highly unlikely > different editions have the same place and year. No one would need > this info if we could access the cleared title pages, however, from > the etext page, for example. > > So, is it possible to access place/year for a work? If not, is it > possible to get at the title scan? There is no public front-end to the scans submitted to the copyright clearance system. However, I can provide them on request. Ditto with the other metadata submitted at clearance time. 
But instead, why not just contact the eBook's producer? The credit line in virtually every eBook tells you who it came from. PG can help you get in touch with a producer, if needed. Today, many eBook producers choose to put the text from the title page etc. in their eBooks. Check there first. Also, we are more frequently posting the page scans along with eBooks. Producers are encouraged to submit the scans, but most have not. Eventually, we anticipate most items from DP will have their page scans uploaded. To summarize, there are MANY ways of finding out more about print sources used for particular eBooks. We do not try to track such info in the PG online catalog or in the eBooks themselves. -- Greg PS: If someone has the skills & inclination to create a system that provides public access to the copyright clearance metadata and scans, linking that to released eBooks, I'd be willing to help. We should redirect such conversation to gutvol-p . It's a lot harder than it looks, due to widely varying quality & consistency in the metadata...also due to changes to how the metadata are listed between the cleared item, the eBook itself, and the PG catalog. Accuracy in such metadata is probably not achievable solely through automation. From hart at pglaf.org Tue Dec 25 17:39:25 2007 From: hart at pglaf.org (Michael Hart) Date: Tue, 25 Dec 2007 17:39:25 -0800 (PST) Subject: [gutvol-d] ample opportunities In-Reply-To: References: Message-ID: On Mon, 24 Dec 2007, Bowerbird at aol.com wrote: > well, an opportunity to mock t.e.i. as a needlessly complicated form of > markup, which hasn't been implemented even though it's been "official > policy" for years, for instance... 
This is not really in the Christmas spirit, nor have I been imbibing the other kinds of spirits, but I should add at an opportune moment that the TEI founders I spoke with down at Oak Ridge, Tennessee [once the nation's largest electricity consumption per building], actually SAID, right out loud in my face that the goal of TEI was to eliminate plain text. This was back in the early 1990's as I recall, and may have begun the incipient warfare between markup and plain text. I have never been against markup for those who want it from a personal perspective, but WAY too much perspective in the whole markup evolution has been profit and power. It's one thing to have a preference. It's another thing to want that preference applied to all! mh > > or the opportunity to mock rothman's mushy-mouthed proclamations where > he tries to honor the win-friends-and-influence-people doctrine which says > you should always give your opponents some wiggle-room to "come around", > which essentially means then that you cannot take an unequivocal stand, even > against something that's purely evil, like putting locks on our cultural > heritage, > or trying to turn e-books into cash-registers that ka-ching on every > page-turn. > > it's important to mock such things. humor -- even the sarcasm form, which > allows your antagonists to _spin_ your criticism as being "mean-spirited" -- > is one of the best ways of pointing out failures of rhetoric when they occur, > so it's extremely kind of my sparring partners to spout their various > silliness, > because it gives me _ample_opportunities_ for humor. i mean, otherwise, > i might have to get _serious_, and what fun is that? :+) > > people like to laugh. so it's good when my opponents say ridiculous > things... > > *** > > heck, it's even funny when my _friends_ say ridiculous things, like michael > here: >> If Mr.
Passey really has a goal in mind, it will become, >> eventually, pretty clear to us all, even if that goal is >> merely to muddy the PG waters and waste our time. > > for whom is this still unclear? :+) > > and if it hasn't become clear to you _yet_, then what -- pray tell -- > would mr. passey have to do in order to _make_ it "become clear"? > (this is a serious question. i mean, it might be funny, but it's serious > too.) > > michael continues: >> However, I would like to think better of Mr. Passey, >> and of ourselves, and hope this will all lead us to >> stronger positions in the future, as we have seen from >> the others who have taken similar positions in the past. > > it appears michael is filled with the holiday spirit. or perhaps > he has merely been drinking too many of those holiday spirits. > > whatever the case, it's certainly a nice change of pace... > > and i, too, am filled with _joy_ today. because santa claus > -- cleverly disguised as a fed-ex delivery person, sneaky! -- > just brought an o.l.p.c. machine to my front door, oh yes... > > my girlfriend, bless her heart, is gift-wrapping it for me, > and i'm just like a little kid looking forward to opening it! > > -bowerbird From Bowerbird at aol.com Thu Dec 27 11:52:37 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 27 Dec 2007 14:52:37 EST Subject: [gutvol-d] ample opportunities Message-ID: michael said: > This is not really in the Christmas spirit, nor have I been > imbibing the other kinds of spirits, but I should add at an > opportune moment that the TEI founders I spoke with down > at Oak Ridge, Tennessee [once the nation's largest electricity > consumption per building], actually SAID, right out loud in > my face that the goal of TEI was to eliminate plain text. well, i'm sure they meant it in the nicest way. really.
i'm totally serious. you know, most successful competitor, nothing personal -- all that rot. it's just too bad they lost all of their credibility in that little poker game... > This was back in the early 1990's as I recall, and may have > begun the incipient warfare between markup and plain text. nah, i don't believe that... simplicity and complexity have been warring ever since human beings were granted their first touch of intelligence... and actually, had i not been stirring the hornet's nest of heavy-markup on a regular basis here, the topic would've come up quite infrequently... plus, aside from the merry-go-rounders repeating their same old lines, none of the heavy-markup people even bother to defend any turf here, because they cannot dispute the inexorable move away from the model. it's quite clear, to anyone with eyes, that light-markup is our future... *** further, the _next_ evolutionary jump has come rather quickly upon us. i hinted at it years ago, in some listserve messages over on bookpeople. and a year ago, on christmas day, i unveiled solid research over on d.p. (they didn't use the information, so i'm now re-gifting it to p.g. proper.) to sum it in a sentence, o.c.r. from the large-scale digitization initiatives is approaching a rather impressive accuracy, especially when combined with post-o.c.r. book-wide correction routines, including _comparison_ with pre-existing digitizations (among them, many of the p.g. e-texts)... our _new_ world is one where we don't need the "proofing" of our past. *** i'm currently analyzing "moby dick", as digitized by the o.c.a., thank you. the o.c.r. is -- on the whole -- very good. it's not anywhere _near_ the many-9's figures that some projects like to _claim_ as their "standard". but, then again, very little of the actual _text_ in those projects _meets_ the standard they claim. the university of michigan, for example, _says_ that it requires 99.95% accuracy. 
but i defy you to find _any_ texts there matching that level. while i can point to _hundreds_ that fail to meet it... so i'm not going to claim big numbers on accuracy. not from raw o.c.r. but at the same time, i can vouch that most of the words on most pages are correct. and of the things that are incorrect, many can be fixed with _automatic_ post-o.c.r. cleanup routines. i will give solid examples here in the coming days, so don't bother presenting some "theoretical" counter (i.e., examples you pull out of your butt). and once you compare that text with a pre-existing digitization -- like, say, an e-text in project gutenberg, which is what i'm using right now -- the list of things to look at gets small. there's _certainly_ no need to eye every word, when so many of 'em match. so -- at the end of the day -- the accuracy that you obtain is _excellent_. it starts very good, and then escalates, rather quickly, toward perfection... just to give you an idea, using the data that i laid out in detail over at d.p. in a thread last christmas, out of a book that contained some 5700 lines, 5500+ of the lines were correctly recognized by the o.c.r. from the o.c.a., with the remaining 200 lines isolated for review, via a comparison with two other digitizations (one from google, the other a p.g. e-text)... in other words, it wasn't just that i obtained 96.5% accuracy on the text; it's that i had info pointing _exactly_ to the 3.5% that i needed to check... obviously, if you only have to check 4% of a book, it goes _much_ faster. and this is what _i_ know _i_ can do, based on the knowledge _i_ have... i'm sure that other people have other knowledge that could jack accuracy even higher, until we're getting phenomenal results with very little effort. what does this mean? and why is it of relevance to the markup question? well, the first thing that it means is that you will be able to grab a scan-set from the o.c.a. or google, and o.c.r. it, and obtain extremely accurate text.
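The comparison step described in this message (aligning fresh o.c.r. with an existing digitization such as a p.g. e-text, then reviewing only the lines where the two disagree) can be sketched with Python's standard difflib module. This is an illustration under stated assumptions, not the routine bowerbird used: words are compared verbatim, with no normalization of case, punctuation, end-of-line hyphenation, or modernized spellings such as "to-day", and `flag_lines_for_review` is a hypothetical name.

```python
import difflib
import re

WORD = re.compile(r"\S+")


def flag_lines_for_review(ocr_lines: list[str], reference_text: str) -> list[int]:
    """Return 0-based indexes of OCR lines that disagree with a reference text.

    The reference (e.g. an existing e-text) may be wrapped differently, so the
    comparison is done word by word and then mapped back to OCR line numbers.
    """
    ocr_words: list[str] = []
    line_of_word: list[int] = []  # line_of_word[n] = OCR line holding ocr_words[n]
    for i, line in enumerate(ocr_lines):
        for word in WORD.findall(line):
            ocr_words.append(word)
            line_of_word.append(i)
    ref_words = WORD.findall(reference_text)

    # autojunk=False stops difflib from treating very frequent words as junk in long texts.
    matcher = difflib.SequenceMatcher(None, ocr_words, ref_words, autojunk=False)
    flagged: set[int] = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i1 < i2:
            # OCR words were replaced or are extra: flag every line they sit on.
            flagged.update(line_of_word[i1:i2])
        elif ocr_words:
            # Reference words are missing from the OCR: flag the line at the gap.
            flagged.add(line_of_word[min(i1, len(ocr_words) - 1)])
    return sorted(flagged)
```

To use two reference texts (say, a Google digitization and a p.g. e-text), one option is to run the function once per reference and take the union of the flagged lines, which errs on the side of reviewing more. The share of lines to check is then `len(flagged) / len(ocr_lines)`; with the figures quoted above, 200 / 5700 is about 3.5%. difflib can be slow on a whole book at once, so a practical tool might compare chapter by chapter.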
(and this assumes that someone didn't beat you to the punch, and post it.) more to the point, _anyone_ can do that. on _any_ scan-set_ at _any_ time. so there'll be no compelling need to "digitize" books "en masse" any more. it can be done as a one-off, by anyone, at any time. "digitize on demand". and if we _do_ continue to do it "en masse", that project will go very fast... distributed proofreaders -- and efforts like it -- will become unnecessary. and that becomes relevant to the markup question because the best time to apply heavy-markup _would_have_been_ during a digitization process, at least when that process was being enacted by _knowledgeable_ people. for instance, over at d.p. they've broken down the workflow into two parts, proofing and formatting. the proofing rounds get the _characters_ right, and the formatting rounds "do the formatting"; that is, they apply markup. plainly put, it is the formatting volunteers who would do the heavy-markup. but if the entirety of distributed proofreaders is tossed out as "unnecessary" -- because the o.c.r. and post-o.c.r. cleanup gets all the characters right -- then there won't be any formatting rounds, and markup will not get applied... so really, it's only been in this particular timeframe -- let's say 2000-2010 -- where heavy-markup had a "window of opportunity" for meaningful uptake... in this "window", we were "manually" correcting scans, using human eyeballs. and as long as they were doing _that_, they could've applied heavy-markup... but by 2010, a mere button-click will have a scan-set digitized automatically, with no humans involved, and therefore no one who could apply any markup. so heavy-markup doesn't stand a chance any more. it's never had a cost-benefit ratio that was any good, but as long as you had _volunteers_ paying the costs (or, at the very least, a _hope_ that they would do so) you could afford to ignore the poor cost-benefit ratio. 
but with no suckers left to apply heavy-markup for you, that means you'll have to _pay_ to have it done, and that means you'll have to have a _budget_, up-front no less, and that complicates your implementation immensely... heavy-markup is doomed. i won't even bother arguing against it in 2008, because it just ain't worth the time... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071227/1ffa0212/attachment.htm From lee at novomail.net Fri Dec 28 11:55:43 2007 From: lee at novomail.net (Lee Passey) Date: Fri, 28 Dec 2007 12:55:43 -0700 Subject: [gutvol-d] Preservation of line endings Message-ID: <477554BF.6000508@novomail.net> There has been some discussion here in the past about whether or not it is important to preserve line endings when OCRing new texts. Personally, I'm ambivalent, but I recognize that others have strong feelings on the subject. When using ABBYY FineReader it is possible to ask for line endings to be preserved when selecting HTML output (and really, there is no reason to make any other selection). When doing so, a <br> tag is output everywhere a line ends in the source image. ABBYY is quite good at recognizing when line-ending hyphens are a result of line wrapping (soft hyphenation) as opposed to being part of a compound word (hard hyphenation). Unfortunately, when selecting the "keep line breaks" option in FR, the recognition of soft hyphens is lost. In order to preserve my cake and eat it too, I have written a program which compares two otherwise identical output files from ABBYY (one with "Keep line breaks" selected and one with "Keep line breaks" unselected) and merges the two, resulting in a file which preserves line breaks but which flags all hard hyphens with the extra notation '
' when line-ending hyphenation exists in the output file where "Keep line breaks" was unselected. As usual, if anyone wants a copy of this (Win32, console) program, with accompanying source code, contact me back channel. From joshua at hutchinson.net Fri Dec 28 12:18:03 2007 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri, 28 Dec 2007 20:18:03 +0000 (GMT) Subject: [gutvol-d] Preservation of line endings Message-ID: <783243558.231031198873083729.JavaMail.mail@webmail01> An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071228/468a0199/attachment.htm From cannona at fireantproductions.com Fri Dec 28 12:52:01 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Fri, 28 Dec 2007 14:52:01 -0600 Subject: [gutvol-d] bitlet and the CD/DVD images Message-ID: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Hi All. I added a link to the PG Wiki on the page http://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project . (Actually, to be precise, three links.) These links should allow one to download the CD and DVD images via BitTorrent without having to install a BitTorrent client. Anyway, if anyone wants to test them out, I would be interested to hear how it goes. Thanks! Aaron - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. 
iD8DBQFHdWIcI7J99hVZuJcRA+cCAKCgSLMUOaVayMgOjy/AJuZRouXBJACdFSnI oBhE8Xr9GhfdQSwj/6FyeMs= =hLnZ -----END PGP SIGNATURE----- From lee at novomail.net Fri Dec 28 14:00:43 2007 From: lee at novomail.net (Lee Passey) Date: Fri, 28 Dec 2007 15:00:43 -0700 Subject: [gutvol-d] Preservation of line endings In-Reply-To: <783243558.231031198873083729.JavaMail.mail@webmail01> References: <783243558.231031198873083729.JavaMail.mail@webmail01> Message-ID: <4775720B.5070307@novomail.net> Joshua Hutchinson wrote: > Congrats, Lee. You've successfully reinvented the wheel. ;) Perhaps, but this one's a steel-belted radial, not 4-ply polyester. > DP created such a utility (which uses TEXT instead of HTML output) many moons > ago (WinPrep, I believe is the name). And there's the rub. Anything that requires that I discard significant markup to use is worthless to me. My program merges the two files together in such a way that I can subsequently create a file where the soft hyphens are displayed or not, according to style sheet settings, /without losing any of the other markup that ABBYY has provided/. It seems to me that the current DP process for creating e-texts is to save all OCR output as simple text, discarding whatever markup the OCR engine is able to provide. The text is then checked for accuracy, and finally the markup is laboriously re-applied by hand. I'm trying to develop some tools that will allow me to carry the markup forward throughout the entire process in order to simplify the final result. 
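Lee's merge program (from his earlier message in this thread) relies on the two FineReader exports being otherwise identical: wherever the "Keep line breaks" export ends a line with a hyphen, the unwrapped export shows whether FineReader treated that hyphen as soft (removed it) or hard (kept it). Below is a minimal Python sketch of that classification step only, assuming both exports have already been reduced to plain text and stay aligned word for word. Lee's actual tool is a Win32 console program working on the HTML, and the notation it writes for hard hyphens is not preserved in this archive, so the sketch just labels each line "hard" or "soft" rather than guessing at that markup; `classify_line_end_hyphens` is a hypothetical name.

```python
import re

WORD = re.compile(r"\S+")


def classify_line_end_hyphens(kept_lines: list[str], joined_text: str) -> dict[int, str]:
    """Label each hyphen-ending line of the "Keep line breaks" export.

    kept_lines  -- text of the export made with "Keep line breaks" selected
    joined_text -- text of the otherwise identical export made without it
    Returns {line_index: "hard" | "soft"}. Lines whose hyphenated word
    cannot be matched in joined_text are left out for manual review.
    """
    joined = WORD.findall(joined_text)
    pos = 0          # index of the next unmatched word in the joined export
    carried = False  # True if this line's first word was merged into the previous line
    result: dict[int, str] = {}
    for i, line in enumerate(kept_lines):
        words = WORD.findall(line)[1 if carried else 0:]
        next_words = WORD.findall(kept_lines[i + 1]) if i + 1 < len(kept_lines) else []
        carried = False
        for k, word in enumerate(words):
            at_line_end = k == len(words) - 1
            if at_line_end and word.endswith("-") and next_words and pos < len(joined):
                if joined[pos] == word + next_words[0]:
                    result[i] = "hard"  # hyphen survived unwrapping: part of the word
                    carried = True
                elif joined[pos] == word[:-1] + next_words[0]:
                    result[i] = "soft"  # FineReader dropped it: a wrap hyphen only
                    carried = True
                # No match: the exports disagree here. This simple walk assumes
                # they stay aligned, so later results may also need checking.
            pos += 1
    return result
```

For example, with kept_lines `["It was a well-", "known fact that the re-", "turn was late."]` and joined_text `"It was a well-known fact that the return was late."`, the function returns `{0: 'hard', 1: 'soft'}`. Carrying the rest of FineReader's markup forward, as Lee wants, would mean walking the HTML token stream instead of plain words.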
From donovan at abs.net Fri Dec 28 15:21:06 2007 From: donovan at abs.net (D Garcia) Date: Fri, 28 Dec 2007 18:21:06 -0500 Subject: [gutvol-d] Preservation of line endings In-Reply-To: <4775720B.5070307@novomail.net> References: <783243558.231031198873083729.JavaMail.mail@webmail01> <4775720B.5070307@novomail.net> Message-ID: <200712281821.06799.donovan@abs.net> On Friday 28 December 2007 17:00, Lee Passey wrote: > Joshua Hutchinson wrote: > > DP created such a utility (which uses TEXT instead of HTML output) many > > moons ago (WinPrep, I believe is the name). > > And there's the rub. Anything that requires that I discard significant > markup to use is worthless to me. My program merges the two files > together in such a way that I can subsequently create a file where the > soft hyphens are displayed or not, according to style sheet settings, > /without losing any of the other markup that ABBYY has provided/. I believe it actually uses RTF for its input files, so no markup is actually discarded unless you have changed settings in FineReader. > It seems to me that the current DP process for creating e-texts is to > save all OCR output as simple text, discarding whatever markup the OCR > engine is able to provide. The text is then checked for accuracy, and > finally the markup is laboriously re-applied by hand. I'm trying to > develop some tools that will allow me to carry the markup forward > throughout the entire process in order to simplify the final result. The problem is that the markup output by FineReader is frequently wrong (false bold and false italic being the most common, along with false superscript letters/numbers in place of quotation marks) and nearly as frequently mispositioned around punctuation, depending on the initial quality of the printing and the scans. For those cases, there's really no point in carrying that markup over. Adding the markup back to a clean text at least insures that you haven't any wrong markup, although you may miss some. 
It's usually more laborious to have to remove and/or correct the incorrect markup, and there's a higher risk of accidentally deleting surrounding text. D From Bowerbird at aol.com Fri Dec 28 16:25:49 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 28 Dec 2007 19:25:49 EST Subject: [gutvol-d] let's get this over with Message-ID: ok, let's dispense with this during 2007, so we won't have to sully 2008 with such nonsense. i've noticed, from some bits of their messages that were quoted by other people recently, that the merry-go-rounders have tried to imply that there are some arenas where i agree with them... while it might be true that there are a few specific issues where it can appear we agree on a position, i can assure you that this is nothing more than that proverbial stopped clock that's "correct" twice a day. the merry-go-rounders do not _love_ project gutenberg. i do. that's a huge difference of opinion right at the start. i mostly want project gutenberg to follow its _own_ "rules", the ones it has laid out in its guidelines for its contributors (e.g., 4 blank lines before a heading and 2 blank lines after). the merry-go-rounders want p.g. to switch to heavy markup, a workflow that is immensely more complicated and difficult, one that basically stands on its head a longtime p.g. doctrine, the very doctrine that has made p.g. the premier cyberlibrary. i believe that project gutenberg's dedicated focus on _readers_, and not "scholars", gives it the most effective cost-benefit ratio. the merry-go-rounders want to increase p.g. digitization costs considerably, for a range of benefits that's completely unproven. (academics will _never_ cotton to something done by volunteers, since that would be a virtual denial of their "professional" status.) i am comfortable with the p.g. policy that allows an e-text to be created from multiple sources. the merry-go-rounders hate it... furthermore, i like the fact that p.g.
uses contemporary standards, e.g., removing the hyphen from olde-tyme words like "to-day" and closing up the spacey punctuation , which is common in old books. the merry-go-rounders would like to impose a cumbersome system where changes like these would be laboriously "annotated" as such... i'm _proud_ of michael hart, and i cherish the wisdom he has shown. the merry-go-rounders try to paint michael as some kind of buffoon, and imply that project gutenberg would have been better without him. (as if project gutenberg would've even _existed_ without michael hart!) the merry-go-rounders might want you to believe that i am like them. but i can assure you that i am not. nope, i've gone to extreme lengths to explain, perfectly clearly, why i do not want to be confused with them. in a nutshell, i am _disgusted_ by the merry-go-rounders. yes, it's a strong word. but it's _the_ word that describes my feelings, and describes them _accurately_, so i'm not going to "sugar-coat" it... are there things about project gutenberg that i'd like to see changed? you betcha. and i'm gonna continue telling you what those things are. on some of these things -- such as "preservation of line-endings" -- the merry-go-rounders might even agree with me, at least according to a subject-header that i see in my spam-folder at this very minute... (i'm not going to read it to confirm that, because based on experience spanning a decade, reading the merry-go-rounders is a waste of time.) but make no mistake, this superficial kind of "agreement" means nada. i'm disgusted by the merry-go-rounders, on the most _fundamental_ of levels, way down deep in the gut at the philosophical cornerstone... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071228/4a234f08/attachment.htm From joshua at hutchinson.net Sat Dec 29 05:41:53 2007 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat, 29 Dec 2007 13:41:53 +0000 (GMT) Subject: [gutvol-d] Preservation of line endings Message-ID: <1071916590.313441198935713858.JavaMail.mail@webmail01> An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071229/669a1824/attachment.htm From walter.van.holst at xs4all.nl Sat Dec 29 09:15:04 2007 From: walter.van.holst at xs4all.nl (Walter van Holst) Date: Sat, 29 Dec 2007 18:15:04 +0100 Subject: [gutvol-d] bitlet and the CD/DVD images In-Reply-To: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox> References: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox> Message-ID: <47768098.4060608@xs4all.nl> Aaron Cannon wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: RIPEMD160 > > Hi All. > > I added a link to the PG Wiki on the page > http://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project . (Actually, > to be precise, three links.) These links should allow one to download the > CD and DVD images via BitTorrent without having to install a BitTorrent > client. > > Anyway, if anyone wants to test them out, I would be interested to hear how > it goes. A truly splendid idea. It prompts one on how to save it and I happen to know that the DVD might be a .iso file. However, other users may not know this. A related question: is there a chance of this bitlet client being available to be embedded in pages instead of getting redirected to bitlet.org?
Regards, Walter From walter.van.holst at xs4all.nl Sat Dec 29 09:17:51 2007 From: walter.van.holst at xs4all.nl (Walter van Holst) Date: Sat, 29 Dec 2007 18:17:51 +0100 Subject: [gutvol-d] bitlet and the CD/DVD images In-Reply-To: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox> References: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox> Message-ID: <4776813F.5070908@xs4all.nl> Aaron Cannon wrote: > Anyway, if anyone wants to test them out, I would be interested to hear how > it goes. Slight correction: it is a zip file, of course, but the client nonetheless prompts the user for a filename. Regards, Walter From ricardofdiogo at gmail.com Sat Dec 29 12:23:26 2007 From: ricardofdiogo at gmail.com (Ricardo F Diogo) Date: Sat, 29 Dec 2007 20:23:26 +0000 Subject: [gutvol-d] RIP Robert Marquardt Message-ID: <9c6138c50712291223q221ce242kd531969adf3819c7@mail.gmail.com> Robert Marquardt, PG's wiki sysop, died this morning. His work must remain unfinished and questions to him will remain unanswered. This was his last will. This information was added to PG's wiki today by his brother, Rolf Marquardt. Robert joined Project Gutenberg in December 2006 to create the Science Fiction Bookshelf. He completed the Project Gutenberg Science Fiction CD, which was a tremendous success. He also made a promotional video for PG in Esperanto. He was now working on other bookshelves. Robert always worked very hard at PG, even while he was undergoing difficult cancer treatments. I'm sure this is sad news to all volunteers. PGLAF should perhaps make some symbolic gesture of condolence. Ricardo From julio.reis at tintazul.com.pt Sun Dec 30 17:06:45 2007 From: julio.reis at tintazul.com.pt (Júlio Reis) Date: Mon, 31 Dec 2007 01:06:45 +0000 Subject: [gutvol-d] RIP Robert Marquardt In-Reply-To: References: Message-ID: <1199063205.6607.31.camel@abetarda> > RIP Robert Marquardt May all his friends and loved ones also find peace in this time of grief. Júlio aka Tintazul.
From richfield at telkomsa.net Sun Dec 30 22:26:45 2007 From: richfield at telkomsa.net (Jon Richfield) Date: Mon, 31 Dec 2007 08:26:45 +0200 Subject: [gutvol-d] Robert Marquardt: A thought from Piet Hein Message-ID: <47788BA5.8030308@telkomsa.net> Those of you who knew the Grooks of the late (and though I am far from being Danish, I affirm: great) Piet Hein may remember this one: Giving in is no defeat Passing on is no retreat Selves were made to rise above You shall live in what you love. In gratitude to Robert and all those who labour for no greater reward than the satisfaction of lighting candles, whether they curse the dark or not, I suggest that this Grook encapsulates a concept that might help support such people while they work, and offer some comfort to their friends, families, and beneficiaries after they have gone. Go well, Jon From ricardofdiogo at gmail.com Mon Dec 31 15:46:26 2007 From: ricardofdiogo at gmail.com (Ricardo F Diogo) Date: Mon, 31 Dec 2007 23:46:26 +0000 Subject: [gutvol-d] Etext #24073 copyrighted? Message-ID: <9c6138c50712311546h1f439984w3c81c542a3ae8dc6@mail.gmail.com> Does the TP&V of etext #24073 state it is pre-1923? From reading the etext I can't see any evidence of that. 1654 is _NOT_ the edition date. It's just the year when the sermon was first delivered. Ricardo From gbnewby at pglaf.org Mon Dec 31 17:40:26 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 31 Dec 2007 17:40:26 -0800 Subject: [gutvol-d] Etext #24073 copyrighted? In-Reply-To: <9c6138c50712311546h1f439984w3c81c542a3ae8dc6@mail.gmail.com> References: <9c6138c50712311546h1f439984w3c81c542a3ae8dc6@mail.gmail.com> Message-ID: <20080101014025.GA18049@mail.pglaf.org> On Mon, Dec 31, 2007 at 11:46:26PM +0000, Ricardo F Diogo wrote: > Does the TP&V of etext #24073 state it is pre-1923? From reading the > etext I can't see any evidence of that. > 1654 is _NOT_ the edition date. It's just the year when the sermon was > first delivered. > > Ricardo Hi, Ricardo.
I didn't see a posted note for this, so I am cc'ing the pgww list. The file does not state that it's copyrighted, and I don't see why it would be. http://www.gutenberg.org/files/2/4/0/7/24073/ -- Greg From ricardofdiogo at gmail.com Mon Dec 31 17:49:54 2007 From: ricardofdiogo at gmail.com (Ricardo F Diogo) Date: Tue, 1 Jan 2008 01:49:54 +0000 Subject: [gutvol-d] Etext #24073 copyrighted? In-Reply-To: <20080101014025.GA18049@mail.pglaf.org> References: <9c6138c50712311546h1f439984w3c81c542a3ae8dc6@mail.gmail.com> <20080101014025.GA18049@mail.pglaf.org> Message-ID: <9c6138c50712311749h432119fcs13543af93bad5fc4@mail.gmail.com> According to the page images, this is _definitely_ NOT a pre-1923 edition (I can tell from the spelling). It is possible, however, that this _may be_ an official compilation of Father Vieira's sermons, in which case it _could_ have been released into the public domain by the Brazilian institution that published it. If the physical book was an ordinary edition, I'm afraid it's not in the public domain in the US under the pre-1923 rule. Ricardo 2008/1/1, Greg Newby : > On Mon, Dec 31, 2007 at 11:46:26PM +0000, Ricardo F Diogo wrote: > > Does the TP&V of etext #24073 state it is pre-1923? From reading the > > etext I can't see any evidence of that. > > 1654 is _NOT_ the edition date. It's just the year when the sermon was > > first delivered. > > > > Ricardo > > Hi, Ricardo. I didn't see a posted note for this, so I am cc'ing > the pgww list. The file does not state that it's copyrighted, and > I don't see why it would be. > > http://www.gutenberg.org/files/2/4/0/7/24073/ > > -- Greg > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d >