From Bowerbird at aol.com Mon Dec 8 08:20:22 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 8 Dec 2008 11:20:22 EST
Subject: [gutvol-d] the analysis of mind -- 001
Message-ID:

"the analysis of mind", by bertrand russell, #2529, was just reposted.

i didn't run it against a separate digitization, but only looked at the diffs between the original version and the repost, and ran some simple checks, and found these problems.

the letter "l" used as a "1" in a date:
> l909),
> 1909),
> [a-z][0-9]

an improperly terminated paragraph:
> in the past. Our two questions are, in the case of memory
> in the past. Our two questions are, in the case of memory:
> [a-z]\n\n

a period that should be a comma instead:
> understands. it will tell you what is 34521 times 19987, without a
> understands, it will tell you what is 34521 times 19987, without a
> \. [a-z]

a missing em-dash:
> such existents which hardly happens except in philosophy--we have to do
> such existents--which hardly happens except in philosophy--we have to do

5 cases of footnotes formatted inconsistently from the other footnotes:
> *See "Our Knowledge of the External World" (Allen & Unwin),
> *The exact definition of a piece of matter as a construction
> *I have explained elsewhere the manner in which space is
> *For a more exact statement of this law, with the
> *See his book, "The New Physiology and Other Addresses"

all in all, 9 errors, without even really looking very hard.

i interpret this as a cry for help from the whitewashers... ;+)

-bowerbird

p.s. stay tuned for my "i love michael hart" post on december 10th.

**************
Make your life easier with all your friends, email, and favorite sites in one place. Try it now.
(http://www.aol.com/?optin=new-dp&icid=aolcom40vanity&ncid=emlcntaolcom00000010)
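The three patterns quoted in the "analysis of mind" message above ([a-z][0-9], [a-z]\n\n and \. [a-z]) can be run mechanically. What follows is only a minimal sketch in Python, not the tool the poster actually used: the function name run_checks, the labels, and the sample text (stitched together from the fragments quoted in the message) are assumptions made for illustration.

```python
import re

# The three patterns quoted in the message; the labels are added here
# for illustration and are not the poster's wording.
CHECKS = [
    ("letter directly followed by a digit", re.compile(r"[a-z][0-9]")),
    ("paragraph ends on a lowercase letter", re.compile(r"[a-z]\n\n")),
    ("period followed by a lowercase letter", re.compile(r"\. [a-z]")),
]


def run_checks(text):
    """Return (label, line_number, matched_text) for every hit, in check order."""
    hits = []
    for label, pattern in CHECKS:
        for match in pattern.finditer(text):
            # 1-based line number of the position where the match starts
            line_number = text.count("\n", 0, match.start()) + 1
            hits.append((label, line_number, match.group(0)))
    return hits


# Hypothetical sample built from the fragments quoted in the message.
sample = (
    "l909),\n"
    "in the past. Our two questions are, in the case of memory\n"
    "\n"
    "understands. it will tell you what is 34521 times 19987, without a\n"
)

for label, line_number, matched in run_checks(sample):
    print(f"line {line_number}: {label}: {matched!r}")
```

Checks this simple will also flag legitimate text (an abbreviation such as "i.e. the" matches the third pattern), so every hit still needs a human look against the page scan.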
From Bowerbird at aol.com Mon Dec 8 10:20:45 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 8 Dec 2008 13:20:45 EST
Subject: [gutvol-d] latest report from under the bridge
Message-ID:

here's the latest report from under the bridge.

you know, where us trolls live... ;+)

***

first, i've been banned from yet another blog:
> http://www.medialoper.com/hot-topics/amazon/my-beer-with-bezos-the-shocking-truth-about-the-kindle/#comment-27049

evidently this kirk guy doesn't like to be told that he is asking stupid questions, complete with full explanation.

most amusingly, however, he accuses me of being "anonymous".

of course, with a name as fanciful as "bowerbird intelligentleman", i guess i can't really blame people for thinking i'm using a "handle", especially since they will not know that all my friends call me that... (well, most of 'em shorten it to "bird" or "b-bird".)

the thing that makes it so amusing, though, is that kirk says that "i'm not the only one who has noticed" that bowerbird is "an ass", as he then links to a post on a _nameless_ blog that attacks me...
> http://www.techloser.co.za/?p=26#http://www.techloser.co.za/?p=26

so i guess it's ok to be nameless when you are attacking someone, but not when you engage them in a discussion that they initiated...

***

anyhoo...

_someone_ has to tell you when you're asking stupid questions, or else you'll just keep asking them, making yourself look stupid.

and if no one else will step up, i guess i have to be the guy...

-bowerbird

p.s. yes, you read that correctly. the blog where i was attacked is named "tech loser". so you'd _think_ the guy has a sense of humor, especially since he labels the blog as "beta", and even incorporates a mirrored surface on the logo which is de rigueur in the web 2.0 world. but evidently his sense of humor only extends _quite_ so far... ;+)

ah well, at least i can laugh with him, even if he cannot laugh with me.
as chesterton said, angels can fly because they take themselves lightly.

From Bowerbird at aol.com Tue Dec 9 14:52:57 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 9 Dec 2008 17:52:57 EST
Subject: [gutvol-d] mountain blood -- 9000
Message-ID:

well, well, you might remember "mountain blood"...

it was the book that i used during the month of july, when i posted one tip each day on how to clean o.c.r.

the book has finally been posted from d.p. over to p.g.

so, of course, i did a comparison of it with my version.

oops...

i find a total of _20_ errors in the d.p./p.g. version, many of which i believe i had pointed out earlier...

especially astonishing is that a half-dozen of the errors -- 3 cases each for "lattice" and "george gordon makimmon" -- are on _names_, which should have been fixed globally before the o.c.r. was even sent in front of the p1 proofers...

but they were not, and they managed to persist through several rounds of proofing, formatting, post-processing, and whitewashing, to emerge unscathed in the final e-text.

looks like roger frank and the whitewashers should have followed along. maybe they woulda learned something...

***

here are the errors...

http://z-m-l.com/go/mount/mountp020.html
> It was well known that the first George Gordon Makimmon
> It was well known that the first George Gordon MacKimmon

http://z-m-l.com/go/mount/mountp020.html
> Greenstream, It was well known that the first
> Greenstream. It was well known that the first

http://z-m-l.com/go/mount/mountp024.html
> then entered the dining-room.
> then entered the dining room.

http://z-m-l.com/go/mount/mountp029.html
> and Lattice Hollidew covered her eyes. The stranger sprang
> and Lettice Hollidew covered her eyes. The stranger sprang

http://z-m-l.com/go/mount/mountp053.html
> that held Sprucesap, it was bathed in a flaring after~glow
> that held Sprucesap, it was bathed in a flaring afterglow

http://z-m-l.com/go/mount/mountp093.html
> prayer meeting. If he strolled about in that
> prayer-meeting. If he strolled about in that

http://z-m-l.com/go/mount/mountp111.html
> would." He turned with a sigh to the log. A cross-cut
> would." He turned with a sigh to the log. A crosscut

http://z-m-l.com/go/mount/mountp156.html
> was half-raised, tentative; and his wheat colored
> was half-raised, tentative; and his wheat-colored

http://z-m-l.com/go/mount/mountp173.html
> "Don't you say another word about Mrs. Caley," Lattice declared
> "Don't you say another word about Mrs. Caley," Lettice declared

http://z-m-l.com/go/mount/mountp175.html
> The thought of his dwelling, with Lattice's importunate
> The thought of his dwelling, with Lettice's importunate

http://z-m-l.com/go/mount/mountp213.html
> surprising foreknowledge of the County, who had
> surprising fore-knowledge of the County, who had

http://z-m-l.com/go/mount/mountp251.html
> true that he did not care for her ... he did not care for her?
> true that he did not care for her ... He did not care for her?

http://z-m-l.com/go/mount/mountp251.html
> that realization too carried a slight sting. But neither did he
> That realization too carried a slight sting. But neither did he

http://z-m-l.com/go/mount/mountp277.html
> George Gordon Makimmon, resting on the porch
> George Gordon MacKimmon, resting on the porch

http://z-m-l.com/go/mount/mountp307.html
> by George Gordon Makimmon from world-old
> by George Gordon MacKimmon from world-old

http://z-m-l.com/go/mount/mountp332.html
> exhausted slumbers. Lying on an outflung arm his
> exhausted slumbers. Lying on an out-flung arm his

http://z-m-l.com/go/mount/mountp336.html
> Seen from the road the long roof was variously colored
> Seen from the road the long roof was variously-colored

http://z-m-l.com/go/mount/mountp338.html
> The latter, the account proceeded, with a foreknowledge
> The latter, the account proceeded, with a fore-knowledge

http://z-m-l.com/go/mount/mountp365.html
> ing, dead planet. Gleams of light shot like quick-silver
> ing, dead planet. Gleams of light shot like quicksilver

http://z-m-l.com/go/mount/mountp368.html
> his arms outflung across the counterpane, his head
> his arms out-flung across the counterpane, his head

***

it is fair to point out the other side as well...

having human proofers review the o.c.r. _did_ result in those humans catching some errors in the paper-book.

there were 3 such errors, 2 of which would not have been caught automatically, as they were stealth scannos. (i bungled the other.)

http://z-m-l.com/go/mount/mountp040.html (error in p-book)
> "pretty... bad.
> "pretty... bad."

http://z-m-l.com/go/mount/mountp059.html (error in p-book)
> face shown with drops of perspiration that formed
> face shone with drops of perspiration that formed

http://z-m-l.com/go/mount/mountp197.html (error in p-book)
> could level it, an arm shout out from behind him,
> could level it, an arm shot out from behind him,

***

whether the dozens of hours spent proofing this text _warranted_ the catching of those 3 errors is a decision that each of us is capable of making in our own heads...

either way, in comparison, the lesser effort that managed to find _20_ errors in the e-text as it is now posted would appear to have been a very good investment to have made.

-bowerbird
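The "mountain blood" message above rests on comparing two independent digitizations of the same book line against line. The message does not say which tool was used, so the following is only a sketch of the general idea using Python's standard difflib module. The function name differing_lines and the two short sample texts (assembled from lines quoted in the message, with a blank line added between them) are assumptions for illustration.

```python
import difflib


def differing_lines(version_a, version_b):
    """Pair up the lines that differ between two digitizations of one book.

    Returns a list of (line_from_a, line_from_b) tuples; one side is None
    when a line exists in only one of the two versions.
    """
    a_lines = version_a.splitlines()
    b_lines = version_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    pairs = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag == "equal":
            continue
        a_chunk = a_lines[a_start:a_end]
        b_chunk = b_lines[b_start:b_end]
        # Pad the shorter side so unmatched lines are still reported.
        width = max(len(a_chunk), len(b_chunk))
        a_chunk += [None] * (width - len(a_chunk))
        b_chunk += [None] * (width - len(b_chunk))
        pairs.extend(zip(a_chunk, b_chunk))
    return pairs


# Hypothetical two-line excerpts; the order follows the quoted pairs.
first = (
    "then entered the dining-room.\n"
    "\n"
    "prayer meeting. If he strolled about in that\n"
)
second = (
    "then entered the dining room.\n"
    "\n"
    "prayer-meeting. If he strolled about in that\n"
)

for a_line, b_line in differing_lines(first, second):
    print(">", a_line)
    print(">", b_line)
    print()
```

A comparison like this only works well when both versions keep the same line breaks as the printed page; once a text has been rewrapped, a word-level comparison is usually needed instead.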
From Bowerbird at aol.com Thu Dec 11 11:26:24 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 11 Dec 2008 14:26:24 EST
Subject: [gutvol-d] i love michael hart
Message-ID:

i love michael hart, for making project gutenberg. great job!

and charlz franks, too, for making distributed proofreaders...

-bowerbird

p.s. michael, i hope your december 10th was good this year.

From Bowerbird at aol.com Thu Dec 11 11:55:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 11 Dec 2008 14:55:08 EST
Subject: [gutvol-d] what comes next
Message-ID:

i have a cleaned-up copy of "the curious case of benjamin button", if p.g. is interested. it's already been posted, in a compilation, but it's past the 100k mark where google stops its indexing, so no one will be able to find it through google to know that it's in the library.

(it's coming out as a movie on christmas day, with brad pitt and cate blanchett, so i'm sure people will be looking around for it.)

***

i'm also cleaning up a copy of "the jungle", if you'd like to have that.

penguin just made a big deal out of the fact that they are releasing a few e-books, and "the jungle" was one of 'em. all the releases are classic public-domain titles, yet they are charging $8-$10 _each_, so it's a big rip-off. (the only saving grace might be that they put in extra material with each one, but i doubt that it justifies _that_ price. they think they can rob us with e-books just like they do with paper.)

so far the p.g. e-text doesn't look _that_ bad, but i haven't checked closely enough to say if that will continue to be the case. even with what i did so far, there were enough bugs that it's worth it to fix it, especially to match the clean-up penguin did. (assuming they did indeed clean it up; if not, it will be _really_ sweet to upstage them.)

***

also, mike cook, are you still on here? i took a look at your version of "gulliver's travels". any reason why you just did the first 2 parts? and what specific edition did you use, so i can check it out for you? (if mike's work is accurate, the p.g. version of this is in bad shape.)

-bowerbird

From hart at pglaf.org Thu Dec 11 13:06:44 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 11 Dec 2008 13:06:44 -0800 (PST)
Subject: [gutvol-d] i love michael hart
In-Reply-To:
References:
Message-ID:

On Thu, 11 Dec 2008, Bowerbird at aol.com wrote:

> i love michael hart, for making project gutenberg. great job!
>
> and charlz franks, too, for making distributed proofreaders...
>
> -bowerbird
>
> p.s. michael, i hope your december 10th was good this year.

Other than having trouble getting reacclimated, 9 degrees on arrival, I've been doing OK, and managed to light a candle.

Thanks for asking!

Michael

From ajhaines at shaw.ca Fri Dec 12 09:13:52 2008
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Fri, 12 Dec 2008 09:13:52 -0800
Subject: [gutvol-d] what comes next
References:
Message-ID: <2ED49F09CECF44BDAC963BD2A8B213DB@alp2400>

Changes/corrections to PG texts should be submitted to PG's errata system (errata_AT_pglaf.org).
_______________________________________________
gutvol-d mailing list
gutvol-d at lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From Bowerbird at aol.com Fri Dec 12 14:25:31 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 12 Dec 2008 17:25:31 EST
Subject: [gutvol-d] what comes next
Message-ID:

al said:
> Changes/corrections to PG texts should be
> submitted to PG's errata system (errata_AT_pglaf.org).

i don't see how that relates to anything i asked in that post.

i have cleaned-up versions of some e-texts if you want 'em. and if you don't, that's fine. i do them for myself, not for you.

and if you want to do a comparison between your old version and my cleaned-up version, to see what changes were made, that's fine too. but it's not something i'm interested in doing.

the clean-up is enough work -- documentation is too much.

happy holidays...

-bowerbird

From gbnewby at pglaf.org Fri Dec 12 15:35:09 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri, 12 Dec 2008 15:35:09 -0800
Subject: [gutvol-d] what comes next
In-Reply-To:
References:
Message-ID: <20081212233509.GF18629@mail.pglaf.org>

On Thu, Dec 11, 2008 at 02:55:08PM -0500, Bowerbird at aol.com wrote:
> i have a cleaned-up copy of "the curious case of benjamin button",
> if p.g. is interested. it's already been posted, in a compilation, but
> it's past the 100k mark where google stops its indexing, so no one
> will be able to find it through google to know that it's in the library.

Sure, thanks. You can just email it to me.

> (it's coming out as a movie on christmas day, with brad pitt and
> cate blanchett, so i'm sure people will be looking around for it.)
>
> ***
>
> i'm also cleaning up a copy of "the jungle", if you'd like to have that.

Ditto.
 -- Greg

From Bowerbird at aol.com Wed Dec 17 09:20:44 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 17 Dec 2008 12:20:44 EST
Subject: [gutvol-d] "the jungle" -- 001
Message-ID:

i'm cleaning up "the jungle", by upton sinclair.

***

this e-text was _reposted_, so one of the things i did was to jump through the hurdles to look at the _diffs_ between the old version and the reposted version...

aside from some no-look-needed global changes, i saw _32_ substantive changes between the different versions.

this might lead us to think the old e-text was very clean.

but... um, that will prove _not_ to be the case, however...

-bowerbird

p.s. here are the 32 changes that i found in the repost...

bare save for a calendar.
bare save for a calendar,
========================^

a little comer of the high mansions of the sky.
a little corner of the high mansions of the sky.
===========^------------------------------------

pulling one way and pushing the other.
pulling one way and pushing the other,
=====================================^

that with these arms people will ever let me starve?"
"that with these arms people will ever let me starve?"
^-----------------------------------------------------

even so.
even so,
=======^

ln all their journey they had seen nothing so bad as this.
in all their journey they had seen nothing so bad as this.
^---------------------------------------------------------

it suppported directly two hundred and fifty thousand people
it supported directly two hundred and fifty thousand people
=======^----------------------------------------------------

ne of the rules on the killing beds was that
One of the rules on the killing beds was that
^--------------------------------------------

and the next moming set out again.
and the next morning set out again.
===============^-------------------

tubercular pork that was condemncd as unfit for export.
tubercular pork that was condemned as unfit for export.
================================^======================

and the steam of hot food,and perhaps music,
and the steam of hot food, and perhaps music,
==========================^------------------

in despeate hours,
in desperate hours,
========^----------

It was several scconds before she could get breath to answer him.
It was several seconds before she could get breath to answer him.
================^================================================

hecause it was nearly eight o'clock,
because it was nearly eight o'clock,
^===================================

who gazed at him through a crack in thc door.
who gazed at him through a crack in the door.
======================================^======

said Jurgis.
said Jurgis,
===========^

isn't she here?"
"isn't she here?"
^----------------

said Jurgis.
said Jurgis,
===========^

toward the middle of the alternoon,
toward the middle of the afternoon,
==========================^========

Quick!
"Quick!
^------

"Madame Haupt, Hebamme,
"Madame Haupt Hebamme",
=============^---------

lt so happened that half of this was in one direction
It so happened that half of this was in one direction
^====================================================

stretching out their arms to him,calling to him across a bottomless
stretching out their arms to him, calling to him across a bottomless
=================================^----------------------------------

Over \n where the cattle butchers were waiting, [[bad-line-break]]
Over where the cattle butchers were waiting,
=====^-----------------------------------------

''then you're in for it, [[2singlequotes]]
"then you're in for it,
^^----------------------

"But what am I going to do?'' [[2singlequotes]]
"But what am I going to do?"
===========================^^

she replied.
she replied,
===========^

Perhaps you think I did you a dirty trick. running away as I did,
Perhaps you think I did you a dirty trick running away as I did,
=========================================^-----------------------

the policeman remaining to look under the bed and behind the door
the policeman remaining to look under the bed and behind the door.
=================================================================^

he had somehow always excepted his own family. that he had loved;
he had somehow always excepted his own family that he had loved;
=============================================^-------------------

a woman's voice,gentle and sweet,
a woman's voice, gentle and sweet,
================^-----------------

anyhow?"--
anyhow?--
=======^--

**************
One site keeps you connected to all your email: AOL Mail, Gmail, and Yahoo Mail. Try it now.
(http://www.aol.com/?optin=new-dp&icid=aolcom40vanity&ncid=emlcntaolcom00000025)
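The marker lines in the list above (runs of "=", "^" and "-" under each old/new pair) look machine-generated, but the message does not document the convention. The sketch below is a guess inferred from the examples: "=" where both lines agree at a position, "^" where they differ, and "-" from just after the first difference onward when the two lines have different lengths. The function name marker_line is hypothetical, and this version does not reproduce every marker in the message (the two-caret entries, for example).

```python
def marker_line(old, new):
    """Build an '=', '^', '-' marker line for two versions of one line.

    Assumed convention (inferred from the examples, not documented):
    '=' where both lines have the same character at a position,
    '^' where they differ, and '-' for everything after the first
    difference when the two lines are of different lengths.
    """
    width = max(len(old), len(new))
    same_length = len(old) == len(new)
    marks = []
    seen_difference = False
    for i in range(width):
        a = old[i] if i < len(old) else None
        b = new[i] if i < len(new) else None
        if seen_difference and not same_length:
            # The lines are out of step from here on, so stop comparing.
            marks.append("-")
        elif a == b:
            marks.append("=")
        else:
            marks.append("^")
            seen_difference = True
    return "".join(marks)


pairs = [
    ("bare save for a calendar.", "bare save for a calendar,"),
    ("and the next moming set out again.", "and the next morning set out again."),
]
for old, new in pairs:
    print(old)
    print(new)
    print(marker_line(old, new))
    print()
```

Pointing at the first differing column like this is what makes each change in a long list easy to spot, since most of the fixes are a single character.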
From Bowerbird at aol.com Thu Dec 18 12:49:51 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 18 Dec 2008 15:49:51 EST
Subject: [gutvol-d] "the jungle" -- 002
Message-ID:

i'm cleaning up "the jungle", by upton sinclair.

***

gee, there are thousands of diffs between these versions... so this won't be going as fast as i might have hoped... :+)

i have already made hundreds of changes, on both sides...

of course, you'd expect that the o.c.r. side has lots of errors; but i've corrected hundreds of bugs in the p.g. e-text as well.

so the 32 fixes made in the "repost" will pale in significance.

i don't think anyone but the whitewashers themselves knows much about the _process_ of the clean-up during a "repost".

it's fairly clear from the type (and the large number) of errors, however, that a page-by-page comparison to page-scans is _not_ being done. it's not even obvious that _any_ page-scans are being consulted at all, or even that much analysis is done...

perhaps none of this is surprising, given the size of the library.

on the other hand, if a repost clean-up is so cursory, one can wonder why it's taken 5+ years to repost pre-10,000 e-texts.

given hundreds of errors that _need_ to be fixed in this e-text -- that's conservative, the number could run up to 400 or 800, which averages to 1-2 errors-per-page in this 413-pager -- what difference does it make that a measly 32 were once fixed in an interim clean-up along the way?

and, if it is indeed the case that no page-scans are evaluated, is it even worth the time to make "corrections" while "blind"?

i know the whitewashers are spending a lot of time on the job, and -- as _volunteers_ -- i certainly appreciate their devotion.

but, as with any volunteer, i think it's fair game to ask whether their time is being spent in a _constructive_ manner, and if not, then perhaps we can explore how their time can be used better.
-bowerbird

From Bowerbird at aol.com Fri Dec 19 01:26:14 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 19 Dec 2008 04:26:14 EST
Subject: [gutvol-d] "the jungle" -- 003
Message-ID:

i'm cleaning up "the jungle", by upton sinclair.

***

well, i've got the number of diffs down under a thousand, so it's not going as slow as i was worried that it might go.

and i've made hundreds _more_ changes, on both sides...

so this e-text wasn't very clean at all, i am sorry to report.

but i won't be able to say _exactly_ how many errors it had, because i strongly suspect that the p.g. e-text was based on a different edition of the book than the one i have scans for. (well, i have 2 scan-sets, but _they_ appear to be identical. still, this book sold tons of copies, so it went through lots of printings, meaning edition differences can be expected.)

i have noticed there are a lot of _hyphenation_ differences, which is a common change _often_ made between editions. (because, i suspect, editors must make a mark _somehow_.)

but there are still enough obviously-wrong situations that i can confidently say that this e-text just isn't all that clean.

no significant problems -- nothing that would give noring any ground to stand on regarding "untrustworthiness" -- but hundreds and hundreds of "little problems" _do_ add up.

so, if al wants to run out the diffs between my clean version and the p.g. version, i'd be very happy to take a look at 'em... :+)

-bowerbird

From Morasch at aol.com Fri Dec 19 15:33:52 2008
From: Morasch at aol.com (Morasch at aol.com)
Date: Fri, 19 Dec 2008 18:33:52 EST
Subject: [gutvol-d] "the jungle" -- 004
Message-ID:

i'm cleaning up "the jungle", by upton sinclair.

***

hundreds _more_ changes, since last report, on both sides...

and it's become very clear that there is an edition-difference.

first of all, many british spellings have revealed themselves...

also, there is this difference, on page 42:
> This inspector wore a blue uniform, with brass buttons, and he
> This inspector wore an imposing silver badge, and he

the top line is from the p.g. text, the bottom line from mine.

that might help you find the edition used for the p.g. e-text.

google returns this, but it's not full-view or downloadable:
> http://books.google.com/books?id=5IZyFaS9oWAC

have a good weekend...

-bowerbird

From ajhaines at shaw.ca Sat Dec 20 11:20:06 2008
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Sat, 20 Dec 2008 11:20:06 -0800
Subject: [gutvol-d] "the jungle" -- 004
References:
Message-ID:

This many changes (as opposed to corrections) probably renders your version useless as a replacement for PG #140, which I assume is your PG source.

Such wholesale changes would cause it to be handled as a new submission; however, since PG does not have a copyright clearance on file for the second source edition (presumably only that for #140), one would have to be obtained before submission. Read article C.23 in PG's Copyright FAQ.
It would seem to me that creating a mash-up version of this book from two obviously incompatible editions, is a case of A for effort, F for logic.

Unless, of course, you're going to have the inspector wearing a blue uniform, with brass buttons, and an imposing silver badge.

Better to have spent your time creating a new, clean, faithful-to-original, version, rather than creating a faithful-to-nothing version.
From Bowerbird at aol.com Sat Dec 20 17:37:51 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 20 Dec 2008 20:37:51 EST
Subject: [gutvol-d] "the jungle" -- 004
Message-ID:

al said:
> This many changes (as opposed to corrections)
> probably renders your version useless as
> a replacement for PG #140,
> which I assume is your PG source.

i'm not making a "replacement" for pg#140. i'm making a clean e-book -- republishing it.

but you still need to know that your #140 has hundreds of errors in it -- at least, and literally.

> Such wholesale changes would cause it
> to be handled as a new submission

that's how i've assumed greg would handle it.

> however, since PG does not have
> a copyright clearance on file
> for the second source edition
> (presumably only that for #140),
> one would have to be obtained
> before submission.
> Read article C.23 in PG's Copyright FAQ.

i'll send the file over, and the scan-sets -- one each from google and archive.org -- and let greg sort them out, if he wants to. if he doesn't want to, that's fine with me too.

i just ain't gonna worry my little head about it. it's the holiday season, and i'm singing carols...

> It would seem to me that creating
> a mash-up version of this book
> from two obviously incompatible editions,

well, it's _not_ the case that the editions are "obviously incompatible", not in the slightest. if you got that impression from me, i'm sorry.

there has clearly been some editing going on. most changes are words being de/hyphenated.

whether that editing was done by a publisher, making a different edition, or by someone like the person making the p.g. e-text, i cannot say.

the badge/buttons edit was the big stand-out. (another one was "exclaimed" versus "gasped".)

i was just giving you some information that may help you track down a version your person used.

> is a case of A for effort, F for logic.

sorry, al, but it's _your_ logic that is flunking here.
first of all, until you've done _some_ comparison,
you don't even know you've got different editions.

i've done this enough times now that i know how to
probe, but even when you do have two editions, the
comparison method can still be quite useful... so
i don't even bother to probe, unless i have more than
two editions, so as to decide which two to use.

so even if i had known at the outset that these were
two different editions, i still would've compared 'em.
(since they are the only two i've got that are worthy.)

see, there were lots of cases where i _kept_an_edit_
that'd been done in the p.g. e-text, because it was
_a_good_change_, one that i, as a republisher, liked.

you just don't get that kind of information if you do
the comparison with two copies of the same edition.

> Unless, of course, you're going to have
> the inspector wearing a blue uniform, with
> brass buttons, and an imposing silver badge.

well, that might have been the case! but honestly...
brass and silver would clash, and we can't have that.

so i'm just gonna go with what _my_ scans say, ok?,
and forget about whatever the p.g. version might say.

> Better to have spent your time creating a new, clean,
> faithful-to-original, version, rather than creating
> a faithful-to-nothing version.

well, i don't hold much to that "faithful" crap, honestly.

i am a _republisher_, so i _correct_ errors in the original,
and even _standardize_ various aspects to "house" style,
just like every other publisher out there in the real world.

moreover, i never found anyone who really believes that
"faithful" crap down _deep_, or they would be leaving in
the spacey quotation-marks and the spacey punctuation.

having said that, however, i do _furnish_the_scans_ from
the p-book that i used as the source, so that people can
evaluate _for_themselves_ to see if the text i gave them
matches the scan, and if it doesn't, if the change i made
is one that they can live with, or reject, as they see fit...

since scan-sets are so plentiful these days, it'd be nice
if you did the same with the books that you digitize, al.
project gutenberg, as a whole, might wanna do the same.

now, i hasten to add, as i've said many times in the past,
that i find nothing wrong with the p.g. "synthesis" mode,
where multiple p-books can be used to create an e-text.
i understand the historical underpinnings of it very well,
and in some cases i'd think this method is even superior.

nonetheless, as i have also said many times in the past,
i expect that that approach might soon fall out of favor,
because -- given our bounty of scan-sets these days --
there'll be no shortage of digitizations which _can_ (and
_will_) point to a specific scan-set as their authentication.

since a project gutenberg "synthesis" e-text can't do that,
it will likely lose out if "authentication" becomes an issue.

but unhappily, the reality is even more problematic for p.g.

because even your "non-synthesis" e-texts have problems
when it comes to authentication. first of all, p.g. does not
-- for the vast majority of its e-texts -- provide the scans,
which is pretty much a first step in proving authentication.
it doesn't even always tell people which p-book was used,
so they could chase down the scan-set if they wanted to...

even when it does provide the scans, p.g. doesn't provide
infrastructure which facilitates comparison with the scans.
many of your files don't even record the pagebreak info,
so it's difficult to even know _which_ scan you'd look at...

and even in the very rare best situation of all, where you
have an .html file that records the pagebreaks and even
gives a _link_ where each page links to its relevant scan
-- there _are_ some p.g. e-texts like this, but very few --
the fact that you haven't retained the original linebreaks
makes comparison far more painful than it needs to be.
so even in these _very_best_cases_, you'll lose the battle.

the conclusion? the entire p.g.
library is extremely weak
when it comes down to the question of _authentication_.
it's your achilles heel right now, to be honest...

and if you don't do something about it, you will be sorry.

because the _arrow_ that will be shot at your achilles heel
is the _large_number_of_errors_ located in your e-texts...

people have always known that your e-texts have errors...
but nobody has done a public, systematic documentation
revealing the large number of errors until i did it recently.

and, lest anyone be misguided, i have surprised _myself,_
since i didn't think the number of errors would be so big.

and lest anyone else be misguided, let me also state that
the number of _serious_ errors is (as i thought) very small.

however, as i said yesterday, a big number of non-serious
errors eventually adds up to a rather serious shortcoming.

consider the current case. if i can point to 826 "possible"
errors in the book -- all documented against a scan-set --
how effective do you think your counterclaim will be when
you say "those scans are from another edition of the book"
_if_ it's also the case you can't summon up your own scans?
or, even if you do summon scans, they're hard to compare?
because i'd make it very easy for people to see the "errors".

(i picked "826" as a sum because the book has 413 pages,
which works out to an error-rate of two-errors-per-page,
which is enough to convince anybody there is a problem.)

i'm telling you this _not_ because i want to shoot the arrow.
i don't. maybe people like jon noring want to. but i don't...

the thing is, you must cover your achilles heel. or be sorry.

-bowerbird

From ianelwood at riseup.net Sat Dec 20 22:15:42 2008
From: ianelwood at riseup.net (Ian Elwood)
Date: Sat, 20 Dec 2008 22:15:42 -0800
Subject: [gutvol-d] OCR HowTos?
Message-ID: <494DDF0E.5000100@riseup.net>

Hi,

Just wondering if anyone can recommend some OCR howtos for
Ubuntu linux. I have the standard scanning tool (xsane) but
can't figure out how to get OCR to work.

--ian--

From traverso at posso.dm.unipi.it Sun Dec 21 01:53:29 2008
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Sun, 21 Dec 2008 10:53:29 +0100
Subject: [gutvol-d] OCR HowTos?
In-Reply-To: <494DDF0E.5000100@riseup.net> (message from Ian Elwood on Sat, 20 Dec 2008 22:15:42 -0800)
References: <494DDF0E.5000100@riseup.net>
Message-ID: <20081221095423.879A367709@posso.dm.unipi.it>

>>>>> "Ian" == Ian Elwood writes:

    Ian> Hi,
    Ian> Just wondering if anyone can recommend some OCR howtos for
    Ian> Ubuntu linux. I have the standard scanning tool (xsane) but
    Ian> can't figure out how to get OCR to work.

    Ian> --ian--

A tesseract package is available for ubuntu. It is probably the best
choice for Linux. Not sure if it is good enough.

Carlo Traverso

From ianelwood at riseup.net Sun Dec 21 18:09:31 2008
From: ianelwood at riseup.net (Ian Elwood)
Date: Sun, 21 Dec 2008 18:09:31 -0800
Subject: [gutvol-d] OCR HowTos?
In-Reply-To: <20081221095423.879A367709@posso.dm.unipi.it>
References: <494DDF0E.5000100@riseup.net> <20081221095423.879A367709@posso.dm.unipi.it>
Message-ID: <494EF6DB.1030900@riseup.net>

Carlo Traverso wrote:
>>>>>> "Ian" == Ian Elwood writes:
>
> Ian> Hi,
>
> Ian> Just wondering if anyone can recommend some OCR howtos for
> Ian> Ubuntu linux. I have the standard scanning tool (xsane) but
> Ian> can't figure out how to get OCR to work.
>
> Ian> --ian--
>
> A tesseract package is available for ubuntu. It is probably the best
> choice for Linux. Not sure if it is good enough.
>
> Carlo Traverso
>

Thanks!

--ian--

From Bowerbird at aol.com Mon Dec 22 11:29:52 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 22 Dec 2008 14:29:52 EST
Subject: [gutvol-d] "the jungle" -- 005 -- laughing all weekend
Message-ID:

gosh al, i want to thank you, because i've been laughing all weekend
over that comment you made about how my version of "the jungle"
will be "faithful to nothing". you're just _too_ funny, a real wild man,
because the project gutenberg e-text is the one that is "unfaithful".

here's your quote, al:
> Better to have spent your time creating a new, clean, faithful-to-original,
> version, rather than creating a faithful-to-nothing version.

ok, let's take a closer look, ok?

there are two (and only two) scan-sets widely available in cyberspace,
kind of amazing when you consider just how influential this book was.

one was done by google, but doesn't appear to be available from them,
maybe because they jerked it due to the fact they'd missed six (6) pages.
(you might be able to find a copy of it at its home, which is umichigan.)
fortunately, somebody who downloaded it re-uploaded it to archive.org.
> http://www.archive.org/details/jungle00sincgoog

the other one was done by archive.org, coming to us from utoronto...
> http://www.archive.org/details/thejungle00sincuoft

the first one is represented on my site in this subdirectory:
> http://z-m-l.com/go/tjbus/

the second one is represented on my site in this subdirectory:
> http://z-m-l.com/go/tjaus/

the first one -- tjbus -- is the one that is most up-to-date right now.

(i've done quite a few comparisons between the two scan-sets, and --
at least as far as i've been able to determine -- they are identical.
if anyone can spot a difference between them, i'd love to hear that...)

here's a quick little rundown of a few facts about those scan-sets,
and how "faithful" _my_ version is to them, versus the p.g. version...

let's start with italicized words and phrases. there are 44 of them,
according to my cursory exam. my version marks them faithfully...
the p.g. version doesn't have 'em marked; p.g. is being "unfaithful".

now, the scan-set shows quite a few cases -- 54 of them, i believe --
with a comma followed by an em-dash. my version includes them...
the p.g. version deletes the comma; again, p.g. is being "unfaithful".

just with that, some 98 "unfaithful" cases of "errors" in the p.g. e-text.

there is one case of a mis-spelled name -- grandma majauzskiene --
> http://z-m-l.com/go/tjbus/tjbusp080.html
(should be majauszkiene), where the p.g. version unfaithfully fixed it,
silently, with zero hint of a transcriber note saying anything about it...
that's not the way to be faithful.

that makes 99. what can i do to push the number to 3-digit territory?

oh, i know. my comparison showed a few places where "afterwards"
on the scans was changed to "afterward" in the p.g. version, probably
because somebody somewhere along the line decided "afterwards" is
not-a-word. there were 4 of these, which pushes the number to 103.

italics -- 44 cases -- my version matches, your version doesn't.
", --" -- 54 cases -- my version matches, your version doesn't.
"afterwards" -- 4 cases -- my version matches, your version doesn't.

and as you might suspect, i corrected the grandma name misspelling,
so you have a partner in your unfaithfulness there...

but that's not always the case.

for example, you have 42 cases (42!) of _bad_paragraphing_ -- most of
them being paragraphs you _missed_, with a few that you _invented_
sprinkled in, for good measure, i guess. my version has those 42 cases
marked correctly -- marked faithfully...

i think i mentioned that _most_ of the differences between the o.c.r. and
the p.g. version were words that either _were_ or _were_not_ hyphenated.
(a couple examples here would be "working-men" and "lodging-houses".)
according to the scan-sets, these words _should_ have been hyphenated.
i hyphenated, faithfully... but the p.g. version? unhyphenated; unfaithful.
i haven't counted 'em, but i can confidently say there's hundreds of those.

and then i mentioned the british-spelled words; i kicked those to the curb!
i don't care if the scans had them in. this is an 'merican book, written by an
'merican author, and it's set in 'merica, so it should have 'merican spellings.
not those strange british spellings that look so weird to our 'merican eyes.
yes, the scans had british spellings, but i don't care about being "faithful"...

but hey, did i mention the p.g. version _changed_ the british spellings too?
so i guess the p.g. version didn't care too much 'bout being "faithful" either.

so we're already up in the range of 400 or more lapses in "faithfulness"
in the p.g. version, and we haven't even _started_ adding in the one-off's
where a word was added incorrectly, or deleted incorrectly, or changed...

there are hundreds of errors in the p.g. e-text. maybe hundreds and hundreds.

and you, al, are tsk-tsking _my_ version, because it is "faithful to nothing"? ha!

***

so these are all the reasons why i was laughing all weekend at al's "reasoning"...

at first, i thought i'd take him to task for entering a dialog with such weak crap.
but hey, al, if you come in with stuff that is _so_ bad that it's actually _hilarious_,
there is some redeeming value in the humor side of it. keep up the funny stuff...
i mean, it's not stephen colbert, but nonetheless, it kept me laughing all weekend.

***

so, al, if it helps you console yourself, when you look at that "f" i gave you for
your "logic", you can think of it as standing for "faithful", if you really want to...

just so long as you remember that i've demonstrated that, in the "faithful" race,
your p.g. version finished in last place...

-bowerbird

From paulmaas at airpost.net Thu Dec 25 06:18:23 2008
From: paulmaas at airpost.net (Paul Maas)
Date: Thu, 25 Dec 2008 06:18:23 -0800
Subject: [gutvol-d] Merry Christmas!
Message-ID: <1230214703.16836.1291758701@webmail.messagingengine.com>

Merry Christmas to everyone at Project Gutenberg!
--
Paul Maas
paulmaas at airpost.net

From inka at 21torr.com Thu Dec 25 06:23:39 2008
From: inka at 21torr.com (inka at 21torr.com)
Date: Thu, 25 Dec 2008 15:23:39 +0100
Subject: [gutvol-d] Merry Christmas!
Message-ID:

Thank you for your message. I am away on holiday from 24.12.2008
until 2.1.2009 and don't have e-mail access.
In urgent cases, please contact Jochen Hild:
j.hild at 21torr.com, phone: +49/71 21/3 48-220

From hart at pglaf.org Thu Dec 25 08:54:12 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 25 Dec 2008 08:54:12 -0800 (PST)
Subject: [gutvol-d] Merry Christmas!
In-Reply-To: <1230214703.16836.1291758701@webmail.messagingengine.com>
References: <1230214703.16836.1291758701@webmail.messagingengine.com>
Message-ID:

Many thanks!!!

Same to you!!!

Michael

On Thu, 25 Dec 2008, Paul Maas wrote:

> Merry Christmas to everyone at Project Gutenberg!
> --
> Paul Maas
> paulmaas at airpost.net

From Bowerbird at aol.com Thu Dec 25 13:09:31 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 25 Dec 2008 16:09:31 EST
Subject: [gutvol-d] merry christmas -- ho ho ho -- it's a jungle out there
Message-ID:

merry christmas! ho ho ho! :+)

i think i'll forego my usual visit over to the d.p. forums this year,
to tweak the nose of "the powers that be" who banned me, since
i would only deliver them what they deserve -- a lump of coal...

but here's a little present for the people who like me -- a copy of
"the jungle" that is a whole lot cleaner than any other version you
will find online -- most of which stem from the p.g. e-text.

as you will find, if you compare that with mine, the p.g. copy has
hundreds and hundreds -- and hundreds -- of errors within it...

none of them are very serious -- does it really matter if the snow fell
"quick" or "thick"? or if he lived off the "populace" or the "population"?
probably not. except maybe to a few obsessive-compulsives...
:+)

anyway, here's my version:
> http://z-m-l.com/go/tjbus/tjbus.zml

even people who hate me can consider this "a present" to them
-- a chance to shoot me down, by finding errors in my version...

jose found 31 errors in a text i put up recently -- proof positive
i'm one of the biggest bunglers around. can you top his number?

if you want to try, jump into the proofing interface:
> http://z-m-l.com/go/tjbus/tjbusp001.html

***

speaking of 30-something errors, the "repost" process for this book
found and fixed 32 errors. considering the many hundreds that were
missed, 32 isn't very many. but that number looks even more measly
when you compare it to the results of one final check that i had to run.

the archive.org file that i was using as one of my comparison texts had
a deficiency common to much of the archive.org o.c.r. output --
bureaucratic mangling had lost all of the em-dashes from the book...

the p.g. text had the em-dashes, so i spliced them in, but of course
that meant that i had no back-up comparison for those em-dashes.
so i found a file to use for that explicit purpose.

this gives a metric that we can use to assess the p.g. e-text's accuracy.

specifically, there were (at least) 35 errors concerning p.g. em-dashes;
33 em-dashes were missing, while 2 had been erroneously inserted...

all in all, there were 1,673 em-dashes, for an accuracy-rate of 97.9%.

97.9% isn't all _that_ bad... but 35 errors -- on em-dashes alone --
that survived the reposting where a mere 32 other errors were fixed?
that's kind of embarrassing to the "repost" process, in my opinion...

and of course, 35 errors -- on em-dashes alone -- is quite a few more
than the 5-8 which al once maintained would be enough to "kick back"
a submission that he was whitewashing -- if it came from _me_, that is.

anyway, as usual, another demonstration that this comparison method
is absolutely the best way to find and fix errors in existing p.g. e-texts.
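
The comparison method discussed in this thread (run two independent digitizations of a book against each other and inspect every difference) can be roughly sketched with Python's standard difflib module. This is only a minimal illustration under stated assumptions, not the tool that was actually used for "the jungle": the tokenizer and the function name `differences` are hypothetical, and the sample lines are the two page-42 readings quoted earlier in the thread.

```python
import difflib
import re


def tokenize(text):
    # Hypothetical tokenizer: "--" (an ASCII em-dash) is its own token,
    # hyphenated words such as "working-men" stay whole, and every other
    # punctuation mark becomes a separate token.
    return re.findall(r"--|\w+(?:[-']\w+)*|[^\w\s]", text)


def differences(text_a, text_b):
    # Return (tag, tokens_from_a, tokens_from_b) for every stretch
    # where the two digitizations disagree.
    a, b = tokenize(text_a), tokenize(text_b)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            found.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return found


# the two readings of page 42 reported in the thread
pg_line = "This inspector wore a blue uniform, with brass buttons, and he"
scan_line = "This inspector wore an imposing silver badge, and he"

for tag, left, right in differences(pg_line, scan_line):
    print(tag, "|", left, "|", right)
```

Comparing on tokens rather than on raw characters keeps a changed hyphen or a dropped "--" from being buried in character-level noise. Deciding which side of each difference is right still means looking at the page scan.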
end-user error-reports ain't gonna be _nearly_ as meticulous as this...

***

oh yeah, and for the _nice_ people over at distributed proofreaders?
well, there's a present for them in all of this, too.

they're still trying to figure out how they can get nicely-formatted
print-on-demand output from p.g. e-texts... here are two of their
recent threads on this topic:
> http://www.pgdp.net/phpBB2/viewtopic.php?t=36618
> http://www.pgdp.net/phpBB2/viewtopic.php?t=36647

i've created an .html version of this book from the z.m.l. "master" file:
> http://z-m-l.com/go/tjbus/tjbus.html

and a .pdf as well:
> http://z-m-l.com/go/tjbus/tjbus.pdf

both that .pdf and the .html have powerful navigational capabilities.

of passing interest is that the .pdf version is tightly coupled with the
online page-by-page .html version. every page in the .pdf has a link
(at the bottom of the page) that will jump to that page on my website.

in this sense, the .html version on my site is the "canonical version",
and all of the dispersed .pdf versions have the ability to connect to it,
so people can make public annotations to each page right on the site.

these "individual-reading-copies" joined with "communal-comments"
is a useful approach toward the general issue of public annotation...

plus, of course, the .pdf version can be printed, for print-on-demand,
so this is just another demonstration that z.m.l. does the trick nicely...

yes, this is a simple book. but z.m.l. can handle complicated stuff too.
and besides, most of the books in the p.g. library _are_ "simple" books.

in addition, once end-users have my converter-programs, they will be
able to create .pdf versions that reflect their _own_ specific preferences.

seriously, if you're going the print-on-demand route, you might as well
give the end-user the ability to customize the output to their preference.

-bowerbird

From Bowerbird at aol.com Mon Dec 29 17:27:07 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 29 Dec 2008 20:27:07 EST
Subject: [gutvol-d] the jungle -- 007
Message-ID:

i did one final comparison between the p.g. e-text of "the jungle"
and my final version, and i came up with over 1,240 differences...

as there are 413 pages in this book, that's 3+ differences per page.

that's a lot.

and that's not counting several types of errors i already mentioned,
such as the paragraphing mistakes, the missing styling, and so on...

of course, there _might_ be an edition of this book out there that
matches the p.g. e-text, and not the 2 scan-sets online thus far,
but if that's the case, i'd certainly like to see the scan-set from it.

otherwise, al is eating a _lot_ of crow for his "faithful" comment.

-bowerbird
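
The per-page and accuracy figures quoted in this thread are simple ratios, and they can be re-derived in a few lines of Python. This is just an arithmetic check on the numbers as posted (826 and 1,240 differences over 413 pages, and 35 em-dash errors out of 1,673 em-dashes); the helper names `per_page` and `accuracy` are made up for the illustration.

```python
def per_page(count, pages):
    # average number of differences per printed page
    return count / pages


def accuracy(errors, total):
    # share of items that were right, as a percentage
    return 100.0 * (1 - errors / total)


PAGES = 413  # page count of "the jungle" as given in the thread

print(round(per_page(826, PAGES), 2))    # the "two-errors-per-page" example
print(round(per_page(1240, PAGES), 2))   # the final comparison, "3+" per page
print(round(accuracy(35, 1673), 1))      # em-dash accuracy, quoted as 97.9%
```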