From Bowerbird at aol.com Thu May 1 12:50:48 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 May 2008 15:50:48 EDT
Subject: [gutvol-d] happy may day

happy may day! this worker is taking the day off... see you tomorrow.

-bowerbird

From Bowerbird at aol.com Fri May 2 12:05:03 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 2 May 2008 15:05:03 EDT
Subject: [gutvol-d] tor offers books for free, and i get a big surprise

tor has been making new, frontlist books available for free. here's the latest:
> http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Priest,%20Cherie%20-%20Four%20and%20Twenty%20Blackbirds.pdf

i accidentally loaded this into the .pdf-viewer in safari, and to my amazement, the viewer works very nicely... for years now i've disabled the .pdf browser plug-in, because it used to always hang, and sometimes crash. i'll probably turn it off in safari as well, but it's nice to know adobe _finally_ got all the bugs out of the thing.

tor also offers an .html version for those of you who detest .pdf in any form. if i'm reading on my laptop, i prefer a paginated version myself. and when that laptop is actually at my desk (as opposed to -- say -- frolicking in palisades park overlooking the pacific) with a 23-inch cinema-screen monitor, it would be absolutely silly to give up the 2-page display of .pdf for the screen-wasting 1-column .html nonsense...

i recognize the weaknesses of .pdf as well as anyone, but i also recognize its strengths. (tor offers a .mobi version of these books as well.)

-bowerbird

From julio.reis at tintazul.com.pt Sat May 3 14:44:22 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Sat, 03 May 2008 22:44:22 +0100
Subject: [gutvol-d] tor offers books for free, and i get a big surprise
Message-ID: <1209851062.6439.4.camel@abetarda>

> tor has been making new, frontlist books available for free.

Slightly off-topic, but can anyone send me, or let me know where I can find, the first PDFs offered? The first one I subscribed to was Robert Charles Wilson's "Spin"; I'd love to have any earlier ones.

Thanks,
Júlio.

From Bowerbird at aol.com Mon May 5 09:23:35 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 12:23:35 EDT
Subject: [gutvol-d] too funny, part two point nine

ok, we're on iteration #9 of the "perpetual p1" experiment... and yep, it has turned up a "real error". two of 'em, in fact. on the very first scan, where the name of a main character -- nelsen -- was misspelled twice as "nelson". too funny...

meanwhile, nope, the error on page 33 (an excess comma) was _not_ corrected, so we're gonna have to hope for iteration #10.

oh yeah, another error was exposed. i'll explain that later...

-bowerbird

From Bowerbird at aol.com Mon May 5 09:44:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 12:44:08 EDT
Subject: [gutvol-d] too funny, part two point eight

anyway, i guess "reality" is too slippery a concept for piggy, because over on the d.p. wiki, he has said there were 7 "real errors" found during iteration #8... under my definition of "real error", though, i find just 2.

one is this one, which i've discussed before:
> "Okay, Frank. Nobody's indispensible. I might do the same
> "Okay, Frank. Nobody's indispensable. I might do the same

this is clearly an error; the dictionary settles it.

the other is this one:
> ridge, where I often go, when offshift. Carbon dioxide and a
> ridge, where I often go, when off-shift. Carbon dioxide and a

the second is kind of iffy, but i'll call it a "printer's error" simply because i did, in fact, go and change it in my version of the file. the "off-" compounds are an inconsistent lot in general, and this particular one isn't listed in the dictionary, so i would hesitate to argue with anyone over this, but since "off-duty" and "off-hour" were both hyphenated, _i_ went with the hyphenated "off-shift"... but see offbeat, offcast, offhand, offshoot, offshore, and offstage. finally, note that if you are off-camera, you will also be offscreen. so, you know, reasonable folks can differ, and all of that rot...

still, at most we have 2 "real errors" here, and certainly not _7_... so piggy, if you could clear that up for me, i would appreciate it.

-bowerbird

From Bowerbird at aol.com Mon May 5 10:37:09 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 13:37:09 EDT
Subject: [gutvol-d] tor offers books for free, and i get a big surprise

julio said:
> can anyone send me / let me know where I can find the first PDFs offered?

here are the links for some of them...
-bowerbird

====================================================

mistborn
http://e2ma.net/go/939129369/827662/29986921/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/9780765350381.pdf

spin
as .pdf
http://e2ma.net/go/959207679/850310/31028053/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Wilson,%20Robert%20Charles%20-%20Spin.pdf
as .html
http://e2ma.net/go/959207679/850310/31028055/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/WilsonSpinHTML/Wilson,%20Robert%20Charles%20-%20Spin.html
as .mobi
http://e2ma.net/go/959207679/850310/31028056/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/WilsonSpinMobi/Wilson,%20Robert%20Charles%20-%20Spin.prc

through wolf's eyes
as .pdf
http://e2ma.net/go/1015848447/910799/33371213/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Lindskold,%20Jane%20-%20Through%20Wolfs%20Eyes.pdf
as .html
http://e2ma.net/go/1015848447/910799/33371215/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/LindskoldTWEHTML/Lindskold,%20Jane%20-%20Through%20Wolfs%20Eyes.html
as .mobi
http://e2ma.net/go/1015848447/910799/33371216/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/LindskoldTWEMobi/Lindskold,%20Jane%20-%20Through%20Wolfs%20Eyes.prc

disunited states
as .pdf
http://e2ma.net/go/1026613141/922980/33806409/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Turtledove,%20Harry%20-%20The%20Disunited%20States%20of%20America.pdf
as .html
http://e2ma.net/go/1026613141/922980/33806383/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/TurtledoveTDSOAHTML/Turtledove,%20Harry%20-%20The%20Disunited%20States%20of%20America.html
as .mobi
http://e2ma.net/go/1026613141/922980/33806385/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/TurtledoveTDSOAMobi/Turtledove,%20Harry%20-%20The%20Disunited%20States%20of%20America.prc

reiffen's choice
as .pdf
http://e2ma.net/go/1037728040/935221/34265550/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Butler,%20S.%20C.%20-%20Reiffeins%20Choice.pdf
as .html
http://e2ma.net/go/1037728040/935221/34265552/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/ButlerRCHTML/Butler,%20S.%20C.%20-%20Reiffeins%20Choice.html
as .mobi
http://e2ma.net/go/1037728040/935221/34265553/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/ButlerRCMobi/Butler,%20S.%20C.%20-%20Reiffeins%20Choice.prc

sun of suns
as .pdf
http://e2ma.net/go/1049214279/947609/34748629/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Schroeder,%20Karl%20-%20Sun%20of%20Suns.pdf
as .html
http://e2ma.net/go/1049214279/947609/34748631/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/SchroederSOSHTML/Schroeder,%20Karl%20-%20Sun%20of%20Suns.html
as .mobi
http://e2ma.net/go/1049214279/947609/34748632/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/SchroederSOSMobi/Schroeder,%20Karl%20-%20Sun%20of%20Suns.prc

blackbirds
as .pdf
http://e2ma.net/go/1062408210/960854/35278504/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Priest,%20Cherie%20-%20Four%20and%20Twenty%20Blackbirds.pdf
as .html
http://e2ma.net/go/1062408210/960854/35278506/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/PriestFATBHTML/Priest,%20Cherie%20-%20Four%20and%20Twenty%20Blackbirds.html
as .mobi
http://e2ma.net/go/1062408210/960854/35278507/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/PriestFATBmobi/Priest,%20Cherie%20-%20Four%20and%20Twenty%20Blackbirds.prc

From hart at pglaf.org Mon May 5 13:37:27 2008
From: hart at pglaf.org (Michael Hart)
Date: Mon, 5 May 2008 13:37:27 -0700 (PDT)
Subject: [gutvol-d] Google Ads

We're considering running tests with Google ads, any objections?

Thanks!!!

Michael

From Bowerbird at aol.com Mon May 5 13:54:11 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 16:54:11 EDT
Subject: [gutvol-d] Google Ads

michael said:
> Google ads

might as well... everybody else is doing it... great idea...

before you do, can you tell us where the money will go?

-bowerbird

From Bowerbird at aol.com Mon May 5 13:59:09 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 16:59:09 EDT
Subject: [gutvol-d] at least a little good news from d.p.

rfrank sums up the results of his parallel proofing experiment:
> http://www.pgdp.net/phpBB2/viewtopic.php?p=452560#452560

he found out that it works. it's good to know that someone over there can recognize the truth when it comes up and kicks them in the shins... of course, as i pointed out originally, we _already_knew_ that it works...

***

andre engels adds this:
> Regarding parallel proofing, might it be interesting to check whether
> parallel proofing works better or worse than subsequent proofing?
> That is, let a book go through P1 three times - twice in parallel
> (from the same point of departure), and then one of the two results
> as a P1 -> P1. Then compare the outcome of P1 -> P1 to that
> of the combination of the two parallel rounds.

also good to see someone over there knows the _correct_ test to do... and maybe someday they'll get around to running it _intentionally_...

***

carlo said:
> We indeed suspect that this would be the best procedure
> for well-prepared books with good OCR. The experiment is
> to gather data and tools; unfortunately the amount of data
> to get evidence is huge, and doing it without proper tools
> is almost impossible.

i'm not sure why carlo thinks "the amount of data to get evidence is huge", since it's not, or why "doing it without proper tools" is a matter of concern for him, since the tools are very easy to build...

-bowerbird

From piggy at netronome.com Mon May 5 14:11:22 2008
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Mon, 05 May 2008 17:11:22 -0400
Subject: [gutvol-d] Google Ads
Message-ID: <481F77FA.1030907@netronome.com>

Michael Hart wrote:
> We're considering running tests with Google ads,
> any objections?

Go for it!
From klofstrom at gmail.com Mon May 5 14:20:29 2008
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Mon, 5 May 2008 11:20:29 -1000
Subject: [gutvol-d] Google Ads
Message-ID: <1e8e65080805051420l1e927b74o19e9415cac66e538@mail.gmail.com>

On Mon, May 5, 2008 at 11:11 AM, La Monte H.P. Yarroll wrote:
> Michael Hart wrote:
> > We're considering running tests with Google ads,
> > any objections?

Putting Google ads on PG, or putting ads for PG on Google?

NO!!!! and Yes.

--
Karen Lofstrom
AKA Zora

From hart at pglaf.org Mon May 5 14:43:49 2008
From: hart at pglaf.org (Michael Hart)
Date: Mon, 5 May 2008 14:43:49 -0700 (PDT)
Subject: [gutvol-d] Google Ads

Talking of putting Google ads on PG sites. . . .

Some have them already, but not the biggest one. . . .

We don't control advertising on other PG sites, just the ones we run ourselves; there are hundreds, if not thousands, altogether.

We've been in the red for over 5 years now, during which I have put off my whole salary, and even part of my office expenses, but I'm not sure how long I can continue this way.

I'm good for at least another year or two, so don't agree just because this is an emergency, because it is not; but I am getting to the age where I have to plan ahead more.

Thanks!!!

Michael

From grythumn at gmail.com Mon May 5 14:51:53 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Mon, 5 May 2008 17:51:53 -0400
Subject: [gutvol-d] Google Ads
Message-ID: <15cfa2a50805051451g3775f751pe2c1a40e66df57b5@mail.gmail.com>

I suppose the biggest questions are: a) Does PG pay directly for hosting the main site? and b) If it doesn't, does the entity paying (NCERN?) mind having Google ads on a site hosted on their network? I've heard of other non-profit groups that have run into this or similar problems before and lost their hosting partner...

R C

From Bowerbird at aol.com Mon May 5 15:11:03 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 18:11:03 EDT
Subject: [gutvol-d] Google Ads

michael said:
> We've been in the red for over 5 years now,
> during which I have put off my whole salary,
> and even part of my office expenses, but I'm
> not sure how long I can continue this way.

i can support a salary for you wholeheartedly... but not for anyone else... and _especially_ not over on the d.p. side, given their immoral squandering of energy which volunteers are donating in good faith... they should be _penalized_ for that negligence.

-bowerbird

p.s. i was gonna say i could see david widger getting a small paycheck too, but that puts us on a very slippery and very steep slope, so no. (but if you _do_ get on that slope, and d.p. did clean up their mess, juliet deserves something. but, to repeat, _only_ if she cleans up their act.)

From Bowerbird at aol.com Mon May 5 15:34:35 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 18:34:35 EDT
Subject: [gutvol-d] parallel -- the plunderer -- 03 -- yet another upbeat post in this series

ok folks, let's take a closer look at "the plunderer", which was rfrank's parallel experiment over at distributed proofreaders. as i told you, rfrank discovered that parallel proofing works! he resolved discrepancies between the parallel p1 iterations, and then subjected the results to a round by the p2 proofers. he reports that:

> in the P1 parallel work:
> errors found by both: 573
> additional errors only found in [A] path: 60
> additional errors only found in [B] path: 104
> in P2 round after P1 merge:
> errors (diffs) reported: 55

so he found that the p1a and p1b iterations found some 573+60+104 errors, giving a subtotal of 737. p2 found an additional 55, for a grand total of 792...

p1a found 633, which is 80% of the total of 792.
p1b found 677, which is 85% of the total of 792.
p1 combined found 737, which is 93% of the total of 792.

we don't know if p1c -- a third iteration through p1 -- would have found as many errors as p2 found (55), but the results of previous d.p. tests indicate it would have.

***

however, there is a much more interesting set of data on this book. rfrank notes that "very little" preprocessing was done to the o.c.r. i've noted before that a good chunk of the errors in this text were 340+ spacey-quotes, which can be found and fixed automatically... a number of other errors could've also been fixed in preprocessing.

so i took the o.c.r. and did some rather good preprocessing on it...
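[A minimal sketch of this kind of spacey-quote fix, for the curious. It is illustrative only -- these are not bowerbird's actual routines -- and it assumes straight ASCII double quotes that come in balanced pairs within a line:]

    def fix_spacey_quotes(line):
        """Reattach double quotes that OCR left floating between spaces."""
        chars = list(line)
        open_quote = False
        for i, ch in enumerate(chars):
            if ch != '"':
                continue
            # a quote with a space on both sides is "spacey"; open/close
            # parity decides which side of it the word belongs on
            if 0 < i < len(chars) - 1 and chars[i-1] == ' ' and chars[i+1] == ' ':
                if open_quote:
                    chars[i-1] = ''   # closing quote: glue to the word on the left
                else:
                    chars[i+1] = ''   # opening quote: glue to the word on the right
            open_quote = not open_quote
        return ''.join(chars)

    # fix_spacey_quotes('he said , " hello " , and left')
    #   -> 'he said , "hello" , and left'

Anything unbalanced, and leftovers like the spacey comma in the example, would be left for other rules or for a human pass.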
i used known fixes -- nothing fancy -- but went at it aggressively... what i found is that fewer than 50 errors remained in the body text, with approximately 50 more in the front-matter and book-end ads. these <100 errors are _significantly_ fewer than rfrank's total of 792, indicating that even on these clean scans, with o.c.r. by finereader, preprocessing can make a huge difference, and save proofers time.

moreover, now we can start to see exactly how clean the text can be when you have clean scans, o.c.r. by abbyy, and good preprocessing... with <50 errors in the 300 pages of body text in this book, given good preprocessing, imagine how well p1 proofers could've done. and a second iteration of p1 -- sequential, not parallel -- would've taken this book right to the point of perfection, if not actually there. i've said it before, but it's time to say it again: the p1 proofers rock.

moreover, we have found a new twist on our old pattern: p1a fixes most of the errors (with p1b) plus a few of its own, p1b fixes most of the errors (with p1a) plus a few of its own, and p2 comes in to do clean-up on the ones they both missed.

again, this is the pattern you get on page after page, in book after book, day after day, over in d.p.-land... why there is any lack of awareness or comprehension of this pattern is a total and complete mystery to me...

-bowerbird

From bzg at altern.org Mon May 5 16:08:45 2008
From: bzg at altern.org (Bastien)
Date: Tue, 06 May 2008 01:08:45 +0200
Subject: [gutvol-d] Google Ads
Message-ID: <87k5i81h8i.fsf@bzg.ath.cx>

Michael Hart writes:
> We're considering running tests with Google ads,
> any objections?

I guess the main question is: how much do you expect to earn with ads? Here is the dilemma we can anticipate: if you don't expect a lot of money from ads, it's not worth bothering people who hate ads; if you expect a lot of money from ads, people who don't care about ads might suddenly express concerns... Hopefully the reality is somewhere in between.

Or maybe a donation campaign? Wikipedians might provide useful feedback on how to deal with such a campaign, and on the internal discussions about ads.

--
Bastien

From ricardofdiogo at gmail.com Mon May 5 16:28:05 2008
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Tue, 6 May 2008 00:28:05 +0100
Subject: [gutvol-d] Google Ads
Message-ID: <9c6138c50805051628i67c543fem2686350a57af7904@mail.gmail.com>

I guess Michael is trying to raise a _much_ deeper question here...

If you need ads to maintain PG, then I think PG should be immediately shut down. Depending on ads is as bad as depending on public funds. Both advertisers and public institutions can determine what a person can and cannot do or publish.

_Maybe_ if PG is actually at serious financial risk, a worldwide campaign asking for donations would be a better idea.

Ricardo

From Bowerbird at aol.com Mon May 5 16:50:41 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 19:50:41 EDT
Subject: [gutvol-d] Google Ads

ricardo said:
> If you need ads to maintain PG, then
> I think PG should be immediately shut down.

i'm sure p.g. doesn't need ads to be maintained. but i'd say your commitment to the library doesn't seem solid, not if you'd shut it down "immediately" if funds _were_ required.

i would say the topic is (a) whether we think michael needs a paycheck, and (b) whether a suitable way of gathering funds would be ads (or another way).

i'm all in favor of a paycheck for michael. he birthed this project and raised it for decades, carrying it on his back for most of that period... and when the p.g. board made their decision, they decided to pay him. they just don't have the funds to make it a reality at this point in time...

i hate ads. i used to hate them with a _passion_. but lately i have learned to ignore them entirely, as i suspect most of you have too. indeed, studies show 95% of us _rarely_ click an ad, or never at all. but the 5% who do -- _idiots_, for the most part, but who cares? -- can generate a nice chunk of change, providing you have the traffic.

so how many of the people who come to p.g. are ad-clicking idiots? considering that the site has a lot of traffic, it could be quite a lot... but i'm guessing it's not that many, since ad-clickers aren't readers, and readers aren't idiots. but hey, until we try it out, we won't know.

if ads don't generate much cash, then we can make a separate decision whether we want to mount some kind of other effort to generate funds to give michael a salary. but if ads will work, i'd say we should do ads...

-bowerbird

p.s. i think most people already know that they _can_ donate, right? they just don't... after all, we've told them that the books are _free_. high-profile "please donate" campaigns can be as irritating as ads...

From prosfilaes at gmail.com Mon May 5 18:18:44 2008
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 5 May 2008 21:18:44 -0400
Subject: [gutvol-d] Google Ads
Message-ID: <6d99d1fd0805051818m40155bcble493a3ad7f02fd68@mail.gmail.com>

On Mon, May 5, 2008 at 7:28 PM, Ricardo F Diogo wrote:
> Depending on ads is as bad as depending on public funds. Both
> advertisers and public institutions can determine what a person can and
> cannot do or publish.

As a serious matter, many publishers do depend on ads and manage to stay respected as independent publishers. Newspapers, for example. And Project Gutenberg is a library, not a publisher of original political opinion. I think we haven't done nearly enough to get donations before turning to this, but it's not like ads or even public funds are going to change what we do unless we choose ones that have those types of strings attached.
From hart at pglaf.org Mon May 5 23:26:13 2008
From: hart at pglaf.org (Michael Hart)
Date: Mon, 5 May 2008 23:26:13 -0700 (PDT)
Subject: [gutvol-d] Google Ads

We could NOT have ads on our current largest host. We would have to set up a trial run on another host. . . . which we are prepared to do; this is the testing mentioned.

mh

On Mon, 5 May 2008, Robert Cicconetti wrote:
> I suppose the biggest questions are: a) Does PG pay directly for hosting
> the main site? and b) If it doesn't, does the entity paying (NCERN?)
> mind having Google ads on a site hosted on their network?

From hyphen at hyphenologist.co.uk Mon May 5 23:59:09 2008
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue, 6 May 2008 07:59:09 +0100
Subject: [gutvol-d] Google Ads
Message-ID: <001801c8af46$b267fcb0$1737f610$@co.uk>

Michael Hart wrote:
> We've been in the red for over 5 years now,
> during which I have put off my whole salary,
> and even part of my office expenses, but I'm
> not sure how long I can continue this way.

There is another problem with PG funding, apart from it being too low :-(

If every volunteer could be persuaded to give say 10 USD per year, it would go some way toward improving the situation. Small contributions from the USA apparently work well, but sky-high bank charges prevent those outside the USA from giving small amounts, say 10 USD, because bank charges are about 10 USD per transaction. The credit card companies' charges are more reasonable, but still very high.

If PG were to create bank accounts in Euroland, the UK, and other currency areas, it would be possible to pay small amounts into those accounts, then transfer such funds in amounts of more than 200 USD, when bank charges are more reasonable. In the UK there is a Direct Debit facility available which works well for small regular payments to another UK bank account.

The downside is that one would require an organisation in each currency area to handle the not inconsiderable paperwork and the local government controls.

Dave Fawthrop

From richfield at telkomsa.net Tue May 6 08:59:47 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Tue, 06 May 2008 17:59:47 +0200
Subject: [gutvol-d] Google Ads
Message-ID: <48208073.7010702@telkomsa.net>

>> We're considering running tests with Google ads, any objections? <<

Michael,

In the circumstances, I am embarrassed to be asked; *I* didn't make monetary sacrifices to keep the pot boiling. Certainly we prefer spam-free pages in every sense, but really, if it means that those who can afford it finally begin to pay to support a genuinely valuable social function, I reckon that we can see our way to closing our callused corneas and ad-inured eyes to the abominations of the mercenary, for the children of this world are in their generation wiser than the children of light. For my part, I hardly notice the ads anymore anyway.

Besides, I should think that PG would be a prime site, whose many patrons have particularly targetable interests, highly prized by the sponsors.

Go for it!

Jon

From Bowerbird at aol.com Tue May 6 13:53:26 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 May 2008 16:53:26 EDT
Subject: [gutvol-d] parallel -- the plunderer -- 04 -- and another upbeat post in this series

more on "the plunderer", rfrank's d.p. parallel experiment...

rfrank reports there were 55 changes made by the p2 proofers. i count only 26-32, so it would be good to get his list from him. (i didn't count changes of blank lines, or dehyphenations that could've been done by the machine, so that might explain it...)

at any rate, rfrank says only 9 of the changes were "non-trivial":
> four stray punctuation marks,
> three capitalization errors,
> and two actual typos
> ("de" for "do" and "*led" for "*ied" after a page break).

my analysis differs. i suspect that my auto-detection routines are more aggressive than his, as i count only 3 changes of substance, and 2 of those were _correct_o.c.r._ of an error in the paper-book. all of the other errors were detectable programmatically, and thus didn't require a round of human proofing to be found and fixed...

of the 3 changes, 1 is an o.c.r. error which was not auto-detectable:
> listening attitude. Suddenly, as if all had been,
> listening attitude. Suddenly, as if all had been
> http://z-m-l.com/go/plund/plundp287.html

there's 1 p-book typo, which is _boring_ (but not "difficult") to detect:
> its a cinch, it seems to me, he wouldn't do that for
> it's a cinch, it seems to me, he wouldn't do that for
> http://z-m-l.com/go/plund/plundp075.html

and 1 p-book error, not auto-detectable, which a sharp-eyed p2 found:
> died. There those among them who had been in
> died. There were those among them who had been in
> http://z-m-l.com/go/plund/plundp138.html

given that all 3 of these errors would reasonably be expected to be caught by the general public in "continuous proofing", i believe the question of whether p2 was even needed on this particular book is open for discussion.
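[For illustration, one such "boring" auto-detection check, in the spirit of the its/it's case above. This is a sketch, not bowerbird's or rfrank's actual code; it only flags lines for human review, since possessive "its" makes blind replacement unsafe:]

    import re

    # "its" directly before an article, negation, or possessive is almost
    # always a dropped apostrophe, as in the "its a cinch" line above
    SUSPICIOUS_ITS = re.compile(
        r"\bits\s+(?:a|an|the|not|no|my|your|his|her|our|their)\b",
        re.IGNORECASE)

    def flag_suspicious_its(text):
        """Yield (line_number, line) pairs that deserve a human look."""
        for n, line in enumerate(text.splitlines(), start=1):
            if SUSPICIOUS_ITS.search(line):
                yield n, line.strip()

    # e.g. flags: "its a cinch, it seems to me, he wouldn't do that for"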
i am quite serious. if we can get books this perfect with 2 passes through p1 -- whether they be sequential or parallel in nature -- do the benefits of a pass through p2 warrant the costs? i say no...

of course, d.p. would have to jack up the quality of its clean-up tools, but that's not all that difficult. rfrank might be just the man to do it...

-bowerbird

From jeroen.mailinglist at bohol.ph Tue May 6 13:57:14 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue, 06 May 2008 22:57:14 +0200
Subject: [gutvol-d] Google Ads
Message-ID: <4820C62A.1090009@bohol.ph>

Bastien wrote:
> I guess the main question is: how much do you expect to earn with ads?

I run Google ads on a tourism-promotion site I run (www.bohol.ph), and this generates about 200 dollars per month for about 80,000 visits. More than enough to pay its bills, but probably not really well paid given the number of hours gone into it. Based on Alexa rankings, gutenberg.org has about 40 times as many visits, so a simple multiplication would result in about 8000 dollars per month. However, I suspect many more visits are possible if the content is restructured in smaller chunks, and ads are added in between. This is what many of the PG clones are doing.

Unfortunately, you cannot estimate easily with Google ads. You will have to try and see what comes. Just slamming ads on the site is probably not the best idea. You would have to reorganize and actively manage the collection smartly to maximize income. Lots of literature does not have any keywords that are worth anything for Google ads. I was thinking about building an historical travel site around the travelogues available in PG, adding feedback, forums, "what is it like today", and "follow the trail of " features, supported by Google ads, and I expect that, with proper maintenance and smart organization, it could earn more than enough to run a small team on. However, doing so would turn gutenberg.org into a fundamentally different site.

Finally, you may lose some of the more active contributors once you start using ads. The more idealistic ones will certainly think you've completely fallen for big money (although the reality is, we all do have to pay our bills).

People who hate ads typically block them. So do I with the more annoying varieties.

Jeroen.
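[Jeroen's multiplication, spelled out as a naive scaling sketch using his rounded figures; it assumes ad revenue scales linearly with traffic, which his own caveats argue against:]

    # Jeroen's rounded figures
    bohol_revenue_per_month = 200.0      # USD from ads on bohol.ph
    bohol_visits_per_month = 80000
    gutenberg_traffic_multiple = 40      # per Alexa rankings

    revenue_per_visit = bohol_revenue_per_month / bohol_visits_per_month
    pg_estimate = revenue_per_visit * bohol_visits_per_month * gutenberg_traffic_multiple
    print("roughly $%.0f per month" % pg_estimate)   # roughly $8000 per month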
From davedoty at hotmail.com Tue May 6 15:02:02 2008
From: davedoty at hotmail.com (Dave Doty)
Date: Tue, 6 May 2008 22:02:02 +0000
Subject: [gutvol-d] Google Ads

I'm chiming in late, so forgive me if I'm retreading covered ground.

Another thought: ads can cost donation dollars. Not even necessarily because people have an objection to the ads, but because they're assuming the ads are bringing in the bucks, and there's no need for extra support.

I have no idea what kind of donations the site brings in. I'm guessing it's not enough to fully cover costs, or this wouldn't even be brought up. Maybe it's virtually nothing, in which case this line of reasoning is irrelevant. But if it's at least a significant percentage, it's worth considering whether ads will bring in enough to offset any potential loss in donations.

I don't have enough information to even hazard a guess on what the answer would be in this case.

Dave

From Bowerbird at aol.com Wed May 7 16:34:12 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 7 May 2008 19:34:12 EDT
Subject: [gutvol-d] cyberlibrary numbers

the open library -- http://openlibrary.org -- "has just finished its latest release", according to an announcement that hit my e-mail this morning.

their website says they are now:
> featuring 13,439,320 books
> (including 234,857 with full-text)

so that helps firm up the answer to some questions that we were batting around here somewhat recently. that's a whole lot of scan-sets -- 13.44 million... considerably fewer digital-text e-books, it's true, but still over 10 times bigger than the p.g. library.

of course, their digital text isn't _nearly_ as clean as the p.g. e-texts. not even close.

not _yet_, anyway.

but i intend to do something about that little problem.

and it's ironic, because here i have been -- for years, literally _years_ -- offering to help project gutenberg clean up its e-texts, and no one took me up on that... so now, when i go help the o.c.a. clean up their text, there is some chance that they will actually end up with text that's _cleaner_ than the p.g. text, meaning that -- in addition to their huge lead in _quantity_ -- they will edge you out on the _quality_ issue as well...

heck, considering that the only "comparison" that can be done will be on the books found in _both_ libraries, and considering the fact that they can use your e-text to correct their own, thereby ensuring that their text is just as good as yours, if not _better_, i have to believe that there is no way they can fall short on a comparison.

maybe you should've gone for the quality upgrade while you had the opportunity... now it's too late...

-bowerbird

From ebooks at ibiblio.org Thu May 8 01:07:24 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Thu, 08 May 2008 04:07:24 -0400
Subject: [gutvol-d] cyberlibrary numbers
Message-ID: <4822B4BC.7030102@ibiblio.org>

I've been remiss in finishing some replies to a few of Bowerbird's earlier posts, but this one didn't require a lot of typing on my part, which is always convenient for someone who uses the time-honored hunt-and-peck typing method.

Bowerbird wrote:
> the open library -- http://openlibrary.org --
> "has just finished its latest release", according to
> an announcement that hit my e-mail this morning.
>
> their website says they are now:
> > featuring 13,439,320 books
> > (including 234,857 with full-text)
> so that helps firm up the answer to some questions
> that we were batting around here somewhat recently.

It would help if you knew what those numbers mean. :)

> that's a whole lot of scan-sets -- 13.44 million...

It would be a lot--if they actually had 13.44 million scan-sets, but they don't.

Apparently, Bowerbird, you didn't bother to go past the main page of the Open Library website. It's too bad you didn't click on the "About the project" link near the bottom of the main page:

http://www.openlibrary.org/about

Here's an excerpt:

"One web page for every book ever published. It's a lofty, but achievable, goal.
"To build it, we need hundreds of millions of book records, a brand new database infrastructure for handling huge amounts of dynamic information, a wiki interface, multi-language support, and people who are willing to contribute their time, effort, and book data. "To date, we have gathered about 30 million records (13.4 million are available through the site now), and more are on the way. We have built the database infrastructure and the wiki interface, and you can search millions of book records, narrow results by facet, and search across the full text of 230,000 scanned books...." So the "13,439,320 books" is actually referring to book *records*. The "234,857 with full-text" refers to the number that have been scanned. Now, if you had clicked on the "Add a book" link, also near the bottom of the main page: http://www.openlibrary.org/addbook you would have seen the lengthy form used to add a book record to their database. You should try it. Just think! You could add a book (record) without having to scan a single page or correct any OCR. Of course, since they already have about 30 million records, you might have trouble coming up with a book that's not in their database. > considerably fewer digital-text e-books, it's true, > but still over 10 times bigger than the p.g. library. > > of course, their digital text isn't _nearly_ as clean as > the p.g. e-texts. not even close. > > not _yet_, anyway. > > but i intend to do something about that little problem. Given your track record, perhaps you should have said, "but i intend to do very little (other than talk a lot) about that little problem." > and it's ironic, because here i have been -- for years, > literally _years_ -- offering to help project gutenberg > clean up its e-texts, and no one took me up on that... If you had really wanted to help PG clean up its e-texts, you could have submitted cleaned up versions at any time. I used to do that some years ago. I'd look for items that were available only in plain text format and submit corrected plain text and HTML versions. > so now, when i go help the o.c.a. clean up their text, > there is some chance that they will actually end up > with text that's _cleaner_ than the p.g. text, meaning > that -- in addition to their huge lead in _quantity_ -- > they will edge you out on the _quality_ issue as well... Unless you "help" the OCA as much as you've "helped" PG. :) Jose Menendez P.S. I'm hoping to finish typing up those replies I mentioned above in the near future. From frank.vandrogen at bc.biol.ethz.ch Thu May 8 06:53:31 2008 From: frank.vandrogen at bc.biol.ethz.ch (Frank van Drogen) Date: Thu, 08 May 2008 15:53:31 +0200 Subject: [gutvol-d] Google Ads In-Reply-To: References: <87k5i81h8i.fsf@bzg.ath.cx> <4820C62A.1090009@bohol.ph> Message-ID: Is there a possibility to get insight in the current financial situation of PGLAF, and into considered alternatives for ads? Frank From hart at pglaf.org Thu May 8 10:16:18 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 May 2008 10:16:18 -0700 (PDT) Subject: [gutvol-d] Google Ads In-Reply-To: References: <87k5i81h8i.fsf@bzg.ath.cx> <4820C62A.1090009@bohol.ph> Message-ID: On Thu, 8 May 2008, Frank van Drogen wrote: > Is there a possibility to get insight in the current financial > situation of PGLAF, and into considered alternatives for ads? > > Frank PGLAF has never been totally broke, mostly because I won't even let Greg pay my office expenses when cash flow is low. 
However, the more we've looked into this Google ad thing, the more it appears we could stand on our own. I've long wondered if we could ever find someone to replace me who would value Project Gutenberg more than their own paycheck. I'm not sure we really want to find out the hard way. We won't be offering any more than someone would get staying in academia and working from there, it was just that academia was pretty inconsistent in their support, which is why we had to create the PGLAF in the first place. By the way, I don't think the ads show up at all on Newby's computer because he has "adblock plus" and I only have the normal "adblock," so it appears no one may have to see ads that doesn't want to. As for myself, I never even notice the Google ads when I do my usual surfing. Michael From hart at pglaf.org Thu May 8 10:33:11 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 May 2008 10:33:11 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4822B4BC.7030102@ibiblio.org> References: <4822B4BC.7030102@ibiblio.org> Message-ID: On Thu, 8 May 2008, Jose Menendez wrote: The "234,857 with full-text" refers to the number that have been scanned. /// Actually, if they really are "full-text" in the manner that term has always been used, as opposed to "raw scans," then these 1/4 million or so books would NOT, technically, "refer to the number that have been scanned" but more accurately "refer to the number that have been scanned and converted from image to text mode." If. . .they are using the language as it always has been. . . . However, I don't think the OCA does much proofreading, if any, so we might need even a more detailed technical language. Perhaps "raw text" ??? I'm sure I have left out several other categories that should eventually be included in breaking down the entire processing from MARC records of 13.xs million down to who knows how many eBooks that have been proofread to the current 99.975% level. Reading the below with this in mind might be advantageous. Thanks!!! Michael > I've been remiss in finishing some replies to a few of Bowerbird's > earlier posts, but this one didn't require a lot of typing on my part, > which is always convenient for someone who uses the time-honored > hunt-and-peck typing method. > > > Bowerbird wrote: > >> the open library -- http://openlibrary.org -- >> "has just finished its latest release", according to >> an announcement that hit my e-mail this morning. >> >> their website says they are now: >> > featuring 13,439,320 books >> > (including 234,857 with full-text) >> so that helps firm up the answer to some questions >> that we were batting around here somewhat recently. > > > It would help if you knew what those numbers mean. :) > > >> that's a whole lot of scan-sets -- 13.44 million... > > > It would be a lot--if they actually had 13.44 million scan-sets, but > they don't. > > Apparently, Bowerbird, you didn't bother to go past the main page of > the Open Library website. It's too bad you didn't click on the "About > the project" link near the bottom of the main page: > > http://www.openlibrary.org/about > > Here's an excerpt: > > > > "One web page for every book ever published. It's a lofty, but > achievable, goal. > > "To build it, we need hundreds of millions of book records, a brand > new database infrastructure for handling huge amounts of dynamic > information, a wiki interface, multi-language support, and people who > are willing to contribute their time, effort, and book data. 
From Bowerbird at aol.com Thu May 8 10:44:52 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 8 May 2008 13:44:52 EDT
Subject: [gutvol-d] Google Ads

michael-

since you said ibiblio won't allow ads, does this mean that the main u.r.l. -- http://www.gutenberg.org -- is _not_ what we're talking about here?

if that's the case, and it would be some _new_ site that you set up, i'd think that you don't even need to ask our thoughts.

-bowerbird

From hart at pglaf.org Thu May 8 11:38:02 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 8 May 2008 11:38:02 -0700 (PDT)
Subject: [gutvol-d] Google Ads

On Thu, 8 May 2008, Bowerbird at aol.com wrote:
> if that's the case, and it would be some
> _new_ site that you set up, i'd think that
> you don't even need to ask our thoughts.

I always like to ask. . . .period. . . .

Yes, this would be a different host.

The first test we are thinking of would be only a partial hosting, with some of the ibiblio site still getting the hits and some going to the test site.

It'll be a while, plenty of time for at least a few generations of comments and trials before going public. . . .

mh

From Bowerbird at aol.com Thu May 8 12:08:42 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 8 May 2008 15:08:42 EDT
Subject: [gutvol-d] Google Ads

michael said:
> I always like to ask. . . .period. . . .

well, that's nice and all. :+) but it gives us the impression that we have a say, when maybe we _shouldn't_ have a say, not really.

i think lots of people don't realize that any time p.g. needed something and no monies existed, you took money out of your pocket to pay for it. and not because you "made a voluntary donation" at that time, but simply because it was _required_.

other people have bought things for the project, yes sir. (i know several people have paid for _lots_ of books.) but not even _those_ people have said "you can take whatever money you need any time." so i don't think you have to _ask_us_ now that you would like to get paid back some of that money, or to assure that such monies _will_ exist into the future.

> It'll be a while, plenty of time for at
> least a few generations of comments
> and trials before going public. . . .

why don't you ask _the_users_ themselves? put a thumbs-up/down poll right on the site.

-bowerbird

From Bowerbird at aol.com Thu May 8 12:20:56 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 8 May 2008 15:20:56 EDT
Subject: [gutvol-d] cyberlibrary numbers

thank you, jose, for clearing up my mistaken notion that the o.c.a. had already scanned some 13.44 million books, when the actual number is more like a quarter of a million. of course, it was the lower number that i was dealing with, so almost nothing in my post needs rewriting in response...

i also noticed that -- back on groundhog day in february -- umichigan announced their total of 1 million books scanned. since umich has what? -- like 6-9 million volumes or so -- and google has been scanning there for over 3 years now, i guess things aren't going as fast as they originally planned.

at any rate, no matter how fast or slow it's all going, i am pleased as punch that we finally started digitizing libraries.

-bowerbird
(http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080508/c4fd13cc/attachment.htm From Bowerbird at aol.com Thu May 8 12:24:42 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 May 2008 15:24:42 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: michael said: > However, I don't think the OCA does much proofreading, if any, > so we might need even a more detailed technical language. > Perhaps "raw text" ??? and here's where we start to come full circle... *** people who have been paying attention to my recent analyses of the data from the experiments over at distributed proofreaders should now know that with good scans and good o.c.r., you can move "raw o.c.r." close to perfection with a good clean-up tool... so that's what i intend to do, with the "raw o.c.r." from umichigan -- and google more generally -- _and_ the open content alliance, sometimes even via a _comparison_ of the same book from both... an extremely persistent campaign on my part to get umichigan to fix the fatal flaws in its o.c.r. has _finally_ paid off, i am informed, thanks in part, i would guess, because i went to the very top of the org food-chain and addressed the _university_librarian_ publicly... moreover, an equally tenacious campaign directed at the o.c.a. has -- just today -- finally given me the name of a person in charge of their o.c.r., so i can hope that soon they too will fix their fatal flaws. so i expect that soon i will be able to start scraping text in earnest, and remounting it after aggressively cleaning it with my programs. will this machine-cleaned text be as clean as p.g. e-texts? nope. not at first, anyway. but since i will wrap it in an infrastructure of "continuous proofing" to encourage the error-reporting process, i expect that it won't take long before it matches and exceeds p.g. after all, proofing isn't rocket-science... -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080508/19aa8b40/attachment.htm From hart at pglaf.org Thu May 8 17:28:50 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 May 2008 17:28:50 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: On Thu, 8 May 2008, Bowerbird at aol.com wrote: > thank you, jose, for clearing up my mistaken notion that > the o.c.a. had already scanned some 13.44 million books, > when the actual number is more like a quarter of a million. > of course, it was the lower number that i was dealing with, > so almost nothing in my post needs rewriting in response... > > i also noticed that -- back on groundhog day in february -- > umichigan announced their total of 1 million books scanned. > since umich have what? -- like 6-9 million volumes or so -- > and google has been scanning there for over 3 years now, > i guess things aren't going as fast as they originally planned. > > at any rate, no matter how fast or slow it's all going, i am > pleased as punch that we finally started digitizing libraries. > Google announced on December 14, 2004 that they would digitize 10 million books in 6 years. 
Of course that includes a lot more libraries than UMich, who,by the way, used to claim Project Gutenberg was there. Hee hee! Meanwhile, it's been nearly 3 1/2 years. If google did 3 1/3 million in the first three years, and then doubled production for the next three years, then they might actually be able to claim on schedule and even longer if tey pretend they never mean a date of December 14, 2004 to be remembered by anyone as an official starting date, een though it was the date of biggest media blitz I've ever seen in my entire life. Hee hee! The real question will be wheter or not Google allows "their" books to become "everyone's" books in a quite useful form, or whether the world will be forced into a permanent continuation of reading over the shoulder of Google, much as the Brtish Library "Readers." Now don't get me wrong, the British Library "Readers" feel themselves to be a particularly well-heeled, and very fortunate bunch. . .but still, even in their put on Sunday best, they have to STAND in tiny carrels to do all their reading. Nevertheless, _I_ am promoting electronic books quite literally on a different plane, books YOU can OWN for as long as you want, and get again if you lose them. Millions of books. You own them all. The "personal computer" as "personal library." Somehow I don't think this is exacly what Google, and the Open Content Alliance, and Million Book Project-- and most of the rest--actually have in mind. And _I_ want them all in full text that can be pulled into any word processor, emailer, text editor, etc. Small files anyone can use in any text program. . . . OWN a million books. Maybe even a billion. . . . No kidding. . . . Michael S. Hart Founder, 1971 Project Gutenberg Inventor of eBooks > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From hart at pglaf.org Thu May 8 17:30:56 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 May 2008 17:30:56 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: Personally, I always thought the poor quality of their scans was intentional. . .to prevent creating good enough OCR to do what I mentioned in my previous message. Have they really changed their minds, and will let out their best scans now??? mh On Thu, 8 May 2008, Bowerbird at aol.com wrote: > michael said: >> However, I don't think the OCA does much proofreading, if any, >> so we might need even a more detailed technical language. >> Perhaps "raw text" ??? > > and here's where we start to come full circle... > > *** > > people who have been paying attention to my recent analyses of > the data from the experiments over at distributed proofreaders > should now know that with good scans and good o.c.r., you can > move "raw o.c.r." close to perfection with a good clean-up tool... > > so that's what i intend to do, with the "raw o.c.r." from umichigan > -- and google more generally -- _and_ the open content alliance, > sometimes even via a _comparison_ of the same book from both... > > an extremely persistent campaign on my part to get umichigan to > fix the fatal flaws in its o.c.r. has _finally_ paid off, i am informed, > thanks in part, i would guess, because i went to the very top of the > org food-chain and addressed the _university_librarian_ publicly... > > moreover, an equally tenacious campaign directed at the o.c.a. 
has > -- just today -- finally given me the name of a person in charge of > their o.c.r., so i can hope that soon they too will fix their fatal flaws. > > so i expect that soon i will be able to start scraping text in earnest, > and remounting it after aggressively cleaning it with my programs. > > will this machine-cleaned text be as clean as p.g. e-texts? nope. > not at first, anyway. but since i will wrap it in an infrastructure of > "continuous proofing" to encourage the error-reporting process, > i expect that it won't take long before it matches and exceeds p.g. > after all, proofing isn't rocket-science... > > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From Bowerbird at aol.com Thu May 8 18:53:40 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 May 2008 21:53:40 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: michael said: > Personally, I always thought > the poor quality of their scans was intentional. . . > to prevent creating good enough OCR to > do what I mentioned in my previous message. well, you really have to separate the o.c.a. from google. > Have they really changed their minds, > and will let out their best scans now??? it's not really a question of which _scans_ they will "let out". we know google ain't releasing their high-resolution scans; they're too big anyway, and don't give better o.c.r. output... but -- hopefully -- we won't have to deal with their scans, except to display 'em in the "continuous proofing" interface, where we want bandwidth-saving smaller images anyway... but mostly, we'll be dealing with their o.c.r. output... so the question is "how good is their raw o.c.r. output?" and the answer is that it's relatively good. good enough. good enough that, after we run aggressive clean-up on it, it gives us results good enough for "continuous proofing". at least, their o.c.r. _will_ be good enough, once fatal flaws in it are overcome... "fatal flaws" means missing characters, usually em-dashes and quotemarks, as well as pagebreaks (which o.c.r. will record as a formfeed if instructed to do so). i haven't yet ruled out that these "fatal flaws" are intentional, but i do believe they are simply the result of _incompetence_, rather than a sinister attempt to make sure that competitors who might try to "steal" this data are thwarted with bad text. whatever the reasons behind it, the fact is that the _public_ -- who are always shown as the beneficiaries of this work -- simply won't put up with this cruddy text, so i'm just the first in a long series of loud-mouthed complainers if it ain't fixed. but once the fatal flaws are fixed, the o.c.r. gets very good... my tests show google o.c.r. is good, and o.c.a. o.c.r. is great. in one such test, there were only _57_ bad lines in the book using o.c.r. from the o.c.a. and only 240 in the google o.c.r. *** so the o.c.r. is good (with exceptions, yes) in both projects. but there's another important feature -- the convenience... the o.c.a. actively wants you to have the full text of the book, offering it in one file, for maximal downloading convenience. google -- and (per their contract with google) umichigan -- are making it far less convenient, forcing people to undergo a page-by-page interface, and threatening to stop scrapers... 
i will be attempting to engage john wilkins at umichigan in a conversation involving firm answers to questions about how and where they will draw the lines on "automated scraping". suffice it for now to observe that will be an "interesting" talk, because i sense that they don't _want_ to make it _too_ easy, but they will have a difficult task making hard-and-fast rules that limit access since they want to say their books are "open". (brewster and carl malamud have already called them on this.) just for the record, i stated in the past that it was my belief that google would never share their text, because that would mean giving up their competitive advantage, understandable as it was the result of their investment of hundreds of millions of dollars. they _deserve_ that competitive advantage. but they gave it up. so already google is giving more than i ever thought they would. when they announced they were making the full-text available, they made a big deal that they were now making it _accessible_ to visually-impaired users, so i got the impression that they had concluded they'd face a major a.d.a. suit if they didn't cough up, and i suspect that that is why they decided to release their text... but maybe i'm just not giving them enough credit... the point is that one can now get text from o.c.a. _and_ google, and it's clean enough text that you can move it close to perfect, at least if you've learned as much about o.c.r. clean-up as i have. even if -- for google -- you have to scrape it one page at a time, the clean-up tool can clean one page while it's scraping the next, and can upload the entire book once it has scraped all the pages, so -- you know -- you can just turn the thing on and watch it run. -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080508/d01759df/attachment-0001.htm From hyphen at hyphenologist.co.uk Fri May 9 12:30:04 2008 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri, 9 May 2008 20:30:04 +0100 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: <003901c8b20b$1c0a4a80$541edf80$@co.uk> Michael Hart wrote > Personally, I always thought the poor quality of their scans > was intentional. . .to prevent creating good enough OCR to do > what I mentioned in my previous message. > Have they really changed their minds, and will let out their > best scans now??? I looked up one of "my" books and the scan was pretty good, from a library copy complete with hand written marginalia. I expect that I could have OCRed it without problems. Dave F From ebooks at ibiblio.org Fri May 9 13:03:10 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Fri, 09 May 2008 16:03:10 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4822B4BC.7030102@ibiblio.org> Message-ID: <4824ADFE.6030903@ibiblio.org> On May 8, 2008, Michael Hart wrote: > On Thu, 8 May 2008, Jose Menendez wrote: > > The "234,857 with full-text" refers to the number that have been > scanned. 
> > > /// > > > Actually, if they really are "full-text" in the manner that term > has always been used, as opposed to "raw scans," then these 1/4 > million or so books would NOT, technically, "refer to the number > that have been scanned" but more accurately "refer to the number > that have been scanned and converted from image to text mode." > > If. . .they are using the language as it always has been. . . . Obviously, if the "full-text" is available, the books were not only scanned but OCRed as well, but Bowerbird's mistake wasn't about the number of books that had been OCRed. It was about the number of books that had been scanned. That's why I wrote, "The '234,857 with full-text' refers to the number that have been scanned." I didn't think I'd have to add the obvious "and OCRed" to the end of the sentence. By the way, it seems that the number given on the Open Library website might be a little out-of-date. If we look at the Internet Archive's Text Archive page, http://www.archive.org/details/texts we'll see that the total number of items listed for the "American Libraries" and "Canadian Libraries" sub-collections is over 298,000. Jose Menendez From ebooks at ibiblio.org Fri May 9 13:06:57 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Fri, 09 May 2008 16:06:57 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: <4824AEE1.5090908@ibiblio.org> On May 8, 2008, Bowerbird wrote: > michael said: > > However, I don't think the OCA does much proofreading, if any, > > so we might need even a more detailed technical language. > > Perhaps "raw text" ??? > > and here's where we start to come full circle... > > *** > > people who have been paying attention to my recent analyses of > the data from the experiments over at distributed proofreaders > should now know that with good scans and good o.c.r., you can > move "raw o.c.r." close to perfection with a good clean-up tool... > > so that's what i intend to do, with the "raw o.c.r." from umichigan > -- and google more generally -- _and_ the open content alliance, > sometimes even via a _comparison_ of the same book from both... > > an extremely persistent campaign on my part to get umichigan to > fix the fatal flaws in its o.c.r. has _finally_ paid off, i am informed, > thanks in part, i would guess, because i went to the very top of the > org food-chain and addressed the _university_librarian_ publicly... The Google OCR at the University of Michigan started improving over a year ago. Perhaps you recall my pointing that out to you on the DP forums in January of last year. http://www.pgdp.net/phpBB2/viewtopic.php?p=271008#271008 "But there are other Google books at the UM site that aren't the way you describe. For example, look at this page of OCR text from _Abraham Lincoln_ by Carl Schurz. I see a number of separate paragraphs. I also see a number of quotation marks and even a few end-line hyphens, but not all of the hyphens that should be there." Was the OCR as good as it should have been? No, but it was already getting better. > moreover, an equally tenacious campaign directed at the o.c.a. has > -- just today -- finally given me the name of a person in charge of > their o.c.r., so i can hope that soon they too will fix their fatal flaws. The OCA's OCR also began improving at least a year ago. 
For example, look at "The Spanish Story of the Armada, and Other Essays": http://www.archive.org/details/spanishstoryofar00frouuoft If we look at the directory with the various files, http://ia340919.us.archive.org/2/items/spanishstoryofar00frouuoft/ we'll see that most of them were posted exactly a year ago, on May 9, 2007. And if you look at the plain-text file, http://ia340919.us.archive.org/2/items/spanishstoryofar00frouuoft/spanishstoryofar00frouuoft_djvu.txt you'll see that it contains quotation marks, apostrophes, end of line hyphens, etc. Em dashes do seem to be missing, but that flaw was fixed with later books, starting last December. For example, look at "The Scarlet Letter, A Romance": http://www.archive.org/details/letterromscarlet00hawtrich The directory listing shows that most of the files were posted on December 15, 2007: http://ia360619.us.archive.org/0/items/letterromscarlet00hawtrich/ And if you look at the plain-text file, http://ia360619.us.archive.org/0/items/letterromscarlet00hawtrich/letterromscarlet00hawtrich_djvu.txt you'll see that it also includes the em dashes. You might not recognize them, but they're there. Look for this string of characters: ??? That's UTF-8 for em dashes. Indeed, if you switch your web browser to use UTF-8 encoding, you'll see them displayed as em dashes. It would be nice, of course, if the OCA would fix the OCR of the books that had been processed earlier. > so i expect that soon i will be able to start scraping text in earnest, > and remounting it after aggressively cleaning it with my programs. You know, Bowerbird, all this time you've been complaining about the quality of the OCA's and Google's OCR, you also could have been doing your own OCR of their scans "in earnest" and "aggressively cleaning it." It's rather easy if you know what you're doing, but I guess it's even easier to sit back and wait for someone else to provide you with high quality OCR. :) > will this machine-cleaned text be as clean as p.g. e-texts? nope. > not at first, anyway. but since i will wrap it in an infrastructure of > "continuous proofing" to encourage the error-reporting process, > i expect that it won't take long before it matches and exceeds p.g. > after all, proofing isn't rocket-science... You've posted links to your "continuous proofing" demos in a number of places, but have you gotten a single person (other than yourself) to use your "error-reporting process"? :) Jose Menendez From ebooks at ibiblio.org Fri May 9 13:07:24 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Fri, 09 May 2008 16:07:24 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: <4824AEFC.50102@ibiblio.org> On May 8, 2008, Michael Hart wrote: > Google announced on December 14, 2004 that they would digitize > 10 million books in 6 years. No, they didn't. Here's a link to the story CBS News placed on its website back on Dec. 14, 2004: "Google To Scan Library Volumes" http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml The only "timeline" for the scanning given in the article is this: "Michigan's library alone contains 7 million of its library volumes -- about 132 miles of books. Google hopes to get the job done at Michigan within six years, Wilkin said." The article does NOT say how many books Google hoped to scan at the other 4 libraries in those 6 years. And here's a link to the article BBC News posted on Dec. 
14, 2004: "Google to scan famous libraries" http://news.bbc.co.uk/2/hi/technology/4094271.stm Again the only mention of 6 years refers to the University of Michigan library: "It will take six years to digitise the full collection at Michigan, which contains seven million volumes." Could you point to a single article posted back then, reporting that Google announced it would scan 10 million books in 6 years? > Of course that includes a lot more libraries than UMich, > who,by the way, used to claim Project Gutenberg was there. Did they also claim Kilroy was there? ;) > If google did 3 1/3 million in the first three years, > and then doubled production for the next three years, > then they might actually be able to claim on schedule > and even longer if tey pretend they never mean a date > of December 14, 2004 to be remembered by anyone as an > official starting date, een though it was the date of > biggest media blitz I've ever seen in my entire life. It was the "biggest media blitz" you've ever seen, yet you always seem to have trouble remembering what was actually said. :) Jose Menendez From lee at novomail.net Fri May 9 13:46:39 2008 From: lee at novomail.net (Lee Passey) Date: Fri, 09 May 2008 14:46:39 -0600 Subject: [gutvol-d] parallel -- the plunderer -- 04 -- and another upbeat post in this series In-Reply-To: References: Message-ID: <4824B82F.306@novomail.net> Bowerbird at aol.com wrote: > given that all of these 3 errors would be reasonably expected to be > caught by the general public in "continuous proofing", i believe that > the question of whether p2 was even needed in this particular book > is open for discussion. And therein lies the rub. Pretty much every suggestion for change you have made to Distributed Proofreaders depends on the existence of an effective and efficient "continuous proofreading" process--which does not exist, and will probably never exist so long as Distributed Proofreaders views Project Gutenberg as it's primary distribution mechanism. The relationship between Project Gutenberg and Distributed Proofreaders is both complex and one-way, with DP playing the role of the unrequited lover. The volunteers at DP go to great lengths and expend great effort in producing the finest work product their processes allow them to. However, once DP has finished its work it is passed over to PG, where it goes into the barrels with all the other apples. This is why the volunteers at DP expend so much time and energy on trying to figure out how many rounds of various types are required to maximize the quality of their work product. They know that once a set of files for any particular work is submitted to PG that there is no effective or efficient process to correct errors, or even enhance the output. While there is a theoretical process to correct problems, there is no practical process. ("In theory, there is no difference between theory and practice. In practice, there is.") For all practical purposes, when a file leaves DP and enters the PG archive it is forever cast in stone. I agree that much of what DP now does would be unnecessary if there were in place a "continuous proofreading" process; but there is not. And so long as DP has no control over the archiving and distribution of its own output it will be unable to put a "continuous proofreading" process into practice. 
The first step in improving DP's processes should be to find a partner where DO's work product can be archived and accessible /in addition to/ Project Gutenberg, and where Distributed Proofreaders might have the kind of control required to implement "continuous proofreading." From Bowerbird at aol.com Fri May 9 14:45:08 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 May 2008 17:45:08 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: jose said: > The Google OCR at the University of Michigan > started improving over a year ago. as long as it has "fatal flaws" in it, "improvement" means little. > Was the OCR as good as it should have been? > No, but it was already getting better. the point is that, as of march 31st of this year, the "problems" are now solved, according to people at the umichigan library. i haven't checked to make sure of that, but i hope that it's true. > The OCA's OCR also began improving at least a year ago. again, "improvement" is nice, but removing fatal flaws is necessary. if you can tell me that all of their o.c.r. output since a certain date is _free_ of fatal flaws, i'll go see if i can confirm that. otherwise... (by the way, one flaw -- quite serious, and perhaps even fatal -- with the o.c.a. output is that it doesn't include the pagebreaks... i'd need that to be fixed before i could seriously work that text.) > It would be nice, of course, if the OCA would fix the OCR > of the books that had been processed earlier. well, yeah, that too... > You know, Bowerbird, all this time you've been complaining > about the quality of the OCA's and Google's OCR, you also > could have been doing your own OCR of their scans "in earnest" i could have. but i'd consider that to be a waste of my time, since both the o.c.a. and google will fix their text eventually. > I guess it's even easier to sit back and wait for someone else > to provide you with high quality OCR. :) if i really wanted to "sit back and wait", i'd sit back and wait until they cleaned up their o.c.r., because they'll have to do _that_ eventually as well. and if it was a lot of work for me to clean up the o.c.r., that's precisely what i _would_ do, because i see no point in re-doing someone else's work when i know that that someone else will eventually re-do the work anyway... > You've posted links to your "continuous proofing" demos > in a number of places, but have you gotten a single person > (other than yourself) to use your "error-reporting process"? :) perhaps no one has found an error to report... ;+) -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080509/f29b9068/attachment.htm From hart at pglaf.org Fri May 9 19:09:45 2008 From: hart at pglaf.org (Michael Hart) Date: Fri, 9 May 2008 19:09:45 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4824AEFC.50102@ibiblio.org> References: <4824AEFC.50102@ibiblio.org> Message-ID: On Fri, 9 May 2008, Jose Menendez wrote: > On May 8, 2008, Michael Hart wrote: > > >> Google announced on December 14, 2004 that they would digitize >> 10 million books in 6 years. > > > No, they didn't. Here's a link to the story CBS News placed on its > website back on Dec. 
14, 2004: > Your research is very incomplete if you think the CBS story contained all the press releases given by the "Google Print Library personnel on that date. You'd better go back and chekc al the other TV networks, not to mention radio, newspapers, etc. I've quoted many of these here in the past, and I hope that these kinds of statement will NOT be taken at face value when I am gone. Let's not forget all the interviews with the various librarians at the original member institutions of the project, along with any number of Google officials and others. _I_ went over ALL of them I could find, had people send me tapes of others. . . . Why? This was probably a lot more important to me than to anyone else. ;-) mh > "Google To Scan Library Volumes" > http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml > > The only "timeline" for the scanning given in the article is this: > > "Michigan's library alone contains 7 million of its library volumes -- > about 132 miles of books. Google hopes to get the job done at Michigan > within six years, Wilkin said." > > The article does NOT say how many books Google hoped to scan at the > other 4 libraries in those 6 years. > > And here's a link to the article BBC News posted on Dec. 14, 2004: > > "Google to scan famous libraries" > http://news.bbc.co.uk/2/hi/technology/4094271.stm > > Again the only mention of 6 years refers to the University of Michigan > library: > > "It will take six years to digitise the full collection at Michigan, > which contains seven million volumes." > > Could you point to a single article posted back then, reporting that > Google announced it would scan 10 million books in 6 years? > > >> Of course that includes a lot more libraries than UMich, >> who,by the way, used to claim Project Gutenberg was there. > > > Did they also claim Kilroy was there? ;) > > >> If google did 3 1/3 million in the first three years, >> and then doubled production for the next three years, >> then they might actually be able to claim on schedule >> and even longer if tey pretend they never mean a date >> of December 14, 2004 to be remembered by anyone as an >> official starting date, een though it was the date of >> biggest media blitz I've ever seen in my entire life. > > > It was the "biggest media blitz" you've ever seen, yet you always seem > to have trouble remembering what was actually said. :) > > > Jose Menendez > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Fri May 9 19:12:47 2008 From: hart at pglaf.org (Michael Hart) Date: Fri, 9 May 2008 19:12:47 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4824ADFE.6030903@ibiblio.org> References: <4822B4BC.7030102@ibiblio.org> <4824ADFE.6030903@ibiblio.org> Message-ID: No, it is NOT obvious, so you MUST say "scanned" or "OCRed" [with or without "scanned." Why? Because there are so many people out there muddying up the waters as to what is what. "Full text" can mean SO many different things, and formats. . . even from those who are NOT trying to muddy the waters. Be perfectly clear. . .it will help more than most imagine. mh On Fri, 9 May 2008, Jose Menendez wrote: > On May 8, 2008, Michael Hart wrote: > >> On Thu, 8 May 2008, Jose Menendez wrote: >> >> The "234,857 with full-text" refers to the number that have been >> scanned. 
>> >> >> /// >> >> >> Actually, if they really are "full-text" in the manner that term >> has always been used, as opposed to "raw scans," then these 1/4 >> million or so books would NOT, technically, "refer to the number >> that have been scanned" but more accurately "refer to the number >> that have been scanned and converted from image to text mode." >> >> If. . .they are using the language as it always has been. . . . > > > Obviously, if the "full-text" is available, the books were not only > scanned but OCRed as well, but Bowerbird's mistake wasn't about the > number of books that had been OCRed. It was about the number of books > that had been scanned. That's why I wrote, "The '234,857 with > full-text' refers to the number that have been scanned." I didn't > think I'd have to add the obvious "and OCRed" to the end of the sentence. > > By the way, it seems that the number given on the Open Library website > might be a little out-of-date. If we look at the Internet Archive's > Text Archive page, > > http://www.archive.org/details/texts > > we'll see that the total number of items listed for the "American > Libraries" and "Canadian Libraries" sub-collections is over 298,000. > > > Jose Menendez > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Sat May 10 06:16:24 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat, 10 May 2008 15:16:24 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4824AEFC.50102@ibiblio.org> Message-ID: <4825A028.2040203@perathoner.de> Michael Hart wrote: >> No, they didn't. Here's a link to the story CBS News placed on its >> website back on Dec. 14, 2004: >> >> "Google To Scan Library Volumes" >> http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml > > You'd better go back and chekc al the other TV networks, not to > mention radio, newspapers, etc. Usually, Michael, if YOU make a statement YOU have to prove it. If there are lots of papers, TV networks etc. that brought YOUR version of the facts, then it will be easy for YOU to come up with a link to prove it. > Google announced on December 14, 2004 that they would digitize > 10 million books in 6 years. All that Google "announced" was this blog entry: http://googleblog.blogspot.com/2004/12/all-booked-up.html where they say NOTHING about 10 million NOR ANYTHING about 6 years. Again, give us a link to where Google says they will digitize 10 MB in 6 years or stand back from your claim. -- Marcello Perathoner webmaster at gutenberg.org From joshua at hutchinson.net Sat May 10 09:44:17 2008 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat, 10 May 2008 16:44:17 +0000 (GMT) Subject: [gutvol-d] cyberlibrary numbers Message-ID: <254161975.687181210437857364.JavaMail.mail@webmail05> The only places where I could find those numbers mixed together was places like this: http://tzvee.blogspot.com/2007/06/google-will-scan-10-million-more.html Quick summary... the goal is 10 million, but no date was given. The U of M contract was 6 years, but no goal was given. Maybe that is where the 10 million/6 years is coming from? Josh On May 10, 2008, marcello at perathoner.de wrote: Michael Hart wrote: > > You'd better go back and chekc al the other TV networks, not to > mention radio, newspapers, etc. Again, give us a link to where Google says they will digitize 10 MB in 6 years or stand back from your claim. 
From Bowerbird at aol.com Sat May 10 12:38:21 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 10 May 2008 15:38:21 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: sometimes the backbiting on this list gets _extremely_ amusing... :+) the original f.a.q. from michigan tells the numbers about michigan... > Q. 1: What is the UM?Google project? > A: The UM-Google project is a partnership between UM and Google > that will make the seven million UM University Library volumes > searchable via the Google search engine, and open the way to > universal access to information. Google will digitize our library > collection and make the items accessible through the Google site. > The University Library will also receive and own a high quality > digital copy of the collection to use for its own > purposes. voila, we have the 7-million number. > Q. 3: How long will the project take? > A: Estimating how long the project will take is difficult, > but we are currently planning for > approximately six years of scanning. voila, we have the 6-year timeframe... there were 5 libraries involved in the project at the outset. my guess -- back then, and even now today -- would be that if they intended to scan 7 million umichigan books in 6 years, they intended to scan _at_least_ another 7 million from the other 4 libraries in that same amount of time, so i'd say the implicit promise was to do 14 million in 6 years, and i don't think you can call that an unreasonable position, either then or now. since -- after 3 years -- they've only scanned _1_ million books from umichigan, then it is _completely_ fair to say that they are "behind schedule" at umichigan. of course, since many libraries (dozens?) have joined the project since its onset, i'd guess the schedule was altered somewhere along the line, and that's fine. i'm convinced they're working on it, and working hard, so fine... i _do_ wish that -- 3 years into it -- they would be a little bit further along than 1 million out of 7 million umichigan books, because that makes it look like this could take 20 years total... but, you know, i'm not paying their bills, so what say do i have? i'm just glad that public-domain books are popping up fast... who knows, there might be a million of them, already or soon. and that's a good number. meanwhile, how about if we instead discuss some topics that have some important substance, instead of mere trivialities?... -bowerbird p.s. by the way, under google's newer contract with the c.i.c., which includes umichigan along with 11 other universities, the "initial term" of the contract was for -- you guessed it -- 6 years. so it seems that they are fixated on that timeframe. perhaps some corporate tax accountants could tell us why... ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080510/8750aa77/attachment.htm From hart at pglaf.org Sat May 10 14:38:14 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 10 May 2008 14:38:14 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: This fits very well with the comments I was referring to, all given on or about December 14, 2004. 
I collected up dozens of interviews and news stories, and none of them gave the entire story, but pieces fit here & there into a pattern that wasn't and exact fit, but close enough to get a pretty good picture. I'm sure of the 10 million book estimate by at least one, and of the 6 year figure, as well. Obviously there are additional projects now with Google-- and without Google--my own local university is doing many books, but from the insiders, and it takes insiders to do any research on these, they are choosing books quite much in opposition to our own PG philosophy, choosing lists of books that they are sure no one else would choose & thus, literally, no one else would read in general usage. I do clearly remember some of the librarians' comments in a manner suggesting this should be a library for readers, all over the world. However, as I suspectly then, and suspect even more now-- that many of these eBooks will never see the light of day in any general sense. 1. they are definitely NOT being prominently available. 2. they are definitely NOT of general reader interest. 3. they appear to be mostly "raw scans". . .not sure of the percentage, if any, being reported as OCRed, proofed by a human, or whatever. I have actually spoken in person to the local book czar, and one of the national ones, and can't get any real hot and hard facts about what percentages, etc. But I keep hoping. . . . Michael On Sat, 10 May 2008, Bowerbird at aol.com wrote: > sometimes the backbiting on this list gets _extremely_ amusing... > :+) > > the original f.a.q. from michigan tells the numbers about michigan... > > >> Q. 1: What is the UM??Google project? >> A: The UM-Google project is a partnership between UM and Google >> that will make the seven million UM University Library volumes >> searchable via the Google search engine, and open the way to >> universal access to information. Google will digitize our library >> collection and make the items accessible through the Google site. >> The University Library will also receive and own a high quality >> digital copy of the collection to use for its own >> purposes. > > voila, we have the 7-million number. > > >> Q. 3: How long will the project take? >> A: Estimating how long the project will take is difficult, >> but we are currently planning for >> approximately six years of scanning. > > voila, we have the 6-year timeframe... > > > there were 5 libraries involved in the project at the outset. > my guess -- back then, and even now today -- would be > that if they intended to scan 7 million umichigan books in > 6 years, they intended to scan _at_least_ another 7 million > from the other 4 libraries in that same amount of time, so > i'd say the implicit promise was to do 14 million in 6 years, > and i don't think you can call that an unreasonable position, > either then or now. > > since -- after 3 years -- they've only scanned _1_ million books > from umichigan, then it is _completely_ fair to say that they are > "behind schedule" at umichigan. of course, since many libraries > (dozens?) have joined the project since its onset, i'd guess the > schedule was altered somewhere along the line, and that's fine. > i'm convinced they're working on it, and working hard, so fine... > > i _do_ wish that -- 3 years into it -- they would be a little bit > further along than 1 million out of 7 million umichigan books, > because that makes it look like this could take 20 years total... > but, you know, i'm not paying their bills, so what say do i have? 
> > i'm just glad that public-domain books are popping up fast... > who knows, there might be a million of them, already or soon. > and that's a good number. > > meanwhile, how about if we instead discuss some topics that > have some important substance, instead of mere trivialities?... > > -bowerbird > > p.s. by the way, under google's newer contract with the c.i.c., > which includes umichigan along with 11 other universities, > the "initial term" of the contract was for -- you guessed it -- > 6 years. so it seems that they are fixated on that timeframe. > perhaps some corporate tax accountants could tell us why... > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From hart at pglaf.org Sat May 10 14:54:20 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 10 May 2008 14:54:20 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <254161975.687181210437857364.JavaMail.mail@webmail05> References: <254161975.687181210437857364.JavaMail.mail@webmail05> Message-ID: On Sat, 10 May 2008, Joshua Hutchinson wrote: > The only places where I could find those numbers mixed together > was places like this: > > http://tzvee.blogspot.com/2007/06/google-will-scan-10-million-more.html > > Quick summary... the goal is 10 million, but no date was given. > The U of M contract was 6 years, but no goal was given. > > Maybe that is where the 10 million/6 years is coming from? > > Josh Josh, You have your tenses inverted. You'd better get used to the idea that not all information and knowledge comes from linked articles. . . . There still is the "real world" out there, without links. Just because you have not done the required research to do references to something that happened December 14, 2004 is neither a valid nor reliable indicator it did not happen. Your comments are a great example for those who have been, and will continue to do so, slamming Wikipedia research. Not that I am saying you used Wikipedia, but just that you oretebd that becuse you have no links, it didn't happen. By the way, you should be able to find some refereences, I provided a few at the time, but I am not really interested in doing your homework for you, as you never say thanks if I spend an hour answering your questions. Haven't you had a chance to learn the moral behind: "You get more with honey than with vinegar." Given the years you have demonstrated this lack, it might, just might, take that many years to change perceptions. "Can the leopard really change its spots?" > > On May 10, 2008, marcello at perathoner.de wrote: > Michael Hart wrote: >> >> You'd better go back and chekc al the other TV networks, not to >> mention radio, newspapers, etc. > > Again, give us a link to where Google says they will digitize 10 MB in 6 > years or stand back from your claim. 
> > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Sat May 10 15:46:54 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 11 May 2008 00:46:54 +0200 Subject: [gutvol-d] parallel -- the plunderer -- 04 -- and another upbeat post in this series In-Reply-To: <4824B82F.306@novomail.net> References: <4824B82F.306@novomail.net> Message-ID: <482625DE.70509@perathoner.de> Lee Passey wrote: > The first step in improving DP's processes should be to find a > partner where DO's work product can be archived and accessible /in > addition to/ Project Gutenberg, and where Distributed Proofreaders might > have the kind of control required to implement "continuous proofreading." If Michael decides to make a few quick bucks with his trademark and domain, and thus to abandon the free hosting facilities at ibiblio, DP will just have to step into the vacated environment. The books are PD, the software that drives the web site is GPLed, so is the catalog data. Im sure many PG volunteers will stay with ibiblio and keep the new old distribution site running. DP will just have to come up with a new name and domain. -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Sat May 10 16:26:04 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 11 May 2008 01:26:04 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <254161975.687181210437857364.JavaMail.mail@webmail05> Message-ID: <48262F0C.3080808@perathoner.de> Michael Hart wrote: > Just because you have not done the required research to do > references to something that happened December 14, 2004 is > neither a valid nor reliable indicator it did not happen. You are an incorrigible liar. (I tried to formulate this in a less rude way, but no polite expression quite covers the ground.) Your assertion is made up. Not only that, but when you are pressed to show some evidence -- and you can't -- then you try to wiggle your way out by reversing the positions. Not we have to do the research to find out if your claim is true (it is not) but YOU have to give US evidence that it is indeed true. I understand that you are jealous because Google is getting all the attention while PG is getting none. Still this is no reason to bad-mouth Google's very laudable exertions by making up "facts". For decades PG neglected to organize itself and to get proper fundings. Now Google is doing what an organized and funded PG ought to have done long ago. So I guess that after all the media attention is falling where it is due. -- Marcello Perathoner webmaster at gutenberg.org From gbnewby at pglaf.org Sat May 10 22:00:25 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Sat, 10 May 2008 22:00:25 -0700 Subject: [gutvol-d] parallel -- the plunderer -- 04 -- and another upbeat post in this series In-Reply-To: <4824B82F.306@novomail.net> References: <4824B82F.306@novomail.net> Message-ID: <20080511050025.GD27486@mail.pglaf.org> On Fri, May 09, 2008 at 02:46:39PM -0600, Lee Passey wrote: > Bowerbird at aol.com wrote: > > > given that all of these 3 errors would be reasonably expected to be > > caught by the general public in "continuous proofing", i believe that > > the question of whether p2 was even needed in this particular book > > is open for discussion. > > And therein lies the rub. 
Pretty much every suggestion for change you > have made to Distributed Proofreaders depends on the existence of an > effective and efficient "continuous proofreading" process--which does > not exist, and will probably never exist so long as Distributed > Proofreaders views Project Gutenberg as it's primary distribution mechanism. > > The relationship between Project Gutenberg and Distributed Proofreaders > is both complex and one-way, with DP playing the role of the unrequited > lover. The volunteers at DP go to great lengths and expend great effort > in producing the finest work product their processes allow them to. > However, once DP has finished its work it is passed over to PG, where it > goes into the barrels with all the other apples. I LOVE DP, and I love their apples. > This is why the volunteers at DP expend so much time and energy on > trying to figure out how many rounds of various types are required to > maximize the quality of their work product. They know that once a set of > files for any particular work is submitted to PG that there is no > effective or efficient process to correct errors, or even enhance the > output. While there is a theoretical process to correct problems, there > is no practical process. ("In theory, there is no difference between > theory and practice. In practice, there is.") For all practical > purposes, when a file leaves DP and enters the PG archive it is forever > cast in stone. I think you understand the challenges as well as I do, and as always I'm ready to hear about any sort of solution, including but not limited to forking or version control for the PG content. -- Greg > I agree that much of what DP now does would be unnecessary if there were > in place a "continuous proofreading" process; but there is not. And so > long as DP has no control over the archiving and distribution of its own > output it will be unable to put a "continuous proofreading" process into > practice. The first step in improving DP's processes should be to find a > partner where DO's work product can be archived and accessible /in > addition to/ Project Gutenberg, and where Distributed Proofreaders might > have the kind of control required to implement "continuous proofreading." > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From joshua at hutchinson.net Sun May 11 08:43:30 2008 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sun, 11 May 2008 15:43:30 +0000 (GMT) Subject: [gutvol-d] cyberlibrary numbers Message-ID: <86761188.725181210520611248.JavaMail.mail@webmail06> You know, it's kinda sad that the founder of PG is one of the two biggest trolls on gutvol-d. Michael, since you missed it, that was an attempt to find a backup to your statement. The link I found was the closest news article to what you said (there were some others like it, but nothing closer). Yes, I mixed verb tenses up. I apologize to the grammar nazi. Josh On May 10, 2008, hart at pglaf.org wrote: On Sat, 10 May 2008, Joshua Hutchinson wrote: > The only places where I could find those numbers mixed together > was places like this: > > http://tzvee.blogspot.com/2007/06/google-will-scan-10-million-more.html > > Quick summary... the goal is 10 million, but no date was given. > The U of M contract was 6 years, but no goal was given. > > Maybe that is where the 10 million/6 years is coming from? > > Josh Josh, You have your tenses inverted. 
You'd better get used to the idea that not all information and knowledge comes from linked articles. . . . There still is the "real world" out there, without links. Just because you have not done the required research to do references to something that happened December 14, 2004 is neither a valid nor reliable indicator it did not happen. Your comments are a great example for those who have been, and will continue to do so, slamming Wikipedia research. Not that I am saying you used Wikipedia, but just that you oretebd that becuse you have no links, it didn't happen. By the way, you should be able to find some refereences, I provided a few at the time, but I am not really interested in doing your homework for you, as you never say thanks if I spend an hour answering your questions. Haven't you had a chance to learn the moral behind: "You get more with honey than with vinegar." Given the years you have demonstrated this lack, it might, just might, take that many years to change perceptions. "Can the leopard really change its spots?" > > On May 10, 2008, marcello at perathoner.de wrote: > Michael Hart wrote: >> >> You'd better go back and chekc al the other TV networks, not to >> mention radio, newspapers, etc. > > Again, give us a link to where Google says they will digitize 10 MB in 6 > years or stand back from your claim. > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > _______________________________________________ gutvol-d mailing list gutvol-d at lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From hart at pglaf.org Sun May 11 10:13:11 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 10:13:11 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <86761188.725181210520611248.JavaMail.mail@webmail06> References: <86761188.725181210520611248.JavaMail.mail@webmail06> Message-ID: On Sun, 11 May 2008, Joshua Hutchinson wrote: > You know, it's kinda sad that the founder of PG is one of the two > biggest trolls on gutvol-d. You just can't stand it that snyone stands up to your ridicule. > > Michael, since you missed it, that was an attempt to find a backup > to your statement. The link I found was the closest news article > to what you said (there were some others like it, but nothing > closer). Josh, I am sure you have still not accepted that I am not going to do your homework for you. Again, I repeat, because we have to be sure you actually got the message: Just because you don't find a link doesn't mean it didn't happen. Check the four major US TV networks, LISTEN to what they said, and add in NPR, BBC, CBC, and the newspaper syndicates. Personally, I think I remember one or two interviews that could possibly satisfy even you, but I personally doubt you could be satisfied unless you found someone who said 10 million AND 6 years in the same phrase. As I said before, I pieced all of it together on December 14, 2004, and more analysis during the following few days. You would/could/should have done the same thing if it had any similar importance to you. Now that it's mid-2008, it's just that much harder, and without asking for cooperation, but actually intentionally alienating the sources that could help you, you are just diggin your hole deeper and deeper and deeper. > Yes, I mixed verb tenses up. I apologize to the grammar nazi. Several people here have suggested that we should think more of how this will all sound 10 years or so from now. 
The real trouble from my point of view, and why I don't just do the obvious of either ignoring you or moderating you is because I have to consider how things will play out when I am gone. You see, I have to respond to your silliness now to make it soo obvious what you do that no one will take these comments you do so often with anything other than a ton of salt. You haven't made ANY points, either logically or emotionally to further whatever cause it is you think you might have. By the way, just what IS your cause for doing such things??? Is there an actual goal you have??? Other than just pouring more noise into the system??? If you had actually done the searches you said you did, I think you would/could/should have found the materials in question. However, I seriously doubt you put in even half as much work on these searches as I did, or you would have found at least a few of the references I and others have brought up. It is all too obvious here, and elsewhere, that those who quite literally make the most noise about troll, are actually trolls, of the lowest/highest order, depending on your perspective. So, for now I will just leave you be, you can have the floor on these until someone gives me feedback that they actully believe your rants and raves. However, I will continue to resist your trolling to get others, including myself, to do your own homework for you. You are such a perfect example, I don't think I should moderate you even if we did such moderation in Project Gutenberg. However, I will remind the audience that those who call for the moderator to "moderate"/censor others the most are those whom a realisitic and logical observer would say should be moderated. But this is all too obvious to the majority, and I will simply, and forever, continue to make it obvious when you make examples so very obvious. Just Joshing. . . . Michael > > Josh > > On May 10, 2008, hart at pglaf.org wrote: > > > On Sat, 10 May 2008, Joshua Hutchinson wrote: > >> The only places where I could find those numbers mixed together >> was places like this: >> >> http://tzvee.blogspot.com/2007/06/google-will-scan-10-million-more.html >> >> Quick summary... the goal is 10 million, but no date was given. >> The U of M contract was 6 years, but no goal was given. >> >> Maybe that is where the 10 million/6 years is coming from? >> >> Josh > > Josh, > > You have your tenses inverted. > > You'd better get used to the idea that not all information > and knowledge comes from linked articles. . . . > > There still is the "real world" out there, without links. > > Just because you have not done the required research to do > references to something that happened December 14, 2004 is > neither a valid nor reliable indicator it did not happen. > > Your comments are a great example for those who have been, > and will continue to do so, slamming Wikipedia research. > > Not that I am saying you used Wikipedia, but just that you > oretebd that becuse you have no links, it didn't happen. > > By the way, you should be able to find some refereences, I > provided a few at the time, but I am not really interested > in doing your homework for you, as you never say thanks if > I spend an hour answering your questions. > > Haven't you had a chance to learn the moral behind: > > "You get more with honey than with vinegar." > > Given the years you have demonstrated this lack, it might, > just might, take that many years to change perceptions. > > "Can the leopard really change its spots?" 
> > On May 10, 2008, marcello at perathoner.de wrote: >> Michael Hart wrote: >>> >>> You'd better go back and check all the other TV networks, not to >>> mention radio, newspapers, etc. >> >> Again, give us a link to where Google says they will digitize 10 MB in 6 >> years or stand back from your claim.
From hart at pglaf.org Sun May 11 10:36:13 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 10:36:13 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <48262F0C.3080808@perathoner.de> References: <254161975.687181210437857364.JavaMail.mail@webmail05> <48262F0C.3080808@perathoner.de> Message-ID: Marcello, You and yours have said pretty much exactly the same thing before. Since no one pays any attention to you, you could save much labor simply by sending the same messages you sent last time. Don't you realize that the more you call names, the less anyone pays attention to you? You come along all prickly and poisonous and then wonder why nobody wants to give you a hug. Since I have proven you wrong in the past, without any response that anyone needed such proof, I leave you to stew in your juices, along with Josh, and the same potential handful you managed to gather with a few other such rants and raves. Do you really think I have forgotten??? Do you really think everyone else has forgotten??? The truth is that no one believes you. . . . If you really want a serious reply, you'll have to change tack long enough that the memory has faded. However, I will give you a clue: The search terms you would have to use to find what you SAY is non-existent are only half a dozen. Let's presume you need to use ALL possible search terms; that, as we once discussed before, would be about a dozen, and would get you, as was stated here, more hits than you wanted. So, do a few searches with the half dozen you might pick, and after a few such searches, the answer should be obvious unless some sites have changed between my searches and yours. However, it's past "1984" and we should not be so sure that no one, and I mean many no ones, including here, has attempted to get in some rewriting of history. Doubly however, since I am sure you wouldn't thank me, even if I did your homework for you, and wouldn't accept it, no matter what was found, I have no reason to do your homework for you-- only for those who have a sincere desire to know. Do you remember how silly it all looked the last time messages very much like the one below appeared? Since everyone seems to remember, I need not do it again, eh? mh On Sun, 11 May 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >> Just because you have not done the research required to find >> references to something that happened on December 14, 2004 is >> neither a valid nor a reliable indicator that it did not happen. > > You are an incorrigible liar. > > (I tried to formulate this in a less rude way, but no polite > expression quite covers the ground.) > > > Your assertion is made up.
> > It is not that we have to do the research to find out if your > claim is true (it is not), but that YOU have to give US evidence that > it is indeed true. > > > I understand that you are jealous because Google is getting all > the attention while PG is getting none. > > Still this is no reason to bad-mouth Google's very laudable > exertions by making up "facts". > > For decades PG neglected to organize itself and to get proper > funding. Now Google is doing what an organized and funded PG > ought to have done long ago. So I guess that after all the media > attention is falling where it is due. > > > -- > Marcello Perathoner > webmaster at gutenberg.org >
From hart at pglaf.org Sun May 11 10:48:02 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 10:48:02 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4825A028.2040203@perathoner.de> References: <4824AEFC.50102@ibiblio.org> <4825A028.2040203@perathoner.de> Message-ID: Sorry, I don't think Marcello can admit that the HUGE media blitz on December 14, 2004 didn't happen all by itself. This could be either because Marcello doesn't understand the P.R. departments of places such as Google, or because he does not want YOU to understand. If you think Google JUST put the link below up one day and EVERY major news outlet ran it a few hours later as a major story, then you probably don't realize just how much of the "news" is fed to the media via these various P.R. people. Once again I must suggest, as though Marcello and Josh were in a listening mode, that they go over the various interviews that aired on that particular day, December 14, 2004. There is not, and was not, one single source that provided /all/ the information that was broadcast and printed that day. If you do your homework, you will find many such sources, some I referenced earlier, but none all that hard to find. If you actually look at the origins of Google Print Library from December 14, 2004. . . . Go for it! Or not! But don't complain if you don't. You can tease and troll all you want, but I learned not to work, or play, in response to such teasing and trolling from you. If you were serious, your message below would not be so empty. It would have substance, not just accusations. ;-) mh On Sat, 10 May 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >>> No, they didn't. Here's a link to the story CBS News placed on >>> its website back on Dec. 14, 2004: >>> >>> "Google To Scan Library Volumes" >>> http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml >> >> You'd better go back and check all the other TV networks, not to >> mention radio, newspapers, etc. > > Usually, Michael, if YOU make a statement YOU have to prove it. > > If there are lots of papers, TV networks etc. that brought YOUR > version of the facts, then it will be easy for YOU to come up > with a link to prove it. > > >> Google announced on December 14, 2004 that they would digitize >> 10 million books in 6 years. > > All that Google "announced" was this blog entry: > > http://googleblog.blogspot.com/2004/12/all-booked-up.html > > where they say NOTHING about 10 million NOR ANYTHING about 6 > years. > > > Again, give us a link to where Google says they will digitize 10 > MB in 6 years or stand back from your claim.
> > > > -- > Marcello Perathoner > webmaster at gutenberg.org >
From hart at pglaf.org Sun May 11 11:28:39 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 11:28:39 -0700 (PDT) Subject: [gutvol-d] Grammar Error vs Logic Error Message-ID: By the way, reversing tenses is not so much a grammatical error as it is an error in logic. Pretending that something that took place back in 2004 did not happen, just because someone can't find links today to easy-to-find reports about it -- that, and the accompanying tense errors, etc., are errors in logical construction, not errors in grammatical construction. Throwing in accusations of "liar," "rude," Nazi, etc., will not gain one much of an audience here in Project Gutenberg. Just because someone won't do your bidding does not make such a person "rude." Just because someone disagrees with you doesn't make you a "liar." You can always search for the last time I was called a "liar"-- right here on this list--and find out how silly it all looks in the light of reason, logic, and history. Now, back to the point, I am sure that if push comes to shove-- which it might--Google will DENY the following as FALSE: 1. That December 14, 2004 was the official beginning of eBooks or "The Google Print Library" or anything else from Google. However, if you listen to/read those reports again you will find a present tense, nothing to indicate the project had started some weeks or months or years before, or would be starting some number of weeks or months or years later. I don't really think there has been a larger "media blitz" from any company in history. BTW, you can find many of my comments, current, previous, and future, by searching for "media blitz" and "Google" along with any other terms you like. 2. That anyone SHOULD have SAID they were doing 10 million books. 3. That anyone SHOULD have SAID they were doing it in 6 years. Google might even copy some comments given here and say neither of these statements was ever made in any of the news coverage the world saw on or about December 14, 2004. Questions for consideration: Does anyone really think all those interviews taking place in a handful of famous libraries just happened on the spur of the moment, out of interest created by the simple blog report mentioned earlier? If you think it might have been just incidental/accidental, a thought about the difference in time zones might suggest that some of these interviews took place earlier in some time zones than others, by enough of a margin to indicate a setup lead time in excess of what you would get with no planning of this as a worldwide event. Oh, well, perhaps I am trying to elevate the logic too far from where it was earlier, perhaps I should have let it all lie. ;-) Michael
From hart at pglaf.org Sun May 11 12:20:21 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 12:20:21 -0700 (PDT) Subject: [gutvol-d] Current Google Search on 10 million and 6 years Message-ID: I wanted to make sure all the quotes I referenced had not vanished, and no, the very first search I did not only gave the 6 years, but, in addition, even the name of the person in question. The same hit included the UMich reference given earlier, to which a number of other libraries added similar numbers that day. I also got a hit for the 10 million, but I didn't like it as much as the original I got on December 14, 2004. I used only three terms, all this from the first hit. Obviously there are some people here NOT doing their homework.
Plus, this was also made obvious in another's previous message, so it becomes just more and more obvious who the real trolls are, and that they are just making noise without providing content. Why do I go through all this??? Because I want everyone to remember when Marcello, Josh and Co. try to take over Project Gutenberg, again, and again, and again. If I can get such good answers in the first hit I get, they are obviously either NOT trying or are totally inept at searching. In either case they have no case. Now perhaps I can leave this all be for another year. I missed this usual ranting and raving in March, as mentioned, but perhaps with the loss of Jon Noring from their ranks, this process just takes a little longer. I've heard scattered reports that Mr. Noring is ok, but I can get no replies from him. If anyone has more info. . .please. Only 8 weeks until Project Gutenberg turns 38 years old!!! Thanks!!! Michael
From Bowerbird at aol.com Sun May 11 12:36:52 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 11 May 2008 15:36:52 EDT Subject: [gutvol-d] doing homework Message-ID: hey, i love doing homework... so i've found _lots_ of quotes! so, you know, if you need any, just shoot me a backchannel... :+) -bowerbird
From hart at pglaf.org Sun May 11 19:44:28 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 19:44:28 -0700 (PDT) Subject: [gutvol-d] Current Google Search on 10 million and 6 years In-Reply-To: <53392.83.171.175.104.1210540007.squirrel@www.franken.de> References: <53392.83.171.175.104.1210540007.squirrel@www.franken.de> Message-ID: On Sun, 11 May 2008, Karl Eichwalder wrote: > > Michael Hart wrote: >> >> I wanted to make sure all the quotes I referenced had not vanished, >> and no, the very first search I did not only gave the 6 years, but, >> in addition, even the name of the person in question. > > etc., etc. > > Do us and you a favor and stop posting. Nobody is > interested in your number acrobatics. It's all boring. Just what numbers are you talking about? Did I bring up any numbers in the last few days? > It is not fair how you treat Marcello and Josh, who > have contributed so much to our free library and who are > still productive. Oh, right, you are just fine with them calling me a liar, rude, and a Nazi when they won't even look up the data they say they want? And _I_ am not fair? Is this really the opinion you want on the record as the best you have to offer? > If you do not like Google, get a blog and post your > rantings, but stop spamming this list. YOU said "if you do not like Google". . .not me. I just quoted their press from December 14, 2004. Why would you make the presumption from that that I don't like them? Sure, I would run that project differently than they do, and you, Josh and Marcello could all run Project Gutenberg better than I do, which is why I let them do just that. Your complaint is??? > > -- > Karl Eichwalder > mh
From lee at novomail.net Tue May 27 10:27:49 2008 From: lee at novomail.net (Lee Passey) Date: Tue, 27 May 2008 12:27:49 -0500 Subject: [gutvol-d] Any "Top 1000" style lists of Gutenberg texts in public domain?
In-Reply-To: <20080509144312.GA20699@moxie> References: <20080509144312.GA20699@moxie> Message-ID: <483C4495.1090802@novomail.net> Larry Marso wrote: > Anyone aware of any good "Top 1000" style lists of the texts found at > Gutenberg, particularly any that is itself in the public domain? > > I'm not looking for a "# of times downloaded" list, but rather judgments > of the merit of texts, applying various criteria. I'm not aware of anything like this for Project Gutenberg. Indeed, PG itself only maintains download stats for the past 30 days. However, Bartleby.com has the entire Harvard Classics and Shelf of Fiction online (http://www.bartleby.com/hc/), which reflects the value Charles W. Eliot assigned to specific books. Great Books of the Western World is a series of books originally published in the United States in 1952 by Encyclopædia Britannica Inc. in an attempt to present the western canon in a single package of 54 volumes. The series is now in its second edition and contains 60 volumes. The second edition contains 130 authors and 517 individual works. A list of the contents can be found on Wikipedia, http://en.wikipedia.org/wiki/Great_Books_of_the_Western_World. The list of great books is maintained by the Great Books Foundation. You may be able to find a more comprehensive list at their web site, http://www.greatbooks.org/. See also Robert Teeter's list of lists at http://www.interleaves.org/~rteeter/greatbks.html. Lastly, don't spurn the download stats from PG. Given the all-volunteer nature of PG, if a work has been transcribed for PG at all, it means that someone thought it an important enough work to be digitized. If someone goes to the trouble to download it, despite the obvious imperfections, it means that that person probably thought it important enough to read. If you were to take the first 2000 texts created in PG, and sort out the top 1000 downloads, you'd probably have as good a list as most literature professors could create. A close approximation of the top 400 downloads from PG over the past 3 years can be found at http://www.passkeysoft.com/~lee/zero.txt. -- Nothing of significance below this line.
From hart at pglaf.org Mon May 12 11:42:50 2008 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 May 2008 11:42:50 -0700 (PDT) Subject: [gutvol-d] Any "Top 1000" style lists of Gutenberg texts in public domain? In-Reply-To: <483C4495.1090802@novomail.net> References: <20080509144312.GA20699@moxie> <483C4495.1090802@novomail.net> Message-ID: If you are really looking for lists as large as or larger than 1,000 books, I would ask your local librarians to look in the lists by Eugene Garfield and his ISI work. He used to publish a list of the 1500 most quoted general works. Don't be misled by his hundreds of other lists; this one should start with Plato, Aristotle, Shakespeare, or the like, and should obviously be the one you are looking for, but he has other lists for various other subjects and media. Another clue that you are on the right list is that the top 10 or so on the list might be quoted 1,000 - 2,000 times in the references he counts. Garfield's list is the only one I know of with so many titles, many/most of which are found in Project Gutenberg if/when they are public domain. If you have trouble finding this list, let me know.
Michael From ebooks at ibiblio.org Mon May 12 23:58:40 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Tue, 13 May 2008 02:58:40 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4824AEFC.50102@ibiblio.org> Message-ID: <48293C20.2010709@ibiblio.org> I see that Michael Hart has been busily employing some of his typical, evasive debating tactics rather than providing any evidence to support his claims. (I say "typical" because I've seen him use these tactics many times before.) Unlike him, I will provide evidence--to refute his claims. :) On May 9, 2008, Michael Hart wrote: > On Fri, 9 May 2008, Jose Menendez wrote: > >> On May 8, 2008, Michael Hart wrote: >> >> >>> Google announced on December 14, 2004 that they would digitize >>> 10 million books in 6 years. >> >> No, they didn't. Here's a link to the story CBS News placed on its >> website back on Dec. 14, 2004: >> > > > Your research is very incomplete if you think the CBS story > contained all the press releases given by the "Google Print Library > personnel on that date. Could you point out where I said that the CBS story contained all the press releases? If I had thought that the CBS story was complete, I wouldn't have bothered to link to the BBC News story as well, would I? And I never said that the BBC story, or both stories together, contained all the press releases, did I? > You'd better go back and chekc al the other TV networks, not to > mention radio, newspapers, etc. I really got a good laugh from this line, Michael, especially since you included "radio." Do you recall when I used an NPR radio report back on the Book People mailing list to *conclusively* demonstrate that another one of your claims about the Google Print Library Project (and its media coverage) was false? Let me remind you. Here's a link to a post I sent to the BP List on July 1, 2005: "Re: !@!Re: [BP] Is Google Print Real?" http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-07-01,1 And here's the relevant excerpt: > On June 26, 2005 Michael Hart wrote: [snip] >> Once again, I am ONLY referring to the December 14 public relations >> blitz that included millions of dollars worth of publicity, NONE of >> which mentioned sending searchers off to buy books, to physical >> libraries, or other sources via the BBC, CBS, NBC, ABC, PBS, NPR, etc. > > > Oh, really? NONE of those news sources mentioned those things? Let's put your assertion to the test, shall we? Here's a link to a very short description of the story NPR's "All Things Considered" aired on the afternoon of December 14, 2004: > > "Google to Digitize Major Library Resources" > http://www.npr.org/templates/story/story.php?storyId=4227893 > > Click on the "Listen" icon on that page to hear the actual report that aired on Dec. 14th. (The audio is available in both Real Player and Windows Media Player formats.) Here's a quote from that NPR broadcast: > > > "It'll work just like any other Google search. If you type in, say, 'books ancient Rome,' three titles appear at the top of your results. Click on one, and if the book is out of copyright, John Wilkin at the University of Michigan library says you'll be able to read the whole thing.... If the book is still in copyright, you'll get a few short segments with your search terms highlighted. Google will also tell you where you can buy the book and in some cases where you can borrow it locally." > > > So much for your claim that NONE of your sources mentioned those things. 
I guess you managed to miss that NPR story, Michael. ;) That was a fun debate. :) (For those living outside the U.S. who may not be familiar with it, NPR stands for National Public Radio.) > I've quoted many of these here in the past, and I hope that these > kinds of statement will NOT be taken at face value when I am gone. I've looked at some of your past messages, posted soon after the Library Project was announced, and, unfortunately for you, they don't support your present claims. Oops! For example, here's a post you sent to the Book People mailing list on June 10, 2005: "More Google Print Queries" http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-06-10,2 Here are a few excerpts: > We are coming up on June 14, 2005, the end of the first 6 months > of the project that received millions of dollars of publicity on > December 14th, 2004, when Google revealed that it had "invented" > the idea of the electronic library. > > Here are several aspects of Google Print people have commented a > number of times on, and YOUR comments would be appreciated, both > on these topics and any additional topics or points of view your > own experiences have brought to light. [snip] > 2. Are they producing as many books as planned? > > The initial claims that the hardware was already there > and ready to use and that the libraries were all ready > to provide 10-15,000 books per week for scanning, seem > to have vanished in terms of publicity, but we CAN see > that some Google Print eBooks are actually online, but > it's hard to figure out how many. > > At 10,000 books per week for 50 weeks of the year, the > project would generate 1/2 a million books per year. > > At 15,000 books per week for 50 weeks of the year, the > project would generate 3/4 million books per year. > > Thus it would take 20 years to accomplish their stated > goal of 15 million eBooks in 20 years, at 15,000/week, > but I recall the original goals being set at 10 years, > or perhaps even 15, with no mention of 20 years. Look closely at that last paragraph, Michael, and see what you claimed were Google's "stated goal" and "original goals." Neither one, according to you, was "10 million books in 6 years." Oops! Now, you did post that message nearly 6 months after the big Google "media blitz," and memories can fade, so let's take a look at a message you sent to the gutvol-d list on Dec. 14, 2004, the same day as the "media blitz": "Google Partners with Oxford, Harvard & Others to Digitize Libraries" http://lists.pglaf.org/private.cgi/gutvol-d/2004-December/000978.html Here's an excerpt: > The two projections I heard were 7 and 10 years for the project. Hmmm... There's another mention of "10 years," which you said, in the Book People post I just quoted, was the period the "original goals" were set at. Now let's take a look at a post you sent to the gutvol-d list on Dec. 15, 2004, the day after the "media blitz": "Re: [ebook-community] Google Question for Michael Hart" http://lists.pglaf.org/private.cgi/gutvol-d/2004-December/001003.html Here's an excerpt: > BTW, they said 15 million eBooks. . .and I'm not sure they HAVE > 15 million eBooks that they can legally use in the worldwide > service they announced yesterday. There's the same "15 million" figure you mentioned in your Book People post. You didn't say anything about "10 million." Now let's look at PT1 of the Weekly Project Gutenberg Newsletter you sent out on Dec. 
15, 2004, the day after the "media blitz": http://lists.pglaf.org/pipermail/gweekly/2004-December/000043.html Scroll down to the "Headline News from NewsScan and Edupage" section, and you'll see this: > >From NewsScan: > > GOOGLE CUTS DEAL WITH LIBRARIES TO DIGITIZE HOLDINGS > Flush with new wealth after its IPO last summer, Google has offered > to underwrite the cost of digitizing library collections at Harvard, > Stanford, Oxford, the University of Michigan and the New York Public > Library. Although company executives declined to comment on the total > funding amount, one estimate pegs it at $10 for each of the more than 15 > million books and other documents covered in the agreement. [snip] > (New York Times 14 Dec 2004) > There's that "15 million" figure again. Now let's take a look at the "New York Times" article that was cited from Dec. 14, 2004: "Google Is Adding Major Libraries to Its Database" Here's the paragraph most relevant to our little debate: "Although Google executives declined to comment on its technology or the cost of the undertaking, others involved estimate the figure at $10 for each of the more than 15 million books and other documents covered in the agreements. Librarians involved predict the project could take at least a decade." Note the number of books given, Michael: "15 million," not "10 million." Note the time period given: "a decade" (10 years), not "6 years." Note also that it says "at least a decade," not "at most a decade." "At least" means that a decade was the *minimum* amount of time they were predicting the project could take. OOPS! > Let's not forget all the interviews with the various librarians > at the original member institutions of the project, along with > any number of Google officials and others. Note the source the "New York Times" gave for the "at least a decade" prediction: "Librarians involved." > _I_ went over ALL of them I could find, had people send me tapes > of others. . . . Apparently, you couldn't find the "New York Times" article you linked to in your own newsletter. You also apparently missed the ABC World News Tonight report about the project on Dec. 14, 2004. Unfortunately, I couldn't locate a link to the original report with either the poor search function on the ABC News website (http://abcnews.go.com/) or a Google search confined to the ABC News website. But I did find this article, which cites the ABC News report, posted on LISNews Librarian And Information Science News on Dec. 16, 2004: "Google to Digitize 15 Million Books in 10 years" http://lisnews.org/node/12867/ "During the December 14, 2004 broadcast of ABC News World News Tonight, Peter Jennings reported that Google announced their goal to digitize 15 million Books in 10 years. This ABC news report featured a machine housed in the basement of the Stanford University Library that can digitize 1,000 pages each hour where the outcome produces pages of books that can be searched on...." Now, compare that first sentence: "During the December 14, 2004 broadcast of ABC News World News Tonight, Peter Jennings reported that Google announced their goal to digitize 15 million Books in 10 years" to your recent claim: "Google announced on December 14, 2004 that they would digitize 10 million books in 6 years." I suppose you'll tell us now that Peter Jennings didn't do his homework either before airing that report. ;) > Why? > > This was probably a lot more important to me than to anyone else. 
It was so "important" to you, Michael, yet, as I pointed out in my last post, "you always seem to have trouble remembering what was actually said." :) Last, but not least, let's take a look at a lengthy message you sent to the Book People list on Dec. 21, 2004, just *one week* after the big "media blitz": "Project Googleberg" http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2004&post=2004-12-21,2 Here are some excerpts: > PROJECT GOOGLEBERG > > > This message contains most of those questions Project Gutenberg > received about Google Print over the past week, and first draft > answers. At this time I have not included quotations from some > of reports I have at hand, so if you have any favorites such as > "This is going to change the entire world" sort of thing, email > them to me for inclusion in the final draft. Added questions & > comments are encouraged. [snip] > In the 48 hours since the announcement of the "Google Print" project, > I have listened to 6 major network news stories and read, and reread, > the major print media stories in an attempt to answer these following > questions as best I can. Sometimes it has not been possible to get a > good answer from the information available, and I am either guessing, > or passing on indirect information from others. [snip] > 2. How many books will there be, and when will they be available? > > 15 million was the number thrown around the most, but I doubt that it > is possible that even this collection of famous libraries should have > enough books that fit the criteria they announced for their worldwide > eBook service: > > A. Public Domain > > B. 19th Century > > C. Scannable Editions Before posting more excerpts, I want to point out your obvious confusion. The major news stories I saw, including the two from the CBS News and BBC News websites that I cited in my last post "Google To Scan Library Volumes" http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml "Google to scan famous libraries" http://news.bbc.co.uk/2/hi/technology/4094271.stm and the "New York Times" article you linked to in your newsletter, stated that both the Michigan and Stanford libraries had agreed to let Google scan all or nearly all of their books. How you thought that scanning practically all the books in 2 major university libraries meant scanning only 19th century public domain books is beyond me. As for your "19th Century" criterion, those CBS, BBC, and "New York Times" articles show that that limitation applied only to Oxford's library. Keeping that confusion in mind, let's take a look at some more excerpts: > I'm guessing that when they start researching the copyright issues, a > retraction will be made, stating that they have discovered those 19th > Century books might still be under copyright, depending on a lifespan > of the authors. Some of their most important works, such as Oxford's > "Oxford English Dictionary" might have a few volumes published before > the 20th Century that are still under copyright and thus not eligible > for inclusion in their proposed service, along with many other books. > > [Of course this depends on which country the database is placed in, > as the copyright rules for the U.K. are different that in the U.S.] > > Obviously, if they are really going to include 15 million books, they > will have to include nearly every public domain book in each of their > 5 member libraries, from the rarest to the most common. 
I am told by > information science professionals that there have only been just over > 30 million copyrights sought in entire history of the United States-- > and that includes millions of items besides books. Doing 15 million, > might then become problematic, as there might not be 15 million total > separate books to work from, even presuming that at least half of the > 30 million U.S. copyrights sought between 1790 and 2003 were for book > titles that are now in the public domain. Obviously not every single > book ever published has made it into these 5 libraries. > > The various time frames mentioned have ranged at least from 10 years, > the longest, to 6 years, the shortest, among those I have seen. > > Please let me know if you have seen a wider range. > > I, myself, would bet they will have to cut some corners to include 10 > million books 10 years from the announcement date, December 14, 2014. Aha! There we have it, Michael. Because you didn't think Google would be able to come up with "15 million" public domain books, *you* came up with the "10 million" figure, but note the time span you gave in that last line: "10 years," not the "6 years" you're saying now, which was the time period given in many media reports for digitizing just the University of Michigan library. Well, in conclusion, Michael, once again your claims crashed into the facts and didn't survive the collision. :) Jose Menendez P.S. You sent out that same "Project Googleberg" post to PG's gweekly mailing list on Dec. 21, 2004. Here's a link to it in the gweekly archive: http://lists.pglaf.org/pipermail/gweekly/2004-December/000044.html
From hart at pglaf.org Tue May 13 00:58:09 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 13 May 2008 00:58:09 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <48293C20.2010709@ibiblio.org> References: <4824AEFC.50102@ibiblio.org> <48293C20.2010709@ibiblio.org> Message-ID: So, Jose, are you really, after our previous exchange... really going to pretend there was no mention of 6 years? Or 10 million books? On December 14, 2004, in the major media? If you are NOT going with that pretense, then all your rhetoric has no logical place in the conversation other than as additional footnotes. Those are the only two items that were at issue in this subject at this time: whether those figures were said on the record in the major media as a result of Google's project PR. You are all too obviously going for the overload of an awful lot of information, none of which establishes that it is untrue that any of the interviewees gave the numbers I have stated. By now you certainly should actually have the names of at least one and perhaps two of those I am quoting. After all, it took me only one search and it was first in the list of hits just a day or two ago. Thus you must know that at least one of the interviews included the "6 years" frame of reference. Why can't you just share that with everyone here, stop pretending it wasn't said, and get on with your life? What, exactly, is your goal with that last message??? Obviously the sheer tonnage was enough for elephants-- but you didn't actually hit either of the targets. Once again I remind you, and request of you, to think, if at all possible, about how all this will look in 10 years or whatever, when you just might want persons in possession of this conversation to take you seriously. Please. . . . Michael PS Yes, I am sure there were discussions of 10 years, and other periods as well, but that is not the issue here.
Yes, I am aware there were discussions of various book totals for The Google Print Library, but they aren't a subject of the previous discussion here. Yes, I COULD have tried to make Google's claims seem a lot bigger by stressing the largest numbers mentioned in any of the conversations reported, but I do not work in such a manner. However, I must admit that those who claim 15 million, previously and at present, were probably more on target, given the 7 million mentioned for U. Michigan, only one of the project members at that time, and an even smaller fraction of the members announced since; but I am not trying to distort the December 14, 2004 data via the addition of 20/20 foresight from later media. It SHOULD be obvious to anyone who simply writes down, and then uses, a table of the figures at hand on dates from December 14, 2004 on, that The Google Print Library is now in possession of so many more member libraries and so much more technology that it keeps up with what is on the record from that date, actually surpassing such figures as if they were standing still -- which they are in fact doing, while reality gallops by. However, once again, I am NOT using those new figures, in any way, shape or form. Yes, I did NOT quote the UMich figure of 7 million. Why not? Because I was NOT referring to just that one library-- but the entire Google Print Library project. Yes, other numbers have been mentioned later, but I am not trying to muddy the waters with larger figures. If Google makes it to 10 million I won't be the one to say it should have been 15 million, though I might say I know people who might misquote me as to what I said, in this particular conversation, or others, in or out of their proper contexts. I am not trying to say, and never have, that any media report contained one single reference that said Google would be doing 10 million books in 6 years. . .I ain't saying that it was said altogether in one breath or in one phrase or in one sentence or in one report or by a single network or press syndicate. . .I have said over and over that I had to go through a number of reports, and went through them a number of times, to gain those data points necessary to create a perspective model. Now, could someone else have created differing models? Of course. I never said mine was the only possible interpretation that anyone would/could/or should consider. However, if anyone really cares to look, they can find the 6 years quotation without any real effort. If need be, after a while, I will post the source, and the media corporation reporting the source. If numbers higher than 15 million are there, it makes, without a doubt, my current statement of 10 million appear conservative by comparison. Next!
From Bowerbird at aol.com Tue May 13 01:05:19 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 May 2008 04:05:19 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: ok, let's see if we can agree on a few things... google has promised to scan millions and millions and millions of books. heck, sometimes, they even talk about indexing "every book ever written". lofty plans... it's about time _someone_ grew a pair of balls and did this... as far as a _timeframe_ for doing it, most reports gave that as _6_years_, although some claimed "a decade", and some say it will take even longer. 3 years in, it appears that the latter predictions were the most accurate...
it has even been said that peter jennings reported on "world news tonight", back on december 14, 2004, that "google announced their goal to digitize 15 million books in 10 years". but a.b.c. _is_ owned by disney, so perhaps they were just making up those numbers, in the great spirit of walt and roy. and it's clear that the university of michigan and google expected that the 7 million books at umichigan would take about 6 years to scan, and i think nobody in their right mind expected google to work _only_ at umichigan during those first 6 years. given all these various reports, a reasonable synthesis of the positions is "10 million books in 6 years" -- best you will get on such a p.r. elephant... but from their own announcement in february of this year, we know that umichigan only has _1_million_ books scanned thus far -- 3 years along -- so i think it's very safe to say that they are "behind schedule at this time"... is there anything here worth arguing about at this time? if so, i don't see it. i'm sure not gonna lose any sleep over any of it... -bowerbird
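(A quick way to sanity-check the schedule claim above: the following minimal Python sketch uses only the thread's own rounded figures -- 7 million UMich volumes, the disputed 6-year target, 1 million scanned in 3 years -- none of which is an official Google number.)

    # rough schedule check, using the rounded figures quoted above;
    # none of these numbers is official -- they are the thread's claims
    TARGET_BOOKS = 7_000_000    # UMich volumes, per the 2004 reports
    TARGET_YEARS = 6            # the disputed "6 years" figure
    DONE_BOOKS = 1_000_000      # UMich's February 2008 announcement
    YEARS_ELAPSED = 3

    needed = TARGET_BOOKS / TARGET_YEARS    # pace required to hit the target
    observed = DONE_BOOKS / YEARS_ELAPSED   # pace actually achieved so far
    projected = TARGET_BOOKS / observed     # total years at the observed pace

    print(f"required pace:  {needed:,.0f} books/year")
    print(f"observed pace:  {observed:,.0f} books/year")
    print(f"projected time: {projected:.0f} years, against a {TARGET_YEARS}-year target")

(At the observed pace the projection comes out to roughly 21 years, which is what makes "behind schedule at this time" a safe statement under any of the quoted targets.)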
From Bowerbird at aol.com Tue May 13 01:17:09 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 May 2008 04:17:09 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: if i would've known michael had just posted, i wouldn't have posted too... you can tell when jose gets desperate, because he starts to argue on both sides of the dispute... and his barrage of "data" is meant to get people to stop paying attention, probably so they don't notice how badly he has degraded the "discussion". the truth is pretty easy to see here. google has fallen behind its timetable. but who cares? as far as i know, there was no "over/under" line on them in las vegas. so as long as they're still scanning, i'm still happy with them. everything else is just needless backbiting... -bowerbird
From Bowerbird at aol.com Wed May 14 13:23:23 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 14 May 2008 16:23:23 EDT Subject: [gutvol-d] games with a purpose (turning proofing into a game) Message-ID: luis von ahn has a new site up: "games with a purpose". > http://www.gwap.com/gwap/ -bowerbird
From joyce.b.wilson at sbcglobal.net Wed May 14 17:13:39 2008 From: joyce.b.wilson at sbcglobal.net (Joyce Wilson) Date: Wed, 14 May 2008 19:13:39 -0500 Subject: [gutvol-d] Line-wrap problems in Chinese texts? Message-ID: <482B8033.7020809@sbcglobal.net> Hey, I've been looking at a lot of recent Chinese texts lately while working on the PG catalog, and I've noticed really widespread problems with lack of line-wrapping in the texts (for instance, of the 6 recently-posted works by Lu Xun, 3 have problems with lack of line-wrap). What's up with that? And in case anyone on this list might be able to pass word along to where it would do good, it would be really *really* helpful to the catalogers if all the new Chinese texts included their titles and authors (if known) *in Chinese characters* at the beginning of the text. Some do, but many don't. When they don't, we have only the romanized versions (sometimes non-standard romanizations, sometimes combined with English translation, sometimes translation alone) in the file header to go on (along with whatever Google can turn up). Google is a life-saver, but it can be time-consuming for non-Chinese-readers to determine that, for instance, the new author "Chr Chr Dau Jen" is already in the catalog as "Yunyangchichidaoren". And aside from the catalogers, it would just be nice to have that information present in the text for the reader (now and then I remember that it isn't all about us catalogers! ;) ). Pip-pip, Joyce W
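(The missing line-wrap that Joyce reports is easy to screen for mechanically before a text is posted. A minimal Python sketch, assuming plain-text files; the 80-character limit and the tolerance are arbitrary choices rather than PG rules, and a raw character count is only a rough proxy for display width in Chinese text.)

    # flag plain-text files that appear to lack line-wrapping: many
    # over-long lines usually means one whole paragraph per line
    def looks_unwrapped(path, limit=80, tolerance=5):
        long_lines = 0
        with open(path, encoding="utf-8") as f:
            for line in f:
                if len(line.rstrip("\r\n")) > limit:
                    long_lines += 1
        return long_lines > tolerance

    # hypothetical filenames, purely for illustration
    for name in ("luxun_01.txt", "luxun_02.txt"):
        print(name, "looks unwrapped:", looks_unwrapped(name))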
From julio.reis at tintazul.com.pt Thu May 15 12:32:34 2008 From: julio.reis at tintazul.com.pt (Júlio Reis) Date: Thu, 15 May 2008 20:32:34 +0100 Subject: [gutvol-d] games with a purpose (turning proofing into a game) In-Reply-To: References: Message-ID: <1210879958.12377.2.camel@abetarda> Too bad about the terms of service. Closed content, proprietary stuff, yadda yadda. But it looked like a good idea. And I tried it and it's... fun. Júlio. Thu, 2008-05-15 at 12:00 -0700, gutvol-d-request at lists.pglaf.org wrote: > luis von ahn has a new site up: "games with a purpose". > > > http://www.gwap.com/gwap/
From Bowerbird at aol.com Mon May 19 11:48:29 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 19 May 2008 14:48:29 EDT Subject: [gutvol-d] cleaning up the catalog Message-ID: boy, what a mess the p.g. catalog is! i cleaned the info for the english e-texts 10000-14000: > http://z-m-l.com/misc/cata10-14-all.html this is what i need, and might not be useful to p.g. (sorry), but i'm happy to share it. here's a more-concentrated list, showing many of the multiple-item e-texts, which were particularly messy: > http://z-m-l.com/misc/cata10-14-repeats.html this exercise suggests that the post-processors/whitewashers might want to see how items in a series were posted in the past when preparing additional items from the series for submission, with the intent of minimizing the inconsistencies... -bowerbird p.s. if anyone has any questions on what i've done, or why, or anything related to this, i will be happy to address them...
From Bowerbird at aol.com Mon May 19 15:25:46 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 19 May 2008 18:25:46 EDT Subject: [gutvol-d] cleaning up the catalog Message-ID: a couple of nice natural experiments have introduced themselves, in checking on some possible duplicates in the library... first, the two "young captives" e-texts are entirely different. ok. second, the two "pearl box" e-texts are highly similar, but not identical... this is a book "containing one hundred beautiful stories for young people", each version contains a few completely different stories from the other one, but there's no list (in either of the versions) of the differences between them. since the versions contain 90-95 identical stories, this seems like a book that would benefit greatly by having the two different versions _merged_ into one. and of course comparison of the two versions could identify the errors in each, so that's the first of our two "natural experiments". third, the two "scranton high chums on the cinder path" look to be identical, although there is some possibility they could be of slightly different editions. either way, a comparison of the two gives us our second "natural experiment". i'll let you know in the next few days how these experiments turn out... -bowerbird
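(The version comparison bowerbird describes is a plain text diff. A minimal sketch using Python's standard difflib, comparing word streams so that different line-wrapping does not register as a difference; the filenames are placeholders, and real PG files would first need their header/footer boilerplate cut away.)

    # list the places where two digitizations of the same book differ --
    # candidate typos or edition differences in one version or the other
    import difflib

    def words(path):
        with open(path, encoding="utf-8") as f:
            return f.read().split()   # compare word streams, not lines

    a = words("pearl_box_version_a.txt")   # placeholder filenames
    b = words("pearl_box_version_b.txt")

    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            print(tag, " ".join(a[i1:i2]) or "(nothing)",
                  "->", " ".join(b[j1:j2]) or "(nothing)")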
From hyphen at hyphenologist.co.uk Tue May 20 00:41:04 2008 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue, 20 May 2008 08:41:04 +0100 Subject: [gutvol-d] cleaning up the catalog In-Reply-To: References: Message-ID: <000001c8ba4c$dcc04c90$9640e5b0$@co.uk> Bowerbird wrote >>> a couple of nice natural experiments have introduced themselves, in checking on some possible duplicates in the library... [snip] i'll let you know in the next few days how these experiments turn out... <<< These are the normal differences between different editions of a book written and produced pre 1923. In "my" books I have found different versions of the same poem in different books. I normally include the *longer* version in both books with a note about what I have done. Publishers and editors were stronger than the authors in those days, and took greater liberties than they do today. I really think that PG and DP are a bit too paranoid about producing an etext which is an *exact* copy of a paper version. I personally think that any difference between the etext and the paper version should be about the same as one could expect between two editions of the same book produced pre 1923. These were regularly re-typeset, complete with typos, spelling mistakes and formatting. Dave Fawthrop
From Bowerbird at aol.com Tue May 20 02:50:44 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 May 2008 05:50:44 EDT Subject: [gutvol-d] cleaning up the catalog Message-ID: dave said: > These are the normal differences between different editions > of a book written and produced pre 1923. right. i'm pretty sure that the two versions of "pearl box" were based on different editions, since the differences are striking... d.p. people don't lift huge segments out of the books they do... the "scranton chums" differences look like digitization mistakes, rather than version differences, with perhaps a few exceptions... i can't find any scans of this book to say that for sure, however... > In "my" books I have found different versions of the same poem > in different books. I normally include the *longer* version > in both books with a note about what I have done. i'd do something similar if i merged the two "pearl box" e-texts -- i.e., i'd include every piece that was printed in either version... > Publishers and editors were stronger than the authors in those days, > and took greater liberties than they do today. yeah, a lot of what d.p. people call "the intention of the author" is really something the _publisher_ is more likely to have controlled. > I really think that PG and DP are a bit too paranoid about > producing an etext which is an *exact* copy of a paper version. i think if that's _really_ their intention, they're doing a lousy job of it. but i agree with you that, in many cases, that shouldn't be the goal... (but i admit i'm totally confused as to what the exact goal of d.p. is... every aspect that they might claim seems rife with self-contradiction.) > I personally think that any difference between the etext and the > paper version should be about the same as one could expect > between two editions of the same book produced pre 1923. i'd want to be more specific about the types of changes allowed. :+) -bowerbird
From prosfilaes at gmail.com Tue May 20 03:36:57 2008 From: prosfilaes at gmail.com (David Starner) Date: Tue, 20 May 2008 06:36:57 -0400 Subject: [gutvol-d] cleaning up the catalog In-Reply-To: <000001c8ba4c$dcc04c90$9640e5b0$@co.uk> References: <000001c8ba4c$dcc04c90$9640e5b0$@co.uk> Message-ID: <6d99d1fd0805200336r5feb1df2k59c0536b5475f61e@mail.gmail.com> On Tue, May 20, 2008 at 3:41 AM, Dave Fawthrop wrote: > These are the normal differences between different editions of a book > written and produced pre 1923. In "my" books I have found different versions > of the same poem in different books. I normally include the *longer* version > in both books with a note about what I have done. Publishers and editors were > stronger than the authors in those days, and took greater liberties than they > do today. And what makes the *longer* version the right one? What makes it the one that the author originally wrote? What makes it fundamentally wrong to abridge a poem for an anthology, which is still done today?
> I personally think that any difference between the etext and the > paper version should be about the same as one could expect between > two editions of the same book produced pre 1923. We can do better. We aren't pre-1923, and we don't have the same constraints on printing. Important books are reproduced verbatim in the modern world, and I see no reason why we shouldn't do the same. Quick careless reproductions are hurting Project Gutenberg's reputation, so we need to do better.
From marcello at perathoner.de Tue May 20 05:51:49 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 May 2008 14:51:49 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4824AEFC.50102@ibiblio.org> <4825A028.2040203@perathoner.de> Message-ID: <4832C965.7090803@perathoner.de> Michael Hart wrote: > Once again I must suggest, as though Marcello and Josh were in a > listening mode, that they go over the various interviews that aired > on that particular day, December 14, 2004. Obviously, as you already did the research, it would cost you nothing to spring the links to those interviews. (If those interviews existed.) > There is not, and was not, one single source that provided /all/ > the information that was broadcast and printed that day. So you picked "6 years" from one guy here and "10 million" from another guy there and then compounded that into an official Google announcement? No wonder you don't get interviewed by the media. > If you do your homework, you will find many such sources, some I > referenced earlier, but none all that hard to find. Maybe you should have become a teacher. Then you could have given everybody their homework. -- Marcello Perathoner webmaster at gutenberg.org
From marcello at perathoner.de Tue May 20 05:54:20 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 May 2008 14:54:20 +0200 Subject: [gutvol-d] Current Google Search on 10 million and 6 years In-Reply-To: References: Message-ID: <4832C9FC.5080704@perathoner.de> Michael Hart wrote: > Because I want everyone to remember when Marcello, Josh and Co. > try to take over Project Gutenberg, again, and again, and again. That's ridiculous. What should I want to take over? The BIG funding? The WELL-RUN organisation? The EVER-INCREASING volunteer base? As a matter of fact, DP has long since taken over PG. With their well-organized workflow (instead of the clueless PG anarchy) they have produced more books in 8 years than PG in 38. THEY are creating books now, not PG. -- Marcello Perathoner webmaster at gutenberg.org
From hart at pglaf.org Tue May 20 08:22:10 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 20 May 2008 08:22:10 -0700 (PDT) Subject: [gutvol-d] cleaning up the catalog In-Reply-To: <6d99d1fd0805200336r5feb1df2k59c0536b5475f61e@mail.gmail.com> References: <000001c8ba4c$dcc04c90$9640e5b0$@co.uk> <6d99d1fd0805200336r5feb1df2k59c0536b5475f61e@mail.gmail.com> Message-ID: On Tue, 20 May 2008, David Starner wrote: > On Tue, May 20, 2008 at 3:41 AM, Dave Fawthrop > wrote: >> These are the normal differences between different editions of a >> book written and produced pre 1923. In "my" books I have found >> different versions of the same poem in different books. I >> normally include the *longer* version in both books with a note >> about what I have done. Publishers and editors were stronger than >> the authors in those days, and took greater liberties than they do >> today. > > And what makes the *longer* version the right one?
What makes it > the one that the author originally wrote? What makes it > fundamentally wrong to abridge a poem for an anthology, which is > still done today? When it comes down to which is a "wrong" or "right" publication, I tend to side with the author against editors and publishers. After all, it is the mind and heart of the author we try to see into when we read, not the minds and hearts of the publishers. Yes, the publishers used to always get the last word, and also the last shekel, ducat, ruble, mark, franc, dollar, whatever. Well, the authors still get only about 5% of the gross, but it is nice that they sometimes have a little control nowadays. >> I personally think that any difference between the etext and the >> paper version should be about the same as one could expect >> between two editions of the same book produced pre 1923. Personally, I think we can do better than pre-1923. No reason we can't do both versions. However, if one must be chosen over the other, I choose the author's, as the creator of the baby in question, just as did Solomon. It's the author's baby; the editors and publishers are midwives, at best. The book industry gets 19/20 of the cash--isn't it enough that they get that, without getting to do plastic surgery on the baby to make it look more like them, and less like the author? > We can do better. We aren't pre-1923, and we don't have the same > constraints on printing. Important books are reproduced verbatim > in the modern world, and I see no reason why we shouldn't do the > same. Verbatim reproductions are really nothing more than Xeroxes. Project Gutenberg should be more than just an eXerox machine. > Quick careless reproductions are hurting Project Gutenberg's > reputation, so we need to do better. Everyone here can run Project Gutenberg better than anyone else. There is no doubt about that in everyone's mind. Yet, "Project Gutenberg's reputation" has been created by some 50,000+ volunteers from all walks of life all over the world, and not by the editors of Random House, Simon and Schuster, Knopf, Ballantine, HarperCollins, etc., etc., etc. These people had way more than enough control in their day; let's not help extend their control to the next millennium. However, if you want to personally create verbatim editions we will be only too glad to include them in our collection. We just won't demand that they be the ONLY editions. . . . Michael S. Hart Founder Project Gutenberg
From hart at pglaf.org Tue May 20 10:01:18 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 20 May 2008 10:01:18 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4832C965.7090803@perathoner.de> References: <4824AEFC.50102@ibiblio.org> <4825A028.2040203@perathoner.de> <4832C965.7090803@perathoner.de> Message-ID: So, you still refuse the possibility that a better picture of Google's initial rollout of Google "Print Library" would have been available from partaking of all available sources, rather, of course, than from the reading of that single entry. No wonder you have such a myopic view of the things we talked about over the years. Of course, you ARE correct to do this if you want to be SURE. "Person who has one clock ALWAYS knows the correct time; person who has two clocks NEVER knows the correct time." [Well now there are "atomic clocks" but. . . .] So, I can see how life is so much easier for you with just a single source of information to quote from that you would be loath to add another source, much less a dozen or two.
I leave you with the words of Isaac Asimov's nomination to a position of "World's Smartest Person". . . . "You don't understand anything until you learn it more than one way" Your single source was not in charge of all the aspects of a half dozen major libraries doing "Google Print Library." No one could really exercise that kind of control. Therefore other points of view could possibly be worthwhile, and a viewpoint comprising multiple sources just might maybe barely possibly have more relation to the real world than an elementary single statement press release. Not, of course, that I believe that single press release was ALL the Google management gave to ALL the worldwide press. I can see that the syndicated news continues to escape every possible point of your attention on this, where single quote sources DO mention years and millions of books in a "single- source" aspect you find de rigueur for such events. Actually, however, the more you say, the more I am sure your reading actually included such sources, and that you are now and have always been intentionally ignoring them. I just can't possibly imagine that you missed them without a very intentional intervention of your myopic viewpoint as an obvious fact that syndicated news sources quoted in hundreds or a thousand newspapers could not have anything of offer. The real question is, of course, why are you continuing this "tag team trolling" to try to keep a flame war alive that is obviously of no interest to anyone but your tag team members after such a long time of silence and not more research on a subject that is still so easy to find with Google searches. What possible worthwhile motivation could you have? What point are you trying to make? That certain Google representives to the press might have a penchant for avoiding specifics such as saying how many the years might be to do how many books? I'll certainly grant you that which is why it is important, to no mean degree, to find multiple sources. Game! Set! Match! There is nothing you can add that doesn't make you seem in ever sillier perspectives, and hasn't been all along. So why do you continue making a fool of yourself, and, our listserver, which people will probably read years ahead. Doesn't it matter at all to you what you/we look like some years down the road when you might want us to be trusted? End. mh On Tue, 20 May 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >> Once again I must suggest, as though Marcello and Josh were in >> a >> listening mode, to go over the various interviews that aired >> the >> particular day of December 14, 2004. > > Obviously, as you already did the research, it would cost you > nothing to spring the links to those interviews. (If those > interviews existed.) > > >> There is not, and was not, one single source that provided >> /all/ >> the information that was broadcast and printed that day. > > So you picked "6 years" from one guy here and "10 millions" from > another guy there and then compounded that to an official Google > announcement? > > No wonder you don't get interviewed by the media. > > >> If you do your homework, you will find many such sources, some >> I >> referenced earlier, but none all that hard to find. > > Maybe you should have become a teacher. Then you could have given > everybody their homework. 
> > > -- > Marcello Perathoner > webmaster at gutenberg.org > From Bowerbird at aol.com Tue May 20 11:44:26 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 May 2008 14:44:26 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: you know one thing that's interesting about this? every one of the three major broadcast networks carried this item on their nightly news show, but we can't access a single transcript. not just for that one night, but for _any_ night. they cram propaganda down our throats but keep no public record, so we can't even go back after the fact and check what they've said... so -- years later -- we have to piece it together like a jigsaw puzzle. it sure makes it easy for _liars_ like george w. bush to operate, eh? (and remember when the republicans actually put forth into motion _impeachment_proceedings_ against bill for lying about a blowjob? the hypocrisy of that political party is so overwhelming it stuns me.) i forget what orwell called it, but those people who claimed that his predictions didn't come true have their heads up their butts... -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080520/90291972/attachment.htm From gbnewby at pglaf.org Tue May 20 13:17:52 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Tue, 20 May 2008 13:17:52 -0700 Subject: [gutvol-d] OLPC eBook reader Message-ID: <20080520201752.GA13298@mail.pglaf.org> Seen on slashdot: 2nd Generation "$100 Laptop" Will Be an E-Book Reader http://hardware.slashdot.org/article.pl?sid=08/05/20/1621214 "At a conference sponsored by the One Laptop Per Child Foundation this morning, OLPC founder unveiled the design for the foundation's second-generation laptop. It's actually not a laptop at all -- it's a dual-screen e-book reader (we've got pictures). Negroponte said the foundation hopes that the cost of the new device, which is scheduled for production by 2010, can be kept to $75, in part by using low-cost displays manufactured for portable DVD players." The article: http://www.xconomy.com/2008/05/20/negroponte-unveils-2nd-generation-olpc-laptop-its-an-e-book/ As mentioned on gutvol-d before, PG tried to make eBooks available for [current] OLPC XO system but they didn't end up taking many [or maybe any], due to some arbitrary file format restrictions they have. -- Greg From marcello at perathoner.de Tue May 20 13:51:50 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 May 2008 22:51:50 +0200 Subject: [gutvol-d] Grammar Error vs Logic Error In-Reply-To: References: Message-ID: <483339E6.4090508@perathoner.de> Michael Hart wrote: > Just because somoene disagrees with ou doesn't make a "liar." What do you call a person that makes a public statement, and when challenged to post evidence to his claim, openly refuses and tries to reverse the burden of proof? Your post about Google is as clear a case of defamation as can be. > If you think it might have been just incidental/accidental, may > a thought about the difference in time zones might give thought > to the idea that some of these interviews took place earlier in > some time zones than others, by enough of a factor indicating a > setup lead time in exess of what you would get with no planning > of this as a worldwide event. 
If you can't give any evidence for your claim, don't mention the "difference in time zones". You just made yourself more ridiculous. First law of holes: if you are in one stop digging. -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Tue May 20 14:24:33 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 May 2008 23:24:33 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4824AEFC.50102@ibiblio.org> <4825A028.2040203@perathoner.de> <4832C965.7090803@perathoner.de> Message-ID: <48334191.1020809@perathoner.de> Michael Hart wrote: Your original statement was that: > Google announced on December 14, 2004 that they would digitize > 10 million books in 6 years. I might remember you that to prove your statement you have to show that: 1. At least one person mentioned said numbers in said context. 2. That person is an official representative of Google. > Therefore other points of view could possibly be worthwhile, > and a viewpoint comprising multiple sources just might maybe > barely possibly have more relation to the real world than an > elementary single statement press release. That amounts to admitting that you assembled the "Google announcement" in your head. You might want to be more careful. Losing a libel case against Google would be bad publicity for PG. -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Tue May 20 16:29:16 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 May 2008 19:29:16 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: greg said: > As mentioned on gutvol-d before, > PG tried to make eBooks available for [current] OLPC XO system > but they didn't end up taking many [or maybe any], > due to some arbitrary file format restrictions they have. well, far be it from me to correct an executive officer from p.g. -- and greg, please do let us know if you _do_ know better than me -- but the reason o.l.p.c. didn't scoop up the p.g. library in one bite is because your library has inconsistencies that make it _impossible_. and that is the _only_ reason. heck, even your _catalog_ is an unworkable mess. at the time the original o.l.p.c. person came around, i talked to him -- he was a kid doing a summer internship -- and he had zero idea of the complexities that awaited him. so, of course, he failed badly. you can't even fathom the problems in 3 months, let alone solve 'em. the inconsistencies in your library make it unworkable _as_ a library. and _that_ is the reason o.l.p.c. didn't (couldn't) incorporate it. and that is the _only_ reason. a consistent library would be easy to re-engineer to "restrictions". you can't make a viewer-program for these books, because of their _inconsistencies_, a point i've made here for well over 4 years now... _i_ can make such a viewer-program, because i know how to resolve the inconsistencies, but i am not sharing those secrets because then the point will not be crystal clear that inconsistencies hobble a library. instead, i'm going to use my ability to resolve the inconsistencies to create a _consistent_ version of the p.g. library, which _will_ be able to be scooped up in a single bite, and _many_ entities will then do it. i mean, for crying out loud, your system should have been designed such that it could be dropped on _any_ file-storage system to create a turn-key electronic-library, for one person or one hundred million. one click of one button should be all that it takes. boom! 
instant library, with every utility most people will ever need. but you haven't got _any_ of the pieces needed to make that happen. not a single one. this total blind spot is a very bad black mark on you. heck, if p.g. would've had good infrastructure in-place and working, negroponte might have been much more successful selling his x.o., seeings as how he could've pitched each one as chock-full of books, adding hundreds (thousands?) of dollars of value to each machine... "whaddya mean, you want ms-office? you've got _shakespeare_!" i want to put a full p.g. version on tens of thousands of hard-drives, thereby creating a hyper-redundant and world-wide e-library mesh, and see where _that_ kind of development lets us trampoline up to... but you're all too busy churning out even more inconsistent e-texts to do something like _that_... it's sad. i tell you, it's really, really sad. you call yourself an electronic library... but you can't see the forest for the trees... *** i brought this point to your attention, greg (and yours too, michael) when you hit 10,000. and you did nothing to fix the basic problem. now your library has hit 25,000, with 2.5 times the inconsistencies... are you gonna do something now? or will i be repeating at 50,000? you need to learn. -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080520/783744d2/attachment.htm From hart at pglaf.org Tue May 20 17:42:02 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 20 May 2008 17:42:02 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: I can tell you exactly why the OLPC didn't take our books, and in just one concept: They refused my suggestion to do a feasibility study. I wanted to run 10 books through their whole system, find out what needed improvement, then 100, 1,000, & finally 10,000. . .each time making adjustments that might not have been quite so obvious concerning some smaller numbers of books. Feasibility Study. It should be tatooed on the hands of all MBAs, etc., upon graduation today, just as "HOLD FAST" was quite literally tattoed on the knuckles of the old sailors of the 19th Century. Everyone seems to think they are so smart then could figure out all the details in advance. Duh! There is NO substitute for experience. The way to build up experience is trial runs. . . . It really IS as simple as that. Every time you come up with a new plan, try it out! Period. No exceptions. This is why I always answer those who know it all-- and there have been plenty, with encouragement from the heart, and mind, to try out their plans, with a lot of help from us, before going any further. If you can't take that first step, you are not soon going to complete that journey of many many steps. There is a reason people say to walk before running and it is the voice of experience. Some people run. . .and run with scissors. . . . mh On Tue, 20 May 2008, Bowerbird at aol.com wrote: > greg said: >> As mentioned on gutvol-d before, >> PG tried to make eBooks available for [current] OLPC XO system >> but they didn't end up taking many [or maybe any], >> due to some arbitrary file format restrictions they have. > > well, far be it from me to correct an executive officer from p.g. 
-- > and greg, please do let us know if you _do_ know better than me > -- but the reason o.l.p.c. didn't scoop up the p.g. library in one bite > is because your library has inconsistencies that make it _impossible_. > > and that is the _only_ reason. > > heck, even your _catalog_ is an unworkable mess. > > at the time the original o.l.p.c. person came around, i talked to him > -- he was a kid doing a summer internship -- and he had zero idea > of the complexities that awaited him. so, of course, he failed badly. > you can't even fathom the problems in 3 months, let alone solve 'em. > > the inconsistencies in your library make it unworkable _as_ a library. > > and _that_ is the reason o.l.p.c. didn't (couldn't) incorporate it. > > and that is the _only_ reason. > > a consistent library would be easy to re-engineer to "restrictions". > > you can't make a viewer-program for these books, because of their > _inconsistencies_, a point i've made here for well over 4 years now... > > _i_ can make such a viewer-program, because i know how to resolve > the inconsistencies, but i am not sharing those secrets because then > the point will not be crystal clear that inconsistencies hobble a library. > > instead, i'm going to use my ability to resolve the inconsistencies to > create a _consistent_ version of the p.g. library, which _will_ be able > to be scooped up in a single bite, and _many_ entities will then do it. > > i mean, for crying out loud, your system should have been designed > such that it could be dropped on _any_ file-storage system to create > a turn-key electronic-library, for one person or one hundred million. > > one click of one button should be all that it takes. > > boom! > > instant library, with every utility most people will ever need. > > but you haven't got _any_ of the pieces needed to make that happen. > not a single one. this total blind spot is a very bad black mark on you. > > heck, if p.g. would've had good infrastructure in-place and working, > negroponte might have been much more successful selling his x.o., > seeings as how he could've pitched each one as chock-full of books, > adding hundreds (thousands?) of dollars of value to each machine... > > "whaddya mean, you want ms-office? you've got _shakespeare_!" > > i want to put a full p.g. version on tens of thousands of hard-drives, > thereby creating a hyper-redundant and world-wide e-library mesh, > and see where _that_ kind of development lets us trampoline up to... > > but you're all too busy churning out even more inconsistent e-texts > to do something like _that_... it's sad. i tell you, it's really, really > sad. > > you call yourself an electronic library... > but you can't see the forest for the trees... > > *** > > i brought this point to your attention, greg (and yours too, michael) > when you hit 10,000. and you did nothing to fix the basic problem. > now your library has hit 25,000, with 2.5 times the inconsistencies... > are you gonna do something now? or will i be repeating at 50,000? > > you need to learn. > > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From Morasch at aol.com Tue May 20 20:02:53 2008 From: Morasch at aol.com (Morasch at aol.com) Date: Tue, 20 May 2008 23:02:53 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: michael hart said: > They refused my suggestion to do a feasibility study. 
bowerbird to michael: i've _done_ your feasibility study. i can tell you exactly how "feasible" it is to try and make an e-library with your 25,000 e-texts as-is: _not_at_all_. > I wanted to run 10 books through their whole system, > find out what needed improvement, > then 100, 1,000, & finally 10,000. . . > each time making adjustments that might not have been > quite so obvious concerning some smaller numbers of books. listen please, michael. i've done just that exact identical process. i have done runs of all sizes, and adjusted and readjusted wildly... and each time the answer comes up the same on the magic 8-ball: yes, this is definitely doable; not even that hard; however first you will need to remove the inconsistencies from the library... period... why? 'cause you can't treat an inconsistent mass programmatically. and any time you deal with the care and nurturing of an e-library, you're _compelled_by_reality_ to do such dealing programmatically. let me be very clear about this: i have examined this problem from the standpoint of a programmer trying to add value to your e-texts. and there's an incontrovertible law here: garbage-in-garbage-out. the thing is, you've got _diamonds_ in amongst all of your garbage... but as long as it's an inconsistent mass, nobody can mine them out... and the kicker is that it would be relatively _easy_ for you to fix this! a conscious decision that, from now on, you're gonna be consistent wouldn't _cost_ time or energy -- it would actually _save_ you some. moreover, it would put you on the right path to deal with the backlog. and once i knew the spill had been plugged, i could start cleaning up. but as long as you pile on _more_ inconsistency, you will lose ground. anyway, it no longer matters if you ignore what i'm telling you, since i'm on the way to creating my own consistent version of your library, and i _will_ give people a one-button turn-key means of working it... it's just too bad the words "project gutenberg" will not appear within... but the time has passed for the niceties of standing back in deference. -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080520/3f056f7c/attachment-0001.htm From ralf at ark.in-berlin.de Wed May 21 00:06:34 2008 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Wed, 21 May 2008 09:06:34 +0200 Subject: [gutvol-d] Grammar Error vs Logic Error In-Reply-To: <483339E6.4090508@perathoner.de> References: <483339E6.4090508@perathoner.de> Message-ID: <20080521070634.GB24648@ark.in-berlin.de> Marcello, I'd rather you'd waste time on the PGTEI patch I sent you (twice!) than on a useless flame war. And if this makes me end up in your killfile, please tell, so I can stop doing work on effectively unmaintained software. P***ed, ralf From hart at pglaf.org Wed May 21 10:09:51 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 21 May 2008 10:09:51 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: I think the problem, in your eyes, is that you are thinking too much like a computer, not enough like a human being. You want everything to line up perfectly. . . . Sorry, it doesn't have to do that for human beings to work with the eBooks, or even computers. . .unless they are VERY demandingly programmed. 
THe OLPC people never paid the slightest attention. . . . Seriously. No wonder they didn't get exactly what they wanted. I would have been silly to expect they would have. I would be silly to expect ANYone to get exactly what they wanted without some feedback, dare I say it. . . cybernetic. . .processes. You have claimed for years, and remarkably consistently, that your programs required better eBooks to work well. But you never made the real effort to bridge the gaps between your dream eBooks and those existing in reality. Nearly every reader CAN read our eBooks, computer or human. Some choose not to do so intentionally. I'm not really worried about those. Next in line not to worry about are those who don't take the necessary steps to get from where things ARE to where they WANT them to be. You always said the PG books were close to what you wanted, but what you did NOT do was provide the pathway, leading by example, to get to where you wanted to go. The longest journey, of a billion eBooks, starts with one. Just one. . .then just two. . .then just three. . .four... On Tue, 20 May 2008, Morasch at aol.com wrote: > michael hart said: >> They refused my suggestion to do a feasibility study. > > bowerbird to michael: i've _done_ your feasibility study. > > i can tell you exactly how "feasible" it is to try and make > an e-library with your 25,000 e-texts as-is: _not_at_all_. > > >> I wanted to run 10 books through their whole system, >> find out what needed improvement, >> then 100, 1,000, & finally 10,000. . . >> each time making adjustments that might not have been >> quite so obvious concerning some smaller numbers of books. > > listen please, michael. i've done just that exact identical process. > > i have done runs of all sizes, and adjusted and readjusted wildly... > > and each time the answer comes up the same on the magic 8-ball: > yes, this is definitely doable; not even that hard; however first you > will need to remove the inconsistencies from the library... period... > > why? 'cause you can't treat an inconsistent mass programmatically. > and any time you deal with the care and nurturing of an e-library, > you're _compelled_by_reality_ to do such dealing programmatically. > > let me be very clear about this: i have examined this problem from > the standpoint of a programmer trying to add value to your e-texts. > and there's an incontrovertible law here: garbage-in-garbage-out. > > the thing is, you've got _diamonds_ in amongst all of your garbage... > but as long as it's an inconsistent mass, nobody can mine them out... > > and the kicker is that it would be relatively _easy_ for you to fix this! > a conscious decision that, from now on, you're gonna be consistent > wouldn't _cost_ time or energy -- it would actually _save_ you some. > moreover, it would put you on the right path to deal with the backlog. > and once i knew the spill had been plugged, i could start cleaning up. > but as long as you pile on _more_ inconsistency, you will lose ground. > > anyway, it no longer matters if you ignore what i'm telling you, since > i'm on the way to creating my own consistent version of your library, > and i _will_ give people a one-button turn-key means of working it... > > it's just too bad the words "project gutenberg" will not appear within... > but the time has passed for the niceties of standing back in deference. > > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. 
> > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From Bowerbird at aol.com Wed May 21 12:52:22 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 May 2008 15:52:22 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: i said: > let me be very clear about this:? i have examined this problem from > the standpoint of a programmer trying to add value to your e-texts. i don't think i elaborated enough to become "very clear", so i will do so... i want you to understand that the perspective that informs my thoughts is the one derived from _writing_code_ to create desirable capabilities... my opinions are not based on "ideological concerns", where i'm trying to get you to adhere to my religious dogma, so you should take it on faith... nor am i motivated by lust for control, where i want you to do it my way. no sir, my focus is mundane -- "how can i write code to make this work?" mundane. but she is also a very strict and unforgiving mistress, this muse of code. a misplaced comma or semicolon in source-code, and a program fails. won't even _compile_. might not even say (clearly) what you did wrong. but believe you me, until you _find_ the problem, and _fix_ it correctly, she's gonna continue to balk, you're gonna continue to be obstructed... and just because you finally do get it to _run_ doesn't mean it's gonna _do_what_you_want_it_to_do_. there might be big bugs in your logic... but once your code _does_ do what you want it to do, or _exceeds_ that, then you _know_ without a shadow of a doubt that your formulas _work_. you _know_ you've got it _right_, and you don't have to take it on faith. because you've got working code. working code is a beautiful proof. you don't need anything more... *** so... when i say "you need to do this", what you should be _hearing_ is, "if you don't do this, us programmers aren't gonna be able to help you, and if us programmers can't help you, your electronic library won't fly." and i'm not talking about _me_ as one of your programmer helpers, i'm talking about the _dozens_and_dozens_ of programmers who will pop up into your existence just as soon as you make it so they _can_ negotiate the library, with code, simply. and it doesn't have to be with something as complex as an "a.p.i." for them to be able to help you... just make things _simple_ and _consistent_, so all those programmers can write simple programs with simple routines that will work correctly. make it easy for 'em to write working code. working code is beautiful proof. you don't need anything more. -bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080521/58b5102d/attachment.htm From lee at novomail.net Wed May 21 13:47:02 2008 From: lee at novomail.net (Lee Passey) Date: Wed, 21 May 2008 14:47:02 -0600 Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: <48348A46.3000609@novomail.net> Bowerbird at aol.com wrote: > i brought this point to your attention, greg (and yours too, michael) > when you hit 10,000. and you did nothing to fix the basic problem. > now your library has hit 25,000, with 2.5 times the inconsistencies... > are you gonna do something now? or will i be repeating at 50,000? > > you need to learn. 
Look, over the course of several years two things have become blatantly obvious: 1. By just about every measure Project Gutenberg is severely broken. 2. Michael Hart (who is the only one whose opinion counts at PG) is adamantly opposed to any of the changes which would be required to fix it. Give up. Move on. If you're repeating this message at 50,000, it will be because you're the one who won't learn -- that PG is on a dead-end course, and will not alter its direction. Those of us who want a reliable archive of consistently formatted documents with accurate cataloging information (this is the "metadata" which you have so reviled in the past) will simply have to start over. I see no other alternative. From hart at pglaf.org Thu May 22 10:13:29 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 22 May 2008 10:13:29 -0700 (PDT) Subject: [gutvol-d] cleaning up the catalog In-Reply-To: References: Message-ID: When there are two different paper editions that differ as much as the ones listed below, I certainly don't mind if someone "merges" them, but I don't want to kill off the original editions, either. someone made some editing shoices there that were too obvious. mh On Mon, 19 May 2008, Bowerbird at aol.com wrote: > a couple of nice natural experiments have introduced themselves, > in checking on some possible duplicates in the library... > > first, the two "young captives" e-texts are entirely different. ok. > > second, the two "pearl box" e-texts are highly similar, but not identical... > this is a book "containing one hundred beautiful stories for young people", > each version contains a few completely different stories from the other one, > but there's no list (in either of the versions) of the differences between > them. > since the versions contain 90-95 identical stories, this seems like a book > that > would benefit greatly by having the two different versions _merged_ into one. > and of course comparison of the two versions could identify the errors in > each, > so that's the first of our two "natural experiments". > > third, the two "scranton high chums on the cinder path" look to be identical, > although there is some possibility they could be of slightly different > editions. > either way, a comparison of the two gives us our second "natural experiment". > > i'll let you know in the next few days how these experiments turn out... > > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From hart at pglaf.org Thu May 22 11:45:26 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 22 May 2008 11:45:26 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: <48348A46.3000609@novomail.net> References: <48348A46.3000609@novomail.net> Message-ID: On Wed, 21 May 2008, Lee Passey wrote: > Bowerbird at aol.com wrote: > >> i brought this point to your attention, greg (and yours too, >> michael) when you hit 10,000. and you did nothing to fix the >> basic problem. now your library has hit 25,000, with 2.5 times >> the inconsistencies... are you gonna do something now? or will i >> be repeating at 50,000? >> >> you need to learn. > > Look, over the course of several years two things have become > blatantly obvious: > > 1. By just about every measure Project Gutenberg is severely > broken. > > 2. 
Michael Hart (who is the only one whose opinion counts at PG) > is adamantly opposed to any of the changes which would be required > to fix it. I have supported all the people who wanted to make changes. They just haven't used that support to make the changes. > Give up. That's exactly what they have done. They don't need your encouragement to do more of that. > Move on. That's exactly what they have NOT done. > If you're repeating this message at 50,000, it will be because > you're the one who won't learn -- that PG is on a dead-end course, > and will not alter its direction. Anyone can "alter its direction". . .but without action. . .no. If you only put as much into action as into your words. Just get out there and DO 10 books the way YOU want as an example? Then 100. Then 1,000. Then 10,000. Then 100,000. However, if you are not willing to even start with 10, how can people find you to be their leader, if you go nowhere? > Those of us who want a reliable archive of consistently formatted > documents with accurate cataloging information (this is the > "metadata" which you have so reviled in the past) will simply have > to start over. I see no other alternative. Every level of progress was made by "starting over." So. . ."Get Started!" The starting line is wherever YOU put it. The starting gun goes off whenever YOU start. You!!! Go!!! Go!! Go! From joyce.b.wilson at sbcglobal.net Thu May 22 13:31:37 2008 From: joyce.b.wilson at sbcglobal.net (Joyce Wilson) Date: Thu, 22 May 2008 15:31:37 -0500 Subject: [gutvol-d] cleaning up the catalog Message-ID: <4835D829.8060909@sbcglobal.net> > > boy, what a mess the p.g. catalog is! > > i cleaned the info for the english e-texts 10000-14000: > >/ http://z-m-l.com/misc/cata10-14-all.html > / > this is what i need, and might not be useful to p.g. (sorry), > but i'm happy to share it. > > here's a more-concentrated list, showing many of the > multiple-item e-texts, which were particularly messy: > >/ http://z-m-l.com/misc/cata10-14-repeats.html > / > this exercise suggests that the post-processors/whitewashers > might want to see how items in a series were posted in the past > when preparing additional items from the series for submission, > with the intent of minimizing the inconsistencies... > > -bowerbird > > p.s. if anyone has any questions on what i've done, or why, > or anything related to this, i will be happy to address them... > It looks to me like your data comes from the etext file headers, and not from the bibliographic records in the catalog. The list includes several titles by Edith Van Dyne. In the catalog, Edith Van Dyne exists only as a pseudonym for L. Frank Baum. Baum's name is the one attached to the works in the bibliographic records. Not to claim that there are no inconsistencies in the catalog, but from a cataloger's point of view it's not reasonable to tally every file header inconsistency as a catalog problem. The catalog is big. The cataloging team is small. It may be a long time before the problems that bug you get noticed. To speed things up, send your pet catalog peeves to: catalog at pglaf.org Joyce W From schultzk at uni-trier.de Fri May 23 02:48:51 2008 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Fri, 23 May 2008 11:48:51 +0200 Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: <48348A46.3000609@novomail.net> Message-ID: Hi Micheal, hi PG, hi DP, Yes, you allow any willing to do things, yet you refuse to instate stricter standards so that consolidation is possible. 
Yes, humans can handle the inconsitencies, but E-texts, E-books, and PG is handle by computers. Therefore, more structure is required. In the modern computing world it is possible to set standards which lend to humans and computers alike. I do admit that most do not master these arts or simply do understand how to use and apply them to a particular tasks. On the otherside the implementation of such standards and upholding them consume over 80% of the resources in the beginning over 50% during transition, yet less than 10% once everybody is on board. Furthermore, once a change becomes necessary the effort is minimal. Any programmer or anyone working with a large heterogeneous project knows how important well designed standards and their adherence is. As Lee states PG [and DP] are broken in this sense. That is why I do not actively participate. There are no sufficient rules to follow. Bowerbird does want he can, yet there is no simple solution, because he get the job done only to about 95%. It is the other 5% which is the most important one. Without it success is very far away. Regards Keith. Am 22.05.2008 um 20:45 schrieb Michael Hart: > > > On Wed, 21 May 2008, Lee Passey wrote: > >> Bowerbird at aol.com wrote: >> >>> i brought this point to your attention, greg (and yours too, >>> michael) when you hit 10,000. and you did nothing to fix the >>> basic problem. now your library has hit 25,000, with 2.5 times >>> the inconsistencies... are you gonna do something now? or will i >>> be repeating at 50,000? >>> >>> you need to learn. >> >> Look, over the course of several years two things have become >> blatantly obvious: >> >> 1. By just about every measure Project Gutenberg is severely >> broken. >> >> 2. Michael Hart (who is the only one whose opinion counts at PG) >> is adamantly opposed to any of the changes which would be required >> to fix it. > > I have supported all the people who wanted to make changes. > > They just haven't used that support to make the changes. > >> Give up. > > That's exactly what they have done. > > They don't need your encouragement to do more of that. > > > >> Move on. > > That's exactly what they have NOT done. > > > >> If you're repeating this message at 50,000, it will be because >> you're the one who won't learn -- that PG is on a dead-end course, >> and will not alter its direction. > > Anyone can "alter its direction". . .but without action. . .no. > > If you only put as much into action as into your words. > > Just get out there and DO 10 books the way YOU want as an example? > > Then 100. > > Then 1,000. > > Then 10,000. > > Then 100,000. > > However, if you are not willing to even start with 10, > how can people find you to be their leader, if you go nowhere? > > >> Those of us who want a reliable archive of consistently formatted >> documents with accurate cataloging information (this is the >> "metadata" which you have so reviled in the past) will simply have >> to start over. I see no other alternative. > > > Every level of progress was made by "starting over." > > So. . ."Get Started!" > > The starting line is wherever YOU put it. > > The starting gun goes off whenever YOU start. > > You!!! > > Go!!! > From hart at pglaf.org Fri May 23 11:16:24 2008 From: hart at pglaf.org (Michael Hart) Date: Fri, 23 May 2008 11:16:24 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: <48348A46.3000609@novomail.net> Message-ID: On Fri, 23 May 2008, Schultz Keith J. 
wrote: > Hi Micheal, hi PG, hi DP, > > > Yes, you allow any willing to do things, yet > you refuse to instate stricter standards so that > consolidation is possible. Not promoting someone to an official "czar of standards" position is hardly the same as opposing their efforts. Anyone who wants to "consolidate" should have to do the work of "consilidation" themselves and with volunteer help, but NOT by a fiat standard being imposed officially. Project Gutenberg should be an open standard to as much of a degree as possible. . .not completely. . .but close. However, there has never been ANY objection to the standards DP has imposed, or any other group imposes on themselves and we are only to glad to help them PROMOTE those standards for any and all to adopt. . .but FORCING those standards: NO!!! The people who say we RESIST THEIR STANDARDS are seriously-- perhaps even intentionlly--misrepresenting the situtation. > Yes, humans can handle the inconsitencies, but > E-texts, E-books, and PG is handle by computers. I guess it all depends on what systems you are lookin at. However, I repeat, Project Gutenberg is NOT a Xerox machine. This was never meant to be a completely automated process, from end to end, which leaves room for humen intervention, either to create those standards you say you want but will not actually do the work for, or by those who will resist, and create or maintain other standards. It seems as if you had your way, every paper book would've been require to be the same height for standard shelving-- a great idea for mass-production of library or bookstore's shelving, but somehow it just has never taken hold. Why not? Yes, once "everybody is on board" as you say below, things could be much better, but you aren't even trying to get an even early population on board to try things out. I believe in feasibility studies. Why? Because I have learned from experience. Feasibility studies give that experience a home to start. If you are unwilling to start, you are unwilling to finish. So many complaints by people their idea is not completed, when they have never even really started. Get started!!! "Build it. . .and they will come!" Don't build it. . .and they can't come, can they? So many people want everyone on board the same train, they they refuse to even lay the first file of track, build the first stations, or be the little engine.... that could. . . . Go for it!!! Michael > Therefore, more structure is required. In the modern > computing world it is possible to set standards > which lend to humans and computers alike. I do admit > that most do not master these arts or simply do > understand > how to use and apply them to a particular tasks. > > On the otherside the implementation of such standards and > upholding them consume over 80% of the resources in the > beginning > over 50% during transition, yet less than 10% once > everybody is > on board. Furthermore, once a change becomes necessary > the effort > is minimal. > > Any programmer or anyone working with a large > heterogeneous project > knows how important well designed standards and their > adherence is. > > As Lee states PG [and DP] are broken in this sense. That > is why I do > not actively participate. There are no sufficient rules > to follow. > Bowerbird does want he can, yet there is no simple > solution, because > he get the job done only to about 95%. It is the other 5% > which is the > most important one. Without it success is very far away. > > Regards > Keith. 
> > > Am 22.05.2008 um 20:45 schrieb Michael Hart: > >> >> >> On Wed, 21 May 2008, Lee Passey wrote: >> >>> Bowerbird at aol.com wrote: >>> >>>> i brought this point to your attention, greg (and yours too, >>>> michael) when you hit 10,000. and you did nothing to fix the >>>> basic problem. now your library has hit 25,000, with 2.5 >>>> times >>>> the inconsistencies... are you gonna do something now? or >>>> will i >>>> be repeating at 50,000? >>>> >>>> you need to learn. >>> >>> Look, over the course of several years two things have become >>> blatantly obvious: >>> >>> 1. By just about every measure Project Gutenberg is severely >>> broken. >>> >>> 2. Michael Hart (who is the only one whose opinion counts at >>> PG) >>> is adamantly opposed to any of the changes which would be >>> required >>> to fix it. >> >> I have supported all the people who wanted to make changes. >> >> They just haven't used that support to make the changes. >> >>> Give up. >> >> That's exactly what they have done. >> >> They don't need your encouragement to do more of that. >> >> >> >>> Move on. >> >> That's exactly what they have NOT done. >> >> >> >>> If you're repeating this message at 50,000, it will be because >>> you're the one who won't learn -- that PG is on a dead-end >>> course, >>> and will not alter its direction. >> >> Anyone can "alter its direction". . .but without action. . .no. >> >> If you only put as much into action as into your words. >> >> Just get out there and DO 10 books the way YOU want as an >> example? >> >> Then 100. >> >> Then 1,000. >> >> Then 10,000. >> >> Then 100,000. >> >> However, if you are not willing to even start with 10, >> how can people find you to be their leader, if you go nowhere? >> >> >>> Those of us who want a reliable archive of consistently >>> formatted >>> documents with accurate cataloging information (this is the >>> "metadata" which you have so reviled in the past) will simply >>> have >>> to start over. I see no other alternative. >> >> >> Every level of progress was made by "starting over." >> >> So. . ."Get Started!" >> >> The starting line is wherever YOU put it. >> >> The starting gun goes off whenever YOU start. >> >> You!!! >> >> Go!!! >> > From tb at baechler.net Sat May 24 02:00:13 2008 From: tb at baechler.net (Tony Baechler) Date: Sat, 24 May 2008 02:00:13 -0700 Subject: [gutvol-d] Why stay with PG? Message-ID: <20080524090013.GA17892@investigative.net> Hello all, I rarely post here because of the bickering and general disagreements, but I have to comment on many of the recent posts pointing out problems with PG and generally complaining on how things are done. I won't comment on whether any of you are right or not. As far as I'm concerned, I don't care who's right as long as there are more books. As long as they're in plain text, that's good enough for me. :-) I do have a question for all of you who frequently point out problems with PG and how Michael does things. Why do you stick with PG year after year? Why not just abandon PG and start your own project? Web hosting is incredibly cheap nowadays. Even dedicated servers aren't THAT expensive. Michael and Greg Newby have offered free server space many times to anyone who asks, but I'm assuming that you want to distance yourselves from PG. So, again I ask. Why bother? If you think PG has poor standards, why not create your own? OK, so you would rather use the already existing PG ebooks and reformat them. Fine, get the PG DVD or take Greg's offer of free server space and build on them. 
Various people have said that all ebooks must/should be in xml, tei, etc. OK, so do what blackmask.com used to do and reformat them under your own domain and your own standards. If you remove the PG name and small print, you don't even have to credit PG if I understand correctly. What prevents you from jumping ship and doing your own thing? My personal opinion is as I said above. I don't really care what format book X is in as long as I can convert it to plain text. If PG releases 25,000 ebooks and all of the people who have complaints against PG each release a few more books, that's a few more books more than the 25,000 already available. Competition isn't all bad. For people who don't like DP, create your own DP competitor. I would like to see a bunch of DP-like organizations all trying to produce the best quality. I can get the last laugh because it's all public domain anyway and the books will still eventually make their way to PG, albeit probably not by the person who created the DP competitors or their own versions of PG. Otherwise, if it's all talk with no intention to do anything, shut up and don't waste bandwidth for no good reason. Personally I think it's all talk and that is what it looks like. It must not be 100% talk though since ebookforge.net somehow got created and has produced a few files for PG. Please don't reply off list as all non-list email gets automatically deleted. From prosfilaes at gmail.com Sat May 24 03:56:53 2008 From: prosfilaes at gmail.com (David Starner) Date: Sat, 24 May 2008 06:56:53 -0400 Subject: [gutvol-d] Why stay with PG? In-Reply-To: <20080524090013.GA17892@investigative.net> References: <20080524090013.GA17892@investigative.net> Message-ID: <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> On Sat, May 24, 2008 at 5:00 AM, Tony Baechler wrote: > I do have a question for all of you who frequently point out problems > with PG and how Michael does things. Why do you stick with PG year > after year? Because PG is the source for ebooks; if I started posting somewhere else, I'd lose half my audience. And Michael is no longer the end-all and be-all of PG; I feel perfectly fine ignoring him and being part of PG. > I can get > the last laugh because it's all public domain anyway and the books will > still eventually make their way to PG, albeit probably not by the person > who created the DP competitors or their own versions of PG. So you get the last laugh because you're wasting effort? Do you know that "The Brothers Karamazov" is just now going through DP? Do you have any idea how much material is out that PG doesn't have, and probably won't have for decades, if ever? Look at and see just how much material has been out there for years that we've never touched. It's surely not going to speed up if you drive people away. From Bowerbird at aol.com Sat May 24 11:46:08 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 24 May 2008 14:46:08 EDT Subject: [gutvol-d] Why stay with PG? Message-ID: tony said: > Why stay with PG? well, because i love project gutenberg. and i love michael hart. and i love the work he's done as an important e-book pioneer... moreover, thanks to his insistence on the plain-text baseline, i eventually recognized the full power of a no-markup format, which not even michael grasps fully. so i'm willing to stick with project gutenberg to make it better... but it seems that i haven't been able to do that with simple logic. 
so now my course of action will be to mount my own mirror and show the superiority of no-markup format applied consistently. so, in one sense, yes, i'll then be "abandoning" project gutenberg. because there won't be one mention of it in my library. can't be... and it won't be easy for p.g. to back-assimilate my files, because i will have tossed out a lot of work that was done by its volunteers (like hand-crafted -- and thus unmaintainable -- .html versions). but i'll still be here, every day, having discussions in the lobby of the project gutenberg library, talking about how i can do _more_, with less work, because i took the simple step of _consistency_... because one of my biggest flaws is saying "i told you so"... and one of my biggest virtues is that i know that love doesn't walk out the door... -bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080524/8418a246/attachment.htm From Bowerbird at aol.com Sat May 24 13:33:49 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 24 May 2008 16:33:49 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: i was gonna wait until after the holiday to post any more, but since i just posted a message in reply to tony, i'll do this, and finish up this thought. michael, you seem to be on autoresponder, because you haven't seemed to notice that we're not communicating... you seem to think this issue is about _file-formats_. it's not. it's about _consistency_ -- to your established conventions... for a couple examples, look in your f.a.q., and you'll see that it calls for section headers to be presented in a certain way: > For a standard novel, you can choose either > four blank lines before the chapter heading and two lines after, > or three lines before and one line after, but whichever you use, > do try to keep it consistent throughout. notice right there that your own guidelines stress _consistency_... but, as if contradicting yourself, you tell people two ways to do it. now, just one hour ago, i got the posted digest listing 2 updates, to e-texts #172 and #173. both of those newly-redone e-books have three blank lines before and two blank lines after, which is different from either of the two options that you gave above... to rephrase it, your e-texts aren't consistent with your own f.a.q. and i don't see that this inconsistency buys you _anything_ at all. it is of absolutely no benefit to you. the only real effect that it has is to make it difficult (sometimes to the point of total impossibility) for programmers to deal with the library, and add some value to it. this has absolutely nothing to do with resisting dubious file-formats. i'm with you 100% that you should continue to exercise such resistance. but it has everything to do with adhering to the standards you've set... because if you don't adhere to them, what's the purpose of having 'em? let's continue on with that same section in the f.a.q.: > Normally, you should move chapter headings to the left > rather than try to imitate the centering that is used in some books. this is a good idea. and it used to be that the e-texts were consistent in following this rule. but lately, there are more and more cases where this is being disregarded, and various things are being centered. why? 
-bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080524/10bda1a7/attachment.htm From hart at pglaf.org Sat May 24 16:26:47 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 24 May 2008 16:26:47 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: On Sat, 24 May 2008, Bowerbird at aol.com wrote: > i was gonna wait until after the holiday to post any more, > but since i just posted a message in reply to tony, i'll do this, > and finish up this thought. > > michael, you seem to be on autoresponder, because you > haven't seemed to notice that we're not communicating... > > you seem to think this issue is about _file-formats_. it's not. > it's about _consistency_ -- to your established conventions... > > for a couple examples, look in your f.a.q., and you'll see that > it calls for section headers to be presented in a certain way: you seem to think I'm not paying attn, just because i don't act in accordance with your wishes. Consistency is fine for within the various Project Gutenberg groups, etc., and I'm sure YOU are are auto-respncer if you continue this pretense of not having been told that before, on many occasions. Wake up and start your own group to make at least 10 books your way. If you never do 10, then 100, etc., how can anyone agree with your exemplified production techniques? How can someone "get on board" if you are not going anywhere? You know this. Don't pretend. Go For It!!! > >> For a standard novel, you can choose either >> four blank lines before the chapter heading and two lines after, >> or three lines before and one line after, but whichever you use, >> do try to keep it consistent throughout. > > notice right there that your own guidelines stress _consistency_... But this "_consistency_" is not forced on anyone. > > but, as if contradicting yourself, you tell people two ways to do it. There have ALWAYS been more than one way, but no one is stopping YOU, or any of the others, ALL of which could obviously do eBook production better than is being done, for actually DOING it. . . Write your own standards! We'll put them right up there with the ones you don't like. You would/could/should have done this years ago if YOU were not on autoresponder. _I_ don't make rules! YOU don't like that! Make your OWN rules!!! Go For it!! Go! Go!! Go!!! Hooray your your side!!! Now DO something. . .PLEASE!!! Michael > > now, just one hour ago, i got the posted digest listing 2 updates, > to e-texts #172 and #173. both of those newly-redone e-books > have three blank lines before and two blank lines after, which is > different from either of the two options that you gave above... > > to rephrase it, your e-texts aren't consistent with your own f.a.q. > > and i don't see that this inconsistency buys you _anything_ at all. > it is of absolutely no benefit to you. the only real effect that it has > is to make it difficult (sometimes to the point of total impossibility) > for programmers to deal with the library, and add some value to it. > > this has absolutely nothing to do with resisting dubious file-formats. > i'm with you 100% that you should continue to exercise such resistance. > > but it has everything to do with adhering to the standards you've set... 
> because if you don't adhere to them, what's the purpose of having 'em? > > let's continue on with that same section in the f.a.q.: >> Normally, you should move chapter headings to the left >> rather than try to imitate the centering that is used in some books. > > this is a good idea. and it used to be that the e-texts were consistent > in following this rule. but lately, there are more and more cases where > this is being disregarded, and various things are being centered. why? > > -bowerbird > > > > ************** > Get trade secrets for amazing burgers. Watch "Cooking with > Tyler Florence" on AOL Food. > (http://food.aol.com/tyler-florence?video=4& > ?NCID=aolfod00030000000002) > From hart at pglaf.org Sat May 24 16:57:06 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 24 May 2008 16:57:06 -0700 (PDT) Subject: [gutvol-d] Why stay with PG? In-Reply-To: <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> Message-ID: On Sat, 24 May 2008, David Starner wrote: > On Sat, May 24, 2008 at 5:00 AM, Tony Baechler > wrote: >> I do have a question for all of you who frequently point out >> problems with PG and how Michael does things. Why do you stick >> with PG year after year? > > Because PG is the source for ebooks; if I started posting > somewhere else, I'd lose half my audience. And Michael is no > longer the end-all and be-all of PG; I feel perfectly fine > ignoring him and being part of PG. That's the whole point. No one should be the end-all and be-all of Project Gutenberg. Now that that has been clarified, let's get everyone out from behind their supposed eight-ball and have them get going. >> I can get the last laugh because it's all public domain anyway >> and the books will still eventually make their way to PG, albeit >> probably not by the person who created the DP competitors or >> their own versions of PG. > > So you get the last laugh because you're wasting effort? Do you > know that "The Brothers Karamazov" is just now going through DP? > Do you have any idea how much material is out that PG doesn't > have, and probably won't have for decades, if ever? Look at > and see just how much material has > been out there for years that we've never touched. It's surely not > going to speed up if you drive people away. > list gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From dakretz at gmail.com Sat May 24 21:07:04 2008 From: dakretz at gmail.com (don kretz) Date: Sat, 24 May 2008 21:07:04 -0700 Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 27 In-Reply-To: References: Message-ID: <627d59b80805242107v672dac32jc8e280e0aa9b64be@mail.gmail.com> > > > > ---------- Forwarded message ---------- > From: Tony Baechler > To: gutvol-d at lists.pglaf.org > Date: Sat, 24 May 2008 02:00:13 -0700 > Subject: [gutvol-d] Why stay with PG? > > | > | ... or take Greg's offer of free server space ... > | What is the nature of this offer? I'm not familiar with it. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080524/77f70492/attachment.htm From sly at victoria.tc.ca Sat May 24 22:36:59 2008 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat, 24 May 2008 22:36:59 -0700 (PDT) Subject: [gutvol-d] Why stay with PG? 
In-Reply-To: <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
Message-ID:

On Sat, 24 May 2008, David Starner wrote:

> Do you know that "The Brothers Karamazov" is just now going through
> DP? Do you have any idea how much material is out there that PG
> doesn't have, and probably won't have for decades, if ever?

In looking up details for items in the PG catalog, I sometimes do
a general search on an author name to try to check dates from another
source. Surprisingly often I find other texts online by the same
author on other sites that are not on PG--sometimes a whole collection
of them. I believe the amount of such material which is around is
larger than we generally realize. Even the extensive listings of the
Online Books Page are just the tip of the iceberg.

Andrew

From gbnewby at pglaf.org Sat May 24 23:21:59 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat, 24 May 2008 23:21:59 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 27
In-Reply-To: <627d59b80805242107v672dac32jc8e280e0aa9b64be@mail.gmail.com>
References: <627d59b80805242107v672dac32jc8e280e0aa9b64be@mail.gmail.com>
Message-ID: <20080525062159.GA2592@mail.pglaf.org>

On Sat, May 24, 2008 at 09:07:04PM -0700, don kretz wrote:
>> | ... or take Greg's offer of free server space ...
>
> What is the nature of this offer? I'm not familiar with it.

I have several rather large systems that I'm perpetually happy to
provide access to for various projects. These include:

  snowy.arsc.alaska.edu
  readingroo.ms

and for mailing lists, lists.pglaf.org

snowy & readingroo have complete copies of the PG collection.

I am working on a plan, which I'll share here in my next message,
to provide user updates to copies of the PG collection, as well as
features like personal bookshelves and commentary on eBooks. I know
lots of people have their own ideas about such things, and encourage
them to share their plans and/or get involved with common efforts.

As was recently pointed out on the gutvol-d list, eBookforge is one
project that resulted from such a "spin-off" effort. There are
others, ranging in size from the Project Gutenberg Consortia
Collection [gutenberg.cc] to Bowerbird's own collection of
ZML-enabled eBooks [which partially lives on snowy].
  -- Greg

From tb at baechler.net Sun May 25 02:11:11 2008
From: tb at baechler.net (Tony Baechler)
Date: Sun, 25 May 2008 02:11:11 -0700
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
Message-ID: <48392D2F.6070509@baechler.net>

David Starner wrote:
> On Sat, May 24, 2008 at 5:00 AM, Tony Baechler wrote:
>> I do have a question for all of you who frequently point out problems
>> with PG and how Michael does things. Why do you stick with PG year
>> after year?
>
> Because PG is the source for ebooks; if I started posting somewhere
> else, I'd lose half my audience.
> And Michael is no longer the end-all and be-all of PG; I feel
> perfectly fine ignoring him and being part of PG.

Yes, but what about the Internet Archive? You could submit texts to
them just as easily. With a little funding, you could advertise on
Google and other places to bring in an audience. There are many sites
which use PG files, often without credit. Search for any popular
public domain book and you'll find a bunch.

>> I can get the last laugh because it's all public domain anyway and
>> the books will still eventually make their way to PG, albeit
>> probably not by the person who created the DP competitors or their
>> own versions of PG.
>
> So you get the last laugh because you're wasting effort?

Yes, I think I'm aware of how much PG doesn't have. Look at Google
books, the Library of Congress, IA, etc. That doesn't count libraries
in other countries with non-English texts. The thing is that I can
wait. If I just couldn't wait, either I would buy a used reprint and
scan it myself or would set up any of several free OCR packages and
process the already available page images. You somewhat misunderstand
me though. I'm not saying that people would be driven away. I'm
saying that if we have a bunch of DP spinoffs and DP-like competition
going on, essentially twice or three times more books can be produced.
Even if those DP spinoffs don't post to PG, at least high quality
ebooks would be available for PG harvesting. If they all did post to
PG, instead of however many books DP currently produces per day,
multiply that by two, three, ten, etc., depending on how many
organizations there are and how well they all produce. The resources
are out there for those who want to tap them.

Regarding losing your audience, all I can say is that anyone can put
up web sites and anyone can find them with good search engines.
Between the PG newsletter (when one is actually posted), the IA
forums, Google itself, and the many other book sites and newsgroups
out there, I really don't see how you could say that you would be
driving your audience away. As with all new projects it would start
small, as one could expect, but it could grow over time as PG
Australia has. Personally, I think more of the free software people
should be targeted. I see a parallel between making and producing
free, GPL software and making and producing public domain ebooks.
Oh, there's Creative Commons also. While CC is mostly interested in
their own licenses, they also could push for more public domain
ebooks.

I don't think and never meant to imply that people's efforts should
be ignored, duplicated, etc. I don't believe in wasting effort any
more than most people. I don't want to drive people away either.
I'm just saying that with a small amount of effort to start separate
projects, more could be produced, not less. One could still post
anything they produce to PG but have multiple DPs doing the actual
work. DP already has a harvesting page dealing with IA and some
others. Split those efforts into new organizations with more
proofers and, if the founder of such splits feels strongly enough,
higher or different standards. This is similar to what DP and DP
Europe have done.
They both produce, they both post to PG, and some people proof for
both even though they are separate and deal with different areas. I
would like to see a DP Australia, or even DPs not specific to any
country.

Likewise, there could be PG spinoffs with their own standards. Let's
make up a PG spinoff which will only post html and pdf and won't
accept plain text. Well, since DP would still be posting to PG, they
would of course still be producing plain text. However, they could
post their special pdf and html to the PG spinoff. This way we have
multiple DP organizations, we still have a central PG which accepts
everything, and there would be one or many PG spinoffs which would
take only page images, only html, only pdf, only some other format,
some combination of the above, etc. If the sites are developed
correctly with appropriate keywords, I as a scholar could find some
obscure text with complete page images and a nicely formatted
document, while an average reader could find the basic plain text.
As I said previously, I would still get the last laugh because all of
the DPs and PGs would eventually be posted or harvested to the
original PG one of these days, and I can wait until that happens.

Please don't send email to me off-list as all non-list email gets
automatically deleted.

From prosfilaes at gmail.com Sun May 25 08:04:43 2008
From: prosfilaes at gmail.com (David Starner)
Date: Sun, 25 May 2008 11:04:43 -0400
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <48392D2F.6070509@baechler.net>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net>
Message-ID: <6d99d1fd0805250804g2cd42728sf8446b7d66bac366@mail.gmail.com>

You said "shut up and don't waste bandwidth for no good reason."
Try it out; remember that this is running over megabit and gigabit
fiber, but people are on much lower bandwidth. Of course,

> Please don't send email to me off-list as all non-list email gets
> automatically deleted.

also wastes bandwidth for no good reason.

I don't know how you can honestly claim that you don't want to drive
away people, if you want to get the "last laugh". You bring up DP,
but that's a red herring; the question was "Why do you stick with PG
year after year?"

From paulmaas at airpost.net Sun May 25 11:42:01 2008
From: paulmaas at airpost.net (Paul Maas)
Date: Sun, 25 May 2008 11:42:01 -0700
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <48392D2F.6070509@baechler.net>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net>
Message-ID: <1211740921.8444.1254998623@webmail.messagingengine.com>

Maybe PG needs to recast itself as a text archive. A sort of text
"commons", if you will. PG could allow multiple submissions from
different projects for a particular book title, and let the reader
pick the one they'd prefer to read. Even if PG now views itself
this way, the perception is still "PG has one core text per book."

So what is PG? What should PG be?
--
Paul Maas
paulmaas at airpost.net

From Morasch at aol.com Sun May 25 12:19:26 2008
From: Morasch at aol.com (Morasch at aol.com)
Date: Sun, 25 May 2008 15:19:26 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael said:
> you seem to think I'm not paying attention, just
> because i don't act in accordance with your wishes.

no, i don't think you're paying attention because
you haven't addressed the logical argument that
there's no purpose served by the inconsistencies
while there _are_ great benefits of consistency...

-bowerbird

From hart at pglaf.org Sun May 25 13:37:05 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 25 May 2008 13:37:05 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To: References: Message-ID:

On Sun, 25 May 2008, Morasch at aol.com wrote:

> michael said:
>> you seem to think I'm not paying attention, just
>> because i don't act in accordance with your wishes.
>
> no, i don't think you're paying attention because
> you haven't addressed the logical argument that
> there's no purpose served by the inconsistencies
> while there _are_ great benefits of consistency...
>
> -bowerbird

And you refuse, after being told so many times, to just give your
own consistency a trial run from which to garner support.

After all, if YOU are unwilling to lead, how can anyone else be
expected to follow?
You SAY you respect me because I got out there, did the grunt work
to provide an example. . . .

Now how about a little SELF-RESPECT and getting out there and setting
your own examples?

Rather than just contradicting yourself, SAYING you respect me and my
work, but NOT DOING the work.

Actions speak more loudly than words.

You have put SO many words out here that I will have to admit I can
understand those who say it is too much and thus won't read them.

JUST DO IT!!!

GO!!! GO!!! GO!!!

WIN!!! WIN!!! WIN!!!

STOP TALKING!!! DO!!!

From hart at pglaf.org Sun May 25 13:39:12 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 25 May 2008 13:39:12 -0700 (PDT)
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <1211740921.8444.1254998623@webmail.messagingengine.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com>
Message-ID:

On Sun, 25 May 2008, Paul Maas wrote:

> Maybe PG needs to recast itself as a text archive. A sort of text
> "commons", if you will. PG could allow multiple submissions from
> different projects for a particular book title, and let the reader
> pick the one they'd prefer to read. Even if PG now views itself
> this way, the perception is still "PG has one core text per book."
>
> So what is PG? What should PG be?

We have always allowed "multiple submissions" for each book.

Right back to the very beginning with Roget and Paradise Lost.

No one ever seemed to have an objection.

mh
From hart at pglaf.org Sun May 25 14:18:18 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 25 May 2008 14:18:18 -0700 (PDT)
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
Message-ID:

On Sat, 24 May 2008, Andrew Sly wrote:

> In looking up details for items in the PG catalog, I sometimes do
> a general search on an author name to try to check dates from
> another source. Surprisingly often I find other texts online by the
> same author on other sites that are not on PG--sometimes a whole
> collection of them. I believe the amount of such material which is
> around is larger than we generally realize. Even the extensive
> listings of the Online Books Page are just the tip of the iceberg.
>
> Andrew

I agree.

I would estimate there are millions of eBooks on the Net already,
not even counting Google, Carnegie Mellon, etc., just from plain
folks who want to share their favorite books.

I think we will hit 10 million public domain eBooks in 5 years--
whether or not Google, Carnegie Mellon, etc., really get project
progress into gear in a manner that makes it easy to share them.

And don't forget commercial eBooks. I wouldn't doubt there are
already a million of those.

mh

From paulmaas at airpost.net Sun May 25 15:07:09 2008
From: paulmaas at airpost.net (Paul Maas)
Date: Sun, 25 May 2008 15:07:09 -0700
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com>
Message-ID: <1211753229.9309.1255015053@webmail.messagingengine.com>

That's good to hear from your lips, Michael. So you agree then that
the PG Archive is simply a "text" commons where individuals and
organizations can place their transcribed texts to share with others?

If so, then you are right. There are no issues. PG should issue NO
requirements other than that the submitted text be transcribed from a
public domain printing, and maybe that a plain text version be
submitted along with whatever other formats the submitter wishes to
donate.

"Give me your tired, your poor,
Your digital texts yearning to breathe free,........."

But then, as I think of it, why not just move over and merge with the
Internet Archive? This will solve a lot of problems. Why should PG
continue as an independent entity apart from TIA? What purpose does
PG play any more?
Hasn't it already fulfilled its mission?

On Sun, 25 May 2008 13:39:12 -0700 (PDT), "Michael Hart" said:
> On Sun, 25 May 2008, Paul Maas wrote:
>> So what is PG? What should PG be?
>
> We have always allowed "multiple submissions" for each book.
>
> Right back to the very beginning with Roget and Paradise Lost.
>
> No one ever seemed to have an objection.
>
> mh
--
Paul Maas
paulmaas at airpost.net

From sly at victoria.tc.ca Sun May 25 15:08:21 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun, 25 May 2008 15:08:21 -0700 (PDT)
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <1211740921.8444.1254998623@webmail.messagingengine.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com>
Message-ID:

On Sun, 25 May 2008, Paul Maas wrote:

> Maybe PG needs to recast itself as a text archive. A sort of text
> "commons", if you will. PG could allow multiple submissions from
> different projects for a particular book title, and let the reader
> pick the one they'd prefer to read. Even if PG now views itself
> this way, the perception is still "PG has one core text per book."

I'm curious--where do you think that perception comes from?

I would suspect it is more a matter of the general public's
conception that a "book" just has one single state of being.

Back as far as the early 90s PG had two different texts of Paradise
Lost. And not long after that we of course had different versions of
Shakespeare plays and a few other things.

Andrew

From gbnewby at pglaf.org Sun May 25 15:14:42 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 25 May 2008 15:14:42 -0700
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <1211740921.8444.1254998623@webmail.messagingengine.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com>
Message-ID: <20080525221441.GA14738@mail.pglaf.org>

On Sun, May 25, 2008 at 11:42:01AM -0700, Paul Maas wrote:
> Maybe PG needs to recast itself as a text archive. A sort of text
> "commons", if you will.

That's the sort of thing I'm proposing, as a spin-off. Sorry I didn't
get my ideas typed up last night, but I'll try to get them done today
because I really want people's ideas & energies. [And I have no
intention of trying to detract from, take over or dis other efforts.]

I'm mostly looking to enable people who want to insert a variation of
some sort, whether it's formatting, additional content [like images]
or different ideas about the types of correction or standardization
to be applied [such as whether to fix errors in the source text].

But as far as "multiple submissions from different projects for a
particular book title," that happens all the time. What we strive
for, WHEN the sources & eBooks produced are very similar, is to
create a single core text that has all the best features. If the
only reason not to do that is the extra work involved, well, we might
hold off on the later eBook until someone has time to do the merger.
But this event doesn't happen that often.
If they are really different, even though the same title, then we
produce different eBooks. This, also, doesn't happen that frequently.

The thing is that most volunteers prefer to work on eBooks that are
not already in the collection. So, such collisions of intent are
relatively rare.

> So what is PG? What should PG be?

You can start with the opinion pieces Michael [and I, to some extent]
wrote. They're in the About area of gutenberg.org and are called
FAQs, but they're not in the FAQ area.
  -- Greg
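For what such a merger might start from, here is a sketch using
Python's standard difflib to show only the places where two
submissions of the same title disagree; the filenames are
hypothetical:

    # Sketch: surface the disagreements between two transcriptions of
    # the same title, so a volunteer can choose the best readings for
    # a merged core text. Filenames are hypothetical.
    import difflib

    with open("paradise-lost-a.txt", encoding="utf-8") as f:
        version_a = f.readlines()
    with open("paradise-lost-b.txt", encoding="utf-8") as f:
        version_b = f.readlines()

    # print a unified diff with one line of context around each change
    for line in difflib.unified_diff(version_a, version_b,
                                     fromfile="submission-a",
                                     tofile="submission-b", n=1):
        print(line, end="")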
From gbnewby at pglaf.org Mon May 26 00:56:33 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 26 May 2008 00:56:33 -0700
Subject: [gutvol-d] OpenGutenberg input sought
Message-ID: <20080526075633.GA22752@mail.pglaf.org>

As promised, here are some ideas for a semi-centralized service that
might meet some of the needs I've heard expressed recently.
I have some interns starting work with me over the upcoming weeks,
and had already given thought to such a service...so, the recent
debate on gutvol-d helped to crystallize some of my thinking.

In case it's not completely obvious, this is yet another "let 1,000
flowers bloom" approach. It's not the solution for those who want
single standards enforced. Rather, it's a somewhat more open location
for new content, variations on content, and meta-content. While all
of these things can, and do, find their way to gutenberg.org, I'm
looking for something that is somewhat easier for individual
contributors, with less of a hierarchy. And commensurately less
quality control, though mitigated by some community-based systems and
various automation.

The outline below is kind of general...in practical terms, I'm
looking to start by putting the PG content into Trac with Subversion
-- which adds Wiki & bug tracking & change tracking functionality.

People with ideas are welcome to contribute them [if there are enough
people or enough traffic, we can redirect to gutvol-p or start a new
mailing list].

The outline:

OpenGutenberg
Community Contributions for the Improvement of Public Domain EBooks
Proposal outline May 25 2008 by gbn

Purpose: Creation of user-friendly and open opportunities for adding
value to Project Gutenberg content through enhanced versions, new
formats, and community contributions such as book reviews and author
biographies. All content will be open and editable. Design choices
will make it as easy as possible to propagate enhancements back to
the "main" Project Gutenberg collection.

Features/desires. Items marked with an asterisk [*] will be
two-way coupled with the main gutenberg.org collection.

- Implementation of the Project Gutenberg catalog *
- Fully tracked change management for eBooks and other content *
  Including the ability to upload new formats, new files, new metadata
- EBook tracking & management through Subversion or similar system
- Per-book bug reporting and feature requests *
- Search features
- RSS or similar notification & subscription features
- Per book:
  WIKI [editable]
  Forks, for things like irreconcilable or orphaned versions,
  biography disputes
- Community functions:
  Reputation-based functions
  Automatic book recommendation
  Bookmarks, shareable *
  Book reviews *
- For later implementation:
  GutenbergSelf: Self-publishing of items

General policies:
- As much as possible based on open standards & open software
- As much as possible transparent, especially via text files stored
  alongside their eBooks [for example, a browsable Subversion
  repository, with metadata in XML files]
- Fully mirrorable in full or in part
- Authorization/authentication required to contribute, edit, etc., but
  not necessarily strongly verified [similar to Wikipedia]
- All content must be submitted under an open license or granted to
  the public domain
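A sketch of how the Subversion piece of this outline could work for a
single eBook, with metadata in an XML file stored alongside the text
as the policies above suggest. The repository URL, layout, and helper
function are all assumptions for the example, and a real import would
also need authentication:

    # Hypothetical sketch: import one eBook plus XML metadata into a
    # Subversion repository that Trac then exposes for browsing and
    # change tracking. Repository URL and layout are assumptions.
    import pathlib
    import subprocess
    import xml.etree.ElementTree as ET

    REPO = "http://readingroo.ms/svn/opengutenberg"  # assumed location

    def import_ebook(number, text_path, title, author):
        workdir = pathlib.Path(f"ebook-{number}")
        workdir.mkdir(exist_ok=True)
        # the eBook text itself
        (workdir / f"{number}.txt").write_bytes(
            pathlib.Path(text_path).read_bytes())
        # transparent metadata stored alongside the text, per the policies
        meta = ET.Element("ebook", number=str(number))
        ET.SubElement(meta, "title").text = title
        ET.SubElement(meta, "author").text = author
        ET.ElementTree(meta).write(str(workdir / "metadata.xml"),
                                   encoding="utf-8", xml_declaration=True)
        # one tracked revision per contribution; later corrections would
        # be ordinary "svn commit"s, each visible as a Trac changeset
        subprocess.run(["svn", "import", str(workdir),
                        f"{REPO}/{number}", "-m",
                        f"initial import of eBook #{number}"],
                       check=True)

    import_ebook(12345, "12345.txt", "An Example Title", "An Example Author")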
From schultzk at uni-trier.de Mon May 26 00:51:47 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 26 May 2008 09:51:47 +0200
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To: References: <48348A46.3000609@novomail.net>
Message-ID:

Am 23.05.2008 um 20:16 schrieb Michael Hart:

> On Fri, 23 May 2008, Schultz Keith J. wrote:
>
>> Hi Michael, hi PG, hi DP,
>>
>> Yes, you allow anyone willing to do things, yet
>> you refuse to instate stricter standards so that
>> consolidation is possible.
>
> Not promoting someone to an official "czar of standards"
> position is hardly the same as opposing their efforts.
>
> Anyone who wants to "consolidate" should have to do the
> work of "consolidation" themselves and with volunteer help,
> but NOT by a fiat standard being imposed officially.
>
> Project Gutenberg should be an open standard to as much of
> a degree as possible. . .not completely. . .but close.

I agree fully, yet PG has NO standard in the true sense.
All it has is a rough convention.

> However, there has never been ANY objection to the standards
> DP has imposed, or any other group imposes on themselves, and
> we are only too glad to help them PROMOTE those standards for
> any and all to adopt. . .but FORCING those standards: NO!!!

Just as mentioned above, there are no STANDARDS.

> The people who say we RESIST THEIR STANDARDS are seriously--
> perhaps even intentionally--misrepresenting the situation.

I have NOT proposed a particular standard as such, but have
repeatedly attempted to gain a consensus for the need, and
iterated ideas for a standard which would solve some problems.

>> Yes, humans can handle the inconsistencies, but
>> E-texts, E-books, and PG are handled by computers.
>
> I guess it all depends on what systems you are looking at.
>
> However, I repeat, Project Gutenberg is NOT a Xerox machine.

Who wants that? DP is trying! Besides, then all I would need
would be scans and PDF (or whatever one sees fit to distribute
and view the scans)!

> This was never meant to be a completely automated process,
> from end to end, which leaves room for human intervention,
> either to create those standards you say you want but will
> not actually do the work for, or by those who will resist,
> and create or maintain other standards.

The process will never be completely automated. Yet, most of
the mundane work can be done by computers. The standards make
it easier to program.

> It seems as if, had you your way, every paper book would've
> been required to be the same height for standard shelving--
> a great idea for mass-production of library or bookstore
> shelving, but somehow it just has never taken hold.

Come on, Michael. You know better. XML can hold about anything
in every shape and size. Yet! It does require strict formats.
Just a format for the format!!

> Why not?
>
> Yes, once "everybody is on board" as you say below, things
> could be much better, but you aren't even trying to get an
> early population on board to try things out.

I have tried. Yet, I have not found a big enough following.
I do not have the time for a project of the size I am thinking
about. Though I have found several that agree with me in general.

> I believe in feasibility studies.

Was it feasible to go to the Moon, Mars, Saturn? Was it feasible
for Columbus to sail the Atlantic? Was it feasible to start Yahoo,
propagate HTML, UNIX? At first was the IDEA, and it caught on.

> Why?
>
> Because I have learned from experience.
>
> Feasibility studies give that experience a home to start.
>
> If you are unwilling to start,
> you are unwilling to finish.
>
> So many complaints by people that their idea is not completed,
> when they have never even really started.
>
> Get started!!!
>
> "Build it. . .and they will come!"

If I build it alone, PG will not get it. It will be a better
mouse trap.

> Don't build it. . .and they can't come, can they?
>
> So many people want everyone on board the same train,
> yet they refuse to even lay the first mile of track,
> build the first stations, or be the little engine....
>
> that could. . . .
That is the problem PG has: it waits for a single track that goes in
one direction. I believe in being able to move in all directions.
PG can fly. Yet, in order for it to fly you need different and more
efficient standards.

Maybe I can put it a different way. I am an advocate of engineering.
Houses and machines were once built by the average Joe/Jane. These
were grossly oversized and, more often than not, not the most
efficient, simply because they were based on trial-and-error
knowledge. Today, we can build houses and machines that are not only
better and more efficient, but are also built in a more efficient
way. Why? Because there are standards which help.

There are many good ideas out there, but without more exact standards
and somebody to bring them together and to compromise, it just will
not get built, because it IS NOT FEASIBLE for the individuals. As
others have posted here, there is far too much waste of valuable
resources in PG and DP.

regards
Keith.

From schultzk at uni-trier.de Mon May 26 01:02:42 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 26 May 2008 10:02:42 +0200
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <20080524090013.GA17892@investigative.net>
References: <20080524090013.GA17892@investigative.net>
Message-ID: <59FA1B38-9762-47FC-A766-6F0D4F1293A3@uni-trier.de>

Hi Tony,

Am 24.05.2008 um 11:00 schrieb Tony Baechler:

> Hello all,
>
> I rarely post here because of the bickering and general
> disagreements, but I have to comment on many of the recent posts
> pointing out problems with PG and generally complaining on how
> things are done. I won't comment on whether any of you are right
> or not.

Too bad. All opinions are important.

> As far as I'm concerned, I don't care who's right as long as there
> are more books. As long as they're in plain text, that's good
> enough for me. :-)
>
> I do have a question for all of you who frequently point out
> problems with PG and how Michael does things.

As you so well pointed out, PG is a good source for texts.

> Why do you stick with PG year after year? Why not just abandon PG
> and start your own project? Web hosting is incredibly cheap
> nowadays. Even dedicated servers aren't THAT expensive. Michael
> and Greg Newby have offered free server space many times to anyone
> who asks, but I'm assuming that you want to distance yourselves
> from PG.

We do not want to create our own PG. We try to make it better.

> So, again I ask. Why bother? If you think PG has poor standards,
> why not create your own? OK, so you would rather use the already
> existing PG ebooks and reformat them. Fine, get the PG DVD or take
> Greg's offer of free server space and build on them. Various
> people have said that all ebooks must/should be in xml, tei, etc.
> OK, so do what blackmask.com used to do and reformat them under
> your own domain and your own standards. If you remove the PG name
> and small print, you don't even have to credit PG if I understand
> correctly. What prevents you from jumping ship and doing your own
> thing?

Such reformatting from plain text is far too tedious, and the
information needed was lost. DP is just plain too inconsistent.
I do not believe in stealing. If I use PG material I will credit
them. I respect PG's efforts.

regards
Keith.
From tb at baechler.net Mon May 26 02:55:24 2008
From: tb at baechler.net (Tony Baechler)
Date: Mon, 26 May 2008 02:55:24 -0700
Subject: [gutvol-d] OpenGutenberg input sought
In-Reply-To: <20080526075633.GA22752@mail.pglaf.org>
References: <20080526075633.GA22752@mail.pglaf.org>
Message-ID: <483A890C.9080200@baechler.net>

Greg Newby wrote:
> I'm looking to start by putting the PG content into Trac
> with Subversion -- which adds Wiki & bug tracking & change
> tracking functionality. People with ideas are welcome to
> contribute them [if there are enough people or enough traffic,
> we can redirect to gutvol-p or start a new mailing list].

By PG content do you mean the gutenberg.org web pages, the ebooks or
both? I'm not a developer, but would it really be practical to put
25,000-plus ebooks in a repository in that way? I could see the point
regarding error corrections, but maybe it would be better to adopt a
DP approach, where the older files undergo multiple proofing rounds
or something.

> Purpose: Creation of user-friendly and open opportunities for adding
> value to Project Gutenberg content through enhanced versions, new
> formats, and community contributions such as book reviews and author
> biographies. All content will be open and editable. Design choices
> will make it as easy as possible to propagate enhancements back to
> the "main" Project Gutenberg collection.

Isn't that already part of the gutenberg.org wiki? Look at the
bookshelves, etc. Anyone could create book review pages. Why not
just borrow from wikipedia where some of this has already been done?

> Implementation of the Project Gutenberg catalog *

Do you mean the catalog that already exists or are you talking about
a new and different catalog?

> Fully tracked change management for eBooks and other content *
> Including the ability to upload new formats, new files, new metadata

I presume that plain text would still be available, since this would
be a separate site from gutenberg.org, while still allowing page
images, other formats, etc. Is this correct?

> EBook tracking & management through Subversion or similar system.

I think I understand subversion and similar, but could you elaborate
on this? Would this mean, similar to software projects, that anyone
could make and contribute their changes back to any ebook? What if a
volunteer doesn't want people changing their work? What if someone
purposely inserts errors or misinformation? Who would oversee merging
the new changes? Would there be any review or quality control
process?

> Per-book bug reporting and feature requests *

Again, is this practical? If you're offering the above, anyone can
do a checkout of a particular book and fix errors themselves. That
means that the whitewashers or equivalents have vastly more work than
they used to, because they would have to "unfix" the errors which
aren't really errors or were fixed incorrectly. Going back to a
previous point, what about proprietary formats? It's good to say
below that all content must be submitted with an open license, but
what about pdf or "z.m.l" which can't easily be changed and
manipulated?

> Search features
>
> RSS or similar notification & subscription features

I thought PG had this already for new postings.
> Per book:
> WIKI [editable]
> Forks, for things like irreconcilable or orphaned versions, biography disputes

Yes, but how is an average reader supposed to know which version to download? Why not just create a new book with a different number (edition numbers?) as opposed to saying that because of disputes, there are four or five versions of book X with no clear idea of what the differences are? Speaking for myself, most of the time I just want a plain text version that's readable and has the fewest errors. I don't want to and won't sort through five or ten different "forks" to find the "right" edition that I want.

> Automatic book recommendation

How would you implement this? What criteria would determine what books are recommended? What if I want to turn that off?

> Bookmarks, shareable *
> Book reviews *

Why not make a dedicated book review site? This way it isn't limited just to PG content. By that, I mean that people having nothing to do with PG could still review public domain books and the name of PG wouldn't be directly linked to the book review site.

> For later implementation:
> GutenbergSelf: Self-publishing of items

I think this is a very bad idea. This goes against what you and Michael have said in the past, particularly that you don't want to be a vanity press. With copyrights being the way they are now, I think CC is better qualified to start such a project. As I said before, anyone can get web space and publish whatever they want. I'm not arguing for censorship, but what about explicit material? In many countries, it is not legal to post about certain explicit subjects. Who determines what is published and what isn't and makes sure laws aren't broken? What about non-books such as video and music? How would one determine that a new book called, say, Berry Potter isn't really Harry Potter book 1 in disguise, thus violating copyright laws? There are already quite a few self-publishing sites that I know of, ourmedia.org and IA coming to mind.

> General policies:
> - As much as possible based on open standards & open software
>
> - As much as possible transparent, especially via text files stored
> alongside their eBooks [for example, a browsable Subversion repository,
> with metadata in XML files]

What do you mean by this? I know what metadata and subversion are, but what do you mean by transparent?

> - Fully mirrorable in full or in part

Via rsync, ftp, or what other protocols? I would really like to see a fast rsync mirror. ibiblio.org offers an rsync server, but it isn't that fast for me. Would there be nightly or hourly snapshots? How large would the archive be?

> - Authorization/authentication required to contribute, edit, etc., but
> not necessarily strongly verified [similar to Wikipedia]
>
> - All content must be submitted under an open license or granted to
> the public domain

From rburkey2005 at earthlink.net Mon May 26 06:40:19 2008 From: rburkey2005 at earthlink.net (Ron Burkey) Date: Mon, 26 May 2008 08:40:19 -0500 Subject: [gutvol-d] OpenGutenberg input sought In-Reply-To: <20080526075633.GA22752@mail.pglaf.org> References: <20080526075633.GA22752@mail.pglaf.org> Message-ID: <483ABDC3.3030207@earthlink.net>

I think these are very good ideas, as far as they go, but I think they don't really address the issue of consistency in formatting of the etexts. But it also seems to me that there are some minor additions to the outline that would allow it to do so.
I would say that there are two basic schools of thought on the standards/consistency issue:

1. The "anything goes as long as there's a plain-text version" school of thought; and
2. The "I want an across-the-board consistent method of preparing the texts so that essential data such as italics, underlining, headings, etc. are preserved" school of thought.

I would characterize Michael Hart as being in the former school of thought. I am in the latter school of thought, but I don't want to arbitrarily throw away etexts that don't meet a standard method of preparation.

So here's my proposed addition to Greg's outline: There would *be* a standard method of markup, but the content management system would have two levels. For lack of better terminology, there would be a "RAW" level, which contained the vanilla-ASCII version, and a "FINISHED" level, which contained the marked-up versions. A text would start in the "RAW" repository, and corrections to it would be made there for a time. But eventually, it might graduate into the "FINISHED" repository when the standard markup was available; at that point, edits to the RAW version would be locked out, and all future corrections would need to be made to the FINISHED version. Furthermore, there would be some (open source) software, licensed to be permanently available (or perhaps public-domain), that could run on any platform and convert the FINISHED form to the RAW form, so that vanilla-ASCII was always available.

I couldn't care less what the FINISHED format is --- whether HTML, or XML, or Bowerbird's "no markup" thing --- as long as there is *some* standard. The point is that the standard must be official, so that it is clear whether the RAW or the FINISHED version is the one eligible for corrections.

-- Ron Burkey

Greg Newby wrote:
>The outline
>
>OpenGutenberg
>
>Community Contributions for the Improvement of Public Domain EBooks
>
>Proposal outline May 25 2008 by gbn
>
>Purpose: Creation of user-friendly and open opportunities for adding
>value to Project Gutenberg content through enhanced versions, new
>formats, and community contributions such as book reviews and author
>biographies. All content will be open and editable. Design choices
>will make it as easy as possible to propagate enhancements back to
>the "main" Project Gutenberg collection.
>
>Features/desires. Items marked with an asterisk [*] will be
>two-way coupled with the main gutenberg.org collection.
>
>Implementation of the Project Gutenberg catalog *
>
>Fully tracked change management for eBooks and other content *
> Including the ability to upload new formats, new files, new metadata
>
>EBook tracking & management through Subversion or similar system.
>Per-book bug reporting and feature requests *
>
>Search features
>
>RSS or similar notification & subscription features
>
>Per book:
> WIKI [editable]
> Forks, for things like irreconcilable or orphaned versions, biography disputes
>
>Community functions:
> Reputation-based functions
> Automatic book recommendation
> Bookmarks, shareable *
> Book reviews *
>
>For later implementation:
> GutenbergSelf: Self-publishing of items
>
>General policies:
>- As much as possible based on open standards & open software
>
>- As much as possible transparent, especially via text files stored
>alongside their eBooks [for example, a browsable Subversion repository,
>with metadata in XML files]
>
>- Fully mirrorable in full or in part
>
>- Authorization/authentication required to contribute, edit, etc., but
>not necessarily strongly verified [similar to Wikipedia]
>
>- All content must be submitted under an open license or granted to
>the public domain

From julio.reis at tintazul.com.pt Mon May 26 06:43:06 2008 From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis) Date: Mon, 26 May 2008 14:43:06 +0100 Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30 In-Reply-To: References: Message-ID: <1211809386.6590.99.camel@abetarda>

Mr. Maas wrote,
> PG could allow multiple submissions from
> different projects for a particular book title, and let the reader
> pick the one they'd prefer to read. Even if PG now views itself
> this way, the perception is still "PG has one core text per book."

Then Mr. Hart replied,
> We have always allowed "multiple submissions" for each book.
> Right back to the very beginning with Roget and Paradise Lost.

Perhaps a different database structure would help? Right now all different versions of a book are treated as different works. Not allowing multiple submissions would be a no-no I think, but these should clearly be tagged as multiple submissions. When I search for 'Paradise Lost' I should get a single result, with all variants included. . . .

More thoughts on 'What should PG be?': the answer could be 'a repository of the written word' -- could be spoken or filmed renditions of the written word, but I'd clearly leave out an animation of the Earth's globe for instance. PG could look more like a repository by (randomly?) rotating some books to the front page. The home page looks a bit geeky right now, which works for the geek in me but not for the bookworm inside. It's static, too long, and not very appealing. We could ask for more involvement -- instead of just asking for money, could we ask for submissions a bit more clearly: donate your paper book, donate your type-in, whatever.

Could be more international, too: not just mentioning US copyright freedom but other jurisdictions as well. Most of the world is life+50 or life+70, right? Since we have death dates for most authors, how about showing whether a book is free in these? Not much work, I think.

Júlio.

From richfield at telkomsa.net Sun May 25 09:54:03 2008 From: richfield at telkomsa.net (Jon Richfield) Date: Sun, 25 May 2008 18:54:03 +0200 Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Message-ID: <483999AB.6050503@telkomsa.net>

I won't favour that question with a substantial reply, but here are a few remarks. Firstly, PG could come to a tooth-rattling dead-end tomorrow and MH still would have earned his niche in the temple of human benefactors.
It is likely to be an esoteric niche, unfortunately, because people who care about books are in the minority, and those who care about books of yesteryear are in the vanishing minority, but if we all made our service to humanity conditional on achieving our own notoriety, we still would be biting ticks that had bitten us, while supplementing our diet by hunting with unworked rocks for tardy or retarded lizards.

Secondly, there are a lot of books already in PG, so it is a valuable source, and a rewarding site for publishing one's own scans. The scanned works that I publish elsewhere are the ones that are not yet free of the copyright constraints observed by PG. (So far I have patronised PG AU mostly, but I am now working on some material for Sciencemadness as well.)

As for format, while I agree that when TXT will do, it does have serious advantages, I routinely download HTM and also prepare HTM for upload. HTM is the next best thing to text for simple reading and compact format; it is well supported by software (Firefox does nicely for reading, thank you) and some works are nicely illustrated, sometimes even with artwork that constructively complements the text. Also, much of what I scan is technical, so the graphics are not really dispensable. I fully understand that there are alternatives for illustrating TXT files, but I find such measures less rewarding and more trouble. What with the availability of support tools such as Tidy, I see no reason to stint myself. I do usually supply TXT versions as well, but then it is up to the text-zealots to do as they please about missing pictures.

=============

Now, while I'm at it, a couple of requests and a confession in sackcloth and blushes. (Ashes and blushes proved too revealing.) Since my last significant exchange I made a foul-up and lost all my recent email archives. I don't like to brag, but I did a good job of it. Among the things I lost was the identity of the friend who requested that I send him some of the Gothic script from German pre-war books for training scanners. Please make a noise and I'll get onto it.

Another question is where to check on the fate of a book I submitted some months ago: "Practical Taxidermy" by Brown. Does this ring bells with anyone? If not, should I re-submit it?

Sorry to pester,

Jon

From ajhaines at shaw.ca Mon May 26 11:22:23 2008 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Mon, 26 May 2008 11:22:23 -0700 Subject: [gutvol-d] Why stick with PG? And other Gothic digressions References: <483999AB.6050503@telkomsa.net> Message-ID: <000301c8bf5d$71a93f00$6601a8c0@ahainesp2400>

Re:
> Another question is where to check on the fate of a book I submitted
> some months ago: "Practical Taxidermy" by Brown. Does this ring bells
> with anyone? If not, should I re-submit it?

No bells here, but then I'm a fairly new WWer. I'll check with the others.

Al

----- Original Message ----- From: "Jon Richfield" To: Sent: Sunday, May 25, 2008 9:54 AM Subject: [gutvol-d] Why stick with PG? And other Gothic digressions

>I won't favour that question with a substantial reply, but here are a
> few remarks. Firstly, PG could come to a tooth-rattling dead-end
> tomorrow and MH still would have earned his niche in the temple of human
> benefactors.
It is likely to be an esoteric niche unfortunately, > because people who care about books are in the minority, and those who > care about books of yesteryear are in the vanishing minority, but if we > all made our service to humanity conditional on achieving our own > notoriety, we still would be biting ticks that had bitten us, while > supplementing our diet by hunting with unworked rocks for tardy or > retarded lizards. > > Secondly, there are a lot of books already in PG, so it is a valuable > source, and a rewarding site for publishing one's own scans. The > scanned works that I publish elsewhere are the ones that are not yet > free of the copyright constraints observed by PG. (So far I have > patronised PG AU mostly, but I am now working on some material for > Sciencemadness as well.) > > As for format, while I agree that when TXT will do it does have serious > advantages, I routinely download HTM and also prepare HTM for upload. > HTM is the next best thing to text for simple reading and compact > format; it is well supported by software (Firefox does nicely for > reading, thank you) and some works are nicely illustrated, sometimes > even with artwork that constructively complements the text. Also, much > of what I scan is technical, so the graphics are not really > dispensable. I fully understand that there are alternatives for > illustrating TXT files, but I find such measures less rewarding and more > trouble. What with the availability of support tools such as Tidy, I > see no reason to stint myself. I do usually supply TXT versions as > well, but then it is up to the text-zealots to do as they please about > missing pictures. > > ============= > > Now, while I m at it, a couple of requests and a confession in sackcloth > and blushes. (Ashes and blushes proved too revealing.) Since my last > significant exchange I made a foul-up and lost all my recent email > archives. I don't like to brag, but I did a good job of it. Among the > things I lost was the identity of the friend who requested that I send > him some of the Gothic script from German pre-war books for training > scanners. Please make a noise and I'll get onto it. > > Another question is where to check on the fate of a book I submitted > some months ago: "Practical Taxidermy" By Brown. Does this ring bells > with anyone? If not, should I re-submit it? > > Sorry to pester, > > Jon > > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Mon May 26 11:38:53 2008 From: hart at pglaf.org (Michael Hart) Date: Mon, 26 May 2008 11:38:53 -0700 (PDT) Subject: [gutvol-d] Why stay with PG? In-Reply-To: References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com> Message-ID: On Sun, 25 May 2008, Andrew Sly wrote: > > > On Sun, 25 May 2008, Paul Maas wrote: > >> Maybe PG needs to recast itself as a text archive. A sort of text >> "commons", if you will. PG could allow multiple submissions from >> different projects for a particular book title, and let the reader >> pick the one they'd prefer to read. Even if PG now views itself >> this way, the perception is still "PG has one core text per book." > > I'm curious--where do you think that perception comes from? 
> I would suspect it is more a matter of the general public's
> conception that a "book" just has one single state of being.

Well, I wouldn't want to presume that our readers don't know there are many different editions of Shakespeare and Milton, or to insist on only one edition.

> Back as far as the early 90s PG had two different texts of
> Paradise Lost. And not long after that we of course had
> different versions of Shakespeare plays and a few other things.

I had always presumed we would do dozens of editions of the great classics, all of them. . . .

Michael

> Andrew
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From hart at pglaf.org Mon May 26 11:53:08 2008 From: hart at pglaf.org (Michael Hart) Date: Mon, 26 May 2008 11:53:08 -0700 (PDT) Subject: [gutvol-d] Why stay with PG? In-Reply-To: <1211753229.9309.1255015053@webmail.messagingengine.com> References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com> <1211753229.9309.1255015053@webmail.messagingengine.com> Message-ID:

On Sun, 25 May 2008, Paul Maas wrote:
> That's good to hear from your lips, Michael?
>
> So you agree then that the PG Archive is simply a "text"
> commons where individuals and organizations can place their
> transcribed texts to share with others?

Yes. We even allow people to insist their eBooks be posted with no alterations whatsoever, although we can't guarantee from now into the future that no one will ever change it.

> If so, then you are right. There are no issues. PG should
> issue NO requirements other than the submitted text is
> transcribed from a public domain printing, and maybe that a
> plain text version is submitted along with whatever other
> formats the submitter wishes to donate.

Even the "standards" messages I sent out for years said right up front that these were just "suggestions". . . .

> "Give me your tired, your poor,
> Your digital texts yearning to breathe free,........."

Absolutely!

> But then, as I think of it, why not just move over and
> merge with the Internet Archive? This will solve a lot
> of problems. Why should PG continue as an independent
> entity apart from TIA? What purpose does PG serve any
> more? Hasn't it already fulfilled its mission?

Most or all of these operations have their own agendas that would allow us to work with their books, but not be an actual part of their operations. However, we work with the IA very closely, and others.

Thanks!!!

Michael

> On Sun, 25 May 2008 13:39:12 -0700 (PDT), "Michael Hart"
> said:
>>
>> On Sun, 25 May 2008, Paul Maas wrote:
>>
>>> Maybe PG needs to recast itself as a text archive. A sort of text
>>> "commons", if you will. PG could allow multiple submissions from
>>> different projects for a particular book title, and let the reader
>>> pick the one they'd prefer to read. Even if PG now views itself
>>> this way, the perception is still "PG has one core text per book."
>>>
>>> So what is PG? What should PG be?
>>
>> We have always allowed "multiple submissions" for each book.
>>
>> Right back to the very beginning with Roget and Paradise Lost.
>>
>> No one ever seemed to have an objection.
>> >> mh >> >>> >>> >>> On Sun, 25 May 2008 02:11:11 -0700, "Tony Baechler" >>> said: >>>> David Starner wrote: >>>>> On Sat, May 24, 2008 at 5:00 AM, Tony Baechler wrote: >>>>> >>>>>> I do have a question for all of you who frequently point out problems >>>>>> with PG and how Michael does things. Why do you stick with PG year >>>>>> after year? >>>>>> >>>>> >>>>> Because PG is the source for ebooks; if I started posting somewhere >>>>> else, I'd lose half my audience. And Michael is no longer the end-all >>>>> and be-all of PG; I feel perfectly fine ignoring him and being part of >>>>> PG. >>>>> >>>>> >>>> Yes, but what about the Internet Archive? You could submit texts to >>>> them just as easily. With a little funding, you could advertise on >>>> Google and other places to bring in an audience. There are many sites >>>> which use PG files, often without credit. Search for any popular public >>>> domain book and you'll find a bunch. >>>>>> I can get >>>>>> the last laugh because it's all public domain anyway and the books will >>>>>> still eventually make their way to PG, albeit probably not by the person >>>>>> who created the DP competitors or their own versions of PG. >>>>>> >>>>> >>>>> So you get the last laugh because you're wasting effort? Do you know >>>>> that "The Brothers Karamazov" is just now going through DP? Do you >>>>> have any idea how much material is out that PG doesn't have, and >>>>> probably won't have for decades, if ever? Look at >>>>> and see just how much material has been >>>>> out there for years that we've never touched. It's surely not going to >>>>> speed up if you drive people away. >>>>> __ >>>> >>>> Yes, I think I'm aware of how much PG doesn't have. Look at Google >>>> books, the Library of Congress, IA, etc. That doesn't count libraries >>>> in other countries with non-English texts. The thing is that I can >>>> wait. If I just couldn't wait, either I would buy a used reprint and >>>> scan it myself or would set up any of several free OCR packages and >>>> process the already available page images. You somewhat misunderstand >>>> me though. I'm not saying that people would be driven away. I'm saying >>>> that if we have a bunch of DP spinoffs and DP-like competition going on, >>>> essentially twice or three times more books can be produced. Even if >>>> those DP spinoffs don't post to PG, at least high quality ebooks would >>>> be available for PG harvesting. If they all did post to PG, instead of >>>> however many books DP currently produces per day, multiply that by two, >>>> three, ten, etc, depending on how many organizations there are and how >>>> well they all produce. The resources are out there for those who want >>>> to tap them. Regarding losing your audience, all I can say is that >>>> anyone can put up web sites and anyone can find them with good search >>>> engines. Between the PG newsletter (when one is actually posted), the >>>> IA forums, Google itself, and the many other book sites and newsgroups >>>> out there, I really don't see how you could say that you would be >>>> driving your audience away. As with all new projects it would start >>>> small as one could expect, but it could grow over time as PG Australia >>>> has. Personally, I think more of the free software people should be >>>> targetted. I see a paralel between making and producing free, GPL >>>> software and making and producing public domain ebooks. Oh, there's >>>> Creative Commons also. 
While CC is mostly interested in their own >>>> licenses, they also could push for more public domain ebooks. >>>> >>>> I don't think and never meant to imply that people's efforts should be >>>> ignored, duplicated, etc. I don't believe in wasting effort anymore >>>> than most people. >>>> I don't want to drive people away either. I'm just saying that with a >>>> small amount of effort to start separate projects, more could be >>>> produced, not less. One could still post anything they produce to PG >>>> but have multiple DPs doing the actual work. DP already has a >>>> harvesting page dealing with IA and some others. Split those efforts >>>> into new organizations with more proofers and, if the founder of such >>>> splits feels strongly enough, higher or different standards. This is >>>> similar to what DP and DP Europe have done. They both produce, they >>>> both post to PG, and some people proof for both even though they are >>>> separate and deal with different areas. I would like to see a DP >>>> Australia, or even DPs not specific to any country. >>>> >>>> Likewise, their could be PG spinoffs with their own standards. Let's >>>> make up a PG spinoff which will only post html and pdf and won't accept >>>> plain text. Well, since DP would still be posting to PG, they would of >>>> course still be producing plain text. However, they could post their >>>> special pdf and html to the PG spinoff. This way we have multiple DP >>>> organizations, we still have a central PG which accepts everything, and >>>> there would be one or many PG spinoffs which would take only page >>>> images, only html, only pdf, only some other format, some combination of >>>> the above, etc. If the sites are developed correctly with appropriate >>>> keywords, I as a scholar could find some obscure text with complete page >>>> images and a nicely formatted document, while an average reader could >>>> find the basic plain text. As I said previously, I would still get the >>>> last laugh because all of the DPs and PGs would eventually be posted or >>>> harvested to the original PG one of these days and I can wait until that >>>> happens. >>>> >>>> >>>> Please don't send email to me off-list as all non-list email gets >>>> automatically deleted. >>>> _______________________________________________ >>>> gutvol-d mailing list >>>> gutvol-d at lists.pglaf.org >>>> http://lists.pglaf.org/listinfo.cgi/gutvol-d >>> -- >>> Paul Maas >>> paulmaas at airpost.net >>> >>> -- >>> http://www.fastmail.fm - I mean, what is it about a decent email service? 
>>> >>> _______________________________________________ >>> gutvol-d mailing list >>> gutvol-d at lists.pglaf.org >>> http://lists.pglaf.org/listinfo.cgi/gutvol-d >>> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d > -- > Paul Maas > paulmaas at airpost.net > > -- > http://www.fastmail.fm - Same, same, but different > From gbnewby at pglaf.org Mon May 26 12:49:23 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 26 May 2008 12:49:23 -0700 Subject: [gutvol-d] OpenGutenberg input sought In-Reply-To: <483A890C.9080200@baechler.net> References: <20080526075633.GA22752@mail.pglaf.org> <483A890C.9080200@baechler.net> Message-ID: <20080526194923.GB4492@mail.pglaf.org> On Mon, May 26, 2008 at 02:55:24AM -0700, Tony Baechler wrote: > Greg Newby wrote: > > I'm looking to start by putting the PG content into Trac > > with Subversion -- which adds Wiki & bug tracking & change > > tracking functionality. People with ideas are welcome to > > contribute them [if there are enough people or enough traffic, > > we can redirect to gutvol-p or start a new mailing list]. > > > > > By PG content are you meaning the gutenberg.org web pages, the ebooks or > both? I'm not a developer but would it really be practical to put > 25,000-plus ebooks in a repository in that way? I could see the point > regarding error corrections, but maybe it would be better to adopt a DP > approach, where the older files undergo multiple proofing rounds or > something. Not the Web pages, the 25,000-plus eBooks. Yes, it's a rather large repository by some standards. Your idea for a DP approach that supports multiple proofing rounds for titles already within the PG collection is a good one, but it's not something I'm pursuing. > > The outline > > > > OpenGutenberg > > > > Community Contributions for the Improvement of Public Domain EBooks > > > > Proposal outline May 25 2008 by gbn > > > > > > Purpose: Creation of user-friendly and open opportunities for adding > > value to Project Gutenberg content through enhanced versions, new > > formats, and community contributions such as book reviews and author > > biographies. All content will be open and editable. Design choices > > will make it as easy as possible to propagate enhancements back to > > the "main" Project Gutenberg collection. > > > > > Isn't that already part of the gutenberg.org wiki? Look at the > bookshelves, etc. Anyone could create book review pages. Why not just > borrow from wikipedia where some of this has already been done? Yes, it is already part of the gutenberg.org wiki with the limitation that people can't add their own content to the "main" catalog page for an eBook. I agree that Wikipedia is a better location for things like author bios, and in fact I hope that's what people use first. The point is editable content, focused on a chosen eBook. So if someone wants to type a few quick notes on an author or title...or something that's not adequately researched for a wikipedia article...or some sort of family history related to an author...this might be a good forum. > > Features/desires. Items marked with an asterisk [*] will be > > two-way coupled with the main gutenberg.org collection. > > > > Implementation of the Project Gutenberg catalog * > > > > > Do you mean the catalog that already exists or are you talking about a > new and different catalog? That catalog that already exists. 
But my notion is to make some derivative products, automatically derived from the catalog. In particular, I'm interested in an XML metadata file withIN the eBook directory. That's the sort of method that the Internet Archive uses, and I like the idea of keeping metadata close to the eBook's other files.

As some people might not know, there is a small but devoted catalog team where catalog maintenance is done. While the work is never-ending, I think the current catalog has a lot of goodness. No need to supersede or reinvent.

> > Fully tracked change management for eBooks and other content *
> > Including the ability to upload new formats, new files, new metadata
>
> I presume that plain text would still be available since this would be a
> separate site from gutenberg.org while still allowing page images, other
> formats, etc. Is this correct?

Everything from gutenberg.org will be imported essentially immediately. The key is that OTHER stuff can be added.

> > EBook tracking & management through Subversion or similar system.
>
> I think I understand subversion and similar, but could you elaborate on
> this? Would this mean that, similar to software projects, anyone
> could make and contribute their changes back to any ebook? What if a
> volunteer doesn't want people changing their work? What if someone
> purposely inserts errors or misinformation? Who would oversee merging
> the new changes? Would there be any review or quality control process?

Good questions, and I think the answers depend on just how many people, of what sorts of interests, make contributions and have motivation to do community policing. I'm not planning much central structure. Unlike [my understanding of] Wikipedia, I want to build in the capability of forks. So, when there is substantial disagreement, BOTH views can persist independently...and both with the ability for attachable commentary. I'm not trying to reinvent something like Wikipedia. From a quality control standpoint, this is much more like the blogosphere than Wikipedia.

> > Per-book bug reporting and feature requests *
>
> Again, is this practical? If you're offering the above, anyone can do a
> checkout of a particular book and fix errors themselves. That means
> that the whitewashers or equivalents would have vastly more work than they
> do now, because they would have to "unfix" the errors which aren't
> really errors or were fixed incorrectly. Going back to a previous
> point, what about proprietary formats? It's good to say below that all
> content must be submitted with an open license, but what about pdf or
> "z.m.l" which can't easily be changed and manipulated?

The existing process for getting fixes back into the main PG collection is not changed in any way. The thing added is the ability for people to get their fixes "out there" immediately, something that is not working well with the current approach to errata.

As for proprietary formats: you are right that the problem of regenerating new formats is an issue. Over time some titles will end up with various mis-matched formats. Avoiding such is one of the things we strive for in the main gutenberg.org collection. For this new collection, it's not one of my design criteria. HOWEVER, one feature I do want is for people to "subscribe" to a particular eBook, so that those who make new formats can at least know when something changes for that title.

> > Search features
> >
> > RSS or similar notification & subscription features
>
> I thought PG had this already for new postings.
Sure, and it's part of my "home" page setting in Firefox. But we need a new feed for the other added stuff.

> > Per book:
> > WIKI [editable]
> > Forks, for things like irreconcilable or orphaned versions, biography disputes
>
> Yes, but how is an average reader supposed to know which version to
> download? Why not just create a new book with a different number
> (edition numbers?) as opposed to saying that because of disputes, there
> are four or five versions of book X with no clear idea of what the
> differences are? Speaking for myself, most of the time I just want a
> plain text version that's readable and has the fewest errors.
> I don't want to and won't sort through five or ten different "forks" to
> find the "right" edition that I want.

You're envisioning a top-down approach to quality control and authoritativeness. That's not a design goal. Practically speaking, I think people will mostly just want the most recent date in the format they desire. It's not that complicated. We're not talking about something like the LDS or Ron Paul page on Wikipedia, or some other contentious content...we're talking about literary works.

> > Automatic book recommendation
>
> How would you implement this? What criteria would determine what books
> are recommended? What if I want to turn that off?

There are a lot of algorithms for automatic book recommendations; I haven't chosen one. I don't think there's anything to turn off...it will be something you ask for. I'm not talking about having "you might also be interested in...." on every screen, like you see at Amazon etc. Just some way of finding a book of interest, by request.

> > Bookmarks, shareable *
> > Book reviews *
>
> Why not make a dedicated book review site? This way it isn't limited
> just to PG content. By that, I mean that people having nothing to do
> with PG could still review public domain books and the name of PG
> wouldn't be directly linked to the book review site.

I don't really think there is a shortage of places where people can put their book reviews. This is for reviews of the content that's part of the site. I have no objection to people putting it in the gutenberg.org wiki or Wikipedia or whatever, and just linking from this new site.

> > For later implementation:
> > GutenbergSelf: Self-publishing of items
>
> I think this is a very bad idea. This goes against what you and Michael
> have said in the past, particularly that you don't want to be a vanity
> press.

We don't want gutenberg.org to be a vanity press. The GutenbergSelf concept *is* about vanity press, basically. Since people will browse or search for content, I imagine making it easy to search just particular "types" of content...by license, by source, etc. as well as author/title/subject.

> With copyrights being the way they are now, I think CC is better
> qualified to start such a project.

Do you mean Creative Commons? I didn't know they were doing anything with hosting eBook content.

> As I said before, anyone can get web
> space and publish whatever they want.

You're right. Yet I get dozens of requests per week from people looking to use Project Gutenberg for their content. In addition to some name-brand recognition, the main thing I think PG offers is a likelihood of permanence. To me, that is key [and it's why mirroring is specifically facilitated].

> I'm not arguing for censorship,
> but what about explicit material? In many countries, it is not legal to
> post about certain explicit subjects.
Who determines what is published > and what isn't and makes sure laws aren't broken? What about non-books > such as video and music? How would one determine that a new book > called, say, Berry Potter isn't really Harry Potter book 1 in disguise, > thus violating copyright laws? There are already quite a few > self-publishing sites that I kno of, ourmedia.org and IA coming to mind. Everything will be US-based, and follow US laws for copyright & the few laws dealing with obscenity. By having all postings be non-anonymous, with some sort of waiting period built in for new posters, I think we'll avoid some of the more obvious stuff like people posting Harlan Ellison's works. Also, we'll keep generally on an eBook mission, to avoid becoming another allofmp3.com or pirate bay. While a community reputation system can be jigged, as we see on eBay, the stakes will be lower on this site. > > General policies: > > - As much as possible based on open standards & open software > > > > - As much as possible transparent, especially via text files stored > > alongside their eBooks [for example, a browsable Subversion repository, > > with metadata in XML files] > > > > > What do you mean by this? I know what metadata and subversion are but > what do you mean by transparent? Simply that you can get a directory listing. This is something the Internet Archive does for their eBooks. > > - Fully mirrorable in full or in part > > > > > Via rsync, ftp, or what other protocols? I would really like to see a > fast rsync mirror. ibiblio.org offers a rsync server but it isn't that > fast for me. Would there be nightly or hourly snapshots? How large > would the archive be? Yes, the ibiblio.org rsync server is very slow! I'm redirecting people to the readingroo.ms rsync server instead. ftp, http...yes. For snapshots, do you mean, how quickly will this new service reflect changes to the main gutenberg.org site? The answer is, immediately: - I'll push new titles to the new server, at the same time titles go to gutenberg.org - We'll re-import & update the catalog as soon as it's regenerated, daily > > - Authorization/authentication required to contribute, edit, etc., but > > not necessarily strongly verified [similar to Wikipedia] > > > > - All content must be submitted under an open license or granted to > > the public domain > > Those were some really good comments, thanks. As you see, I'm not trying to scratch everyone's itch, and there are both technical and social challenges to what I'm thinking of. If anyone wants to contribute some expertise, or even start something not quite the same, drop me a note. -- Greg From Bowerbird at aol.com Mon May 26 16:41:15 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 May 2008 19:41:15 EDT Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Message-ID: jon richfield said: > I do usually supply TXT versions as well, but then it is > up to the text-zealots to do as they please about missing pictures. jon, please just list the filename of each graphic at the point where it is to be included in the text, and tomorrow's e-text viewers will insert it there. thanks. -bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080526/65d8e7ca/attachment.htm

From prosfilaes at gmail.com Mon May 26 18:43:20 2008 From: prosfilaes at gmail.com (David Starner) Date: Mon, 26 May 2008 21:43:20 -0400 Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30 In-Reply-To: <1211809386.6590.99.camel@abetarda> References: <1211809386.6590.99.camel@abetarda> Message-ID: <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com>

On Mon, May 26, 2008 at 9:43 AM, Júlio Reis wrote:
> Could be more international, too: not just mentioning US copyright
> freedom but other jurisdictions as well. Most of the world is life+50 or
> life+70, right? Since we have death dates for most authors, how about
> showing whether a book is free in these? Not much work, I think.

I suspect we don't have death dates for the majority of the authors in the system, though we probably have death dates for the authors of the majority of the books in the system. Moreover, you simplify the issue; the question is not just the death date of the author, but also the illustrator, author of the introduction, and possibly the editor. Stating whether or not a book is out of copyright outside the one jurisdiction we've carefully vetted it for is ill-advised, IMO.

From sly at victoria.tc.ca Mon May 26 22:31:42 2008 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon, 26 May 2008 22:31:42 -0700 (PDT) Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30 In-Reply-To: <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com> References: <1211809386.6590.99.camel@abetarda> <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com> Message-ID:

On Mon, 26 May 2008, David Starner wrote:
>I suspect we don't have death dates for the majority of the authors in
>the system, though we probably have death dates for the authors of the
>majority of the books in the system. Moreover, you simplify the
>issue; the question is not just the death date of the author, but also
>the illustrator, author of the introduction, and possibly the editor.
>Stating whether or not a book is out of copyright outside the one
>jurisdiction we've carefully vetted it for is ill-advised, IMO.

Yes, that is true. Perhaps worth noting is that copyright is not actually as simple as life+50 or life+70. Each country may have additional rules or exceptions that apply in particular circumstances, or depending on the nature of the copyrighted material. For instance, I seem to recall that France has an extended copyright term for works written by French citizens who died in service to their country during the world wars.

Andrew

From traverso at posso.dm.unipi.it Mon May 26 22:41:46 2008 From: traverso at posso.dm.unipi.it (Carlo Traverso) Date: Tue, 27 May 2008 07:41:46 +0200 (CEST) Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30 In-Reply-To: (message from Andrew Sly on Mon, 26 May 2008 22:31:42 -0700 (PDT)) References: <1211809386.6590.99.camel@abetarda> <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com> Message-ID: <20080527054146.64A4493B61@posso.dm.unipi.it>

>>>>> "Andrew" == Andrew Sly writes:

Andrew> On Mon, 26 May 2008, David Starner wrote:
>> I suspect we don't have death dates for the majority of the
>> authors in the system, though we probably have death dates for
>> the authors of the majority of the books in the system.
>> Moreover, you simplify the issue; the question is not just the
>> death date of the author, but also the illustrator, author of
>> the introduction, and possibly the editor.
Stating whether or >> not a book is out of copyright outside the one jurisdiction >> we've carefully vetted it for is ill-advised, IMO. Andrew> Yes, that is true. Perhaps worth noting is that copyright Andrew> is not actually as simple as life+50 or life+70. Each Andrew> country may have additional rules or exceptions that apply Andrew> in particular circumstances, or depending on the nature of Andrew> the copyrighted material. For instance, I seem to recall Andrew> that France has an extended copyright term for works Andrew> written by French citizens who died in service to their Andrew> country during the world wars. True, but we may give a reversed information: indicate the books for which one of the creators is known to have been died later than 50 or 70 years ago. This simplifies life for people wanting to investigate the copyright status of the works, since the answer is immediately "No". Carlo From gbnewby at pglaf.org Mon May 26 22:45:12 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 26 May 2008 22:45:12 -0700 Subject: [gutvol-d] OpenGutenberg input sought In-Reply-To: <483ABDC3.3030207@earthlink.net> References: <20080526075633.GA22752@mail.pglaf.org> <483ABDC3.3030207@earthlink.net> Message-ID: <20080527054512.GC12974@mail.pglaf.org> On Mon, May 26, 2008 at 08:40:19AM -0500, Ron Burkey wrote: > I think these are very good ideas, as far as they go, but I think they > don't really address the issue of consistency in formatting of the > etexts. But it also seems to me that there are some minor additions to > the outline that allow it to do so. Thanks, Ron. I'm not primarily interested in addressing consistency. That's the itch that some people want to have scratched, but it's not one that motivates me very much. However: > I would say that there are two basic schools of thought on the > standards/consistency-of-thought issue: > > 1. The "anything goes as long as there's a plain-text version" school > of thought; and > 2. The "I want an across-the-board consistent method of preparing the > texts so that essential data such as italics, underlining, > headings, etc. are preserved" school of thought. > > I would characterize Michael Hart as being in the former school of > though. I am in the latter school of thought, but I don't want to > arbitrarily throw away etexts that don't meet a standard method of > preparation. Overgeneralizations, but I certainly understand the sentiments in the two mindsets. > So here's my proposed addition to Greg's outline: There would *be* a > standard method of markup, but the content management system would have > two levels. For lack of better terminilogy, there would be a "RAW" > level, which contained the vanilla-ASCII version, and a "FINISHED" > level, which contained the marked-up versions. A text would start in > the "RAW" repository, and corrections to it would be made there for a > time. But eventually, it might graduate into the "FINISHED" repository > when the standard markup was available; at that point, edits to the RAW > version would be locked out, and all future corrections would need to be > made to the FINISHED version. Furthermore, there would be some (open > source) software, licensed to be permanently availabe (or perhaps > public-domain) that could run on any platform that could convert the > FINISHED form to the RAW form, so that vanilla-ASCII was always available. That's not a bad idea at all. Sort of an automated, or semi-automated way of having content "blessed." [In the Perl sense.] 
> I couldn't care less what the FINISHED format is --- whether HTML, or
> XML, or Bowerbird's "no markup" thing --- as long as there is *some*
> standard. The point is that the standard must be official,
> so that it is clear whether the RAW or the FINISHED version
> is the one eligible for corrections.

Of course, my philosophy isn't to block RAW texts from usage, or improvement, etc.

I think a basic way to achieve this is simply through metadata, so that a search can retrieve only the FINISHED versions.

Important to me is that if FINISHED for one group means ZML, and for another means TeX, we can allow multiple definitions. I know this isn't comfortable for everyone, but:

The real key, to me, is technical: having some excellent guidelines, examples, and technical tools to help people bring their eBook to FINISHED. I know some folks have ideas on this already, and encourage them to write up what they have [as software, guidelines/policies, HOWTOs, etc.], even if they're not complete or fully automated.

Thanks for your ideas! Despite not necessarily being my personal itch, I think the idea of a FINISHED version is consistent & doable.
-- Greg

> Greg Newby wrote:
>
> >The outline
> >
> >OpenGutenberg
> >
> >Community Contributions for the Improvement of Public Domain EBooks
> >
> >Proposal outline May 25 2008 by gbn
> >
> >Purpose: Creation of user-friendly and open opportunities for adding
> >value to Project Gutenberg content through enhanced versions, new
> >formats, and community contributions such as book reviews and author
> >biographies. All content will be open and editable. Design choices
> >will make it as easy as possible to propagate enhancements back to
> >the "main" Project Gutenberg collection.
> >
> >Features/desires. Items marked with an asterisk [*] will be
> >two-way coupled with the main gutenberg.org collection.
> >
> >Implementation of the Project Gutenberg catalog *
> >
> >Fully tracked change management for eBooks and other content *
> > Including the ability to upload new formats, new files, new metadata
> >
> >EBook tracking & management through Subversion or similar system.
> >
> >Per-book bug reporting and feature requests *
> >
> >Search features
> >
> >RSS or similar notification & subscription features
> >
> >Per book:
> > WIKI [editable]
> > Forks, for things like irreconcilable or orphaned versions, biography
> > disputes
> >
> >Community functions:
> > Reputation-based functions
> > Automatic book recommendation
> > Bookmarks, shareable *
> > Book reviews *
> >
> >For later implementation:
> > GutenbergSelf: Self-publishing of items
> >
> >General policies:
> >- As much as possible based on open standards & open software
> >
> >- As much as possible transparent, especially via text files stored
> >alongside their eBooks [for example, a browsable Subversion repository,
> >with metadata in XML files]
> >
> >- Fully mirrorable in full or in part
> >
> >- Authorization/authentication required to contribute, edit, etc., but
> >not necessarily strongly verified [similar to Wikipedia]
> >
> >- All content must be submitted under an open license or granted to
> >the public domain

From Bowerbird at aol.com Mon May 26 23:54:25 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 27 May 2008 02:54:25 EDT Subject: [gutvol-d] OpenGutenberg input sought Message-ID:

greg said:
> I'm not primarily interested in addressing consistency.
> That's the itch that some people want to have scratched,
> but it's not one that motivates me very much.

yet you're unclear why o.l.p.c. couldn't use your library. sad, but true. oh well...

> The real key, to me, is technical: having some
> excellent guidelines, examples, and technical tools
> to help people bring their eBook to FINISHED.

guideline #1: consistency.

-bowerbird

************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080527/db6932f0/attachment.htm

From julio.reis at tintazul.com.pt Tue May 27 02:48:26 2008 From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis) Date: Tue, 27 May 2008 10:48:26 +0100 Subject: [gutvol-d] Dates of death and copyright status outside the US Message-ID: <1211881706.9979.65.camel@abetarda>

Hello all, thanks for replying.

David said,
> Moreover, you simplify the issue; the question is not just
> the death date of the author, but also the illustrator,
> author of the introduction, and possibly the editor.

I'm not simplifying any issue. We are discussing legal issues, so my concept of "author" is a legal one: "author" being any person whom we need to consider for copyright issues. I am *assuming* we are listing in the catalog all such 'authors' for any given work. It's relevant to credit all authors for database search issues, and also ethically important.

Andrew said,
> Perhaps worth noting is that copyright is not
> actually as simple as life+50 or life+70. Each country may have
> additional rules or exceptions that apply in particular
> circumstances, or depending on the nature of the copyrighted
> material. For instance, I seem to recall that France has an
> extended copyright term for works written by French citizens
> who died in service to their country during the world wars.

I know about France and the issues regarding (lack of) copyright harmonization inside European Union countries. I am not asking PG to give copyright information for any given country or region other than the USA -- like Carlo pointed out, we only want PG to help people find such information regarding their own jurisdiction.

I think you should be showing the date of publication already in the catalog. It's very relevant for people searching it; I see several editions of 'Paradise Lost', but which is the 1845 one?

I think I didn't come across clearly the first time around; what I am asking in terms of authors is quite simply the number of years since the death of the last author. For any given work:

1) Do we have death dates for all 'authors'? If no, then stop: we can't provide any information.
2) Pick the last date of death of the work's 'authors'.
3) Years since last death = today's year, minus date of last death, minus one.

That's just it. You send me the database schema, and I'll send you back the SQL statement to create that information. It's not complicated. I am one of the many maintainers of the PG catalog. Why go through the trouble of researching dates of death if we don't need those for the USA anyway? It's a small step to do the math afterwards.

Júlio.
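Júlio's three steps translate directly into a short query. Below is a minimal sketch of what he describes, in Python with sqlite3; the two-table schema and all table and column names are invented for illustration, since the real PG catalog layout may differ:

    import sqlite3
    from datetime import date

    # Hypothetical schema: one row per book, one row per (book, creator)
    # pair. A NULL death_year means the date of death is unknown.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE creators (book_id INTEGER REFERENCES books(id),
                           name TEXT, death_year INTEGER);
    INSERT INTO books VALUES (1, 'Paradise Lost'), (2, 'Example Anthology');
    INSERT INTO creators VALUES (1, 'John Milton', 1674),
                                (2, 'Author A', 1950), (2, 'Author B', NULL);
    """)

    # Step 1: a book qualifies only if every creator has a known death year
    # (COUNT(death_year) skips NULLs; comparing it to COUNT(*) checks this).
    # Steps 2-3: take the latest death year, subtract it from today's year,
    # then subtract one, exactly per Júlio's formula.
    rows = con.execute("""
        SELECT b.title,
               CASE WHEN COUNT(*) = COUNT(c.death_year)
                    THEN :this_year - MAX(c.death_year) - 1
                    ELSE NULL END AS years_since_last_death
        FROM books b JOIN creators c ON c.book_id = b.id
        GROUP BY b.id
    """, {"this_year": date.today().year}).fetchall()

    for title, years in rows:
        print(title, years)  # 'Example Anthology' prints None: no information

As David points out in the next message, the output is only as good as the creator list: a missing illustrator or editor silently skews the result.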
From prosfilaes at gmail.com Tue May 27 03:32:45 2008 From: prosfilaes at gmail.com (David Starner) Date: Tue, 27 May 2008 06:32:45 -0400 Subject: [gutvol-d] Dates of death and copyright status outside the US In-Reply-To: <1211881706.9979.65.camel@abetarda> References: <1211881706.9979.65.camel@abetarda> Message-ID: <6d99d1fd0805270332x7e15dc02t29492826dc2f8210@mail.gmail.com>

On Tue, May 27, 2008 at 5:48 AM, Júlio Reis wrote:
> I am *assuming* we are listing in the catalog all such 'authors' for any
> given work. It's relevant to credit all authors for database search
> issues, and also ethically important.

We aren't. We won't. A collection of a hundred short stories or a hundred paintings or both is just not feasible to list in the catalog.

> I think I didn't come across clearly the first time around; what I am
> asking in terms of authors is quite simply the number of years since the
> death of the last author. For any given work:

That's the part that's trivial; do we really need a computer to do it? It can also be misleading, since we don't have all the authors in the system.

> I am one of the many maintainers of the PG catalog. Why go through the
> trouble of researching dates of death if we don't need those for the USA
> anyway? It's a small step to do the math afterwards.

Because it's included in LoC records and is one easy way to distinguish authors with the same name.

From rburkey2005 at earthlink.net Tue May 27 06:15:39 2008 From: rburkey2005 at earthlink.net (Ron Burkey) Date: Tue, 27 May 2008 08:15:39 -0500 Subject: [gutvol-d] OpenGutenberg input sought In-Reply-To: <20080527054512.GC12974@mail.pglaf.org> References: <20080526075633.GA22752@mail.pglaf.org> <483ABDC3.3030207@earthlink.net> <20080527054512.GC12974@mail.pglaf.org> Message-ID: <1211894139.21896.15.camel@software1.heads-up.local>

On Mon, 2008-05-26 at 22:45 -0700, Greg Newby wrote:
> On Mon, May 26, 2008 at 08:40:19AM -0500, Ron Burkey wrote:
> > I couldn't care less what the FINISHED format is --- whether HTML, or
> > XML, or Bowerbird's "no markup" thing --- as long as there is *some*
> > standard. The point is that the standard must be official,
> > so that it is clear whether the RAW or the FINISHED version
> > is the one eligible for corrections.
>
> Of course, my philosophy isn't to block RAW texts from usage,
> or improvement, etc.
>
> I think a basic way to achieve this is simply through metadata,
> so that a search can retrieve only the FINISHED versions.
>
> Important to me is that if FINISHED for one group means
> ZML, and for another means TeX, we can allow multiple definitions.
> I know this isn't comfortable for everyone, but:
>
> The real key, to me, is technical: having some excellent guidelines,
> examples, and technical tools to help people bring their eBook to
> FINISHED. I know some folks have ideas on this already, and encourage
> them to write up what they have [as software, guidelines/policies,
> HOWTOs, etc.], even if they're not complete or fully automated.
>
> Thanks for your ideas! Despite not necessarily being my
> personal itch, I think the idea of a FINISHED version is
> consistent & doable.
> -- Greg

Well, I know that talking about "standards" does seem a little restrictive, given the sort of free-wheeling volunteer nature of PG. But I do think that "guidelines" are more appropriate for the RAW repository and "standards" are more appropriate for the FINISHED repository.
Of course, just because I say that I think PG should have a standard for how the FINISHED texts are formatted, that's not to say that the standard must be exclusive. The PG standard could be "the FINISHED texts shall be HTML 4.0 *or* DocBook v4.5 *or* ZML vWhatever". The key ideas are that there should be a clear distinction as to whether the RAW version or the FINISHED version is the master version, in the sense of being maintained in the future, and that there should be some established procedure for graduating an etext from RAW status to FINISHED status.

For example, if the PG standard was that FINISHED texts were HTML 4.0, it doesn't necessarily imply that just because somebody supplied an HTML 4.0 version of a text it should graduate into the FINISHED archive. It's possible that the HTML version of the etext was itself crummy, and should be put back into the RAW archive along with the vanilla ASCII version. Perhaps there would need to be an editor who can look at a candidate for graduation and say, "yes, this is good enough!" So there are procedural issues as well as simple technical issues with my proposal.

-- Ron

From jeroen.mailinglist at bohol.ph Tue May 27 14:06:55 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue, 27 May 2008 23:06:55 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30
In-Reply-To: <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com>
References: <1211809386.6590.99.camel@abetarda> <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com>
Message-ID: <483C77EF.6010301@bohol.ph>

Although it is nice background information to add death dates for authors, these are not by themselves enough to clear the copyright of any particular work outside the US. The laws are just too complicated. This is why I, although working from the Netherlands, leave it to PG to republish things, and won't do it myself. Life+something laws are a pain, especially given the draconian measures now proposed to enforce copyrights...

Jeroen.

David Starner wrote:
> On Mon, May 26, 2008 at 9:43 AM, Júlio Reis wrote:
>
>> Could be more international, too: not just mentioning US copyright
>> freedom but other jurisdictions as well. Most of the world is life+50 or
>> life+70, right? Since we have death dates for most authors, how about
>> showing whether a book is free in these? Not much work I think.
>
> I suspect we don't have death dates for the majority of the authors in
> the system, though we probably have death dates for the authors of the
> majority of the books in the system. Moreover, you simplify the
> issue; the question is not just the death date of the author, but also
> the illustrator, author of the introduction, and possibly the editor.
> Stating whether or not a book is out of copyright outside the one
> jurisdiction we've carefully vetted it for is ill-advised, IMO.
From jeroen.mailinglist at bohol.ph Tue May 27 14:11:35 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue, 27 May 2008 23:11:35 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30
In-Reply-To: <20080527054146.64A4493B61@posso.dm.unipi.it>
References: <1211809386.6590.99.camel@abetarda> <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com> <20080527054146.64A4493B61@posso.dm.unipi.it>
Message-ID: <483C7907.8000303@bohol.ph>

Carlo Traverso wrote:
> True, but we may give the reversed information: indicate the books for
> which one of the creators is known to have died later than 50 or
> 70 years ago. This simplifies life for people wanting to investigate
> the copyright status of the works, since the answer is immediately
> "No".

I also do not agree with that approach. We need not support life+something systems in any way; just let us be agnostic about it, or keep it with the Your Copyright Mileage May Vary notice we have today. Furthermore, in many countries (the UK being an exception), it is perfectly legal to download about anything as long as it is for private use or study. In the EU, the European Treaty on Human Rights guarantees the freedom to collect information, and the Universal Declaration of Human Rights also declares access to cultural heritage an inalienable human right. (Making the UK law a violation of human rights.)

Jeroen.

From jeroen.mailinglist at bohol.ph Tue May 27 14:23:25 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue, 27 May 2008 23:23:25 +0200
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To: <1211881706.9979.65.camel@abetarda>
References: <1211881706.9979.65.camel@abetarda>
Message-ID: <483C7BCD.2070309@bohol.ph>

Júlio Reis wrote:
> That's just it. You send me the database schema, and I'll send you back
> the SQL statement to create that information. It's not complicated.

You are missing a lot of pieces of information that might be relevant when establishing the copyright status of works, including the answers to some of the following odd questions.

- What was the citizenship of the author at birth, and did he hold other citizenships during his lifetime, and when? (Einstein was born German, became Swiss in 1901, and a US citizen in 1940.) Does the fact that he retained Swiss citizenship make his works eligible for WTO restoration in the US?
- Can we apply the rule of the shorter term for Soviet authors published before 1974?
- What is the citizenship of somebody born in Gdansk in 1924?
- What is the influence of the place of first publication, and of subsequent publications in other jurisdictions?
- Are wartime extensions in place; did the author "die for France"? (France)
- Are transitional rules in place; does the author enjoy life+80? (Spain)
- Do we have to deal with the Great Ormond Street Hospital Children's Charity and the CDPA 1988, Schedule 6, Section 301? (UK)

And so on...

From Bowerbird at aol.com Tue May 27 14:28:45 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 27 May 2008 17:28:45 EDT
Subject: [gutvol-d] an inalienable human right
Message-ID:

jeroen said:
> the Universal Declaration of Human Rights also declares
> access to cultural heritage an inalienable human right

_now_ we're talking...

-bowerbird
From Bowerbird at aol.com Tue May 27 14:38:33 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 27 May 2008 17:38:33 EDT
Subject: [gutvol-d] raw to finished
Message-ID:

by the way, roger frank (rfrank) from distributed proofreaders has been working on a system to allow automated conversions from his "master" format into .txt and .html, and latex and .pdf.

roger's master is a "dot command" approach, a throwback to the earliest days of computer typesetting (e.g., roff/troff, and so on).

> http://fadedpage.com/

maybe along the line he'll realize the .txt file could be his master...

-bowerbird

From Bowerbird at aol.com Wed May 28 01:55:57 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 28 May 2008 04:55:57 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael said:
> And you refuse, after being told so many times,
> just give your own consistency a trial run from
> which to garner support.

who gives a tinker's damn about "support"? not me.

> You have put SO many works out here that I will
> have to admit I can understand those who say it
> is too much and thus won't read them.

i think you should be smarter than to repeat the "words" of others that caused them to lose all of their credibility.

i have plenty of examples up for anyone who needs 'em.

i am way past the point of doing books one at a time, or even 5 or 10, or 50 or 100 at a time. i'm figuring out how to do them in bunches of a thousand, and ten thousand...

-bowerbird

From richfield at telkomsa.net Tue May 27 12:41:21 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Tue, 27 May 2008 21:41:21 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: <483C63E1.5000703@telkomsa.net>

Al,

Thanks, David W has picked up with me again. No doubt I'll soon be back on track.

BB

> jon, please just list the filename of each graphic
> at the point where it is to be included in the text,
> and tomorrow's e-text viewers will insert it there.

Errr, yeah, sortakinda, but why can't they get the full content and format as easily from the HTM file? TXT files have all sorts of funnies when there are interjected Greek (or Arabic, or swelpme, ancient Egyptian) characters in the text. I do include txt files for PG USA because they insist, but so far only enough to satisfy the WW team. Without venturing into the firefights about tools and formats, which interest me a lot less than getting the stuff on line, readable, and available, I don't see why more than what I have described is necessary from my point of view.
Lazy of me of course, but time is a diminishing resource and so is my capacity for time sharing.

Cheers,

Jon

From schultzk at uni-trier.de Wed May 28 02:39:06 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Wed, 28 May 2008 11:39:06 +0200
Subject: [gutvol-d] raw to finished
In-Reply-To:
References:
Message-ID: <65691217-76E6-4E1D-9DB5-A49BA2A1CCBB@uni-trier.de>

Hi,

On 27.05.2008 at 23:38, Bowerbird at aol.com wrote:
> by the way, roger frank (rfrank) from distributed proofreaders
> has been working on a system to allow automated conversions
> from his "master" format into .txt and .html, and latex and .pdf.
>
> roger's master is a "dot command" approach, a throwback to the
> earliest days of computer typesetting (e.g., roff/troff, and so on).

What do you mean by the earliest days of typesetting? Are you saying DP is using outdated technology? Come on. Apple's Mac OS X is based on one of the oldest OSes around: Unix. That does not mean it is bad. From what I can see, his system works. That is all that matters. I do admit I find his formatting language not my cup of tea.

> > http://fadedpage.com/
>
> maybe along the line he'll realize the .txt file could be his master...

His .src is a txt file !!! ;-)) You know as well as I do that .txt denotes the file is not binary.

regards
Keith

From julio.reis at tintazul.com.pt Wed May 28 03:35:02 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Wed, 28 May 2008 11:35:02 +0100
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To:
References:
Message-ID: <1211970902.6698.80.camel@abetarda>

Sure, David. Whatever; your country, your project. I can find my way around PG and tell what's legal or not in my jurisdiction. And anyway, I am a 'writer' to PG, not a reader: the paper books that come my way are more than enough. When I read ebooks, I read modern-day science fiction; the old stuff at PG feels pretty boring. 'Golden age', sheesh. That's like saying the Sumerians were the golden age of writing -- which they were, but we don't really write that way anymore. Rant over.

I was just trying to increase the international appeal of PG. Just curious about what the server logs show: how much of your bandwidth is from outside the USA? Because if it's 10%, don't even bother.

Also, don't bother trying to explain why it's 'not feasible' to list one or two hundred authors for a collective work. Suffice it to say that I don't understand, but what I did understand is that listing authors has a different significance for you and for me. And I can live with that difference ;)

No one's refuted my take on the homepage. PG feels like a respectable data store, not like an exciting portal to a wealth of the written word. Not everything in PG is boring; we could make it look alive! (And if it were all boring IMHO -- one man's Yawn is another man's Yay!)

> We aren't. We won't. A collection of a hundred short stories
> or a hundred paintings or both is just not feasible to list in the
> catalog.

From prosfilaes at gmail.com Wed May 28 07:38:47 2008
From: prosfilaes at gmail.com (David Starner)
Date: Wed, 28 May 2008 10:38:47 -0400
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
In-Reply-To: <483C63E1.5000703@telkomsa.net>
References: <483C63E1.5000703@telkomsa.net>
Message-ID: <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>

On Tue, May 27, 2008 at 3:41 PM, Jon Richfield wrote:
> TXT files have all sorts of funnies
> when there are interjected Greek (or Arabic, or swelpme, ancient
> Egyptian) characters in the text.

Not if you use them right. The same Unicode goes into a text file as goes into an HTML file. Ancient Egyptian is just pictures, or ASCII transcription; you can't write Ancient Egyptian in HTML any more than you can in text.

From Bowerbird at aol.com Wed May 28 09:55:18 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 28 May 2008 12:55:18 EDT
Subject: [gutvol-d] raw to finished
Message-ID:

keith said:
> What do you mean by the earliest days of typesetting?
> Are you saying DP is using outdated technology?

i'm saying dot commands go back to the earliest days of _computer_ typesetting... just google it and you'll see that it's true.

> His .src is a txt file !!! ;-)

it's an _ascii_ file (or .utf8 if you squint at it), but a lot of the characters are _not_ "the text", so it doesn't qualify in my mind as "a text file".

however, the .txt file that it generates _is_ one, precisely since it has no superfluous characters.

-bowerbird

From Bowerbird at aol.com Wed May 28 10:14:02 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 28 May 2008 13:14:02 EDT
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID:

jon richfield said:
> Errr, yeah, sortakinda, but why can't they
> get the full content and format as easily from the HTM file?

um, you want the viewer-program for the .txt file to read the .html file to get the graphic's filename when you could have just put it right there in the .txt file? why be so convoluted?

> TXT files have all sorts of funnies when there are interjected
> Greek (or Arabic, or swelpme, ancient Egyptian) characters

a .txt file can be in .utf8 format, and show those things directly.

> Without venturing into the firefights about tools and formats,
> which interest me a lot less than getting the stuff on line,
> readable, and available,

if you really want to get stuff "on line, readable, and available", a .txt file is the most direct route of 'em all, thankyouverymuch.

and -- given the right format, and z.m.l. is just one variant -- you can also get typographic beauty out of a .txt file, as well as powerful e-book capabilities that can scale to a very large library.

so your apathy about "tools and formats" isn't serving you well... thankfully, the future won't be so myopic.

> I don't see why more than what I have described is necessary
> from my point of view.

i don't see that you have described anything at all.

i suggested a way that you could tell a .txt-file viewer-program that a graphic should be inserted in a specific spot -- by simply listing the filename of the graphic at that point. this is about the same as what .html requires. (well, a little less, as you don't have to do the bracket-and-tag song-and-dance.)
frankly, i don't see how you could do _less_ than list the filename, unless you expect the viewer-program to have some sort of e.s.p.

-bowerbird

From hart at pglaf.org Wed May 28 10:23:54 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 28 May 2008 10:23:54 -0700 (PDT)
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
In-Reply-To: <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>
References: <483C63E1.5000703@telkomsa.net> <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>
Message-ID:

On Wed, 28 May 2008, David Starner wrote:
> On Tue, May 27, 2008 at 3:41 PM, Jon Richfield wrote:
>> TXT files have all sorts of funnies
>> when there are interjected Greek (or Arabic, or swelpme, ancient
>> Egyptian) characters in the text.
>
> Not if you use them right. The same Unicode goes into a text file as
> goes into an HTML file. Ancient Egyptian is just pictures, or ASCII
> transcription; you can't write Ancient Egyptian in HTML any more than
> you can in text.

Gentlemen, please avoid the temptation to create rules based on exceptions to the general situation.

Obviously the common book contains little, if anything, from other alphabets. This is not to say NO books do, but they are exceptions and not the general case.

However, we DO want to create eBooks in all languages -- at least all 250 that have over a million speakers -- and we SHOULD create general rules for each language, rules created, I should add, by their native populations.

Why? Because I learned a bit of a handful of languages in an assortment of ways, from native speakers and not, including a funny example of someone trying to teach me my own language from the perspective of a teacher of English, but from another country and language. Most anyone who has been in such a situation can tell a number of examples of how silly these events can be.

I just hope we can provide books to the whole world.

Michael

From hart at pglaf.org Wed May 28 10:59:19 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 28 May 2008 10:59:19 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To:
References:
Message-ID:

On Wed, 28 May 2008, Bowerbird at aol.com wrote:
> michael said:
>> And you refuse, after being told so many times,
>> just give your own consistency a trial run from
>> which to garner support.
>
> who gives a tinker's damn about "support"? not me.

"Just Do It!"

If you just tell others how it should be done, you are just one more voice in the Maelstrom.

>> You have put SO many works out here that I will
>> have to admit I can understand those who say it
>> is too much and thus won't read them.
>
> i think you should be smarter than to repeat the "words"
> of others that caused them to lose all of their credibility.

Same concepts, who cares which words.

> i have plenty of examples up for anyone who needs 'em.
>
> i am way past the point of doing books one at a time, or
> even 5 or 10, or 50 or 100 at a time. i'm figuring out how
> to do them in bunches of a thousand, and ten thousand...
You never did enough books one at a time to creat the literal infrastructure NECESSARY to your doing bunches of a thousand, and ten thousands. If you had done the proper run up, just 10 here and then 100 there, after making some revisions based on your new experience from the first 10, then you WOULD NOT have an assortment of perceived impervious ROADBLOCKS, to trying to do 1,000 or 10,000. Please recall that I labeled the first 10,000 as kind of a feasibility study, which you knew of. If you had been there, doing your own feasibilies, along with that scenario, YOU could have taken over at 10,000, barring other events that might have intervened. The truth is, whenever you START YOUR PROJECT it is WISE to do this runup of feasibility studies of 1, 10, 100 or 1,000 to 10,000 items along the way. . . . NOT JUST FOR YOURSELF, BUT SO YOU CAN GET VOLUNTEERS.... You need to DEMONSTRATE what you are doing to give those people who would help you time to get on board and learn how to actually help you. If you needed no help, you wouldn't be complaining. It's no one's fault but your own that these books aren't pretty much exactly what you need. YOU STILL NEED TO WORK YOUR WAY UP FROM 10, 100, 1,000-- just so the world has time to see what you are doing and to get on board. What you SHOULD do is just gather a few volunteers, from whom you can get an equal output to your own, to start. You put out one book exactly the way you like, and walk, not run, them through doing one in the same manner. Get just 9 such volunteers and when you do ONE book then you get 10, and will have trained those 10 volunteers. Then you start on 100, using those who like those 10. You should be able to improve the whole thing each time. Then it's off to 1,000, and when you hit 10,000, a whole world will take notice. . .one way or another. Remember what the world was like before PG hit 10,000? It was just a feasibility study to them before that. Now stop TALKING and start DOING!!! People will like that much better. I will be only too glad to provide all possible support, providing you've now learned enough not to reject it. Go back to line #1 and reread until you can accept help. If you can't accept help, it's going to be a loong trip. > > -bowerbird > > > > ************** > Get trade secrets for amazing burgers. Watch "Cooking with > Tyler Florence" on AOL Food. > (http://food.aol.com/tyler-florence?video=4& > ?NCID=aolfod00030000000002) > From hart at pglaf.org Wed May 28 11:20:06 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 28 May 2008 11:20:06 -0700 (PDT) Subject: [gutvol-d] an inalienable human right In-Reply-To: References: Message-ID: On Tue, 27 May 2008, Bowerbird at aol.com wrote: > jeroen said: >> the Universal Declaration of Human Rights also declares >> access to cultural heritage an inalienable human right > > _now_ we're talking... > > -bowerbird With or without any such declarations, the right to public domain information has always been "inalienable." You can't buy someone's right to the public domain. You can't sell your right to the public domain. No one else can buy or sell these public domain rights. Well, that is until the WIPO lobbyists appear onscene. And the FUNNY part is that WIPO is part of the U.N.!!! JUST THINK. . .THE U.N. IS SPONSORING THE ELIMINATION OF THE PUBLIC DOMAIN. . . . No wonder Ted Turner gave them a billion dollars cash! I wonder how many billions he made on all those movies he bought when their copyrights were extended??? 
In real money and audience terms, the most popular movie: "Gone With The Wind"

Who owns "Gone With The Wind"??? Ted Turner. Along with tons of other movies.

"Gone With The Wind" was 1939. The original 56 years of copyright, allotted as the max at the time, expired on December 31, 1995. But it was extended, along with every other copyright, just to protect the top 10% of moneymakers. . .the rest of them were never renewed. . . .

From Bowerbird at aol.com Wed May 28 14:25:34 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 28 May 2008 17:25:34 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael, you seem to think you know what's happening on my hard-drive. the only problem is, it doesn't jibe with what _i_know_ is happening there.

> NOT JUST FOR YOURSELF, BUT SO YOU CAN GET VOLUNTEERS....

i've told you (probably a dozen times now) that i don't _need_ volunteers.

> If you needed no help, you wouldn't be complaining.

i'm not "complaining". i am telling you what's wrong with your e-library, so you can fix it. if you won't fix it, i'll have to _show_ you what's wrong, by re-creating it without the problem, which seems to be what you prefer. ok, fine, i'll do it that way. you would've been a lot smarter to just ask me to fix the problem _for_you_, as i would have been quite happy to do that. but you seem to want to insist there is nothing wrong. and you're wrong.

> Now stop TALKING and start DOING!!!

don't be an ass, michael. i have been _doing_ for a very long time now...

the "talking" part was to offer you the benefit of the lessons i have learned, a courtesy to you, instead of upstaging your e-library with a better variant. but i give up, you've made it clear you don't want to accept the gift, so ok...

understand, however, that what you have instructed me to _do_ right now is to make _your_ version of your e-library irrelevant...

-bowerbird

From marcello at perathoner.de Wed May 28 14:37:35 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 28 May 2008 23:37:35 +0200
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To: <1211970902.6698.80.camel@abetarda>
References: <1211970902.6698.80.camel@abetarda>
Message-ID: <483DD09F.6020206@perathoner.de>

Júlio Reis wrote:
> I was just trying to increase the international appeal of PG. Just
> curious about what the server logs show: how much of your bandwidth is
> from outside the USA? Because if it's 10%, don't even bother.

Ask Alexa.

> Also, don't bother trying to explain why it's 'not feasible' to list one
> or two hundred authors for a collective work.

It's perfectly doable with the current software. If you feel like it, just do it. If you don't do it, don't complain that nobody else wants to do it for you.

> No one's refuted my take on the homepage.

If I answered everybody who doesn't like the look of the site, I wouldn't do anything else. It's a wiki. Start a new front page and put it up for votes.

--
Marcello Perathoner
webmaster at gutenberg.org

From sly at victoria.tc.ca Wed May 28 15:42:20 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed, 28 May 2008 15:42:20 -0700 (PDT)
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
In-Reply-To: <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>
References: <483C63E1.5000703@telkomsa.net> <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>
Message-ID:

On Wed, 28 May 2008, David Starner wrote:
> On Tue, May 27, 2008 at 3:41 PM, Jon Richfield wrote:
>> TXT files have all sorts of funnies
>> when there are interjected Greek (or Arabic, or swelpme, ancient
>> Egyptian) characters in the text.
>
> Not if you use them right. The same Unicode goes into a text file as
> goes into an HTML file. Ancient Egyptian is just pictures, or ASCII
> transcription; you can't write Ancient Egyptian in HTML any more than
> you can in text.

Just to be my annoying self, I can't resist responding...

According to: http://www.egpz.com/resources/unicode.htm

> The 1063 hieroglyphs given in N3237: Proposal to encode Egyptian
> Hieroglyphs in the SMP of the UCS were accepted for encoding in the
> Universal Character Set by the ISO WG2 meeting in April 2007. This set is
> based on the works of Alan Gardiner, the majority of which hieroglyphs are
> given in his work Egyptian Grammar. Some minor changes were made to the
> initial proposal and the set accepted by WG2 in first national ballot on
> ISO 10646 PDAM5 (September 2007) now numbers 1071 (see N3349: Summary of
> repertoire for FPDAM 5 of ISO/IEC 10646:2003 and future amendments). A
> second national ballot on PDAM5 is expected April 2008 at which time the
> set is likely to be fixed. All being well, Basic Egyptian Hieroglyphs
> will then be released with Unicode 5.2 (expected 2009/10).

From schultzk at uni-trier.de Thu May 29 00:43:12 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 29 May 2008 09:43:12 +0200
Subject: [gutvol-d] raw to finished
In-Reply-To:
References:
Message-ID: <0655CC3A-9FB7-4134-A01B-9E3DFFE71A62@uni-trier.de>

On 28.05.2008 at 18:55, Bowerbird at aol.com wrote:
> keith said:
>> What do you mean by the earliest days of typesetting?
>> Are you saying DP is using outdated technology?
>
> i'm saying dot commands go back to the
> earliest days of _computer_ typesetting...
> just google it and you'll see that it's true.

I know how old this type of formatting is. It simply does not mean it is not useful or efficient, though I do admit it is ugly and hard to work with, without tools.

>> His .src is a txt file !!! ;-)
>
> it's an _ascii_ file (or .utf8 if you squint at it),
> but a lot of the characters are _not_ "the text",
> so it doesn't qualify in my mind as "a text file".

Just a matter of definition.

> however, the .txt file that it generates _is_ one,
> precisely since it has no superfluous characters.

Exactly what I said above. It works and gets the job done! So why knock it.

regards
Keith

From schultzk at uni-trier.de Thu May 29 00:36:49 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 29 May 2008 09:36:49 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
In-Reply-To: <483C63E1.5000703@telkomsa.net>
References: <483C63E1.5000703@telkomsa.net>
Message-ID:

Hi Jon,

By TXT files I assume you are talking about standard ASCII. With Greek I think 8-bit encoding would be fine. As far as the others are concerned, I believe UTF will do -- especially ancient Egyptian. The cases you mentioned are hard to fit into PG-Proper.
There is no simple solution.

regards
Keith

From hart at pglaf.org Thu May 29 10:25:58 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 29 May 2008 10:25:58 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To:
References:
Message-ID:

On Wed, 28 May 2008, Bowerbird at aol.com wrote:
> michael, you seem to think you know what's happening on my
> hard-drive.
>
> the only problem is, it doesn't jibe with what _i_know_ is
> happening there.

I'm only talking about what I know works, and it is obviously NOT working for you. Time for YOU to change. . . . One way or the other. . . .

>> NOT JUST FOR YOURSELF, BUT SO YOU CAN GET VOLUNTEERS....
>
> i've told you (probably a dozen times now) that i don't _need_
> volunteers.

So you SAY, but without them you are not getting anywhere, which is why you complain the world doesn't fit perfectly with your computers. You want to substitute computers for volunteers, and you keep finding out that doesn't work. Then you complain that reality is not as perfect as your desired virtual reality.

Stop complaining and DO THE WORK TO CHANGE REALITY!!! Or just shut up, if you can't. No one needs to hear you keep repeating complaints. You know, there is a word for complaining and not doing anything about it, but that word is not used in polite conversations in our societies.

>> If you needed no help, you wouldn't be complaining.
>
> i'm not "complaining".

Yes, you ARE complaining, just like all the others who tell us how they could run PG better than anyone else.

> i am telling you what's wrong with
> your e-library, so you can fix it.

THAT is complaining. How many times must we tell you to LEAD, so others could possibly FOLLOW, if you haven't already alienated them all??? You could have done ALL this and been DONE with it, if you had started with DOING rather than COMPLAINING!!!

> if you won't fix it, i'll have to
> _show_ you what's wrong, by re-creating it without the problem,
> which seems to be what you prefer.

Are you saying you finally get the idea? YES!!! FOR THE UMPTEENTH TIME. . .YES!!! JUST DO IT!!!

> ok, fine, i'll do it that way. you would've been a lot smarter to
> just ask me to fix the problem _for_you_, as i would have been
> quite happy to do that.

That's EXACTLY what JUST DO IT means!!! JUST DO IT!!! What could be more simple??? Go! Go!! Go!!!
> but you seem to want to insist there is nothing wrong. and you're
> wrong.

There is plenty that can be improved. . .GO FOR IT!!!

>> Now stop TALKING and start DOING!!!
>
> don't be an ass, michael. i have been _doing_ for a very long
> time now...

As above. . . .

> the "talking" part was to offer you the benefit of the lessons i
> have learned, a courtesy to you, instead of upstaging your
> e-library with a better variant.

Not everyone considers improvement as being upstaged. . . . You sound as if you want to pretend your improvements would not be welcome. . .perhaps by some you have alienated, but not by those who count.

> but i give up, you've made it clear you don't want to accept the
> gift, so ok...

Ah, so your gift is already retracted in the same message it was offered??? Who would believe it???

Read back and you will see that I have always been sincere about you DOING what you SAID. . .RIGHT HERE!!!

> understand, however, that what you have instructed me to _do_
> right now is to make _your_ version of your e-library
> irrelevant...

Are you insisting on the pretense that your version is not welcome here after all the !!!YEARS!!! I have spent encouraging you to JUST DO IT. . .RIGHT HERE??? Again, who would believe it???

> -bowerbird

From Bowerbird at aol.com Thu May 29 11:43:57 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 29 May 2008 14:43:57 EDT
Subject: [gutvol-d] raw to finished
Message-ID:

keith said:
> So why knock it.

i wasn't "knocking" it. saying something is "old" doesn't mean you're criticizing it. it's simply a description.

-bowerbird

From Bowerbird at aol.com Thu May 29 12:18:05 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 29 May 2008 15:18:05 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael said:
> So you SAY, but without them you are not getting anywhere

and here again you seem to know how my progress is going. but your report is the exact opposite of my own experience... i'm making excellent progress, my friend, excellent progress.

> You want to substitute computers for volunteers,
> and you keep finding out that doesn't work.

michael, you're wrong -- very badly wrong -- on both counts.

> No one needs to hear you keep repeating complaints.

i wonder why you have this "need" to keep trying to spin my posts as "complaints". they're constructive criticism, michael. if you listened, and acted, your e-library would be improved...

> Ah, so your gift is already retracted
> in the same message it was offered???

the question is whether i improved your e-library in-place, or set up another version -- my e-library -- independently...

you have been telling me all along to set up my own version. i've been reluctant to do that, because it _will_ upstage you...

your library has been held back because programmers can't access the underlying structure of the e-texts to add value...

but you've won, michael. i am no longer reluctant to do that. congratulations.
> Are you insisting on the pretense that your version is
> not welcome here after all the !!!YEARS!!! I have spent
> encouraging you to JUST DO IT. . .RIGHT HERE???

i can't even get well-documented errors corrected in your e-texts. why would i have any reason to believe i could do the large-scale editing necessary to bring those e-texts to a state of consistency?

i'll mount my vision... when you see how programmers flock to it, you will come to grasp fully and precisely what i have been saying.

i'm really sorry it had to come to this... but i see i have no choice...

-bowerbird

From Bowerbird at aol.com Thu May 29 12:30:39 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 29 May 2008 15:30:39 EDT
Subject: [gutvol-d] cleaning up the catalog
Message-ID:

joyce said:
> It looks to me like your data comes from the etext file headers,
> and not from the bibliographic records in the catalog.

i do believe you're correct about that... it has been well over a year since i did that work, and i'd forgotten. and let me also add a quick "thank you" for looking at my output. it's heartening to know that _someone_ cares enough to _look_...

> The list includes several titles by Edith Van Dyne. In the catalog,
> Edith Van Dyne exists only as a pseudonym for L. Frank Baum. Baum's
> name is the one attached to the works in the bibliographic records.

and here -- joyce -- is where i diverge sharply from meta-data people. because when i look at the e-texts themselves, there is _no_mention_ at all in them of baum's name, only this "edith van dyne" pseudonym... (well, in one of them, "van dyne" lists "baum" along with a number of other well-respected authors, which must've seemed like a fine joke.)

yet in the r.d.f. catalog, it's "baum" only, with no mention of "van dyne". when you search the catalog on the website, it _mentions_ "van dyne", but not in the specific context of the exact e-texts authored as such...

so there's this huge gulf between the _actual_data_ and the "metadata". one says one thing, the other says another, and i can't countenance that. that's why i decided to scrap the catalog, and depend on the headers...

now of course, i'd love to have the information about _both_ names in _both_ places -- the actual e-texts and the catalog -- but in the absence of congruence between the two, i'll settle on the "real" data.

i want each of my "data" files to contain _all_ the information that appears about them in the "metadata" of the catalog, so the catalog can be regenerated at will just by combing through the "data" files...

> Not to claim that there are no inconsistencies in the catalog,
> but from a cataloger's point of view it's not reasonable to
> tally every file header inconsistency as a catalog problem.

perhaps from your cataloger's point of view, your catalog is correct, and the shortage of info in the "file header" is a mere "inconsistency". but from my standpoint of _integrating_ the catalog with the e-texts, as a mere guide to the library that's the accumulation of those e-texts, the _discrepancy_ between the two is an unacceptable burden to bear.

i'm not suggesting that it's _your_ problem.
(although it kind of is, in the sense that i believe the catalog should reflect the pseudonym; it can have the "real" name too, but it _should_ have the pseudonym.)

i believe that the e-texts _and_ the catalog should reflect both names. and, of course, i'll do this fix to both sides of the equation in my work. but as we both know, this particular situation has a lot of manifestations. (the book from gweeks on which i made an error-report recently had it.)

in my humble opinion, and the way that i'll surely structure my workflow, any change made to the catalog should be mirrored in the e-text itself...

-bowerbird

p.s. and of course when i talked about "cleaning up the catalog", i was _not_ talking about "cleaning" on a _factual_ basis, such as in this case. instead, i was talking about making all the catalog entries _consistent_. i appreciate greatly the work of the cataloging team to make the catalog more _accurate_, but my concern at present is merely making it _useful_. inconsistent entries make it difficult for people to know how to search it.

From julio.reis at tintazul.com.pt Thu May 29 14:03:20 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Thu, 29 May 2008 22:03:20 +0100
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To:
References:
Message-ID: <1212095000.6554.148.camel@abetarda>

Marcello, I realise I've touched some kind of wound. I didn't mean to do harm. (Quoting Ruth Rendell: "One could say he meant well, if it weren't such a bad thing to say about someone.")

> Ask Alexa.

She says "USA 37%, Germany 6%" etc. etc.

> It's perfectly doable with the current software. If you feel
> like it, just do it. If you don't do it, don't complain that
> nobody else wants to do it for you.

I'm doing all the corrections I can think of. I am not complaining.

>> No one's refuted my take on the homepage.
>
> If I answered everybody who doesn't like the look of the site,
> I wouldn't do anything else.

Believe me, I had no idea. I've been on this list for only a few months.

> It's a wiki. Start a new front page and put it up for votes.

No, man. I'll take it the way it is. I am too busy doing books at DP. Plus, randomly selecting from a list of choice items isn't something that can be done with regular MediaWiki syntax; it needs some programming.

Júlio.

From hart at pglaf.org Thu May 29 14:46:54 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 29 May 2008 14:46:54 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To:
References:
Message-ID:

On Thu, 29 May 2008, Bowerbird at aol.com wrote:
> michael said:
>> So you SAY, but without them you are not getting anywhere
>
> and here again you seem to know how my progress is going.
> but your report is the exact opposite of my own experience...
>
> i'm making excellent progress, my friend, excellent progress.

If you really are, then why all the complaining??? Oh. . .right. . .that's not really complaining, is it???

>> You want to substitute computers for volunteers,
>> and you keep finding out that doesn't work.
>
> michael, you're wrong -- very badly wrong -- on both counts.
>> No one needs to hear you keep repeating complaints.
>
> i wonder why you have this "need" to keep trying to spin my
> posts as "complaints". they're constructive criticism, michael.
> if you listened, and acted, your e-library would be improved...

If you don't ACT on them, they are only complaints. . . . How many years does it take for that to get through???

>> Ah, so your gift is already retracted
>> in the same message it was offered???
>
> the question is whether i improved your e-library in-place,
> or set up another version -- my e-library -- independently...

Personally, _I_ would suggest doing both. You might just be more effective that way.

> you have been telling me all along to set up my own version.

Yes, I have.

> i've been reluctant to do that, because it _will_ upstage you...

Again, some of us don't feel improvement means being upstaged. I would LOVE for YOU, or anyone else, to make ALL my work part of a past that is now just history, just the foundation for a new present that is a totally new dimension.

> your library has been held back because programmers can't
> access the underlying structure of the e-texts to add value...

And THIS is a complaint. . . . I've been encouraging you to fix it all along. . . .

> but you've won, michael. i am no longer reluctant to do that.
> congratulations.

I hope it works out well for all concerned.

>> Are you insisting on the pretense that your version is
>> not welcome here after all the !!!YEARS!!! I have spent
>> encouraging you to JUST DO IT. . .RIGHT HERE???
>
> i can't even get well-documented errors corrected in your e-texts.
> why would i have any reason to believe i could do the large-scale
> editing necessary to bring those e-texts to a state of consistency?

Have you ever concerned yourself with how hard it is for ME to get errors fixed??? Have you ever concerned yourself that you might have alienated those who could fix these errors for you, for me, for us, for a whole world???

> i'll mount my vision... when you see how programmers flock to it,
> you will come to grasp fully and precisely what i have been saying.

I hope you get everything running the way you hope and want, or can make adjustments to make it come out something like that.

> i'm really sorry it had to come to this... but i see i have no
> choice...

I still say this is a pretence on your part. YOU are creating some fictional rift that YOU need to separate yourself from me. This has never been necessary. You have been, and still are, welcome to come and go as you please. I just hope you don't try to build a wall where none exists.

Do you really feel such a need to PRETEND there is such a rift? If so, just remember that this rift is a figment of yours. I doubt anyone else here will care, but I encourage you, and I hope your cup runneth over with success, and that you will share it with the world, even with me, or with Project Gutenberg, should I be gone by the time you get your big success.

Michael

From Bowerbird at aol.com Thu May 29 16:11:29 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 29 May 2008 19:11:29 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael-

there's no pretense, no rift, and no wall.
there's just me saying that, as a programmer who has attempted to add value to the project gutenberg library, i was frustrated by massive inconsistencies in the e-texts, to the point that i was unable to accomplish what i wanted.

further, as an observer of _other_ programmers who have had the exact same experience, i felt that i should tell you.

some of these programmers experienced the inconsistencies many years ago -- like ron burkey -- and others experienced them a few years back -- like the o.l.p.c. person -- and others experienced them only recently -- like list-subscriber legutierr.

and who knows how many others tried, failed, and left silently?

i have no bone to pick with you. only with the inconsistencies. you're a swell fellow. i haven't said _anything_ bad about you.

i'm not sure why now you think it's ok to pick on me personally, especially since you've taken up the "all complaining, no action" line that caused my antagonists here to lose all their credibility. but you're a swell guy. so maybe it's just a misunderstanding...

but whatever... it doesn't matter, because flames don't hurt me.

it's clear it's time for me to concede "defeat", though, since p.g. isn't ever going to become a consistent library, so i will have to set up an independent mirror to show the value of consistency.

because this programmer is getting antsy to show his chops...

otherwise the world will think that noring's "pagelets" are the best thing the cyberlibrary of tomorrow will offer to people...

-bowerbird

From richfield at telkomsa.net Fri May 30 00:28:03 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Fri, 30 May 2008 09:28:03 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: <483FAC83.1050600@telkomsa.net>

BB said:
>> Errr, yeah, sortakinda, but why can't they
>> get the full content and format as easily from the HTM file?
>
> um, you want the viewer-program for the .txt file to
> read the .html file to get the graphic's filename when
> you could have just put it right there in the .txt file?
> why be so convoluted?

How is this more convoluted than getting it from specifically coded TXT? You might claim that markup is generally more complex than txt, but that is more than offset by the information a halfway decent ML can carry about format and encoding that the txt cannot. Note that the same goes for encoding of alien alphabets, scripts and typefaces, not to mention graphics (David S. please note!). The advantage of a widely used ML is that the format is defined, and therefore it is largely easier to automate its interpretation and conversion than in the case of TXT.

For example, time is becoming too precious for me to spend it on learning new non-automated and non-universal conventions, so when I want to scan a book with multiple necessary graphics, I use the Omnipage that came with my scanner (I may soon try another package for Linux) for OCR. I export the product into Word or Open Source Writer for formatting. The reason is the repertoire of macros that I have accumulated and that suit my typing style.
The finished product I convert to HTML, and I use TIDY to make it fit for decent company. This process deals with everything from formatting to spelling. To produce TXT is an extra overhead, particularly with the PG requirement of breaking lines at the window edge. In fact it frequently takes me more effort than the HTML does (though I grant that it reveals more errors, of course).

Now, when Omnipage or some similar product will produce a universally acceptable and versatile format, that is what I may use if it costs only a few toes, but meanwhile, being a user, and not a producer and designer of MLs and ML automation tools, I simply wag my great long furry ears. Let me know when the squabbling at the other end is done. I don't have the time for it nowadays.

Cheers,

Jon

From Bowerbird at aol.com Fri May 30 02:19:07 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 May 2008 05:19:07 EDT
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID:

jon richfield said:
> I simply wag my great long furry ears.

i'm sorry. i thought you were asking a question.

i thought you wanted to know how people who were using the .txt file could have the benefit of the graphics files too.

i suggested an easy answer -- include the name of each file at the point where that graphic was supposed to be shown... if nothing else, the user can chase down the file manually, with the advantage that he knows what filename to search for...

now -- unless i misunderstand -- you're saying this is all far too complicated. so i no longer know how to respond.

but i guess it doesn't matter, because now it appears that you haven't been asking any question all along anyway...

-bowerbird

From marcello at perathoner.de Fri May 30 05:30:05 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 30 May 2008 14:30:05 +0200
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To: <1212095000.6554.148.camel@abetarda>
References: <1212095000.6554.148.camel@abetarda>
Message-ID: <483FF34D.9010805@perathoner.de>

Júlio Reis wrote:
> No, man. I'll take it the way it is. I am too busy doing books at DP.
> Plus, randomly selecting from a list of choice items isn't something that
> can be done with regular MediaWiki syntax; it needs some programming.

MediaWiki plugins are not that hard to write. You just write some mock-up, and if people like it, we'll turn it live.

--
Marcello Perathoner
webmaster at gutenberg.org

From Bowerbird at aol.com Fri May 30 10:40:13 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 May 2008 13:40:13 EDT
Subject: [gutvol-d] antsy
Message-ID:

i said:
> because this programmer is getting antsy to show his chops...

hey, that's a good one, i hope marcello puts it on his fan-site for me. :+)

-bowerbird
From hart at pglaf.org Fri May 30 11:16:43 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 30 May 2008 11:16:43 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To:
References:
Message-ID:

I would certainly like to mirror your mirror, if that's ok.

mh

From Bowerbird at aol.com Fri May 30 12:12:59 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 May 2008 15:12:59 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael said:
> I would certainly like to mirror your mirror, if that's ok.

you know that public-domain means you don't need my ok. plus i'd consider it to be an honor if you did that.

-bowerbird

From richfield at telkomsa.net Fri May 30 13:19:17 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Fri, 30 May 2008 22:19:17 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: <48406145.6070909@telkomsa.net>

BB

> i'm sorry. i thought you were asking a question.
> i thought you wanted to know how people who were using
> the .txt file could have the benefit of the graphics files too.

Thanks for your trouble, though I am slightly nonplussed at the idea of such a naive question.
Nor can I see what I said that looked like such a question. However, I sincerely thank you for your trouble. I have made similar slip-ups myself from time to too frequent time.

> i suggested an easy answer -- include the name of each file
> at the point where that graphic was supposed to be shown...
> if nothing else, the user can chase down the file manually,
> with the advantage he knows what filename to search for...

I fail to see how that is easier than reading a ML file with the material all in place. It certainly is no easier to prepare. I don't know how many of our users go for the txt files for lack of GUI browsers. If they do, then I don't see what they gain by being able to read the names of the graphics files. Without the means to present the graphics, they might as well either do without, or make do with the unembellished txt and as much imagination as they can muster. In either case, why bother with the TXT references? This is no rhetorical question; is there any incentive that I have overlooked?

> now -- unless i misunderstand -- you're saying this is all
> far too complicated. so i no longer know how to respond.

Complicated? How did complication get into this? This must be getting too complicated for me. I simply expressed a preference for maximal exploitation of available cheap and convenient tools until something better came along. If txt plus pointers to user-accessible insertions were the best thing, then sure, I'd use them; however, since neolithic flakes now are available to affix to the arrows, I see no reason to continue hand-charring the wooden points to fire-harden them. Roll on machine tools! (No matter who designs them!)

> but i guess it doesn't matter, because now it appears that
> you haven't been asking any question all along anyway...

Oh, I asked plenty. That just happened not to be it, and essentially all that I asked (except for the German script thing) elicited helpful responses. But as I said, thanks anyway, and any abrasion from my quarter is inadvertent.

Cheers,

Jon

From Bowerbird at aol.com  Sat May 31 00:10:32 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 31 May 2008 03:10:32 EDT
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: 

jon richfield said:
> Nor can I see what I said that looked like such a question.

oh, ok, i'm happy to clear that up. you said this:

> I do usually supply TXT versions as well, but then it is
> up to the text-zealots to do as they please about missing pictures.

i see now, since you have clarified yourself for me, that you are basically giving the finger to us "text-zealots", informing us that you have no intention of doing a thing to make it easy for us to deal with those "missing" pictures.

i'm seeing this attitude -- a desire to "cripple" the .txt files -- more and more, especially from quite a few people over at d.p.

here's a good example -- the story of pocahontas:
> http://www.gutenberg.org/files/24487/24487.txt

it has lots of stuff like this:

> She was a child of nature, and the birds trusted her
> and came at her call. She knew their songs, and
> where they built their nests. So she roamed the woods,
> and learned the ways of all the wild things, and
> grew to be a care-free maiden.
>
> [Illustration]

notice that "[illustration]" notation? really something, isn't it? what good does it do? tells the user they're missing a picture. doesn't do a single thing to tell them _where_ they can find it.

even if they were to navigate to the folder where the pictures are:
> http://www.gutenberg.org/files/24487/24487-h/images/
it doesn't tell them the _name_ of the file containing that picture.

by looking at the .html version, i can tell you that the picture is here:
> http://www.gutenberg.org/files/24487/24487-h/images/i005-1.jpg

is it that hard to imagine that if you put that u.r.l. in the .txt file, a .txt-file viewer-program could fetch it from there and display it, right there at that spot in the text? it isn't that hard for _me_ to imagine.
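in fact, just to show how small the job is, here's a back-of-the-napkin sketch of the finding-the-picture half of such a viewer -- python, untested, and note that the [Illustration: filename] marker convention is purely my hypothetical, since today's .txt files carry no filename at all:

# sketch: scan a p.g. text for [Illustration: file-or-url] markers
# and report where each picture lives, so a viewer could draw it
# right there in the text. the marker-with-filename is hypothetical.
import os
import re

MARKER = re.compile(r'\[Illustration:\s*([^\]\s]+)\s*\]', re.IGNORECASE)

def illustrations(txt_path):
    # yield (line_number, image_location) for every marker found
    folder = os.path.dirname(os.path.abspath(txt_path))
    with open(txt_path, encoding='utf-8', errors='replace') as f:
        for num, line in enumerate(f, 1):
            for name in MARKER.findall(line):
                if name.startswith('http'):
                    yield num, name
                else:
                    # bare filenames resolve next to the .txt itself
                    yield num, os.path.join(folder, name)

for num, where in illustrations('24487.txt'):
    print(num, where)  # a real viewer would load and display it here

(run it beside a downloaded 24487.txt and its images folder, obviously.) that's the whole "impossible" part -- twenty-odd lines, with change left over...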
so the person who prepared this book could've added a lot of value to the .txt file by including that information. and make no mistake, they _had_ that info, since they needed it to make the .html version. but they _took_an_extra_step_of_work_ to discard it from the .txt file.

so -- jon -- since you've said you were "slightly nonplussed at the idea of such a naive question", let _me_ ask you an extremely naive question: why in the world is this significant information deliberately discarded?

***

> I fail to see how that is easier than
> reading a ML file with the material all in place.

well, i could go in any number of directions to reply to this...

first of all, some people don't particularly enjoy the web-browser as a book-reading environment; it's not tailored to that purpose.

second, "reading" isn't the only thing people will do with these files. they will be remixing them, and we make that unnecessarily difficult whenever we fail to include all of the relevant information in the file... (by "remixing", i mean all kinds of conversions and file re-workings, some of which we haven't even imagined, but which will surely come.)
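one small example of the remixing i mean, under the same hypothetical [Illustration: filename] convention -- turning those .txt markers into honest .html image tags is a single substitution (python again, untested):

# sketch: remix [Illustration: file] markers into html img tags.
# only works if the filename made it into the .txt, of course...
import re

def markers_to_html(text):
    # swap each marker for an img tag pointing at the same file
    return re.sub(r'\[Illustration:\s*([^\]\s]+)\s*\]',
                  r'<img src="\1" alt="illustration"/>',
                  text)

print(markers_to_html('[Illustration: images/i005-1.jpg]'))

but strip the filename out -- like the pocahontas book did -- and no amount of cleverness downstream can put it back.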
> I don't know how many of our users go for the txt files for
> lack of GUI browsers. If they do, then I don't see what they
> gain by being able to read the names of the graphics files.

again, you're failing to anticipate the wide range of uses to which these files could be put, if only we prepare them correctly...

at any rate, i'm not sure whether you think it's too difficult or not to put the filenames into the .txt files at their appropriate places... if not, then i suggest you might entertain the thought of doing so... but if it is, fine, don't trouble yourself, since i will be fixing all these omissions when i mirror the library. either way, have a nice weekend.

-bowerbird

**************
Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food.
(http://food.aol.com/tyler-florence?video=4&
?NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080531/08efa6b5/attachment.htm

From joshua at hutchinson.net  Sat May 31 05:41:41 2008
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat, 31 May 2008 12:41:41 +0000 (GMT)
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: <1808933943.322341212237701607.JavaMail.mail@webmail02>

It isn't that it's hard to imagine. It's that there isn't a single viewer program out there that DOES this (and no, your vapor-ware promises don't count). Nor, in the many years I've been around here, has there been a single announcement that such a product was in development. Or heck, even in planning.

So putting a feature into the .txt file that

1. Has no spec on how to do it in a standard way, AND
2. Has zero support if there WAS a standard method ...

Well, that makes very little sense.

Josh

On May 31, 2008, Bowerbird at aol.com wrote:

is it that hard to imagine that if you put that u.r.l. in the .txt file, a .txt-file viewer-program could fetch it from there and display it, right there at that spot in the text?