From Bowerbird at aol.com Thu May 1 12:50:48 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 May 2008 15:50:48 EDT
Subject: [gutvol-d] happy may day

happy may day! this worker is taking the day off... see you tomorrow.

-bowerbird

From Bowerbird at aol.com Fri May 2 12:05:03 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 2 May 2008 15:05:03 EDT
Subject: [gutvol-d] tor offers books for free, and i get a big surprise

tor has been making new, frontlist books available for free. here's the latest:
> http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Priest,%20Cherie%20-%20Four%20and%20Twenty%20Blackbirds.pdf

i accidentally loaded this into the .pdf-viewer in safari, and to my amazement, the viewer works very nicely... for years now i've disabled the .pdf browser plug-in, because it used to always hang, and sometimes crash. i'll probably turn it off in safari as well, but it's nice to know adobe _finally_ got all the bugs out of the thing.

tor also offers an .html version for those of you who detest .pdf in any form. if i'm reading on my laptop, i prefer a paginated version myself. and when that laptop is actually at my desk (as opposed to -- say -- frolicking in palisades park overlooking the pacific) with a 23-inch cinema-screen monitor, it would be absolutely silly to give up the 2-page display of .pdf for the screen-wasting 1-column .html nonsense...

i recognize the weaknesses of .pdf as well as anyone, but i also recognize its strengths. (tor offers a .mobi version of these books as well.)

-bowerbird

From julio.reis at tintazul.com.pt Sat May 3 14:44:22 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Sat, 03 May 2008 22:44:22 +0100
Subject: [gutvol-d] tor offers books for free, and i get a big surprise
Message-ID: <1209851062.6439.4.camel@abetarda>

> tor has been making new, frontlist books available for free.

Slightly off-topic, but can anyone send me, or let me know where I can find, the first PDFs offered? The first one I subscribed to was Robert Charles Wilson's "Spin"; I'd love to have any earlier ones.

Thanks,
Júlio.

From Bowerbird at aol.com Mon May 5 09:23:35 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 12:23:35 EDT
Subject: [gutvol-d] too funny, part two point nine

ok, we're on iteration #9 of the "perpetual p1" experiment... and yep, it has turned up a "real error". two of 'em, in fact. on the very first scan, where the name of a main character -- nelsen -- was misspelled twice as "nelson". too funny...

meanwhile, nope, the error on page 33 (an excess comma) was _not_ corrected, so we're gonna have to hope for iteration #10.

oh yeah, another error was exposed. i'll explain that later...

-bowerbird

From Bowerbird at aol.com Mon May 5 09:44:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 12:44:08 EDT
Subject: [gutvol-d] too funny, part two point eight

anyway, i guess "reality" is too slippery a concept for piggy, because over on the d.p. wiki, he has said there were 7 "real errors" found during iteration #8... under my definition of "real error", though, i find just 2.

one is this one, which i've discussed before:
> "Okay, Frank. Nobody's indispensible. I might do the same
> "Okay, Frank. Nobody's indispensable. I might do the same

this is clearly an error; the dictionary settles it.

the other is this one:
> ridge, where I often go, when offshift. Carbon dioxide and a
> ridge, where I often go, when off-shift. Carbon dioxide and a

the second is kind of iffy, but i'll call it a "printer's error" simply because i did, in fact, go and change it in my version of the file. the "off-" compounds are an inconsistent lot in general, and this particular one isn't listed in the dictionary, so i would hesitate to argue with anyone over this, but since "off-duty" and "off-hour" were both hyphenated, _i_ went with the hyphenated "off-shift"... but see offbeat, offcast, offhand, offshoot, offshore, and offstage. finally, note that if you are off-camera, you will also be offscreen. so, you know, reasonable folks can differ, and all of that rot...

still, at most we have 2 "real errors" here, and certainly not _7_... so piggy, if you could clear that up for me, i would appreciate it.

-bowerbird

From Bowerbird at aol.com Mon May 5 10:37:09 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 13:37:09 EDT
Subject: [gutvol-d] tor offers books for free, and i get a big surprise

julio said:
> can anyone send me / let me know where I can find the first PDFs offered?

here are the links for some of them...
-bowerbird

====================================================

mistborn
http://e2ma.net/go/939129369/827662/29986921/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/9780765350381.pdf

spin
as .pdf
http://e2ma.net/go/959207679/850310/31028053/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Wilson,%20Robert%20Charles%20-%20Spin.pdf
as .html
http://e2ma.net/go/959207679/850310/31028055/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/WilsonSpinHTML/Wilson,%20Robert%20Charles%20-%20Spin.html
as .mobi
http://e2ma.net/go/959207679/850310/31028056/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/WilsonSpinMobi/Wilson,%20Robert%20Charles%20-%20Spin.prc

through wolf's eyes
as .pdf
http://e2ma.net/go/1015848447/910799/33371213/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Lindskold,%20Jane%20-%20Through%20Wolfs%20Eyes.pdf
as .html
http://e2ma.net/go/1015848447/910799/33371215/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/LindskoldTWEHTML/Lindskold,%20Jane%20-%20Through%20Wolfs%20Eyes.html
as .mobi
http://e2ma.net/go/1015848447/910799/33371216/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/LindskoldTWEMobi/Lindskold,%20Jane%20-%20Through%20Wolfs%20Eyes.prc

disunited states
as .pdf
http://e2ma.net/go/1026613141/922980/33806409/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Turtledove,%20Harry%20-%20The%20Disunited%20States%20of%20America.pdf
as .html
http://e2ma.net/go/1026613141/922980/33806383/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/TurtledoveTDSOAHTML/Turtledove,%20Harry%20-%20The%20Disunited%20States%20of%20America.html
as .mobi
http://e2ma.net/go/1026613141/922980/33806385/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/TurtledoveTDSOAMobi/Turtledove,%20Harry%20-%20The%20Disunited%20States%20of%20America.prc

reiffen's choice
as .pdf
http://e2ma.net/go/1037728040/935221/34265550/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Butler,%20S.%20C.%20-%20Reiffeins%20Choice.pdf
as .html
http://e2ma.net/go/1037728040/935221/34265552/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/ButlerRCHTML/Butler,%20S.%20C.%20-%20Reiffeins%20Choice.html
as .mobi
http://e2ma.net/go/1037728040/935221/34265553/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/ButlerRCMobi/Butler,%20S.%20C.%20-%20Reiffeins%20Choice.prc

sun of suns
as .pdf
http://e2ma.net/go/1049214279/947609/34748629/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Schroeder,%20Karl%20-%20Sun%20of%20Suns.pdf
as .html
http://e2ma.net/go/1049214279/947609/34748631/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/SchroederSOSHTML/Schroeder,%20Karl%20-%20Sun%20of%20Suns.html
as .mobi
http://e2ma.net/go/1049214279/947609/34748632/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/SchroederSOSMobi/Schroeder,%20Karl%20-%20Sun%20of%20Suns.prc

blackbirds
as .pdf
http://e2ma.net/go/1062408210/960854/35278504/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/Priest,%20Cherie%20-%20Four%20and%20Twenty%20Blackbirds.pdf
as .html
http://e2ma.net/go/1062408210/960854/35278506/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/PriestFATBHTML/Priest,%20Cherie%20-%20Four%20and%20Twenty%20Blackbirds.html
as .mobi
http://e2ma.net/go/1062408210/960854/35278507/goto:http://hbpub.vo.llnwd.net/o16/video/olmk/tor.com/PriestFATBmobi/Priest,%20Cherie%20-%20Four%20and%20Twenty%20Blackbirds.prc

From hart at pglaf.org Mon May 5 13:37:27 2008
From: hart at pglaf.org (Michael Hart)
Date: Mon, 5 May 2008 13:37:27 -0700 (PDT)
Subject: [gutvol-d] Google Ads

We're considering running tests with Google ads, any objections?

Thanks!!!

Michael

From Bowerbird at aol.com Mon May 5 13:54:11 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 16:54:11 EDT
Subject: [gutvol-d] Google Ads

michael said:
> Google ads

might as well... everybody else is doing it... great idea...

before you do, can you tell us where the money will go?

-bowerbird

From Bowerbird at aol.com Mon May 5 13:59:09 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 16:59:09 EDT
Subject: [gutvol-d] at least a little good news from d.p.

rfrank sums up the results of his parallel proofing experiment:
> http://www.pgdp.net/phpBB2/viewtopic.php?p=452560#452560

he found out that it works. it's good to know that someone over there can recognize the truth when it comes up and kicks them in the shins... of course, as i pointed out originally, we _already_knew_ that it works...

***

andre engels adds this:
> Regarding parallel proofing, might it be interesting to check whether
> parallel proofing works better or worse than subsequent proofing?
> That is, let a book go through P1 three times - twice in parallel
> (from the same point of departure), and then one of the two results
> as a P1 -> P1. Then compare the outcome of P1 -> P1 to that
> of the combination of the two parallel rounds.

also good to see someone over there knows the _correct_ test to do... and maybe someday they'll get around to running it _intentionally_...

***

carlo said:
> We indeed suspect that this would be the best procedure
> for well-prepared books with good OCR. The experiment is
> to gather data and tools; unfortunately the amount of data
> to get evidence is huge, and doing it without proper tools
> is almost impossible.

i'm not sure why carlo thinks "the amount of data to get evidence is huge", since it's not, or why "doing it without proper tools" is a matter of concern for him, since the tools are very easy to build...

-bowerbird

From piggy at netronome.com Mon May 5 14:11:22 2008
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Mon, 05 May 2008 17:11:22 -0400
Subject: [gutvol-d] Google Ads
Message-ID: <481F77FA.1030907@netronome.com>

Michael Hart wrote:
> We're considering running tests with Google ads,
> any objections?

Go for it!
From klofstrom at gmail.com Mon May 5 14:20:29 2008
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Mon, 5 May 2008 11:20:29 -1000
Subject: [gutvol-d] Google Ads
Message-ID: <1e8e65080805051420l1e927b74o19e9415cac66e538@mail.gmail.com>

On Mon, May 5, 2008 at 11:11 AM, La Monte H.P. Yarroll wrote:
> Michael Hart wrote:
> > We're considering running tests with Google ads,
> > any objections?

Putting Google ads on PG, or putting ads for PG on Google?

NO!!!! and Yes.

--
Karen Lofstrom
AKA Zora

From hart at pglaf.org Mon May 5 14:43:49 2008
From: hart at pglaf.org (Michael Hart)
Date: Mon, 5 May 2008 14:43:49 -0700 (PDT)
Subject: [gutvol-d] Google Ads

Talking of putting Google ads on PG sites. . . .

Some have them already, but not the biggest one. . . .

We don't control advertising on other PG sites, just the ones we run ourselves; there are hundreds, if not thousands, altogether.

We've been in the red for over 5 years now, during which I have put off my whole salary, and even part of my office expenses, but I'm not sure how long I can continue this way.

I'm good for at least another year or two, so don't agree just because this is an emergency, because it is not; but I am getting to the age where I have to plan ahead more.

Thanks!!!

Michael

From grythumn at gmail.com Mon May 5 14:51:53 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Mon, 5 May 2008 17:51:53 -0400
Subject: [gutvol-d] Google Ads
Message-ID: <15cfa2a50805051451g3775f751pe2c1a40e66df57b5@mail.gmail.com>

I suppose the biggest questions are: a) Does PG pay directly for hosting the main site? and b) If it doesn't, does the entity paying (NCERN?) mind having Google ads on a site hosted on their network? I've heard of other non-profit groups that have run into this or similar problems before and lost their hosting partner...

R C

From Bowerbird at aol.com Mon May 5 15:11:03 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 18:11:03 EDT
Subject: [gutvol-d] Google Ads

michael said:
> We've been in the red for over 5 years now,
> during which I have put off my whole salary,
> and even part of my office expenses, but I'm
> not sure how long I can continue this way.

i can support a salary for you wholeheartedly... but not for anyone else... and _especially_ not over on the d.p. side, given their immoral squandering of energy which volunteers are donating in good faith... they should be _penalized_ for that negligence.

-bowerbird

p.s. i was gonna say i could see david widger getting a small paycheck too, but that puts us on a very slippery and very steep slope, so no. (but if you _do_ get on that slope, and d.p. did clean up their mess, juliet deserves something. but, to repeat, _only_ if she cleans up their act.)

From Bowerbird at aol.com Mon May 5 15:34:35 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 18:34:35 EDT
Subject: [gutvol-d] parallel -- the plunderer -- 03 -- yet another upbeat post in this series

ok folks, let's take a closer look at "the plunderer", which was rfrank's parallel experiment over at distributed proofreaders. as i told you, rfrank discovered that parallel proofing works! he resolved discrepancies between the parallel p1 iterations, and then subjected the results to a round by the p2 proofers. he reports that:

> in the P1 parallel work:
> errors found by both: 573
> additional errors only found in [A] path: 60
> additional errors only found in [B] path: 104
> in P2 round after P1 merge:
> errors (diffs) reported: 55

so he found that the p1a and p1b iterations found some 573+60+104 errors, giving a subtotal of 737. p2 found an additional 55, for a grand total of 792...

p1a found 633, which is 80% of the total of 792.
p1b found 677, which is 85% of the total of 792.
p1 combined found 737, which is 93% of the total of 792.

we don't know if p1c -- a third iteration through p1 -- would have found as many errors as p2 found (55), but the results of previous d.p. tests indicate it would have.

***

however, there is a much more interesting set of data on this book. rfrank notes that "very little" preprocessing was done to the o.c.r. i've noted before that a good chunk of the errors in this text were 340+ spacey-quotes, which can be found and fixed automatically... a number of other errors could've also been fixed in preprocessing.

so i took the o.c.r. and did some rather good preprocessing on it...
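[A minimal sketch of this kind of spacey-quote fix, for the curious. It is illustrative only -- these are not bowerbird's actual routines -- and it assumes straight ASCII double quotes that come in balanced pairs within a line:]

    def fix_spacey_quotes(line):
        """Reattach double quotes that OCR left floating between spaces."""
        chars = list(line)
        open_quote = False
        for i, ch in enumerate(chars):
            if ch != '"':
                continue
            # a quote with a space on both sides is "spacey"; open/close
            # parity decides which side of it the word belongs on
            if 0 < i < len(chars) - 1 and chars[i-1] == ' ' and chars[i+1] == ' ':
                if open_quote:
                    chars[i-1] = ''   # closing quote: glue to the word on the left
                else:
                    chars[i+1] = ''   # opening quote: glue to the word on the right
            open_quote = not open_quote
        return ''.join(chars)

    # fix_spacey_quotes('he said , " hello " , and left')
    #   -> 'he said , "hello" , and left'

Anything unbalanced, and leftovers like the spacey comma in the example, would be left for other rules or for a human pass.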
i used known fixes -- nothing fancy -- but went at it aggressively... what i found is that fewer than 50 errors remained in the body text, with approximately 50 more in the front-matter and book-end ads. these <100 errors are _significantly_ fewer than rfrank's total of 792, indicating that even on these clean scans, with o.c.r. by finereader, preprocessing can make a huge difference, and save proofers time.

moreover, now we can start to see exactly how clean the text can be when you have clean scans, o.c.r. by abbyy, and good preprocessing... with <50 errors in the 300 pages of body text in this book, given good preprocessing, imagine how well p1 proofers could've done. and a second iteration of p1 -- sequential, not parallel -- would've taken this book right to the point of perfection, if not actually there. i've said it before, but it's time to say it again: the p1 proofers rock.

moreover, we have found a new twist on our old pattern: p1a fixes most of the errors (with p1b) plus a few of its own, p1b fixes most of the errors (with p1a) plus a few of its own, and p2 comes in to do clean-up on the ones they both missed.

again, this is the pattern you get on page after page, in book after book, day after day, over in d.p.-land... why there is any lack of awareness or comprehension of this pattern is a total and complete mystery to me...

-bowerbird

From bzg at altern.org Mon May 5 16:08:45 2008
From: bzg at altern.org (Bastien)
Date: Tue, 06 May 2008 01:08:45 +0200
Subject: [gutvol-d] Google Ads
Message-ID: <87k5i81h8i.fsf@bzg.ath.cx>

Michael Hart writes:
> We're considering running tests with Google ads,
> any objections?

I guess the main question is: how much do you expect to earn with ads? Here is the dilemma we can anticipate: if you don't expect a lot of money from ads, it's not worth bothering people who hate ads; if you expect a lot of money from ads, people who don't care about ads might suddenly express concerns... Hopefully the reality is somewhere in between.

Or maybe a donation campaign? Wikipedians might provide useful feedback on how to deal with such a campaign, and on the internal discussions about ads.

--
Bastien

From ricardofdiogo at gmail.com Mon May 5 16:28:05 2008
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Tue, 6 May 2008 00:28:05 +0100
Subject: [gutvol-d] Google Ads
Message-ID: <9c6138c50805051628i67c543fem2686350a57af7904@mail.gmail.com>

I guess Michael is trying to raise a _much_ deeper question here...

If you need ads to maintain PG, then I think PG should be immediately shut down. Depending on ads is as bad as depending on public funds. Both advertisers and public institutions can determine what a person can and cannot do or publish.

_Maybe_ if PG is actually at serious financial risk, a worldwide campaign asking for donations would be a better idea.

Ricardo

From Bowerbird at aol.com Mon May 5 16:50:41 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 May 2008 19:50:41 EDT
Subject: [gutvol-d] Google Ads

ricardo said:
> If you need ads to maintain PG, then
> I think PG should be immediately shut down.

i'm sure p.g. doesn't need ads to be maintained. but i'd say your commitment to the library doesn't seem solid, not if you'd shut it down "immediately" if funds _were_ required.

i would say the topic is (a) whether we think michael needs a paycheck, and (b) whether a suitable way of gathering funds would be ads (or another way).

i'm all in favor of a paycheck for michael. he birthed this project and raised it for decades, carrying it on his back for most of that period... and when the p.g. board made their decision, they decided to pay him. they just don't have the funds to make it a reality at this point in time...

i hate ads. i used to hate them with a _passion_. but lately i have learned to ignore them entirely, as i suspect most of you have too. indeed, studies show 95% of us _rarely_ click an ad, or never at all. but the 5% who do -- _idiots_, for the most part, but who cares? -- can generate a nice chunk of change, providing you have the traffic.

so how many of the people who come to p.g. are ad-clicking idiots? considering that the site has a lot of traffic, it could be quite a lot... but i'm guessing it's not that many, since ad-clickers aren't readers, and readers aren't idiots. but hey, until we try it out, we won't know.

if ads don't generate much cash, then we can make a separate decision whether we want to mount some kind of other effort to generate funds to give michael a salary. but if ads will work, i'd say we should do ads...

-bowerbird

p.s. i think most people already know that they _can_ donate, right? they just don't... after all, we've told them that the books are _free_. high-profile "please donate" campaigns can be as irritating as ads...

From prosfilaes at gmail.com Mon May 5 18:18:44 2008
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 5 May 2008 21:18:44 -0400
Subject: [gutvol-d] Google Ads
Message-ID: <6d99d1fd0805051818m40155bcble493a3ad7f02fd68@mail.gmail.com>

On Mon, May 5, 2008 at 7:28 PM, Ricardo F Diogo wrote:
> Depending on ads is as bad as depending on public funds. Both
> advertisers and public institutions can determine what a person can and
> cannot do or publish.

As a serious matter, many publishers do depend on ads and manage to stay respected as independent publishers. Newspapers, for example. And Project Gutenberg is a library, not a publisher of original political opinion. I think we haven't done nearly enough to get donations before turning to this, but it's not like ads or even public funds are going to change what we do unless we choose ones that have those types of strings attached.
From hart at pglaf.org Mon May 5 23:26:13 2008
From: hart at pglaf.org (Michael Hart)
Date: Mon, 5 May 2008 23:26:13 -0700 (PDT)
Subject: [gutvol-d] Google Ads

We could NOT have ads on our current largest host. We would have to set up a trial run on another host. . . . which we are prepared to do; this is the testing mentioned.

mh

On Mon, 5 May 2008, Robert Cicconetti wrote:
> I suppose the biggest questions are: a) Does PG pay directly for hosting
> the main site? and b) If it doesn't, does the entity paying (NCERN?)
> mind having Google ads on a site hosted on their network?

From hyphen at hyphenologist.co.uk Mon May 5 23:59:09 2008
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue, 6 May 2008 07:59:09 +0100
Subject: [gutvol-d] Google Ads
Message-ID: <001801c8af46$b267fcb0$1737f610$@co.uk>

Michael Hart wrote:
> We've been in the red for over 5 years now,
> during which I have put off my whole salary,
> and even part of my office expenses, but I'm
> not sure how long I can continue this way.

There is another problem with PG funding, apart from it being too low :-(

If every volunteer could be persuaded to give say 10 USD per year, it would go some way toward improving the situation. Small contributions from the USA apparently work well, but sky-high bank charges prevent those outside the USA from giving small amounts, say 10 USD, because bank charges are about 10 USD per transaction. The credit card companies' charges are more reasonable, but still very high.

If PG were to create bank accounts in Euroland, the UK, and other currency areas, it would be possible to pay small amounts into those accounts, then transfer such funds in amounts of more than 200 USD, when bank charges are more reasonable. In the UK there is a Direct Debit facility available which works well for small regular payments to another UK bank account.

The downside is that one would require an organisation in each currency area to handle the not inconsiderable paperwork and the local government controls.

Dave Fawthrop

From richfield at telkomsa.net Tue May 6 08:59:47 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Tue, 06 May 2008 17:59:47 +0200
Subject: [gutvol-d] Google Ads
Message-ID: <48208073.7010702@telkomsa.net>

>> We're considering running tests with Google ads, any objections? <<

Michael,

In the circumstances, I am embarrassed to be asked; *I* didn't make monetary sacrifices to keep the pot boiling. Certainly we prefer spam-free pages in every sense, but really, if it means that those who can afford it finally begin to pay to support a genuinely valuable social function, I reckon that we can see our way to closing our callused corneas and ad-inured eyes to the abominations of the mercenary, for the children of this world are in their generation wiser than the children of light. For my part, I hardly notice the ads anymore anyway.

Besides, I should think that PG would be a prime site, whose many patrons have particularly targetable interests, highly prized by the sponsors.

Go for it!

Jon

From Bowerbird at aol.com Tue May 6 13:53:26 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 May 2008 16:53:26 EDT
Subject: [gutvol-d] parallel -- the plunderer -- 04 -- and another upbeat post in this series

more on "the plunderer", rfrank's d.p. parallel experiment...

rfrank reports there were 55 changes made by the p2 proofers. i count only 26-32, so it would be good to get his list from him. (i didn't count changes of blank lines, or dehyphenations that could've been done by the machine, so that might explain it...)

at any rate, rfrank says only 9 of the changes were "non-trivial":
> four stray punctuation marks,
> three capitalization errors,
> and two actual typos
> ("de" for "do" and "*led" for "*ied" after a page break).

my analysis differs. i suspect that my auto-detection routines are more aggressive than his, as i count only 3 changes of substance, and 2 of those were _correct_o.c.r._ of an error in the paper-book. all of the other errors were detectable programmatically, and thus didn't require a round of human proofing to be found and fixed...

of the 3 changes, 1 is an o.c.r. error which was not auto-detectable:
> listening attitude. Suddenly, as if all had been,
> listening attitude. Suddenly, as if all had been
> http://z-m-l.com/go/plund/plundp287.html

there's 1 p-book typo, which is _boring_ (but not "difficult") to detect:
> its a cinch, it seems to me, he wouldn't do that for
> it's a cinch, it seems to me, he wouldn't do that for
> http://z-m-l.com/go/plund/plundp075.html

and 1 p-book error, not auto-detectable, which a sharp-eyed p2 found:
> died. There those among them who had been in
> died. There were those among them who had been in
> http://z-m-l.com/go/plund/plundp138.html

given that all 3 of these errors would reasonably be expected to be caught by the general public in "continuous proofing", i believe the question of whether p2 was even needed on this particular book is open for discussion.
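[For illustration, one such "boring" auto-detection check, in the spirit of the its/it's case above. This is a sketch, not bowerbird's or rfrank's actual code; it only flags lines for human review, since possessive "its" makes blind replacement unsafe:]

    import re

    # "its" directly before an article, negation, or possessive is almost
    # always a dropped apostrophe, as in the "its a cinch" line above
    SUSPICIOUS_ITS = re.compile(
        r"\bits\s+(?:a|an|the|not|no|my|your|his|her|our|their)\b",
        re.IGNORECASE)

    def flag_suspicious_its(text):
        """Yield (line_number, line) pairs that deserve a human look."""
        for n, line in enumerate(text.splitlines(), start=1):
            if SUSPICIOUS_ITS.search(line):
                yield n, line.strip()

    # e.g. flags: "its a cinch, it seems to me, he wouldn't do that for"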
i am quite serious. if we can get books this perfect with 2 passes through p1 -- whether they be sequential or parallel in nature -- do the benefits of a pass through p2 warrant the costs? i say no...

of course, d.p. would have to jack up the quality of its clean-up tools, but that's not all that difficult. rfrank might be just the man to do it...

-bowerbird

From jeroen.mailinglist at bohol.ph Tue May 6 13:57:14 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue, 06 May 2008 22:57:14 +0200
Subject: [gutvol-d] Google Ads
Message-ID: <4820C62A.1090009@bohol.ph>

Bastien wrote:
> I guess the main question is: how much do you expect to earn with ads?

I run Google ads on a tourism-promotion site I run (www.bohol.ph), and this generates about 200 dollars per month for about 80,000 visits. More than enough to pay its bills, but probably not really well paid given the number of hours gone into it. Based on Alexa rankings, gutenberg.org has about 40 times as many visits, so a simple multiplication would result in about 8000 dollars per month. However, I suspect many more visits are possible if the content is restructured in smaller chunks, and ads are added in between. This is what many of the PG clones are doing.

Unfortunately, you cannot estimate easily with Google ads. You will have to try and see what comes. Just slamming ads on the site is probably not the best idea. You would have to reorganize and actively manage the collection smartly to maximize income. Lots of literature does not have any keywords that are worth anything for Google ads. I was thinking about building an historical travel site around the travelogues available in PG, adding feedback, forums, "what is it like today", and "follow the trail of " features, supported by Google ads, and I expect that, with proper maintenance and smart organization, it could earn more than enough to run a small team on. However, doing so would turn gutenberg.org into a fundamentally different site.

Finally, you may lose some of the more active contributors once you start using ads. The more idealistic ones will certainly think you've completely fallen for big money (although the reality is, we all do have to pay our bills).

People who hate ads typically block them. So do I with the more annoying varieties.

Jeroen.
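[Jeroen's multiplication, spelled out as a naive scaling sketch using his rounded figures; it assumes ad revenue scales linearly with traffic, which his own caveats argue against:]

    # Jeroen's rounded figures
    bohol_revenue_per_month = 200.0      # USD from ads on bohol.ph
    bohol_visits_per_month = 80000
    gutenberg_traffic_multiple = 40      # per Alexa rankings

    revenue_per_visit = bohol_revenue_per_month / bohol_visits_per_month
    pg_estimate = revenue_per_visit * bohol_visits_per_month * gutenberg_traffic_multiple
    print("roughly $%.0f per month" % pg_estimate)   # roughly $8000 per month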
From davedoty at hotmail.com Tue May 6 15:02:02 2008
From: davedoty at hotmail.com (Dave Doty)
Date: Tue, 6 May 2008 22:02:02 +0000
Subject: [gutvol-d] Google Ads

I'm chiming in late, so forgive me if I'm retreading covered ground.

Another thought: ads can cost donation dollars. Not even necessarily because people have an objection to the ads, but because they're assuming the ads are bringing in the bucks, and there's no need for extra support.

I have no idea what kind of donations the site brings in. I'm guessing it's not enough to fully cover costs, or this wouldn't even be brought up. Maybe it's virtually nothing, in which case this line of reasoning is irrelevant. But if it's at least a significant percentage, it's worth considering whether ads will bring in enough to offset any potential loss in donations.

I don't have enough information to even hazard a guess on what the answer would be in this case.

Dave

From Bowerbird at aol.com Wed May 7 16:34:12 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 7 May 2008 19:34:12 EDT
Subject: [gutvol-d] cyberlibrary numbers

the open library -- http://openlibrary.org -- "has just finished its latest release", according to an announcement that hit my e-mail this morning.

their website says they are now:
> featuring 13,439,320 books
> (including 234,857 with full-text)

so that helps firm up the answer to some questions that we were batting around here somewhat recently. that's a whole lot of scan-sets -- 13.44 million... considerably fewer digital-text e-books, it's true, but still over 10 times bigger than the p.g. library.

of course, their digital text isn't _nearly_ as clean as the p.g. e-texts. not even close.

not _yet_, anyway.

but i intend to do something about that little problem.

and it's ironic, because here i have been -- for years, literally _years_ -- offering to help project gutenberg clean up its e-texts, and no one took me up on that... so now, when i go help the o.c.a. clean up their text, there is some chance that they will actually end up with text that's _cleaner_ than the p.g. text, meaning that -- in addition to their huge lead in _quantity_ -- they will edge you out on the _quality_ issue as well...

heck, considering that the only "comparison" that can be done will be on the books found in _both_ libraries, and considering the fact that they can use your e-text to correct their own, thereby ensuring that their text is just as good as yours, if not _better_, i have to believe that there is no way they can fall short on a comparison.

maybe you should've gone for the quality upgrade while you had the opportunity... now it's too late...

-bowerbird

From ebooks at ibiblio.org Thu May 8 01:07:24 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Thu, 08 May 2008 04:07:24 -0400
Subject: [gutvol-d] cyberlibrary numbers
Message-ID: <4822B4BC.7030102@ibiblio.org>

I've been remiss in finishing some replies to a few of Bowerbird's earlier posts, but this one didn't require a lot of typing on my part, which is always convenient for someone who uses the time-honored hunt-and-peck typing method.

Bowerbird wrote:
> the open library -- http://openlibrary.org --
> "has just finished its latest release", according to
> an announcement that hit my e-mail this morning.
>
> their website says they are now:
> > featuring 13,439,320 books
> > (including 234,857 with full-text)
> so that helps firm up the answer to some questions
> that we were batting around here somewhat recently.

It would help if you knew what those numbers mean. :)

> that's a whole lot of scan-sets -- 13.44 million...

It would be a lot--if they actually had 13.44 million scan-sets, but they don't.

Apparently, Bowerbird, you didn't bother to go past the main page of the Open Library website. It's too bad you didn't click on the "About the project" link near the bottom of the main page:

http://www.openlibrary.org/about

Here's an excerpt:

"One web page for every book ever published. It's a lofty, but achievable, goal.
"To build it, we need hundreds of millions of book records, a brand new database infrastructure for handling huge amounts of dynamic information, a wiki interface, multi-language support, and people who are willing to contribute their time, effort, and book data. "To date, we have gathered about 30 million records (13.4 million are available through the site now), and more are on the way. We have built the database infrastructure and the wiki interface, and you can search millions of book records, narrow results by facet, and search across the full text of 230,000 scanned books...." So the "13,439,320 books" is actually referring to book *records*. The "234,857 with full-text" refers to the number that have been scanned. Now, if you had clicked on the "Add a book" link, also near the bottom of the main page: http://www.openlibrary.org/addbook you would have seen the lengthy form used to add a book record to their database. You should try it. Just think! You could add a book (record) without having to scan a single page or correct any OCR. Of course, since they already have about 30 million records, you might have trouble coming up with a book that's not in their database. > considerably fewer digital-text e-books, it's true, > but still over 10 times bigger than the p.g. library. > > of course, their digital text isn't _nearly_ as clean as > the p.g. e-texts. not even close. > > not _yet_, anyway. > > but i intend to do something about that little problem. Given your track record, perhaps you should have said, "but i intend to do very little (other than talk a lot) about that little problem." > and it's ironic, because here i have been -- for years, > literally _years_ -- offering to help project gutenberg > clean up its e-texts, and no one took me up on that... If you had really wanted to help PG clean up its e-texts, you could have submitted cleaned up versions at any time. I used to do that some years ago. I'd look for items that were available only in plain text format and submit corrected plain text and HTML versions. > so now, when i go help the o.c.a. clean up their text, > there is some chance that they will actually end up > with text that's _cleaner_ than the p.g. text, meaning > that -- in addition to their huge lead in _quantity_ -- > they will edge you out on the _quality_ issue as well... Unless you "help" the OCA as much as you've "helped" PG. :) Jose Menendez P.S. I'm hoping to finish typing up those replies I mentioned above in the near future. From frank.vandrogen at bc.biol.ethz.ch Thu May 8 06:53:31 2008 From: frank.vandrogen at bc.biol.ethz.ch (Frank van Drogen) Date: Thu, 08 May 2008 15:53:31 +0200 Subject: [gutvol-d] Google Ads In-Reply-To: References: <87k5i81h8i.fsf@bzg.ath.cx> <4820C62A.1090009@bohol.ph> Message-ID: Is there a possibility to get insight in the current financial situation of PGLAF, and into considered alternatives for ads? Frank From hart at pglaf.org Thu May 8 10:16:18 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 May 2008 10:16:18 -0700 (PDT) Subject: [gutvol-d] Google Ads In-Reply-To: References: <87k5i81h8i.fsf@bzg.ath.cx> <4820C62A.1090009@bohol.ph> Message-ID: On Thu, 8 May 2008, Frank van Drogen wrote: > Is there a possibility to get insight in the current financial > situation of PGLAF, and into considered alternatives for ads? > > Frank PGLAF has never been totally broke, mostly because I won't even let Greg pay my office expenses when cash flow is low. 
However, the more we've looked into this Google ad thing, the more it appears we could stand on our own. I've long wondered if we could ever find someone to replace me who would value Project Gutenberg more than their own paycheck. I'm not sure we really want to find out the hard way. We won't be offering any more than someone would get staying in academia and working from there, it was just that academia was pretty inconsistent in their support, which is why we had to create the PGLAF in the first place. By the way, I don't think the ads show up at all on Newby's computer because he has "adblock plus" and I only have the normal "adblock," so it appears no one may have to see ads that doesn't want to. As for myself, I never even notice the Google ads when I do my usual surfing. Michael From hart at pglaf.org Thu May 8 10:33:11 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 May 2008 10:33:11 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4822B4BC.7030102@ibiblio.org> References: <4822B4BC.7030102@ibiblio.org> Message-ID: On Thu, 8 May 2008, Jose Menendez wrote: The "234,857 with full-text" refers to the number that have been scanned. /// Actually, if they really are "full-text" in the manner that term has always been used, as opposed to "raw scans," then these 1/4 million or so books would NOT, technically, "refer to the number that have been scanned" but more accurately "refer to the number that have been scanned and converted from image to text mode." If. . .they are using the language as it always has been. . . . However, I don't think the OCA does much proofreading, if any, so we might need even a more detailed technical language. Perhaps "raw text" ??? I'm sure I have left out several other categories that should eventually be included in breaking down the entire processing from MARC records of 13.xs million down to who knows how many eBooks that have been proofread to the current 99.975% level. Reading the below with this in mind might be advantageous. Thanks!!! Michael > I've been remiss in finishing some replies to a few of Bowerbird's > earlier posts, but this one didn't require a lot of typing on my part, > which is always convenient for someone who uses the time-honored > hunt-and-peck typing method. > > > Bowerbird wrote: > >> the open library -- http://openlibrary.org -- >> "has just finished its latest release", according to >> an announcement that hit my e-mail this morning. >> >> their website says they are now: >> > featuring 13,439,320 books >> > (including 234,857 with full-text) >> so that helps firm up the answer to some questions >> that we were batting around here somewhat recently. > > > It would help if you knew what those numbers mean. :) > > >> that's a whole lot of scan-sets -- 13.44 million... > > > It would be a lot--if they actually had 13.44 million scan-sets, but > they don't. > > Apparently, Bowerbird, you didn't bother to go past the main page of > the Open Library website. It's too bad you didn't click on the "About > the project" link near the bottom of the main page: > > http://www.openlibrary.org/about > > Here's an excerpt: > > > > "One web page for every book ever published. It's a lofty, but > achievable, goal. > > "To build it, we need hundreds of millions of book records, a brand > new database infrastructure for handling huge amounts of dynamic > information, a wiki interface, multi-language support, and people who > are willing to contribute their time, effort, and book data. 
From Bowerbird at aol.com Thu May 8 10:44:52 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 8 May 2008 13:44:52 EDT
Subject: [gutvol-d] Google Ads

michael-

since you said ibiblio won't allow ads, does this mean that the main u.r.l. -- http://www.gutenberg.org -- is _not_ what we're talking about here?

if that's the case, and it would be some _new_ site that you set up, i'd think that you don't even need to ask our thoughts.

-bowerbird

From hart at pglaf.org Thu May 8 11:38:02 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 8 May 2008 11:38:02 -0700 (PDT)
Subject: [gutvol-d] Google Ads

On Thu, 8 May 2008, Bowerbird at aol.com wrote:
> if that's the case, and it would be some
> _new_ site that you set up, i'd think that
> you don't even need to ask our thoughts.

I always like to ask. . . .period. . . .

Yes, this would be a different host.

The first test we are thinking of would be only a partial hosting, with some of the ibiblio site still getting the hits and some going to the test site.

It'll be a while, plenty of time for at least a few generations of comments and trials before going public. . . .

mh

From Bowerbird at aol.com Thu May 8 12:08:42 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 8 May 2008 15:08:42 EDT
Subject: [gutvol-d] Google Ads

michael said:
> I always like to ask. . . .period. . . .

well, that's nice and all. :+) but it gives us the impression that we have a say, when maybe we _shouldn't_ have a say, not really.

i think lots of people don't realize that any time p.g. needed something and no monies existed, you took money out of your pocket to pay for it. and not because you "made a voluntary donation" at that time, but simply because it was _required_.

other people have bought things for the project, yes sir. (i know several people have paid for _lots_ of books.) but not even _those_ people have said "you can take whatever money you need any time." so i don't think you have to _ask_us_ now that you would like to get paid back some of that money, or to assure that such monies _will_ exist into the future.

> It'll be a while, plenty of time for at
> least a few generations of comments
> and trials before going public. . . .

why don't you ask _the_users_ themselves? put a thumbs-up/down poll right on the site.

-bowerbird

From Bowerbird at aol.com Thu May 8 12:20:56 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 8 May 2008 15:20:56 EDT
Subject: [gutvol-d] cyberlibrary numbers

thank you, jose, for clearing up my mistaken notion that the o.c.a. had already scanned some 13.44 million books, when the actual number is more like a quarter of a million. of course, it was the lower number that i was dealing with, so almost nothing in my post needs rewriting in response...

i also noticed that -- back on groundhog day in february -- umichigan announced their total of 1 million books scanned. since umich has what? -- like 6-9 million volumes or so -- and google has been scanning there for over 3 years now, i guess things aren't going as fast as they originally planned.

at any rate, no matter how fast or slow it's all going, i am pleased as punch that we finally started digitizing libraries.

-bowerbird
(http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080508/c4fd13cc/attachment.htm From Bowerbird at aol.com Thu May 8 12:24:42 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 May 2008 15:24:42 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: michael said: > However, I don't think the OCA does much proofreading, if any, > so we might need even a more detailed technical language. > Perhaps "raw text" ??? and here's where we start to come full circle... *** people who have been paying attention to my recent analyses of the data from the experiments over at distributed proofreaders should now know that with good scans and good o.c.r., you can move "raw o.c.r." close to perfection with a good clean-up tool... so that's what i intend to do, with the "raw o.c.r." from umichigan -- and google more generally -- _and_ the open content alliance, sometimes even via a _comparison_ of the same book from both... an extremely persistent campaign on my part to get umichigan to fix the fatal flaws in its o.c.r. has _finally_ paid off, i am informed, thanks in part, i would guess, because i went to the very top of the org food-chain and addressed the _university_librarian_ publicly... moreover, an equally tenacious campaign directed at the o.c.a. has -- just today -- finally given me the name of a person in charge of their o.c.r., so i can hope that soon they too will fix their fatal flaws. so i expect that soon i will be able to start scraping text in earnest, and remounting it after aggressively cleaning it with my programs. will this machine-cleaned text be as clean as p.g. e-texts? nope. not at first, anyway. but since i will wrap it in an infrastructure of "continuous proofing" to encourage the error-reporting process, i expect that it won't take long before it matches and exceeds p.g. after all, proofing isn't rocket-science... -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080508/19aa8b40/attachment.htm From hart at pglaf.org Thu May 8 17:28:50 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 May 2008 17:28:50 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: On Thu, 8 May 2008, Bowerbird at aol.com wrote: > thank you, jose, for clearing up my mistaken notion that > the o.c.a. had already scanned some 13.44 million books, > when the actual number is more like a quarter of a million. > of course, it was the lower number that i was dealing with, > so almost nothing in my post needs rewriting in response... > > i also noticed that -- back on groundhog day in february -- > umichigan announced their total of 1 million books scanned. > since umich have what? -- like 6-9 million volumes or so -- > and google has been scanning there for over 3 years now, > i guess things aren't going as fast as they originally planned. > > at any rate, no matter how fast or slow it's all going, i am > pleased as punch that we finally started digitizing libraries. > Google announced on December 14, 2004 that they would digitize 10 million books in 6 years. 
Of course that includes a lot more libraries than UMich, who,by the way, used to claim Project Gutenberg was there. Hee hee! Meanwhile, it's been nearly 3 1/2 years. If google did 3 1/3 million in the first three years, and then doubled production for the next three years, then they might actually be able to claim on schedule and even longer if tey pretend they never mean a date of December 14, 2004 to be remembered by anyone as an official starting date, een though it was the date of biggest media blitz I've ever seen in my entire life. Hee hee! The real question will be wheter or not Google allows "their" books to become "everyone's" books in a quite useful form, or whether the world will be forced into a permanent continuation of reading over the shoulder of Google, much as the Brtish Library "Readers." Now don't get me wrong, the British Library "Readers" feel themselves to be a particularly well-heeled, and very fortunate bunch. . .but still, even in their put on Sunday best, they have to STAND in tiny carrels to do all their reading. Nevertheless, _I_ am promoting electronic books quite literally on a different plane, books YOU can OWN for as long as you want, and get again if you lose them. Millions of books. You own them all. The "personal computer" as "personal library." Somehow I don't think this is exacly what Google, and the Open Content Alliance, and Million Book Project-- and most of the rest--actually have in mind. And _I_ want them all in full text that can be pulled into any word processor, emailer, text editor, etc. Small files anyone can use in any text program. . . . OWN a million books. Maybe even a billion. . . . No kidding. . . . Michael S. Hart Founder, 1971 Project Gutenberg Inventor of eBooks > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From hart at pglaf.org Thu May 8 17:30:56 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 May 2008 17:30:56 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: Personally, I always thought the poor quality of their scans was intentional. . .to prevent creating good enough OCR to do what I mentioned in my previous message. Have they really changed their minds, and will let out their best scans now??? mh On Thu, 8 May 2008, Bowerbird at aol.com wrote: > michael said: >> However, I don't think the OCA does much proofreading, if any, >> so we might need even a more detailed technical language. >> Perhaps "raw text" ??? > > and here's where we start to come full circle... > > *** > > people who have been paying attention to my recent analyses of > the data from the experiments over at distributed proofreaders > should now know that with good scans and good o.c.r., you can > move "raw o.c.r." close to perfection with a good clean-up tool... > > so that's what i intend to do, with the "raw o.c.r." from umichigan > -- and google more generally -- _and_ the open content alliance, > sometimes even via a _comparison_ of the same book from both... > > an extremely persistent campaign on my part to get umichigan to > fix the fatal flaws in its o.c.r. has _finally_ paid off, i am informed, > thanks in part, i would guess, because i went to the very top of the > org food-chain and addressed the _university_librarian_ publicly... > > moreover, an equally tenacious campaign directed at the o.c.a. 
has > -- just today -- finally given me the name of a person in charge of > their o.c.r., so i can hope that soon they too will fix their fatal flaws. > > so i expect that soon i will be able to start scraping text in earnest, > and remounting it after aggressively cleaning it with my programs. > > will this machine-cleaned text be as clean as p.g. e-texts? nope. > not at first, anyway. but since i will wrap it in an infrastructure of > "continuous proofing" to encourage the error-reporting process, > i expect that it won't take long before it matches and exceeds p.g. > after all, proofing isn't rocket-science... > > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From Bowerbird at aol.com Thu May 8 18:53:40 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 May 2008 21:53:40 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: michael said: > Personally, I always thought > the poor quality of their scans was intentional. . . > to prevent creating good enough OCR to > do what I mentioned in my previous message. well, you really have to separate the o.c.a. from google. > Have they really changed their minds, > and will let out their best scans now??? it's not really a question of which _scans_ they will "let out". we know google ain't releasing their high-resolution scans; they're too big anyway, and don't give better o.c.r. output... but -- hopefully -- we won't have to deal with their scans, except to display 'em in the "continuous proofing" interface, where we want bandwidth-saving smaller images anyway... but mostly, we'll be dealing with their o.c.r. output... so the question is "how good is their raw o.c.r. output?" and the answer is that it's relatively good. good enough. good enough that, after we run aggressive clean-up on it, it gives us results good enough for "continuous proofing". at least, their o.c.r. _will_ be good enough, once fatal flaws in it are overcome... "fatal flaws" means missing characters, usually em-dashes and quotemarks, as well as pagebreaks (which o.c.r. will record as a formfeed if instructed to do so). i haven't yet ruled out that these "fatal flaws" are intentional, but i do believe they are simply the result of _incompetence_, rather than a sinister attempt to make sure that competitors who might try to "steal" this data are thwarted with bad text. whatever the reasons behind it, the fact is that the _public_ -- who are always shown as the beneficiaries of this work -- simply won't put up with this cruddy text, so i'm just the first in a long series of loud-mouthed complainers if it ain't fixed. but once the fatal flaws are fixed, the o.c.r. gets very good... my tests show google o.c.r. is good, and o.c.a. o.c.r. is great. in one such test, there were only _57_ bad lines in the book using o.c.r. from the o.c.a. and only 240 in the google o.c.r. *** so the o.c.r. is good (with exceptions, yes) in both projects. but there's another important feature -- the convenience... the o.c.a. actively wants you to have the full text of the book, offering it in one file, for maximal downloading convenience. google -- and (per their contract with google) umichigan -- are making it far less convenient, forcing people to undergo a page-by-page interface, and threatening to stop scrapers... 
i will be attempting to engage john wilkins at umichigan in a conversation involving firm answers to questions about how and where they will draw the lines on "automated scraping". suffice it for now to observe that will be an "interesting" talk, because i sense that they don't _want_ to make it _too_ easy, but they will have a difficult task making hard-and-fast rules that limit access since they want to say their books are "open". (brewster and carl malamud have already called them on this.) just for the record, i stated in the past that it was my belief that google would never share their text, because that would mean giving up their competitive advantage, understandable as it was the result of their investment of hundreds of millions of dollars. they _deserve_ that competitive advantage. but they gave it up. so already google is giving more than i ever thought they would. when they announced they were making the full-text available, they made a big deal that they were now making it _accessible_ to visually-impaired users, so i got the impression that they had concluded they'd face a major a.d.a. suit if they didn't cough up, and i suspect that that is why they decided to release their text... but maybe i'm just not giving them enough credit... the point is that one can now get text from o.c.a. _and_ google, and it's clean enough text that you can move it close to perfect, at least if you've learned as much about o.c.r. clean-up as i have. even if -- for google -- you have to scrape it one page at a time, the clean-up tool can clean one page while it's scraping the next, and can upload the entire book once it has scraped all the pages, so -- you know -- you can just turn the thing on and watch it run. -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080508/d01759df/attachment-0001.htm From hyphen at hyphenologist.co.uk Fri May 9 12:30:04 2008 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri, 9 May 2008 20:30:04 +0100 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: <003901c8b20b$1c0a4a80$541edf80$@co.uk> Michael Hart wrote > Personally, I always thought the poor quality of their scans > was intentional. . .to prevent creating good enough OCR to do > what I mentioned in my previous message. > Have they really changed their minds, and will let out their > best scans now??? I looked up one of "my" books and the scan was pretty good, from a library copy complete with hand written marginalia. I expect that I could have OCRed it without problems. Dave F From ebooks at ibiblio.org Fri May 9 13:03:10 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Fri, 09 May 2008 16:03:10 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4822B4BC.7030102@ibiblio.org> Message-ID: <4824ADFE.6030903@ibiblio.org> On May 8, 2008, Michael Hart wrote: > On Thu, 8 May 2008, Jose Menendez wrote: > > The "234,857 with full-text" refers to the number that have been > scanned. 
> > > /// > > > Actually, if they really are "full-text" in the manner that term > has always been used, as opposed to "raw scans," then these 1/4 > million or so books would NOT, technically, "refer to the number > that have been scanned" but more accurately "refer to the number > that have been scanned and converted from image to text mode." > > If. . .they are using the language as it always has been. . . . Obviously, if the "full-text" is available, the books were not only scanned but OCRed as well, but Bowerbird's mistake wasn't about the number of books that had been OCRed. It was about the number of books that had been scanned. That's why I wrote, "The '234,857 with full-text' refers to the number that have been scanned." I didn't think I'd have to add the obvious "and OCRed" to the end of the sentence. By the way, it seems that the number given on the Open Library website might be a little out-of-date. If we look at the Internet Archive's Text Archive page, http://www.archive.org/details/texts we'll see that the total number of items listed for the "American Libraries" and "Canadian Libraries" sub-collections is over 298,000. Jose Menendez From ebooks at ibiblio.org Fri May 9 13:06:57 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Fri, 09 May 2008 16:06:57 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: <4824AEE1.5090908@ibiblio.org> On May 8, 2008, Bowerbird wrote: > michael said: > > However, I don't think the OCA does much proofreading, if any, > > so we might need even a more detailed technical language. > > Perhaps "raw text" ??? > > and here's where we start to come full circle... > > *** > > people who have been paying attention to my recent analyses of > the data from the experiments over at distributed proofreaders > should now know that with good scans and good o.c.r., you can > move "raw o.c.r." close to perfection with a good clean-up tool... > > so that's what i intend to do, with the "raw o.c.r." from umichigan > -- and google more generally -- _and_ the open content alliance, > sometimes even via a _comparison_ of the same book from both... > > an extremely persistent campaign on my part to get umichigan to > fix the fatal flaws in its o.c.r. has _finally_ paid off, i am informed, > thanks in part, i would guess, because i went to the very top of the > org food-chain and addressed the _university_librarian_ publicly... The Google OCR at the University of Michigan started improving over a year ago. Perhaps you recall my pointing that out to you on the DP forums in January of last year. http://www.pgdp.net/phpBB2/viewtopic.php?p=271008#271008 "But there are other Google books at the UM site that aren't the way you describe. For example, look at this page of OCR text from _Abraham Lincoln_ by Carl Schurz. I see a number of separate paragraphs. I also see a number of quotation marks and even a few end-line hyphens, but not all of the hyphens that should be there." Was the OCR as good as it should have been? No, but it was already getting better. > moreover, an equally tenacious campaign directed at the o.c.a. has > -- just today -- finally given me the name of a person in charge of > their o.c.r., so i can hope that soon they too will fix their fatal flaws. The OCA's OCR also began improving at least a year ago. 
For example, look at "The Spanish Story of the Armada, and Other Essays": http://www.archive.org/details/spanishstoryofar00frouuoft If we look at the directory with the various files, http://ia340919.us.archive.org/2/items/spanishstoryofar00frouuoft/ we'll see that most of them were posted exactly a year ago, on May 9, 2007. And if you look at the plain-text file, http://ia340919.us.archive.org/2/items/spanishstoryofar00frouuoft/spanishstoryofar00frouuoft_djvu.txt you'll see that it contains quotation marks, apostrophes, end of line hyphens, etc. Em dashes do seem to be missing, but that flaw was fixed with later books, starting last December. For example, look at "The Scarlet Letter, A Romance": http://www.archive.org/details/letterromscarlet00hawtrich The directory listing shows that most of the files were posted on December 15, 2007: http://ia360619.us.archive.org/0/items/letterromscarlet00hawtrich/ And if you look at the plain-text file, http://ia360619.us.archive.org/0/items/letterromscarlet00hawtrich/letterromscarlet00hawtrich_djvu.txt you'll see that it also includes the em dashes. You might not recognize them, but they're there. Look for this string of characters: ??? That's UTF-8 for em dashes. Indeed, if you switch your web browser to use UTF-8 encoding, you'll see them displayed as em dashes. It would be nice, of course, if the OCA would fix the OCR of the books that had been processed earlier. > so i expect that soon i will be able to start scraping text in earnest, > and remounting it after aggressively cleaning it with my programs. You know, Bowerbird, all this time you've been complaining about the quality of the OCA's and Google's OCR, you also could have been doing your own OCR of their scans "in earnest" and "aggressively cleaning it." It's rather easy if you know what you're doing, but I guess it's even easier to sit back and wait for someone else to provide you with high quality OCR. :) > will this machine-cleaned text be as clean as p.g. e-texts? nope. > not at first, anyway. but since i will wrap it in an infrastructure of > "continuous proofing" to encourage the error-reporting process, > i expect that it won't take long before it matches and exceeds p.g. > after all, proofing isn't rocket-science... You've posted links to your "continuous proofing" demos in a number of places, but have you gotten a single person (other than yourself) to use your "error-reporting process"? :) Jose Menendez From ebooks at ibiblio.org Fri May 9 13:07:24 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Fri, 09 May 2008 16:07:24 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: <4824AEFC.50102@ibiblio.org> On May 8, 2008, Michael Hart wrote: > Google announced on December 14, 2004 that they would digitize > 10 million books in 6 years. No, they didn't. Here's a link to the story CBS News placed on its website back on Dec. 14, 2004: "Google To Scan Library Volumes" http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml The only "timeline" for the scanning given in the article is this: "Michigan's library alone contains 7 million of its library volumes -- about 132 miles of books. Google hopes to get the job done at Michigan within six years, Wilkin said." The article does NOT say how many books Google hoped to scan at the other 4 libraries in those 6 years. And here's a link to the article BBC News posted on Dec. 
14, 2004: "Google to scan famous libraries" http://news.bbc.co.uk/2/hi/technology/4094271.stm Again the only mention of 6 years refers to the University of Michigan library: "It will take six years to digitise the full collection at Michigan, which contains seven million volumes." Could you point to a single article posted back then, reporting that Google announced it would scan 10 million books in 6 years? > Of course that includes a lot more libraries than UMich, > who,by the way, used to claim Project Gutenberg was there. Did they also claim Kilroy was there? ;) > If google did 3 1/3 million in the first three years, > and then doubled production for the next three years, > then they might actually be able to claim on schedule > and even longer if tey pretend they never mean a date > of December 14, 2004 to be remembered by anyone as an > official starting date, een though it was the date of > biggest media blitz I've ever seen in my entire life. It was the "biggest media blitz" you've ever seen, yet you always seem to have trouble remembering what was actually said. :) Jose Menendez From lee at novomail.net Fri May 9 13:46:39 2008 From: lee at novomail.net (Lee Passey) Date: Fri, 09 May 2008 14:46:39 -0600 Subject: [gutvol-d] parallel -- the plunderer -- 04 -- and another upbeat post in this series In-Reply-To: References: Message-ID: <4824B82F.306@novomail.net> Bowerbird at aol.com wrote: > given that all of these 3 errors would be reasonably expected to be > caught by the general public in "continuous proofing", i believe that > the question of whether p2 was even needed in this particular book > is open for discussion. And therein lies the rub. Pretty much every suggestion for change you have made to Distributed Proofreaders depends on the existence of an effective and efficient "continuous proofreading" process--which does not exist, and will probably never exist so long as Distributed Proofreaders views Project Gutenberg as it's primary distribution mechanism. The relationship between Project Gutenberg and Distributed Proofreaders is both complex and one-way, with DP playing the role of the unrequited lover. The volunteers at DP go to great lengths and expend great effort in producing the finest work product their processes allow them to. However, once DP has finished its work it is passed over to PG, where it goes into the barrels with all the other apples. This is why the volunteers at DP expend so much time and energy on trying to figure out how many rounds of various types are required to maximize the quality of their work product. They know that once a set of files for any particular work is submitted to PG that there is no effective or efficient process to correct errors, or even enhance the output. While there is a theoretical process to correct problems, there is no practical process. ("In theory, there is no difference between theory and practice. In practice, there is.") For all practical purposes, when a file leaves DP and enters the PG archive it is forever cast in stone. I agree that much of what DP now does would be unnecessary if there were in place a "continuous proofreading" process; but there is not. And so long as DP has no control over the archiving and distribution of its own output it will be unable to put a "continuous proofreading" process into practice. 
The first step in improving DP's processes should be to find a partner where DO's work product can be archived and accessible /in addition to/ Project Gutenberg, and where Distributed Proofreaders might have the kind of control required to implement "continuous proofreading." From Bowerbird at aol.com Fri May 9 14:45:08 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 May 2008 17:45:08 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: jose said: > The Google OCR at the University of Michigan > started improving over a year ago. as long as it has "fatal flaws" in it, "improvement" means little. > Was the OCR as good as it should have been? > No, but it was already getting better. the point is that, as of march 31st of this year, the "problems" are now solved, according to people at the umichigan library. i haven't checked to make sure of that, but i hope that it's true. > The OCA's OCR also began improving at least a year ago. again, "improvement" is nice, but removing fatal flaws is necessary. if you can tell me that all of their o.c.r. output since a certain date is _free_ of fatal flaws, i'll go see if i can confirm that. otherwise... (by the way, one flaw -- quite serious, and perhaps even fatal -- with the o.c.a. output is that it doesn't include the pagebreaks... i'd need that to be fixed before i could seriously work that text.) > It would be nice, of course, if the OCA would fix the OCR > of the books that had been processed earlier. well, yeah, that too... > You know, Bowerbird, all this time you've been complaining > about the quality of the OCA's and Google's OCR, you also > could have been doing your own OCR of their scans "in earnest" i could have. but i'd consider that to be a waste of my time, since both the o.c.a. and google will fix their text eventually. > I guess it's even easier to sit back and wait for someone else > to provide you with high quality OCR. :) if i really wanted to "sit back and wait", i'd sit back and wait until they cleaned up their o.c.r., because they'll have to do _that_ eventually as well. and if it was a lot of work for me to clean up the o.c.r., that's precisely what i _would_ do, because i see no point in re-doing someone else's work when i know that that someone else will eventually re-do the work anyway... > You've posted links to your "continuous proofing" demos > in a number of places, but have you gotten a single person > (other than yourself) to use your "error-reporting process"? :) perhaps no one has found an error to report... ;+) -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080509/f29b9068/attachment.htm From hart at pglaf.org Fri May 9 19:09:45 2008 From: hart at pglaf.org (Michael Hart) Date: Fri, 9 May 2008 19:09:45 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4824AEFC.50102@ibiblio.org> References: <4824AEFC.50102@ibiblio.org> Message-ID: On Fri, 9 May 2008, Jose Menendez wrote: > On May 8, 2008, Michael Hart wrote: > > >> Google announced on December 14, 2004 that they would digitize >> 10 million books in 6 years. > > > No, they didn't. Here's a link to the story CBS News placed on its > website back on Dec. 
14, 2004: > Your research is very incomplete if you think the CBS story contained all the press releases given by the "Google Print Library personnel on that date. You'd better go back and chekc al the other TV networks, not to mention radio, newspapers, etc. I've quoted many of these here in the past, and I hope that these kinds of statement will NOT be taken at face value when I am gone. Let's not forget all the interviews with the various librarians at the original member institutions of the project, along with any number of Google officials and others. _I_ went over ALL of them I could find, had people send me tapes of others. . . . Why? This was probably a lot more important to me than to anyone else. ;-) mh > "Google To Scan Library Volumes" > http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml > > The only "timeline" for the scanning given in the article is this: > > "Michigan's library alone contains 7 million of its library volumes -- > about 132 miles of books. Google hopes to get the job done at Michigan > within six years, Wilkin said." > > The article does NOT say how many books Google hoped to scan at the > other 4 libraries in those 6 years. > > And here's a link to the article BBC News posted on Dec. 14, 2004: > > "Google to scan famous libraries" > http://news.bbc.co.uk/2/hi/technology/4094271.stm > > Again the only mention of 6 years refers to the University of Michigan > library: > > "It will take six years to digitise the full collection at Michigan, > which contains seven million volumes." > > Could you point to a single article posted back then, reporting that > Google announced it would scan 10 million books in 6 years? > > >> Of course that includes a lot more libraries than UMich, >> who,by the way, used to claim Project Gutenberg was there. > > > Did they also claim Kilroy was there? ;) > > >> If google did 3 1/3 million in the first three years, >> and then doubled production for the next three years, >> then they might actually be able to claim on schedule >> and even longer if tey pretend they never mean a date >> of December 14, 2004 to be remembered by anyone as an >> official starting date, een though it was the date of >> biggest media blitz I've ever seen in my entire life. > > > It was the "biggest media blitz" you've ever seen, yet you always seem > to have trouble remembering what was actually said. :) > > > Jose Menendez > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Fri May 9 19:12:47 2008 From: hart at pglaf.org (Michael Hart) Date: Fri, 9 May 2008 19:12:47 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4824ADFE.6030903@ibiblio.org> References: <4822B4BC.7030102@ibiblio.org> <4824ADFE.6030903@ibiblio.org> Message-ID: No, it is NOT obvious, so you MUST say "scanned" or "OCRed" [with or without "scanned." Why? Because there are so many people out there muddying up the waters as to what is what. "Full text" can mean SO many different things, and formats. . . even from those who are NOT trying to muddy the waters. Be perfectly clear. . .it will help more than most imagine. mh On Fri, 9 May 2008, Jose Menendez wrote: > On May 8, 2008, Michael Hart wrote: > >> On Thu, 8 May 2008, Jose Menendez wrote: >> >> The "234,857 with full-text" refers to the number that have been >> scanned. 
>> >> >> /// >> >> >> Actually, if they really are "full-text" in the manner that term >> has always been used, as opposed to "raw scans," then these 1/4 >> million or so books would NOT, technically, "refer to the number >> that have been scanned" but more accurately "refer to the number >> that have been scanned and converted from image to text mode." >> >> If. . .they are using the language as it always has been. . . . > > > Obviously, if the "full-text" is available, the books were not only > scanned but OCRed as well, but Bowerbird's mistake wasn't about the > number of books that had been OCRed. It was about the number of books > that had been scanned. That's why I wrote, "The '234,857 with > full-text' refers to the number that have been scanned." I didn't > think I'd have to add the obvious "and OCRed" to the end of the sentence. > > By the way, it seems that the number given on the Open Library website > might be a little out-of-date. If we look at the Internet Archive's > Text Archive page, > > http://www.archive.org/details/texts > > we'll see that the total number of items listed for the "American > Libraries" and "Canadian Libraries" sub-collections is over 298,000. > > > Jose Menendez > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Sat May 10 06:16:24 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat, 10 May 2008 15:16:24 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4824AEFC.50102@ibiblio.org> Message-ID: <4825A028.2040203@perathoner.de> Michael Hart wrote: >> No, they didn't. Here's a link to the story CBS News placed on its >> website back on Dec. 14, 2004: >> >> "Google To Scan Library Volumes" >> http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml > > You'd better go back and chekc al the other TV networks, not to > mention radio, newspapers, etc. Usually, Michael, if YOU make a statement YOU have to prove it. If there are lots of papers, TV networks etc. that brought YOUR version of the facts, then it will be easy for YOU to come up with a link to prove it. > Google announced on December 14, 2004 that they would digitize > 10 million books in 6 years. All that Google "announced" was this blog entry: http://googleblog.blogspot.com/2004/12/all-booked-up.html where they say NOTHING about 10 million NOR ANYTHING about 6 years. Again, give us a link to where Google says they will digitize 10 MB in 6 years or stand back from your claim. -- Marcello Perathoner webmaster at gutenberg.org From joshua at hutchinson.net Sat May 10 09:44:17 2008 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat, 10 May 2008 16:44:17 +0000 (GMT) Subject: [gutvol-d] cyberlibrary numbers Message-ID: <254161975.687181210437857364.JavaMail.mail@webmail05> The only places where I could find those numbers mixed together was places like this: http://tzvee.blogspot.com/2007/06/google-will-scan-10-million-more.html Quick summary... the goal is 10 million, but no date was given. The U of M contract was 6 years, but no goal was given. Maybe that is where the 10 million/6 years is coming from? Josh On May 10, 2008, marcello at perathoner.de wrote: Michael Hart wrote: > > You'd better go back and chekc al the other TV networks, not to > mention radio, newspapers, etc. Again, give us a link to where Google says they will digitize 10 MB in 6 years or stand back from your claim. 
From Bowerbird at aol.com Sat May 10 12:38:21 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 10 May 2008 15:38:21 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: sometimes the backbiting on this list gets _extremely_ amusing... :+) the original f.a.q. from michigan tells the numbers about michigan... > Q. 1: What is the UM?Google project? > A: The UM-Google project is a partnership between UM and Google > that will make the seven million UM University Library volumes > searchable via the Google search engine, and open the way to > universal access to information. Google will digitize our library > collection and make the items accessible through the Google site. > The University Library will also receive and own a high quality > digital copy of the collection to use for its own > purposes. voila, we have the 7-million number. > Q. 3: How long will the project take? > A: Estimating how long the project will take is difficult, > but we are currently planning for > approximately six years of scanning. voila, we have the 6-year timeframe... there were 5 libraries involved in the project at the outset. my guess -- back then, and even now today -- would be that if they intended to scan 7 million umichigan books in 6 years, they intended to scan _at_least_ another 7 million from the other 4 libraries in that same amount of time, so i'd say the implicit promise was to do 14 million in 6 years, and i don't think you can call that an unreasonable position, either then or now. since -- after 3 years -- they've only scanned _1_ million books from umichigan, then it is _completely_ fair to say that they are "behind schedule" at umichigan. of course, since many libraries (dozens?) have joined the project since its onset, i'd guess the schedule was altered somewhere along the line, and that's fine. i'm convinced they're working on it, and working hard, so fine... i _do_ wish that -- 3 years into it -- they would be a little bit further along than 1 million out of 7 million umichigan books, because that makes it look like this could take 20 years total... but, you know, i'm not paying their bills, so what say do i have? i'm just glad that public-domain books are popping up fast... who knows, there might be a million of them, already or soon. and that's a good number. meanwhile, how about if we instead discuss some topics that have some important substance, instead of mere trivialities?... -bowerbird p.s. by the way, under google's newer contract with the c.i.c., which includes umichigan along with 11 other universities, the "initial term" of the contract was for -- you guessed it -- 6 years. so it seems that they are fixated on that timeframe. perhaps some corporate tax accountants could tell us why... ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080510/8750aa77/attachment.htm From hart at pglaf.org Sat May 10 14:38:14 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 10 May 2008 14:38:14 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: This fits very well with the comments I was referring to, all given on or about December 14, 2004. 
I collected up dozens of interviews and news stories, and none of them gave the entire story, but pieces fit here & there into a pattern that wasn't and exact fit, but close enough to get a pretty good picture. I'm sure of the 10 million book estimate by at least one, and of the 6 year figure, as well. Obviously there are additional projects now with Google-- and without Google--my own local university is doing many books, but from the insiders, and it takes insiders to do any research on these, they are choosing books quite much in opposition to our own PG philosophy, choosing lists of books that they are sure no one else would choose & thus, literally, no one else would read in general usage. I do clearly remember some of the librarians' comments in a manner suggesting this should be a library for readers, all over the world. However, as I suspectly then, and suspect even more now-- that many of these eBooks will never see the light of day in any general sense. 1. they are definitely NOT being prominently available. 2. they are definitely NOT of general reader interest. 3. they appear to be mostly "raw scans". . .not sure of the percentage, if any, being reported as OCRed, proofed by a human, or whatever. I have actually spoken in person to the local book czar, and one of the national ones, and can't get any real hot and hard facts about what percentages, etc. But I keep hoping. . . . Michael On Sat, 10 May 2008, Bowerbird at aol.com wrote: > sometimes the backbiting on this list gets _extremely_ amusing... > :+) > > the original f.a.q. from michigan tells the numbers about michigan... > > >> Q. 1: What is the UM??Google project? >> A: The UM-Google project is a partnership between UM and Google >> that will make the seven million UM University Library volumes >> searchable via the Google search engine, and open the way to >> universal access to information. Google will digitize our library >> collection and make the items accessible through the Google site. >> The University Library will also receive and own a high quality >> digital copy of the collection to use for its own >> purposes. > > voila, we have the 7-million number. > > >> Q. 3: How long will the project take? >> A: Estimating how long the project will take is difficult, >> but we are currently planning for >> approximately six years of scanning. > > voila, we have the 6-year timeframe... > > > there were 5 libraries involved in the project at the outset. > my guess -- back then, and even now today -- would be > that if they intended to scan 7 million umichigan books in > 6 years, they intended to scan _at_least_ another 7 million > from the other 4 libraries in that same amount of time, so > i'd say the implicit promise was to do 14 million in 6 years, > and i don't think you can call that an unreasonable position, > either then or now. > > since -- after 3 years -- they've only scanned _1_ million books > from umichigan, then it is _completely_ fair to say that they are > "behind schedule" at umichigan. of course, since many libraries > (dozens?) have joined the project since its onset, i'd guess the > schedule was altered somewhere along the line, and that's fine. > i'm convinced they're working on it, and working hard, so fine... > > i _do_ wish that -- 3 years into it -- they would be a little bit > further along than 1 million out of 7 million umichigan books, > because that makes it look like this could take 20 years total... > but, you know, i'm not paying their bills, so what say do i have? 
> > i'm just glad that public-domain books are popping up fast... > who knows, there might be a million of them, already or soon. > and that's a good number. > > meanwhile, how about if we instead discuss some topics that > have some important substance, instead of mere trivialities?... > > -bowerbird > > p.s. by the way, under google's newer contract with the c.i.c., > which includes umichigan along with 11 other universities, > the "initial term" of the contract was for -- you guessed it -- > 6 years. so it seems that they are fixated on that timeframe. > perhaps some corporate tax accountants could tell us why... > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From hart at pglaf.org Sat May 10 14:54:20 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 10 May 2008 14:54:20 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <254161975.687181210437857364.JavaMail.mail@webmail05> References: <254161975.687181210437857364.JavaMail.mail@webmail05> Message-ID: On Sat, 10 May 2008, Joshua Hutchinson wrote: > The only places where I could find those numbers mixed together > was places like this: > > http://tzvee.blogspot.com/2007/06/google-will-scan-10-million-more.html > > Quick summary... the goal is 10 million, but no date was given. > The U of M contract was 6 years, but no goal was given. > > Maybe that is where the 10 million/6 years is coming from? > > Josh Josh, You have your tenses inverted. You'd better get used to the idea that not all information and knowledge comes from linked articles. . . . There still is the "real world" out there, without links. Just because you have not done the required research to do references to something that happened December 14, 2004 is neither a valid nor reliable indicator it did not happen. Your comments are a great example for those who have been, and will continue to do so, slamming Wikipedia research. Not that I am saying you used Wikipedia, but just that you oretebd that becuse you have no links, it didn't happen. By the way, you should be able to find some refereences, I provided a few at the time, but I am not really interested in doing your homework for you, as you never say thanks if I spend an hour answering your questions. Haven't you had a chance to learn the moral behind: "You get more with honey than with vinegar." Given the years you have demonstrated this lack, it might, just might, take that many years to change perceptions. "Can the leopard really change its spots?" > > On May 10, 2008, marcello at perathoner.de wrote: > Michael Hart wrote: >> >> You'd better go back and chekc al the other TV networks, not to >> mention radio, newspapers, etc. > > Again, give us a link to where Google says they will digitize 10 MB in 6 > years or stand back from your claim. 
> > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Sat May 10 15:46:54 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 11 May 2008 00:46:54 +0200 Subject: [gutvol-d] parallel -- the plunderer -- 04 -- and another upbeat post in this series In-Reply-To: <4824B82F.306@novomail.net> References: <4824B82F.306@novomail.net> Message-ID: <482625DE.70509@perathoner.de> Lee Passey wrote: > The first step in improving DP's processes should be to find a > partner where DO's work product can be archived and accessible /in > addition to/ Project Gutenberg, and where Distributed Proofreaders might > have the kind of control required to implement "continuous proofreading." If Michael decides to make a few quick bucks with his trademark and domain, and thus to abandon the free hosting facilities at ibiblio, DP will just have to step into the vacated environment. The books are PD, the software that drives the web site is GPLed, so is the catalog data. Im sure many PG volunteers will stay with ibiblio and keep the new old distribution site running. DP will just have to come up with a new name and domain. -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Sat May 10 16:26:04 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 11 May 2008 01:26:04 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <254161975.687181210437857364.JavaMail.mail@webmail05> Message-ID: <48262F0C.3080808@perathoner.de> Michael Hart wrote: > Just because you have not done the required research to do > references to something that happened December 14, 2004 is > neither a valid nor reliable indicator it did not happen. You are an incorrigible liar. (I tried to formulate this in a less rude way, but no polite expression quite covers the ground.) Your assertion is made up. Not only that, but when you are pressed to show some evidence -- and you can't -- then you try to wiggle your way out by reversing the positions. Not we have to do the research to find out if your claim is true (it is not) but YOU have to give US evidence that it is indeed true. I understand that you are jealous because Google is getting all the attention while PG is getting none. Still this is no reason to bad-mouth Google's very laudable exertions by making up "facts". For decades PG neglected to organize itself and to get proper fundings. Now Google is doing what an organized and funded PG ought to have done long ago. So I guess that after all the media attention is falling where it is due. -- Marcello Perathoner webmaster at gutenberg.org From gbnewby at pglaf.org Sat May 10 22:00:25 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Sat, 10 May 2008 22:00:25 -0700 Subject: [gutvol-d] parallel -- the plunderer -- 04 -- and another upbeat post in this series In-Reply-To: <4824B82F.306@novomail.net> References: <4824B82F.306@novomail.net> Message-ID: <20080511050025.GD27486@mail.pglaf.org> On Fri, May 09, 2008 at 02:46:39PM -0600, Lee Passey wrote: > Bowerbird at aol.com wrote: > > > given that all of these 3 errors would be reasonably expected to be > > caught by the general public in "continuous proofing", i believe that > > the question of whether p2 was even needed in this particular book > > is open for discussion. > > And therein lies the rub. 
Pretty much every suggestion for change you > have made to Distributed Proofreaders depends on the existence of an > effective and efficient "continuous proofreading" process--which does > not exist, and will probably never exist so long as Distributed > Proofreaders views Project Gutenberg as it's primary distribution mechanism. > > The relationship between Project Gutenberg and Distributed Proofreaders > is both complex and one-way, with DP playing the role of the unrequited > lover. The volunteers at DP go to great lengths and expend great effort > in producing the finest work product their processes allow them to. > However, once DP has finished its work it is passed over to PG, where it > goes into the barrels with all the other apples. I LOVE DP, and I love their apples. > This is why the volunteers at DP expend so much time and energy on > trying to figure out how many rounds of various types are required to > maximize the quality of their work product. They know that once a set of > files for any particular work is submitted to PG that there is no > effective or efficient process to correct errors, or even enhance the > output. While there is a theoretical process to correct problems, there > is no practical process. ("In theory, there is no difference between > theory and practice. In practice, there is.") For all practical > purposes, when a file leaves DP and enters the PG archive it is forever > cast in stone. I think you understand the challenges as well as I do, and as always I'm ready to hear about any sort of solution, including but not limited to forking or version control for the PG content. -- Greg > I agree that much of what DP now does would be unnecessary if there were > in place a "continuous proofreading" process; but there is not. And so > long as DP has no control over the archiving and distribution of its own > output it will be unable to put a "continuous proofreading" process into > practice. The first step in improving DP's processes should be to find a > partner where DO's work product can be archived and accessible /in > addition to/ Project Gutenberg, and where Distributed Proofreaders might > have the kind of control required to implement "continuous proofreading." > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From joshua at hutchinson.net Sun May 11 08:43:30 2008 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sun, 11 May 2008 15:43:30 +0000 (GMT) Subject: [gutvol-d] cyberlibrary numbers Message-ID: <86761188.725181210520611248.JavaMail.mail@webmail06> You know, it's kinda sad that the founder of PG is one of the two biggest trolls on gutvol-d. Michael, since you missed it, that was an attempt to find a backup to your statement. The link I found was the closest news article to what you said (there were some others like it, but nothing closer). Yes, I mixed verb tenses up. I apologize to the grammar nazi. Josh On May 10, 2008, hart at pglaf.org wrote: On Sat, 10 May 2008, Joshua Hutchinson wrote: > The only places where I could find those numbers mixed together > was places like this: > > http://tzvee.blogspot.com/2007/06/google-will-scan-10-million-more.html > > Quick summary... the goal is 10 million, but no date was given. > The U of M contract was 6 years, but no goal was given. > > Maybe that is where the 10 million/6 years is coming from? > > Josh Josh, You have your tenses inverted. 
You'd better get used to the idea that not all information and knowledge comes from linked articles. . . . There still is the "real world" out there, without links. Just because you have not done the required research to do references to something that happened December 14, 2004 is neither a valid nor reliable indicator it did not happen. Your comments are a great example for those who have been, and will continue to do so, slamming Wikipedia research. Not that I am saying you used Wikipedia, but just that you oretebd that becuse you have no links, it didn't happen. By the way, you should be able to find some refereences, I provided a few at the time, but I am not really interested in doing your homework for you, as you never say thanks if I spend an hour answering your questions. Haven't you had a chance to learn the moral behind: "You get more with honey than with vinegar." Given the years you have demonstrated this lack, it might, just might, take that many years to change perceptions. "Can the leopard really change its spots?" > > On May 10, 2008, marcello at perathoner.de wrote: > Michael Hart wrote: >> >> You'd better go back and chekc al the other TV networks, not to >> mention radio, newspapers, etc. > > Again, give us a link to where Google says they will digitize 10 MB in 6 > years or stand back from your claim. > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > _______________________________________________ gutvol-d mailing list gutvol-d at lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From hart at pglaf.org Sun May 11 10:13:11 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 10:13:11 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <86761188.725181210520611248.JavaMail.mail@webmail06> References: <86761188.725181210520611248.JavaMail.mail@webmail06> Message-ID: On Sun, 11 May 2008, Joshua Hutchinson wrote: > You know, it's kinda sad that the founder of PG is one of the two > biggest trolls on gutvol-d. You just can't stand it that snyone stands up to your ridicule. > > Michael, since you missed it, that was an attempt to find a backup > to your statement. The link I found was the closest news article > to what you said (there were some others like it, but nothing > closer). Josh, I am sure you have still not accepted that I am not going to do your homework for you. Again, I repeat, because we have to be sure you actually got the message: Just because you don't find a link doesn't mean it didn't happen. Check the four major US TV networks, LISTEN to what they said, and add in NPR, BBC, CBC, and the newspaper syndicates. Personally, I think I remember one or two interviews that could possibly satisfy even you, but I personally doubt you could be satisfied unless you found someone who said 10 million AND 6 years in the same phrase. As I said before, I pieced all of it together on December 14, 2004, and more analysis during the following few days. You would/could/should have done the same thing if it had any similar importance to you. Now that it's mid-2008, it's just that much harder, and without asking for cooperation, but actually intentionally alienating the sources that could help you, you are just diggin your hole deeper and deeper and deeper. > Yes, I mixed verb tenses up. I apologize to the grammar nazi. Several people here have suggested that we should think more of how this will all sound 10 years or so from now. 
The real trouble from my point of view, and why I don't just do the obvious of either ignoring you or moderating you is because I have to consider how things will play out when I am gone. You see, I have to respond to your silliness now to make it soo obvious what you do that no one will take these comments you do so often with anything other than a ton of salt. You haven't made ANY points, either logically or emotionally to further whatever cause it is you think you might have. By the way, just what IS your cause for doing such things??? Is there an actual goal you have??? Other than just pouring more noise into the system??? If you had actually done the searches you said you did, I think you would/could/should have found the materials in question. However, I seriously doubt you put in even half as much work on these searches as I did, or you would have found at least a few of the references I and others have brought up. It is all too obvious here, and elsewhere, that those who quite literally make the most noise about troll, are actually trolls, of the lowest/highest order, depending on your perspective. So, for now I will just leave you be, you can have the floor on these until someone gives me feedback that they actully believe your rants and raves. However, I will continue to resist your trolling to get others, including myself, to do your own homework for you. You are such a perfect example, I don't think I should moderate you even if we did such moderation in Project Gutenberg. However, I will remind the audience that those who call for the moderator to "moderate"/censor others the most are those whom a realisitic and logical observer would say should be moderated. But this is all too obvious to the majority, and I will simply, and forever, continue to make it obvious when you make examples so very obvious. Just Joshing. . . . Michael > > Josh > > On May 10, 2008, hart at pglaf.org wrote: > > > On Sat, 10 May 2008, Joshua Hutchinson wrote: > >> The only places where I could find those numbers mixed together >> was places like this: >> >> http://tzvee.blogspot.com/2007/06/google-will-scan-10-million-more.html >> >> Quick summary... the goal is 10 million, but no date was given. >> The U of M contract was 6 years, but no goal was given. >> >> Maybe that is where the 10 million/6 years is coming from? >> >> Josh > > Josh, > > You have your tenses inverted. > > You'd better get used to the idea that not all information > and knowledge comes from linked articles. . . . > > There still is the "real world" out there, without links. > > Just because you have not done the required research to do > references to something that happened December 14, 2004 is > neither a valid nor reliable indicator it did not happen. > > Your comments are a great example for those who have been, > and will continue to do so, slamming Wikipedia research. > > Not that I am saying you used Wikipedia, but just that you > oretebd that becuse you have no links, it didn't happen. > > By the way, you should be able to find some refereences, I > provided a few at the time, but I am not really interested > in doing your homework for you, as you never say thanks if > I spend an hour answering your questions. > > Haven't you had a chance to learn the moral behind: > > "You get more with honey than with vinegar." > > Given the years you have demonstrated this lack, it might, > just might, take that many years to change perceptions. > > "Can the leopard really change its spots?" 
> > On May 10, 2008, marcello at perathoner.de wrote: >> Michael Hart wrote: >>> >>> You'd better go back and check all the other TV networks, not to >>> mention radio, newspapers, etc. >> >> Again, give us a link to where Google says they will digitize 10 MB in 6 >> years or stand back from your claim.
From hart at pglaf.org Sun May 11 10:36:13 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 10:36:13 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <48262F0C.3080808@perathoner.de> References: <254161975.687181210437857364.JavaMail.mail@webmail05> <48262F0C.3080808@perathoner.de> Message-ID: Marcello, You and yours have said pretty much exactly the same thing before. Since no one pays any attention to you, you could save much labor simply by sending the same messages you sent last time. Don't you realize that the more you call names, the less anyone pays attention to you? You come along all prickly and poisonous and then wonder why nobody wants to give you a hug. Since I have proven you wrong in the past, without any response that anyone needed such proof, I leave you to stew in your juices, along with Josh, and the same potential handful you managed to gather with a few other such rants and raves. Do you really think I have forgotten??? Do you really think everyone else has forgotten??? The truth is that no one believes you. . . . If you really want a serious reply, you'll have to change tack long enough that the memory has faded. However, I will give you a clue: The search terms you would have to use to find what you SAY is non-existent are only half a dozen. Let's presume you need to use ALL possible search terms; that, as we once discussed before, would be about a dozen, and would get you, as was stated here, more hits than you wanted. So, do a few searches with the half dozen you might pick, and after a few such searches, the answer should be obvious unless some sites have changed between my searches and yours. However, it's past "1984" and we should not be so sure that no one, and I mean many no ones, including here, has attempted to get in some rewriting of history. Doubly however, since I am sure you wouldn't thank me, even if I did your homework for you, and wouldn't accept it, no matter what was found, I have no reason to do your homework for you-- only for those who have a sincere desire to know. Do you remember how silly it all looked the last time messages very much like the one below appeared? Since everyone seems to remember, I need not do it again, eh? mh On Sun, 11 May 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >> Just because you have not done the research required to find >> references to something that happened on December 14, 2004 is >> neither a valid nor a reliable indicator that it did not happen. > > You are an incorrigible liar. > > (I tried to formulate this in a less rude way, but no polite > expression quite covers the ground.) > > > Your assertion is made up.
> > It is not that we have to do the research to find out if your > claim is true (it is not), but that YOU have to give US evidence that > it is indeed true. > > > I understand that you are jealous because Google is getting all > the attention while PG is getting none. > > Still this is no reason to bad-mouth Google's very laudable > exertions by making up "facts". > > For decades PG neglected to organize itself and to get proper > funding. Now Google is doing what an organized and funded PG > ought to have done long ago. So I guess that after all the media > attention is falling where it is due. > > > -- > Marcello Perathoner > webmaster at gutenberg.org >
From hart at pglaf.org Sun May 11 10:48:02 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 10:48:02 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4825A028.2040203@perathoner.de> References: <4824AEFC.50102@ibiblio.org> <4825A028.2040203@perathoner.de> Message-ID: Sorry, I don't think Marcello can admit that the HUGE media blitz on December 14, 2004 didn't happen all by itself. This could be either because Marcello doesn't understand the P.R. departments of places such as Google, or because he does not want YOU to understand. If you think Google JUST put the link below up one day and EVERY major news outlet ran it a few hours later as a major story, then you probably don't realize just how much of the "news" is fed to the media via these various P.R. people. Once again I must suggest, as though Marcello and Josh were in a listening mode, that they go over the various interviews that aired on that particular day, December 14, 2004. There is not, and was not, one single source that provided /all/ the information that was broadcast and printed that day. If you do your homework, you will find many such sources, some I referenced earlier, but none all that hard to find. If you actually look at the origins of Google Print Library from December 14, 2004. . . . Go for it! Or not! But don't complain if you don't. You can tease and troll all you want, but I learned not to work, or play, in response to such teasing and trolling from you. If you were serious, your message below would not be so empty. It would have substance, not just accusations. ;-) mh On Sat, 10 May 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >>> No, they didn't. Here's a link to the story CBS News placed on >>> its website back on Dec. 14, 2004: >>> >>> "Google To Scan Library Volumes" >>> http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml >> >> You'd better go back and check all the other TV networks, not to >> mention radio, newspapers, etc. > > Usually, Michael, if YOU make a statement YOU have to prove it. > > If there are lots of papers, TV networks etc. that brought YOUR > version of the facts, then it will be easy for YOU to come up > with a link to prove it. > > >> Google announced on December 14, 2004 that they would digitize >> 10 million books in 6 years. > > All that Google "announced" was this blog entry: > > http://googleblog.blogspot.com/2004/12/all-booked-up.html > > where they say NOTHING about 10 million NOR ANYTHING about 6 > years. > > > Again, give us a link to where Google says they will digitize 10 > MB in 6 years or stand back from your claim.
> > > > -- > Marcello Perathoner > webmaster at gutenberg.org >
From hart at pglaf.org Sun May 11 11:28:39 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 11:28:39 -0700 (PDT) Subject: [gutvol-d] Grammar Error vs Logic Error Message-ID: By the way, reversing tenses is not so much a grammatical error as it is an error in logic. Pretending that something that took place back in 2004 did not happen, just because someone can't find links today to easy-to-find reports about it -- that, and the accompanying tense errors, etc., are errors in logical construction, not errors in grammatical construction. Throwing in accusations of "liar," "rude," Nazi, etc., will not gain one much of an audience here in Project Gutenberg. Just because someone won't do your bidding does not make such a person "rude." Just because someone disagrees with you doesn't make you a "liar." You can always search for the last time I was called a "liar"-- right here on this list--and find out how silly it all looks in the light of reason, logic, and history. Now, back to the point, I am sure that if push comes to shove-- which it might--Google will DENY the following as FALSE: 1. That December 14, 2004 was the official beginning of eBooks or "The Google Print Library" or anything else from Google. However, if you listen to/read those reports again you will find a present tense, nothing to indicate the project had started some weeks or months or years before, or would be starting some number of weeks or months or years later. I don't really think there has been a larger "media blitz" from any company in history. BTW, you can find many of my comments, current, previous, and future, by searching for "media blitz" and "Google" along with any other terms you like. 2. That anyone SHOULD have SAID they were doing 10 million books. 3. That anyone SHOULD have SAID they were doing it in 6 years. Google might even copy some comments given here and say neither of these statements was ever made in any of the news coverage the world saw on or about December 14, 2004. Questions for consideration: Does anyone really think all those interviews taking place in a handful of famous libraries just happened on the spur of the moment, out of interest created by the simple blog report mentioned earlier? If you think it might have been just incidental/accidental, a thought about the difference in time zones might suggest that some of these interviews took place earlier in some time zones than others, by enough of a margin to indicate a setup lead time in excess of what you would get with no planning of this as a worldwide event. Oh, well, perhaps I am trying to elevate the logic too far from where it was earlier, perhaps I should have let it all lie. ;-) Michael
From hart at pglaf.org Sun May 11 12:20:21 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 12:20:21 -0700 (PDT) Subject: [gutvol-d] Current Google Search on 10 million and 6 years Message-ID: I wanted to make sure all the quotes I referenced had not vanished, and no, the very first search I did not only gave the 6 years, but, in addition, even the name of the person in question. The same hit included the UMich reference given earlier, to which a number of other libraries added similar numbers that day. I also got a hit for the 10 million, but I didn't like it as much as the original I got on December 14, 2004. I used only three terms, all this from the first hit. Obviously there are some people here NOT doing their homework.
Plus, this was also made obvious in another's previous message, so it becomes just more and more obvious who the real trolls are, and that they are just making noise without providing content. Why do I go through all this??? Because I want everyone to remember when Marcello, Josh and Co. try to take over Project Gutenberg, again, and again, and again. If I can get such good answers in the first hit I get, they are obviously either NOT trying or are totally inept at searching. In either case they have no case. Now perhaps I can leave this all be for another year. I missed this usual ranting and raving in March, as mentioned, but perhaps with the loss of Jon Noring from their ranks, this process just takes a little longer. I've heard scattered reports that Mr. Noring is ok, but I can get no replies from him. If anyone has more info. . .please. Only 8 weeks until Project Gutenberg turns 38 years old!!! Thanks!!! Michael
From Bowerbird at aol.com Sun May 11 12:36:52 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 11 May 2008 15:36:52 EDT Subject: [gutvol-d] doing homework Message-ID: hey, i love doing homework... so i've found _lots_ of quotes! so, you know, if you need any, just shoot me a backchannel... :+) -bowerbird
From hart at pglaf.org Sun May 11 19:44:28 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 11 May 2008 19:44:28 -0700 (PDT) Subject: [gutvol-d] Current Google Search on 10 million and 6 years In-Reply-To: <53392.83.171.175.104.1210540007.squirrel@www.franken.de> References: <53392.83.171.175.104.1210540007.squirrel@www.franken.de> Message-ID: On Sun, 11 May 2008, Karl Eichwalder wrote: > > Michael Hart wrote: >> >> I wanted to make sure all the quotes I referenced had not vanished, >> and no, the very first search I did not only gave the 6 years, but, >> in addition, even the name of the person in question. > > etc., etc. > > Do us and you a favor and stop posting. Nobody is > interested in your number acrobatics. It's all boring. Just what numbers are you talking about? Did I bring up any numbers in the last few days? > It is not fair how you treat Marcello and Josh, who > have contributed so much to our free library and who are > still productive. Oh, right, you are just fine with them calling me a liar, rude, and a Nazi when they won't even look up the data they say they want? And _I_ am not fair? Is this really the opinion you want on the record as the best you have to offer? > If you do not like Google, get a blog and post your > rantings, but stop spamming this list. YOU said "if you do not like Google". . .not me. I just quoted their press from December 14, 2004. Why would you make the presumption from that that I don't like them? Sure, I would run that project differently than they do, and you, Josh and Marcello could all run Project Gutenberg better than I do, which is why I let them do just that. Your complaint is??? > > -- > Karl Eichwalder > mh
From lee at novomail.net Tue May 27 10:27:49 2008 From: lee at novomail.net (Lee Passey) Date: Tue, 27 May 2008 12:27:49 -0500 Subject: [gutvol-d] Any "Top 1000" style lists of Gutenberg texts in public domain?
In-Reply-To: <20080509144312.GA20699@moxie> References: <20080509144312.GA20699@moxie> Message-ID: <483C4495.1090802@novomail.net> Larry Marso wrote: > Anyone aware of any good "Top 1000" style lists of the texts found at > Gutenberg, particularly any that is itself in the public domain? > > I'm not looking for a "# of times downloaded" list, but rather judgments > of the merit of texts, applying various criteria. I'm not aware of anything like this for Project Gutenberg. Indeed, PG itself only maintains download stats for the past 30 days. However, Bartleby.com has the entire Harvard Classics and Shelf of Fiction online (http://www.bartleby.com/hc/), which reflects the value Charles W. Eliot assigned to specific books. Great Books of the Western World is a series of books originally published in the United States in 1952 by Encyclopædia Britannica Inc. in an attempt to present the western canon in a single package of 54 volumes. The series is now in its second edition and contains 60 volumes. The second edition contains 130 authors and 517 individual works. A list of the contents can be found on Wikipedia, http://en.wikipedia.org/wiki/Great_Books_of_the_Western_World. The list of great books is maintained by the Great Books Foundation. You may be able to find a more comprehensive list at their web site, http://www.greatbooks.org/. See also Robert Teeter's list of lists at http://www.interleaves.org/~rteeter/greatbks.html. Lastly, don't spurn the download stats from PG. Given the all-volunteer nature of PG, if a work has been transcribed for PG at all, it means that someone thought it an important enough work to be digitized. If someone goes to the trouble to download it, despite the obvious imperfections, it means that that person probably thought it important enough to read. If you were to take the first 2000 texts created in PG, and sort out the top 1000 downloads, you'd probably have as good a list as most literature professors could create. A close approximation of the top 400 downloads from PG over the past 3 years can be found at http://www.passkeysoft.com/~lee/zero.txt. -- Nothing of significance below this line.
From hart at pglaf.org Mon May 12 11:42:50 2008 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 May 2008 11:42:50 -0700 (PDT) Subject: [gutvol-d] Any "Top 1000" style lists of Gutenberg texts in public domain? In-Reply-To: <483C4495.1090802@novomail.net> References: <20080509144312.GA20699@moxie> <483C4495.1090802@novomail.net> Message-ID: If you are really looking for lists as large as or larger than 1,000 books, I would ask your local librarians to look in the lists by Eugene Garfield and his ISI work. He used to publish a list of the 1500 most quoted general works. Don't be misled by his hundreds of other lists; this one should start with Plato, Aristotle, Shakespeare, or the like, and should obviously be the one you are looking for, but he has other lists for various other subjects and media. Another clue that you are on the right list is that the top 10 or so on the list might be quoted 1,000 - 2,000 times in the references he counts. Garfield's list is the only one I know of with so many titles, many/most of which are found in Project Gutenberg if/when they are public domain. If you have trouble finding this list, let me know.
Michael From ebooks at ibiblio.org Mon May 12 23:58:40 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Tue, 13 May 2008 02:58:40 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4824AEFC.50102@ibiblio.org> Message-ID: <48293C20.2010709@ibiblio.org> I see that Michael Hart has been busily employing some of his typical, evasive debating tactics rather than providing any evidence to support his claims. (I say "typical" because I've seen him use these tactics many times before.) Unlike him, I will provide evidence--to refute his claims. :) On May 9, 2008, Michael Hart wrote: > On Fri, 9 May 2008, Jose Menendez wrote: > >> On May 8, 2008, Michael Hart wrote: >> >> >>> Google announced on December 14, 2004 that they would digitize >>> 10 million books in 6 years. >> >> No, they didn't. Here's a link to the story CBS News placed on its >> website back on Dec. 14, 2004: >> > > > Your research is very incomplete if you think the CBS story > contained all the press releases given by the "Google Print Library > personnel on that date. Could you point out where I said that the CBS story contained all the press releases? If I had thought that the CBS story was complete, I wouldn't have bothered to link to the BBC News story as well, would I? And I never said that the BBC story, or both stories together, contained all the press releases, did I? > You'd better go back and chekc al the other TV networks, not to > mention radio, newspapers, etc. I really got a good laugh from this line, Michael, especially since you included "radio." Do you recall when I used an NPR radio report back on the Book People mailing list to *conclusively* demonstrate that another one of your claims about the Google Print Library Project (and its media coverage) was false? Let me remind you. Here's a link to a post I sent to the BP List on July 1, 2005: "Re: !@!Re: [BP] Is Google Print Real?" http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-07-01,1 And here's the relevant excerpt: > On June 26, 2005 Michael Hart wrote: [snip] >> Once again, I am ONLY referring to the December 14 public relations >> blitz that included millions of dollars worth of publicity, NONE of >> which mentioned sending searchers off to buy books, to physical >> libraries, or other sources via the BBC, CBS, NBC, ABC, PBS, NPR, etc. > > > Oh, really? NONE of those news sources mentioned those things? Let's put your assertion to the test, shall we? Here's a link to a very short description of the story NPR's "All Things Considered" aired on the afternoon of December 14, 2004: > > "Google to Digitize Major Library Resources" > http://www.npr.org/templates/story/story.php?storyId=4227893 > > Click on the "Listen" icon on that page to hear the actual report that aired on Dec. 14th. (The audio is available in both Real Player and Windows Media Player formats.) Here's a quote from that NPR broadcast: > > > "It'll work just like any other Google search. If you type in, say, 'books ancient Rome,' three titles appear at the top of your results. Click on one, and if the book is out of copyright, John Wilkin at the University of Michigan library says you'll be able to read the whole thing.... If the book is still in copyright, you'll get a few short segments with your search terms highlighted. Google will also tell you where you can buy the book and in some cases where you can borrow it locally." > > > So much for your claim that NONE of your sources mentioned those things. 
I guess you managed to miss that NPR story, Michael. ;) That was a fun debate. :) (For those living outside the U.S. who may not be familiar with it, NPR stands for National Public Radio.) > I've quoted many of these here in the past, and I hope that these > kinds of statement will NOT be taken at face value when I am gone. I've looked at some of your past messages, posted soon after the Library Project was announced, and, unfortunately for you, they don't support your present claims. Oops! For example, here's a post you sent to the Book People mailing list on June 10, 2005: "More Google Print Queries" http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-06-10,2 Here are a few excerpts: > We are coming up on June 14, 2005, the end of the first 6 months > of the project that received millions of dollars of publicity on > December 14th, 2004, when Google revealed that it had "invented" > the idea of the electronic library. > > Here are several aspects of Google Print people have commented a > number of times on, and YOUR comments would be appreciated, both > on these topics and any additional topics or points of view your > own experiences have brought to light. [snip] > 2. Are they producing as many books as planned? > > The initial claims that the hardware was already there > and ready to use and that the libraries were all ready > to provide 10-15,000 books per week for scanning, seem > to have vanished in terms of publicity, but we CAN see > that some Google Print eBooks are actually online, but > it's hard to figure out how many. > > At 10,000 books per week for 50 weeks of the year, the > project would generate 1/2 a million books per year. > > At 15,000 books per week for 50 weeks of the year, the > project would generate 3/4 million books per year. > > Thus it would take 20 years to accomplish their stated > goal of 15 million eBooks in 20 years, at 15,000/week, > but I recall the original goals being set at 10 years, > or perhaps even 15, with no mention of 20 years. Look closely at that last paragraph, Michael, and see what you claimed were Google's "stated goal" and "original goals." Neither one, according to you, was "10 million books in 6 years." Oops! Now, you did post that message nearly 6 months after the big Google "media blitz," and memories can fade, so let's take a look at a message you sent to the gutvol-d list on Dec. 14, 2004, the same day as the "media blitz": "Google Partners with Oxford, Harvard & Others to Digitize Libraries" http://lists.pglaf.org/private.cgi/gutvol-d/2004-December/000978.html Here's an excerpt: > The two projections I heard were 7 and 10 years for the project. Hmmm... There's another mention of "10 years," which you said, in the Book People post I just quoted, was the period the "original goals" were set at. Now let's take a look at a post you sent to the gutvol-d list on Dec. 15, 2004, the day after the "media blitz": "Re: [ebook-community] Google Question for Michael Hart" http://lists.pglaf.org/private.cgi/gutvol-d/2004-December/001003.html Here's an excerpt: > BTW, they said 15 million eBooks. . .and I'm not sure they HAVE > 15 million eBooks that they can legally use in the worldwide > service they announced yesterday. There's the same "15 million" figure you mentioned in your Book People post. You didn't say anything about "10 million." Now let's look at PT1 of the Weekly Project Gutenberg Newsletter you sent out on Dec. 
15, 2004, the day after the "media blitz": http://lists.pglaf.org/pipermail/gweekly/2004-December/000043.html Scroll down to the "Headline News from NewsScan and Edupage" section, and you'll see this: > >From NewsScan: > > GOOGLE CUTS DEAL WITH LIBRARIES TO DIGITIZE HOLDINGS > Flush with new wealth after its IPO last summer, Google has offered > to underwrite the cost of digitizing library collections at Harvard, > Stanford, Oxford, the University of Michigan and the New York Public > Library. Although company executives declined to comment on the total > funding amount, one estimate pegs it at $10 for each of the more than 15 > million books and other documents covered in the agreement. [snip] > (New York Times 14 Dec 2004) > There's that "15 million" figure again. Now let's take a look at the "New York Times" article that was cited from Dec. 14, 2004: "Google Is Adding Major Libraries to Its Database" Here's the paragraph most relevant to our little debate: "Although Google executives declined to comment on its technology or the cost of the undertaking, others involved estimate the figure at $10 for each of the more than 15 million books and other documents covered in the agreements. Librarians involved predict the project could take at least a decade." Note the number of books given, Michael: "15 million," not "10 million." Note the time period given: "a decade" (10 years), not "6 years." Note also that it says "at least a decade," not "at most a decade." "At least" means that a decade was the *minimum* amount of time they were predicting the project could take. OOPS! > Let's not forget all the interviews with the various librarians > at the original member institutions of the project, along with > any number of Google officials and others. Note the source the "New York Times" gave for the "at least a decade" prediction: "Librarians involved." > _I_ went over ALL of them I could find, had people send me tapes > of others. . . . Apparently, you couldn't find the "New York Times" article you linked to in your own newsletter. You also apparently missed the ABC World News Tonight report about the project on Dec. 14, 2004. Unfortunately, I couldn't locate a link to the original report with either the poor search function on the ABC News website (http://abcnews.go.com/) or a Google search confined to the ABC News website. But I did find this article, which cites the ABC News report, posted on LISNews Librarian And Information Science News on Dec. 16, 2004: "Google to Digitize 15 Million Books in 10 years" http://lisnews.org/node/12867/ "During the December 14, 2004 broadcast of ABC News World News Tonight, Peter Jennings reported that Google announced their goal to digitize 15 million Books in 10 years. This ABC news report featured a machine housed in the basement of the Stanford University Library that can digitize 1,000 pages each hour where the outcome produces pages of books that can be searched on...." Now, compare that first sentence: "During the December 14, 2004 broadcast of ABC News World News Tonight, Peter Jennings reported that Google announced their goal to digitize 15 million Books in 10 years" to your recent claim: "Google announced on December 14, 2004 that they would digitize 10 million books in 6 years." I suppose you'll tell us now that Peter Jennings didn't do his homework either before airing that report. ;) > Why? > > This was probably a lot more important to me than to anyone else. 
It was so "important" to you, Michael, yet, as I pointed out in my last post, "you always seem to have trouble remembering what was actually said." :) Last, but not least, let's take a look at a lengthy message you sent to the Book People list on Dec. 21, 2004, just *one week* after the big "media blitz": "Project Googleberg" http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2004&post=2004-12-21,2 Here are some excerpts: > PROJECT GOOGLEBERG > > > This message contains most of those questions Project Gutenberg > received about Google Print over the past week, and first draft > answers. At this time I have not included quotations from some > of reports I have at hand, so if you have any favorites such as > "This is going to change the entire world" sort of thing, email > them to me for inclusion in the final draft. Added questions & > comments are encouraged. [snip] > In the 48 hours since the announcement of the "Google Print" project, > I have listened to 6 major network news stories and read, and reread, > the major print media stories in an attempt to answer these following > questions as best I can. Sometimes it has not been possible to get a > good answer from the information available, and I am either guessing, > or passing on indirect information from others. [snip] > 2. How many books will there be, and when will they be available? > > 15 million was the number thrown around the most, but I doubt that it > is possible that even this collection of famous libraries should have > enough books that fit the criteria they announced for their worldwide > eBook service: > > A. Public Domain > > B. 19th Century > > C. Scannable Editions Before posting more excerpts, I want to point out your obvious confusion. The major news stories I saw, including the two from the CBS News and BBC News websites that I cited in my last post "Google To Scan Library Volumes" http://www.cbsnews.com/stories/2004/12/14/tech/main660896.shtml "Google to scan famous libraries" http://news.bbc.co.uk/2/hi/technology/4094271.stm and the "New York Times" article you linked to in your newsletter, stated that both the Michigan and Stanford libraries had agreed to let Google scan all or nearly all of their books. How you thought that scanning practically all the books in 2 major university libraries meant scanning only 19th century public domain books is beyond me. As for your "19th Century" criterion, those CBS, BBC, and "New York Times" articles show that that limitation applied only to Oxford's library. Keeping that confusion in mind, let's take a look at some more excerpts: > I'm guessing that when they start researching the copyright issues, a > retraction will be made, stating that they have discovered those 19th > Century books might still be under copyright, depending on a lifespan > of the authors. Some of their most important works, such as Oxford's > "Oxford English Dictionary" might have a few volumes published before > the 20th Century that are still under copyright and thus not eligible > for inclusion in their proposed service, along with many other books. > > [Of course this depends on which country the database is placed in, > as the copyright rules for the U.K. are different that in the U.S.] > > Obviously, if they are really going to include 15 million books, they > will have to include nearly every public domain book in each of their > 5 member libraries, from the rarest to the most common. 
I am told by > information science professionals that there have only been just over > 30 million copyrights sought in entire history of the United States-- > and that includes millions of items besides books. Doing 15 million, > might then become problematic, as there might not be 15 million total > separate books to work from, even presuming that at least half of the > 30 million U.S. copyrights sought between 1790 and 2003 were for book > titles that are now in the public domain. Obviously not every single > book ever published has made it into these 5 libraries. > > The various time frames mentioned have ranged at least from 10 years, > the longest, to 6 years, the shortest, among those I have seen. > > Please let me know if you have seen a wider range. > > I, myself, would bet they will have to cut some corners to include 10 > million books 10 years from the announcement date, December 14, 2014. Aha! There we have it, Michael. Because you didn't think Google would be able to come up with "15 million" public domain books, *you* came up with the "10 million" figure, but note the time span you gave in that last line: "10 years," not the "6 years" you're saying now, which was the time period given in many media reports for digitizing just the University of Michigan library. Well, in conclusion, Michael, once again your claims crashed into the facts and didn't survive the collision. :) Jose Menendez P.S. You sent out that same "Project Googleberg" post to PG's gweekly mailing list on Dec. 21, 2004. Here's a link to it in the gweekly archive: http://lists.pglaf.org/pipermail/gweekly/2004-December/000044.html
From hart at pglaf.org Tue May 13 00:58:09 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 13 May 2008 00:58:09 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <48293C20.2010709@ibiblio.org> References: <4824AEFC.50102@ibiblio.org> <48293C20.2010709@ibiblio.org> Message-ID: So, Jose, are you really, after our previous exchange... really going to pretend there was no mention of 6 years? Or 10 million books? On December 14, 2004, in the major media? If you are NOT going with that pretense, then all your rhetoric has no logical place in the conversation other than as additional footnotes. Those are the only two items that were at issue in this subject at this time: whether those figures were said on the record in the major media as a result of Google's project PR. You are all too obviously going for the overload of an awful lot of information, none of which establishes that it is untrue that any of the interviewees gave the numbers I have stated. By now you certainly should actually have the names of at least one and perhaps two of those I am quoting. After all, it took me only one search and it was first in the list of hits just a day or two ago. Thus you must know that at least one of the interviews included the "6 years" frame of reference. Why can't you just share that with everyone here, stop pretending it wasn't said, and get on with your life? What, exactly, is your goal with that last message??? Obviously the sheer tonnage was enough for elephants-- but you didn't actually hit either of the targets. Once again I remind you, and request of you, to think, if at all possible, about how all this will look in 10 years or whatever, when you just might want persons in possession of this conversation to take you seriously. Please. . . . Michael PS Yes, I am sure there were discussions of 10 years, and other periods as well, but that is not the issue here.
Yes, I am aware there were discussions of various book totals for The Google Print Library, but they aren't a subject of the previous discussion here. Yes, I COULD have tried to make Google's claims seem a lot bigger by stressing the largest numbers mentioned in any of the conversations reported, but I do not work in such a manner. However, I must admit that those who claim 15 million, previously and at present, were probably more on target, given the 7 million mentioned for U. Michigan, only one of the project members at that time, and an even smaller fraction of the members announced since; but I am not trying to distort the December 14, 2004 data via the addition of 20/20 foresight from later media. It SHOULD be obvious to anyone who simply writes down, and then uses, a table of the figures at hand on dates from December 14, 2004 on, that The Google Print Library is now in possession of so many more member libraries and so much more technology that it keeps up with what is on the record from that date, actually surpassing such figures as if they were standing still -- which they are in fact doing, while reality gallops by. However, once again, I am NOT using those new figures, in any way, shape or form. Yes, I did NOT quote the UMich figure of 7 million. Why not? Because I was NOT referring to just that one library-- but the entire Google Print Library project. Yes, other numbers have been mentioned later, but I am not trying to muddy the waters with larger figures. If Google makes it to 10 million I won't be the one to say it should have been 15 million, though I might say I know people who might misquote me as to what I said, in this particular conversation, or others, in or out of their proper contexts. I am not trying to say, and never have, that any media report contained one single reference that said Google would be doing 10 million books in 6 years. . .I ain't saying that it was said altogether in one breath or in one phrase or in one sentence or in one report or by a single network or press syndicate. . .I have said over and over that I had to go through a number of reports, and went through them a number of times, to gain those data points necessary to create a perspective model. Now, could someone else have created differing models? Of course. I never said mine was the only possible interpretation that anyone would/could/or should consider. However, if anyone really cares to look, they can find the 6 years quotation without any real effort. If need be, after a while, I will post the source, and the media corporation reporting the source. If numbers higher than 15 million are there, it makes, without a doubt, my current statement of 10 million appear conservative by comparison. Next!
From Bowerbird at aol.com Tue May 13 01:05:19 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 May 2008 04:05:19 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: ok, let's see if we can agree on a few things... google has promised to scan millions and millions and millions of books. heck, sometimes, they even talk about indexing "every book ever written". lofty plans... it's about time _someone_ grew a pair of balls and did this... as far as a _timeframe_ for doing it, most reports gave that as _6_years_, although some claimed "a decade", and some say it will take even longer. 3 years in, it appears that the latter predictions were the most accurate...
it has even been said that peter jennings reported on "world news tonight", back on december 14, 2004, that "google announced their goal to digitize 15 million books in 10 years". but a.b.c. _is_ owned by disney, so perhaps they were just making up those numbers, in the great spirit of walt and roy. and it's clear that the university of michigan and google expected that the 7 million books at umichigan would take about 6 years to scan, and i think nobody in their right mind expected google to work _only_ at umichigan during those first 6 years. given all these various reports, a reasonable synthesis of the positions is "10 million books in 6 years" -- best you will get on such a p.r. elephant... but from their own announcement in february of this year, we know that umichigan only has _1_million_ books scanned thus far -- 3 years along -- so i think it's very safe to say that they are "behind schedule at this time"... is there anything here worth arguing about at this time? if so, i don't see it. i'm sure not gonna lose any sleep over any of it... -bowerbird
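(A quick way to sanity-check the schedule claim above: the following minimal Python sketch uses only the thread's own rounded figures -- 7 million UMich volumes, the disputed 6-year target, 1 million scanned in 3 years -- none of which is an official Google number.)

    # rough schedule check, using the rounded figures quoted above;
    # none of these numbers is official -- they are the thread's claims
    TARGET_BOOKS = 7_000_000    # UMich volumes, per the 2004 reports
    TARGET_YEARS = 6            # the disputed "6 years" figure
    DONE_BOOKS = 1_000_000      # UMich's February 2008 announcement
    YEARS_ELAPSED = 3

    needed = TARGET_BOOKS / TARGET_YEARS    # pace required to hit the target
    observed = DONE_BOOKS / YEARS_ELAPSED   # pace actually achieved so far
    projected = TARGET_BOOKS / observed     # total years at the observed pace

    print(f"required pace:  {needed:,.0f} books/year")
    print(f"observed pace:  {observed:,.0f} books/year")
    print(f"projected time: {projected:.0f} years, against a {TARGET_YEARS}-year target")

(At the observed pace the projection comes out to roughly 21 years, which is what makes "behind schedule at this time" a safe statement under any of the quoted targets.)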
From Bowerbird at aol.com Tue May 13 01:17:09 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 May 2008 04:17:09 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: if i would've known michael had just posted, i wouldn't have posted too... you can tell when jose gets desperate, because he starts to argue on both sides of the dispute... and his barrage of "data" is meant to get people to stop paying attention, probably so they don't notice how badly he has degraded the "discussion". the truth is pretty easy to see here. google has fallen behind its timetable. but who cares? as far as i know, there was no "over/under" line on them in las vegas. so as long as they're still scanning, i'm still happy with them. everything else is just needless backbiting... -bowerbird
From Bowerbird at aol.com Wed May 14 13:23:23 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 14 May 2008 16:23:23 EDT Subject: [gutvol-d] games with a purpose (turning proofing into a game) Message-ID: luis von ahn has a new site up: "games with a purpose". > http://www.gwap.com/gwap/ -bowerbird
From joyce.b.wilson at sbcglobal.net Wed May 14 17:13:39 2008 From: joyce.b.wilson at sbcglobal.net (Joyce Wilson) Date: Wed, 14 May 2008 19:13:39 -0500 Subject: [gutvol-d] Line-wrap problems in Chinese texts? Message-ID: <482B8033.7020809@sbcglobal.net> Hey, I've been looking at a lot of recent Chinese texts lately while working on the PG catalog, and I've noticed really widespread problems with lack of line-wrapping in the texts (for instance, of the 6 recently-posted works by Lu Xun, 3 have problems with lack of line-wrap). What's up with that? And in case anyone on this list might be able to pass word along to where it would do good, it would be really *really* helpful to the catalogers if all the new Chinese texts included their titles and authors (if known) *in Chinese characters* at the beginning of the text. Some do, but many don't. When they don't, we have only the romanized versions (sometimes non-standard romanizations, sometimes combined with English translation, sometimes translation alone) in the file header to go on (along with whatever Google can turn up). Google is a life-saver, but it can be time-consuming for non-Chinese-readers to determine that, for instance, the new author "Chr Chr Dau Jen" is already in the catalog as "Yunyangchichidaoren". And aside from the catalogers, it would just be nice to have that information present in the text for the reader (now and then I remember that it isn't all about us catalogers! ;) ). Pip-pip, Joyce W
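(The missing line-wrap that Joyce reports is easy to screen for mechanically before a text is posted. A minimal Python sketch, assuming plain-text files; the 80-character limit and the tolerance are arbitrary choices rather than PG rules, and a raw character count is only a rough proxy for display width in Chinese text.)

    # flag plain-text files that appear to lack line-wrapping: many
    # over-long lines usually means one whole paragraph per line
    def looks_unwrapped(path, limit=80, tolerance=5):
        long_lines = 0
        with open(path, encoding="utf-8") as f:
            for line in f:
                if len(line.rstrip("\r\n")) > limit:
                    long_lines += 1
        return long_lines > tolerance

    # hypothetical filenames, purely for illustration
    for name in ("luxun_01.txt", "luxun_02.txt"):
        print(name, "looks unwrapped:", looks_unwrapped(name))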
From julio.reis at tintazul.com.pt Thu May 15 12:32:34 2008 From: julio.reis at tintazul.com.pt (Júlio Reis) Date: Thu, 15 May 2008 20:32:34 +0100 Subject: [gutvol-d] games with a purpose (turning proofing into a game) In-Reply-To: References: Message-ID: <1210879958.12377.2.camel@abetarda> Too bad about the terms of service. Closed content, proprietary stuff, yadda yadda. But it looked like a good idea. And I tried it and it's... fun. Júlio. Thu, 2008-05-15 at 12:00 -0700, gutvol-d-request at lists.pglaf.org wrote: > luis von ahn has a new site up: "games with a purpose". > > > http://www.gwap.com/gwap/
From Bowerbird at aol.com Mon May 19 11:48:29 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 19 May 2008 14:48:29 EDT Subject: [gutvol-d] cleaning up the catalog Message-ID: boy, what a mess the p.g. catalog is! i cleaned the info for the english e-texts 10000-14000: > http://z-m-l.com/misc/cata10-14-all.html this is what i need, and might not be useful to p.g. (sorry), but i'm happy to share it. here's a more-concentrated list, showing many of the multiple-item e-texts, which were particularly messy: > http://z-m-l.com/misc/cata10-14-repeats.html this exercise suggests that the post-processors/whitewashers might want to see how items in a series were posted in the past when preparing additional items from the series for submission, with the intent of minimizing the inconsistencies... -bowerbird p.s. if anyone has any questions on what i've done, or why, or anything related to this, i will be happy to address them...
From Bowerbird at aol.com Mon May 19 15:25:46 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 19 May 2008 18:25:46 EDT Subject: [gutvol-d] cleaning up the catalog Message-ID: a couple of nice natural experiments have introduced themselves, in checking on some possible duplicates in the library... first, the two "young captives" e-texts are entirely different. ok. second, the two "pearl box" e-texts are highly similar, but not identical... this is a book "containing one hundred beautiful stories for young people", each version contains a few completely different stories from the other one, but there's no list (in either of the versions) of the differences between them. since the versions contain 90-95 identical stories, this seems like a book that would benefit greatly by having the two different versions _merged_ into one. and of course comparison of the two versions could identify the errors in each, so that's the first of our two "natural experiments". third, the two "scranton high chums on the cinder path" look to be identical, although there is some possibility they could be of slightly different editions. either way, a comparison of the two gives us our second "natural experiment". i'll let you know in the next few days how these experiments turn out... -bowerbird
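(The version comparison bowerbird describes is a plain text diff. A minimal sketch using Python's standard difflib, comparing word streams so that different line-wrapping does not register as a difference; the filenames are placeholders, and real PG files would first need their header/footer boilerplate cut away.)

    # list the places where two digitizations of the same book differ --
    # candidate typos or edition differences in one version or the other
    import difflib

    def words(path):
        with open(path, encoding="utf-8") as f:
            return f.read().split()   # compare word streams, not lines

    a = words("pearl_box_version_a.txt")   # placeholder filenames
    b = words("pearl_box_version_b.txt")

    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            print(tag, " ".join(a[i1:i2]) or "(nothing)",
                  "->", " ".join(b[j1:j2]) or "(nothing)")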
From hyphen at hyphenologist.co.uk Tue May 20 00:41:04 2008 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue, 20 May 2008 08:41:04 +0100 Subject: [gutvol-d] cleaning up the catalog In-Reply-To: References: Message-ID: <000001c8ba4c$dcc04c90$9640e5b0$@co.uk> Bowerbird wrote >>> a couple of nice natural experiments have introduced themselves, in checking on some possible duplicates in the library... [snip] i'll let you know in the next few days how these experiments turn out... <<< These are the normal differences between different editions of a book written and produced pre 1923. In "my" books I have found different versions of the same poem in different books. I normally include the *longer* version in both books with a note about what I have done. Publishers and editors were stronger than the authors in those days, and took greater liberties than they do today. I really think that PG and DP are a bit too paranoid about producing an etext which is an *exact* copy of a paper version. I personally think that any difference between the etext and the paper version should be about the same as one could expect between two editions of the same book produced pre 1923. These were regularly re-typeset, complete with typos, spelling mistakes and formatting. Dave Fawthrop
From Bowerbird at aol.com Tue May 20 02:50:44 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 May 2008 05:50:44 EDT Subject: [gutvol-d] cleaning up the catalog Message-ID: dave said: > These are the normal differences between different editions > of a book written and produced pre 1923. right. i'm pretty sure that the two versions of "pearl box" were based on different editions, since the differences are striking... d.p. people don't lift huge segments out of the books they do... the "scranton chums" differences look like digitization mistakes, rather than version differences, with perhaps a few exceptions... i can't find any scans of this book to say that for sure, however... > In "my" books I have found different versions of the same poem > in different books. I normally include the *longer* version > in both books with a note about what I have done. i'd do something similar if i merged the two "pearl box" e-texts -- i.e., i'd include every piece that was printed in either version... > Publishers and editors were stronger than the authors in those days, > and took greater liberties than they do today. yeah, a lot of what d.p. people call "the intention of the author" is really something the _publisher_ is more likely to have controlled. > I really think that PG and DP are a bit too paranoid about > producing an etext which is an *exact* copy of a paper version. i think if that's _really_ their intention, they're doing a lousy job of it. but i agree with you that, in many cases, that shouldn't be the goal... (but i admit i'm totally confused as to what the exact goal of d.p. is... every aspect that they might claim seems rife with self-contradiction.) > I personally think that any difference between the etext and the > paper version should be about the same as one could expect > between two editions of the same book produced pre 1923. i'd want to be more specific about the types of changes allowed. :+) -bowerbird
From prosfilaes at gmail.com Tue May 20 03:36:57 2008 From: prosfilaes at gmail.com (David Starner) Date: Tue, 20 May 2008 06:36:57 -0400 Subject: [gutvol-d] cleaning up the catalog In-Reply-To: <000001c8ba4c$dcc04c90$9640e5b0$@co.uk> References: <000001c8ba4c$dcc04c90$9640e5b0$@co.uk> Message-ID: <6d99d1fd0805200336r5feb1df2k59c0536b5475f61e@mail.gmail.com> On Tue, May 20, 2008 at 3:41 AM, Dave Fawthrop wrote: > These are the normal differences between different editions of a book > written and produced pre 1923. In "my" books I have found different versions > of the same poem in different books. I normally include the *longer* version > in both books with a note about what I have done. Publishers and editors were > stronger than the authors in those days, and took greater liberties than they > do today. And what makes the *longer* version the right one? What makes it the one that the author originally wrote? What makes it fundamentally wrong to abridge a poem for an anthology, which is still done today?
> I personally think that any difference between the etext and the > paper version should be about the same as one could expect between > two editions of the same book produced pre 1923. We can do better. We aren't pre-1923, and we don't have the same constraints on printing. Important books are reproduced verbatim in the modern world, and I see no reason why we shouldn't do the same. Quick careless reproductions are hurting Project Gutenberg's reputation, so we need to do better.
From marcello at perathoner.de Tue May 20 05:51:49 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 May 2008 14:51:49 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4824AEFC.50102@ibiblio.org> <4825A028.2040203@perathoner.de> Message-ID: <4832C965.7090803@perathoner.de> Michael Hart wrote: > Once again I must suggest, as though Marcello and Josh were in a > listening mode, that they go over the various interviews that aired > on that particular day, December 14, 2004. Obviously, as you already did the research, it would cost you nothing to spring the links to those interviews. (If those interviews existed.) > There is not, and was not, one single source that provided /all/ > the information that was broadcast and printed that day. So you picked "6 years" from one guy here and "10 million" from another guy there and then compounded that into an official Google announcement? No wonder you don't get interviewed by the media. > If you do your homework, you will find many such sources, some I > referenced earlier, but none all that hard to find. Maybe you should have become a teacher. Then you could have given everybody their homework. -- Marcello Perathoner webmaster at gutenberg.org
From marcello at perathoner.de Tue May 20 05:54:20 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 May 2008 14:54:20 +0200 Subject: [gutvol-d] Current Google Search on 10 million and 6 years In-Reply-To: References: Message-ID: <4832C9FC.5080704@perathoner.de> Michael Hart wrote: > Because I want everyone to remember when Marcello, Josh and Co. > try to take over Project Gutenberg, again, and again, and again. That's ridiculous. What should I want to take over? The BIG funding? The WELL-RUN organisation? The EVER-INCREASING volunteer base? As a matter of fact, DP has long since taken over PG. With their well-organized workflow (instead of the clueless PG anarchy) they have produced more books in 8 years than PG in 38. THEY are creating books now, not PG. -- Marcello Perathoner webmaster at gutenberg.org
From hart at pglaf.org Tue May 20 08:22:10 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 20 May 2008 08:22:10 -0700 (PDT) Subject: [gutvol-d] cleaning up the catalog In-Reply-To: <6d99d1fd0805200336r5feb1df2k59c0536b5475f61e@mail.gmail.com> References: <000001c8ba4c$dcc04c90$9640e5b0$@co.uk> <6d99d1fd0805200336r5feb1df2k59c0536b5475f61e@mail.gmail.com> Message-ID: On Tue, 20 May 2008, David Starner wrote: > On Tue, May 20, 2008 at 3:41 AM, Dave Fawthrop > wrote: >> These are the normal differences between different editions of a >> book written and produced pre 1923. In "my" books I have found >> different versions of the same poem in different books. I >> normally include the *longer* version in both books with a note >> about what I have done. Publishers and editors were stronger than >> the authors in those days, and took greater liberties than they do >> today. > > And what makes the *longer* version the right one?
What makes it > the one that the author originally wrote? What makes it > fundamentally wrong to abridge a poem for an anthology, which is > still done today? When it comes down to which is a "wrong" or "right" publication, I tend to side with the author against editors and publishers. After all, it is the mind and heart of the author we try to see into when we read, not the minds and hearts of the publishers. Yes, the publishers used to always get the last word, and also the last shekel, ducat, ruble, mark, franc, dollar, whatever. Well, the authors still get only about 5% of the gross, but it is nice that they sometimes have a little control nowadays. >> I personally think that any difference between the etext and the >> paper version should be about the same as one could expect >> between two editions of the same book produced pre 1923. Personally, I think we can do better than pre-1923. No reason we can't do both versions. However, if one must be chosen over the other, I choose the author's, as the creator of the baby in question, just as did Solomon. It's the author's baby; the editors and publishers are midwives, at best. The book industry gets 19/20 of the cash--isn't it enough that they get that, without getting to do plastic surgery on the baby to make it look more like them, and less like the author? > We can do better. We aren't pre-1923, and we don't have the same > constraints on printing. Important books are reproduced verbatim > in the modern world, and I see no reason why we shouldn't do the > same. Verbatim reproductions are really nothing more than Xeroxes. Project Gutenberg should be more than just an eXerox machine. > Quick careless reproductions are hurting Project Gutenberg's > reputation, so we need to do better. Everyone here can run Project Gutenberg better than anyone else. There is no doubt about that in everyone's mind. Yet, "Project Gutenberg's reputation" has been created by some 50,000+ volunteers from all walks of life all over the world, and not by the editors of Random House, Simon and Schuster, Knopf, Ballantine, HarperCollins, etc., etc., etc. These people had way more than enough control in their day; let's not help extend their control to the next millennium. However, if you want to personally create verbatim editions we will be only too glad to include them in our collection. We just won't demand that they be the ONLY editions. . . . Michael S. Hart Founder Project Gutenberg
From hart at pglaf.org Tue May 20 10:01:18 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 20 May 2008 10:01:18 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4832C965.7090803@perathoner.de> References: <4824AEFC.50102@ibiblio.org> <4825A028.2040203@perathoner.de> <4832C965.7090803@perathoner.de> Message-ID: So, you still refuse the possibility that a better picture of Google's initial rollout of Google "Print Library" would have been available from partaking of all available sources, rather, of course, than from the reading of that single entry. No wonder you have such a myopic view of the things we talked about over the years. Of course, you ARE correct to do this if you want to be SURE. "Person who has one clock ALWAYS knows the correct time; person who has two clocks NEVER knows the correct time." [Well now there are "atomic clocks" but. . . .] So, I can see how life is so much easier for you with just a single source of information to quote from that you would be loath to add another source, much less a dozen or two.
I leave you with the words of Isaac Asimov's nomination to a position of "World's Smartest Person". . . . "You don't understand anything until you learn it more than one way" Your single source was not in charge of all the aspects of a half dozen major libraries doing "Google Print Library." No one could really exercise that kind of control. Therefore other points of view could possibly be worthwhile, and a viewpoint comprising multiple sources just might maybe barely possibly have more relation to the real world than an elementary single statement press release. Not, of course, that I believe that single press release was ALL the Google management gave to ALL the worldwide press. I can see that the syndicated news continues to escape every possible point of your attention on this, where single quote sources DO mention years and millions of books in a "single- source" aspect you find de rigueur for such events. Actually, however, the more you say, the more I am sure your reading actually included such sources, and that you are now and have always been intentionally ignoring them. I just can't possibly imagine that you missed them without a very intentional intervention of your myopic viewpoint as an obvious fact that syndicated news sources quoted in hundreds or a thousand newspapers could not have anything of offer. The real question is, of course, why are you continuing this "tag team trolling" to try to keep a flame war alive that is obviously of no interest to anyone but your tag team members after such a long time of silence and not more research on a subject that is still so easy to find with Google searches. What possible worthwhile motivation could you have? What point are you trying to make? That certain Google representives to the press might have a penchant for avoiding specifics such as saying how many the years might be to do how many books? I'll certainly grant you that which is why it is important, to no mean degree, to find multiple sources. Game! Set! Match! There is nothing you can add that doesn't make you seem in ever sillier perspectives, and hasn't been all along. So why do you continue making a fool of yourself, and, our listserver, which people will probably read years ahead. Doesn't it matter at all to you what you/we look like some years down the road when you might want us to be trusted? End. mh On Tue, 20 May 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >> Once again I must suggest, as though Marcello and Josh were in >> a >> listening mode, to go over the various interviews that aired >> the >> particular day of December 14, 2004. > > Obviously, as you already did the research, it would cost you > nothing to spring the links to those interviews. (If those > interviews existed.) > > >> There is not, and was not, one single source that provided >> /all/ >> the information that was broadcast and printed that day. > > So you picked "6 years" from one guy here and "10 millions" from > another guy there and then compounded that to an official Google > announcement? > > No wonder you don't get interviewed by the media. > > >> If you do your homework, you will find many such sources, some >> I >> referenced earlier, but none all that hard to find. > > Maybe you should have become a teacher. Then you could have given > everybody their homework. 
> > > -- > Marcello Perathoner > webmaster at gutenberg.org > From Bowerbird at aol.com Tue May 20 11:44:26 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 May 2008 14:44:26 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: you know one thing that's interesting about this? every one of the three major broadcast networks carried this item on their nightly news show, but we can't access a single transcript. not just for that one night, but for _any_ night. they cram propaganda down our throats but keep no public record, so we can't even go back after the fact and check what they've said... so -- years later -- we have to piece it together like a jigsaw puzzle. it sure makes it easy for _liars_ like george w. bush to operate, eh? (and remember when the republicans actually put forth into motion _impeachment_proceedings_ against bill for lying about a blowjob? the hypocrisy of that political party is so overwhelming it stuns me.) i forget what orwell called it, but those people who claimed that his predictions didn't come true have their heads up their butts... -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080520/90291972/attachment.htm From gbnewby at pglaf.org Tue May 20 13:17:52 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Tue, 20 May 2008 13:17:52 -0700 Subject: [gutvol-d] OLPC eBook reader Message-ID: <20080520201752.GA13298@mail.pglaf.org> Seen on slashdot: 2nd Generation "$100 Laptop" Will Be an E-Book Reader http://hardware.slashdot.org/article.pl?sid=08/05/20/1621214 "At a conference sponsored by the One Laptop Per Child Foundation this morning, OLPC founder unveiled the design for the foundation's second-generation laptop. It's actually not a laptop at all -- it's a dual-screen e-book reader (we've got pictures). Negroponte said the foundation hopes that the cost of the new device, which is scheduled for production by 2010, can be kept to $75, in part by using low-cost displays manufactured for portable DVD players." The article: http://www.xconomy.com/2008/05/20/negroponte-unveils-2nd-generation-olpc-laptop-its-an-e-book/ As mentioned on gutvol-d before, PG tried to make eBooks available for [current] OLPC XO system but they didn't end up taking many [or maybe any], due to some arbitrary file format restrictions they have. -- Greg From marcello at perathoner.de Tue May 20 13:51:50 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 May 2008 22:51:50 +0200 Subject: [gutvol-d] Grammar Error vs Logic Error In-Reply-To: References: Message-ID: <483339E6.4090508@perathoner.de> Michael Hart wrote: > Just because somoene disagrees with ou doesn't make a "liar." What do you call a person that makes a public statement, and when challenged to post evidence to his claim, openly refuses and tries to reverse the burden of proof? Your post about Google is as clear a case of defamation as can be. > If you think it might have been just incidental/accidental, may > a thought about the difference in time zones might give thought > to the idea that some of these interviews took place earlier in > some time zones than others, by enough of a factor indicating a > setup lead time in exess of what you would get with no planning > of this as a worldwide event. 
If you can't give any evidence for your claim, don't mention the "difference in time zones". You just made yourself more ridiculous. First law of holes: if you are in one stop digging. -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Tue May 20 14:24:33 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 May 2008 23:24:33 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4824AEFC.50102@ibiblio.org> <4825A028.2040203@perathoner.de> <4832C965.7090803@perathoner.de> Message-ID: <48334191.1020809@perathoner.de> Michael Hart wrote: Your original statement was that: > Google announced on December 14, 2004 that they would digitize > 10 million books in 6 years. I might remember you that to prove your statement you have to show that: 1. At least one person mentioned said numbers in said context. 2. That person is an official representative of Google. > Therefore other points of view could possibly be worthwhile, > and a viewpoint comprising multiple sources just might maybe > barely possibly have more relation to the real world than an > elementary single statement press release. That amounts to admitting that you assembled the "Google announcement" in your head. You might want to be more careful. Losing a libel case against Google would be bad publicity for PG. -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Tue May 20 16:29:16 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 May 2008 19:29:16 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: greg said: > As mentioned on gutvol-d before, > PG tried to make eBooks available for [current] OLPC XO system > but they didn't end up taking many [or maybe any], > due to some arbitrary file format restrictions they have. well, far be it from me to correct an executive officer from p.g. -- and greg, please do let us know if you _do_ know better than me -- but the reason o.l.p.c. didn't scoop up the p.g. library in one bite is because your library has inconsistencies that make it _impossible_. and that is the _only_ reason. heck, even your _catalog_ is an unworkable mess. at the time the original o.l.p.c. person came around, i talked to him -- he was a kid doing a summer internship -- and he had zero idea of the complexities that awaited him. so, of course, he failed badly. you can't even fathom the problems in 3 months, let alone solve 'em. the inconsistencies in your library make it unworkable _as_ a library. and _that_ is the reason o.l.p.c. didn't (couldn't) incorporate it. and that is the _only_ reason. a consistent library would be easy to re-engineer to "restrictions". you can't make a viewer-program for these books, because of their _inconsistencies_, a point i've made here for well over 4 years now... _i_ can make such a viewer-program, because i know how to resolve the inconsistencies, but i am not sharing those secrets because then the point will not be crystal clear that inconsistencies hobble a library. instead, i'm going to use my ability to resolve the inconsistencies to create a _consistent_ version of the p.g. library, which _will_ be able to be scooped up in a single bite, and _many_ entities will then do it. i mean, for crying out loud, your system should have been designed such that it could be dropped on _any_ file-storage system to create a turn-key electronic-library, for one person or one hundred million. one click of one button should be all that it takes. boom! 
instant library, with every utility most people will ever need. but you haven't got _any_ of the pieces needed to make that happen. not a single one. this total blind spot is a very bad black mark on you. heck, if p.g. would've had good infrastructure in-place and working, negroponte might have been much more successful selling his x.o., seeings as how he could've pitched each one as chock-full of books, adding hundreds (thousands?) of dollars of value to each machine... "whaddya mean, you want ms-office? you've got _shakespeare_!" i want to put a full p.g. version on tens of thousands of hard-drives, thereby creating a hyper-redundant and world-wide e-library mesh, and see where _that_ kind of development lets us trampoline up to... but you're all too busy churning out even more inconsistent e-texts to do something like _that_... it's sad. i tell you, it's really, really sad. you call yourself an electronic library... but you can't see the forest for the trees... *** i brought this point to your attention, greg (and yours too, michael) when you hit 10,000. and you did nothing to fix the basic problem. now your library has hit 25,000, with 2.5 times the inconsistencies... are you gonna do something now? or will i be repeating at 50,000? you need to learn. -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080520/783744d2/attachment.htm From hart at pglaf.org Tue May 20 17:42:02 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 20 May 2008 17:42:02 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: I can tell you exactly why the OLPC didn't take our books, and in just one concept: They refused my suggestion to do a feasibility study. I wanted to run 10 books through their whole system, find out what needed improvement, then 100, 1,000, & finally 10,000. . .each time making adjustments that might not have been quite so obvious concerning some smaller numbers of books. Feasibility Study. It should be tatooed on the hands of all MBAs, etc., upon graduation today, just as "HOLD FAST" was quite literally tattoed on the knuckles of the old sailors of the 19th Century. Everyone seems to think they are so smart then could figure out all the details in advance. Duh! There is NO substitute for experience. The way to build up experience is trial runs. . . . It really IS as simple as that. Every time you come up with a new plan, try it out! Period. No exceptions. This is why I always answer those who know it all-- and there have been plenty, with encouragement from the heart, and mind, to try out their plans, with a lot of help from us, before going any further. If you can't take that first step, you are not soon going to complete that journey of many many steps. There is a reason people say to walk before running and it is the voice of experience. Some people run. . .and run with scissors. . . . mh On Tue, 20 May 2008, Bowerbird at aol.com wrote: > greg said: >> As mentioned on gutvol-d before, >> PG tried to make eBooks available for [current] OLPC XO system >> but they didn't end up taking many [or maybe any], >> due to some arbitrary file format restrictions they have. > > well, far be it from me to correct an executive officer from p.g. 
-- > and greg, please do let us know if you _do_ know better than me > -- but the reason o.l.p.c. didn't scoop up the p.g. library in one bite > is because your library has inconsistencies that make it _impossible_. > > and that is the _only_ reason. > > heck, even your _catalog_ is an unworkable mess. > > at the time the original o.l.p.c. person came around, i talked to him > -- he was a kid doing a summer internship -- and he had zero idea > of the complexities that awaited him. so, of course, he failed badly. > you can't even fathom the problems in 3 months, let alone solve 'em. > > the inconsistencies in your library make it unworkable _as_ a library. > > and _that_ is the reason o.l.p.c. didn't (couldn't) incorporate it. > > and that is the _only_ reason. > > a consistent library would be easy to re-engineer to "restrictions". > > you can't make a viewer-program for these books, because of their > _inconsistencies_, a point i've made here for well over 4 years now... > > _i_ can make such a viewer-program, because i know how to resolve > the inconsistencies, but i am not sharing those secrets because then > the point will not be crystal clear that inconsistencies hobble a library. > > instead, i'm going to use my ability to resolve the inconsistencies to > create a _consistent_ version of the p.g. library, which _will_ be able > to be scooped up in a single bite, and _many_ entities will then do it. > > i mean, for crying out loud, your system should have been designed > such that it could be dropped on _any_ file-storage system to create > a turn-key electronic-library, for one person or one hundred million. > > one click of one button should be all that it takes. > > boom! > > instant library, with every utility most people will ever need. > > but you haven't got _any_ of the pieces needed to make that happen. > not a single one. this total blind spot is a very bad black mark on you. > > heck, if p.g. would've had good infrastructure in-place and working, > negroponte might have been much more successful selling his x.o., > seeings as how he could've pitched each one as chock-full of books, > adding hundreds (thousands?) of dollars of value to each machine... > > "whaddya mean, you want ms-office? you've got _shakespeare_!" > > i want to put a full p.g. version on tens of thousands of hard-drives, > thereby creating a hyper-redundant and world-wide e-library mesh, > and see where _that_ kind of development lets us trampoline up to... > > but you're all too busy churning out even more inconsistent e-texts > to do something like _that_... it's sad. i tell you, it's really, really > sad. > > you call yourself an electronic library... > but you can't see the forest for the trees... > > *** > > i brought this point to your attention, greg (and yours too, michael) > when you hit 10,000. and you did nothing to fix the basic problem. > now your library has hit 25,000, with 2.5 times the inconsistencies... > are you gonna do something now? or will i be repeating at 50,000? > > you need to learn. > > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From Morasch at aol.com Tue May 20 20:02:53 2008 From: Morasch at aol.com (Morasch at aol.com) Date: Tue, 20 May 2008 23:02:53 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: michael hart said: > They refused my suggestion to do a feasibility study. 
bowerbird to michael: i've _done_ your feasibility study. i can tell you exactly how "feasible" it is to try and make an e-library with your 25,000 e-texts as-is: _not_at_all_. > I wanted to run 10 books through their whole system, > find out what needed improvement, > then 100, 1,000, & finally 10,000. . . > each time making adjustments that might not have been > quite so obvious concerning some smaller numbers of books. listen please, michael. i've done just that exact identical process. i have done runs of all sizes, and adjusted and readjusted wildly... and each time the answer comes up the same on the magic 8-ball: yes, this is definitely doable; not even that hard; however first you will need to remove the inconsistencies from the library... period... why? 'cause you can't treat an inconsistent mass programmatically. and any time you deal with the care and nurturing of an e-library, you're _compelled_by_reality_ to do such dealing programmatically. let me be very clear about this: i have examined this problem from the standpoint of a programmer trying to add value to your e-texts. and there's an incontrovertible law here: garbage-in-garbage-out. the thing is, you've got _diamonds_ in amongst all of your garbage... but as long as it's an inconsistent mass, nobody can mine them out... and the kicker is that it would be relatively _easy_ for you to fix this! a conscious decision that, from now on, you're gonna be consistent wouldn't _cost_ time or energy -- it would actually _save_ you some. moreover, it would put you on the right path to deal with the backlog. and once i knew the spill had been plugged, i could start cleaning up. but as long as you pile on _more_ inconsistency, you will lose ground. anyway, it no longer matters if you ignore what i'm telling you, since i'm on the way to creating my own consistent version of your library, and i _will_ give people a one-button turn-key means of working it... it's just too bad the words "project gutenberg" will not appear within... but the time has passed for the niceties of standing back in deference. -bowerbird ************** Wondering what's for Dinner Tonight? Get new twists on family favorites at AOL Food. (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080520/3f056f7c/attachment-0001.htm From ralf at ark.in-berlin.de Wed May 21 00:06:34 2008 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Wed, 21 May 2008 09:06:34 +0200 Subject: [gutvol-d] Grammar Error vs Logic Error In-Reply-To: <483339E6.4090508@perathoner.de> References: <483339E6.4090508@perathoner.de> Message-ID: <20080521070634.GB24648@ark.in-berlin.de> Marcello, I'd rather you'd waste time on the PGTEI patch I sent you (twice!) than on a useless flame war. And if this makes me end up in your killfile, please tell, so I can stop doing work on effectively unmaintained software. P***ed, ralf From hart at pglaf.org Wed May 21 10:09:51 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 21 May 2008 10:09:51 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: I think the problem, in your eyes, is that you are thinking too much like a computer, not enough like a human being. You want everything to line up perfectly. . . . Sorry, it doesn't have to do that for human beings to work with the eBooks, or even computers. . .unless they are VERY demandingly programmed. 
THe OLPC people never paid the slightest attention. . . . Seriously. No wonder they didn't get exactly what they wanted. I would have been silly to expect they would have. I would be silly to expect ANYone to get exactly what they wanted without some feedback, dare I say it. . . cybernetic. . .processes. You have claimed for years, and remarkably consistently, that your programs required better eBooks to work well. But you never made the real effort to bridge the gaps between your dream eBooks and those existing in reality. Nearly every reader CAN read our eBooks, computer or human. Some choose not to do so intentionally. I'm not really worried about those. Next in line not to worry about are those who don't take the necessary steps to get from where things ARE to where they WANT them to be. You always said the PG books were close to what you wanted, but what you did NOT do was provide the pathway, leading by example, to get to where you wanted to go. The longest journey, of a billion eBooks, starts with one. Just one. . .then just two. . .then just three. . .four... On Tue, 20 May 2008, Morasch at aol.com wrote: > michael hart said: >> They refused my suggestion to do a feasibility study. > > bowerbird to michael: i've _done_ your feasibility study. > > i can tell you exactly how "feasible" it is to try and make > an e-library with your 25,000 e-texts as-is: _not_at_all_. > > >> I wanted to run 10 books through their whole system, >> find out what needed improvement, >> then 100, 1,000, & finally 10,000. . . >> each time making adjustments that might not have been >> quite so obvious concerning some smaller numbers of books. > > listen please, michael. i've done just that exact identical process. > > i have done runs of all sizes, and adjusted and readjusted wildly... > > and each time the answer comes up the same on the magic 8-ball: > yes, this is definitely doable; not even that hard; however first you > will need to remove the inconsistencies from the library... period... > > why? 'cause you can't treat an inconsistent mass programmatically. > and any time you deal with the care and nurturing of an e-library, > you're _compelled_by_reality_ to do such dealing programmatically. > > let me be very clear about this: i have examined this problem from > the standpoint of a programmer trying to add value to your e-texts. > and there's an incontrovertible law here: garbage-in-garbage-out. > > the thing is, you've got _diamonds_ in amongst all of your garbage... > but as long as it's an inconsistent mass, nobody can mine them out... > > and the kicker is that it would be relatively _easy_ for you to fix this! > a conscious decision that, from now on, you're gonna be consistent > wouldn't _cost_ time or energy -- it would actually _save_ you some. > moreover, it would put you on the right path to deal with the backlog. > and once i knew the spill had been plugged, i could start cleaning up. > but as long as you pile on _more_ inconsistency, you will lose ground. > > anyway, it no longer matters if you ignore what i'm telling you, since > i'm on the way to creating my own consistent version of your library, > and i _will_ give people a one-button turn-key means of working it... > > it's just too bad the words "project gutenberg" will not appear within... > but the time has passed for the niceties of standing back in deference. > > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. 
> > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From Bowerbird at aol.com Wed May 21 12:52:22 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 May 2008 15:52:22 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: i said: > let me be very clear about this:? i have examined this problem from > the standpoint of a programmer trying to add value to your e-texts. i don't think i elaborated enough to become "very clear", so i will do so... i want you to understand that the perspective that informs my thoughts is the one derived from _writing_code_ to create desirable capabilities... my opinions are not based on "ideological concerns", where i'm trying to get you to adhere to my religious dogma, so you should take it on faith... nor am i motivated by lust for control, where i want you to do it my way. no sir, my focus is mundane -- "how can i write code to make this work?" mundane. but she is also a very strict and unforgiving mistress, this muse of code. a misplaced comma or semicolon in source-code, and a program fails. won't even _compile_. might not even say (clearly) what you did wrong. but believe you me, until you _find_ the problem, and _fix_ it correctly, she's gonna continue to balk, you're gonna continue to be obstructed... and just because you finally do get it to _run_ doesn't mean it's gonna _do_what_you_want_it_to_do_. there might be big bugs in your logic... but once your code _does_ do what you want it to do, or _exceeds_ that, then you _know_ without a shadow of a doubt that your formulas _work_. you _know_ you've got it _right_, and you don't have to take it on faith. because you've got working code. working code is a beautiful proof. you don't need anything more... *** so... when i say "you need to do this", what you should be _hearing_ is, "if you don't do this, us programmers aren't gonna be able to help you, and if us programmers can't help you, your electronic library won't fly." and i'm not talking about _me_ as one of your programmer helpers, i'm talking about the _dozens_and_dozens_ of programmers who will pop up into your existence just as soon as you make it so they _can_ negotiate the library, with code, simply. and it doesn't have to be with something as complex as an "a.p.i." for them to be able to help you... just make things _simple_ and _consistent_, so all those programmers can write simple programs with simple routines that will work correctly. make it easy for 'em to write working code. working code is beautiful proof. you don't need anything more. -bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080521/58b5102d/attachment.htm From lee at novomail.net Wed May 21 13:47:02 2008 From: lee at novomail.net (Lee Passey) Date: Wed, 21 May 2008 14:47:02 -0600 Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: <48348A46.3000609@novomail.net> Bowerbird at aol.com wrote: > i brought this point to your attention, greg (and yours too, michael) > when you hit 10,000. and you did nothing to fix the basic problem. > now your library has hit 25,000, with 2.5 times the inconsistencies... > are you gonna do something now? or will i be repeating at 50,000? > > you need to learn. 
Look, over the course of several years two things have become blatantly obvious: 1. By just about every measure Project Gutenberg is severely broken. 2. Michael Hart (who is the only one whose opinion counts at PG) is adamantly opposed to any of the changes which would be required to fix it. Give up. Move on. If you're repeating this message at 50,000, it will be because you're the one who won't learn -- that PG is on a dead-end course, and will not alter its direction. Those of us who want a reliable archive of consistently formatted documents with accurate cataloging information (this is the "metadata" which you have so reviled in the past) will simply have to start over. I see no other alternative. From hart at pglaf.org Thu May 22 10:13:29 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 22 May 2008 10:13:29 -0700 (PDT) Subject: [gutvol-d] cleaning up the catalog In-Reply-To: References: Message-ID: When there are two different paper editions that differ as much as the ones listed below, I certainly don't mind if someone "merges" them, but I don't want to kill off the original editions, either. someone made some editing shoices there that were too obvious. mh On Mon, 19 May 2008, Bowerbird at aol.com wrote: > a couple of nice natural experiments have introduced themselves, > in checking on some possible duplicates in the library... > > first, the two "young captives" e-texts are entirely different. ok. > > second, the two "pearl box" e-texts are highly similar, but not identical... > this is a book "containing one hundred beautiful stories for young people", > each version contains a few completely different stories from the other one, > but there's no list (in either of the versions) of the differences between > them. > since the versions contain 90-95 identical stories, this seems like a book > that > would benefit greatly by having the two different versions _merged_ into one. > and of course comparison of the two versions could identify the errors in > each, > so that's the first of our two "natural experiments". > > third, the two "scranton high chums on the cinder path" look to be identical, > although there is some possibility they could be of slightly different > editions. > either way, a comparison of the two gives us our second "natural experiment". > > i'll let you know in the next few days how these experiments turn out... > > -bowerbird > > > > ************** > Wondering what's for Dinner Tonight? Get new twists on family > favorites at AOL Food. > > (http://food.aol.com/dinner-tonight?NCID=aolfod00030000000001) > From hart at pglaf.org Thu May 22 11:45:26 2008 From: hart at pglaf.org (Michael Hart) Date: Thu, 22 May 2008 11:45:26 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: <48348A46.3000609@novomail.net> References: <48348A46.3000609@novomail.net> Message-ID: On Wed, 21 May 2008, Lee Passey wrote: > Bowerbird at aol.com wrote: > >> i brought this point to your attention, greg (and yours too, >> michael) when you hit 10,000. and you did nothing to fix the >> basic problem. now your library has hit 25,000, with 2.5 times >> the inconsistencies... are you gonna do something now? or will i >> be repeating at 50,000? >> >> you need to learn. > > Look, over the course of several years two things have become > blatantly obvious: > > 1. By just about every measure Project Gutenberg is severely > broken. > > 2. 
Michael Hart (who is the only one whose opinion counts at PG) > is adamantly opposed to any of the changes which would be required > to fix it. I have supported all the people who wanted to make changes. They just haven't used that support to make the changes. > Give up. That's exactly what they have done. They don't need your encouragement to do more of that. > Move on. That's exactly what they have NOT done. > If you're repeating this message at 50,000, it will be because > you're the one who won't learn -- that PG is on a dead-end course, > and will not alter its direction. Anyone can "alter its direction". . .but without action. . .no. If you only put as much into action as into your words. Just get out there and DO 10 books the way YOU want as an example? Then 100. Then 1,000. Then 10,000. Then 100,000. However, if you are not willing to even start with 10, how can people find you to be their leader, if you go nowhere? > Those of us who want a reliable archive of consistently formatted > documents with accurate cataloging information (this is the > "metadata" which you have so reviled in the past) will simply have > to start over. I see no other alternative. Every level of progress was made by "starting over." So. . ."Get Started!" The starting line is wherever YOU put it. The starting gun goes off whenever YOU start. You!!! Go!!! Go!! Go! From joyce.b.wilson at sbcglobal.net Thu May 22 13:31:37 2008 From: joyce.b.wilson at sbcglobal.net (Joyce Wilson) Date: Thu, 22 May 2008 15:31:37 -0500 Subject: [gutvol-d] cleaning up the catalog Message-ID: <4835D829.8060909@sbcglobal.net> > > boy, what a mess the p.g. catalog is! > > i cleaned the info for the english e-texts 10000-14000: > >/ http://z-m-l.com/misc/cata10-14-all.html > / > this is what i need, and might not be useful to p.g. (sorry), > but i'm happy to share it. > > here's a more-concentrated list, showing many of the > multiple-item e-texts, which were particularly messy: > >/ http://z-m-l.com/misc/cata10-14-repeats.html > / > this exercise suggests that the post-processors/whitewashers > might want to see how items in a series were posted in the past > when preparing additional items from the series for submission, > with the intent of minimizing the inconsistencies... > > -bowerbird > > p.s. if anyone has any questions on what i've done, or why, > or anything related to this, i will be happy to address them... > It looks to me like your data comes from the etext file headers, and not from the bibliographic records in the catalog. The list includes several titles by Edith Van Dyne. In the catalog, Edith Van Dyne exists only as a pseudonym for L. Frank Baum. Baum's name is the one attached to the works in the bibliographic records. Not to claim that there are no inconsistencies in the catalog, but from a cataloger's point of view it's not reasonable to tally every file header inconsistency as a catalog problem. The catalog is big. The cataloging team is small. It may be a long time before the problems that bug you get noticed. To speed things up, send your pet catalog peeves to: catalog at pglaf.org Joyce W From schultzk at uni-trier.de Fri May 23 02:48:51 2008 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Fri, 23 May 2008 11:48:51 +0200 Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: <48348A46.3000609@novomail.net> Message-ID: Hi Micheal, hi PG, hi DP, Yes, you allow any willing to do things, yet you refuse to instate stricter standards so that consolidation is possible. 
Yes, humans can handle the inconsitencies, but E-texts, E-books, and PG is handle by computers. Therefore, more structure is required. In the modern computing world it is possible to set standards which lend to humans and computers alike. I do admit that most do not master these arts or simply do understand how to use and apply them to a particular tasks. On the otherside the implementation of such standards and upholding them consume over 80% of the resources in the beginning over 50% during transition, yet less than 10% once everybody is on board. Furthermore, once a change becomes necessary the effort is minimal. Any programmer or anyone working with a large heterogeneous project knows how important well designed standards and their adherence is. As Lee states PG [and DP] are broken in this sense. That is why I do not actively participate. There are no sufficient rules to follow. Bowerbird does want he can, yet there is no simple solution, because he get the job done only to about 95%. It is the other 5% which is the most important one. Without it success is very far away. Regards Keith. Am 22.05.2008 um 20:45 schrieb Michael Hart: > > > On Wed, 21 May 2008, Lee Passey wrote: > >> Bowerbird at aol.com wrote: >> >>> i brought this point to your attention, greg (and yours too, >>> michael) when you hit 10,000. and you did nothing to fix the >>> basic problem. now your library has hit 25,000, with 2.5 times >>> the inconsistencies... are you gonna do something now? or will i >>> be repeating at 50,000? >>> >>> you need to learn. >> >> Look, over the course of several years two things have become >> blatantly obvious: >> >> 1. By just about every measure Project Gutenberg is severely >> broken. >> >> 2. Michael Hart (who is the only one whose opinion counts at PG) >> is adamantly opposed to any of the changes which would be required >> to fix it. > > I have supported all the people who wanted to make changes. > > They just haven't used that support to make the changes. > >> Give up. > > That's exactly what they have done. > > They don't need your encouragement to do more of that. > > > >> Move on. > > That's exactly what they have NOT done. > > > >> If you're repeating this message at 50,000, it will be because >> you're the one who won't learn -- that PG is on a dead-end course, >> and will not alter its direction. > > Anyone can "alter its direction". . .but without action. . .no. > > If you only put as much into action as into your words. > > Just get out there and DO 10 books the way YOU want as an example? > > Then 100. > > Then 1,000. > > Then 10,000. > > Then 100,000. > > However, if you are not willing to even start with 10, > how can people find you to be their leader, if you go nowhere? > > >> Those of us who want a reliable archive of consistently formatted >> documents with accurate cataloging information (this is the >> "metadata" which you have so reviled in the past) will simply have >> to start over. I see no other alternative. > > > Every level of progress was made by "starting over." > > So. . ."Get Started!" > > The starting line is wherever YOU put it. > > The starting gun goes off whenever YOU start. > > You!!! > > Go!!! > From hart at pglaf.org Fri May 23 11:16:24 2008 From: hart at pglaf.org (Michael Hart) Date: Fri, 23 May 2008 11:16:24 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: <48348A46.3000609@novomail.net> Message-ID: On Fri, 23 May 2008, Schultz Keith J. 
wrote: > Hi Micheal, hi PG, hi DP, > > > Yes, you allow any willing to do things, yet > you refuse to instate stricter standards so that > consolidation is possible. Not promoting someone to an official "czar of standards" position is hardly the same as opposing their efforts. Anyone who wants to "consolidate" should have to do the work of "consilidation" themselves and with volunteer help, but NOT by a fiat standard being imposed officially. Project Gutenberg should be an open standard to as much of a degree as possible. . .not completely. . .but close. However, there has never been ANY objection to the standards DP has imposed, or any other group imposes on themselves and we are only to glad to help them PROMOTE those standards for any and all to adopt. . .but FORCING those standards: NO!!! The people who say we RESIST THEIR STANDARDS are seriously-- perhaps even intentionlly--misrepresenting the situtation. > Yes, humans can handle the inconsitencies, but > E-texts, E-books, and PG is handle by computers. I guess it all depends on what systems you are lookin at. However, I repeat, Project Gutenberg is NOT a Xerox machine. This was never meant to be a completely automated process, from end to end, which leaves room for humen intervention, either to create those standards you say you want but will not actually do the work for, or by those who will resist, and create or maintain other standards. It seems as if you had your way, every paper book would've been require to be the same height for standard shelving-- a great idea for mass-production of library or bookstore's shelving, but somehow it just has never taken hold. Why not? Yes, once "everybody is on board" as you say below, things could be much better, but you aren't even trying to get an even early population on board to try things out. I believe in feasibility studies. Why? Because I have learned from experience. Feasibility studies give that experience a home to start. If you are unwilling to start, you are unwilling to finish. So many complaints by people their idea is not completed, when they have never even really started. Get started!!! "Build it. . .and they will come!" Don't build it. . .and they can't come, can they? So many people want everyone on board the same train, they they refuse to even lay the first file of track, build the first stations, or be the little engine.... that could. . . . Go for it!!! Michael > Therefore, more structure is required. In the modern > computing world it is possible to set standards > which lend to humans and computers alike. I do admit > that most do not master these arts or simply do > understand > how to use and apply them to a particular tasks. > > On the otherside the implementation of such standards and > upholding them consume over 80% of the resources in the > beginning > over 50% during transition, yet less than 10% once > everybody is > on board. Furthermore, once a change becomes necessary > the effort > is minimal. > > Any programmer or anyone working with a large > heterogeneous project > knows how important well designed standards and their > adherence is. > > As Lee states PG [and DP] are broken in this sense. That > is why I do > not actively participate. There are no sufficient rules > to follow. > Bowerbird does want he can, yet there is no simple > solution, because > he get the job done only to about 95%. It is the other 5% > which is the > most important one. Without it success is very far away. > > Regards > Keith. 
> > > Am 22.05.2008 um 20:45 schrieb Michael Hart: > >> >> >> On Wed, 21 May 2008, Lee Passey wrote: >> >>> Bowerbird at aol.com wrote: >>> >>>> i brought this point to your attention, greg (and yours too, >>>> michael) when you hit 10,000. and you did nothing to fix the >>>> basic problem. now your library has hit 25,000, with 2.5 >>>> times >>>> the inconsistencies... are you gonna do something now? or >>>> will i >>>> be repeating at 50,000? >>>> >>>> you need to learn. >>> >>> Look, over the course of several years two things have become >>> blatantly obvious: >>> >>> 1. By just about every measure Project Gutenberg is severely >>> broken. >>> >>> 2. Michael Hart (who is the only one whose opinion counts at >>> PG) >>> is adamantly opposed to any of the changes which would be >>> required >>> to fix it. >> >> I have supported all the people who wanted to make changes. >> >> They just haven't used that support to make the changes. >> >>> Give up. >> >> That's exactly what they have done. >> >> They don't need your encouragement to do more of that. >> >> >> >>> Move on. >> >> That's exactly what they have NOT done. >> >> >> >>> If you're repeating this message at 50,000, it will be because >>> you're the one who won't learn -- that PG is on a dead-end >>> course, >>> and will not alter its direction. >> >> Anyone can "alter its direction". . .but without action. . .no. >> >> If you only put as much into action as into your words. >> >> Just get out there and DO 10 books the way YOU want as an >> example? >> >> Then 100. >> >> Then 1,000. >> >> Then 10,000. >> >> Then 100,000. >> >> However, if you are not willing to even start with 10, >> how can people find you to be their leader, if you go nowhere? >> >> >>> Those of us who want a reliable archive of consistently >>> formatted >>> documents with accurate cataloging information (this is the >>> "metadata" which you have so reviled in the past) will simply >>> have >>> to start over. I see no other alternative. >> >> >> Every level of progress was made by "starting over." >> >> So. . ."Get Started!" >> >> The starting line is wherever YOU put it. >> >> The starting gun goes off whenever YOU start. >> >> You!!! >> >> Go!!! >> > From tb at baechler.net Sat May 24 02:00:13 2008 From: tb at baechler.net (Tony Baechler) Date: Sat, 24 May 2008 02:00:13 -0700 Subject: [gutvol-d] Why stay with PG? Message-ID: <20080524090013.GA17892@investigative.net> Hello all, I rarely post here because of the bickering and general disagreements, but I have to comment on many of the recent posts pointing out problems with PG and generally complaining on how things are done. I won't comment on whether any of you are right or not. As far as I'm concerned, I don't care who's right as long as there are more books. As long as they're in plain text, that's good enough for me. :-) I do have a question for all of you who frequently point out problems with PG and how Michael does things. Why do you stick with PG year after year? Why not just abandon PG and start your own project? Web hosting is incredibly cheap nowadays. Even dedicated servers aren't THAT expensive. Michael and Greg Newby have offered free server space many times to anyone who asks, but I'm assuming that you want to distance yourselves from PG. So, again I ask. Why bother? If you think PG has poor standards, why not create your own? OK, so you would rather use the already existing PG ebooks and reformat them. Fine, get the PG DVD or take Greg's offer of free server space and build on them. 
Various people have said that all ebooks must/should be in xml, tei, etc. OK, so do what blackmask.com used to do and reformat them under your own domain and your own standards. If you remove the PG name and small print, you don't even have to credit PG if I understand correctly. What prevents you from jumping ship and doing your own thing? My personal opinion is as I said above. I don't really care what format book X is in as long as I can convert it to plain text. If PG releases 25,000 ebooks and all of the people who have complaints against PG each release a few more books, that's a few more books more than the 25,000 already available. Competition isn't all bad. For people who don't like DP, create your own DP competitor. I would like to see a bunch of DP-like organizations all trying to produce the best quality. I can get the last laugh because it's all public domain anyway and the books will still eventually make their way to PG, albeit probably not by the person who created the DP competitors or their own versions of PG. Otherwise, if it's all talk with no intention to do anything, shut up and don't waste bandwidth for no good reason. Personally I think it's all talk and that is what it looks like. It must not be 100% talk though since ebookforge.net somehow got created and has produced a few files for PG. Please don't reply off list as all non-list email gets automatically deleted. From prosfilaes at gmail.com Sat May 24 03:56:53 2008 From: prosfilaes at gmail.com (David Starner) Date: Sat, 24 May 2008 06:56:53 -0400 Subject: [gutvol-d] Why stay with PG? In-Reply-To: <20080524090013.GA17892@investigative.net> References: <20080524090013.GA17892@investigative.net> Message-ID: <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> On Sat, May 24, 2008 at 5:00 AM, Tony Baechler wrote: > I do have a question for all of you who frequently point out problems > with PG and how Michael does things. Why do you stick with PG year > after year? Because PG is the source for ebooks; if I started posting somewhere else, I'd lose half my audience. And Michael is no longer the end-all and be-all of PG; I feel perfectly fine ignoring him and being part of PG. > I can get > the last laugh because it's all public domain anyway and the books will > still eventually make their way to PG, albeit probably not by the person > who created the DP competitors or their own versions of PG. So you get the last laugh because you're wasting effort? Do you know that "The Brothers Karamazov" is just now going through DP? Do you have any idea how much material is out that PG doesn't have, and probably won't have for decades, if ever? Look at and see just how much material has been out there for years that we've never touched. It's surely not going to speed up if you drive people away. From Bowerbird at aol.com Sat May 24 11:46:08 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 24 May 2008 14:46:08 EDT Subject: [gutvol-d] Why stay with PG? Message-ID: tony said: > Why stay with PG? well, because i love project gutenberg. and i love michael hart. and i love the work he's done as an important e-book pioneer... moreover, thanks to his insistence on the plain-text baseline, i eventually recognized the full power of a no-markup format, which not even michael grasps fully. so i'm willing to stick with project gutenberg to make it better... but it seems that i haven't been able to do that with simple logic. 
so now my course of action will be to mount my own mirror and show the superiority of no-markup format applied consistently. so, in one sense, yes, i'll then be "abandoning" project gutenberg. because there won't be one mention of it in my library. can't be... and it won't be easy for p.g. to back-assimilate my files, because i will have tossed out a lot of work that was done by its volunteers (like hand-crafted -- and thus unmaintainable -- .html versions). but i'll still be here, every day, having discussions in the lobby of the project gutenberg library, talking about how i can do _more_, with less work, because i took the simple step of _consistency_... because one of my biggest flaws is saying "i told you so"... and one of my biggest virtues is that i know that love doesn't walk out the door... -bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080524/8418a246/attachment.htm From Bowerbird at aol.com Sat May 24 13:33:49 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 24 May 2008 16:33:49 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: i was gonna wait until after the holiday to post any more, but since i just posted a message in reply to tony, i'll do this, and finish up this thought. michael, you seem to be on autoresponder, because you haven't seemed to notice that we're not communicating... you seem to think this issue is about _file-formats_. it's not. it's about _consistency_ -- to your established conventions... for a couple examples, look in your f.a.q., and you'll see that it calls for section headers to be presented in a certain way: > For a standard novel, you can choose either > four blank lines before the chapter heading and two lines after, > or three lines before and one line after, but whichever you use, > do try to keep it consistent throughout. notice right there that your own guidelines stress _consistency_... but, as if contradicting yourself, you tell people two ways to do it. now, just one hour ago, i got the posted digest listing 2 updates, to e-texts #172 and #173. both of those newly-redone e-books have three blank lines before and two blank lines after, which is different from either of the two options that you gave above... to rephrase it, your e-texts aren't consistent with your own f.a.q. and i don't see that this inconsistency buys you _anything_ at all. it is of absolutely no benefit to you. the only real effect that it has is to make it difficult (sometimes to the point of total impossibility) for programmers to deal with the library, and add some value to it. this has absolutely nothing to do with resisting dubious file-formats. i'm with you 100% that you should continue to exercise such resistance. but it has everything to do with adhering to the standards you've set... because if you don't adhere to them, what's the purpose of having 'em? let's continue on with that same section in the f.a.q.: > Normally, you should move chapter headings to the left > rather than try to imitate the centering that is used in some books. this is a good idea. and it used to be that the e-texts were consistent in following this rule. but lately, there are more and more cases where this is being disregarded, and various things are being centered. why? 
-bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080524/10bda1a7/attachment.htm From hart at pglaf.org Sat May 24 16:26:47 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 24 May 2008 16:26:47 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: On Sat, 24 May 2008, Bowerbird at aol.com wrote: > i was gonna wait until after the holiday to post any more, > but since i just posted a message in reply to tony, i'll do this, > and finish up this thought. > > michael, you seem to be on autoresponder, because you > haven't seemed to notice that we're not communicating... > > you seem to think this issue is about _file-formats_. it's not. > it's about _consistency_ -- to your established conventions... > > for a couple examples, look in your f.a.q., and you'll see that > it calls for section headers to be presented in a certain way: you seem to think I'm not paying attn, just because i don't act in accordance with your wishes. Consistency is fine for within the various Project Gutenberg groups, etc., and I'm sure YOU are are auto-respncer if you continue this pretense of not having been told that before, on many occasions. Wake up and start your own group to make at least 10 books your way. If you never do 10, then 100, etc., how can anyone agree with your exemplified production techniques? How can someone "get on board" if you are not going anywhere? You know this. Don't pretend. Go For It!!! > >> For a standard novel, you can choose either >> four blank lines before the chapter heading and two lines after, >> or three lines before and one line after, but whichever you use, >> do try to keep it consistent throughout. > > notice right there that your own guidelines stress _consistency_... But this "_consistency_" is not forced on anyone. > > but, as if contradicting yourself, you tell people two ways to do it. There have ALWAYS been more than one way, but no one is stopping YOU, or any of the others, ALL of which could obviously do eBook production better than is being done, for actually DOING it. . . Write your own standards! We'll put them right up there with the ones you don't like. You would/could/should have done this years ago if YOU were not on autoresponder. _I_ don't make rules! YOU don't like that! Make your OWN rules!!! Go For it!! Go! Go!! Go!!! Hooray your your side!!! Now DO something. . .PLEASE!!! Michael > > now, just one hour ago, i got the posted digest listing 2 updates, > to e-texts #172 and #173. both of those newly-redone e-books > have three blank lines before and two blank lines after, which is > different from either of the two options that you gave above... > > to rephrase it, your e-texts aren't consistent with your own f.a.q. > > and i don't see that this inconsistency buys you _anything_ at all. > it is of absolutely no benefit to you. the only real effect that it has > is to make it difficult (sometimes to the point of total impossibility) > for programmers to deal with the library, and add some value to it. > > this has absolutely nothing to do with resisting dubious file-formats. > i'm with you 100% that you should continue to exercise such resistance. > > but it has everything to do with adhering to the standards you've set... 
> because if you don't adhere to them, what's the purpose of having 'em? > > let's continue on with that same section in the f.a.q.: >> Normally, you should move chapter headings to the left >> rather than try to imitate the centering that is used in some books. > > this is a good idea. and it used to be that the e-texts were consistent > in following this rule. but lately, there are more and more cases where > this is being disregarded, and various things are being centered. why? > > -bowerbird > > > > ************** > Get trade secrets for amazing burgers. Watch "Cooking with > Tyler Florence" on AOL Food. > (http://food.aol.com/tyler-florence?video=4& > ?NCID=aolfod00030000000002) > From hart at pglaf.org Sat May 24 16:57:06 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 24 May 2008 16:57:06 -0700 (PDT) Subject: [gutvol-d] Why stay with PG? In-Reply-To: <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> Message-ID: On Sat, 24 May 2008, David Starner wrote: > On Sat, May 24, 2008 at 5:00 AM, Tony Baechler > wrote: >> I do have a question for all of you who frequently point out >> problems with PG and how Michael does things. Why do you stick >> with PG year after year? > > Because PG is the source for ebooks; if I started posting > somewhere else, I'd lose half my audience. And Michael is no > longer the end-all and be-all of PG; I feel perfectly fine > ignoring him and being part of PG. That's the whole point. No one should be the end-all and be-all of Project Gutenberg. Now that that has been clarified, let's get everyone out from behind their supposed eight-ball and have them get going. >> I can get the last laugh because it's all public domain anyway >> and the books will still eventually make their way to PG, albeit >> probably not by the person who created the DP competitors or >> their own versions of PG. > > So you get the last laugh because you're wasting effort? Do you > know that "The Brothers Karamazov" is just now going through DP? > Do you have any idea how much material is out that PG doesn't > have, and probably won't have for decades, if ever? Look at > and see just how much material has > been out there for years that we've never touched. It's surely not > going to speed up if you drive people away. > list gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From dakretz at gmail.com Sat May 24 21:07:04 2008 From: dakretz at gmail.com (don kretz) Date: Sat, 24 May 2008 21:07:04 -0700 Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 27 In-Reply-To: References: Message-ID: <627d59b80805242107v672dac32jc8e280e0aa9b64be@mail.gmail.com> > > > > ---------- Forwarded message ---------- > From: Tony Baechler > To: gutvol-d at lists.pglaf.org > Date: Sat, 24 May 2008 02:00:13 -0700 > Subject: [gutvol-d] Why stay with PG? > > | > | ... or take Greg's offer of free server space ... > | What is the nature of this offer? I'm not familiar with it. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080524/77f70492/attachment.htm From sly at victoria.tc.ca Sat May 24 22:36:59 2008 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat, 24 May 2008 22:36:59 -0700 (PDT) Subject: [gutvol-d] Why stay with PG? 
In-Reply-To: <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
Message-ID:

On Sat, 24 May 2008, David Starner wrote:

> Do you know that "The Brothers Karamazov" is just now going through
> DP? Do you have any idea how much material is out there that PG
> doesn't have, and probably won't have for decades, if ever?

In looking up details for items in the PG catalog, I sometimes do
a general search on an author name to try to check dates from another
source. Surprisingly often I find other texts online by the same
author on other sites that are not on PG--sometimes a whole collection
of them. I believe the amount of such material which is around is
larger than we generally realize. Even the extensive listings of the
Online Books Page are just the tip of the iceberg.

Andrew

From gbnewby at pglaf.org Sat May 24 23:21:59 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat, 24 May 2008 23:21:59 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 27
In-Reply-To: <627d59b80805242107v672dac32jc8e280e0aa9b64be@mail.gmail.com>
References: <627d59b80805242107v672dac32jc8e280e0aa9b64be@mail.gmail.com>
Message-ID: <20080525062159.GA2592@mail.pglaf.org>

On Sat, May 24, 2008 at 09:07:04PM -0700, don kretz wrote:
>> | ... or take Greg's offer of free server space ...
>
> What is the nature of this offer? I'm not familiar with it.

I have several rather large systems that I'm perpetually happy to
provide access to for various projects. These include:

  snowy.arsc.alaska.edu
  readingroo.ms

and for mailing lists, lists.pglaf.org

snowy & readingroo have complete copies of the PG collection.

I am working on a plan, which I'll share here in my next message,
to provide user updates to copies of the PG collection, as well as
features like personal bookshelves and commentary on eBooks. I know
lots of people have their own ideas about such things, and encourage
them to share their plans and/or get involved with common efforts.

As was recently pointed out on the gutvol-d list, eBookforge is one
project that resulted from such a "spin-off" effort. There are
others, ranging in size from the Project Gutenberg Consortia
Collection [gutenberg.cc] to Bowerbird's own collection of
ZML-enabled eBooks [which partially lives on snowy].
  -- Greg

From tb at baechler.net Sun May 25 02:11:11 2008
From: tb at baechler.net (Tony Baechler)
Date: Sun, 25 May 2008 02:11:11 -0700
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
Message-ID: <48392D2F.6070509@baechler.net>

David Starner wrote:
> On Sat, May 24, 2008 at 5:00 AM, Tony Baechler wrote:
>> I do have a question for all of you who frequently point out problems
>> with PG and how Michael does things. Why do you stick with PG year
>> after year?
>
> Because PG is the source for ebooks; if I started posting somewhere
> else, I'd lose half my audience.
> And Michael is no longer the end-all and be-all of PG; I feel
> perfectly fine ignoring him and being part of PG.

Yes, but what about the Internet Archive? You could submit texts to
them just as easily. With a little funding, you could advertise on
Google and other places to bring in an audience. There are many sites
which use PG files, often without credit. Search for any popular
public domain book and you'll find a bunch.

>> I can get the last laugh because it's all public domain anyway and
>> the books will still eventually make their way to PG, albeit
>> probably not by the person who created the DP competitors or their
>> own versions of PG.
>
> So you get the last laugh because you're wasting effort?

Yes, I think I'm aware of how much PG doesn't have. Look at Google
books, the Library of Congress, IA, etc. That doesn't count libraries
in other countries with non-English texts. The thing is that I can
wait. If I just couldn't wait, either I would buy a used reprint and
scan it myself or would set up any of several free OCR packages and
process the already available page images. You somewhat misunderstand
me though. I'm not saying that people would be driven away. I'm
saying that if we have a bunch of DP spinoffs and DP-like competition
going on, essentially twice or three times more books can be produced.
Even if those DP spinoffs don't post to PG, at least high quality
ebooks would be available for PG harvesting. If they all did post to
PG, instead of however many books DP currently produces per day,
multiply that by two, three, ten, etc., depending on how many
organizations there are and how well they all produce. The resources
are out there for those who want to tap them.

Regarding losing your audience, all I can say is that anyone can put
up web sites and anyone can find them with good search engines.
Between the PG newsletter (when one is actually posted), the IA
forums, Google itself, and the many other book sites and newsgroups
out there, I really don't see how you could say that you would be
driving your audience away. As with all new projects it would start
small, as one could expect, but it could grow over time as PG
Australia has. Personally, I think more of the free software people
should be targeted. I see a parallel between making and producing
free, GPL software and making and producing public domain ebooks.
Oh, there's Creative Commons also. While CC is mostly interested in
their own licenses, they also could push for more public domain
ebooks.

I don't think and never meant to imply that people's efforts should
be ignored, duplicated, etc. I don't believe in wasting effort any
more than most people. I don't want to drive people away either.
I'm just saying that with a small amount of effort to start separate
projects, more could be produced, not less. One could still post
anything they produce to PG but have multiple DPs doing the actual
work. DP already has a harvesting page dealing with IA and some
others. Split those efforts into new organizations with more
proofers and, if the founder of such splits feels strongly enough,
higher or different standards. This is similar to what DP and DP
Europe have done.
They both produce, they both post to PG, and some people proof for
both even though they are separate and deal with different areas. I
would like to see a DP Australia, or even DPs not specific to any
country.

Likewise, there could be PG spinoffs with their own standards. Let's
make up a PG spinoff which will only post html and pdf and won't
accept plain text. Well, since DP would still be posting to PG, they
would of course still be producing plain text. However, they could
post their special pdf and html to the PG spinoff. This way we have
multiple DP organizations, we still have a central PG which accepts
everything, and there would be one or many PG spinoffs which would
take only page images, only html, only pdf, only some other format,
some combination of the above, etc. If the sites are developed
correctly with appropriate keywords, I as a scholar could find some
obscure text with complete page images and a nicely formatted
document, while an average reader could find the basic plain text.
As I said previously, I would still get the last laugh because all of
the DPs and PGs would eventually be posted or harvested to the
original PG one of these days, and I can wait until that happens.

Please don't send email to me off-list as all non-list email gets
automatically deleted.

From prosfilaes at gmail.com Sun May 25 08:04:43 2008
From: prosfilaes at gmail.com (David Starner)
Date: Sun, 25 May 2008 11:04:43 -0400
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <48392D2F.6070509@baechler.net>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net>
Message-ID: <6d99d1fd0805250804g2cd42728sf8446b7d66bac366@mail.gmail.com>

You said "shut up and don't waste bandwidth for no good reason."
Try it out; remember that this is running over megabit and gigabit
fiber, but people are on much lower bandwidth. Of course,

> Please don't send email to me off-list as all non-list email gets
> automatically deleted.

also wastes bandwidth for no good reason.

I don't know how you can honestly claim that you don't want to drive
away people, if you want to get the "last laugh". You bring up DP,
but that's a red herring; the question was "Why do you stick with PG
year after year?"

From paulmaas at airpost.net Sun May 25 11:42:01 2008
From: paulmaas at airpost.net (Paul Maas)
Date: Sun, 25 May 2008 11:42:01 -0700
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <48392D2F.6070509@baechler.net>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net>
Message-ID: <1211740921.8444.1254998623@webmail.messagingengine.com>

Maybe PG needs to recast itself as a text archive. A sort of text
"commons", if you will. PG could allow multiple submissions from
different projects for a particular book title, and let the reader
pick the one they'd prefer to read. Even if PG now views itself
this way, the perception is still "PG has one core text per book."

So what is PG? What should PG be?
--
Paul Maas
paulmaas at airpost.net

From Morasch at aol.com Sun May 25 12:19:26 2008
From: Morasch at aol.com (Morasch at aol.com)
Date: Sun, 25 May 2008 15:19:26 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael said:
> you seem to think I'm not paying attention, just
> because i don't act in accordance with your wishes.

no, i don't think you're paying attention because
you haven't addressed the logical argument that
there's no purpose served by the inconsistencies
while there _are_ great benefits of consistency...

-bowerbird

From hart at pglaf.org Sun May 25 13:37:05 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 25 May 2008 13:37:05 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To: References: Message-ID:

On Sun, 25 May 2008, Morasch at aol.com wrote:

> michael said:
>> you seem to think I'm not paying attention, just
>> because i don't act in accordance with your wishes.
>
> no, i don't think you're paying attention because
> you haven't addressed the logical argument that
> there's no purpose served by the inconsistencies
> while there _are_ great benefits of consistency...
>
> -bowerbird

And you refuse, after being told so many times, to just give your
own consistency a trial run from which to garner support.

After all, if YOU are unwilling to lead, how can anyone else be
expected to follow?
You SAY you respect me because I got out there, did the grunt work
to provide an example. . . .

Now how about a little SELF-RESPECT and getting out there and setting
your own examples?

Rather than just contradicting yourself, SAYING you respect me and my
work, but NOT DOING the work.

Actions speak more loudly than words.

You have put SO many words out here that I will have to admit I can
understand those who say it is too much and thus won't read them.

JUST DO IT!!!

GO!!! GO!!! GO!!!

WIN!!! WIN!!! WIN!!!

STOP TALKING!!! DO!!!

From hart at pglaf.org Sun May 25 13:39:12 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 25 May 2008 13:39:12 -0700 (PDT)
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <1211740921.8444.1254998623@webmail.messagingengine.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com>
Message-ID:

On Sun, 25 May 2008, Paul Maas wrote:

> Maybe PG needs to recast itself as a text archive. A sort of text
> "commons", if you will. PG could allow multiple submissions from
> different projects for a particular book title, and let the reader
> pick the one they'd prefer to read. Even if PG now views itself
> this way, the perception is still "PG has one core text per book."
>
> So what is PG? What should PG be?

We have always allowed "multiple submissions" for each book.

Right back to the very beginning with Roget and Paradise Lost.

No one ever seemed to have an objection.

mh
From hart at pglaf.org Sun May 25 14:18:18 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 25 May 2008 14:18:18 -0700 (PDT)
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com>
Message-ID:

On Sat, 24 May 2008, Andrew Sly wrote:

> In looking up details for items in the PG catalog, I sometimes do
> a general search on an author name to try to check dates from
> another source. Surprisingly often I find other texts online by the
> same author on other sites that are not on PG--sometimes a whole
> collection of them. I believe the amount of such material which is
> around is larger than we generally realize. Even the extensive
> listings of the Online Books Page are just the tip of the iceberg.
>
> Andrew

I agree.

I would estimate there are millions of eBooks on the Net already,
not even counting Google, Carnegie Mellon, etc., just from plain
folks who want to share their favorite books.

I think we will hit 10 million public domain eBooks in 5 years--
whether or not Google, Carnegie Mellon, etc., really get project
progress into gear in a manner that makes it easy to share them.

And don't forget commercial eBooks. I wouldn't doubt there are
already a million of those.

mh

From paulmaas at airpost.net Sun May 25 15:07:09 2008
From: paulmaas at airpost.net (Paul Maas)
Date: Sun, 25 May 2008 15:07:09 -0700
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com>
Message-ID: <1211753229.9309.1255015053@webmail.messagingengine.com>

That's good to hear from your lips, Michael. So you agree then that
the PG Archive is simply a "text" commons where individuals and
organizations can place their transcribed texts to share with others?

If so, then you are right. There are no issues. PG should issue NO
requirements other than that the submitted text be transcribed from a
public domain printing, and maybe that a plain text version be
submitted along with whatever other formats the submitter wishes to
donate.

"Give me your tired, your poor,
Your digital texts yearning to breathe free,........."

But then, as I think of it, why not just move over and merge with the
Internet Archive? This will solve a lot of problems. Why should PG
continue as an independent entity apart from TIA? What purpose does
PG play any more?
Hasn't it already fulfilled its mission?

On Sun, 25 May 2008 13:39:12 -0700 (PDT), "Michael Hart" said:
> On Sun, 25 May 2008, Paul Maas wrote:
>> So what is PG? What should PG be?
>
> We have always allowed "multiple submissions" for each book.
>
> Right back to the very beginning with Roget and Paradise Lost.
>
> No one ever seemed to have an objection.
>
> mh
--
Paul Maas
paulmaas at airpost.net

From sly at victoria.tc.ca Sun May 25 15:08:21 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun, 25 May 2008 15:08:21 -0700 (PDT)
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <1211740921.8444.1254998623@webmail.messagingengine.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com>
Message-ID:

On Sun, 25 May 2008, Paul Maas wrote:

> Maybe PG needs to recast itself as a text archive. A sort of text
> "commons", if you will. PG could allow multiple submissions from
> different projects for a particular book title, and let the reader
> pick the one they'd prefer to read. Even if PG now views itself
> this way, the perception is still "PG has one core text per book."

I'm curious--where do you think that perception comes from?

I would suspect it is more a matter of the general public's
conception that a "book" just has one single state of being.

Back as far as the early 90s PG had two different texts of Paradise
Lost. And not long after that we of course had different versions of
Shakespeare plays and a few other things.

Andrew

From gbnewby at pglaf.org Sun May 25 15:14:42 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 25 May 2008 15:14:42 -0700
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <1211740921.8444.1254998623@webmail.messagingengine.com>
References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com>
Message-ID: <20080525221441.GA14738@mail.pglaf.org>

On Sun, May 25, 2008 at 11:42:01AM -0700, Paul Maas wrote:
> Maybe PG needs to recast itself as a text archive. A sort of text
> "commons", if you will.

That's the sort of thing I'm proposing, as a spin-off. Sorry I didn't
get my ideas typed up last night, but I'll try to get them done today
because I really want people's ideas & energies. [And I have no
intention of trying to detract from, take over or dis other efforts.]

I'm mostly looking to enable people who want to insert a variation of
some sort, whether it's formatting, additional content [like images]
or different ideas about the types of correction or standardization
to be applied [such as whether to fix errors in the source text].

But as far as "multiple submissions from different projects for a
particular book title," that happens all the time. What we strive
for, WHEN the sources & eBooks produced are very similar, is to
create a single core text that has all the best features. If the
only reason not to do that is the extra work involved, well, we might
hold off on the later eBook until someone has time to do the merger.
But this event doesn't happen that often.
If they are really different, even though the same title, then we
produce different eBooks. This, also, doesn't happen that frequently.

The thing is that most volunteers prefer to work on eBooks that are
not already in the collection. So, such collisions of intent are
relatively rare.

> So what is PG? What should PG be?

You can start with the opinion pieces Michael [and I, to some extent]
wrote. They're in the About area of gutenberg.org and are called
FAQs, but they're not in the FAQ area.
  -- Greg
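For what such a merger might start from, here is a sketch using
Python's standard difflib to show only the places where two
submissions of the same title disagree; the filenames are
hypothetical:

    # Sketch: surface the disagreements between two transcriptions of
    # the same title, so a volunteer can choose the best readings for
    # a merged core text. Filenames are hypothetical.
    import difflib

    with open("paradise-lost-a.txt", encoding="utf-8") as f:
        version_a = f.readlines()
    with open("paradise-lost-b.txt", encoding="utf-8") as f:
        version_b = f.readlines()

    # print a unified diff with one line of context around each change
    for line in difflib.unified_diff(version_a, version_b,
                                     fromfile="submission-a",
                                     tofile="submission-b", n=1):
        print(line, end="")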
From gbnewby at pglaf.org Mon May 26 00:56:33 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 26 May 2008 00:56:33 -0700
Subject: [gutvol-d] OpenGutenberg input sought
Message-ID: <20080526075633.GA22752@mail.pglaf.org>

As promised, here are some ideas for a semi-centralized service that
might meet some of the needs I've heard expressed recently.
I have some interns starting work with me over the upcoming weeks,
and had already given thought to such a service...so, the recent
debate on gutvol-d helped to crystallize some of my thinking.

In case it's not completely obvious, this is yet another "let 1,000
flowers bloom" approach. It's not the solution for those who want
single standards enforced. Rather, it's a somewhat more open location
for new content, variations on content, and meta-content. While all
of these things can, and do, find their way to gutenberg.org, I'm
looking for something that is somewhat easier for individual
contributors, with less of a hierarchy. And commensurately less
quality control, though mitigated by some community-based systems and
various automation.

The outline below is kind of general...in practical terms, I'm
looking to start by putting the PG content into Trac with Subversion
-- which adds Wiki & bug tracking & change tracking functionality.

People with ideas are welcome to contribute them [if there are enough
people or enough traffic, we can redirect to gutvol-p or start a new
mailing list].

The outline:

OpenGutenberg
Community Contributions for the Improvement of Public Domain EBooks
Proposal outline May 25 2008 by gbn

Purpose: Creation of user-friendly and open opportunities for adding
value to Project Gutenberg content through enhanced versions, new
formats, and community contributions such as book reviews and author
biographies. All content will be open and editable. Design choices
will make it as easy as possible to propagate enhancements back to
the "main" Project Gutenberg collection.

Features/desires. Items marked with an asterisk [*] will be
two-way coupled with the main gutenberg.org collection.

- Implementation of the Project Gutenberg catalog *
- Fully tracked change management for eBooks and other content *
  Including the ability to upload new formats, new files, new metadata
- EBook tracking & management through Subversion or similar system
- Per-book bug reporting and feature requests *
- Search features
- RSS or similar notification & subscription features
- Per book:
  WIKI [editable]
  Forks, for things like irreconcilable or orphaned versions,
  biography disputes
- Community functions:
  Reputation-based functions
  Automatic book recommendation
  Bookmarks, shareable *
  Book reviews *
- For later implementation:
  GutenbergSelf: Self-publishing of items

General policies:
- As much as possible based on open standards & open software
- As much as possible transparent, especially via text files stored
  alongside their eBooks [for example, a browsable Subversion
  repository, with metadata in XML files]
- Fully mirrorable in full or in part
- Authorization/authentication required to contribute, edit, etc., but
  not necessarily strongly verified [similar to Wikipedia]
- All content must be submitted under an open license or granted to
  the public domain
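A sketch of how the Subversion piece of this outline could work for a
single eBook, with metadata in an XML file stored alongside the text
as the policies above suggest. The repository URL, layout, and helper
function are all assumptions for the example, and a real import would
also need authentication:

    # Hypothetical sketch: import one eBook plus XML metadata into a
    # Subversion repository that Trac then exposes for browsing and
    # change tracking. Repository URL and layout are assumptions.
    import pathlib
    import subprocess
    import xml.etree.ElementTree as ET

    REPO = "http://readingroo.ms/svn/opengutenberg"  # assumed location

    def import_ebook(number, text_path, title, author):
        workdir = pathlib.Path(f"ebook-{number}")
        workdir.mkdir(exist_ok=True)
        # the eBook text itself
        (workdir / f"{number}.txt").write_bytes(
            pathlib.Path(text_path).read_bytes())
        # transparent metadata stored alongside the text, per the policies
        meta = ET.Element("ebook", number=str(number))
        ET.SubElement(meta, "title").text = title
        ET.SubElement(meta, "author").text = author
        ET.ElementTree(meta).write(str(workdir / "metadata.xml"),
                                   encoding="utf-8", xml_declaration=True)
        # one tracked revision per contribution; later corrections would
        # be ordinary "svn commit"s, each visible as a Trac changeset
        subprocess.run(["svn", "import", str(workdir),
                        f"{REPO}/{number}", "-m",
                        f"initial import of eBook #{number}"],
                       check=True)

    import_ebook(12345, "12345.txt", "An Example Title", "An Example Author")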
From schultzk at uni-trier.de Mon May 26 00:51:47 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 26 May 2008 09:51:47 +0200
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To: References: <48348A46.3000609@novomail.net>
Message-ID:

Am 23.05.2008 um 20:16 schrieb Michael Hart:

> On Fri, 23 May 2008, Schultz Keith J. wrote:
>
>> Hi Michael, hi PG, hi DP,
>>
>> Yes, you allow anyone willing to do things, yet
>> you refuse to instate stricter standards so that
>> consolidation is possible.
>
> Not promoting someone to an official "czar of standards"
> position is hardly the same as opposing their efforts.
>
> Anyone who wants to "consolidate" should have to do the
> work of "consolidation" themselves and with volunteer help,
> but NOT by a fiat standard being imposed officially.
>
> Project Gutenberg should be an open standard to as much of
> a degree as possible. . .not completely. . .but close.

I agree fully, yet PG has NO standard in the true sense.
All it has is a rough convention.

> However, there has never been ANY objection to the standards
> DP has imposed, or any other group imposes on themselves, and
> we are only too glad to help them PROMOTE those standards for
> any and all to adopt. . .but FORCING those standards: NO!!!

Just as mentioned above, there are no STANDARDS.

> The people who say we RESIST THEIR STANDARDS are seriously--
> perhaps even intentionally--misrepresenting the situation.

I have NOT proposed a particular standard as such, but have
repeatedly attempted to gain a consensus for the need, and
iterated ideas for a standard which would solve some problems.

>> Yes, humans can handle the inconsistencies, but
>> E-texts, E-books, and PG are handled by computers.
>
> I guess it all depends on what systems you are looking at.
>
> However, I repeat, Project Gutenberg is NOT a Xerox machine.

Who wants that? DP is trying! Besides, then all I would need
would be scans and PDF (or whatever one sees fit to distribute
and view the scans)!

> This was never meant to be a completely automated process,
> from end to end, which leaves room for human intervention,
> either to create those standards you say you want but will
> not actually do the work for, or by those who will resist,
> and create or maintain other standards.

The process will never be completely automated. Yet, most of
the mundane work can be done by computers. The standards make
it easier to program.

> It seems as if, had you your way, every paper book would've
> been required to be the same height for standard shelving--
> a great idea for mass-production of library or bookstore
> shelving, but somehow it just has never taken hold.

Come on, Michael. You know better. XML can hold about anything
in every shape and size. Yet! It does require strict formats.
Just a format for the format!!

> Why not?
>
> Yes, once "everybody is on board" as you say below, things
> could be much better, but you aren't even trying to get an
> early population on board to try things out.

I have tried. Yet, I have not found a big enough following.
I do not have the time for a project of the size I am thinking
about. Though I have found several that agree with me in general.

> I believe in feasibility studies.

Was it feasible to go to the Moon, Mars, Saturn? Was it feasible
for Columbus to sail the Atlantic? Was it feasible to start Yahoo,
propagate HTML, UNIX? At first was the IDEA, and it caught on.

> Why?
>
> Because I have learned from experience.
>
> Feasibility studies give that experience a home to start.
>
> If you are unwilling to start,
> you are unwilling to finish.
>
> So many complaints by people that their idea is not completed,
> when they have never even really started.
>
> Get started!!!
>
> "Build it. . .and they will come!"

If I build it alone, PG will not get it. It will be a better
mouse trap.

> Don't build it. . .and they can't come, can they?
>
> So many people want everyone on board the same train,
> yet they refuse to even lay the first mile of track,
> build the first stations, or be the little engine....
>
> that could. . . .
That is the problem PG has: it waits for a single track that goes in
one direction. I believe in being able to move in all directions.
PG can fly. Yet, in order for it to fly you need different and more
efficient standards.

Maybe I can put it a different way. I am an advocate of engineering.
Houses and machines were once built by the average Joe/Jane. These
were grossly oversized and, more often than not, not the most
efficient, simply because they were based on trial-and-error
knowledge. Today, we can build houses and machines that are not only
better and more efficient, but are also built in a more efficient
way. Why? Because there are standards which help.

There are many good ideas out there, but without more exact standards
and somebody to bring them together and to compromise, it just will
not get built, because it IS NOT FEASIBLE for the individuals. As
others have posted here, there is far too much waste of valuable
resources in PG and DP.

regards
Keith.

From schultzk at uni-trier.de Mon May 26 01:02:42 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 26 May 2008 10:02:42 +0200
Subject: [gutvol-d] Why stay with PG?
In-Reply-To: <20080524090013.GA17892@investigative.net>
References: <20080524090013.GA17892@investigative.net>
Message-ID: <59FA1B38-9762-47FC-A766-6F0D4F1293A3@uni-trier.de>

Hi Tony,

Am 24.05.2008 um 11:00 schrieb Tony Baechler:

> Hello all,
>
> I rarely post here because of the bickering and general
> disagreements, but I have to comment on many of the recent posts
> pointing out problems with PG and generally complaining on how
> things are done. I won't comment on whether any of you are right
> or not.

Too bad. All opinions are important.

> As far as I'm concerned, I don't care who's right as long as there
> are more books. As long as they're in plain text, that's good
> enough for me. :-)
>
> I do have a question for all of you who frequently point out
> problems with PG and how Michael does things.

As you so well pointed out, PG is a good source for texts.

> Why do you stick with PG year after year? Why not just abandon PG
> and start your own project? Web hosting is incredibly cheap
> nowadays. Even dedicated servers aren't THAT expensive. Michael
> and Greg Newby have offered free server space many times to anyone
> who asks, but I'm assuming that you want to distance yourselves
> from PG.

We do not want to create our own PG. We try to make it better.

> So, again I ask. Why bother? If you think PG has poor standards,
> why not create your own? OK, so you would rather use the already
> existing PG ebooks and reformat them. Fine, get the PG DVD or take
> Greg's offer of free server space and build on them. Various
> people have said that all ebooks must/should be in xml, tei, etc.
> OK, so do what blackmask.com used to do and reformat them under
> your own domain and your own standards. If you remove the PG name
> and small print, you don't even have to credit PG if I understand
> correctly. What prevents you from jumping ship and doing your own
> thing?

Such reformatting from plain text is far too tedious, and the
information needed was lost. DP is just plain too inconsistent.
I do not believe in stealing. If I use PG material I will credit
them. I respect PG's efforts.

regards
Keith.
From tb at baechler.net Mon May 26 02:55:24 2008
From: tb at baechler.net (Tony Baechler)
Date: Mon, 26 May 2008 02:55:24 -0700
Subject: [gutvol-d] OpenGutenberg input sought
In-Reply-To: <20080526075633.GA22752@mail.pglaf.org>
References: <20080526075633.GA22752@mail.pglaf.org>
Message-ID: <483A890C.9080200@baechler.net>

Greg Newby wrote:
> I'm looking to start by putting the PG content into Trac
> with Subversion -- which adds Wiki & bug tracking & change
> tracking functionality. People with ideas are welcome to
> contribute them [if there are enough people or enough traffic,
> we can redirect to gutvol-p or start a new mailing list].

By PG content do you mean the gutenberg.org web pages, the ebooks or
both? I'm not a developer, but would it really be practical to put
25,000-plus ebooks in a repository in that way? I could see the point
regarding error corrections, but maybe it would be better to adopt a
DP approach, where the older files undergo multiple proofing rounds
or something.

> Purpose: Creation of user-friendly and open opportunities for adding
> value to Project Gutenberg content through enhanced versions, new
> formats, and community contributions such as book reviews and author
> biographies. All content will be open and editable. Design choices
> will make it as easy as possible to propagate enhancements back to
> the "main" Project Gutenberg collection.

Isn't that already part of the gutenberg.org wiki? Look at the
bookshelves, etc. Anyone could create book review pages. Why not
just borrow from wikipedia where some of this has already been done?

> Implementation of the Project Gutenberg catalog *

Do you mean the catalog that already exists or are you talking about
a new and different catalog?

> Fully tracked change management for eBooks and other content *
> Including the ability to upload new formats, new files, new metadata

I presume that plain text would still be available, since this would
be a separate site from gutenberg.org, while still allowing page
images, other formats, etc. Is this correct?

> EBook tracking & management through Subversion or similar system.

I think I understand subversion and similar, but could you elaborate
on this? Would this mean, similar to software projects, that anyone
could make and contribute their changes back to any ebook? What if a
volunteer doesn't want people changing their work? What if someone
purposely inserts errors or misinformation? Who would oversee merging
the new changes? Would there be any review or quality control
process?

> Per-book bug reporting and feature requests *

Again, is this practical? If you're offering the above, anyone can
do a checkout of a particular book and fix errors themselves. That
means that the whitewashers or equivalents have vastly more work than
they used to, because they would have to "unfix" the errors which
aren't really errors or were fixed incorrectly. Going back to a
previous point, what about proprietary formats? It's good to say
below that all content must be submitted with an open license, but
what about pdf or "z.m.l" which can't easily be changed and
manipulated?

> Search features
>
> RSS or similar notification & subscription features

I thought PG had this already for new postings.
> Per book:
> WIKI [editable]
> Forks, for things like irreconcilable or orphaned versions, biography disputes

Yes, but how is an average reader supposed to know which version to download? Why not just create a new book with a different number (edition numbers?) as opposed to saying that because of disputes, there are four or five versions of book X with no clear idea of what the differences are? Speaking for myself, most of the time I just want a plain text version that's readable and has the fewest errors. I don't want to and won't sort through five or ten different "forks" to find the "right" edition that I want.

> Automatic book recommendation

How would you implement this? What criteria would determine what books are recommended? What if I want to turn that off?

> Bookmarks, shareable *
> Book reviews *

Why not make a dedicated book review site? This way it isn't limited just to PG content. By that, I mean that people having nothing to do with PG could still review public domain books and the name of PG wouldn't be directly linked to the book review site.

> For later implementation:
> GutenbergSelf: Self-publishing of items

I think this is a very bad idea. This goes against what you and Michael have said in the past, particularly that you don't want to be a vanity press. With copyrights being the way they are now, I think CC is better qualified to start such a project. As I said before, anyone can get web space and publish whatever they want. I'm not arguing for censorship, but what about explicit material? In many countries, it is not legal to post about certain explicit subjects. Who determines what is published and what isn't and makes sure laws aren't broken? What about non-books such as video and music? How would one determine that a new book called, say, Berry Potter isn't really Harry Potter book 1 in disguise, thus violating copyright laws? There are already quite a few self-publishing sites that I know of, ourmedia.org and IA coming to mind.

> General policies:
> - As much as possible based on open standards & open software
>
> - As much as possible transparent, especially via text files stored
> alongside their eBooks [for example, a browsable Subversion repository,
> with metadata in XML files]

What do you mean by this? I know what metadata and subversion are, but what do you mean by transparent?

> - Fully mirrorable in full or in part

Via rsync, ftp, or what other protocols? I would really like to see a fast rsync mirror. ibiblio.org offers an rsync server, but it isn't that fast for me. Would there be nightly or hourly snapshots? How large would the archive be?

> - Authorization/authentication required to contribute, edit, etc., but
> not necessarily strongly verified [similar to Wikipedia]
>
> - All content must be submitted under an open license or granted to
> the public domain

From rburkey2005 at earthlink.net Mon May 26 06:40:19 2008 From: rburkey2005 at earthlink.net (Ron Burkey) Date: Mon, 26 May 2008 08:40:19 -0500 Subject: [gutvol-d] OpenGutenberg input sought In-Reply-To: <20080526075633.GA22752@mail.pglaf.org> References: <20080526075633.GA22752@mail.pglaf.org> Message-ID: <483ABDC3.3030207@earthlink.net>

I think these are very good ideas, as far as they go, but I think they don't really address the issue of consistency in formatting of the etexts. But it also seems to me that there are some minor additions to the outline that would allow it to do so.
I would say that there are two basic schools of thought on the standards/consistency issue:

1. The "anything goes as long as there's a plain-text version" school of thought; and
2. The "I want an across-the-board consistent method of preparing the texts so that essential data such as italics, underlining, headings, etc. are preserved" school of thought.

I would characterize Michael Hart as being in the former school of thought. I am in the latter school of thought, but I don't want to arbitrarily throw away etexts that don't meet a standard method of preparation.

So here's my proposed addition to Greg's outline: There would *be* a standard method of markup, but the content management system would have two levels. For lack of better terminology, there would be a "RAW" level, which contained the vanilla-ASCII version, and a "FINISHED" level, which contained the marked-up versions. A text would start in the "RAW" repository, and corrections to it would be made there for a time. But eventually, it might graduate into the "FINISHED" repository when the standard markup was available; at that point, edits to the RAW version would be locked out, and all future corrections would need to be made to the FINISHED version. Furthermore, there would be some (open source) software, licensed to be permanently available (or perhaps public-domain), that could run on any platform and convert the FINISHED form to the RAW form, so that vanilla-ASCII was always available.

I couldn't care less what the FINISHED format is --- whether HTML, or XML, or Bowerbird's "no markup" thing --- as long as there is *some* standard. The point is that the standard must be official, so that it is clear whether the RAW or the FINISHED version is the one eligible for corrections.

-- Ron Burkey

Greg Newby wrote:
>The outline
>
>OpenGutenberg
>
>Community Contributions for the Improvement of Public Domain EBooks
>
>Proposal outline May 25 2008 by gbn
>
>Purpose: Creation of user-friendly and open opportunities for adding
>value to Project Gutenberg content through enhanced versions, new
>formats, and community contributions such as book reviews and author
>biographies. All content will be open and editable. Design choices
>will make it as easy as possible to propagate enhancements back to
>the "main" Project Gutenberg collection.
>
>Features/desires. Items marked with an asterisk [*] will be
>two-way coupled with the main gutenberg.org collection.
>
>Implementation of the Project Gutenberg catalog *
>
>Fully tracked change management for eBooks and other content *
> Including the ability to upload new formats, new files, new metadata
>
>EBook tracking & management through Subversion or similar system.
>Per-book bug reporting and feature requests *
>
>Search features
>
>RSS or similar notification & subscription features
>
>Per book:
> WIKI [editable]
> Forks, for things like irreconcilable or orphaned versions, biography disputes
>
>Community functions:
> Reputation-based functions
> Automatic book recommendation
> Bookmarks, shareable *
> Book reviews *
>
>For later implementation:
> GutenbergSelf: Self-publishing of items
>
>General policies:
>- As much as possible based on open standards & open software
>
>- As much as possible transparent, especially via text files stored
>alongside their eBooks [for example, a browsable Subversion repository,
>with metadata in XML files]
>
>- Fully mirrorable in full or in part
>
>- Authorization/authentication required to contribute, edit, etc., but
>not necessarily strongly verified [similar to Wikipedia]
>
>- All content must be submitted under an open license or granted to
>the public domain

From julio.reis at tintazul.com.pt Mon May 26 06:43:06 2008 From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis) Date: Mon, 26 May 2008 14:43:06 +0100 Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30 In-Reply-To: References: Message-ID: <1211809386.6590.99.camel@abetarda>

Mr. Maas wrote,
> PG could allow multiple submissions from
> different projects for a particular book title, and let the reader
> pick the one they'd prefer to read. Even if PG now views itself
> this way, the perception is still "PG has one core text per book."

Then Mr. Hart replied,
> We have always allowed "multiple submissions" for each book.
> Right back to the very beginning with Roget and Paradise Lost.

Perhaps a different database structure would help? Right now all different versions of a book are treated as different works. Not allowing multiple submissions would be a no-no I think, but these should clearly be tagged as multiple submissions. When I search for 'Paradise Lost' I should get a single result, with all variants included. . . .

More thoughts on 'What should PG be?': the answer could be 'a repository of the written word' -- could be spoken or filmed renditions of the written word, but I'd clearly leave out an animation of the Earth's globe for instance. PG could look more like a repository by (randomly?) rotating some books to the front page. The home page looks a bit geeky right now, which works for the geek in me but not for the bookworm inside. It's static, too long, and not very appealing. We could ask for more involvement -- instead of just asking for money, could we ask for submissions a bit more clearly: donate your paper book, donate your type-in, whatever.

Could be more international, too: not just mentioning US copyright freedom but other jurisdictions as well. Most of the world is life+50 or life+70, right? Since we have death dates for most authors, how about showing whether a book is free in these? Not much work, I think.

Júlio.

From richfield at telkomsa.net Sun May 25 09:54:03 2008 From: richfield at telkomsa.net (Jon Richfield) Date: Sun, 25 May 2008 18:54:03 +0200 Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Message-ID: <483999AB.6050503@telkomsa.net>

I won't favour that question with a substantial reply, but here are a few remarks. Firstly, PG could come to a tooth-rattling dead-end tomorrow and MH still would have earned his niche in the temple of human benefactors.
It is likely to be an esoteric niche, unfortunately, because people who care about books are in the minority, and those who care about books of yesteryear are in the vanishing minority, but if we all made our service to humanity conditional on achieving our own notoriety, we still would be biting ticks that had bitten us, while supplementing our diet by hunting with unworked rocks for tardy or retarded lizards.

Secondly, there are a lot of books already in PG, so it is a valuable source, and a rewarding site for publishing one's own scans. The scanned works that I publish elsewhere are the ones that are not yet free of the copyright constraints observed by PG. (So far I have patronised PG AU mostly, but I am now working on some material for Sciencemadness as well.)

As for format, while I agree that when TXT will do, it does have serious advantages, I routinely download HTM and also prepare HTM for upload. HTM is the next best thing to text for simple reading and compact format; it is well supported by software (Firefox does nicely for reading, thank you) and some works are nicely illustrated, sometimes even with artwork that constructively complements the text. Also, much of what I scan is technical, so the graphics are not really dispensable. I fully understand that there are alternatives for illustrating TXT files, but I find such measures less rewarding and more trouble. What with the availability of support tools such as Tidy, I see no reason to stint myself. I do usually supply TXT versions as well, but then it is up to the text-zealots to do as they please about missing pictures.

=============

Now, while I'm at it, a couple of requests and a confession in sackcloth and blushes. (Ashes and blushes proved too revealing.) Since my last significant exchange I made a foul-up and lost all my recent email archives. I don't like to brag, but I did a good job of it. Among the things I lost was the identity of the friend who requested that I send him some of the Gothic script from German pre-war books for training scanners. Please make a noise and I'll get onto it.

Another question is where to check on the fate of a book I submitted some months ago: "Practical Taxidermy" by Brown. Does this ring bells with anyone? If not, should I re-submit it?

Sorry to pester,

Jon

From ajhaines at shaw.ca Mon May 26 11:22:23 2008 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Mon, 26 May 2008 11:22:23 -0700 Subject: [gutvol-d] Why stick with PG? And other Gothic digressions References: <483999AB.6050503@telkomsa.net> Message-ID: <000301c8bf5d$71a93f00$6601a8c0@ahainesp2400>

Re:
> Another question is where to check on the fate of a book I submitted
> some months ago: "Practical Taxidermy" by Brown. Does this ring bells
> with anyone? If not, should I re-submit it?

No bells here, but then I'm a fairly new WWer. I'll check with the others.

Al

----- Original Message ----- From: "Jon Richfield" To: Sent: Sunday, May 25, 2008 9:54 AM Subject: [gutvol-d] Why stick with PG? And other Gothic digressions

>I won't favour that question with a substantial reply, but here are a
> few remarks. Firstly, PG could come to a tooth-rattling dead-end
> tomorrow and MH still would have earned his niche in the temple of human
> benefactors.
It is likely to be an esoteric niche unfortunately, > because people who care about books are in the minority, and those who > care about books of yesteryear are in the vanishing minority, but if we > all made our service to humanity conditional on achieving our own > notoriety, we still would be biting ticks that had bitten us, while > supplementing our diet by hunting with unworked rocks for tardy or > retarded lizards. > > Secondly, there are a lot of books already in PG, so it is a valuable > source, and a rewarding site for publishing one's own scans. The > scanned works that I publish elsewhere are the ones that are not yet > free of the copyright constraints observed by PG. (So far I have > patronised PG AU mostly, but I am now working on some material for > Sciencemadness as well.) > > As for format, while I agree that when TXT will do it does have serious > advantages, I routinely download HTM and also prepare HTM for upload. > HTM is the next best thing to text for simple reading and compact > format; it is well supported by software (Firefox does nicely for > reading, thank you) and some works are nicely illustrated, sometimes > even with artwork that constructively complements the text. Also, much > of what I scan is technical, so the graphics are not really > dispensable. I fully understand that there are alternatives for > illustrating TXT files, but I find such measures less rewarding and more > trouble. What with the availability of support tools such as Tidy, I > see no reason to stint myself. I do usually supply TXT versions as > well, but then it is up to the text-zealots to do as they please about > missing pictures. > > ============= > > Now, while I m at it, a couple of requests and a confession in sackcloth > and blushes. (Ashes and blushes proved too revealing.) Since my last > significant exchange I made a foul-up and lost all my recent email > archives. I don't like to brag, but I did a good job of it. Among the > things I lost was the identity of the friend who requested that I send > him some of the Gothic script from German pre-war books for training > scanners. Please make a noise and I'll get onto it. > > Another question is where to check on the fate of a book I submitted > some months ago: "Practical Taxidermy" By Brown. Does this ring bells > with anyone? If not, should I re-submit it? > > Sorry to pester, > > Jon > > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Mon May 26 11:38:53 2008 From: hart at pglaf.org (Michael Hart) Date: Mon, 26 May 2008 11:38:53 -0700 (PDT) Subject: [gutvol-d] Why stay with PG? In-Reply-To: References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com> Message-ID: On Sun, 25 May 2008, Andrew Sly wrote: > > > On Sun, 25 May 2008, Paul Maas wrote: > >> Maybe PG needs to recast itself as a text archive. A sort of text >> "commons", if you will. PG could allow multiple submissions from >> different projects for a particular book title, and let the reader >> pick the one they'd prefer to read. Even if PG now views itself >> this way, the perception is still "PG has one core text per book." > > I'm curious--where do you think that perception comes from? 
> I would suspect it is more a matter of the general public's
> conception that a "book" just has one single state of being.

Well, I wouldn't want to presume that our readers don't know there are many different editions of Shakespeare and Milton, or to insist on only one edition.

> Back as far as the early 90s PG had two different texts of
> Paradise Lost. And not long after that we of course had
> different versions of Shakespeare plays and a few other things.

I had always presumed we would do dozens of editions of the great classics, all of them. . . .

Michael

> Andrew
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From hart at pglaf.org Mon May 26 11:53:08 2008 From: hart at pglaf.org (Michael Hart) Date: Mon, 26 May 2008 11:53:08 -0700 (PDT) Subject: [gutvol-d] Why stay with PG? In-Reply-To: <1211753229.9309.1255015053@webmail.messagingengine.com> References: <20080524090013.GA17892@investigative.net> <6d99d1fd0805240356k64e7ccd5m3caa43a55aec666d@mail.gmail.com> <48392D2F.6070509@baechler.net> <1211740921.8444.1254998623@webmail.messagingengine.com> <1211753229.9309.1255015053@webmail.messagingengine.com> Message-ID:

On Sun, 25 May 2008, Paul Maas wrote:
> That's good to hear from your lips, Michael?
>
> So you agree then that the PG Archive is simply a "text"
> commons where individuals and organizations can place their
> transcribed texts to share with others?

Yes. We even allow people to insist their eBooks be posted with no alterations whatsoever, although we can't guarantee from now into the future that no one will ever change it.

> If so, then you are right. There are no issues. PG should
> issue NO requirements other than the submitted text is
> transcribed from a public domain printing, and maybe that a
> plain text version is submitted along with whatever other
> formats the submitter wishes to donate.

Even the "standards" messages I sent out for years said right up front that these were just "suggestions". . . .

> "Give me your tired, your poor,
> Your digital texts yearning to breathe free,........."

Absolutely!

> But then, as I think of it, why not just move over and
> merge with the Internet Archive? This will solve a lot
> of problems. Why should PG continue as an independent
> entity apart from TIA? What purpose does PG serve any
> more? Hasn't it already fulfilled its mission?

Most or all of these operations have their own agendas that would allow us to work with their books, but not be an actual part of their operations. However, we work with the IA very closely, and others.

Thanks!!!

Michael

> On Sun, 25 May 2008 13:39:12 -0700 (PDT), "Michael Hart"
> said:
>>
>> On Sun, 25 May 2008, Paul Maas wrote:
>>
>>> Maybe PG needs to recast itself as a text archive. A sort of text
>>> "commons", if you will. PG could allow multiple submissions from
>>> different projects for a particular book title, and let the reader
>>> pick the one they'd prefer to read. Even if PG now views itself
>>> this way, the perception is still "PG has one core text per book."
>>>
>>> So what is PG? What should PG be?
>>
>> We have always allowed "multiple submissions" for each book.
>>
>> Right back to the very beginning with Roget and Paradise Lost.
>>
>> No one ever seemed to have an objection.
>> >> mh >> >>> >>> >>> On Sun, 25 May 2008 02:11:11 -0700, "Tony Baechler" >>> said: >>>> David Starner wrote: >>>>> On Sat, May 24, 2008 at 5:00 AM, Tony Baechler wrote: >>>>> >>>>>> I do have a question for all of you who frequently point out problems >>>>>> with PG and how Michael does things. Why do you stick with PG year >>>>>> after year? >>>>>> >>>>> >>>>> Because PG is the source for ebooks; if I started posting somewhere >>>>> else, I'd lose half my audience. And Michael is no longer the end-all >>>>> and be-all of PG; I feel perfectly fine ignoring him and being part of >>>>> PG. >>>>> >>>>> >>>> Yes, but what about the Internet Archive? You could submit texts to >>>> them just as easily. With a little funding, you could advertise on >>>> Google and other places to bring in an audience. There are many sites >>>> which use PG files, often without credit. Search for any popular public >>>> domain book and you'll find a bunch. >>>>>> I can get >>>>>> the last laugh because it's all public domain anyway and the books will >>>>>> still eventually make their way to PG, albeit probably not by the person >>>>>> who created the DP competitors or their own versions of PG. >>>>>> >>>>> >>>>> So you get the last laugh because you're wasting effort? Do you know >>>>> that "The Brothers Karamazov" is just now going through DP? Do you >>>>> have any idea how much material is out that PG doesn't have, and >>>>> probably won't have for decades, if ever? Look at >>>>> and see just how much material has been >>>>> out there for years that we've never touched. It's surely not going to >>>>> speed up if you drive people away. >>>>> __ >>>> >>>> Yes, I think I'm aware of how much PG doesn't have. Look at Google >>>> books, the Library of Congress, IA, etc. That doesn't count libraries >>>> in other countries with non-English texts. The thing is that I can >>>> wait. If I just couldn't wait, either I would buy a used reprint and >>>> scan it myself or would set up any of several free OCR packages and >>>> process the already available page images. You somewhat misunderstand >>>> me though. I'm not saying that people would be driven away. I'm saying >>>> that if we have a bunch of DP spinoffs and DP-like competition going on, >>>> essentially twice or three times more books can be produced. Even if >>>> those DP spinoffs don't post to PG, at least high quality ebooks would >>>> be available for PG harvesting. If they all did post to PG, instead of >>>> however many books DP currently produces per day, multiply that by two, >>>> three, ten, etc, depending on how many organizations there are and how >>>> well they all produce. The resources are out there for those who want >>>> to tap them. Regarding losing your audience, all I can say is that >>>> anyone can put up web sites and anyone can find them with good search >>>> engines. Between the PG newsletter (when one is actually posted), the >>>> IA forums, Google itself, and the many other book sites and newsgroups >>>> out there, I really don't see how you could say that you would be >>>> driving your audience away. As with all new projects it would start >>>> small as one could expect, but it could grow over time as PG Australia >>>> has. Personally, I think more of the free software people should be >>>> targetted. I see a paralel between making and producing free, GPL >>>> software and making and producing public domain ebooks. Oh, there's >>>> Creative Commons also. 
While CC is mostly interested in their own >>>> licenses, they also could push for more public domain ebooks. >>>> >>>> I don't think and never meant to imply that people's efforts should be >>>> ignored, duplicated, etc. I don't believe in wasting effort anymore >>>> than most people. >>>> I don't want to drive people away either. I'm just saying that with a >>>> small amount of effort to start separate projects, more could be >>>> produced, not less. One could still post anything they produce to PG >>>> but have multiple DPs doing the actual work. DP already has a >>>> harvesting page dealing with IA and some others. Split those efforts >>>> into new organizations with more proofers and, if the founder of such >>>> splits feels strongly enough, higher or different standards. This is >>>> similar to what DP and DP Europe have done. They both produce, they >>>> both post to PG, and some people proof for both even though they are >>>> separate and deal with different areas. I would like to see a DP >>>> Australia, or even DPs not specific to any country. >>>> >>>> Likewise, their could be PG spinoffs with their own standards. Let's >>>> make up a PG spinoff which will only post html and pdf and won't accept >>>> plain text. Well, since DP would still be posting to PG, they would of >>>> course still be producing plain text. However, they could post their >>>> special pdf and html to the PG spinoff. This way we have multiple DP >>>> organizations, we still have a central PG which accepts everything, and >>>> there would be one or many PG spinoffs which would take only page >>>> images, only html, only pdf, only some other format, some combination of >>>> the above, etc. If the sites are developed correctly with appropriate >>>> keywords, I as a scholar could find some obscure text with complete page >>>> images and a nicely formatted document, while an average reader could >>>> find the basic plain text. As I said previously, I would still get the >>>> last laugh because all of the DPs and PGs would eventually be posted or >>>> harvested to the original PG one of these days and I can wait until that >>>> happens. >>>> >>>> >>>> Please don't send email to me off-list as all non-list email gets >>>> automatically deleted. >>>> _______________________________________________ >>>> gutvol-d mailing list >>>> gutvol-d at lists.pglaf.org >>>> http://lists.pglaf.org/listinfo.cgi/gutvol-d >>> -- >>> Paul Maas >>> paulmaas at airpost.net >>> >>> -- >>> http://www.fastmail.fm - I mean, what is it about a decent email service? 
>>> >>> _______________________________________________ >>> gutvol-d mailing list >>> gutvol-d at lists.pglaf.org >>> http://lists.pglaf.org/listinfo.cgi/gutvol-d >>> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d > -- > Paul Maas > paulmaas at airpost.net > > -- > http://www.fastmail.fm - Same, same, but different > From gbnewby at pglaf.org Mon May 26 12:49:23 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 26 May 2008 12:49:23 -0700 Subject: [gutvol-d] OpenGutenberg input sought In-Reply-To: <483A890C.9080200@baechler.net> References: <20080526075633.GA22752@mail.pglaf.org> <483A890C.9080200@baechler.net> Message-ID: <20080526194923.GB4492@mail.pglaf.org> On Mon, May 26, 2008 at 02:55:24AM -0700, Tony Baechler wrote: > Greg Newby wrote: > > I'm looking to start by putting the PG content into Trac > > with Subversion -- which adds Wiki & bug tracking & change > > tracking functionality. People with ideas are welcome to > > contribute them [if there are enough people or enough traffic, > > we can redirect to gutvol-p or start a new mailing list]. > > > > > By PG content are you meaning the gutenberg.org web pages, the ebooks or > both? I'm not a developer but would it really be practical to put > 25,000-plus ebooks in a repository in that way? I could see the point > regarding error corrections, but maybe it would be better to adopt a DP > approach, where the older files undergo multiple proofing rounds or > something. Not the Web pages, the 25,000-plus eBooks. Yes, it's a rather large repository by some standards. Your idea for a DP approach that supports multiple proofing rounds for titles already within the PG collection is a good one, but it's not something I'm pursuing. > > The outline > > > > OpenGutenberg > > > > Community Contributions for the Improvement of Public Domain EBooks > > > > Proposal outline May 25 2008 by gbn > > > > > > Purpose: Creation of user-friendly and open opportunities for adding > > value to Project Gutenberg content through enhanced versions, new > > formats, and community contributions such as book reviews and author > > biographies. All content will be open and editable. Design choices > > will make it as easy as possible to propagate enhancements back to > > the "main" Project Gutenberg collection. > > > > > Isn't that already part of the gutenberg.org wiki? Look at the > bookshelves, etc. Anyone could create book review pages. Why not just > borrow from wikipedia where some of this has already been done? Yes, it is already part of the gutenberg.org wiki with the limitation that people can't add their own content to the "main" catalog page for an eBook. I agree that Wikipedia is a better location for things like author bios, and in fact I hope that's what people use first. The point is editable content, focused on a chosen eBook. So if someone wants to type a few quick notes on an author or title...or something that's not adequately researched for a wikipedia article...or some sort of family history related to an author...this might be a good forum. > > Features/desires. Items marked with an asterisk [*] will be > > two-way coupled with the main gutenberg.org collection. > > > > Implementation of the Project Gutenberg catalog * > > > > > Do you mean the catalog that already exists or are you talking about a > new and different catalog? That catalog that already exists. 
But my notion is to make some derivative products, automatically derived from the catalog. In particular, I'm interested in an XML metadata file withIN the eBook directory. That's the sort of method that the Internet Archive uses, and I like the idea of keeping metadata close to the eBook's other files.

As some people might not know, there is a small but devoted catalog team where catalog maintenance is done. While the work is never-ending, I think the current catalog has a lot of goodness. No need to supersede or reinvent.

> > Fully tracked change management for eBooks and other content *
> > Including the ability to upload new formats, new files, new metadata
>
> I presume that plain text would still be available since this would be a
> separate site from gutenberg.org while still allowing page images, other
> formats, etc. Is this correct?

Everything from gutenberg.org will be imported essentially immediately. The key is that OTHER stuff can be added.

> > EBook tracking & management through Subversion or similar system.
>
> I think I understand subversion and similar, but could you elaborate on
> this? Would this mean that, similar to software projects, anyone
> could make and contribute their changes back to any ebook? What if a
> volunteer doesn't want people changing their work? What if someone
> purposely inserts errors or misinformation? Who would oversee merging
> the new changes? Would there be any review or quality control process?

Good questions, and I think the answers depend on just how many people, of what sorts of interests, make contributions and have motivation to do community policing. I'm not planning much central structure. Unlike [my understanding of] Wikipedia, I want to build in the capability of forks. So, when there is substantial disagreement, BOTH views can persist independently...and both with the ability for attachable commentary. I'm not trying to reinvent something like Wikipedia. From a quality control standpoint, this is much more like the blogosphere than Wikipedia.

> > Per-book bug reporting and feature requests *
>
> Again, is this practical? If you're offering the above, anyone can do a
> checkout of a particular book and fix errors themselves. That means
> that the whitewashers or equivalents would have vastly more work than they
> do now, because they would have to "unfix" the errors which aren't
> really errors or were fixed incorrectly. Going back to a previous
> point, what about proprietary formats? It's good to say below that all
> content must be submitted with an open license, but what about pdf or
> "z.m.l" which can't easily be changed and manipulated?

The existing process for getting fixes back into the main PG collection is not changed in any way. The thing added is the ability for people to get their fixes "out there" immediately, something that is not working well with the current approach to errata.

As for proprietary formats: you are right that the problem of regenerating new formats is an issue. Over time some titles will end up with various mis-matched formats. Avoiding such is one of the things we strive for in the main gutenberg.org collection. For this new collection, it's not one of my design criteria. HOWEVER, one feature I do want is for people to "subscribe" to a particular eBook, so that those who make new formats can at least know when something changes for that title.

> > Search features
> >
> > RSS or similar notification & subscription features
>
> I thought PG had this already for new postings.
Sure, and it's part of my "home" page setting in Firefox. But we need a new feed for the other added stuff.

> > Per book:
> > WIKI [editable]
> > Forks, for things like irreconcilable or orphaned versions, biography disputes
>
> Yes, but how is an average reader supposed to know which version to
> download? Why not just create a new book with a different number
> (edition numbers?) as opposed to saying that because of disputes, there
> are four or five versions of book X with no clear idea of what the
> differences are? Speaking for myself, most of the time I just want a
> plain text version that's readable and has the fewest errors.
> I don't want to and won't sort through five or ten different "forks" to
> find the "right" edition that I want.

You're envisioning a top-down approach to quality control and authoritativeness. That's not a design goal. Practically speaking, I think people will mostly just want the most recent date in the format they desire. It's not that complicated. We're not talking about something like the LDS or Ron Paul page on Wikipedia, or some other contentious content...we're talking about literary works.

> > Automatic book recommendation
>
> How would you implement this? What criteria would determine what books
> are recommended? What if I want to turn that off?

There are a lot of algorithms for automatic book recommendations; I haven't chosen one. I don't think there's anything to turn off...it will be something you ask for. I'm not talking about having "you might also be interested in...." on every screen, like you see at Amazon etc. Just some way of finding a book of interest, by request.

> > Bookmarks, shareable *
> > Book reviews *
>
> Why not make a dedicated book review site? This way it isn't limited
> just to PG content. By that, I mean that people having nothing to do
> with PG could still review public domain books and the name of PG
> wouldn't be directly linked to the book review site.

I don't really think there is a shortage of places where people can put their book reviews. This is for reviews of the content that's part of the site. I have no objection to people putting it in the gutenberg.org wiki or Wikipedia or whatever, and just linking from this new site.

> > For later implementation:
> > GutenbergSelf: Self-publishing of items
>
> I think this is a very bad idea. This goes against what you and Michael
> have said in the past, particularly that you don't want to be a vanity
> press.

We don't want gutenberg.org to be a vanity press. The GutenbergSelf concept *is* about vanity press, basically. Since people will browse or search for content, I imagine making it easy to search just particular "types" of content...by license, by source, etc. as well as author/title/subject.

> With copyrights being the way they are now, I think CC is better
> qualified to start such a project.

Do you mean Creative Commons? I didn't know they were doing anything with hosting eBook content.

> As I said before, anyone can get web
> space and publish whatever they want.

You're right. Yet I get dozens of requests per week from people looking to use Project Gutenberg for their content. In addition to some name-brand recognition, the main thing I think PG offers is a likelihood of permanence. To me, that is key [and it's why mirroring is specifically facilitated].

> I'm not arguing for censorship,
> but what about explicit material? In many countries, it is not legal to
> post about certain explicit subjects.
Who determines what is published > and what isn't and makes sure laws aren't broken? What about non-books > such as video and music? How would one determine that a new book > called, say, Berry Potter isn't really Harry Potter book 1 in disguise, > thus violating copyright laws? There are already quite a few > self-publishing sites that I kno of, ourmedia.org and IA coming to mind. Everything will be US-based, and follow US laws for copyright & the few laws dealing with obscenity. By having all postings be non-anonymous, with some sort of waiting period built in for new posters, I think we'll avoid some of the more obvious stuff like people posting Harlan Ellison's works. Also, we'll keep generally on an eBook mission, to avoid becoming another allofmp3.com or pirate bay. While a community reputation system can be jigged, as we see on eBay, the stakes will be lower on this site. > > General policies: > > - As much as possible based on open standards & open software > > > > - As much as possible transparent, especially via text files stored > > alongside their eBooks [for example, a browsable Subversion repository, > > with metadata in XML files] > > > > > What do you mean by this? I know what metadata and subversion are but > what do you mean by transparent? Simply that you can get a directory listing. This is something the Internet Archive does for their eBooks. > > - Fully mirrorable in full or in part > > > > > Via rsync, ftp, or what other protocols? I would really like to see a > fast rsync mirror. ibiblio.org offers a rsync server but it isn't that > fast for me. Would there be nightly or hourly snapshots? How large > would the archive be? Yes, the ibiblio.org rsync server is very slow! I'm redirecting people to the readingroo.ms rsync server instead. ftp, http...yes. For snapshots, do you mean, how quickly will this new service reflect changes to the main gutenberg.org site? The answer is, immediately: - I'll push new titles to the new server, at the same time titles go to gutenberg.org - We'll re-import & update the catalog as soon as it's regenerated, daily > > - Authorization/authentication required to contribute, edit, etc., but > > not necessarily strongly verified [similar to Wikipedia] > > > > - All content must be submitted under an open license or granted to > > the public domain > > Those were some really good comments, thanks. As you see, I'm not trying to scratch everyone's itch, and there are both technical and social challenges to what I'm thinking of. If anyone wants to contribute some expertise, or even start something not quite the same, drop me a note. -- Greg From Bowerbird at aol.com Mon May 26 16:41:15 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 May 2008 19:41:15 EDT Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Message-ID: jon richfield said: > I do usually supply TXT versions as well, but then it is > up to the text-zealots to do as they please about missing pictures. jon, please just list the filename of each graphic at the point where it is to be included in the text, and tomorrow's e-text viewers will insert it there. thanks. -bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080526/65d8e7ca/attachment.htm

From prosfilaes at gmail.com Mon May 26 18:43:20 2008 From: prosfilaes at gmail.com (David Starner) Date: Mon, 26 May 2008 21:43:20 -0400 Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30 In-Reply-To: <1211809386.6590.99.camel@abetarda> References: <1211809386.6590.99.camel@abetarda> Message-ID: <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com>

On Mon, May 26, 2008 at 9:43 AM, Júlio Reis wrote:
> Could be more international, too: not just mentioning US copyright
> freedom but other jurisdictions as well. Most of the world is life+50 or
> life+70, right? Since we have death dates for most authors, how about
> showing whether a book is free in these? Not much work, I think.

I suspect we don't have death dates for the majority of the authors in the system, though we probably have death dates for the authors of the majority of the books in the system. Moreover, you simplify the issue; the question is not just the death date of the author, but also the illustrator, author of the introduction, and possibly the editor. Stating whether or not a book is out of copyright outside the one jurisdiction we've carefully vetted it for is ill-advised, IMO.

From sly at victoria.tc.ca Mon May 26 22:31:42 2008 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon, 26 May 2008 22:31:42 -0700 (PDT) Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30 In-Reply-To: <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com> References: <1211809386.6590.99.camel@abetarda> <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com> Message-ID:

On Mon, 26 May 2008, David Starner wrote:
>I suspect we don't have death dates for the majority of the authors in
>the system, though we probably have death dates for the authors of the
>majority of the books in the system. Moreover, you simplify the
>issue; the question is not just the death date of the author, but also
>the illustrator, author of the introduction, and possibly the editor.
>Stating whether or not a book is out of copyright outside the one
>jurisdiction we've carefully vetted it for is ill-advised, IMO.

Yes, that is true. Perhaps worth noting is that copyright is not actually as simple as life+50 or life+70. Each country may have additional rules or exceptions that apply in particular circumstances, or depending on the nature of the copyrighted material. For instance, I seem to recall that France has an extended copyright term for works written by French citizens who died in service to their country during the world wars.

Andrew

From traverso at posso.dm.unipi.it Mon May 26 22:41:46 2008 From: traverso at posso.dm.unipi.it (Carlo Traverso) Date: Tue, 27 May 2008 07:41:46 +0200 (CEST) Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30 In-Reply-To: (message from Andrew Sly on Mon, 26 May 2008 22:31:42 -0700 (PDT)) References: <1211809386.6590.99.camel@abetarda> <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com> Message-ID: <20080527054146.64A4493B61@posso.dm.unipi.it>

>>>>> "Andrew" == Andrew Sly writes:

Andrew> On Mon, 26 May 2008, David Starner wrote:
>> I suspect we don't have death dates for the majority of the
>> authors in the system, though we probably have death dates for
>> the authors of the majority of the books in the system.
>> Moreover, you simplify the issue; the question is not just the
>> death date of the author, but also the illustrator, author of
>> the introduction, and possibly the editor.
Stating whether or >> not a book is out of copyright outside the one jurisdiction >> we've carefully vetted it for is ill-advised, IMO. Andrew> Yes, that is true. Perhaps worth noting is that copyright Andrew> is not actually as simple as life+50 or life+70. Each Andrew> country may have additional rules or exceptions that apply Andrew> in particular circumstances, or depending on the nature of Andrew> the copyrighted material. For instance, I seem to recall Andrew> that France has an extended copyright term for works Andrew> written by French citizens who died in service to their Andrew> country during the world wars. True, but we may give a reversed information: indicate the books for which one of the creators is known to have been died later than 50 or 70 years ago. This simplifies life for people wanting to investigate the copyright status of the works, since the answer is immediately "No". Carlo From gbnewby at pglaf.org Mon May 26 22:45:12 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 26 May 2008 22:45:12 -0700 Subject: [gutvol-d] OpenGutenberg input sought In-Reply-To: <483ABDC3.3030207@earthlink.net> References: <20080526075633.GA22752@mail.pglaf.org> <483ABDC3.3030207@earthlink.net> Message-ID: <20080527054512.GC12974@mail.pglaf.org> On Mon, May 26, 2008 at 08:40:19AM -0500, Ron Burkey wrote: > I think these are very good ideas, as far as they go, but I think they > don't really address the issue of consistency in formatting of the > etexts. But it also seems to me that there are some minor additions to > the outline that allow it to do so. Thanks, Ron. I'm not primarily interested in addressing consistency. That's the itch that some people want to have scratched, but it's not one that motivates me very much. However: > I would say that there are two basic schools of thought on the > standards/consistency-of-thought issue: > > 1. The "anything goes as long as there's a plain-text version" school > of thought; and > 2. The "I want an across-the-board consistent method of preparing the > texts so that essential data such as italics, underlining, > headings, etc. are preserved" school of thought. > > I would characterize Michael Hart as being in the former school of > though. I am in the latter school of thought, but I don't want to > arbitrarily throw away etexts that don't meet a standard method of > preparation. Overgeneralizations, but I certainly understand the sentiments in the two mindsets. > So here's my proposed addition to Greg's outline: There would *be* a > standard method of markup, but the content management system would have > two levels. For lack of better terminilogy, there would be a "RAW" > level, which contained the vanilla-ASCII version, and a "FINISHED" > level, which contained the marked-up versions. A text would start in > the "RAW" repository, and corrections to it would be made there for a > time. But eventually, it might graduate into the "FINISHED" repository > when the standard markup was available; at that point, edits to the RAW > version would be locked out, and all future corrections would need to be > made to the FINISHED version. Furthermore, there would be some (open > source) software, licensed to be permanently availabe (or perhaps > public-domain) that could run on any platform that could convert the > FINISHED form to the RAW form, so that vanilla-ASCII was always available. That's not a bad idea at all. Sort of an automated, or semi-automated way of having content "blessed." [In the Perl sense.] 
> I couldn't care less what the FINISHED format is --- whether HTML, or
> XML, or Bowerbird's "no markup" thing --- as long as there is *some*
> standard. The point is that the standard must be official,
> so that it is clear whether the RAW or the FINISHED version
> is the one eligible for corrections.

Of course, my philosophy isn't to block RAW texts from usage, or improvement, etc.

I think a basic way to achieve this is simply through metadata, so that a search can retrieve only the FINISHED versions.

Important to me is that if FINISHED for one group means ZML, and for another means TeX, we can allow multiple definitions. I know this isn't comfortable for everyone, but:

The real key, to me, is technical: having some excellent guidelines, examples, and technical tools to help people bring their eBook to FINISHED. I know some folks have ideas on this already, and encourage them to write up what they have [as software, guidelines/policies, HOWTOs, etc.], even if they're not complete or fully automated.

Thanks for your ideas! Despite not necessarily being my personal itch, I think the idea of a FINISHED version is consistent & doable.
-- Greg

> Greg Newby wrote:
>
> >The outline
> >
> >OpenGutenberg
> >
> >Community Contributions for the Improvement of Public Domain EBooks
> >
> >Proposal outline May 25 2008 by gbn
> >
> >Purpose: Creation of user-friendly and open opportunities for adding
> >value to Project Gutenberg content through enhanced versions, new
> >formats, and community contributions such as book reviews and author
> >biographies. All content will be open and editable. Design choices
> >will make it as easy as possible to propagate enhancements back to
> >the "main" Project Gutenberg collection.
> >
> >Features/desires. Items marked with an asterisk [*] will be
> >two-way coupled with the main gutenberg.org collection.
> >
> >Implementation of the Project Gutenberg catalog *
> >
> >Fully tracked change management for eBooks and other content *
> > Including the ability to upload new formats, new files, new metadata
> >
> >EBook tracking & management through Subversion or similar system.
> >
> >Per-book bug reporting and feature requests *
> >
> >Search features
> >
> >RSS or similar notification & subscription features
> >
> >Per book:
> > WIKI [editable]
> > Forks, for things like irreconcilable or orphaned versions, biography
> > disputes
> >
> >Community functions:
> > Reputation-based functions
> > Automatic book recommendation
> > Bookmarks, shareable *
> > Book reviews *
> >
> >For later implementation:
> > GutenbergSelf: Self-publishing of items
> >
> >General policies:
> >- As much as possible based on open standards & open software
> >
> >- As much as possible transparent, especially via text files stored
> >alongside their eBooks [for example, a browsable Subversion repository,
> >with metadata in XML files]
> >
> >- Fully mirrorable in full or in part
> >
> >- Authorization/authentication required to contribute, edit, etc., but
> >not necessarily strongly verified [similar to Wikipedia]
> >
> >- All content must be submitted under an open license or granted to
> >the public domain

From Bowerbird at aol.com Mon May 26 23:54:25 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 27 May 2008 02:54:25 EDT Subject: [gutvol-d] OpenGutenberg input sought Message-ID:

greg said:
> I'm not primarily interested in addressing consistency.
> That's the itch that some people want to have scratched,
> but it's not one that motivates me very much.

yet you're unclear why o.l.p.c. couldn't use your library. sad, but true. oh well...

> The real key, to me, is technical: having some
> excellent guidelines, examples, and technical tools
> to help people bring their eBook to FINISHED.

guideline #1: consistency.

-bowerbird

************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080527/db6932f0/attachment.htm

From julio.reis at tintazul.com.pt Tue May 27 02:48:26 2008 From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis) Date: Tue, 27 May 2008 10:48:26 +0100 Subject: [gutvol-d] Dates of death and copyright status outside the US Message-ID: <1211881706.9979.65.camel@abetarda>

Hello all, thanks for replying.

David said,
> Moreover, you simplify the issue; the question is not just
> the death date of the author, but also the illustrator,
> author of the introduction, and possibly the editor.

I'm not simplifying any issue. We are discussing legal issues, so my concept of "author" is a legal one: "author" being any person whom we need to consider for copyright issues. I am *assuming* we are listing in the catalog all such 'authors' for any given work. It's relevant to credit all authors for database search issues, and also ethically important.

Andrew said,
> Perhaps worth noting is that copyright is not
> actually as simple as life+50 or life+70. Each country may have
> additional rules or exceptions that apply in particular
> circumstances, or depending on the nature of the copyrighted
> material. For instance, I seem to recall that France has an
> extended copyright term for works written by French citizens
> who died in service to their country during the world wars.

I know about France and the issues regarding (lack of) copyright harmonization inside European Union countries. I am not asking PG to give copyright information for any given country or region other than the USA -- like Carlo pointed out, we only want PG to help people find such information regarding their own jurisdiction.

I think you should be showing the date of publication already in the catalog. It's very relevant for people searching it; I see several editions of 'Paradise Lost', but which is the 1845 one?

I think I didn't come across clearly the first time around; what I am asking in terms of authors is quite simply the number of years since the death of the last author. For any given work:

1) Do we have death dates for all 'authors'? If no, then stop: we can't provide any information.
2) Pick the last date of death of the work's 'authors'.
3) Years since last death = today's year, minus date of last death, minus one.

That's just it. You send me the database schema, and I'll send you back the SQL statement to create that information. It's not complicated. I am one of the many maintainers of the PG catalog. Why go through the trouble of researching dates of death if we don't need those for the USA anyway? It's a small step to do the math afterwards.

Júlio.
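Júlio's three steps translate directly into a short query. Below is a minimal sketch of what he describes, in Python with sqlite3; the two-table schema and all table and column names are invented for illustration, since the real PG catalog layout may differ:

    import sqlite3
    from datetime import date

    # Hypothetical schema: one row per book, one row per (book, creator)
    # pair. A NULL death_year means the date of death is unknown.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE creators (book_id INTEGER REFERENCES books(id),
                           name TEXT, death_year INTEGER);
    INSERT INTO books VALUES (1, 'Paradise Lost'), (2, 'Example Anthology');
    INSERT INTO creators VALUES (1, 'John Milton', 1674),
                                (2, 'Author A', 1950), (2, 'Author B', NULL);
    """)

    # Step 1: a book qualifies only if every creator has a known death year
    # (COUNT(death_year) skips NULLs; comparing it to COUNT(*) checks this).
    # Steps 2-3: take the latest death year, subtract it from today's year,
    # then subtract one, exactly per Júlio's formula.
    rows = con.execute("""
        SELECT b.title,
               CASE WHEN COUNT(*) = COUNT(c.death_year)
                    THEN :this_year - MAX(c.death_year) - 1
                    ELSE NULL END AS years_since_last_death
        FROM books b JOIN creators c ON c.book_id = b.id
        GROUP BY b.id
    """, {"this_year": date.today().year}).fetchall()

    for title, years in rows:
        print(title, years)  # 'Example Anthology' prints None: no information

As David points out in the next message, the output is only as good as the creator list: a missing illustrator or editor silently skews the result.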
From prosfilaes at gmail.com Tue May 27 03:32:45 2008 From: prosfilaes at gmail.com (David Starner) Date: Tue, 27 May 2008 06:32:45 -0400 Subject: [gutvol-d] Dates of death and copyright status outside the US In-Reply-To: <1211881706.9979.65.camel@abetarda> References: <1211881706.9979.65.camel@abetarda> Message-ID: <6d99d1fd0805270332x7e15dc02t29492826dc2f8210@mail.gmail.com>

On Tue, May 27, 2008 at 5:48 AM, Júlio Reis wrote:
> I am *assuming* we are listing in the catalog all such 'authors' for any
> given work. It's relevant to credit all authors for database search
> issues, and also ethically important.

We aren't. We won't. A collection of a hundred short stories or a hundred paintings or both is just not feasible to list in the catalog.

> I think I didn't come across clearly the first time around; what I am
> asking in terms of authors is quite simply the number of years since the
> death of the last author. For any given work:

That's the part that's trivial; do we really need a computer to do it? It can also be misleading, since we don't have all the authors in the system.

> I am one of the many maintainers of the PG catalog. Why go through the
> trouble of researching dates of death if we don't need those for the USA
> anyway? It's a small step to do the math afterwards.

Because it's included in LoC records and is one easy way to distinguish authors with the same name.

From rburkey2005 at earthlink.net Tue May 27 06:15:39 2008 From: rburkey2005 at earthlink.net (Ron Burkey) Date: Tue, 27 May 2008 08:15:39 -0500 Subject: [gutvol-d] OpenGutenberg input sought In-Reply-To: <20080527054512.GC12974@mail.pglaf.org> References: <20080526075633.GA22752@mail.pglaf.org> <483ABDC3.3030207@earthlink.net> <20080527054512.GC12974@mail.pglaf.org> Message-ID: <1211894139.21896.15.camel@software1.heads-up.local>

On Mon, 2008-05-26 at 22:45 -0700, Greg Newby wrote:
> On Mon, May 26, 2008 at 08:40:19AM -0500, Ron Burkey wrote:
> > I couldn't care less what the FINISHED format is --- whether HTML, or
> > XML, or Bowerbird's "no markup" thing --- as long as there is *some*
> > standard. The point is that the standard must be official,
> > so that it is clear whether the RAW or the FINISHED version
> > is the one eligible for corrections.
>
> Of course, my philosophy isn't to block RAW texts from usage,
> or improvement, etc.
>
> I think a basic way to achieve this is simply through metadata,
> so that a search can retrieve only the FINISHED versions.
>
> Important to me is that if FINISHED for one group means
> ZML, and for another means TeX, we can allow multiple definitions.
> I know this isn't comfortable for everyone, but:
>
> The real key, to me, is technical: having some excellent guidelines,
> examples, and technical tools to help people bring their eBook to
> FINISHED. I know some folks have ideas on this already, and encourage
> them to write up what they have [as software, guidelines/policies,
> HOWTOs, etc.], even if they're not complete or fully automated.
>
> Thanks for your ideas! Despite not necessarily being my
> personal itch, I think the idea of a FINISHED version is
> consistent & doable.
> -- Greg

Well, I know that talking about "standards" does seem a little restrictive, given the sort of free-wheeling volunteer nature of PG. But I do think that "guidelines" are more appropriate for the RAW repository and "standards" are more appropriate for the FINISHED repository.
Of course, just because I say that I think PG should have a standard for how the FINISHED texts are formatted, that's not to say that the standard must be exclusive. The PG standard could be "the FINISHED texts shall be HTML 4.0 *or* DocBook v4.5 *or* ZML vWhatever". The key ideas are that there should be a clear distinction as to whether the RAW version or the FINISHED version is the master version, in the sense of being maintained in the future, and that there should be some established procedure for graduating an etext from RAW status to FINISHED status.

For example, if the PG standard was that FINISHED texts were HTML 4.0, it doesn't necessarily imply that just because somebody supplied an HTML 4.0 version of a text it should graduate into the FINISHED archive. It's possible that the HTML version of the etext was itself crummy, and should be put back into the RAW archive along with the vanilla ASCII version. Perhaps there would need to be an editor who can look at a candidate for graduation and say, "yes, this is good enough!" So there are procedural issues as well as simple technical issues with my proposal.

-- Ron

From jeroen.mailinglist at bohol.ph Tue May 27 14:06:55 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue, 27 May 2008 23:06:55 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30
In-Reply-To: <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com>
References: <1211809386.6590.99.camel@abetarda> <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com>
Message-ID: <483C77EF.6010301@bohol.ph>

Although it is nice background information to add death dates for authors, these are not by themselves enough to clear the copyright of any particular work outside the US. The laws are just too complicated. This is why I, although working from the Netherlands, leave it to PG to republish things, and won't do it myself. Life+something laws are a pain, especially given the draconian measures now proposed to enforce copyrights...

Jeroen.

David Starner wrote:
> On Mon, May 26, 2008 at 9:43 AM, Júlio Reis wrote:
>
>> Could be more international, too: not just mentioning US copyright
>> freedom but other jurisdictions as well. Most of the world is life+50 or
>> life+70, right? Since we have death dates for most authors, how about
>> showing whether a book is free in these? Not much work I think.
>
> I suspect we don't have death dates for the majority of the authors in
> the system, though we probably have death dates for the authors of the
> majority of the books in the system. Moreover, you simplify the
> issue; the question is not just the death date of the author, but also
> the illustrator, author of the introduction, and possibly the editor.
> Stating whether or not a book is out of copyright outside the one
> jurisdiction we've carefully vetted it for is ill-advised, IMO.
From jeroen.mailinglist at bohol.ph Tue May 27 14:11:35 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue, 27 May 2008 23:11:35 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 46, Issue 30
In-Reply-To: <20080527054146.64A4493B61@posso.dm.unipi.it>
References: <1211809386.6590.99.camel@abetarda> <6d99d1fd0805261843l19aba7a4n5a7fcfa9f826439@mail.gmail.com> <20080527054146.64A4493B61@posso.dm.unipi.it>
Message-ID: <483C7907.8000303@bohol.ph>

Carlo Traverso wrote:
> True, but we may give the reversed information: indicate the books for
> which one of the creators is known to have died later than 50 or
> 70 years ago. This simplifies life for people wanting to investigate
> the copyright status of the works, since the answer is immediately
> "No".

I also do not agree with that approach. We need not support life+something systems in any way; just let us be agnostic about it, or keep it with the Your Copyright Mileage May Vary notice we have today. Furthermore, in many countries (the UK being an exception), it is perfectly legal to download about anything as long as it is for private use or study. In the EU, the European Treaty on Human Rights guarantees the freedom to collect information, and the Universal Declaration of Human Rights also declares access to cultural heritage an inalienable human right. (Making the UK law a violation of human rights.)

Jeroen.

From jeroen.mailinglist at bohol.ph Tue May 27 14:23:25 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue, 27 May 2008 23:23:25 +0200
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To: <1211881706.9979.65.camel@abetarda>
References: <1211881706.9979.65.camel@abetarda>
Message-ID: <483C7BCD.2070309@bohol.ph>

Júlio Reis wrote:
> That's just it. You send me the database schema, and I'll send you back
> the SQL statement to create that information. It's not complicated.

You are missing a lot of pieces of information that might be relevant when establishing the copyright status of works, including the answers to some of the following odd questions.

- What was the citizenship of the author at birth, and did he hold other citizenships during his lifetime, and when? (Einstein was born German, became Swiss in 1901, and a US citizen in 1940.) Does the fact that he retained Swiss citizenship make his works eligible for WTO restoration in the US?
- Can we apply the rule of the shorter term for Soviet authors published before 1974?
- What is the citizenship of somebody born in Gdansk in 1924?
- What is the influence of the place of first publication, and of subsequent publications in other jurisdictions?
- Are wartime extensions in place; did the author "die for France"? (France)
- Are transitional rules in place; does the author enjoy life+80? (Spain)
- Do we have to deal with the Great Ormond Street Hospital Children's Charity and the CDPA 1988, Schedule 6, Section 301? (UK)

And so on...

From Bowerbird at aol.com Tue May 27 14:28:45 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 27 May 2008 17:28:45 EDT
Subject: [gutvol-d] an inalienable human right
Message-ID:

jeroen said:
> the Universal Declaration of Human Rights also declares
> access to cultural heritage an inalienable human right

_now_ we're talking...

-bowerbird
From Bowerbird at aol.com Tue May 27 14:38:33 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 27 May 2008 17:38:33 EDT
Subject: [gutvol-d] raw to finished
Message-ID:

by the way, roger frank (rfrank) from distributed proofreaders has been working on a system to allow automated conversions from his "master" format into .txt and .html, and latex and .pdf.

roger's master is a "dot command" approach, a throwback to the earliest days of computer typesetting (e.g., roff/troff, and so on).

> http://fadedpage.com/

maybe along the line he'll realize the .txt file could be his master...

-bowerbird

From Bowerbird at aol.com Wed May 28 01:55:57 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 28 May 2008 04:55:57 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael said:
> And you refuse, after being told so many times,
> just give your own consistency a trial run from
> which to garner support.

who gives a tinker's damn about "support"? not me.

> You have put SO many works out here that I will
> have to admit I can understand those who say it
> is too much and thus won't read them.

i think you should be smarter than to repeat the "words" of others that caused them to lose all of their credibility.

i have plenty of examples up for anyone who needs 'em.

i am way past the point of doing books one at a time, or even 5 or 10, or 50 or 100 at a time. i'm figuring out how to do them in bunches of a thousand, and ten thousand...

-bowerbird

From richfield at telkomsa.net Tue May 27 12:41:21 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Tue, 27 May 2008 21:41:21 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: <483C63E1.5000703@telkomsa.net>

Al,

Thanks, David W has picked up with me again. No doubt I'll soon be back on track.

BB

> jon, please just list the filename of each graphic
> at the point where it is to be included in the text,
> and tomorrow's e-text viewers will insert it there.

Errr, yeah, sortakinda, but why can't they get the full content and format as easily from the HTM file? TXT files have all sorts of funnies when there are interjected Greek (or Arabic, or swelpme, ancient Egyptian) characters in the text. I do include txt files for PG USA because they insist, but so far only enough to satisfy the WW team. Without venturing into the firefights about tools and formats, which interest me a lot less than getting the stuff on line, readable, and available, I don't see why more than what I have described is necessary from my point of view.
Lazy of me of course, but time is a diminishing resource and so is my capacity for time sharing.

Cheers,

Jon

From schultzk at uni-trier.de Wed May 28 02:39:06 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Wed, 28 May 2008 11:39:06 +0200
Subject: [gutvol-d] raw to finished
In-Reply-To:
References:
Message-ID: <65691217-76E6-4E1D-9DB5-A49BA2A1CCBB@uni-trier.de>

Hi,

On 27.05.2008 at 23:38, Bowerbird at aol.com wrote:
> by the way, roger frank (rfrank) from distributed proofreaders
> has been working on a system to allow automated conversions
> from his "master" format into .txt and .html, and latex and .pdf.
>
> roger's master is a "dot command" approach, a throwback to the
> earliest days of computer typesetting (e.g., roff/troff, and so on).

What do you mean by the earliest days of typesetting? Are you saying DP is using outdated technology? Come on. Apple's Mac OS X is based on one of the oldest OSes around: Unix. That does not mean it is bad. From what I can see, his system works. That is all that matters. I do admit I find his formatting language not my cup of tea.

> > http://fadedpage.com/
>
> maybe along the line he'll realize the .txt file could be his master...

His .src is a txt file !!! ;-)) You know as well as I do that .txt denotes the file is not binary.

regards
Keith

From julio.reis at tintazul.com.pt Wed May 28 03:35:02 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Wed, 28 May 2008 11:35:02 +0100
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To:
References:
Message-ID: <1211970902.6698.80.camel@abetarda>

Sure, David. Whatever; your country, your project. I can find my way around PG and tell what's legal or not in my jurisdiction. And anyway, I am a 'writer' to PG, not a reader: the paper books that come my way are more than enough. When I read ebooks, I read modern-day science fiction; the old stuff at PG feels pretty boring. 'Golden age', sheesh. That's like saying the Sumerians were the golden age of writing -- which they were, but we don't really write that way anymore. Rant over.

I was just trying to increase the international appeal of PG. Just curious about what the server logs show: how much of your bandwidth is from outside the USA? Because if it's 10%, don't even bother.

Also, don't bother trying to explain why it's 'not feasible' to list one or two hundred authors for a collective work. Suffice it to say that I don't understand, but what I did understand is that listing authors has a different significance for you and for me. And I can live with that difference ;)

No one's refuted my take on the homepage. PG feels like a respectable data store, not like an exciting portal to a wealth of the written word. Not everything in PG is boring; we could make it look alive! (And if it were all boring IMHO -- one man's Yawn is another man's Yay!)

> We aren't. We won't. A collection of a hundred short stories
> or a hundred paintings or both is just not feasible to list in the
> catalog.

From prosfilaes at gmail.com Wed May 28 07:38:47 2008
From: prosfilaes at gmail.com (David Starner)
Date: Wed, 28 May 2008 10:38:47 -0400
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
In-Reply-To: <483C63E1.5000703@telkomsa.net>
References: <483C63E1.5000703@telkomsa.net>
Message-ID: <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>

On Tue, May 27, 2008 at 3:41 PM, Jon Richfield wrote:
> TXT files have all sorts of funnies
> when there are interjected Greek (or Arabic, or swelpme, ancient
> Egyptian) characters in the text.

Not if you use them right. The same Unicode goes into a text file as goes into an HTML file. Ancient Egyptian is just pictures, or ASCII transcription; you can't write Ancient Egyptian in HTML any more than you can in text.

From Bowerbird at aol.com Wed May 28 09:55:18 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 28 May 2008 12:55:18 EDT
Subject: [gutvol-d] raw to finished
Message-ID:

keith said:
> What do you mean by the earliest days of typesetting?
> Are you saying DP is using outdated technology?

i'm saying dot commands go back to the earliest days of _computer_ typesetting... just google it and you'll see that it's true.

> His .src is a txt file !!! ;-)

it's an _ascii_ file (or .utf8 if you squint at it), but a lot of the characters are _not_ "the text", so it doesn't qualify in my mind as "a text file".

however, the .txt file that it generates _is_ one, precisely since it has no superfluous characters.

-bowerbird

From Bowerbird at aol.com Wed May 28 10:14:02 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 28 May 2008 13:14:02 EDT
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID:

jon richfield said:
> Errr, yeah, sortakinda, but why can't they
> get the full content and format as easily from the HTM file?

um, you want the viewer-program for the .txt file to read the .html file to get the graphic's filename when you could have just put it right there in the .txt file? why be so convoluted?

> TXT files have all sorts of funnies when there are interjected
> Greek (or Arabic, or swelpme, ancient Egyptian) characters

a .txt file can be in .utf8 format, and show those things directly.

> Without venturing into the firefights about tools and formats,
> which interest me a lot less than getting the stuff on line,
> readable, and available,

if you really want to get stuff "on line, readable, and available", a .txt file is the most direct route of 'em all, thankyouverymuch.

and -- given the right format, and z.m.l. is just one variant -- you can also get typographic beauty out of a .txt file, as well as powerful e-book capabilities that can scale to a very large library.

so your apathy about "tools and formats" isn't serving you well... thankfully, the future won't be so myopic.

> I don't see why more than what I have described is necessary
> from my point of view.

i don't see that you have described anything at all.

i suggested a way that you could tell a .txt-file viewer-program that a graphic should be inserted in a specific spot -- by simply listing the filename of the graphic at that point. this is about the same as what .html requires. (well, a little less, as you don't have to do the bracket-and-tag song-and-dance.)
frankly, i don't see how you could do _less_ than list the filename, unless you expect the viewer-program to have some sort of e.s.p.

-bowerbird

From hart at pglaf.org Wed May 28 10:23:54 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 28 May 2008 10:23:54 -0700 (PDT)
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
In-Reply-To: <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>
References: <483C63E1.5000703@telkomsa.net> <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>
Message-ID:

On Wed, 28 May 2008, David Starner wrote:
> On Tue, May 27, 2008 at 3:41 PM, Jon Richfield wrote:
>> TXT files have all sorts of funnies
>> when there are interjected Greek (or Arabic, or swelpme, ancient
>> Egyptian) characters in the text.
>
> Not if you use them right. The same Unicode goes into a text file as
> goes into an HTML file. Ancient Egyptian is just pictures, or ASCII
> transcription; you can't write Ancient Egyptian in HTML any more than
> you can in text.

Gentlemen, please avoid the temptation to create rules based on exceptions to the general situation.

Obviously the common book contains little, if anything, from other alphabets. This is not to say NO books do, but they are exceptions and not the general case.

However, we DO want to create eBooks in all languages -- at least all 250 that have over a million speakers -- and we SHOULD create general rules for each language, rules created, I should add, by their native populations.

Why? Because I learned a bit of a handful of languages in an assortment of ways, from native speakers and not, including a funny example of someone trying to teach me my own language from the perspective of a teacher of English, but from another country and language. Most anyone who has been in such a situation can tell a number of examples of how silly these events can be.

I just hope we can provide books to the whole world.

Michael

From hart at pglaf.org Wed May 28 10:59:19 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 28 May 2008 10:59:19 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To:
References:
Message-ID:

On Wed, 28 May 2008, Bowerbird at aol.com wrote:
> michael said:
>> And you refuse, after being told so many times,
>> just give your own consistency a trial run from
>> which to garner support.
>
> who gives a tinker's damn about "support"? not me.

"Just Do It!"

If you just tell others how it should be done, you are just one more voice in the Maelstrom.

>> You have put SO many works out here that I will
>> have to admit I can understand those who say it
>> is too much and thus won't read them.
>
> i think you should be smarter than to repeat the "words"
> of others that caused them to lose all of their credibility.

Same concepts, who cares which words.

> i have plenty of examples up for anyone who needs 'em.
>
> i am way past the point of doing books one at a time, or
> even 5 or 10, or 50 or 100 at a time. i'm figuring out how
> to do them in bunches of a thousand, and ten thousand...
You never did enough books one at a time to creat the literal infrastructure NECESSARY to your doing bunches of a thousand, and ten thousands. If you had done the proper run up, just 10 here and then 100 there, after making some revisions based on your new experience from the first 10, then you WOULD NOT have an assortment of perceived impervious ROADBLOCKS, to trying to do 1,000 or 10,000. Please recall that I labeled the first 10,000 as kind of a feasibility study, which you knew of. If you had been there, doing your own feasibilies, along with that scenario, YOU could have taken over at 10,000, barring other events that might have intervened. The truth is, whenever you START YOUR PROJECT it is WISE to do this runup of feasibility studies of 1, 10, 100 or 1,000 to 10,000 items along the way. . . . NOT JUST FOR YOURSELF, BUT SO YOU CAN GET VOLUNTEERS.... You need to DEMONSTRATE what you are doing to give those people who would help you time to get on board and learn how to actually help you. If you needed no help, you wouldn't be complaining. It's no one's fault but your own that these books aren't pretty much exactly what you need. YOU STILL NEED TO WORK YOUR WAY UP FROM 10, 100, 1,000-- just so the world has time to see what you are doing and to get on board. What you SHOULD do is just gather a few volunteers, from whom you can get an equal output to your own, to start. You put out one book exactly the way you like, and walk, not run, them through doing one in the same manner. Get just 9 such volunteers and when you do ONE book then you get 10, and will have trained those 10 volunteers. Then you start on 100, using those who like those 10. You should be able to improve the whole thing each time. Then it's off to 1,000, and when you hit 10,000, a whole world will take notice. . .one way or another. Remember what the world was like before PG hit 10,000? It was just a feasibility study to them before that. Now stop TALKING and start DOING!!! People will like that much better. I will be only too glad to provide all possible support, providing you've now learned enough not to reject it. Go back to line #1 and reread until you can accept help. If you can't accept help, it's going to be a loong trip. > > -bowerbird > > > > ************** > Get trade secrets for amazing burgers. Watch "Cooking with > Tyler Florence" on AOL Food. > (http://food.aol.com/tyler-florence?video=4& > ?NCID=aolfod00030000000002) > From hart at pglaf.org Wed May 28 11:20:06 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 28 May 2008 11:20:06 -0700 (PDT) Subject: [gutvol-d] an inalienable human right In-Reply-To: References: Message-ID: On Tue, 27 May 2008, Bowerbird at aol.com wrote: > jeroen said: >> the Universal Declaration of Human Rights also declares >> access to cultural heritage an inalienable human right > > _now_ we're talking... > > -bowerbird With or without any such declarations, the right to public domain information has always been "inalienable." You can't buy someone's right to the public domain. You can't sell your right to the public domain. No one else can buy or sell these public domain rights. Well, that is until the WIPO lobbyists appear onscene. And the FUNNY part is that WIPO is part of the U.N.!!! JUST THINK. . .THE U.N. IS SPONSORING THE ELIMINATION OF THE PUBLIC DOMAIN. . . . No wonder Ted Turner gave them a billion dollars cash! I wonder how many billions he made on all those movies he bought when their copyrights were extended??? 
In real money and audience terms, the most popular movie: "Gone With The Wind"

Who owns "Gone With The Wind"??? Ted Turner. Along with tons of other movies.

"Gone With The Wind" was 1939. The original 56 years of copyright, allotted as the max at the time, expired on December 31, 1995. But it was extended, along with every other copyright, just to protect the top 10% of moneymakers. . .the rest of them were never renewed. . . .

From Bowerbird at aol.com Wed May 28 14:25:34 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 28 May 2008 17:25:34 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael, you seem to think you know what's happening on my hard-drive. the only problem is, it doesn't jibe with what _i_know_ is happening there.

> NOT JUST FOR YOURSELF, BUT SO YOU CAN GET VOLUNTEERS....

i've told you (probably a dozen times now) that i don't _need_ volunteers.

> If you needed no help, you wouldn't be complaining.

i'm not "complaining". i am telling you what's wrong with your e-library, so you can fix it. if you won't fix it, i'll have to _show_ you what's wrong, by re-creating it without the problem, which seems to be what you prefer. ok, fine, i'll do it that way. you would've been a lot smarter to just ask me to fix the problem _for_you_, as i would have been quite happy to do that. but you seem to want to insist there is nothing wrong. and you're wrong.

> Now stop TALKING and start DOING!!!

don't be an ass, michael. i have been _doing_ for a very long time now...

the "talking" part was to offer you the benefit of the lessons i have learned, a courtesy to you, instead of upstaging your e-library with a better variant. but i give up, you've made it clear you don't want to accept the gift, so ok...

understand, however, that what you have instructed me to _do_ right now is to make _your_ version of your e-library irrelevant...

-bowerbird

From marcello at perathoner.de Wed May 28 14:37:35 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 28 May 2008 23:37:35 +0200
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To: <1211970902.6698.80.camel@abetarda>
References: <1211970902.6698.80.camel@abetarda>
Message-ID: <483DD09F.6020206@perathoner.de>

Júlio Reis wrote:
> I was just trying to increase the international appeal of PG. Just
> curious about what the server logs show: how much of your bandwidth is
> from outside the USA? Because if it's 10%, don't even bother.

Ask Alexa.

> Also, don't bother trying to explain why it's 'not feasible' to list one
> or two hundred authors for a collective work.

It's perfectly doable with the current software. If you feel like it, just do it. If you don't do it, don't complain that nobody else wants to do it for you.

> No one's refuted my take on the homepage.

If I answered everybody who doesn't like the look of the site, I wouldn't do anything else. It's a wiki. Start a new front page and put it up for votes.

--
Marcello Perathoner
webmaster at gutenberg.org

From sly at victoria.tc.ca Wed May 28 15:42:20 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed, 28 May 2008 15:42:20 -0700 (PDT)
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
In-Reply-To: <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>
References: <483C63E1.5000703@telkomsa.net> <6d99d1fd0805280738m3615bb02pbf5191c2c556f175@mail.gmail.com>
Message-ID:

On Wed, 28 May 2008, David Starner wrote:
> On Tue, May 27, 2008 at 3:41 PM, Jon Richfield wrote:
>> TXT files have all sorts of funnies
>> when there are interjected Greek (or Arabic, or swelpme, ancient
>> Egyptian) characters in the text.
>
> Not if you use them right. The same Unicode goes into a text file as
> goes into an HTML file. Ancient Egyptian is just pictures, or ASCII
> transcription; you can't write Ancient Egyptian in HTML any more than
> you can in text.

Just to be my annoying self, I can't resist responding...

According to: http://www.egpz.com/resources/unicode.htm

> The 1063 hieroglyphs given in N3237: Proposal to encode Egyptian
> Hieroglyphs in the SMP of the UCS were accepted for encoding in the
> Universal Character Set by the ISO WG2 meeting in April 2007. This set is
> based on the works of Alan Gardiner, the majority of which hieroglyphs are
> given in his work Egyptian Grammar. Some minor changes were made to the
> initial proposal and the set accepted by WG2 in first national ballot on
> ISO 10646 PDAM5 (September 2007) now numbers 1071 (see N3349: Summary of
> repertoire for FPDAM 5 of ISO/IEC 10646:2003 and future amendments). A
> second national ballot on PDAM5 is expected April 2008 at which time the
> set is likely to be fixed. All being well, Basic Egyptian Hieroglyphs
> will then be released with Unicode 5.2 (expected 2009/10).

From schultzk at uni-trier.de Thu May 29 00:43:12 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 29 May 2008 09:43:12 +0200
Subject: [gutvol-d] raw to finished
In-Reply-To:
References:
Message-ID: <0655CC3A-9FB7-4134-A01B-9E3DFFE71A62@uni-trier.de>

On 28.05.2008 at 18:55, Bowerbird at aol.com wrote:
> keith said:
>> What do you mean by the earliest days of typesetting?
>> Are you saying DP is using outdated technology?
>
> i'm saying dot commands go back to the
> earliest days of _computer_ typesetting...
> just google it and you'll see that it's true.

I know how old this type of formatting is. It simply does not mean it is not useful or efficient, though I do admit it is ugly and hard to work with, without tools.

>> His .src is a txt file !!! ;-)
>
> it's an _ascii_ file (or .utf8 if you squint at it),
> but a lot of the characters are _not_ "the text",
> so it doesn't qualify in my mind as "a text file".

Just a matter of definition.

> however, the .txt file that it generates _is_ one,
> precisely since it has no superfluous characters.

Exactly what I said above. It works and gets the job done! So why knock it.

regards
Keith

From schultzk at uni-trier.de Thu May 29 00:36:49 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 29 May 2008 09:36:49 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
In-Reply-To: <483C63E1.5000703@telkomsa.net>
References: <483C63E1.5000703@telkomsa.net>
Message-ID:

Hi Jon,

By TXT files I assume you are talking about standard ASCII. With Greek I think 8-bit encoding would be fine. As far as the others are concerned, I believe UTF will do -- especially ancient Egyptian. The cases you mentioned are hard to fit into PG-Proper.
There is no simple solution.

regards
Keith

From hart at pglaf.org Thu May 29 10:25:58 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 29 May 2008 10:25:58 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To:
References:
Message-ID:

On Wed, 28 May 2008, Bowerbird at aol.com wrote:
> michael, you seem to think you know what's happening on my
> hard-drive.
>
> the only problem is, it doesn't jibe with what _i_know_ is
> happening there.

I'm only talking about what I know works, and it is obviously NOT working for you. Time for YOU to change. . . . One way or the other. . . .

>> NOT JUST FOR YOURSELF, BUT SO YOU CAN GET VOLUNTEERS....
>
> i've told you (probably a dozen times now) that i don't _need_
> volunteers.

So you SAY, but without them you are not getting anywhere, which is why you complain the world doesn't fit perfectly with your computers. You want to substitute computers for volunteers, and you keep finding out that doesn't work. Then you complain that reality is not as perfect as your desired virtual reality.

Stop complaining and DO THE WORK TO CHANGE REALITY!!! Or just shut up, if you can't. No one needs to hear you keep repeating complaints. You know, there is a word for complaining and not doing anything about it, but that word is not used in polite conversations in our societies.

>> If you needed no help, you wouldn't be complaining.
>
> i'm not "complaining".

Yes, you ARE complaining, just like all the others who tell us how they could run PG better than anyone else.

> i am telling you what's wrong with
> your e-library, so you can fix it.

THAT is complaining. How many times must we tell you to LEAD, so others could possibly FOLLOW, if you haven't already alienated them all??? You could have done ALL this and been DONE with it, if you had started with DOING rather than COMPLAINING!!!

> if you won't fix it, i'll have to
> _show_ you what's wrong, by re-creating it without the problem,
> which seems to be what you prefer.

Are you saying you finally get the idea? YES!!! FOR THE UMPTEENTH TIME. . .YES!!! JUST DO IT!!!

> ok, fine, i'll do it that way. you would've been a lot smarter to
> just ask me to fix the problem _for_you_, as i would have been
> quite happy to do that.

That's EXACTLY what JUST DO IT means!!! JUST DO IT!!! What could be more simple??? Go! Go!! Go!!!
> but you seem to want to insist there is nothing wrong. and you're
> wrong.

There is plenty that can be improved. . .GO FOR IT!!!

>> Now stop TALKING and start DOING!!!
>
> don't be an ass, michael. i have been _doing_ for a very long
> time now...

As above. . . .

> the "talking" part was to offer you the benefit of the lessons i
> have learned, a courtesy to you, instead of upstaging your
> e-library with a better variant.

Not everyone considers improvement as being upstaged. . . . You sound as if you want to pretend your improvements would not be welcome. . .perhaps by some you have alienated, but not by those who count.

> but i give up, you've made it clear you don't want to accept the
> gift, so ok...

Ah, so your gift is already retracted in the same message it was offered??? Who would believe it???

Read back and you will see that I have always been sincere about you DOING what you SAID. . .RIGHT HERE!!!

> understand, however, that what you have instructed me to _do_
> right now is to make _your_ version of your e-library
> irrelevant...

Are you insisting on the pretense that your version is not welcome here after all the !!!YEARS!!! I have spent encouraging you to JUST DO IT. . .RIGHT HERE??? Again, who would believe it???

> -bowerbird

From Bowerbird at aol.com Thu May 29 11:43:57 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 29 May 2008 14:43:57 EDT
Subject: [gutvol-d] raw to finished
Message-ID:

keith said:
> So why knock it.

i wasn't "knocking" it. saying something is "old" doesn't mean you're criticizing it. it's simply a description.

-bowerbird

From Bowerbird at aol.com Thu May 29 12:18:05 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 29 May 2008 15:18:05 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael said:
> So you SAY, but without them you are not getting anywhere

and here again you seem to know how my progress is going. but your report is the exact opposite of my own experience... i'm making excellent progress, my friend, excellent progress.

> You want to substitute computers for volunteers,
> and you keep finding out that doesn't work.

michael, you're wrong -- very badly wrong -- on both counts.

> No one needs to hear you keep repeating complaints.

i wonder why you have this "need" to keep trying to spin my posts as "complaints". they're constructive criticism, michael. if you listened, and acted, your e-library would be improved...

> Ah, so your gift is already retracted
> in the same message it was offered???

the question is whether i improved your e-library in-place, or set up another version -- my e-library -- independently...

you have been telling me all along to set up my own version. i've been reluctant to do that, because it _will_ upstage you...

your library has been held back because programmers can't access the underlying structure of the e-texts to add value...

but you've won, michael. i am no longer reluctant to do that. congratulations.
> Are you insisting on the pretense that your version is
> not welcome here after all the !!!YEARS!!! I have spent
> encouraging you to JUST DO IT. . .RIGHT HERE???

i can't even get well-documented errors corrected in your e-texts. why would i have any reason to believe i could do the large-scale editing necessary to bring those e-texts to a state of consistency?

i'll mount my vision... when you see how programmers flock to it, you will come to grasp fully and precisely what i have been saying.

i'm really sorry it had to come to this... but i see i have no choice...

-bowerbird

From Bowerbird at aol.com Thu May 29 12:30:39 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 29 May 2008 15:30:39 EDT
Subject: [gutvol-d] cleaning up the catalog
Message-ID:

joyce said:
> It looks to me like your data comes from the etext file headers,
> and not from the bibliographic records in the catalog.

i do believe you're correct about that... it has been well over a year since i did that work, and i'd forgotten. and let me also add a quick "thank you" for looking at my output. it's heartening to know that _someone_ cares enough to _look_...

> The list includes several titles by Edith Van Dyne. In the catalog,
> Edith Van Dyne exists only as a pseudonym for L. Frank Baum. Baum's
> name is the one attached to the works in the bibliographic records.

and here -- joyce -- is where i diverge sharply from meta-data people. because when i look at the e-texts themselves, there is _no_mention_ at all in them of baum's name, only this "edith van dyne" pseudonym... (well, in one of them, "van dyne" lists "baum" along with a number of other well-respected authors, which must've seemed like a fine joke.)

yet in the r.d.f. catalog, it's "baum" only, with no mention of "van dyne". when you search the catalog on the website, it _mentions_ "van dyne", but not in the specific context of the exact e-texts authored as such...

so there's this huge gulf between the _actual_data_ and the "metadata". one says one thing, the other says another, and i can't countenance that. that's why i decided to scrap the catalog, and depend on the headers...

now of course, i'd love to have the information about _both_ names in _both_ places -- the actual e-texts and the catalog -- but in the absence of congruence between the two, i'll settle on the "real" data.

i want each of my "data" files to contain _all_ the information that appears about them in the "metadata" of the catalog, so the catalog can be regenerated at will just by combing through the "data" files...

> Not to claim that there are no inconsistencies in the catalog,
> but from a cataloger's point of view it's not reasonable to
> tally every file header inconsistency as a catalog problem.

perhaps from your cataloger's point of view, your catalog is correct, and the shortage of info in the "file header" is a mere "inconsistency". but from my standpoint of _integrating_ the catalog with the e-texts, as a mere guide to the library that's the accumulation of those e-texts, the _discrepancy_ between the two is an unacceptable burden to bear.

i'm not suggesting that it's _your_ problem.
(although it kind of is, in the sense that i believe the catalog should reflect the pseudonym; it can have the "real" name too, but it _should_ have the pseudonym.)

i believe that the e-texts _and_ the catalog should reflect both names. and, of course, i'll do this fix to both sides of the equation in my work. but as we both know, this particular situation has a lot of manifestations. (the book from gweeks on which i made an error-report recently had it.)

in my humble opinion, and the way that i'll surely structure my workflow, any change made to the catalog should be mirrored in the e-text itself...

-bowerbird

p.s. and of course when i talked about "cleaning up the catalog", i was _not_ talking about "cleaning" on a _factual_ basis, such as in this case. instead, i was talking about making all the catalog entries _consistent_. i appreciate greatly the work of the cataloging team to make the catalog more _accurate_, but my concern at present is merely making it _useful_. inconsistent entries make it difficult for people to know how to search it.

From julio.reis at tintazul.com.pt Thu May 29 14:03:20 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Thu, 29 May 2008 22:03:20 +0100
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To:
References:
Message-ID: <1212095000.6554.148.camel@abetarda>

Marcello, I realise I've touched some kind of wound. I didn't mean to do harm. (Quoting Ruth Rendell: "One could say he meant well, if it weren't such a bad thing to say about someone.")

> Ask Alexa.

She says "USA 37%, Germany 6%" etc. etc.

> It's perfectly doable with the current software. If you feel
> like it, just do it. If you don't do it, don't complain that
> nobody else wants to do it for you.

I'm doing all the corrections I can think of. I am not complaining.

>> No one's refuted my take on the homepage.
>
> If I answered everybody who doesn't like the look of the site,
> I wouldn't do anything else.

Believe me, I had no idea. I've been on this list for only a few months.

> It's a wiki. Start a new front page and put it up for votes.

No, man. I'll take it the way it is. I am too busy doing books at DP. Plus, randomly selecting from a list of choice items isn't something that can be done with regular MediaWiki syntax; it needs some programming.

Júlio.

From hart at pglaf.org Thu May 29 14:46:54 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 29 May 2008 14:46:54 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To:
References:
Message-ID:

On Thu, 29 May 2008, Bowerbird at aol.com wrote:
> michael said:
>> So you SAY, but without them you are not getting anywhere
>
> and here again you seem to know how my progress is going.
> but your report is the exact opposite of my own experience...
>
> i'm making excellent progress, my friend, excellent progress.

If you really are, then why all the complaining??? Oh. . .right. . .that's not really complaining, is it???

>> You want to substitute computers for volunteers,
>> and you keep finding out that doesn't work.
>
> michael, you're wrong -- very badly wrong -- on both counts.
>> No one needs to hear you keep repeating complaints.
>
> i wonder why you have this "need" to keep trying to spin my
> posts as "complaints". they're constructive criticism, michael.
> if you listened, and acted, your e-library would be improved...

If you don't ACT on them, they are only complaints. . . . How many years does it take for that to get through???

>> Ah, so your gift is already retracted
>> in the same message it was offered???
>
> the question is whether i improved your e-library in-place,
> or set up another version -- my e-library -- independently...

Personally, _I_ would suggest doing both. You might just be more effective that way.

> you have been telling me all along to set up my own version.

Yes, I have.

> i've been reluctant to do that, because it _will_ upstage you...

Again, some of us don't feel improvement means being upstaged. I would LOVE for YOU, or anyone else, to make ALL my work part of a past that is now just history, just the foundation for a new present that is a totally new dimension.

> your library has been held back because programmers can't
> access the underlying structure of the e-texts to add value...

And THIS is a complaint. . . . I've been encouraging you to fix it all along. . . .

> but you've won, michael. i am no longer reluctant to do that.
> congratulations.

I hope it works out well for all concerned.

>> Are you insisting on the pretense that your version is
>> not welcome here after all the !!!YEARS!!! I have spent
>> encouraging you to JUST DO IT. . .RIGHT HERE???
>
> i can't even get well-documented errors corrected in your e-texts.
> why would i have any reason to believe i could do the large-scale
> editing necessary to bring those e-texts to a state of consistency?

Have you ever concerned yourself with how hard it is for ME to get errors fixed??? Have you ever concerned yourself that you might have alienated those who could fix these errors for you, for me, for us, for a whole world???

> i'll mount my vision... when you see how programmers flock to it,
> you will come to grasp fully and precisely what i have been saying.

I hope you get everything running the way you hope and want, or can make adjustments to make it come out something like that.

> i'm really sorry it had to come to this... but i see i have no
> choice...

I still say this is a pretence on your part. YOU are creating some fictional rift that YOU need to separate yourself from me. This has never been necessary. You have been, and still are, welcome to come and go as you please. I just hope you don't try to build a wall where none exists.

Do you really feel such a need to PRETEND there is such a rift? If so, just remember that this rift is a figment of yours. I doubt anyone else here will care, but I encourage you, and I hope your cup runneth over with success, and that you will share it with the world, even with me, or with Project Gutenberg, should I be gone by the time you get your big success.

Michael

From Bowerbird at aol.com Thu May 29 16:11:29 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 29 May 2008 19:11:29 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael-

there's no pretense, no rift, and no wall.
there's just me saying that, as a programmer who has attempted to add value to the project gutenberg library, i was frustrated by massive inconsistencies in the e-texts, to the point that i was unable to accomplish what i wanted.

further, as an observer of _other_ programmers who have had the exact same experience, i felt that i should tell you.

some of these programmers experienced the inconsistencies many years ago -- like ron burkey -- and others experienced them a few years back -- like the o.l.p.c. person -- and others experienced them only recently -- like list-subscriber legutierr.

and who knows how many others tried, failed, and left silently?

i have no bone to pick with you. only with the inconsistencies. you're a swell fellow. i haven't said _anything_ bad about you.

i'm not sure why now you think it's ok to pick on me personally, especially since you've taken up the "all complaining, no action" line that caused my antagonists here to lose all their credibility. but you're a swell guy. so maybe it's just a misunderstanding...

but whatever... it doesn't matter, because flames don't hurt me.

it's clear it's time for me to concede "defeat", though, since p.g. isn't ever going to become a consistent library, so i will have to set up an independent mirror to show the value of consistency.

because this programmer is getting antsy to show his chops...

otherwise the world will think that noring's "pagelets" are the best thing the cyberlibrary of tomorrow will offer to people...

-bowerbird

From richfield at telkomsa.net Fri May 30 00:28:03 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Fri, 30 May 2008 09:28:03 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: <483FAC83.1050600@telkomsa.net>

BB said:
>> Errr, yeah, sortakinda, but why can't they
>> get the full content and format as easily from the HTM file?
>
> um, you want the viewer-program for the .txt file to
> read the .html file to get the graphic's filename when
> you could have just put it right there in the .txt file?
> why be so convoluted?

How is this more convoluted than getting it from specifically coded TXT? You might claim that markup is generally more complex than txt, but that is more than offset by the information a halfway decent ML can carry about format and encoding that the txt cannot. Note that the same goes for encoding of alien alphabets, scripts and typefaces, not to mention graphics (David S. please note!). The advantage of a widely used ML is that the format is defined, and therefore it is largely easier to automate its interpretation and conversion than in the case of TXT.

For example, time is becoming too precious for me to spend it on learning new non-automated and non-universal conventions, so when I want to scan a book with multiple necessary graphics, I use the Omnipage that came with my scanner (I may soon try another package for Linux) for OCR. I export the product into Word or Open Source Writer for formatting. The reason is the repertoire of macros that I have accumulated and that suit my typing style.
The finished product I convert to HTML, and I use TIDY to make it fit for decent company. This process deals with everything from formatting to spelling. To produce TXT is an extra overhead, particularly with the PG requirement of breaking lines at the window edge. In fact it frequently takes me more effort than the HTML does (though I grant that it reveals more errors, of course).

Now, when Omnipage or some similar product will produce a universally acceptable and versatile format, that is what I may use if it costs only a few toes, but meanwhile, being a user, and not a producer and designer of MLs and ML automation tools, I simply wag my great long furry ears. Let me know when the squabbling at the other end is done. I don't have the time for it nowadays.

Cheers,

Jon

From Bowerbird at aol.com Fri May 30 02:19:07 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 May 2008 05:19:07 EDT
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID:

jon richfield said:
> I simply wag my great long furry ears.

i'm sorry. i thought you were asking a question.

i thought you wanted to know how people who were using the .txt file could have the benefit of the graphics files too.

i suggested an easy answer -- include the name of each file at the point where that graphic was supposed to be shown... if nothing else, the user can chase down the file manually, with the advantage that he knows what filename to search for...

now -- unless i misunderstand -- you're saying this is all far too complicated. so i no longer know how to respond.

but i guess it doesn't matter, because now it appears that you haven't been asking any question all along anyway...

-bowerbird

From marcello at perathoner.de Fri May 30 05:30:05 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 30 May 2008 14:30:05 +0200
Subject: [gutvol-d] Dates of death and copyright status outside the US
In-Reply-To: <1212095000.6554.148.camel@abetarda>
References: <1212095000.6554.148.camel@abetarda>
Message-ID: <483FF34D.9010805@perathoner.de>

Júlio Reis wrote:
> No, man. I'll take it the way it is. I am too busy doing books at DP.
> Plus, randomly selecting from a list of choice items isn't something that
> can be done with regular MediaWiki syntax; it needs some programming.

MediaWiki plugins are not that hard to write. You just write some mock-up, and if people like it, we'll turn it live.

--
Marcello Perathoner
webmaster at gutenberg.org

From Bowerbird at aol.com Fri May 30 10:40:13 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 May 2008 13:40:13 EDT
Subject: [gutvol-d] antsy
Message-ID:

i said:
> because this programmer is getting antsy to show his chops...

hey, that's a good one, i hope marcello puts it on his fan-site for me. :+)

-bowerbird
From hart at pglaf.org Fri May 30 11:16:43 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 30 May 2008 11:16:43 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To:
References:
Message-ID:

I would certainly like to mirror your mirror, if that's ok.

mh

From Bowerbird at aol.com Fri May 30 12:12:59 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 May 2008 15:12:59 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID:

michael said:
> I would certainly like to mirror your mirror, if that's ok.

you know that public-domain means you don't need my ok. plus i'd consider it to be an honor if you did that.

-bowerbird

From richfield at telkomsa.net Fri May 30 13:19:17 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Fri, 30 May 2008 22:19:17 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: <48406145.6070909@telkomsa.net>

BB

> i'm sorry. i thought you were asking a question.
> i thought you wanted to know how people who were using
> the .txt file could have the benefit of the graphics files too.

Thanks for your trouble, though I am slightly nonplussed at the idea of such a naive question.
Nor can I see what I said that looked like such a question. However, I sincerely thank you for your trouble. I have made similar slip-ups myself from time to too frequent time.

> i suggested an easy answer -- include the name of each file
> at the point where that graphic was supposed to be shown...
> if nothing else, the user can chase down the file manually,
> with the advantage he knows what filename to search for...

I fail to see how that is easier than reading a ML file with the material all in place. It certainly is no easier to prepare. I don't know how many of our users go for the txt files for lack of GUI browsers. If they do, then I don't see what they gain by being able to read the names of the graphics files. Without the means to present the graphics, they might as well either do without, or make do with the unembellished txt and as much imagination as they can muster. In either case, why bother with the TXT references? This is no rhetorical question; is there any incentive that I have overlooked?

> now -- unless i misunderstand -- you're saying this is all
> far too complicated. so i no longer know how to respond.

Complicated? How did complication get into this? This must be getting too complicated for me. I simply expressed a preference for maximal exploitation of available cheap and convenient tools until something better came along. If txt plus pointers to user-accessible insertions were the best thing, then sure, I'd use them; however, since neolithic flakes now are available to affix to the arrows, I see no reason to continue hand-charring the wooden points to fire-harden them. Roll on machine tools! (No matter who designs them!)

> but i guess it doesn't matter, because now it appears that
> you haven't been asking any question all along anyway...

Oh, I asked plenty. That just happened not to be it, and essentially all that I asked (except for the German script thing) elicited helpful responses. But as I said, thanks anyway, and any abrasion from my quarter is inadvertent.

Cheers,

Jon

From Bowerbird at aol.com  Sat May 31 00:10:32 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 31 May 2008 03:10:32 EDT
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: 

jon richfield said:
> Nor can I see what I said that looked like such a question.

oh, ok, i'm happy to clear that up. you said this:

> I do usually supply TXT versions as well, but then it is
> up to the text-zealots to do as they please about missing pictures.

i see now, since you have clarified yourself for me, that you are basically giving the finger to us "text-zealots", informing us that you have no intention of doing a thing to make it easy for us to deal with those "missing" pictures.

i'm seeing this attitude -- a desire to "cripple" the .txt files -- more and more, especially from quite a few people over at d.p.

here's a good example -- the story of pocahontas:
> http://www.gutenberg.org/files/24487/24487.txt

it has lots of stuff like this:

> She was a child of nature, and the birds trusted her
> and came at her call. She knew their songs, and
> where they built their nests. So she roamed the woods,
> and learned the ways of all the wild things, and
> grew to be a care-free maiden.
>
> [Illustration]

notice that "[illustration]" notation? really something, isn't it? what good does it do? tells the user they're missing a picture. doesn't do a single thing to tell them _where_ they can find it.

even if they were to navigate to the folder where the pictures are:
> http://www.gutenberg.org/files/24487/24487-h/images/
it doesn't tell them the _name_ of the file containing that picture.

by looking at the .html version, i can tell you that the picture is here:
> http://www.gutenberg.org/files/24487/24487-h/images/i005-1.jpg

is it that hard to imagine that if you put that u.r.l. in the .txt file, a .txt-file viewer-program could fetch it from there and display it, right there at that spot in the text? it isn't that hard for _me_ to imagine.
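in fact, just to show how small the job is, here's a back-of-the-napkin sketch of the finding-the-picture half of such a viewer -- python, untested, and note that the [Illustration: filename] marker convention is purely my hypothetical, since today's .txt files carry no filename at all:

# sketch: scan a p.g. text for [Illustration: file-or-url] markers
# and report where each picture lives, so a viewer could draw it
# right there in the text. the marker-with-filename is hypothetical.
import os
import re

MARKER = re.compile(r'\[Illustration:\s*([^\]\s]+)\s*\]', re.IGNORECASE)

def illustrations(txt_path):
    # yield (line_number, image_location) for every marker found
    folder = os.path.dirname(os.path.abspath(txt_path))
    with open(txt_path, encoding='utf-8', errors='replace') as f:
        for num, line in enumerate(f, 1):
            for name in MARKER.findall(line):
                if name.startswith('http'):
                    yield num, name
                else:
                    # bare filenames resolve next to the .txt itself
                    yield num, os.path.join(folder, name)

for num, where in illustrations('24487.txt'):
    print(num, where)  # a real viewer would load and display it here

(run it beside a downloaded 24487.txt and its images folder, obviously.) that's the whole "impossible" part -- twenty-odd lines, with change left over...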
so the person who prepared this book could've added a lot of value to the .txt file by including that information. and make no mistake, they _had_ that info, since they needed it to make the .html version. but they _took_an_extra_step_of_work_ to discard it from the .txt file.

so -- jon -- since you've said you were "slightly nonplussed at the idea of such a naive question", let _me_ ask you an extremely naive question: why in the world is this significant information deliberately discarded?

***

> I fail to see how that is easier than
> reading a ML file with the material all in place.

well, i could go in any number of directions to reply to this...

first of all, some people don't particularly enjoy the web-browser as a book-reading environment; it's not tailored to that purpose.

second, "reading" isn't the only thing people will do with these files. they will be remixing them, and we make that unnecessarily difficult whenever we fail to include all of the relevant information in the file... (by "remixing", i mean all kinds of conversions and file re-workings, some of which we haven't even imagined, but which will surely come.)
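one small example of the remixing i mean, under the same hypothetical [Illustration: filename] convention -- turning those .txt markers into honest .html image tags is a single substitution (python again, untested):

# sketch: remix [Illustration: file] markers into html img tags.
# only works if the filename made it into the .txt, of course...
import re

def markers_to_html(text):
    # swap each marker for an img tag pointing at the same file
    return re.sub(r'\[Illustration:\s*([^\]\s]+)\s*\]',
                  r'<img src="\1" alt="illustration"/>',
                  text)

print(markers_to_html('[Illustration: images/i005-1.jpg]'))

but strip the filename out -- like the pocahontas book did -- and no amount of cleverness downstream can put it back.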
> I don't know how many of our users go for the txt files for
> lack of GUI browsers. If they do, then I don't see what they
> gain by being able to read the names of the graphics files.

again, you're failing to anticipate the wide range of uses to which these files could be put, if only we prepare them correctly...

at any rate, i'm not sure whether you think it's too difficult or not to put the filenames into the .txt files at their appropriate places... if not, then i suggest you might entertain the thought of doing so... but if it is, fine, don't trouble yourself, since i will be fixing all these omissions when i mirror the library. either way, have a nice weekend.

-bowerbird

**************
Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food.
(http://food.aol.com/tyler-florence?video=4&
?NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080531/08efa6b5/attachment.htm

From joshua at hutchinson.net  Sat May 31 05:41:41 2008
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat, 31 May 2008 12:41:41 +0000 (GMT)
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB
Message-ID: <1808933943.322341212237701607.JavaMail.mail@webmail02>

It isn't that it's hard to imagine. It's that there isn't a single viewer program out there that DOES this (and no, your vapor-ware promises don't count). Nor, in the many years I've been around here, has there been a single announcement that such a product was in development. Or heck, even in planning.

So putting a feature into the .txt file that

1. Has no spec on how to do it in a standard way, AND
2. Has zero support if there WAS a standard method ...

Well, that makes very little sense.

Josh

On May 31, 2008, Bowerbird at aol.com wrote:

is it that hard to imagine that if you put that u.r.l. in the .txt file, a .txt-file viewer-program could fetch it from there and display it, right there at that spot in the text?