From Bowerbird at aol.com  Sun Jun  1 12:03:46 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 1 Jun 2008 15:03:46 EDT
Subject: [gutvol-d] good tools that do all the work (the short version)
Message-ID: <d2c.2c82ea64.35744c92@aol.com>

i've written a longer version of this post,
which i might or might not send later, but
here's the short version.

***

jon richfield does make some very good points about
wanting to have good tools that do all the work -- or
at least as much of it that it is possible for tools to do.

jon notes that he gets .html output from his o.c.r.-app.

the problem with that .html is that you can't maintain it.
even maintaining one such file can be a chore, but when
you must maintain tens of thousands, it gets impossible.

likewise with all the hand-crafted .html coming from d.p.
it takes far too long to determine the unique fingerprint
of each one, so you'll know what you need to do to fix it.

you can be sure the world will "progress" to _something_
different -- be it .html6, xhtml3, the "semantic web" or
something we haven't even anticipated at this early date
-- and when it does, the p.g. .html files will be abandoned.

it'll be easier to "start from scratch" (i.e., from the .txt files,
or maybe even google's o.c.r.) than to convert that .html...

_creation_ is just the first step.   _maintenance_ is long-term.

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4&amp;
?NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080601/e7e7a6d3/attachment.htm 

From wainwright1000 at gmail.com  Mon Jun  2 01:57:57 2008
From: wainwright1000 at gmail.com (Andrew Wainwright)
Date: Mon, 2 Jun 2008 10:57:57 +0200
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al
	and BB
Message-ID: <927a0ec40806020157u3d78cd04pc048133d7a4fd5ee@mail.gmail.com>

 On May 31, 2008, Bowerbird at aol.com wrote:

> here's a good example -- the story of pocahontas:
> >   http://www.gutenberg.org/files/24487/24487.txt
>
> it has lots of stuff like this:
> >   She was a child of nature, and the birds trusted her
> >   and came at her call. She knew their songs, and
> >   where they built their nests. So she roamed the woods,
> >   and learned the ways of all the wild things, and
> >   grew to be a care-free maiden.
> >
> >   [Illustration]
>
> notice that "[illustration]" notation?  really something, isn't it?
> what good does it do?  tells the user they're missing a picture.
> doesn't do a single thing to tell them _where_ they can find it.
>
> even if they were to navigate to the folder where the pictures are:
> >   http://www.gutenberg.org/files/24487/24487-h/images/
> it doesn't tell them the _name_ of the file containing that picture.
>
> by looking at the .html version, i can tell you that the picture is here:
> >   http://www.gutenberg.org/files/24487/24487-h/images/i005-1.jpg
>
> is it that hard to imagine that if you put that u.r.l. in the .txt file,
> a .txt-file viewer-program could fetch it from there and display it,
> right there at that spot in the text?
>
> it isn't that hard for _me_ to imagine.
>
> so the person who prepared this book could've added a lot of value
> to the .txt file by including that information.  and make no mistake,
> they _had_ that info, since they needed it to make the .html version.
> but they _took_an_extra_step_of_work_ to discard it from the .txt file.
>

As someone who produces ebooks similar in structure to the one mentioned,
could you please let me know how I find the eBook number, before I produce
the ebook?  (It's always part of illustration URLs.)  As far as I know, the
ebook number is only assigned at whitewashing time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/e2c3e9e0/attachment.htm 

From prosfilaes at gmail.com  Mon Jun  2 03:37:10 2008
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 2 Jun 2008 06:37:10 -0400
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al
	and BB
In-Reply-To: <c5b.3245b7aa.357253e8@aol.com>
References: <c5b.3245b7aa.357253e8@aol.com>
Message-ID: <6d99d1fd0806020337m28c991c0hbdc5dd11b41ad744@mail.gmail.com>

On Sat, May 31, 2008 at 3:10 AM,  <Bowerbird at aol.com> wrote:
>>   [Illustration]
>
> notice that "[illustration]" notation?

There's no [illustration] notation there.

> really something, isn't it?
> what good does it do?  tells the user they're missing a picture.
> doesn't do a single thing to tell them _where_ they can find it.

True. In most cases it should probably be omitted.

> is it that hard to imagine that if you put that u.r.l. in the .txt file,
> a .txt-file viewer-program could fetch it from there and display it,
> right there at that spot in the text?

No plain text viewer could do it, because then it would no longer be a
plain text viewer. Furthermore, there's no real need to worry about
what can be imagined; we should worry about what's useful to most of
our users, who aren't and won't be using such a tool.

> second, "reading" isn't the only thing people will do with these files.
> they will be remixing them, and we make that unnecessarily difficult
> whenever we fail to include all of the relevant information in the file...

No, for many purposes, if you include extraneous data in the file, it
makes it harder to remix. If you want a corpus, you want all that
image information in the trash, not messing up your corpus.

From Bowerbird at aol.com  Mon Jun  2 10:24:04 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 2 Jun 2008 13:24:04 EDT
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al
	and BB
Message-ID: <bc6.24342438.357586b4@aol.com>

andrew said:
>   As someone who produces ebooks similar in structure to the one 
>    mentioned, could you please let me know how I find the eBook number, 
>    before I produce the ebook?? (It's always part of illustration URLs.)? 
>    As far as I know, the ebook number is only assigned at whitewashing 
time.

yeah, that's kind of silly how they keep you in the dark, isn't it?
at the very least, they should let you use a "placeholder" string
that would be automatically converted to the e-text number...

at any rate, since the images for any particular e-text _are_ in a
specific location, you don't need to put the full u.r.l. in the e-text;
just put the filename.   oh, and thanks for asking a good question...

the assumption my viewer-program makes is that any images are
in the same folder as the .txt file, but it's also smart enough to look
in a subfolder named "images" if there happens to be one of those...
(this rule applies whether the e-text is located on the web or offline.)

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4&amp;
?NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/20fde545/attachment.htm 

From richfield at telkomsa.net  Mon Jun  2 06:06:03 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Mon, 02 Jun 2008 15:06:03 +0200
Subject: [gutvol-d] Concerning the finger and those Gothic digressions...
Message-ID: <4843F03B.30904@telkomsa.net>

BB, you have been BBing too long. 
It clearly has been eroding your mind into too-well-worn channels. 
When the only thing that happens when one speaks is that one's fingers 
get squashed, every topic looks like a hammer. 
I wasn't giving anyone any fingers, much less any ells, elbows or arms.  
The topic in question is of tenuous interest to me, whatever its 
intrinsic merit. 
If I had no text input but HTML, and nothing but text readers and a 
graphics displayer to look at the pics with, it would be a minor problem 
to turn it into reasonably useful TXT plus graphic files, so I am not 
terribly sensitive to the needs of more vulnerable parties.  Mea maxima 
culpa of course.  But giving people TXT plus pic references seems barely 
more useful than giving them HTML.  You have of course inspected HTML 
source?  Give or take a few tags, and ignoring really elaborate 
formatting like tables, multiple columns, indenting and so on, it looks 
almost suspiciously like text, doesn't it?    OTOH, if a user cannot 
handle graphic formats, then it hardly matters whether the graphics 
accompany TXT or HTML.  If I condemned everyone to obscure PDF source 
instead, I might feel a greater sense of guilt.  Or am I missing summat? 

Being, as I am, an unfrocked biologist, I am not much moved to join in 
the merry romp of partisans for rival formats and their associated 
software, obvious though the eventual value of a universally homologated 
and accepted superior notation may be.  Analogously, I am a keen 
reformer of spelling and English: in principle I out-Shaw Shaw any day,  
but in practice I go through the world rapping the knuckles of 
perpetrators of errors and infelicities in terms of Onions and Fowlers 
and I writhe I whenever I catch myself in similar sin.  You see, until 
everyone learns how much better it would be to listen to me, all reform 
is futile, so it is better to nurture and conserve such merits as the 
language retains for as long as infer does not mean imply, any more than 
if means whether. 

Similarly, when the hurly-burly's done, when all have accepted a nice 
new convention, feel welcome to wake me with a kiss after the requisite 
hundred years. 

All I was saying was that a particular notation is convenient to me, 
both for reading and preparation, and is widely usable for others. 

Silly of me of course, but what is new about that? 

All the best,

Jon


From Bowerbird at aol.com  Mon Jun  2 10:46:46 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 2 Jun 2008 13:46:46 EDT
Subject: [gutvol-d] one month of nothing been done on a roundless system for
	d.p.
Message-ID: <cce.300513e2.35758c06@aol.com>

it's been a month since piggy updated the "confidence in page" wiki
over at d.p.   i understand he got a new job that's keeping him busy...

meanwhile, rfrank has been continuing with his experiments, but
he seems to work a bit more privately.   and that's fine, but it's also
easy to get the impression that no one at d.p. really _cares_ about
implementing a roundless system.   (rfrank seems to be hung up on
_parallel_proofing_, and it doesn't seem to me that he's considered
the extra costs involved therein.   as usually, p1 proofers are seen as
plentiful, so there's little effort made to conserve them as a resource.)

rfrank also does programming, so there's a very real chance that he
will be guided by the data to learn the importance of preprocessing,
so i'm optimistic about that, because then d.p. will get worthy tools...

but again, it's quite sad "the powers that be" don't support this better.

meanwhile, much time and energy is being spent _circumventing_
the present workflow -- what with p1->p1 becoming very popular,
and f1->f1 experiments being done now -- which is a total waste,
because if that time and energy were being spent on _roundless_
experimentation, it could be leading to a _much_ better outcome...
("skips" in the workflow mean some pages get too little attention.)

as usual, their focus is myopic on "getting this book here done now",
instead of a _smarter_ investment of time and energy in determining
the best way to get the most books done in the near and far _future_.

they're "too busy" using their shovels to fire up the bulldozer nearby...

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4&amp;
?NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/199bf685/attachment.htm 

From Bowerbird at aol.com  Mon Jun  2 11:38:19 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 2 Jun 2008 14:38:19 EDT
Subject: [gutvol-d] Concerning the finger and those Gothic digressions...
Message-ID: <c2f.34c81b85.3575981b@aol.com>

jon richfield said:
>    giving people TXT plus pic references seems 
>    barely more useful than giving them HTML.?

well, jon, that's where i can tell you unequivocally that
_you_are_wrong_.

from the standpoint of reworking an e-text, it's _much_ 
easier to start with clean -- and non-deficient -- text
than to try and rework an .html file.

in fact, if you _do_ have only an .html file to work with,
the best course is to turn it into clean non-deficient text,
so that you _can_ start it all anew, working from scratch...

take it from me.   i've done it.

in fact, just to give you an example of such a reworking,
i redid the "pocahontas" e-text i used earlier as an example.

you can find my version here:
>    http://z-m-l.com/pgmirror/88/pocah.html

for comparison, see the original .html version:
>    http://www.gutenberg.org/files/24487/24487-h/24487-h.htm

i like my version better, because it retains the look-and-feel
landscape orientation of the original book.   i also prefer how
my table-of-contents links work better.   (plus, you can click
on a picture to "turn the page", which brings the next spread
up in a focused way, to avoid scrolling.)   you might prefer mine,
or you might prefer the other.   the point is _some_ people _will_
want to rework your e-texts, for an infinite variety of reasons,
and there are simple things you can do to make it easy for 'em.

it was pretty easy for me to rework this text.   _except_ for the
fact that i had to puzzle out which graphic went on each page.
(thankfully, the graphic file-names reflected their pagenumber,
so even that was pretty easy to figure out, for the most part...)

it would have been more difficult for me to figure out how to
rework the _.html_ to get the precise look-and-feel i wanted.
first i would've had to determine what the original producer did,
and then decide how to make his system do what i wanted to do.
believe me, it was easier to work from scratch.

but if you wanna provide a little "pudding" as some "proof" for
your side of the argument, jon, _you_ could rework that .html...
show the world that it's not really as "difficult" as i make it out.
i'm sure if you try, you will begin to see what i'm talking about.

plus, now that my .txt version is clean _and_ non-deficient:
>    http://z-m-l.com/pgmirror/88/pocah.zml
i can generate new versions -- .html, .pdf, whatever -- easily!


>    You have of course inspected HTML source?

um, yeah, i sure have...   why, just yesterday, i looked at
the .html source for a few of the e-texts you did for p.g.
(we can talk about those, if you'd like...)            :+)


>    Give or take a few tags, and 
>    ignoring really elaborate formatting like tables, 
>    multiple columns, indenting and so on, 
>    it looks almost suspiciously like text, doesn't it??

you know, if you ignore the mane, and the coloring, and
the lack of stripes, a lion looks "suspiciously" like a tiger.

but if what you want is a tiger, it's best to start with a tiger,
not a lion.

so when i only have an .html file, i load it into a browser,
and then i do a copy-and-paste into my word-processor.
that's the easiest way to get back to the text, and just text.

for the information of the mac people out there, safari is
the best browser to use when you do this, because it will
_retain_ the bulk of the formatting, which is important...

still, there usually remains much formatting to be re-done.

(you failed to mention the one i hate most -- translating
those ampersand-entities back into their .txt equivalents.)


>    OTOH, if a user cannot handle graphic formats, then it 
>    hardly matters whether the graphics accompany TXT or HTML.

i don't know of any users these days who cannot handle graphics.
yet p.g. still largely removes info about the graphics in the .txt files.

i should note that there are a few smart producers out there who
do indeed include the filenames of the graphics in their .txt files.
i wish i'd kept track of _their_ names so i could laud them publicly.


>   and I writhe I whenever I catch myself in similar sin.

it's best not to talk about mistakes, because then you'll make one.        
:+)


>   All I was saying was that a particular notation is convenient to me,
>    both for reading and preparation, and is widely usable for others.
>    Silly of me of course, but what is new about that?

well, of course, you know that's not "silly".   it's very astute.

and i was merely trying to inform you that you could give
other people -- who want to rework your e-texts -- some
_help_ by including the graphic filenames in your .txt files...

i also said that, if this was "too much work" -- or whatever --
that you didn't _want_ to do it, that that was fine, too, because
i will be reworking the p.g. files to put this information back in.

so surely i'm not out of line for making such a simple suggestion.
am i?

all the best,

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4&amp;
?NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/627c62ca/attachment-0001.htm 

From Bowerbird at aol.com  Mon Jun  2 11:51:44 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 2 Jun 2008 14:51:44 EDT
Subject: [gutvol-d] spam spam spam spam spam spam spam
Message-ID: <cb7.2efd9e81.35759b40@aol.com>

i see josh and david in my spam folder.

if they've made any valid points to which you would like to
see me respond, rephrase them in your own words, please,
backchannel or frontchannel, and i'll respond frontchannel.

otherwise, i will just assume they aren't talking about me...
no sense being paranoid, is there?, especially about fleas...

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4&amp;
?NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/c3d8c1a6/attachment.htm 

From Bowerbird at aol.com  Mon Jun  2 14:47:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 2 Jun 2008 17:47:08 EDT
Subject: [gutvol-d] e-book viewer-programs for p.g. e-texts
Message-ID: <c41.373cfbb4.3575c45c@aol.com>

speaking of e-book viewer-programs for p.g., here's one:
>    http://voluminous.wooji-juice.com/blog

this is for the mac only -- and only mac-os 10.5 to boot --
but i mention it because just today, the guy made a post
to his blog about determining where new chapters begin:
>    http://voluminous.wooji-juice.com/blog/1-0-5-chapter-and-verse.html
he notes that _inconsistencies_ in the e-text formatting
make this task more difficult than it would be otherwise.

just wait until he tries to figure out _footnotes_...          ;+)

***

nor is this guy the only one using p.g. e-texts as content...

there's also blackmask.com and manybooks.net, of course,
and feedbooks.com, which creates some nice-looking books.

but there is also ybook, and ubook, and fbreader as well...
>    http://www.spacejock.com/yBook.html
>    http://www.gowerpoint.com/
>    http://www.fbreader.org/

this program for the nokia handhelds browses the p.g. catalog:
>    http://www.elisanet.fi/ptvirtan/software/gutenbrowse/index.html
it then lets you download the e-texts to be displayed by fbreader...

and here are some lesser-known programs...
>    http://guten.sourceforge.net/
>    http://gutenpy.sourceforge.net/#about
>    http://pybookreader.narod.ru/download.html
>    http://jbook.sourceforge.net/
>    http://pyge.sourceforge.net/

and of course there are also the programs aimed at the ipod:
>    http://ebookhood.com/ipod-ebook-creator
>    http://www.tomsci.com/book2pod/
>    http://pod2go.en.softonic.com/
>    http://www.macupdate.com/info.php/id/16915/podreader
>    http://homepage.mac.com/applelover/text2ipodx/text2ipodx.html
>    http://burtcom.com/lex/#Anchor-iPoDoc-49575
>    http://www.ipodebookmaker.com/
>    http://www.iamlarge.com/

i suspect the list of programs available for ipod reading
will grow exponentially once developers take to the s.d.k.,
and i look forward with delightful expectation to what steve
will reveal about the iphone's future at w.w.d.c. next month...

moreover, with technologies like adobe-air and ms-silverlight
popping up all over the place, not to mention google-gears
and yahoo's just-announced in-browser plug-in capabilities,
there will be more and more developers taking on the job of
creating e-book viewer-programs.   it's a shame that each one
will have to suffer through the same discovery of the hassles
caused by inconsistencies that resulted in the blog entry above.

but of course, that just means that they'll be very, very happy
once they find out that i have produced a _consistent_ corpus.

speaking of adobe-air, here's a newish viewer-app based on it:
>    http://members.cox.net/dean-mckee/

-bowerbird

p.s.   by the way, in doing the google-work for this message,
i came across the original o.l.p.c. interest in the p.g. corpus:
>    http://dev.laptop.org/wiki/EBookViewerFormatSpec
the date on my edit of the page shows it was 22 months ago.


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/72db23f2/attachment.htm 

From richfield at telkomsa.net  Tue Jun  3 01:40:43 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Tue, 03 Jun 2008 10:40:43 +0200
Subject: [gutvol-d] Concerning the finger and those Gothic digressions...
Message-ID: <4845038B.8030809@telkomsa.net>

Hi again BB,

I'm not sure why we are errr... in discussion.  Well-meaning friends do 
accuse me of a  tendency towards obscurity in both articulation and 
verbalisation, but who listens to well-meaning friends anyway?  
Certainly not everyone in this forum, no?  In any case, I cannot 
remember what I said to leave you unable to assimilate my intimations.

Do I sleep? do I dream? Do I wonder and doubt? Are things what they 
seem? Or is visions about?  What did I say to give the impression that I 
thought the media I used or the format I adopted were perfect, or even 
satisfactory?  Did I not indicate that once everyone had elected to use 
something superior I would change unhesitatingly?  And have I not lived 
up to that commitment so far?  Even to a more generous commitment to 
change once such an improvement became generally accepted as an 
accessible alternative to our current tools?  Haven't I, BB?  Haven't 
I?  And what appreciation do I get for it?  A lecture on lions and 
tigers?  Barely escaping the bears! Oh my?  Someday when you are in a 
duly palaeotaxonomic mood, we must sit down and have a good long 
discussion on the interchangeability of species in the genus Panthera, 
in the strata Pleistocene to Recent.  Fascinating stuff of course, but 
currently I am busy.  Part of what I am busy on (just part, not being in 
the same category as David W. nor even aspiring to anything like it) is 
capturing material that I value and commend to others and that may well 
no longer be there once the eventual, the ineffably, unassailably, 
perfect, system is established.  Nothing I have seen here so far 
persuades me that its advent is imminent.  Count me among the deafer, 
more shortsighted, even curmudgeonly, among the shepherds watching the 
flocks.  I am the one who goes on with the sheep while the enlightened 
charge off to worship.  Silly, limited me... <sob> 


The best is enemy to the good.  (Lan' o' Goshen! My text is so creative 
today.  I cannot help thinking that if Heaven had not made me a lunatic 
my peculiar talent might have made me an entertaining writer!)  So I do 
not aspire to the best, BB, not till it is served predigested by 
stronger, more corrosive entrails than mine own.  (In the interests of 
my appetite, it might be well to omit identifying those energetic 
entrails before feeding me their proceeds, but don't say that I do not 
show willing!)  I also cannot help thinking that if I had insisted on 
selecting one crusade among those in contention, and nailed my baby 
knickers to its mast, I should have done less good than I have done by 
submitting my few miserable and internally inferior scans so far, not 
even to mention the fact that this is but one of my fields of activity.  
Do I realise that we are building an ever-accumulating task for future 
tidiers of our archives?  Certainly.  Do I regret it?  As a systems 
consultant of decades' standing, definitely.  Do I none the less think 
that that is at least better than accumulating an increasingly Augean 
agglomeration of Psocopteran-riddled, oxidation-browned paper?  Decidedly. 

Rest assured that by the time everyone has seen the future in the 
blinding illumination of your insights and achievements, I shall have 
produced no more than a few more items and shall subsequently join the 
congregation around the shrine that I had overlooked in my grubbing and 
plodding.  I might even assist in the manual conversion of the works 
that I had perpetrated in forms inaccessible to the tools of that future. 

But for now I shall neither encumber nor oppose you.  That which lies 
there bleeding is my heart.  That which you smell there burning is my 
zeal. 

Go well, good luck, and enjoy.

Jon


From hart at pglaf.org  Tue Jun  3 10:02:49 2008
From: hart at pglaf.org (Michael Hart)
Date: Tue, 3 Jun 2008 10:02:49 -0700 (PDT)
Subject: [gutvol-d] Concerning the finger and those Gothic digressions...
In-Reply-To: <c2f.34c81b85.3575981b@aol.com>
References: <c2f.34c81b85.3575981b@aol.com>
Message-ID: <Pine.LNX.4.64.0806030959430.15282@pglaf.org>


Any time technology sufficiently advances,
then you come to a point where starting on
things from scratch via the new technology
gets easier and easier to the final point,
where it is easier than rebuilding ye olde
stuff and better to start from scratch.

I first said this long long ago. . . .


Michael


On Mon, 2 Jun 2008, Bowerbird at aol.com wrote:

> jon richfield said:
>>    giving people TXT plus pic references seems
>>    barely more useful than giving them HTML.?
>
> well, jon, that's where i can tell you unequivocally that
> _you_are_wrong_.
>
> from the standpoint of reworking an e-text, it's _much_
> easier to start with clean -- and non-deficient -- text
> than to try and rework an .html file.
>
> in fact, if you _do_ have only an .html file to work with,
> the best course is to turn it into clean non-deficient text,
> so that you _can_ start it all anew, working from scratch...
>
> take it from me.   i've done it.
>
> in fact, just to give you an example of such a reworking,
> i redid the "pocahontas" e-text i used earlier as an example.
>
> you can find my version here:
>>    http://z-m-l.com/pgmirror/88/pocah.html
>
> for comparison, see the original .html version:
>>    http://www.gutenberg.org/files/24487/24487-h/24487-h.htm
>
> i like my version better, because it retains the look-and-feel
> landscape orientation of the original book.   i also prefer how
> my table-of-contents links work better.   (plus, you can click
> on a picture to "turn the page", which brings the next spread
> up in a focused way, to avoid scrolling.)   you might prefer mine,
> or you might prefer the other.   the point is _some_ people _will_
> want to rework your e-texts, for an infinite variety of reasons,
> and there are simple things you can do to make it easy for 'em.
>
> it was pretty easy for me to rework this text.   _except_ for the
> fact that i had to puzzle out which graphic went on each page.
> (thankfully, the graphic file-names reflected their pagenumber,
> so even that was pretty easy to figure out, for the most part...)
>
> it would have been more difficult for me to figure out how to
> rework the _.html_ to get the precise look-and-feel i wanted.
> first i would've had to determine what the original producer did,
> and then decide how to make his system do what i wanted to do.
> believe me, it was easier to work from scratch.
>
> but if you wanna provide a little "pudding" as some "proof" for
> your side of the argument, jon, _you_ could rework that .html...
> show the world that it's not really as "difficult" as i make it out.
> i'm sure if you try, you will begin to see what i'm talking about.
>
> plus, now that my .txt version is clean _and_ non-deficient:
>>    http://z-m-l.com/pgmirror/88/pocah.zml
> i can generate new versions -- .html, .pdf, whatever -- easily!
>
>
>>    You have of course inspected HTML source?
>
> um, yeah, i sure have...   why, just yesterday, i looked at
> the .html source for a few of the e-texts you did for p.g.
> (we can talk about those, if you'd like...)            :+)
>
>
>>    Give or take a few tags, and
>>    ignoring really elaborate formatting like tables,
>>    multiple columns, indenting and so on,
>>    it looks almost suspiciously like text, doesn't it??
>
> you know, if you ignore the mane, and the coloring, and
> the lack of stripes, a lion looks "suspiciously" like a tiger.
>
> but if what you want is a tiger, it's best to start with a tiger,
> not a lion.
>
> so when i only have an .html file, i load it into a browser,
> and then i do a copy-and-paste into my word-processor.
> that's the easiest way to get back to the text, and just text.
>
> for the information of the mac people out there, safari is
> the best browser to use when you do this, because it will
> _retain_ the bulk of the formatting, which is important...
>
> still, there usually remains much formatting to be re-done.
>
> (you failed to mention the one i hate most -- translating
> those ampersand-entities back into their .txt equivalents.)
>
>
>>    OTOH, if a user cannot handle graphic formats, then it
>>    hardly matters whether the graphics accompany TXT or HTML.
>
> i don't know of any users these days who cannot handle graphics.
> yet p.g. still largely removes info about the graphics in the .txt files.
>
> i should note that there are a few smart producers out there who
> do indeed include the filenames of the graphics in their .txt files.
> i wish i'd kept track of _their_ names so i could laud them publicly.
>
>
>>   and I writhe I whenever I catch myself in similar sin.
>
> it's best not to talk about mistakes, because then you'll make one.
> :+)
>
>
>>   All I was saying was that a particular notation is convenient to me,
>>    both for reading and preparation, and is widely usable for others.
>>    Silly of me of course, but what is new about that?
>
> well, of course, you know that's not "silly".   it's very astute.
>
> and i was merely trying to inform you that you could give
> other people -- who want to rework your e-texts -- some
> _help_ by including the graphic filenames in your .txt files...
>
> i also said that, if this was "too much work" -- or whatever --
> that you didn't _want_ to do it, that that was fine, too, because
> i will be reworking the p.g. files to put this information back in.
>
> so surely i'm not out of line for making such a simple suggestion.
> am i?
>
> all the best,
>
> -bowerbird
>
>
>
> **************
> Get trade secrets for amazing burgers. Watch "Cooking with
> Tyler Florence" on AOL Food.
>      (http://food.aol.com/tyler-florence?video=4&amp;
> ?NCID=aolfod00030000000002)
>

From Bowerbird at aol.com  Tue Jun  3 10:49:02 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 3 Jun 2008 13:49:02 EDT
Subject: [gutvol-d] Concerning the finger and those Gothic digressions...
Message-ID: <d4c.207acf4e.3576de0e@aol.com>

jon said:
>    I'm not sure why we are errr... in discussion.

well, i made a simple point, and then you responded.
and kept responding.   even though you claim you are
uninterested in the topic.   i am _quite_ interested in
the general topic of revealing the inherent power of
the .txt files, so that's why i keep responding.   but yes,
i'm making general points intended at the whole list...
i'm talking to you personally because -- otherwise --
when one speaks to an "abstract" entity like "the list
as a whole", one can become too removed from reality.
so i like to interact with an actual human -- like you --
to stay grounded, rather than pontificate at large.         :+)

(although, as we all know, i can -- and do -- do both.)


>    Well-meaning friends do accuse me of a? tendency 
>    towards obscurity in both articulation and verbalisation

nah.   they're just jealous of your poetic inclination...        ;+)


>    In any case, I cannot remember what I said 
>    to leave you unable to assimilate my intimations.

well, as i said, it's really not about you.   it's a general point.

and that general point is that _some_ people -- and you might
well be one of them -- are trying to _cripple_ the .txt versions,
either intentionally or because they can't see the inherent power.

"the .txt files cannot include illustrations," they'll tell you.   bullshit!
_of_course_ they can.   they include illustrations the very same way
that the .html files include illustrations -- by listing their filename
so that a viewer-program (in the case of .html, that'd be a browser)
can show them at the point in the text where they should be shown.


>    Do I sleep? do I dream? Do I wonder and doubt? 
>    Are things what they seem? Or is visions about?

see?   poetry.   don't be ashamed of that, jon.   poetry is beautiful.


>    What did I say to give the impression that I thought the media I used 
>    or the format I adopted were perfect, or even satisfactory?? 

are you interested in making your formats _more_useful?_
(remember that "you" means every single digitizer on this list.)

if so, i just told you one dirt-simple way that you can do that.

if not, just ignore my suggestion.   like i said, i'll fix your files.

what is so difficult to understand about what _i_ have just said?


>    Did I not indicate that once everyone had elected to use
>    something superior I would change unhesitatingly?

you can even cling to your old ways _after_ everyone else has
changed to "something superior", and i'll _still_ fix your files...

nobody is asking you to change, jon.   (or anyone else, either.)

i just suggested a way that you _could_ make your .txt version
more useful to people out in the real world, if you _wanted_ to.

that's all.


>    And have I not lived up to that commitment so far?

again, it's not really about _you_, not you personally, jon,
so i haven't really been keeping track.   nor am i wont to...


>    Even to a more generous commitment to change 
>    once such an improvement became generally accepted 
>    as an accessible alternative to our current tools?? Haven't I, BB?

again, i don't really care if you change, jon.   not in the slightest...


>    Haven't I?? And what appreciation do I get for it?? 
>    A lecture on lions and tigers?? Barely escaping the bears! Oh my?? 

you have to admit it was funny of you to say that -- once you "look
past" the markup -- an .html file looks "suspiciously like" a text file.

as they say on the playground, if my aunt had balls, she'd be my uncle.


>    Someday when you are in a duly palaeotaxonomic mood, 
>    we must sit down and have a good long discussion on 
>    the interchangeability of species in the genus Panthera,
>    in the strata Pleistocene to Recent.

we'll have to make sure there is wifi, so i can consult wikipedia,
and look up some of those big words you're throwing around...


>    Fascinating stuff of course, but currently I am busy.

me too.   or rather, i should say that my computers are busy,
churning away on p.g. e-texts #10000-#25555+change...
step one is stripping off the headers and footers, and _boy_,
even that minor change makes it seem more like a _library_,
since the book-titles now pop to the top of the display and
remind you that these are indeed _books_ that you're viewing.
sometimes i forget what a headache that legalese gives me...


>    Part of what I am busy on (just part, not being in the same 
>    category as David W. nor even aspiring to anything like it)

there's almost nobody in the widger category.   if only we could make
a widget out of david widger, none of us would have to digitize again.


>    is capturing material that I value and commend to others 
>    and that may well no longer be there once the eventual, 
>    the ineffably, unassailably, perfect, system is established.

and for that, jon, i give to you my heartfelt thanks for doing it...

i'm not sure why you think this involves some "perfect" system,
let alone one that is "ineffably, unassailably" perfect, when all i
have suggested is the dirt-simple recommendation that digitizers
include the filename of the graphics file in their .txt versions, but
this seems a small price to bear (see, you did get the bear after all)
to elicit your wonderful poetry...


>    Nothing I have seen here so far persuades me that its advent 
>    is imminent.

nah, perfection is as elusive as ever...            :+)


>    Count me among the deafer, more shortsighted, even curmudgeonly, 
>    among the shepherds watching the flocks.? I am the one who goes on 
>    with the sheep while the enlightened charge off to worship.? 
>    Silly, limited me... <sob>

on the one hand you call yourself "curmudgeonly",
and on the other hand you are <sobbing>.   decide!
because curmudgeons don't cry.   it's against the code.


>    The best is enemy to the good.? (Lan' o' Goshen! 
>    My text is so creative today.? I cannot help thinking that 
>    if Heaven had not made me a lunatic my peculiar talent 
>    might have made me an entertaining writer!)

well, _i_ am entertained, for sure.   but since your "well-meaning"
friends don't seem to grok your charm, do not quit the day-job...


>    So I do not aspire to the best, BB

well, _that_ is unfortunate.   but nobody is perfect,
because, as i said, perfection is as elusive as ever...

(how come nobody ever notes that 
"the good is enemy to the best"?
and if these two are _really_ fighting,
whose side do you want to be on?)


>    I also cannot help thinking that if I had insisted on
>    selecting one crusade among those in contention, 
>    and nailed my baby knickers to its mast, 
>    I should have done less good than I have done by
>    submitting my few miserable and internally inferior scans so far

oh, hey, that reminds me.   you didn't submit your _scans_
to p.g. for the books that you digitized.   could you, please?
that way, when people remix your work in the future, they
will be able to see how the text looked in the original book,
and that will be a tremendous aid to them.   so be a pal, ok?

(but hey, if it's too much work for you to do that, no sweat.
i am sure google will get around to scanning your books.)


>    Do I realise that we are building an ever-accumulating 
>    task for future tidiers of our archives?? Certainly.? 

see, that's where you're wrong, jon.

there will be _no_ "future tidiers" of these archives,
because the job has already become too immense...

oh, it might still take some time for the "present tidiers"
to realize they've created a mess they cannot get out of,
but at _some_ point down the line, that realization will be
full-blown, and they will run screaming from the building.

you've seen on the nightly news about those government
projects where they spent hundreds of millions of dollars
on a computer system that just plain flat-out doesn't work?
and they can't get it to work?   so they just have to eat the loss?
it'll be the same thing here.   minus, of course, the small matter
of hundreds of millions of dollars.

oh, and please, whoever is thinking of doing it right now,
don't respond by saying "the text files will still be here"...
they will, but since they are inconsistent, they won't help.
and also, you've changed the linebreaks, so each text-file
can no longer be associated with a specific google scan-set
(if it ever could), so the future will have very little use for it...


>    Do I regret it?? As a systems consultant of decades' standing, 
>    definitely.

oh, i see the problem...   you've become inured to the problem...

you are one of the purveyors of those big complex systems that
don't work, so you think that's "the natural order" of such things.


>    Do I none the less think that that is at least better than 
>    accumulating an increasingly Augean agglomeration of 
>    Psocopteran-riddled, oxidation-browned paper?? Decidedly.

well, yes and no.   in spite of the fact i have immersed my life in
digitization, i do still have a deep and profound love for paper.
but of course i can certainly see your point as well.   and besides,
since the process of digitizing some books has kept you busy,
and therefore saved the world from at least _some_ damage
that might've occurred from more "systems consulting" by you,
i'd say that your hobby has had some good effects...           ;+)


>    Rest assured that by the time everyone has seen the future 
>    in the blinding illumination of your insights and achievements, 
>    I shall have produced no more than a few more items

well, you know, we each do what we can.   no one can expect more...       ;+)


>    and shall subsequently join the congregation around the 
>    shrine that I had overlooked in my grubbing and plodding.

you mean someone is going to put up a _shrine_ to the idea of
putting the graphics filenames in the .txt versions.   _who_knew?_
i mean, seriously, who knew?   i thought it was such a simple thing.

it's probably marcello.   that guy is such a cad.   have you seen
the fansite that he made for me?   so nice to see i have a stalker.


>    I might even assist in the manual conversion of the works that 
>    I had perpetrated in forms inaccessible to the tools of that future.

oh, we ain't going to do that work _manually_, my friend, not at all.

the program is already written to do most of the work automatically.
it pulls in the .html version, locates the "img" tags, gets the filename
from the "src" component, and then finds the equivalent location in
the .txt file, and plops the filename there.   yes, there is an occasional
idiosyncratic glitch, but overall it works pretty smoothly in most cases.
so this task isn't really a big deal.


>    But for now I shall neither encumber nor oppose you.

like i said, it's not really about you.   so do whatever you like, jon.


>    That which lies there bleeding is my heart.

i'd expect nothing less from a poet...


>    That which you smell there burning is my zeal.

no, actually, that's the smell of the burrito i had for breakfast.
you'd think i woulda learned by now those things give me gas.

go well, good luck, and enjoy.

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4&amp;
?NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080603/762f5c29/attachment-0001.htm 

From hart at pglaf.org  Wed Jun  4 09:28:12 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 4 Jun 2008 09:28:12 -0700 (PDT)
Subject: [gutvol-d] can't see the forest for the trees
In-Reply-To: <d69.2d9538d8.3571abbb@aol.com>
References: <d69.2d9538d8.3571abbb@aol.com>
Message-ID: <Pine.LNX.4.64.0806040927160.13629@pglaf.org>

On Fri, 30 May 2008, Bowerbird at aol.com wrote:

> michael said:
>>    I would certainly like to mirror your mirror, if that's ok.
>
> you know that public-domain means you don't need my ok.
>
> plus i'd consider it to be an honor if you did that.
>
> -bowerbird
>

Please keep us posted!!!

And if you decide you need any volunteers. . .or want. . . .

me

>
>
> **************
> Get trade secrets for amazing burgers. Watch "Cooking with
> Tyler Florence" on AOL Food.
>      (http://food.aol.com/tyler-florence?video=4&amp;
> ?NCID=aolfod00030000000002)
>

From gbnewby at pglaf.org  Wed Jun  4 10:43:16 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed, 4 Jun 2008 10:43:16 -0700
Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al
	and BB
In-Reply-To: <927a0ec40806020157u3d78cd04pc048133d7a4fd5ee@mail.gmail.com>
References: <927a0ec40806020157u3d78cd04pc048133d7a4fd5ee@mail.gmail.com>
Message-ID: <20080604174316.GA16067@mail.pglaf.org>

On Mon, Jun 02, 2008 at 10:57:57AM +0200, Andrew Wainwright wrote:
> ...
> 
> As someone who produces ebooks similar in structure to the one mentioned,
> could you please let me know how I find the eBook number, before I produce
> the ebook?  (It's always part of illustration URLs.)  As far as I know, the
> ebook number is only assigned at whitewashing time.

You are right that we generally don't pre-assign eBook numbers.

The way it works in illustration URLs is that the URLs are
relative...so within the book you have something like:
  <a href="images/image01.png">image text</a>

which your Web browser will built a full URL out of.  So if
it's eBook #23456, this would be prepended:
  http://www.gutenberg.org/files/23456/23456-h/images/image01.png
or similar... so, eBook producers don't need to know the eBook
# for this to work.

In fact [and this is by design], relative URLs mean that the eBook will
be properly viewable on any Web or FTP site, or even a local directory.


  -- Greg

From Bowerbird at aol.com  Wed Jun  4 11:18:01 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 4 Jun 2008 14:18:01 EDT
Subject: [gutvol-d] can't see the forest for the trees
Message-ID: <c24.393a8ed2.35783659@aol.com>

michael said:
>    Please keep us posted!!!

oh, i'm not going anywhere...   i'm staying right here,
chatting in the lobby of the project gutenberg library.
so you'll hear all about it...

i have to say that i don't think you realize the danger
in forking your library.   i've been reluctant to do that,
but just can't wait on this any longer.   and if you're not
worried about it, it's silly for me to be worried about it.
but still, i don't think you fully understand the fallout...

and realistically, your people will not back-incorporate
my version of the library, because it is a repudiation of
their carelessness about inconsistencies in the library,
and they know it as well as i do and everyone else does.

i'm just sayin'...

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080604/4130d489/attachment.htm 

From Bowerbird at aol.com  Wed Jun  4 13:32:31 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 4 Jun 2008 16:32:31 EDT
Subject: [gutvol-d] e-book viewer-programs for p.g. e-texts
Message-ID: <cf7.327ccc9f.357855df@aol.com>

wouldn't you know it, just a few hours after i wrote up this post
on viewer-programs for p.g. e-texts, i learned of a new entrant.

"stanza" is now available in beta form from "lexcycle":
>    http://www.lexcycle.com
it promises to read a staggering number of formats...

this dude seems ambitious.   the website looks really nice.
and he's already written up a wikipedia entry for the app!

it's mac only, at present, though it looks like he wants to port it...
functionally, it won't handle any fancy formatting or graphics yet,
but like i said, this guy seems very ambitious.   look for this to grow.

"stanza" reminds me of another mac-only e-book viewer-app
which i'd forgotten to mention the first time around -- "tofu":
>    http://amarsagoo.info/tofu/index.shtml

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080604/c55b1c93/attachment.htm 

From Bowerbird at aol.com  Thu Jun  5 12:54:37 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 5 Jun 2008 15:54:37 EDT
Subject: [gutvol-d] cory scores another first
Message-ID: <c47.3479af2c.35799e7d@aol.com>

this just came over on cory doctorow's announcement-list...

>    Remember four weeks ago when I told you 
>    that my young adult novel Little Brother 
>    made the New York Times bestseller list? 
>    Well, I've just heard from my publisher that 
>    it's about to go into its *fourth week* on the list,
>    having climbed to position *eight* Color me ecstatic! 
>    My sincere thanks to all of you who talked about the book,
>    gave it to your friends, sent it to teachers and librarians, 
>    and downloaded it -- you all helped make this 
>    the first-ever Creative Commons-licensed novel 
>    to get on the NYT list!

giving away free copies doesn't seem to hurt cory's sales,
that's for sure...

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080605/068ab3f9/attachment.htm 

From Bowerbird at aol.com  Thu Jun  5 16:25:24 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 5 Jun 2008 19:25:24 EDT
Subject: [gutvol-d] most recent research demo-app
Message-ID: <d55.3093965e.3579cfe4@aol.com>

today's research demo-app takes any p.g. e-text number and
downloads the graphic-files for that book and runs a slideshow.

e-mail me (telling me your o.s.) if you'd like a copy...

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080605/37bf092f/attachment.htm 

From ebooks at ibiblio.org  Fri Jun  6 00:54:13 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Fri, 06 Jun 2008 03:54:13 -0400
Subject: [gutvol-d] cyberlibrary numbers
In-Reply-To: <bf5.3154f7ac.355753ad@aol.com>
References: <bf5.3154f7ac.355753ad@aol.com>
Message-ID: <4848ED25.1040202@ibiblio.org>

Sorry for the very late reply, but my internet time has been limited 
for the past few weeks.

On May 10, 2008, Bowerbird wrote:

> sometimes the backbiting on this list gets _extremely_ 
> amusing...          :+)


I find the misinformation, ignorance, jumping to conclusions, and 
faulty logic more amusing. :)


> the original f.a.q. from michigan tells the numbers about michigan...

[snip]

> voila, we have the 7-million number.

[snip]

> voila, we have the 6-year timeframe...
> 
> 
> there were 5 libraries involved in the project at the outset.
> my guess -- back then, and even now today -- would be
> that if they intended to scan 7 million umichigan books in
> 6 years, they intended to scan _at_least_ another 7 million
> from the other 4 libraries in that same amount of time, so
> i'd say the implicit promise was to do 14 million in 6 years,
> and i don't think you can call that an unreasonable position,
> either then or now.


I'd not only call it "unreasonable," I'd call it silly, especially 
since both the "New York Times" and ABC News reported on Dec. 14, 2004 
that it could take 10 years to finish 15 million books. :)


> since -- after 3 years -- they've only scanned _1_ million books
> from umichigan, then it is _completely_ fair to say that they are
> "behind schedule" at umichigan.  of course, since many libraries
> (dozens?) have joined the project since its onset, i'd guess the
> schedule was altered somewhere along the line, and that's fine.
> i'm convinced they're working on it, and working hard, so fine...


You're assuming that the number of books the University of Michigan 
has placed online equals the number scanned, and it would be 
completely fair to say that your assumption is completely wrong. Take 
a look at the main MBooks page:

http://www.lib.umich.edu/mdp/

Note the "Scanning Schedule" about half-way down the page:


"Currently scanning from the Hatcher Graduate Library. More details 
about the Hatcher Scanning Schedule at Michigan. This summer we expect 
to scan Dentistry Library titles, Taubman Medical Library monographs 
and selected sections of the Undergraduate Library, before we return 
to the Graduate Library scanning in the Fall

"Library materials have been scanned from the Buhr Remote Shelving 
Facility, the Social Work Library, and the Art, Architecture and 
Engineering Library."


Now take another look at the UMich FAQ you like so much:

http://www.lib.umich.edu/staff/google/public/faq.pdf

Note these two FAQs from page 2 of the PDF:


"Q. 7: What collections in the library will be digitized?
A: Most of the University Library's bound print collections will be 
digitized (see Question 10 below for exceptions), beginning with all 
volumes in the Buhr Shelving Facility."

"Q. 9: In what order will the different libraries be scanned, and will 
the project include new acquisitions?
A: A timetable and strategy for digitizing volumes in locations other 
than Buhr will be developed over time. We are currently focusing on 
the 2.5 million volumes in Buhr; consequently, newly acquired 
materials are not factored into the conversion process. As we move 
into other libraries, we will formulate strategies for taking new 
acquisitions into account."


When I first saw that scanning schedule, I was tempted to assume that 
it meant Google had finished scanning all 2.5 million volumes in the 
Buhr Shelving Facility. But rather than just assume, I decided to 
check with someone who would know. So I emailed John Wilkin, an 
associate librarian who has been working with Google at the University 
of Michigan.

At first, because of Google's love of secrecy, Wilkin only told me 
that they've finished the Social Work Library, and the Art, 
Architecture and Engineering Library and that they've done "a 
considerable portion" of Buhr. But when I told him you'd said that 
Google had only scanned 1 million books at UMich and is "behind 
schedule," he sent me this reply:


"I think you can satisfy him by saying that, according to me (and
you can quote me), we've got well in excess of 1m online here at UM 
and have digitized more than twice that."


And thanks to something Paul Courant, University Librarian and Dean of 
Libraries at the University of Michigan, posted on his blog recently, 
we can calculate an estimate of how many UMich books have been posted 
and scanned. In this May 31st blog post:

"Microsoft Exits the Mass Digitization Business"
http://paulcourant.net/2008/05/31/microsoft-exits-the-mass-digitization-business/

Courant wrote, "In the meantime, the University of Michigan Library 
now has well over a million digitized books in its catalogue, with the 
number growing by thousands every day."

Now, the announcement of the "millionth book" was posted on or before 
February 2nd. ("Last Update: 08:30 PM EST on Saturday, February 02, 
2008".)

http://www.lib.umich.edu/news/millionth.html

Let's be conservative in our calculations, so let's assume that books 
have only been posted online on normal workdays (5 days per week). 
There have been 87 full workdays since February 2nd. (89 weekdays 
minus the Presidents' Day and Memorial Day holidays.) Now, Paul 
Courant said that the number is "growing by thousands every day." 
"Thousands" (plural) implies a minimum of 2,000 each day.

87 * 2,000 = 174,000.

Add that to 1 million, and we're up to 1.174 million UMich books 
online. Now, since John Wilkin told me that they've "digitized more 
than twice" the number they have online, let's calculate a range:

1,174,000 * 2.0 = 2,348,000
1,174,000 * 2.1 = 2,465,400
1,174,000 * 2.2 = 2,582,800
1,174,000 * 2.3 = 2,700,200
1,174,000 * 2.4 = 2,817,600
1,174,000 * 2.5 = 2,935,000


So even using conservative assumptions, we're looking at a minimum of 
about 2.4 million books scanned at UMich so far--and perhaps many more. :)


> i _do_ wish that -- 3 years into it -- they would be a little bit
> further along than 1 million out of 7 million umichigan books,
> because that makes it look like this could take 20 years total...
> but, you know, i'm not paying their bills, so what say do i have?


It looks like your wish has been granted. :) And to see how fast 
Google is racing through the stacks, take a look at the additional 
details about the current scanning:

http://www.lib.umich.edu/grad/mdpprogress.html

"Beginning on Tuesday, February 19, 2008, scanning in the Graduate 
Library started with the collections on the 3rd, 4th and 5th floors of 
Hatcher South. With approximately 250,000 volumes on each floor, it 
will take several months to digitize this part of the Graduate Library."

Only "several months" to digitize about 750,000 volumes!


Jose Menendez


P.S. I'm surprised that no one has mentioned on gutvol-d before now 
that Microsoft was quitting its book scanning operation.

From Bowerbird at aol.com  Fri Jun  6 01:32:01 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 6 Jun 2008 04:32:01 EDT
Subject: [gutvol-d] cyberlibrary numbers
Message-ID: <bf0.2b51b0ef.357a5001@aol.com>

jose said:
>   blah blah blah dodge distortion factoid blah blah blah

<yawn>...

time for me to go to bed...

-bowerbird

p.s.   tell john wilkin "hi" for me.   he never answers me
when i try to ask him a question directly on his blog...


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080606/fce18cf4/attachment.htm 

From hart at pglaf.org  Fri Jun  6 09:15:38 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 6 Jun 2008 09:15:38 -0700 (PDT)
Subject: [gutvol-d] cyberlibrary numbers
In-Reply-To: <4848ED25.1040202@ibiblio.org>
References: <bf5.3154f7ac.355753ad@aol.com> <4848ED25.1040202@ibiblio.org>
Message-ID: <Pine.LNX.4.64.0806060850000.22177@pglaf.org>


On Fri, 6 Jun 2008, Jose Menendez wrote:

> Sorry for the very late reply, but my internet time has been 
> limited for the past few weeks.
>
> On May 10, 2008, Bowerbird wrote:
>
>> sometimes the backbiting on this list gets _extremely_ amusing... 
>> :+)
>
>
> I find the misinformation, ignorance, jumping to conclusions, and 
> faulty logic more amusing. :)

Sad. . .but amusing.

>
>
>> the original f.a.q. from michigan tells the numbers about 
>> michigan...
>
> [snip]
>
>> voila, we have the 7-million number.
>
> [snip]
>
>> voila, we have the 6-year timeframe...
>>
>>
>> there were 5 libraries involved in the project at the outset. my 
>> guess -- back then, and even now today -- would be that if they 
>> intended to scan 7 million umichigan books in 6 years, they 
>> intended to scan _at_least_ another 7 million from the other 4 
>> libraries in that same amount of time, so i'd say the implicit 
>> promise was to do 14 million in 6 years, and i don't think you 
>> can call that an unreasonable position, either then or now.
>
> I'd not only call it "unreasonable," I'd call it silly, especially 
> since both the "New York Times" and ABC News reported on Dec. 14, 
> 2004 that it could take 10 years to finish 15 million books. :)

Once again I feel I must point out that giving an estimate of:

10 years for 15 million books

is hardly a contradiction

6 years for 10 million books.


In fact, given the usual discrepancies between "Peter and Paul,"
I'd say these were pretty close.

Of course, in the original December 14, 2004 media frenzy,
there were no references to half the books scanned being a
secret library not available to the public. . .references,
all the references _I_ heard were about a great library an
entire world could use, and nothing about the majority for
some secret private library only insiders could use and an
entirely different, much more limited public library.


{snip]

> Courant wrote, "In the meantime, the University of Michigan 
> Library now has well over a million digitized books in its 
> catalogue, with the number growing by thousands every day."
>
> Now, the announcement of the "millionth book" was posted on or 
> before February 2nd. ("Last Update: 08:30 PM EST on Saturday, 
> February 02, 2008".)

Now THIS is what they were all talking about on 12/14/04 and it
is what the public can presumably use.

3 years and a fraction for the first million public eBooks.

Let's suppose they double in the next equal timeframe,
double production, that is, triple the total.

That would be:

6 years and two fractions for the first 3 million public books.

Which could happen, not to mention the other libraries.

We'll just have to wait to see what happens.

The real question, of course, is "how useful will they be?"

Will there be too many that are just raw scans?

Will there be too many that are just raw OCR?

How many will be full text files of 99.975+& accuracy?


> P.S. I'm surprised that no one has mentioned on gutvol-d before 
> now that Microsoft was quitting its book scanning operation.

Perhaps no one here actually believed Microsoft was serious about
doing eBooks in the first place.

Perhaps some of the people here realized that Microsoft was not
going to be happy about losing the Yahoo! deal, and combined it
with the fact that Yahoo! is also a major supporter of the same
OCA [Open Content Alliance] that Microsoft was trying to get in
perhaps yet another kind of takeover bid.

Quite possibly Microsoft found, as did the Federal Spook Agency
and Co., that Brewster Kahal, who runs the OCA, is not quite an
easy to maniputulate person as they had assumed.


In any of these cases the real proof is in the pudding, as is a
similar case with Amazon's Kindle and Sony's Reader. . .and The
Google Book Search. . .how many people download how many books?

Amazon is reluctant to admit they have hardly sold any Kindles,
just as Sony won't admit The Sony Reader isn't selling, even at
reduced pricing.

The pundits aren't even estimating much over 50,000 sales for a
reader of any brand, and that is nothing in a world with a more
than a billion computer total and over 3 billion cellphones and
who knows what other devices for reading eBooks.


mh


> _______________________________________________ gutvol-d mailing 
> list gutvol-d at lists.pglaf.org 
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From Bowerbird at aol.com  Fri Jun  6 09:29:43 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 6 Jun 2008 12:29:43 EDT
Subject: [gutvol-d] mockingbirds
Message-ID: <cda.30369684.357abff7@aol.com>

over on the d.p. forms, dkretz recommends this:
>    http://www.ted.com/talks/view/id/161

it's a talk at t.e.d. by a lexicographer.
she's cute, funny, and smart.
highly recommended.

also cute, funny, and smart, 
and highly recommended as well,
is the video from that same session, 
by my performance poet friend rives:
>    http://www.ted.com/talks/view/id/108

and yeah, that second guy giving him a 
bear-hug afterward actually _is_ al gore...

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080606/4bb83182/attachment.htm 

From Bowerbird at aol.com  Fri Jun  6 10:28:23 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 6 Jun 2008 13:28:23 EDT
Subject: [gutvol-d] cyberlibrary numbers
Message-ID: <c4f.2d1642db.357acdb7@aol.com>

michael said:
>    Once again I feel I must point out 
>    that giving an estimate of:
>    10 years for 15 million books
>    is hardly a contradiction
>    6 years for 10 million books.

i'm not sure why you feel the need to "point out" simple arithmetic like 
that...
i'd say that whoever would fall for such a dodge isn't worth bothering about.

jose's just trying to yank your chain.

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080606/e6c3267d/attachment-0001.htm 

From julio.reis at tintazul.com.pt  Fri Jun  6 10:41:06 2008
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis)
Date: Fri, 06 Jun 2008 18:41:06 +0100
Subject: [gutvol-d] cory scores another first
In-Reply-To: <mailman.2540.1212773308.2809.gutvol-d@lists.pglaf.org>
References: <mailman.2540.1212773308.2809.gutvol-d@lists.pglaf.org>
Message-ID: <1212774066.6961.113.camel@abetarda>

> giving away free copies doesn't seem to hurt cory's sales,
> that's for sure...

And I like his reasoning for licensing under Creative Commons... the
thing about a paradigm shift in printed books... on how giving away
books raises his profile, so that if *selling* books becomes the past,
he'll still earn a good living by being invited to do lectures. So, he's
laying his eggs in all baskets.

Or, to use the nomenclature from 'Down and Out,' he's "raising his
Whuffie." Nice book, and <infomercial> dailylit.com is great too
</infomercial>

J?lio.


From marcello at perathoner.de  Fri Jun  6 11:40:41 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 06 Jun 2008 20:40:41 +0200
Subject: [gutvol-d] cyberlibrary numbers
In-Reply-To: <Pine.LNX.4.64.0806060850000.22177@pglaf.org>
References: <bf5.3154f7ac.355753ad@aol.com> <4848ED25.1040202@ibiblio.org>
	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>
Message-ID: <484984A9.20000@perathoner.de>

Michael Hart wrote:

> In any of these cases the real proof is in the pudding, 

Michael Hart and Bowerbird announced today they founded Distributed 
Proofeaters.

The goal of Distributed Proofeaters is to find the proof that was 
allegedly hidden in a pudding (by terrorists).

Bowerbird wrote a pudding reader program that is almost in beta stage.

It is estimated they will eat about 10 million puddings in 6 years.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From Bowerbird at aol.com  Fri Jun  6 13:07:19 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 6 Jun 2008 16:07:19 EDT
Subject: [gutvol-d] a new type of inconsistency
Message-ID: <cdb.32ccc0cf.357af2f7@aol.com>

it must be tough to keep coming up with new inconsistencies in the e-texts.

but thanks to the ingenuity of many volunteers, and the willing cooperation
of the whitewashers, project gutenberg continues with its important mission.

take e-text #25536 as an example:
>    http://www.gutenberg.org/files/25536/

the images for an e-text are stored in the "images" subdirectory located
in the html folder -- 25536-h/ in this case -- which is all well and good...

except that the producer of this e-text decided to link to the page-images.
now, that's a great idea -- indeed, i've suggested it often in the past.   
but...

but for this text, the page-scans are in the "images" subdirectory, which is
inconsistent with the way the page-scan images have always been handled,
i.e., in a "#####-page-images" subdirectory.   see, for instance, this 
e-text:
>    http://www.gutenberg.org/files/22144/

just another wrinkle that needs to be smoothed out for a consistent 
library...

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080606/72f91e4d/attachment.htm 

From Bowerbird at aol.com  Sun Jun  8 11:50:39 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 8 Jun 2008 14:50:39 EDT
Subject: [gutvol-d] "but when she came there, the cupboard was bare..."
Message-ID: <c86.31459194.357d83ff@aol.com>

here's an observation:
>    http://z-m-l.com/misc/was_bare.html

on the left is the original. 
on the right is the version from distributed proofreaders.

i'd say there's a large amount of disutility in removing text 
from pictures like this...   but i'd be open to a discussion...

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080608/cf381fbb/attachment.htm 

From tb at baechler.net  Sun Jun  8 12:15:31 2008
From: tb at baechler.net (Tony Baechler)
Date: Sun, 8 Jun 2008 12:15:31 -0700
Subject: [gutvol-d] Preprints: Hart/Newby presentation
Message-ID: <20080608191531.GA25191@investigative.net>

Hello,

I didn't see any contact email address on the Preprints site, so I'm 
asking here.  I saw the Hart and Newby presentation from HOPE 6 in the 
form of an .iso file.  What exactly is needed to make this ready for PG?  
All the site said was that it would be more effective in a compressed 
format, but that's vague.  I'm assuming that we would at least want an 
mp3 audio file and an mpeg or mp4 video, but what else would be 
necessary?  I can convert uncompressed audio and video to compressed 
formats, but some idea of what's wanted would be helpful.  If someone 
else is already working on this, that's fine with me, just let me know.  
Thanks.

From gbnewby at pglaf.org  Sun Jun  8 14:05:26 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 8 Jun 2008 14:05:26 -0700
Subject: [gutvol-d] Preprints: Hart/Newby presentation
In-Reply-To: <20080608191531.GA25191@investigative.net>
References: <20080608191531.GA25191@investigative.net>
Message-ID: <20080608210526.GA24061@mail.pglaf.org>

On Sun, Jun 08, 2008 at 12:15:31PM -0700, Tony Baechler wrote:
> Hello,
> 
> I didn't see any contact email address on the Preprints site, so I'm 
> asking here.  I saw the Hart and Newby presentation from HOPE 6 in the 
> form of an .iso file.  What exactly is needed to make this ready for PG?  
> All the site said was that it would be more effective in a compressed 
> format, but that's vague.  I'm assuming that we would at least want an 
> mp3 audio file and an mpeg or mp4 video, but what else would be 
> necessary?  I can convert uncompressed audio and video to compressed 
> formats, but some idea of what's wanted would be helpful.  If someone 
> else is already working on this, that's fine with me, just let me know.  

I'm the contact.  There isn't anything precise I had in mind..
preprints are where I put "raw" material [with no particular definition]
that needs effort to be added to the main PG collection.

You're right that we'd like MP3 or similar audio, and MP4 or
similar audio+video, extracted.  If you want to do different
versions [ogg, etc.] that's fine.

I don't think anyone else is working on this.  Michael and
I are planning a presentation at the next conference in the
same series, www.thelasthope.org

Thanks! 
  -- Greg

From Bowerbird at aol.com  Sun Jun  8 23:36:29 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 9 Jun 2008 02:36:29 EDT
Subject: [gutvol-d] c'mon steve
Message-ID: <cf6.3294d8d7.357e296d@aol.com>

c'mon steve, let's push the envelope even further.
v2 of the iphone is nice, but hardly revolutionary.
give us a new machine that blows the doors off...

please.

thanks.

-bowerbird


**************
Get trade secrets for amazing burgers. Watch "Cooking with 
Tyler Florence" on AOL Food.
      (http://food.aol.com/tyler-florence?video=4?&amp;
NCID=aolfod00030000000002)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080609/0f8e6000/attachment.htm 

From lee at novomail.net  Mon Jun  9 15:48:09 2008
From: lee at novomail.net (Lee Passey)
Date: Mon, 09 Jun 2008 16:48:09 -0600
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <d2c.2c82ea64.35744c92@aol.com>
References: <d2c.2c82ea64.35744c92@aol.com>
Message-ID: <484DB329.8060802@novomail.net>

I apologize, but I simply cannot let this go without comment:

Bowerbird at aol.com wrote:

[snip]

> the problem with that .html is that you can't maintain it.
> even maintaining one such file can be a chore, but when
> you must maintain tens of thousands, it gets impossible.

As a general rule, bowerbird's comments are, while inflammatory, mostly 
correct. As he has pointed out, the biggest flaw in the PG corpus, even 
more serious than the loss of metadata, is that it is totally lacking in 
standards (he uses the word consistency) and that Mr. Hart is thoroughly 
committed to maintaining this chaos as an institutional objective.

On this one particular issue, however, he is utterly, totally wrong. It 
is the big lie, that he hopes we will eventually accept simply because 
he repeats it over and over.

The problem with bowerbird's s.m.l. is that you can't maintain it. HTML 
is a well-documented, standard markup language with dozens, if not 
hundreds, of tools that can be used to display and manipulate it. The 
notion that the display of HTML files is restricted to web-browsers is 
simply na?ve.

s.m.l. is subtle, incomplete and ambiguous. It is, of course, an attempt 
to create a markup language and is far more than the Plain Vanilla Text 
(or Impoverished Text Format) that Mr. Hart advocates.

It is utterly inconceivable to me that anyone could possibly claim that 
HTML is difficult to maintain whereas s.m.l. is not.

As we have seen, it is certainly possible to abuse any markup language, 
and many of the HTML files now in the PG archive are evidence of this. 
But even the worst of these files are easier to modify, update and 
maintain than /any/ s.m.l. file.

I realize that no one here really lends any credence to bowerbird's 
attempt to create Yet Another Markup Language; but every once in a while 
I think it is appropriate to call a spade a spade, and an irrational 
conclusion an irrational conclusion.

From marcello at perathoner.de  Tue Jun 10 04:23:21 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 10 Jun 2008 13:23:21 +0200
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <484DB329.8060802@novomail.net>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
Message-ID: <484E6429.7040504@perathoner.de>

Lee Passey wrote:

> As a general rule, bowerbird's comments are, while inflammatory, mostly 
> correct. 

I strongly disagree. He's a braggard. Big mouth. No teeth.

All BB did in 5 years was to propose one simple solution for all our 
problems, and that solution is wrong.

It is wrong because his pet format doesn't scale to the level of text 
detail most of us want to capture.

It is wrong because his pet format also is fundamentally flawed for 
reasons I detailed long ago but which BB never had the pluck to reply 
to. (ie. his use of non-printing characters for formatting and his 
reliance on counting empty lines for text division markup.)


> As he has pointed out, the biggest flaw in the PG corpus, even 
> more serious than the loss of metadata, is that it is totally lacking in 
> standards (he uses the word consistency) and that Mr. Hart is thoroughly 
> committed to maintaining this chaos as an institutional objective.

I wouldn't say that. Basically, once an idea formed in MH's head, it is 
not amendable by fact. Fact is: that DP, doing exactly the opposite of 
what MH recommends, produced more books in 8 years than PG in 37 years. 
There is no way on earth that this simple fact will convince MH that 
organized ebook production is the way to go.

Fortunately the people at DP didn't listen to what MH said but simply 
set up an environment for organized ebook production and started turning 
out books.

The solution to your problem is: take your ebook standard of choice and 
start converting the library. If it does any good, people will notice 
and jump to it.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From hart at pglaf.org  Tue Jun 10 08:58:04 2008
From: hart at pglaf.org (Michael Hart)
Date: Tue, 10 Jun 2008 08:58:04 -0700 (PDT)
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <484E6429.7040504@perathoner.de>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>
Message-ID: <Pine.LNX.4.64.0806100843500.27586@pglaf.org>


I suppose the real question is whether any more than a small
handful of people believe the message I reply to below.

If not, perhaps it would be best simply to ignore messages a
person such as this sends to the list.

How many people think such replies are really necessary?

And, by the way, it's more like 38 years, but who would make
the presumptive call that the author in question pays a more
than lip-service attention to what he is saying. . . .

Hopefully most of you will agree that replying is just waste
of time after waste of time after waste of time, and forgive
me if I take some of my own advice and wait until people say
they need a refutation to all this garbage.

. . . .


The message below will have you believe that DP did all this
without either permission or encouragement from me.

This can be filed with most of the rest of that author.

As I have stated in reply to these same accusations before,
I personally went to Las Vegas, the then home town of DP's
founder, Charles Franks, where we met in person and worked
out all the details he had in mind, but the author below--
sadly to say--was not in attendance.  A latecomer.

In addition, he would have you believe that DP has created
much more than half of all the eBooks at PG sites.

According to his own numbers, it is, and has been for some
time, about half of the total listed in the Newsletters.

Which is no mean feat and should not reflect upon DP other
than in the most positive manner, it is simply a numerical,
an nothing more, observation on the strategies and tactics,
used for years by the author whose name appears below.


On Tue, 10 Jun 2008, Marcello Perathoner wrote:

> Lee Passey wrote:
>
>> As a general rule, bowerbird's comments are, while inflammatory, mostly
>> correct.
>
> I strongly disagree. He's a braggard. Big mouth. No teeth.
>
> All BB did in 5 years was to propose one simple solution for all our
> problems, and that solution is wrong.
>
> It is wrong because his pet format doesn't scale to the level of text
> detail most of us want to capture.
>
> It is wrong because his pet format also is fundamentally flawed for
> reasons I detailed long ago but which BB never had the pluck to reply
> to. (ie. his use of non-printing characters for formatting and his
> reliance on counting empty lines for text division markup.)
>
>
>> As he has pointed out, the biggest flaw in the PG corpus, even
>> more serious than the loss of metadata, is that it is totally lacking in
>> standards (he uses the word consistency) and that Mr. Hart is thoroughly
>> committed to maintaining this chaos as an institutional objective.
>
> I wouldn't say that. Basically, once an idea formed in MH's head, it is
> not amendable by fact. Fact is: that DP, doing exactly the opposite of
> what MH recommends, produced more books in 8 years than PG in 37 years.
> There is no way on earth that this simple fact will convince MH that
> organized ebook production is the way to go.
>
> Fortunately the people at DP didn't listen to what MH said but simply
> set up an environment for organized ebook production and started turning
> out books.
>
> The solution to your problem is: take your ebook standard of choice and
> start converting the library. If it does any good, people will notice
> and jump to it.
>
>
>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From Bowerbird at aol.com  Tue Jun 10 10:38:15 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 10 Jun 2008 13:38:15 EDT
Subject: [gutvol-d] good tools that do all the work (the short version)
Message-ID: <d59.2394c588.35801607@aol.com>

marcello is in my spam folder, of course, since his signal-to-noise ratio is
so close to zero that there's no reason to even try to discern the number...

lee has, on some occasions, said a few things that were worth reading, but
i had to put him in my spam folder too, because he comes from a position
where he hates project gutenberg, while i come from a position of _love_.
so even when lee _thinks_ that i might agree with him, he's badly mistaken.

and in addition to our motivational differences, there are just a lot of 
times
where lee is misguided.   this latest post of his would be one such instance.

but i've passed the point where i'm jumping in to say "no, you are wrong..."
so that post could have stayed in my spam folder and i wouldn't have cared.

now that michael has fished it out and waved it in front of me, i will say 
that
lee doesn't seem to have a clue.   i wonder if he's actually _looked_ at any 
of
the d.p. .html books.   and i'm quite sure he hasn't _tracked_ them over 
time.

if he had, he'd know that the various producers have used a _wide_variety_
of methods in creating all those .html files.   and _that_ is what makes them
difficult (to the point of impossibility) to maintain.   and the fact that 
.html is
"a standard" doesn't really fix that problem.   anyone who wants to convert
those files is going to have to go into each one individually and _grok_ it,
first of all, and then _apply_an_upgrade_ more-or-less manually.   difficult.
and it becomes more and more difficult the longer that the task is put off...

with z.m.l., on the other hand, there's only one way to get a desired effect.
so all the files will be _consistent_, so they can be treated 
_programatically_.
that is, i just write the program that does the upgrade, and run it across 
all
the files.   once the program is written, most of the work is done, no matter
how many files i run it against.   so this infrastructure scales extremely 
well.

lee can say whatever he will, but what he says won't make it one bit easier
for anyone to maintain the big hairy mess that has become the p.g. corpus.
after all, if it was easy to upgrade those files, they'd all be .tei files by 
now...

and what he says won't make it one bit harder for me to maintain my mirror.
which, by the way, is coming along just fine, in case anyone was wondering.

so michael, i'd recommend you divert those fellows to your spam folder...

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080610/e8cd2f45/attachment.htm 

From lee at novomail.net  Tue Jun 10 10:45:08 2008
From: lee at novomail.net (Lee Passey)
Date: Tue, 10 Jun 2008 11:45:08 -0600
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <484E6429.7040504@perathoner.de>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>
Message-ID: <484EBDA4.4030402@novomail.net>

Marcello Perathoner wrote:

> The solution to your problem is: take your ebook standard of choice and 
> start converting the library. If it does any good, people will notice 
> and jump to it.

Well, I don't really have a problem; but your suggestion is a good one.

It is, of course, a little more complex than this. You do need to start 
by selecting a markup standard, and creating a markup tutorial that goes 
beyond the bare syntax requirements.

Then you need to start building a library using that markup language; 
but the Project Gutenberg archive has been so sloppily created you can't 
use it as a starting point. Typographical markup, provenance, references 
and other such metadata have been irretrievably lost, so you pretty much 
have to start over.

Now, because consistency within the archive is also important (if you 
want a standard, you should also want an archive where everything 
satisfies that standard) you can't use the PG archive because it has no 
standards. Even if the new, improved files make their way back to PG you 
would want a place where they could be stored in their pristine state.

So, PG has been useful how?

Please, please, PLEASE do not think that I'm suggesting that PG should 
become relevant or useful; I'm simply pointing out that it is not, and 
that attempts to make it relevant or useful will simply be futile.


From marcello at perathoner.de  Tue Jun 10 10:54:56 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 10 Jun 2008 19:54:56 +0200
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <Pine.LNX.4.64.0806100843500.27586@pglaf.org>
References: <d2c.2c82ea64.35744c92@aol.com>
	<484DB329.8060802@novomail.net>	<484E6429.7040504@perathoner.de>
	<Pine.LNX.4.64.0806100843500.27586@pglaf.org>
Message-ID: <484EBFF0.2040108@perathoner.de>

Michael Hart wrote:

> The message below will have you believe that DP did all this
> without either permission or encouragement from me.

I didn't know anybody needed your permission to do ebooks.


> As I have stated in reply to these same accusations before,
> I personally went to Las Vegas, the then home town of DP's
> founder, Charles Franks, where we met in person and worked
> out all the details he had in mind, but the author below--
> sadly to say--was not in attendance.  A latecomer.

The accounts I heard about this meeting where somehow different. But it 
often happens that persons percieve the same situation in a different 
manner.


> In addition, he would have you believe that DP has created
> much more than half of all the eBooks at PG sites.

I said DP produced more books than PG. Not *much* more.

As of today and according to DP they have completed and posted 13,342 books.

As of PG, today we have posted #25755.

   25755 -
   13342
   =======
   12413

The 13,342 books posted thru DP are more than the 12,413 books posted 
thru other channels. About a thousand more. And they did it in 8 years, 
not in 38.

And this simple arithmetic proves that you create less ebooks by 
"offer[ing] as many freedoms to our volunteers as possible" and by
not being "very bossy about what our volunteers should do". 
(http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Mission_Statement_by_Michael_Hart)

It shows instead how you can create more books by providing guidance and 
a productive environment to volunteers. It proves that by requiring 
strict guidelines you actually go faster than by requiring none.

If there shall be a PG II, then that title belongs to DP.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From rburkey2005 at earthlink.net  Tue Jun 10 10:58:43 2008
From: rburkey2005 at earthlink.net (Ron Burkey)
Date: Tue, 10 Jun 2008 12:58:43 -0500
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <484EBDA4.4030402@novomail.net>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>  <484EBDA4.4030402@novomail.net>
Message-ID: <1213120723.28993.3.camel@software1.heads-up.local>

On Tue, 2008-06-10 at 11:45 -0600, Lee Passey wrote:
> Marcello Perathoner wrote:
> 
> > The solution to your problem is: take your ebook standard of choice and 
> > start converting the library. If it does any good, people will notice 
> > and jump to it.
> 
> Well, I don't really have a problem; but your suggestion is a good one.
> 
> It is, of course, a little more complex than this. You do need to start 
> by selecting a markup standard, and creating a markup tutorial that goes 
> beyond the bare syntax requirements.
> 
> Then you need to start building a library using that markup language; 
> but the Project Gutenberg archive has been so sloppily created you can't 
> use it as a starting point. Typographical markup, provenance, references 
> and other such metadata have been irretrievably lost, so you pretty much 
> have to start over.
> 

That's throwing out the baby with the bath-water.  You need to
distinguish between the perfect case and a practically-achievable case. 

Step 1:  Choose a standard.  Make sure it's flexible enough to handle
unknown data, such as provenance.

Step 2:  Get some texts into that format.

Step 3:  Hope other people notice and jump on board.

Step 4:  *Then* worry about the do-overs and bemoan the fact that there
need to be any do-overs (when it really could have been avoided).


From Bowerbird at aol.com  Tue Jun 10 11:58:09 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 10 Jun 2008 14:58:09 EDT
Subject: [gutvol-d] the wonder of widger, cleaning up the library
Message-ID: <be0.36e52d10.358028c1@aol.com>

david widger has cleaned up gibbon's "decline and fall".

there were 2 versions of this book in the p.g. library
-- one of them text-only and the other .html-only --
with both formats encompassing several p.g. e-texts,
since this is a book that was published in 6 volumes...

as david widger put it:
>    Both of these sets have been recently completely 
>    reproofed with correction of several thousand errors.

this is david widger commenting on "several thousand errors",
folks, not someone with a "grudge" against project gutenberg.

it's just a _fact_ that many of the e-texts are plagued by errors.
anyone who disputes this message has their head in the sand.
they need to be cleaned...   thanks to david for doing that job...

rather than adding more books to the pile, the best thing for
d.p. (and independent digitizers) to do at this time would be
to find and fix the errors in the existing e-texts...

>    History of the Decline and Fall of the Roman Empire
>    http://www.gutenberg.org/etext/731

>    History of the Decline and Fall of the Roman Empire
>    http://www.gutenberg.org/etext/890

>    The History Of The Decline And Fall Of The Roman Empire
>    http://www.gutenberg.org/etext/25717

(and yes, the inconsistency in the titles did make me laugh.)

thanks again to david, for doing what needs to be done...

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080610/bc2777a2/attachment-0001.htm 

From bzg at altern.org  Tue Jun 10 20:52:40 2008
From: bzg at altern.org (Bastien)
Date: Wed, 11 Jun 2008 05:52:40 +0200
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <484DB329.8060802@novomail.net> (Lee Passey's message of "Mon, 09
	Jun 2008 16:48:09 -0600")
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
Message-ID: <87od68typj.fsf@bzg.ath.cx>

Lee Passey <lee at novomail.net> writes:

> I realize that no one here really lends any credence to bowerbird's
> attempt to create Yet Another Markup Language; but every once in a
> while I think it is appropriate to call a spade a spade, and an
> irrational conclusion an irrational conclusion.

And?

-- 
Bastien

From ebooks at ibiblio.org  Tue Jun 10 22:59:22 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Wed, 11 Jun 2008 01:59:22 -0400
Subject: [gutvol-d] cyberlibrary numbers
In-Reply-To: <bf0.2b51b0ef.357a5001@aol.com>
References: <bf0.2b51b0ef.357a5001@aol.com>
Message-ID: <484F69BA.2030702@ibiblio.org>

On June 6, 2008, Bowerbird wrote:

> jose said:
>  >  blah blah blah dodge distortion factoid blah blah blah


Actually, that would make a pretty good description of some of your 
posts, especially if you add "jump to erroneous conclusions blah blah 
blah grandiose claims blah blah blah." :)

You know, Bowerbird, you disappoint me. I thought you'd be jumping for 
joy after finding out that Google has scanned more than twice as many 
books at the University of Michigan as you thought they had. ;)


> p.s.  tell john wilkin "hi" for me.  he never answers me
> when i try to ask him a question directly on his blog...


It seems that Paul Courant doesn't answer you when you address him on 
his blog either:

http://paulcourant.net/2008/04/26/john-wilkin-and-others-on-openness-and-its-opposites/#comments

http://paulcourant.net/2008/05/31/microsoft-exits-the-mass-digitization-business/#comments

I can't speak for either other one of them, but if I had to hazard a 
guess, perhaps the reason they ignore you is that they think you ought 
to do your homework. For instance, you might start by reading UM's 
cooperative agreement with Google. It's been available on the UM 
website since June 2005. In fact, David Carter posted a link to a PDF 
version of the agreement in this Book People post on June 16, 2005:

"Text of Michigan/Google agreement"
http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-06-16,12

Two days later, J Flenner also posted a link to the agreement in this 
post:

"Details Revealed on Google Library Project at U. Michigan"
http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-06-18,2

Here are links to it in both HTML and PDF formats:

http://www.lib.umich.edu/mdp/umgooglecooperativeagreement.html

http://www.lib.umich.edu/mdp/um-google-cooperative-agreement.pdf

You might want to start with this paragraph:


"4.4.1   Use of U of M Digital Copy on U of M Website. U of M shall 
have the right to use the U of M Digital Copy, in whole or in part at 
U of M's sole discretion, as part of services offered on U of M's 
website. U of M shall implement technological measures (e.g., through 
use of the robots.txt protocol) to restrict automated access to any 
portion of the U of M Digital Copy or the portions of the U of M 
website on which any portion of the U of M Digital Copy is available. 
U of M shall also make reasonable efforts (including but not limited 
to restrictions placed in Terms of Use for the U of M website) to 
prevent third parties from (a) downloading or otherwise obtaining any 
portion of the U of M Digital Copy for commercial purposes, (b) 
redistributing any portions of the U of M Digital Copy, or (c) 
automated and systematic downloading from its website image files from 
the U of M Digital Copy. U of M shall restrict access to the U of M 
Digital Copy to those persons having a need to access such materials 
and shall also cooperate in good faith with Google to mutually develop 
methods and systems for ensuring that the substantial portions of the 
U of M Digital Copy are not downloaded from the services offered on U 
of M's website or otherwise disseminated to the public at large."


Did you notice the part about preventing "third parties from ... (b) 
redistributing any portions of the U of M Digital Copy, or (c) 
automated and systematic downloading from its website image files from 
the U of M Digital Copy"? That would apply to you, third party, er, 
Bowerbird. Perhaps that's why they ignore you when you talk about 
scraping and re-mounting their scans and OCR.

Well, if I can ever help you with your homework in the future, feel 
free to let me know. My tutoring rates are very reasonable. ;)


Jose Menendez

From ebooks at ibiblio.org  Tue Jun 10 23:18:06 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Wed, 11 Jun 2008 02:18:06 -0400
Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary
	numbers)
In-Reply-To: <Pine.LNX.4.64.0806060850000.22177@pglaf.org>
References: <bf5.3154f7ac.355753ad@aol.com> <4848ED25.1040202@ibiblio.org>
	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>
Message-ID: <484F6E1E.3090206@ibiblio.org>

I thought this deserved its own thread.


On June 6, 2008, I wrote:

> P.S. I'm surprised that no one has mentioned on gutvol-d before 
> now that Microsoft was quitting its book scanning operation.


The same day Michael Hart replied:

> Perhaps no one here actually believed Microsoft was serious about
> doing eBooks in the first place.
> 
> Perhaps some of the people here realized that Microsoft was not
> going to be happy about losing the Yahoo! deal, and combined it
> with the fact that Yahoo! is also a major supporter of the same
> OCA [Open Content Alliance] that Microsoft was trying to get in
> perhaps yet another kind of takeover bid.


Well, if you'd done your homework, Michael, you would have realized 
that Microsoft was not only serious, but it was responsible for the 
*overwhelming majority* of books scanned by the OCA. You also would 
have realized that Yahoo is NOT a "major supporter" of the OCA. 
Indeed, in terms of financial support, Yahoo WAS a minuscule supporter 
of the OCA compared to Microsoft. (Notice I switched to the past 
tense. There's a reason for that.) Want some proof?

Looking at the Internet Archive's Text Archive page:

http://www.archive.org/details/texts

I see "202,578 items" for the American Libraries sub-collection and 
"119,424 items" for the Canadian Libraries. Adding them together, we 
get a total of 322,002.

(Those numbers are updated regularly as more books are put online, so 
people may see higher numbers later.)

Now how many of those 322,000+ books were thanks to Yahoo and how many 
were thanks to Microsoft? Let's start with "major supporter" Yahoo:

http://www.archive.org/details/yahoo_books

I hope you're sitting down when you read this, Michael, because Yahoo 
contributed the staggering total of "1,075 items." That's not a typo; 
I didn't leave out a few digits. Only 1,075! If we divide 1,075 by 
322,002, we get 0.00334. So only 0.334% of the books the OCA has 
scanned and put online from American and Canadian libraries are thanks 
to "major supporter" Yahoo!

Now let's see how Microsoft did:

http://www.archive.org/details/msn_books

The total for Microsoft is "288,518 items." (Since the OCA is still 
scanning books with funds contributed by Microsoft, this total keeps 
getting updated too.)

288,518 divided by 322,002 equals 0.896. So 89.6% of the books the OCA 
has scanned and put online from American and Canadian libraries are 
thanks to the company "perhaps no one here actually believed ... was 
serious."

If you're still tempted to cling to the notion that Microsoft wasn't 
serious and that Yahoo is a "major supporter" of the OCA, let's take a 
look at Brewster Kahle's own announcement on May 26 about Microsoft's 
decision:

http://www.archive.org/iathreads/post-view.php?id=194217

It's not too long, so I'm going to quote the whole thing, with a few 
comments from me enclosed in brackets [].


"The Internet Archive operates 13 scanning centers in great libraries, 
digitizing 1000 books a day. This scanning is financially supported by 
libraries, foundations, and the Microsoft Corporation. Today, 
Microsoft has announced that it will ramp down their investment in 
this area. We very much appreciate their efforts and funding in book 
scanning over the last 3 years. As a result, over 300,000 books are 
publicly available on the archive.org site that would not otherwise be."

[Note that Brewster didn't mention ANY financial support from Yahoo. 
See why I switched to the past tense and said that "Yahoo WAS a 
minuscule supporter of the OCA"?]

"To their credit, they said they are taking off any contractual 
restrictions on the public domain books and letting us keep the 
equipment that they funded. This is extremely important because it can 
allow those of us in the public sphere to leverage what they helped 
build. Keeping the public domain materials public domain is where we 
all wanted to be. Getting a books scanning process in place is also a 
major accomplishment. Thank you Microsoft."

[Note the mention of "contractual restrictions." I'll get to those a 
little later.]

"Funding for the time being is secure, but going forward we will need 
to replace the Microsoft funding. Microsoft has always encourage the 
Open Content Alliance to work in parallel in case this day arrived. 
Lets work together, quickly, to build on the existing momentum. All 
ideas welcome.

"Onward to a completely public library system!"


Did you notice, Michael, that Brewster didn't say anything about Yahoo 
helping to replace the Microsoft funding?


> Quite possibly Microsoft found, as did the Federal Spook Agency
> and Co., that Brewster Kahal, who runs the OCA, is not quite an
> easy to maniputulate person as they had assumed.


But apparently his surname is easy to misspell. ;) Seriously, do you 
recall the "contractual restrictions" I pointed out in Brewster's 
announcement? If you look again at the Microsoft page at the Internet 
Archive:

http://www.archive.org/details/msn_books

you'll see a box labeled "Rights" on the left side of the page. Here's 
what it says:


"Books scanned before November 1, 2006 are under OCA principles, 
thereafter they are available for non-commercial use and may not 
appear in commercial services. Please contact info at archive.org or 
Microsoft about bulk access."


Brewster Kahle may not be "easy to maniputulate [sic]," but he didn't 
stop Microsoft from violating the OCA principles it had agreed to.


Jose Menendez

From jmdyck at ibiblio.org  Wed Jun 11 01:41:11 2008
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed, 11 Jun 2008 01:41:11 -0700
Subject: [gutvol-d] PGDP's contribution to PG, numerically speaking
In-Reply-To: <484EBFF0.2040108@perathoner.de>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>
	<Pine.LNX.4.64.0806100843500.27586@pglaf.org>
	<484EBFF0.2040108@perathoner.de>
Message-ID: <484F8FA7.2000205@ibiblio.org>

Marcello Perathoner wrote:
> 
> As of today and according to DP they have completed and posted 13,342 books.
> 
> As of PG, today we have posted #25755.
> 
>    25755 -
>    13342
>    =======
>    12413

The 13,342 number at pgdp.net is the number of its 'projects' that have 
"posted to PG" status. However, it isn't the case that 1 pgdp.net 
project equals 1 PG etext. Large books are often split into multiple 
projects and then merged into a single posting -- sometimes (mostly in 
the past) the multiple projects would each get the "posted to PG" 
status, so contributing more than one to the number in question.

Instead, to get the number of PG texts that were contributed by 
pgdp.net, the best approximation is the number in the upper right of 
most pages at the site, currently 12,897. (This is the number of 
distinct PG etext numbers that we have recorded for our posted projects.)

Also, I'm guessing that PG's "reserved count" is still about 40, so when 
#25755 is posted, that means PG (USA) has about 25715 books. So the 
correct calculation is something more like:

     25715  PG USA total
   - 12897  # from pgdp.net
     -----
     12818  # from elsewhere

So pgdp.net still accounts for more than half, but it's pretty close.
The last time I did that calculation (a couple months ago, I think), 
PGDP's contribution was just under half, so we must have crossed the 
equator sometime since then.

Mind you, if you set aside audio and video files and only look at texts 
per se, PGDP's contribution has been more than half for quite a while. 
Someone else can do that calculation if they want.

Of course, if you consider the larger interpretations of "Project 
Gutenberg", PGDP's fraction thereof will be less.

-Michael


From paulmaas at airpost.net  Wed Jun 11 07:16:51 2008
From: paulmaas at airpost.net (Paul Maas)
Date: Wed, 11 Jun 2008 07:16:51 -0700
Subject: [gutvol-d] PGDP's contribution to PG, numerically speaking
In-Reply-To: <484F8FA7.2000205@ibiblio.org>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>
	<Pine.LNX.4.64.0806100843500.27586@pglaf.org>
	<484EBFF0.2040108@perathoner.de> <484F8FA7.2000205@ibiblio.org>
Message-ID: <1213193811.18491.1257910599@webmail.messagingengine.com>

What can be said is that DP is the #1 contributor of electronic texts to
PG.  No one else even comes close.  Congratulations to DP!

pm


On Wed, 11 Jun 2008 01:41:11 -0700, "Michael Dyck" <jmdyck at ibiblio.org>
said:
> Marcello Perathoner wrote:
> > 
> > As of today and according to DP they have completed and posted 13,342 books.
> > 
> > As of PG, today we have posted #25755.
> > 
> >    25755 -
> >    13342
> >    =======
> >    12413
> 
> The 13,342 number at pgdp.net is the number of its 'projects' that have 
> "posted to PG" status. However, it isn't the case that 1 pgdp.net 
> project equals 1 PG etext. Large books are often split into multiple 
> projects and then merged into a single posting -- sometimes (mostly in 
> the past) the multiple projects would each get the "posted to PG" 
> status, so contributing more than one to the number in question.
> 
> Instead, to get the number of PG texts that were contributed by 
> pgdp.net, the best approximation is the number in the upper right of 
> most pages at the site, currently 12,897. (This is the number of 
> distinct PG etext numbers that we have recorded for our posted projects.)
> 
> Also, I'm guessing that PG's "reserved count" is still about 40, so when 
> #25755 is posted, that means PG (USA) has about 25715 books. So the 
> correct calculation is something more like:
> 
>      25715  PG USA total
>    - 12897  # from pgdp.net
>      -----
>      12818  # from elsewhere
> 
> So pgdp.net still accounts for more than half, but it's pretty close.
> The last time I did that calculation (a couple months ago, I think), 
> PGDP's contribution was just under half, so we must have crossed the 
> equator sometime since then.
> 
> Mind you, if you set aside audio and video files and only look at texts 
> per se, PGDP's contribution has been more than half for quite a while. 
> Someone else can do that calculation if they want.
> 
> Of course, if you consider the larger interpretations of "Project 
> Gutenberg", PGDP's fraction thereof will be less.
-- 
  Paul Maas
  paulmaas at airpost.net

-- 
http://www.fastmail.fm - Choose from over 50 domains or use your own


From hart at pglaf.org  Wed Jun 11 09:53:38 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 11 Jun 2008 09:53:38 -0700 (PDT)
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <484EBFF0.2040108@perathoner.de>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>
	<Pine.LNX.4.64.0806100843500.27586@pglaf.org>
	<484EBFF0.2040108@perathoner.de>
Message-ID: <Pine.LNX.4.64.0806110925350.15312@pglaf.org>


Once again the author below would have you believe 25,xxx
are all the Project Gutenberg eBooks in existence without
giving any credit to PG of Australia with 1640, or Canada
with ~100, or Europe with ~500,

Which are enough by themselves to counter his argument of
"simple aritmetic" as listed below.

NOT to even mention PrePrints with 387 or
to mention his dreated Nemesis he named indirectly below,
the dreaded site at:

http://www.gutenberg.cc with 75,000+.


Since he wasn't there at the Las Vegas meeting, he can do
only the most indirectly kibitzing from the sidelines and
the only way to make sure to keep his remarks in a proper
perspective is to save his messages, rather than just the
easier alternative of deleting them, and returning them a
while later when they will impact his face with the same,
or actually reversed, impact he intended.

It would appear that his major goal is to cause strife in
our midst here, and since no one replied that they needed
any assistance in refuting his rants and raves, we'll see
just how well he manages without any feedback for a bit.

However, the more he rants and raves, the more he proves,
again and again, that freedom of speech is strong here at
Project Gutenberg.

And, just to make the "simple arithmetic" even moreso:

Current Totals


25,755  Project Gutenberg Under US Copyright Law
  1,640  Project Gutenberg Of Australia
    504  Project Gutenberg of Europe
    138  Project Gutenberg of Canada [through May]
======
28,037  Grand Total

Not to mention some worthwhile titles at PrePrints:

    387  Project Gutenberg PrePrints

or, since he mentioned the dreaded II below, we should,
but, won't, add in the 75,000+ eBooks donated by those
entirely outside the Project Gutenberg evironment from
around the world, but who don't have any distribution.

And it would certainly be too much to consider a first
Project Gutenberg spin-off, Project Runeberg, from way
before DP's time, or Project Wittenberg, or. . . .


On Tue, 10 Jun 2008, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> The message below will have you believe that DP did all this
>> without either permission or encouragement from me.
>
> I didn't know anybody needed your permission to do ebooks.
>
>
>> As I have stated in reply to these same accusations before,
>> I personally went to Las Vegas, the then home town of DP's
>> founder, Charles Franks, where we met in person and worked
>> out all the details he had in mind, but the author below--
>> sadly to say--was not in attendance.  A latecomer.
>
> The accounts I heard about this meeting where somehow different. 
> But it often happens that persons percieve the same situation in a 
> different manner.
>
>
>> In addition, he would have you believe that DP has created
>> much more than half of all the eBooks at PG sites.
>
> I said DP produced more books than PG. Not *much* more.
>
> As of today and according to DP they have completed and posted 
> 13,342 books.
>
> As of PG, today we have posted #25755.
>
>  25755 -
>  13342
>  =======
>  12413
>
> The 13,342 books posted thru DP are more than the 12,413 books 
> posted thru other channels. About a thousand more. And they did it 
> in 8 years, not in 38.
>
> And this simple arithmetic proves that you create less ebooks by 
> "offer[ing] as many freedoms to our volunteers as possible" and by
> not being "very bossy about what our volunteers should do". 
> (http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Mission_Statement_by_Michael_Hart)
>
> It shows instead how you can create more books by providing 
> guidance and a productive environment to volunteers. It proves 
> that by requiring strict guidelines you actually go faster than by 
> requiring none.
>
> If there shall be a PG II, then that title belongs to DP.
>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org
>

From hart at pglaf.org  Wed Jun 11 10:14:34 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 11 Jun 2008 10:14:34 -0700 (PDT)
Subject: [gutvol-d] "Simple Arithmetic"
In-Reply-To: <Pine.LNX.4.64.0806110925350.15312@pglaf.org>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>
	<Pine.LNX.4.64.0806100843500.27586@pglaf.org>
	<484EBFF0.2040108@perathoner.de>
	<Pine.LNX.4.64.0806110925350.15312@pglaf.org>
Message-ID: <Pine.LNX.4.64.0806110959440.15312@pglaf.org>


25,755  Project Gutenberg Under US Copyright Law
  1,640  Project Gutenberg Of Australia
    504  Project Gutenberg of Europe
    138  Project Gutenberg of Canada [through May]
======
28,037  Grand Total NOT mentioning so many more.

13,342  as stated in Marcello's "simple arithmetic"


It's really a shame that these figures are so eqocentric,
UScentric, etc., as to derail the topic from what SHOULD
be the real statement at hand.

That Distributed Proofreader is doing WONDERFUL WORK!!!


And the misleading part about 8 years versus 38, well--
it's not quite "simple arithmetic" when you get into an
example of such rapid growth curves, yet I think anyone
here realizes that any such fast growth function yields
much more in recent years than earlier years.

Of course, we perhaps have to take into account that it
is outside that author's timeframe of experience to say
just how easy or hard it was starting those 38 years.

Well, 37.95, or so.

Let's see just how much growth there has been in a same
8 year period of flash RAM, for example???

No. . .not here, not now.

This should really be a time for CONGRATULATIONS TO DP!

Regargless of that author's inabilities to say it well,
without im proper "simple arithmetic" to detract from a
GOOD
SOLID

"WELL DONE!!!!!!!"


and


MANY THANKS TO THE DISTRIBUTED PROOFREADERS!!!!!!!


Michael S. Hart
Founder
Project Gutenberg


From Bowerbird at aol.com  Wed Jun 11 10:14:37 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 11 Jun 2008 13:14:37 EDT
Subject: [gutvol-d] PGDP's contribution to PG, numerically speaking
Message-ID: <d1f.27ed3517.358161fd@aol.com>

i'd say that, for some time now, 3/4 of the books that are
being posted were digitized by distributed proofreaders...

indeed, at this point, i'd estimate it conservatively at 4/5.

and i wouldn't be surprised if it were to jump up to 9/10.
or higher.

these days, on the "independent" side, we have david widger,
and al haines, and the chinese students, and not much more.

there are literally _thousands_ of people working over at d.p.

and ever since the p.g. website has had a banner that directs
people to d.p., they've had a constant supply of volunteers...

so if you thought d.p. is "independent" of p.g., think again.
before those banners, they were worried about their churn.
even now, _we_ should be worried about their burnout rate,
because they are destroying a _huge_ number of digitizers.
even the ones they keep are being stunted at the p1 phase.

besides, a number of the people at d.p. are working there
_because_ it feeds p.g.   if it didn't, they'd work elsewhere...

now... if only d.p. were _efficient_, they could be turning out
a _lot_ more e-texts than they are.   as it is, their inefficiency
has them stalled out at a couple thousand e-texts per year.
in comparison, google scans that many books before lunch.

the d.p. number _could_ go up, but if you look more closely,
you might well discover that it's because they're now doing
many more _children's_books_, which have barely any text.
they're putting a lot more time into the _illustrations_ now.
that's not a bad thing.   the emphasis on quantity is stupid.

it's also the case that they are now mounting more scan-sets.
again, lowers the quantity, but it's the right thing to be doing.

to repeat, any emphasis on quantity is stupid.   especially since
that's a game that d.p. will lose when a more-efficient system
appears on the scene...   i suggest instead that people celebrate
the fact that d.p. has crystallized a community of thousands for
the purpose of digitizing the public domain...   that's beautiful...

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080611/84ea1d63/attachment.htm 

From Bowerbird at aol.com  Wed Jun 11 10:25:21 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 11 Jun 2008 13:25:21 EDT
Subject: [gutvol-d] cyberlibrary numbers
Message-ID: <d56.12b4e672.35816481@aol.com>

jose said:
>    You know, Bowerbird, you disappoint me. I thought you'd be 
>    jumping for joy after finding out that Google has scanned 
>    more than twice as many books at the University of Michigan 
>    as you thought they had. ;)

as _i_ "thought" they had?

the university itself _announced_ that they've scanned one million.
made a big deal about it.   and they continue to use _that_ number.

why -- privately -- they told you something different, i don't know.
or care.   it might be that they have _scanned_ 2.5 million already,
but have only _processed_ 1 million.   if that's the case, then they
really need to work on their processing, because it's a bottleneck.

at any rate, i've said that i don't _care_ if they are behind schedule.
as long as i think they're working hard on it, i'm happy with them...
so i'm happy.   but i think it's absolutely clear that if they only have
a million public-facing e-books right now, they're behind schedule.

and you can throw up a big smokescreen, but it's still perfectly clear.

***

of course, if i were to have said that they _weren't_ behind schedule,
you would have thrown up a big smokescreen to say that they _were_.

because you don't really care about _the_truth_ much at all, jose,
you just care about disputing whatever that i say.   or michael says.

and because of that, jose, _you_disappoint_me_.   enough that it's 
to the point where i'm going to have to put you in my spam folder.

criminey sakes, if i said your wife was good-looking, you would say
she's ugly, just to dispute me.   if i said the sky was blue, you'd argue.


>    It seems that Paul Courant doesn't answer you 
>    when you address him on his blog either:

that's right.   i've got about 5 posts in to him without 1 reply yet...


>    I can't speak for either other one of them, but if I had to 
>    hazard a guess, perhaps the reason they ignore you is that 
>    they think you ought to do your homework. For instance, 
>    you might start by reading UM's cooperative agreement with 
>    Google. It's been available on the UM website since June 2005. 

don't be an ass, jose.   of course i've read that contract.   in fact,
i even referred to various parts of it earlier in this very thread...


>    You might want to start with this paragraph:

oh yes, i've read that paragraph too.   indeed, that very paragraph
was mentioned earlier over on courant's blog, before i had posted.

and that paragraph certainly seems to say that umichigan _must_
try to thwart automated downloads.   but yet john wilkin _insisted_
that their material is _free_.   that's _why_ brewster challenged him,
and carl malmud followed with a question that pinpointed the issue.
on the one hand, they say their e-books are _free_, but on the other,
you're not allowed to harvest them en masse.   that's a contradiction.

unlike carl,   who posed the mass-harvest question as "hypothetical",
i've informed them that i have very _real_ intentions to mass-scrape
their public-domain books, to see how they resolve the contradiction.

i have given the google project -- and umichigan in particular --
a good deal of support across cyberspace, precisely _because_
john wilkin has said -- loudly and clearly, from the beginning --
that they would make the pubic-domain material freely available.

now they are trying to take that back, by saying that we can only
"look" at the material, that we can't actually _download_ it ourselves.
that's _bullshit_, and i'm going to call them on it, and do it publicly...

of course, i'm sure you knew all this, and you're just kicking up dust,
which is why -- from now on -- you're going in my spam folder, jose.
if people want to believe the disinformation you spout out, they can...

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080611/4baa3896/attachment.htm 

From Bowerbird at aol.com  Wed Jun 11 10:44:29 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 11 Jun 2008 13:44:29 EDT
Subject: [gutvol-d] cyberlibrary numbers -- one more question
Message-ID: <bec.247807c6.358168fd@aol.com>

one more question, jose...

i know you're retired now, but when you were working, were you a lawyer?

the way you kick up dust, and blow smokescreens, well...

well, it makes me think that you were a lawyer.   is that right?

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080611/f14a2c93/attachment.htm 

From Bowerbird at aol.com  Wed Jun 11 11:05:10 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 11 Jun 2008 14:05:10 EDT
Subject: [gutvol-d] good tools that do all the work (the short version)
Message-ID: <c3d.33a92250.35816dd6@aol.com>

michael said:
>    his major goal is to cause strife in our midst

i keep wondering why you have this person _inside_ your organization,
minding your website.   certainly web-jockeys can't be that hard to find.

indeed, it seems to me that there are a lot of vultures around here,
just waiting for you to die, so they can feast on the p.g. carcass and
turn it into the opposite of what you intended which made it great...

of course, maybe once you're gone, and they take over, and p.g. fails,
that will be the ultimate proof that your approach was the correct one.
then someone else will come in and pick up the pieces and restore it...
crafty, michael, crafty...          :+)

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080611/db091d3e/attachment.htm 

From hart at pglaf.org  Wed Jun 11 11:05:28 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 11 Jun 2008 11:05:28 -0700 (PDT)
Subject: [gutvol-d] Microsoft quits mass book digitization (was
 cyberlibrary numbers)
In-Reply-To: <484F6E1E.3090206@ibiblio.org>
References: <bf5.3154f7ac.355753ad@aol.com> <4848ED25.1040202@ibiblio.org>
	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>
	<484F6E1E.3090206@ibiblio.org>
Message-ID: <Pine.LNX.4.64.0806111028570.17446@pglaf.org>


Several more contradiction to refute:

1.  Microsoft was not 100% responsible for 300K titles--
via the various Internet Archive / Open Content Alliance
efforts that have been going on.  Not to mention several
years of effort without Microsoft that produced 100K.

1a. Let's not forget the previous lesson in which we had
just learned to watch out for claims that /recent years/
are more important the /previous years/ logaritmically.

1b. Thus, if Microsoft were "seriously interested" in an
eBook operation, they would certainly have done more.

1c. Given the sheer size of Microsoft, this can only per
that size ratio, be seen as "putting a toe in the water"
at the very most.

1d. Hence the term "minuscule" is more appropriate here.
Given the serious orders of magnitude difference between
Microsoft and Yahoo, it was Microsoft's interest in this
world of eBooks that was statistically "minuscule."

If MS were at all seriously interested in eBooks, it had
every opportunity and resource to go toe to toe with the
Google effort, but never really applied themselves to an
effort more than about 10% of what Google laid out day 1
of their announcements.

Thus it was never part of my expectations that Microsoft
even WANTED to become a major player in the eBook world,
and hence I was apparently less surprised than most of a
world of punditry to see them go. . .since I wasn't sure
they had ever even arrive to the tune of more than a toe
in the door, or in the water, so to speak.

And this doesn't even address "finished eBooks" of kinds
we are used to discussing here.

When one of the "giants" of our industry puts a million,
or two, or three million eBooks into the system, even on
a commercial basis, then, and only then will I take them
at all seriously. . .and the longer they wait, the more,
and more, and more eBooks it will take.

Why?

Because there will already be so many more, as per those
lessons I've been mentioning about growth curves.

There are already millions of eBooks out there without a
dependence on any one source.

Thus it's no big deal for Microsoft to assist in scanned
outputs of 300,000, particularly if one considers larger
sizing when it comes to giants.

It's hard to say that a hundred billion dollar company's
interest is serious when they spend 1% of 1% of it on it
as in the case of the subject under discussion.

When you are talking about giants worth hundred billion,
two hundred billion, etc., this is really more like just
a toenail in the water, not even a toe.

1% of 100 billion is 1 billion

1% of 1 billion is 10 million.

Let's say 300,000 books at 333 1/3 pages each.

That's 100 million pages. . .100,000,000

At 10 cents a page, that's $10 million.

However, that would have to be multiplied for each times
100 billion of company value.

Google is worth some $200+ billion.

Microsoft is worth some $300+ billion.

[End of last year]

search:

"microsoft is valued at" billion


Thus 300K books at 333 1/3 pages each doesn not qualify,
as it is only half of 1% of 1%.


Google, on the other hand, was not only quite much more
serious in the public relations aspect of eBooks but it
was also more serious in the amound accomplished.

However, as stated elsewhere, I do not predict fullness
of what the public was led to expect anytime soon or in
the predicted ranges given on December 14, 2004.


On Wed, 11 Jun 2008, Jose Menendez wrote:

> I thought this deserved its own thread.
>
>
> On June 6, 2008, I wrote:
>
>> P.S. I'm surprised that no one has mentioned on gutvol-d before
>> now that Microsoft was quitting its book scanning operation.
>
>
> The same day Michael Hart replied:
>
>> Perhaps no one here actually believed Microsoft was serious about
>> doing eBooks in the first place.
>>
>> Perhaps some of the people here realized that Microsoft was not
>> going to be happy about losing the Yahoo! deal, and combined it
>> with the fact that Yahoo! is also a major supporter of the same
>> OCA [Open Content Alliance] that Microsoft was trying to get in
>> perhaps yet another kind of takeover bid.
>
>
> Well, if you'd done your homework, Michael, you would have realized
> that Microsoft was not only serious, but it was responsible for the
> *overwhelming majority* of books scanned by the OCA. You also would
> have realized that Yahoo is NOT a "major supporter" of the OCA.
> Indeed, in terms of financial support, Yahoo WAS a minuscule supporter
> of the OCA compared to Microsoft. (Notice I switched to the past
> tense. There's a reason for that.) Want some proof?
>
> Looking at the Internet Archive's Text Archive page:
>
> http://www.archive.org/details/texts
>
> I see "202,578 items" for the American Libraries sub-collection and
> "119,424 items" for the Canadian Libraries. Adding them together, we
> get a total of 322,002.
>
> (Those numbers are updated regularly as more books are put online, so
> people may see higher numbers later.)
>
> Now how many of those 322,000+ books were thanks to Yahoo and how many
> were thanks to Microsoft? Let's start with "major supporter" Yahoo:
>
> http://www.archive.org/details/yahoo_books
>
> I hope you're sitting down when you read this, Michael, because Yahoo
> contributed the staggering total of "1,075 items." That's not a typo;
> I didn't leave out a few digits. Only 1,075! If we divide 1,075 by
> 322,002, we get 0.00334. So only 0.334% of the books the OCA has
> scanned and put online from American and Canadian libraries are thanks
> to "major supporter" Yahoo!
>
> Now let's see how Microsoft did:
>
> http://www.archive.org/details/msn_books
>
> The total for Microsoft is "288,518 items." (Since the OCA is still
> scanning books with funds contributed by Microsoft, this total keeps
> getting updated too.)
>
> 288,518 divided by 322,002 equals 0.896. So 89.6% of the books the OCA
> has scanned and put online from American and Canadian libraries are
> thanks to the company "perhaps no one here actually believed ... was
> serious."
>
> If you're still tempted to cling to the notion that Microsoft wasn't
> serious and that Yahoo is a "major supporter" of the OCA, let's take a
> look at Brewster Kahle's own announcement on May 26 about Microsoft's
> decision:
>
> http://www.archive.org/iathreads/post-view.php?id=194217
>
> It's not too long, so I'm going to quote the whole thing, with a few
> comments from me enclosed in brackets [].
>
>
> "The Internet Archive operates 13 scanning centers in great libraries,
> digitizing 1000 books a day. This scanning is financially supported by
> libraries, foundations, and the Microsoft Corporation. Today,
> Microsoft has announced that it will ramp down their investment in
> this area. We very much appreciate their efforts and funding in book
> scanning over the last 3 years. As a result, over 300,000 books are
> publicly available on the archive.org site that would not otherwise be."
>
> [Note that Brewster didn't mention ANY financial support from Yahoo.
> See why I switched to the past tense and said that "Yahoo WAS a
> minuscule supporter of the OCA"?]
>
> "To their credit, they said they are taking off any contractual
> restrictions on the public domain books and letting us keep the
> equipment that they funded. This is extremely important because it can
> allow those of us in the public sphere to leverage what they helped
> build. Keeping the public domain materials public domain is where we
> all wanted to be. Getting a books scanning process in place is also a
> major accomplishment. Thank you Microsoft."
>
> [Note the mention of "contractual restrictions." I'll get to those a
> little later.]
>
> "Funding for the time being is secure, but going forward we will need
> to replace the Microsoft funding. Microsoft has always encourage the
> Open Content Alliance to work in parallel in case this day arrived.
> Lets work together, quickly, to build on the existing momentum. All
> ideas welcome.
>
> "Onward to a completely public library system!"
>
>
> Did you notice, Michael, that Brewster didn't say anything about Yahoo
> helping to replace the Microsoft funding?
>
>
>> Quite possibly Microsoft found, as did the Federal Spook Agency
>> and Co., that Brewster Kahal, who runs the OCA, is not quite an
>> easy to maniputulate person as they had assumed.
>
>
> But apparently his surname is easy to misspell. ;) Seriously, do you
> recall the "contractual restrictions" I pointed out in Brewster's
> announcement? If you look again at the Microsoft page at the Internet
> Archive:
>
> http://www.archive.org/details/msn_books
>
> you'll see a box labeled "Rights" on the left side of the page. Here's
> what it says:
>
>
> "Books scanned before November 1, 2006 are under OCA principles,
> thereafter they are available for non-commercial use and may not
> appear in commercial services. Please contact info at archive.org or
> Microsoft about bulk access."
>
>
> Brewster Kahle may not be "easy to maniputulate [sic]," but he didn't
> stop Microsoft from violating the OCA principles it had agreed to.
>
>
> Jose Menendez
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From jmdyck at ibiblio.org  Wed Jun 11 13:24:43 2008
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed, 11 Jun 2008 13:24:43 -0700
Subject: [gutvol-d] age of PG
In-Reply-To: <Pine.LNX.4.64.0806100843500.27586@pglaf.org>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>
	<Pine.LNX.4.64.0806100843500.27586@pglaf.org>
Message-ID: <4850348B.1060908@ibiblio.org>

Marcello Perathoner wrote:
> DP ... produced more books in 8 years than PG in 37 years. 

and then Michael Hart wrote:
> 
> And, by the way, it's more like 38 years, but who would make
> the presumptive call that the author in question pays a more
> than lip-service attention to what he is saying. . . .

and

> Of course, we perhaps have to take into account that it
> is outside that author's timeframe of experience to say
> just how easy or hard it was starting those 38 years.
> 
> Well, 37.95, or so.

Sources I've found indicate that PG started on July 4, 1971, which
means that it's very close to 37 years old (36.94 if you like). So it
seems that, on this point, Marcello was quite correct, and Michael not
so much.

See, e.g.,
http://www.gutenberg.org/newsletter/archive/PGWeekly_2008_01_23.txt

-Michael Dyck


From hart at pglaf.org  Wed Jun 11 18:30:36 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 11 Jun 2008 18:30:36 -0700 (PDT)
Subject: [gutvol-d] age of PG
In-Reply-To: <4850348B.1060908@ibiblio.org>
References: <d2c.2c82ea64.35744c92@aol.com> <484DB329.8060802@novomail.net>
	<484E6429.7040504@perathoner.de>
	<Pine.LNX.4.64.0806100843500.27586@pglaf.org>
	<4850348B.1060908@ibiblio.org>
Message-ID: <Pine.LNX.4.64.0806111828430.23566@pglaf.org>


Sorry, my bad "simple arithmetic". . .or got stuck on an
olde typo and never corrected it. . . .

We will START our 38th year on July 4.

Well, technically, July 5th, as it was after midnight.


Michael


On Wed, 11 Jun 2008, Michael Dyck wrote:

> Marcello Perathoner wrote:
>> DP ... produced more books in 8 years than PG in 37 years.
>
> and then Michael Hart wrote:
>>
>> And, by the way, it's more like 38 years, but who would make
>> the presumptive call that the author in question pays a more
>> than lip-service attention to what he is saying. . . .
>
> and
>
>> Of course, we perhaps have to take into account that it
>> is outside that author's timeframe of experience to say
>> just how easy or hard it was starting those 38 years.
>>
>> Well, 37.95, or so.
>
> Sources I've found indicate that PG started on July 4, 1971, which
> means that it's very close to 37 years old (36.94 if you like). So it
> seems that, on this point, Marcello was quite correct, and Michael not
> so much.
>
> See, e.g.,
> http://www.gutenberg.org/newsletter/archive/PGWeekly_2008_01_23.txt
>
> -Michael Dyck
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From ebooks at ibiblio.org  Thu Jun 12 22:04:47 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Fri, 13 Jun 2008 01:04:47 -0400
Subject: [gutvol-d] Microsoft quits mass book digitization (was
 cyberlibrary numbers)
In-Reply-To: <Pine.LNX.4.64.0806111028570.17446@pglaf.org>
References: <bf5.3154f7ac.355753ad@aol.com>
	<4848ED25.1040202@ibiblio.org>	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>	<484F6E1E.3090206@ibiblio.org>
	<Pine.LNX.4.64.0806111028570.17446@pglaf.org>
Message-ID: <4851FFEF.3010701@ibiblio.org>

On June 11, 2008, Michael Hart wrote:

> Several more contradiction to refute:


But you didn't "refute" anything. You just spouted more rhetoric and 
unsubstantiated assertions.


> 1.  Microsoft was not 100% responsible for 300K titles--
> via the various Internet Archive / Open Content Alliance
> efforts that have been going on.  Not to mention several
> years of effort without Microsoft that produced 100K.


I never said that Microsoft was "100% responsible," did I? I wrote


"288,518 divided by 322,002 equals 0.896. So 89.6% of the books the 
OCA has scanned and put online from American and Canadian libraries 
are thanks to the company 'perhaps no one here actually believed ... 
was serious.'"


89.6% of 322,002 is not the same as 100%, is it?

As for this line of yours, "Not to mention several years of effort 
without Microsoft that produced 100K," you would have been better off 
not mentioning it, because it's false. Here's a link to a "Wall Street 
Journal" article from Nov. 9, 2005:

"Building an Online Library, One Volume at a Time"
http://online.wsj.com/public/article/SB113111987803688478-VNpw62xi_JA4avE8cxOZf0pf_nM_20061109.html?mod=blogs

And here's an excerpt:


"The Internet Archive's effort to get books online is still in its 
early stages. In the little more than a year since the group started 
scanning books, it has digitized just 2,800 books, at a cost of about 
$108,250. Funding has come largely from libraries that have paid to 
have their texts digitized. Work will likely speed up now that 
Microsoft and Yahoo are on board; both companies joined the effort in 
October...."


Let's see. 100K - 2,800? Congratulations, Michael, you were only off 
by 97,200! :)

By the way, Jim Tinsley posted a link to that "Wall Street Journal" 
article here on the gutvol-d list back on Nov. 12, 2005:

http://lists.pglaf.org/private.cgi/gutvol-d/2005-November/003526.html

And Bowerbird also posted a link to that article on the Book People 
list back on Sept. 4, 2006:

http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-09-04,7

So you've had opportunities to read it.


> 1d. Hence the term "minuscule" is more appropriate here.
> Given the serious orders of magnitude difference between
> Microsoft and Yahoo, it was Microsoft's interest in this
> world of eBooks that was statistically "minuscule."


According to Yahoo! Finance, Microsoft's current "market cap" (market 
capitalization) is $263.01 billion, and its enterprise value is 
$228.55 billion.

http://finance.yahoo.com/q/ks?s=MSFT

Yahoo's current market cap is $32.36 billion, and its enterprise value 
is $33.95 billion.

http://finance.yahoo.com/q/ks?s=YHOO

Market cap ratio: 263.01/32.36 = 8.13

Enterprise value ratio: 228.55/33.95 = 6.73

So, in both market capitalization and enterprise value, there isn't 
even one order of magnitude difference between Microsoft and Yahoo. If 
we look at their book totals for the OCA, however, we will find 
"serious orders of magnitude difference." Microsoft's total is now up 
to 290,123.

http://www.archive.org/details/msn_books

Yahoo's total is still the piddling 1,075.

http://www.archive.org/details/yahoo_books

290,123/1,075 = 269.9

So, despite your rhetoric and false assertions, Michael, Yahoo is 
still the one whose support for the OCA was "minuscule."


> Let's say 300,000 books at 333 1/3 pages each.
> 
> That's 100 million pages. . .100,000,000
> 
> At 10 cents a page, that's $10 million.


It's funny you didn't do the same sort of calculation for Yahoo's OCA 
contribution.

Let's say 1,200 books (I'm rounding up Yahoo's 1,075 total to get nice 
even results) at 333 1/3 pages each.

That's 400,000 pages.

At 10 cents a page, that's $40,000.

Hmmm.... "Minuscule" may have been too generous an adjective for 
Yahoo's support of the OCA.


> Google is worth some $200+ billion.
> 
> Microsoft is worth some $300+ billion.
> 
> [End of last year]
> 
> search:
> 
> "microsoft is valued at" billion


The first thing that struck me here was that you didn't give a value 
for Yahoo, which would have instantly exposed your false claim of 
"serious orders of magnitude difference between Microsoft and Yahoo."

The second thing that struck me here was how you used the wrong verb 
tense, after criticizing Josh Hutchinson the other day about his 
tenses. The "end of last year" is not the present; it's in the past. 
You should have said, "Google was worth ..." and "Microsoft was worth 
..." They're definitely worth less now.

The third thing that struck me here was your suggestion to search for 
"'microsoft is valued at' billion." Here's a tip for you, Michael: 
There are web sites that people can use to look up detailed financial 
information about companies. For instance, there's Yahoo! Finance 
(http://finance.yahoo.com/), which I used earlier to look up the value 
of Microsoft and Yahoo. If we look up Google's "key statistics,"

http://finance.yahoo.com/q/ks?s=GOOG

we'll see that its current market cap is $173.68 billion, and its 
enterprise value is $159.11 billion.


Jose Menendez

From Bowerbird at aol.com  Thu Jun 12 23:01:04 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 13 Jun 2008 02:01:04 EDT
Subject: [gutvol-d] microsoft digitization
Message-ID: <c26.385c4b46.35836720@aol.com>

i'd like to thank microsoft for the $5 million
they generously kicked in to digitize books,
before their recent decision to exit the scene.
that's a lot more money than i have, for sure!

but i trust it won't strain bill's retirement plans.

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080613/3f3f92a0/attachment.htm 

From hart at pglaf.org  Fri Jun 13 09:58:47 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 13 Jun 2008 09:58:47 -0700 (PDT)
Subject: [gutvol-d] Microsoft quits mass book digitization (was
 cyberlibrary numbers)
In-Reply-To: <4851FFEF.3010701@ibiblio.org>
References: <bf5.3154f7ac.355753ad@aol.com> <4848ED25.1040202@ibiblio.org>
	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>
	<484F6E1E.3090206@ibiblio.org>
	<Pine.LNX.4.64.0806111028570.17446@pglaf.org>
	<4851FFEF.3010701@ibiblio.org>
Message-ID: <Pine.LNX.4.64.0806130948272.480@pglaf.org>


It would be nice if you could keep your tenses straight,
then perhaps people would at least TRY to believe you in
the future. . .but based on your past. . .hardly. . . .

As for Yahoo value, that's YOUR balikwick.

I was talking about Microsoft.

I only mention Yahoo as an aside, but have no interest in
once again doing your homework for you.

You seem to lack a working knowledge of what it means to
be responsible, either in MS's case here, or your own.

Couldn't you at last PRETEND to consider what your rants
and raves will look like years down the road?

Taking my quotes out of context won't get you anywhere,
you can always quote from a later date with 20/20 hindsight,
but no one will be impressed, and you will only make them
aware YOU didn't have better figures either, way be when.

As this goes on and on, I understand "1984" and rewriting
history all the more. . .so I guess I have to make sure it
goes as well as possible before I am gone to make it just
all the more obvious what you will be doing afterwards.

"Morituri te salutamus."

Meanwhile, the name the several of you are making for
yourselves is nothing I would rely on in the future.


On Fri, 13 Jun 2008, Jose Menendez wrote:

> On June 11, 2008, Michael Hart wrote:
>
>> Several more contradiction to refute:
>
>
> But you didn't "refute" anything. You just spouted more rhetoric and
> unsubstantiated assertions.
>
>
>> 1.  Microsoft was not 100% responsible for 300K titles--
>> via the various Internet Archive / Open Content Alliance
>> efforts that have been going on.  Not to mention several
>> years of effort without Microsoft that produced 100K.
>
>
> I never said that Microsoft was "100% responsible," did I? I wrote
>
>
> "288,518 divided by 322,002 equals 0.896. So 89.6% of the books the
> OCA has scanned and put online from American and Canadian libraries
> are thanks to the company 'perhaps no one here actually believed ...
> was serious.'"
>
>
> 89.6% of 322,002 is not the same as 100%, is it?
>
> As for this line of yours, "Not to mention several years of effort
> without Microsoft that produced 100K," you would have been better off
> not mentioning it, because it's false. Here's a link to a "Wall Street
> Journal" article from Nov. 9, 2005:
>
> "Building an Online Library, One Volume at a Time"
> http://online.wsj.com/public/article/SB113111987803688478-VNpw62xi_JA4avE8cxOZf0pf_nM_20061109.html?mod=blogs
>
> And here's an excerpt:
>
>
> "The Internet Archive's effort to get books online is still in its
> early stages. In the little more than a year since the group started
> scanning books, it has digitized just 2,800 books, at a cost of about
> $108,250. Funding has come largely from libraries that have paid to
> have their texts digitized. Work will likely speed up now that
> Microsoft and Yahoo are on board; both companies joined the effort in
> October...."
>
>
> Let's see. 100K - 2,800? Congratulations, Michael, you were only off
> by 97,200! :)
>
> By the way, Jim Tinsley posted a link to that "Wall Street Journal"
> article here on the gutvol-d list back on Nov. 12, 2005:
>
> http://lists.pglaf.org/private.cgi/gutvol-d/2005-November/003526.html
>
> And Bowerbird also posted a link to that article on the Book People
> list back on Sept. 4, 2006:
>
> http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-09-04,7
>
> So you've had opportunities to read it.
>
>
>> 1d. Hence the term "minuscule" is more appropriate here.
>> Given the serious orders of magnitude difference between
>> Microsoft and Yahoo, it was Microsoft's interest in this
>> world of eBooks that was statistically "minuscule."
>
>
> According to Yahoo! Finance, Microsoft's current "market cap" (market
> capitalization) is $263.01 billion, and its enterprise value is
> $228.55 billion.
>
> http://finance.yahoo.com/q/ks?s=MSFT
>
> Yahoo's current market cap is $32.36 billion, and its enterprise value
> is $33.95 billion.
>
> http://finance.yahoo.com/q/ks?s=YHOO
>
> Market cap ratio: 263.01/32.36 = 8.13
>
> Enterprise value ratio: 228.55/33.95 = 6.73
>
> So, in both market capitalization and enterprise value, there isn't
> even one order of magnitude difference between Microsoft and Yahoo. If
> we look at their book totals for the OCA, however, we will find
> "serious orders of magnitude difference." Microsoft's total is now up
> to 290,123.
>
> http://www.archive.org/details/msn_books
>
> Yahoo's total is still the piddling 1,075.
>
> http://www.archive.org/details/yahoo_books
>
> 290,123/1,075 = 269.9
>
> So, despite your rhetoric and false assertions, Michael, Yahoo is
> still the one whose support for the OCA was "minuscule."
>
>
>> Let's say 300,000 books at 333 1/3 pages each.
>>
>> That's 100 million pages. . .100,000,000
>>
>> At 10 cents a page, that's $10 million.
>
>
> It's funny you didn't do the same sort of calculation for Yahoo's OCA
> contribution.
>
> Let's say 1,200 books (I'm rounding up Yahoo's 1,075 total to get nice
> even results) at 333 1/3 pages each.
>
> That's 400,000 pages.
>
> At 10 cents a page, that's $40,000.
>
> Hmmm.... "Minuscule" may have been too generous an adjective for
> Yahoo's support of the OCA.
>
>
>> Google is worth some $200+ billion.
>>
>> Microsoft is worth some $300+ billion.
>>
>> [End of last year]
>>
>> search:
>>
>> "microsoft is valued at" billion
>
>
> The first thing that struck me here was that you didn't give a value
> for Yahoo, which would have instantly exposed your false claim of
> "serious orders of magnitude difference between Microsoft and Yahoo."
>
> The second thing that struck me here was how you used the wrong verb
> tense, after criticizing Josh Hutchinson the other day about his
> tenses. The "end of last year" is not the present; it's in the past.
> You should have said, "Google was worth ..." and "Microsoft was worth
> ..." They're definitely worth less now.
>
> The third thing that struck me here was your suggestion to search for
> "'microsoft is valued at' billion." Here's a tip for you, Michael:
> There are web sites that people can use to look up detailed financial
> information about companies. For instance, there's Yahoo! Finance
> (http://finance.yahoo.com/), which I used earlier to look up the value
> of Microsoft and Yahoo. If we look up Google's "key statistics,"
>
> http://finance.yahoo.com/q/ks?s=GOOG
>
> we'll see that its current market cap is $173.68 billion, and its
> enterprise value is $159.11 billion.
>
>
> Jose Menendez
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From marcello at perathoner.de  Fri Jun 13 10:43:28 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 13 Jun 2008 19:43:28 +0200
Subject: [gutvol-d] Microsoft quits mass book digitization (was
 cyberlibrary numbers)
In-Reply-To: <Pine.LNX.4.64.0806130948272.480@pglaf.org>
References: <bf5.3154f7ac.355753ad@aol.com>
	<4848ED25.1040202@ibiblio.org>	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>	<484F6E1E.3090206@ibiblio.org>	<Pine.LNX.4.64.0806111028570.17446@pglaf.org>	<4851FFEF.3010701@ibiblio.org>
	<Pine.LNX.4.64.0806130948272.480@pglaf.org>
Message-ID: <4852B1C0.1060809@perathoner.de>

Michael Hart wrote:

> It would be nice if you could keep your tenses straight,

... but then got his Latin all tangled up ...

> "Morituri te salutamus."

... and his mathematics is even worse.


   http://en.wikipedia.org/wiki/Ave_Caesar_morituri_te_salutant


-- 
Marcello Perathoner
webmaster at gutenberg.org


From Bowerbird at aol.com  Fri Jun 13 12:35:14 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 13 Jun 2008 15:35:14 EDT
Subject: [gutvol-d] Microsoft quits mass book digitization (was
	cyberlibrary numbers)
Message-ID: <d05.357fa015.358425f2@aol.com>

michael said:
>   "Morituri te salutamus."

i hate latin.   and greek.

but i love google, and the dictionary:
>    http://www.merriam-webster.com/dictionary/morituri%20te%20salutamus

"we (or those) who are about to die salute thee."

this is what gladiators are reputed to have said,
as a salute to the emperor before their battles...

latin is dead, michael.   but you still have spunk...
but if you do go before me (which is no certainty),
i'll save your library from the invading technoids...

and the light markup revolution is the future.
so if i go before i've saved your library, then
_someone_else_will_...

-bowerbird

p.s.   try this one:
>    
http://penelope.uchicago.edu/Thayer/E/Roman/Texts/secondary/journals/TAPA/70/Morituri_Te_Salutamus*.html

p.p.s.   or this one:
>    http://en.wikipedia.org/wiki/For_Those_About_to_Rock_We_Salute_You
which includes things like this:
>    The title track's popularity was such that 
>    in every live concert AC/DC has done thereafter, 
>    the song is performed as an encore and is 
>    always accompanied by firing cannons on stage.
and
>    On Nintendo's website, the ad for the 
>    Wii version of Guitar Hero III: Legends of Rock states 
>    "For those about to rock, Wii salute you".
which, i guess, takes the cake...


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080613/e2d7e212/attachment.htm 

From Bowerbird at aol.com  Fri Jun 13 14:11:12 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 13 Jun 2008 17:11:12 EDT
Subject: [gutvol-d] take a look at an old book
Message-ID: <d44.27bc90f7.35843c70@aol.com>

there's old, and then there is _old_, as in incunabula:
>    http://www.kottke.org/08/06/hypnerotomachia-poliphili

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080613/d4b0195f/attachment.htm 

From donovan at abs.net  Fri Jun 13 16:24:16 2008
From: donovan at abs.net (D Garcia)
Date: Fri, 13 Jun 2008 19:24:16 -0400
Subject: [gutvol-d] take a look at an old book
In-Reply-To: <d44.27bc90f7.35843c70@aol.com>
References: <d44.27bc90f7.35843c70@aol.com>
Message-ID: <200806131924.16749.donovan@abs.net>

On Friday 13 June 2008 17:11, Bowerbird at aol.com wrote:
> there's old, and then there is _old_, as in incunabula:
> >    http://www.kottke.org/08/06/hypnerotomachia-poliphili
>
> -bowerbird

And for those of you who weren't already aware of the existence of this book, 
see PG # 18459, posted 2006-05-27 and produced by Distributed Proofreaders 
from a fascimile reprint.

From hart at pglaf.org  Sat Jun 14 14:34:44 2008
From: hart at pglaf.org (Michael Hart)
Date: Sat, 14 Jun 2008 14:34:44 -0700 (PDT)
Subject: [gutvol-d] Microsoft quits mass book digitization (was
 cyberlibrary numbers)
In-Reply-To: <4852EB86.6060507@perathoner.de>
References: <bf5.3154f7ac.355753ad@aol.com> <4848ED25.1040202@ibiblio.org>
	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>
	<484F6E1E.3090206@ibiblio.org>
	<Pine.LNX.4.64.0806111028570.17446@pglaf.org>
	<4851FFEF.3010701@ibiblio.org>
	<Pine.LNX.4.64.0806130948272.480@pglaf.org>
	<4852B1C0.1060809@perathoner.de>
	<Pine.LNX.4.64.0806131323530.5770@pglaf.org>
	<4852EB86.6060507@perathoner.de>
Message-ID: <Pine.LNX.4.64.0806141422140.22000@pglaf.org>


The author below STILL hasn't done his homework. . .sadly to say,
but par for his tour of the course.

1.

The way this salute is intended is to give an added motivation to
the gladiators to WIN. . .then they have NOT saluted, as their
words were only true IF THEY DIED.

We who are NOT about to die do NOT salute you!!!


2.

There were gladiators for centuries before Caesar. . .duh!!!

Do how could Caesar's version be the original???. . .duh!!!


3.

This is what happens when someone rewrites history.

"To the victors belong the spoils" and just ONE of the spoils
is being able to rewrite history to one's liking. . . .

The author below could quote searches that indicate at a ten
to one ratio, or so, that his quotation is the correct one.

But that is just a memento of the FACT that Julius Caesar
rewrote history so damned thoroughly that even our historians
quote his "Hail Caesar" ten times as often as what the salute
actually was for hundreds of years before Caesar came along.


4.

"The fault lies NOT in the stars, dear Marcello,
the fault lies in ourselves. . . ."


History, even when it is there to be read, is most often
left to the interpretations of others. . .sadly to say.

It's all too obvious. . .doesn't always need interpreters.


On Fri, 13 Jun 2008, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> "amus" is "we". . .not "ant". . . .
>> 
>> For those of you who never actually took Latin.
>
> Oh, how I wish! I actually "was taken" to Latin much against my 
> will.
>
>
>> _I_ for one, at least, have not crowned YOU Caesar.
>> 
>> "We who are about to die salute you"
>> 
>> Is different than:
>> 
>> "Those who are about to die salute you."
>
> "We who are about to die salute you" is wrong, because everybody 
> was saluting but not everybody that saluted was going to die.
>
> "Those who are about to die salute you" is right because everybody 
> who died, had saluted.
>
> Check the locus classicus, the Life of Claudius by Suetonius, 
> 21.6. :
>
> http://penelope.uchicago.edu/Thayer/L/Roman/Texts/Suetonius/12Caesars/Claudius*.html#21.6
>
> But you seem to prefer getting your quotes wrong, (or quoting 
> people that got their quotes wrong) because you can't be bothered 
> doing your homework.
>
>
>> But, then, truly, YOU, have always insisted on being a Caesar
>> in a world where there are no Caesars allowed. . . .
>
> Last time it was Stalin, this time it is Caesar, who will it be 
> next? Napoleon? Hitler? Bush?
>
>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org
>

From marcello at perathoner.de  Sat Jun 14 15:19:27 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun, 15 Jun 2008 00:19:27 +0200
Subject: [gutvol-d] Microsoft quits mass book digitization (was
 cyberlibrary numbers)
In-Reply-To: <Pine.LNX.4.64.0806141422140.22000@pglaf.org>
References: <bf5.3154f7ac.355753ad@aol.com>
	<4848ED25.1040202@ibiblio.org>	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>	<484F6E1E.3090206@ibiblio.org>	<Pine.LNX.4.64.0806111028570.17446@pglaf.org>	<4851FFEF.3010701@ibiblio.org>	<Pine.LNX.4.64.0806130948272.480@pglaf.org>	<4852B1C0.1060809@perathoner.de>	<Pine.LNX.4.64.0806131323530.5770@pglaf.org>	<4852EB86.6060507@perathoner.de>
	<Pine.LNX.4.64.0806141422140.22000@pglaf.org>
Message-ID: <485443EF.30006@perathoner.de>

Michael Hart wrote:

> The author below STILL hasn't done his homework. . .sadly to say,
> but par for his tour of the course.

It is bad manners to post a private mail you received without asking the 
sender first.

Also, I have a name, and your "the author below" affectation is childish 
at best.

Also, why frontchannel this discussion again after we held it private 
for a while?


> But that is just a memento of the FACT that Julius Caesar
> rewrote history so damned thoroughly that even our historians
> quote his "Hail Caesar" ten times as often as what the salute
> actually was for hundreds of years before Caesar came along.

How could Julius Caesar have rewritten history that happened a hundred 
years after he was dead?


I did give you a link to the only classical mention in Latin of the 
phrase you (mis-)quoted. Here it is again:

http://penelope.uchicago.edu/Thayer/L/Roman/Texts/Suetonius/12Caesars/Claudius*.html#21.6 


   ---- Life of Claudius by Suetonius, 21.6.


If you had bothered to check the reference I gave you, you would easily 
have spotted the fact that Suetonius was talking about Tiberius Claudius 
Caesar (10 BC - AD 54) and not about Gaius Julius Caesar (100 BC - 44 BC).

You ranted about the *wrong* Caesar!

Embarassing, Michael, embarassing.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From Bowerbird at aol.com  Sat Jun 14 15:21:18 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 14 Jun 2008 18:21:18 EDT
Subject: [gutvol-d] microsoft digitization
Message-ID: <bd2.2414c670.35859e5e@aol.com>

i said:
>    that's a lot more money than i have, for sure!

and even if i did have $5 million dollars, or $25 million,
i probably wouldn't donate much of it to digitize books.

nope, i'd probably spend it on hookers and blow...        :+)

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080614/c657c670/attachment.htm 

From grythumn at gmail.com  Sat Jun 14 15:34:18 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sat, 14 Jun 2008 18:34:18 -0400
Subject: [gutvol-d] take a look at an old book
In-Reply-To: <200806131924.16749.donovan@abs.net>
References: <d44.27bc90f7.35843c70@aol.com>
	<200806131924.16749.donovan@abs.net>
Message-ID: <15cfa2a50806141534x36b6a361p712a69fed055baac@mail.gmail.com>

On Fri, Jun 13, 2008 at 7:24 PM, D Garcia <donovan at abs.net> wrote:
> On Friday 13 June 2008 17:11, Bowerbird at aol.com wrote:
>> there's old, and then there is _old_, as in incunabula:
>> >    http://www.kottke.org/08/06/hypnerotomachia-poliphili
>>
>> -bowerbird
>
> And for those of you who weren't already aware of the existence of this book,
> see PG # 18459, posted 2006-05-27 and produced by Distributed Proofreaders
> from a fascimile reprint.

Just as a note, that edition is a facsimile reprint of an early
(~1592) partial english translation of the latin/italian original
(which is much prettier, FWIW). The first complete english translation
was published relatively recently (1999).

R C
(PM of said project)

From hart at pglaf.org  Sat Jun 14 15:37:48 2008
From: hart at pglaf.org (Michael Hart)
Date: Sat, 14 Jun 2008 15:37:48 -0700 (PDT)
Subject: [gutvol-d] Microsoft quits mass book digitization (was
 cyberlibrary numbers)
In-Reply-To: <485443EF.30006@perathoner.de>
References: <bf5.3154f7ac.355753ad@aol.com> <4848ED25.1040202@ibiblio.org>
	<Pine.LNX.4.64.0806060850000.22177@pglaf.org>
	<484F6E1E.3090206@ibiblio.org>
	<Pine.LNX.4.64.0806111028570.17446@pglaf.org>
	<4851FFEF.3010701@ibiblio.org>
	<Pine.LNX.4.64.0806130948272.480@pglaf.org>
	<4852B1C0.1060809@perathoner.de>
	<Pine.LNX.4.64.0806131323530.5770@pglaf.org>
	<4852EB86.6060507@perathoner.de>
	<Pine.LNX.4.64.0806141422140.22000@pglaf.org>
	<485443EF.30006@perathoner.de>
Message-ID: <Pine.LNX.4.64.0806141524590.22855@pglaf.org>

On Sun, 15 Jun 2008, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> The author below STILL hasn't done his homework. . .sadly to 
>> say,
>> but par for his tour of the course.
>
> It is bad manners to post a private mail you received without 
> asking the sender first.

It is equally bad manners to send an ex parte reply to a public
message. . .please do not do so again.

If _I_ did that, it was in error, either that the message to you
arriveed and the message to the list did not, or operator error,
but it was not intentional.


> Also, I have a name, and your "the author below" affectation is 
> childish at best.

This is toinsure everyone knows that I am not making this case as
an ad hominem secnario, but just to keep things honest.

Using your name would make it too personal.


> Also, why frontchannel this discussion again after we held it 
> private for a while?

I am not trying to make this a private discussion, and I thought
all our messages were going through the list.


>
>
>> But that is just a memento of the FACT that Julius Caesar
>> rewrote history so damned thoroughly that even our historians
>> quote his "Hail Caesar" ten times as often as what the salute
>> actually was for hundreds of years before Caesar came along.
>
> How could Julius Caesar have rewritten history that happened a 
> hundred years after he was dead?

"before he was dead". . .don't you READ before your reply???

How do you expect ANYONE to take you seriously???


There are some 22,000 links to the search I did,
and they date back to 4th Century BC, which is
way before any of the Caesar's time.


> I did give you a link to the only classical mention in Latin of 
> the phrase you (mis-)quoted. Here it is again:
>
> http://penelope.uchicago.edu/Thayer/L/Roman/Texts/Suetonius/12Caesars/Claudius*.html#21.6 
>
>  ---- Life of Claudius by Suetonius, 21.6.
>
>
> If you had bothered to check the reference I gave you, you would 
> easily have spotted the fact that Suetonius was talking about 
> Tiberius Claudius Caesar (10 BC - AD 54) and not about Gaius 
> Julius Caesar (100 BC - 44 BC).
>
> You ranted about the *wrong* Caesar!
>
> Embarassing, Michael, embarassing.

That's the WHOLE POINT. . .AND /YOU/ MISSED IT. . . .

There were NOT any "Caesars" before Julius. . . .

Thus ALL references to "Caesars" are hundreds of years
AFTER the origins of gladiators and their salutes.


Oh, you also missed the point about their motivativation,
as ONLY those who were about to die were saluting Caesar.

Repeat:  Please don't send me private replies to public
messages, they will always go back to the list.


>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org
>

From hart at pglaf.org  Sat Jun 14 15:44:15 2008
From: hart at pglaf.org (Michael Hart)
Date: Sat, 14 Jun 2008 15:44:15 -0700 (PDT)
Subject: [gutvol-d] Leaving This List
Message-ID: <Pine.LNX.4.64.0806141538500.22855@pglaf.org>


I am going to do a trial run of not answering the various
rants and raves from the well-known and even lesser-known
tag team flame warriors on this list, and let Greg Newby,
our CEO, advise me when I should reply.

Obviously the recent events have not been constructive to
any degree, and I only reply for the sake of making sure,
to whatever degree I can, that honesty and accuracy are a
value someone is trying to keep alive.

No one has sent me any messages stating that there is any
real need to refute this handful of pretenders, so I will
simply await requests from our general population or Greg
Newby. . .unless I have a day that is too boring. . . .


Michael


From gbnewby at pglaf.org  Sat Jun 14 16:33:00 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat, 14 Jun 2008 16:33:00 -0700
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <c3d.33a92250.35816dd6@aol.com>
References: <c3d.33a92250.35816dd6@aol.com>
Message-ID: <20080614233300.GH23938@mail.pglaf.org>

On Wed, Jun 11, 2008 at 02:05:10PM -0400, Bowerbird at aol.com wrote:
> michael said:
> >    his major goal is to cause strife in our midst
> 
> i keep wondering why you have this person _inside_ your organization,
> minding your website.   certainly web-jockeys can't be that hard to find.

Marcello has done, and continues to do, outstandingly good
volunteer service in maintaining the gutenberg.org Web site.

That has no bearing on his freedom to express himself on the
gutvol-d list, anymore than anyone else.

For Michael, or me, to seek to turn down major contributions due a
disagreement, argument, etc. would be pretty inconsistent with 
the overall management (or "non-management") of PG.

> indeed, it seems to me that there are a lot of vultures around here,
> just waiting for you to die, so they can feast on the p.g. carcass and
> turn it into the opposite of what you intended which made it great...

There are a variety of reasons why "taking over" isn't so easily
done, with or without Michael's involvement.  The most important,
I think, is that taking over really means doing a lot of work to
create something new, or taking the current PG & augenting it...evidently,
there aren't too many folks ready to do that, even WITH Michael's
encouragement.  

Some have, though.  Thus, we have:
 PGDP [pgdp.net]
 PGCC [gutenberg.us]
plus national sites like PG Canada

Plus those who decided the PG way wasn't their way, and did
something pretty different.  It's not hubris to put archive.org
in that group.

> of course, maybe once you're gone, and they take over, and p.g. fails,
> that will be the ultimate proof that your approach was the correct one.
> then someone else will come in and pick up the pieces and restore it...
> crafty, michael, crafty...          :+)

PG as a collection is pretty resilient...hard to make it go
away, by design.  Things like the Web site, catalog, and other
metata are also pretty resilient, though take a somewhat delicate
infrastructure to maintain.  Things like mailing lists are transient,
not mission critical.

I don't know what you mean by failure.  The work that's been done, is
done -- the fruits of that labor are available, and will remain
available.  I can think of various things that might indicate the end of
PG as it is now [for example, having no new content to add to the
collection], but how is that failure for PG?
  -- Greg


From Bowerbird at aol.com  Sat Jun 14 18:15:37 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 14 Jun 2008 21:15:37 EDT
Subject: [gutvol-d] good tools that do all the work (the short version)
Message-ID: <c34.354cf5bd.3585c739@aol.com>

greg said:
>    Marcello has done, and continues to do, outstandingly good
>    volunteer service in maintaining the gutenberg.org Web site.

i'm glad to hear it.              :+)


>    That has no bearing on his freedom to express himself 
>    on the gutvol-d list, anymore than anyone else.

i'm glad he expresses himself too.
even if i don't find anything worthwhile when he does.
(and heck, he's been in my spam folder for a while now, so maybe
his more-recent posts have been nicely-reasoned pieces of logic.)

still...   it's pretty clear what he thinks about michael...

and i wouldn't give my house-keys to a sworn enemy.
(but, you know, maybe that's just me.)

i also think it's quite ironic that michael seems to get
a lot more respect from the outside world than he gets
right here on the p.g. listserve.   but i guess that's what
they say about prophets...


>   I don't know what you mean by failure.

i could describe some scenarios, but that would be conjecture.

the files you have now -- and the ones you will gain in the future --
will remain available, so if you consider that collection to be "success",
now and into the future, then no, there's no way that p.g. can "fail"...

of course, using such a definition, the university of virginia collection
would also be considered a "success".   but i wouldn't wanna be them...

so, how do _you_ define "success" and "failure"?

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080614/65769fc0/attachment.htm 

From grythumn at gmail.com  Sat Jun 14 22:22:52 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sun, 15 Jun 2008 01:22:52 -0400
Subject: [gutvol-d] Microsoft quits mass book digitization (was
	cyberlibrary numbers)
In-Reply-To: <Pine.LNX.4.64.0806141524590.22855@pglaf.org>
References: <bf5.3154f7ac.355753ad@aol.com>
	<Pine.LNX.4.64.0806111028570.17446@pglaf.org>
	<4851FFEF.3010701@ibiblio.org>
	<Pine.LNX.4.64.0806130948272.480@pglaf.org>
	<4852B1C0.1060809@perathoner.de>
	<Pine.LNX.4.64.0806131323530.5770@pglaf.org>
	<4852EB86.6060507@perathoner.de>
	<Pine.LNX.4.64.0806141422140.22000@pglaf.org>
	<485443EF.30006@perathoner.de>
	<Pine.LNX.4.64.0806141524590.22855@pglaf.org>
Message-ID: <15cfa2a50806142222y74d148deqfe76ae1b4446bfc@mail.gmail.com>

On Sat, Jun 14, 2008 at 6:37 PM, Michael Hart <hart at pglaf.org> wrote:
>> It is bad manners to post a private mail you received without
>> asking the sender first.
>
> It is equally bad manners to send an ex parte reply to a public
> message. . .please do not do so again.

It is polite practice in most online communities to take discussions
off list when they go offtopic, or degenerate into flame wars*, in
respect for the time of others on the list not directly involved.

http://www.albion.com/netiquette/rule7.html
http://www.dtcc.edu/cs/rfc1855.html#3

This thread ceased being amusing quite a while ago.

R C
(* excluding forums set aside for flaming, of course.)

From Bowerbird at aol.com  Sun Jun 15 01:18:32 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 15 Jun 2008 04:18:32 EDT
Subject: [gutvol-d] Microsoft quits mass book digitization (was
	cyberlibrary numbers)
Message-ID: <bc7.264a2844.35862a58@aol.com>

robert said:
>   It is polite practice in most online communities 
>    to take discussions off list when they go offtopic, 
>    or degenerate into flame wars*

oh my goodness, we're talking about "polite practice" now?

what a kick!   as if this "discussion" has _ever_ been "polite"...

that's a knee-slapper if i ever heard one...


>   This thread ceased being amusing quite a while ago.

well, hey, robert, you just changed that!            :+)

michael's position on this question was very straightforward:
microsoft was never all that serious about book digitization,
so there's no reason to mourn now that they have opted out.
michael predicted at the start that they wouldn't stick around.

now maybe someone came up with a well-reasoned argument
against his positions, and i don't know about it because they're
in my spam folder, in which case i hope someone will repeat it...

but i'd guess that instead it was just the typical run-of-the-mill
gamut of insults thrown at michael in the hope that the lurkers
would fail to sort the barrage to determine that _nothing_stuck_.

as far as _i_ am concerned, microsoft corrupted the "purity" of the
o.c.a., all for a positively _tiny_ amount of money in the big scheme
of things (you know, the arena where they made a bid to buy yahoo
for _$44_billion_), so i'm _glad_ they've left our little neighborhood,
as now brewster can go back to being true to his basic philosophy...

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080615/1871690f/attachment.htm 

From walter.van.holst at xs4all.nl  Sun Jun 15 05:16:31 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Sun, 15 Jun 2008 14:16:31 +0200
Subject: [gutvol-d] good tools that do all the work (the short version)
In-Reply-To: <20080614233300.GH23938@mail.pglaf.org>
References: <c3d.33a92250.35816dd6@aol.com>
	<20080614233300.GH23938@mail.pglaf.org>
Message-ID: <4855081F.2040309@xs4all.nl>

Greg Newby wrote:

> I don't know what you mean by failure.  The work that's been done, is
> done -- the fruits of that labor are available, and will remain
> available.  I can think of various things that might indicate the end of
> PG as it is now [for example, having no new content to add to the
> collection], but how is that failure for PG?

And that touches exactly the point of contention. PG and about any other 
  open content/open source/free software project that has produced 
anything of interest cannot fail in the sense of being completely in 
vain as long as that what has been produced is still accessible. PG 
however does fail in the sense that it doesn't even grab relatively low 
hanging fruit. It is a failure in the sense of unfulfilled, or at least 
much slower than possible fulfilled, potential.

The useless flamewars on this will keep raging on as long as the 
participants of this list do not acknowledge that there is more than one 
failure mode and that PG is in the second one. Even someone as the 
otherwise in matters of engineering greatly misguided Leslie "Bowerbird 
Intelligentleman" Hanson seems at least able to make some effort to 
grasp said low hanging fruit.

However sympathetic the Zen-like way of running PG may be, letting 
thousands of flowers bloom does not rule out the possibility to at least 
express a slight preference for quality control and structure. There is 
a vast spectrum between the "we're not going to do any form of quality 
control other than copyright clearance" approach currently taken and a 
borderline fascist insistence on strict adherence to TEI-formatting as 
the other extreme.

It is a pity and a shame that all this has to deteriorate into a clash 
of massive egos every time this comes up. As others have said already, 
this has indeed ceased to be entertaining quite a while ago.

Regards,

  Walter

From Bowerbird at aol.com  Sun Jun 15 08:44:01 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 15 Jun 2008 11:44:01 EDT
Subject: [gutvol-d] good tools that do all the work (the short version)
Message-ID: <c31.349ba074.358692c1@aol.com>

walter said:
>    Leslie "Bowerbird Intelligentleman" Hanson

do please leave my girlfriend out of the discussion.   thanks.        :+)

otherwise, relatively well-put, walter... the main place where you
are off-base is that, on your "vast spectrum", it is possible to grab
all kinds of fruit (not just the "low-hanging" variety) by moving just
a _smidgen_ away from the "zen-like" side, without any need at all
to go anywhere near the "fascist" end, and once you see me making
that happen, you'll realize i wasn't "misguided", but right on-target...

and once greg realizes the huge benefits that become available from
the small bump that consistency adds to the cost side of the equation,
he too will come to see that p.g. in its current state really is a "failure".
a very fortunate, happy "failure", to be sure, but a failure nonetheless...

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080615/fa3b45bd/attachment.htm 

From hart at pglaf.org  Sun Jun 15 11:11:51 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 15 Jun 2008 11:11:51 -0700 (PDT)
Subject: [gutvol-d] Subject: !@! Re: Microsoft quits mass book digitization.
 . . .
Message-ID: <Pine.LNX.4.64.0806151110240.3856@pglaf.org>


Please, as a matter of decent practice on this list,
at least ANNOUNCE that you are speaking off list and
send that announcement BOTH in a message to the list
AND to make sure the person you are emailing knows.


On Sun, 15 Jun 2008, Robert Cicconetti wrote:

> On Sat, Jun 14, 2008 at 6:37 PM, Michael Hart <hart at pglaf.org> wrote:
>>> It is bad manners to post a private mail you received without
>>> asking the sender first.
>>
>> It is equally bad manners to send an ex parte reply to a public
>> message. . .please do not do so again.
>
> It is polite practice in most online communities to take discussions
> off list when they go offtopic, or degenerate into flame wars*, in
> respect for the time of others on the list not directly involved.
>
> http://www.albion.com/netiquette/rule7.html
> http://www.dtcc.edu/cs/rfc1855.html#3
>
> This thread ceased being amusing quite a while ago.
>
> R C
> (* excluding forums set aside for flaming, of course.)
>

From tb at baechler.net  Sun Jun 15 12:12:58 2008
From: tb at baechler.net (Tony Baechler)
Date: Sun, 15 Jun 2008 12:12:58 -0700
Subject: [gutvol-d] Preprints: Hart/Newby HOPE 6 presentation uploaded
Message-ID: <20080615191258.GA30136@investigative.net>

All,

I wouldn't normally post this type of announcement to the list, but I 
saw that the Wilson SF books from Preprints were recently posted so I 
thought I would announce here to avoid duplication of effort.

I have processed and uploaded the HOPE number 6 presentation with 
Michael Hart and Greg Newby.  It's on the pglaf.org server, but who 
knows when it will be posted.  If anyone wants the files before they're 
officially posted, ask here and I'll contact you off list.  The mp4 file 
is fairly huge but the audio files aren't that big.

Again, this announcement is informational only to avoid duplicated 
effort.

From julio.reis at tintazul.com.pt  Sun Jun 15 14:46:11 2008
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis)
Date: Sun, 15 Jun 2008 22:46:11 +0100
Subject: [gutvol-d] No (more) flames
In-Reply-To: <mailman.2722.1213507375.2809.gutvol-d@lists.pglaf.org>
References: <mailman.2722.1213507375.2809.gutvol-d@lists.pglaf.org>
Message-ID: <1213566371.7712.165.camel@abetarda>

Robert Cicconetti said,

>         It is polite practice in most online communities to take
>         discussions
>         off list when they go offtopic, or degenerate into flame
>         wars*, in
>         respect for the time of others on the list not directly
>         involved.

and

>         This thread ceased being amusing quite a while ago.

That's how I feel.

I subscribe this list because I'm a volunteer for PG, mostly working in
DP, inviting others to look at PG, and working the PG catalog.

What I get from this list mostly is that Marcello and Michael hate each
other, for a reason or twelve; that Bowerbird hates DP and most people
there (and here) hate him, for a reason or one hundred; and that's it.
Not much that really interests Gutenberg volunteers like myself. And
yet, this list is called gutvol-d.

I'm interested in ebooks, not in power struggles, or in name-calling, or
in why DP treats its vols so badly, or in why that other markup is mo
much better than this one. So I ask myself: why do I still subscribe to
this list? The answer is, I expect the discussions to change.

I don't know what to say. The most positive thing that happened over
here lately was Michael deciding not to reply to it anymore. A bit sad,
really. My question is -- will everybody else stop the off-topic stuff?

J?lio.

PS -- Or how about starting gutrant-d just to debate the bad leaders and
the bad markup and the bad treatment of volunteers and other bad stuff?


From Bowerbird at aol.com  Sun Jun 15 15:54:22 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 15 Jun 2008 18:54:22 EDT
Subject: [gutvol-d] No (more) flames
Message-ID: <d3a.23e96720.3586f79e@aol.com>

julio said:
>    that Bowerbird hates DP and most people there (and here) hate him, 
>    for a reason or one hundred; and that's it.

oh please.   you seem to be very bad at figuring out the truth.

so here, let me spell it out for you, very clearly...

1.   i don't "hate" distributed proofreaders.   to the direct contrary, i 
love it.
thousands of volunteers digitizing the public domain, how can you not?
what i _do_ "hate" is the fact that too much time and energy from these
thousands of volunteers is _wasted_ by a tremendously bad workflow...

i've detailed the numerous problems with this workflow _many_ times,
yet the d.p. "powers that be" just drag their feet on making corrections,
mostly because they don't like to have their authority challenged, at all,
let alone by a vocal critic who can muster up the power of logic like i can,
and who'll do the work of building up a mountain of supportive evidence.

that's why they silenced me on their own forums.   that way, as time went on,
they've been able to implement many of the changes that i'd suggested, but
without "conceding" that i was correct.   i'll document these changes later 
on.

it should be very clear to you, and everyone else, that i spend a lot of time
researching and writing my posts.   i'm willing to give that time for the 
cause
precisely because i do _love_ the volunteers for project gutenberg and d.p.
for something i don't care about (university of virginia?), i spend zero 
time.

2.   most of the people at distributed proofreaders don't even _know_ me,
let alone "hate" me.   of the ones that _do_ know me, many of them realize
my intentions come from a good heart.   even the ones who won't grant that
have come to learn (painfully) they cannot mount an argument against me.
you can't find one case where i've given d.p. bad advice...   not a single 
one.

3.   most of the people _here_ on this listserve do not "hate" me either...
they don't like all the flack, but the vast majority of them clearly know
(and will tell you) that my antagonists are responsible for that, not me...

i'm glad that you are speaking up to say you are tired of all the flack.
i'm tired of it too.   i've been tired of it for a very, very, very long 
time...

so i hope the people who are _responsible_ for the flack get the message.
we want it to stop.   we want to have quiet, rational discussions on this 
list.


>   gutrant-d

if you think i write "rants", you're not reading the evidence i provide...

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080615/dae7ba3f/attachment.htm 

From grythumn at gmail.com  Sun Jun 15 16:57:10 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sun, 15 Jun 2008 19:57:10 -0400
Subject: [gutvol-d]  New question (Was No (more) flames)
Message-ID: <15cfa2a50806151657v31a7975dy2b34f2abbd7be38a@mail.gmail.com>

On Sun, Jun 15, 2008 at 5:46 PM, J?lio Reis <julio.reis at tintazul.com.pt> wrote:
> Robert Cicconetti said,
>>         It is polite practice in most online communities to take
>>         discussions
>>         off list when they go offtopic, or degenerate into flame
>>         wars*, in
>>         respect for the time of others on the list not directly
>>         involved.
>
> and
>
>>         This thread ceased being amusing quite a while ago.
>
> That's how I feel.
>
[...]
> I'm interested in ebooks, not in power struggles, or in name-calling, or
> in why DP treats its vols so badly, or in why that other markup is mo
> much better than this one. So I ask myself: why do I still subscribe to
> this list? The answer is, I expect the discussions to change.

Well, let me pull out a real question that I've been working on.. I
have a clearance on most of the OED. I'm trying to figure out what
'final format' to shoot for, as this is going to require a lot of
markup not standard for DP, and I'll probably have to devise a
simplified or condensed form for the formatting rounds. My top
candidates right now are 1) A flavor of TEI (leaning towards freedict
standards), or 2) XDXF.

It's clear from looking at the text that semantic markup is going to
be easier than presentational for this project, as many of the style
differences are quite subtle. Does anyone have any experiences or
recommendations to share?

R C

From gbnewby at pglaf.org  Mon Jun 16 10:51:46 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 16 Jun 2008 10:51:46 -0700
Subject: [gutvol-d] OED
Message-ID: <20080616175146.GA23026@mail.pglaf.org>

On Sun, Jun 15, 2008 at 5:46 PM, J\372lio Reis <julio.reis at tintazul.com.pt wrote:
>Robert Cicconetti said,
>
>Well, let me pull out a real question that I've been working on.. I
>have a clearance on most of the OED. I'm trying to figure out what
>'final format' to shoot for, as this is going to require a lot of
>markup not standard for DP, and I'll probably have to devise a
>simplified or condensed form for the formatting rounds. My top
>candidates right now are 1) A flavor of TEI (leaning towards freedict
>standards), or 2) XDXF.
>
>It's clear from looking at the text that semantic markup is going to
>be easier than presentational for this project, as many of the style
>differences are quite subtle. Does anyone have any experiences or
>recommendations to share?

Robert, I don't really know the answer...the OED is immensely
complex, as you know..lots of typography, fonts, etc.

But I wanted to say: GO FOR IT!  This is a massive project,
and really, really important.  Having it be machine readable
will be a wonderful contribution.
  -- Greg


From Bowerbird at aol.com  Mon Jun 16 11:13:55 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 16 Jun 2008 14:13:55 EDT
Subject: [gutvol-d] 6 weeks of nothing been done
Message-ID: <bc5.26ceb779.35880763@aol.com>

it's now been 6 weeks since the "confidence in page" page 
has been updated over on the distributed proofreaders wiki.

basically, the person who was working on the task has been
swallowed up by real-world responsibilities, meaning that
this particular wild goose chase has come to a complete halt.

so the fixed-round workflow continues to remain ensconced.

this is a shame, because it impacts both quantity _and_ quality.

quantity is dampened because some pages are being handled
_too_many_times_.   they were easy enough to be perfected early,
and the subsequent rounds are simply needlessly wasted energy.

and quality is hurt as well, because some pages don't get seen
_enough_ times, so flaws remain because of insufficient views.

so even though "the d.p. powers that be" _agree_ that they need
a roundless workflow, they aren't doing anything to bring it about.
(which means they don't really grasp the importance of it after all,
because if they did, they would _work_ to make it a high priority.)

what _is_ happening is that people are doing all kinds of ad-hoc,
one-off "round juggling" -- repeating p1, skipping rounds, etc.,
which helps only a little bit and is mostly just a big waste of time
that is tolerated because it vents the volunteers' frustration and
impatience with the broken nature of the fixed-round system...

and meanwhile, the answer to "how can we be confident that a
specific page needs no more proofing?" is as clear as it ever was,
namely, when it undergoes _n_ iterations without any changes,
where _n_ can be any number you want it to be, i suggest _2_...

if they just implemented this simple solution, they'd see it works.

-bowerbird


**************
Vote for your city's best dining and nightlife. City's Best 
2008.
      (http://citysbest.aol.com?ncid=aolacg00050000000102)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080616/7989f25b/attachment.htm 

From julio.reis at tintazul.com.pt  Tue Jun 17 03:15:25 2008
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis)
Date: Tue, 17 Jun 2008 11:15:25 +0100
Subject: [gutvol-d] No (more) flames
In-Reply-To: <mailman.2.1213642802.24862.gutvol-d@lists.pglaf.org>
References: <mailman.2.1213642802.24862.gutvol-d@lists.pglaf.org>
Message-ID: <1213697725.6578.49.camel@abetarda>

Bowerbird said,

> i'm glad that you are speaking up to say you are tired of all the
> flack. i'm tired of it too.  i've been tired of it for a very, very,
> very long time...

I may not be very good at figuring out the truth :D but I don't dispute
people's feelings. So you're tired of all the flak... I'm with you; just
note that sentences such as this one actually *attract* flak:

> oh please.  you seem to be very bad at figuring out the truth.

And your following remark sounds a tad patronizing, which might attract
even *more* flak:

> so here, let me spell it out for you, very clearly...

Take care not to fan the flames. :)

Anyway, good to know you love DP, honest I couldn't tell from your posts
here. You say the DP bosses treat their proofers badly -- but it is
*you* who make them feel bad for being treated like this or like that.
So spread that love around and improve the condition for ebook
volunteers somewhere; if your mojo won't work at DP, help somewhere
else. Do something creative. I've seen z-m-l.com which might or might
not be a good idea; try other stuff, somewhere else. I wish there was
real competition to DP, and to Gutenberg; even because if competition to
DP would increase the throughput of free texts, then it's really not
competition; same for another huge free ebook library not really being
competition to Gut, but working towards the same goal: free texts.

J?lio.


From Bowerbird at aol.com  Tue Jun 17 04:39:51 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 17 Jun 2008 07:39:51 EDT
Subject: [gutvol-d] No (more) flames
Message-ID: <d2f.24d4ff45.3588fc87@aol.com>

julio said:
>    just note that sentences such as this one actually *attract* flak:

as do sentences like:
>   Bowerbird hates DP and most people there (and here) hate him

c'est la vie...


>   Take care not to fan the flames. :)

i take care not to get burned.

after that, let the flames burn what they will...


>    Anyway, good to know you love DP, 
>    honest I couldn't tell from your posts here.

seriously?   people go to great lengths to defend the people they love.


>    You say the DP bosses treat their proofers badly -- 
>    but it is *you* who make them feel bad for being treated like this

well, yes, and that is a delicious conundrum.

when a person you love has taken up with someone who treats them badly
-- and we all know that this happens all the time -- do you say something?
or do you let them suffer in silence?

it's always a difficult question to answer, and the answer, it seems to me,
often depends upon just how bad the treatment is...

and when you can document that the treatment is _very_ bad, you squawk...

especially when you can easily make that documentation,
and clearly show the treatment is _very_ bad, you squawk...

so i'm squawkin'...


>    So spread that love around and 
>    improve the condition for ebook volunteers somewhere; 

well, i'm really trying to decide how i can do that, julio, i really am...


>    if your mojo won't work at DP, help somewhere else. 

yeah, i'm not above thinking that.

i'm really not...

but there is something...

something very insidious...

about _forking_...

it's not good for collaborative projects...

it's really not...

so i have a large degree of reluctance when it comes to forking...


>    Do something creative. 
>    I've seen z-m-l.com which might or might not be a good idea; 

well, in case you can't decide, it's an excellent idea...

and as i fork the p.g. library, you'll come to realize that...

(but again, i fork with extreme reluctance...)


>    try other stuff, somewhere else.

no need to.   the test bed has already been seeded...


>    I wish there was real competition to DP, and to Gutenberg;

well, again, i'm not sure, because forking is not a healthy thing...


>    even because if competition to DP would 
>    increase the throughput of free texts, 
>    then it's really not competition;
>    same for another huge free ebook library 
>    not really being competition to Gut, but 
>    working towards the same goal: free texts.

well yeah, it's not that something else is "competition".

because -- as you've noted -- it's pretty much all complementary...
online resources are _not_limited_, so they do not have a zero-sum
relationship with one another, so they can't cannibalize each other,
so the situation of "competition" doesn't really exist in their world...

this is a big part of michael's overall philosophy, and it does work...

but forking _is_ nonetheless a real danger, because it splits resources...
it disturbs synergy, and synergy is the be-all and end-all of collaboration.

i could've mounted an alternative to distributed proofreaders long ago.
(at one point, i announced one, named "committed proofreaders", but..)

but the thing is, i don't really want to create and nurture a _community_,
which is what you would have to do, if you wanted to create another d.p.
the technology is one thing, but the ass-kissing aspect is quite another...

and again, the disruption of synergy is all-important.

but yes, my frustration has grown to the point where i am _willing_ to
entertain the thought of doing forking...

and you'll see that manifesting in my posts in the immediate future...

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080617/9442b7c8/attachment.htm 

From Bowerbird at aol.com  Tue Jun 17 12:37:40 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 17 Jun 2008 15:37:40 EDT
Subject: [gutvol-d] the great preprocessing escapade
Message-ID: <c99.29dfef10.35896c84@aol.com>

wow.   where do i begin?

one of the weakest of the weak links in the d.p. workflow chain
is "preprocessing".   this step actually happens _after_ o.c.r., but
_before_ the first proofing round, so d.p. calls it "preprocessing".

i have made rational arguments -- long and hard -- that d.p.
could help their quantity _and_ their quality by improving their
preprocessing routines.   as usual, the "powers that be" from d.p.
have pretended to ignore me, tried to attack my credibility, and
implemented some of my suggestions in a roundabout way so
they wouldn't have to give me credit.   (i do not need credit, but
very often the roundabout implementations get details wrong.)

and mostly, they just continue on with shoddy preprocessing...

you can see where i've talked about this topic on the d.p. forums
by searching messages for "preprocessing" or "pre-processing".

here's a typical thread, this one appearing in january of 2007:
>   http://www.pgdp.net/phpBB2/viewtopic.php?t=24634

this thread is notable because i shared a useful hint with them,
on the topic of how to fix "spacey quotes" -- quotemarks that
have a space on both sides, which is not uncommon from o.c.r.:
>    the secret of fixing doublequotes 
>    is counting them within-paragraph.

now, i'll grant you that that "secret" is _pretty_obvious_ once you
hear it, but i can assure you _most_ people hadn't thought of it,
including programmers of the current d.p. preprocessing tools.

indeed, dkretz responded by saying:
>    Of course. Thanks, I hadn't thought of that. 

i followed up with some elaboration, and he included it in his 
clean-up apps, and i believe he has had much success with it.

but dkretz is about the only person who seemed to be listening.

indeed, that thread is a good illustration of how i have helped d.p.
-- or tried to, anyway -- and how they've ignored my suggestions.
you'll find a ton of good advice from me, just in that one thread...

it also illustrates how d.p. has intentionally set obstacles in my path.
for instance, you can read how i offered to correct their _database_
-- i.e., the text for _all_of_their_projects -- in one fell swoop, but
they refused to give me a copy of that database -- a 1.5-gig file --
and suggested instead that i download each of the 18,000+ e-texts
_individually_, over the course of a good number of days.    stupidity!
and then they objected to _that_ because it would "strain their server". 

so this is how ridiculous the d.p. "powers that be" have treated me...
so -- needless to say -- they've been stuck with lousy preprocessing.

and thus, i was cheered last december, when rfrank (roger frank)
announced he was working on an improved preprocessing tool...
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=30903

you'll even see that, in regard to fixing spacey quotes, dkretz
repeated my suggestion about the secret of doing it correctly:
>    http://www.pgdp.net/phpBB2/viewtopic.php?p=403153#403153

that thread ended after just 2 days.   but over the past 6 months,
rfrank has been using cpprep, to good effect, from what i can tell.
a search for "cpprep" turns up lots of instances where it was used,
and the reports of the fixes made show it saves work for proofers.

***

so i expected this to be an upbeat message about how d.p. improved.

however...

when i looked at some of the early cpprep results, i was disappointed
to find that it was simply _marking_ the spacey doublequotes, and not
actually _fixing_ 'em.   but i figured that, over time, they would observe
that the fixes the program _would_have_ auto-applied were _correct_,
in the vast majority of cases, and thus could be made with confidence,
and they would flip the automatic-fix switch.

in looking at more recent cpprep output, however, i find that not only
have they not flipped the auto-fix switch, but have actually _regressed_
to where they are marking both types of spacey-quotes the same way!

that is, both the probable-open and probable-close spacey-quotes
are being marked with an asterisk in front of them, indicating the tool
has become _less_ certain of its ability to discern them, and not more so.

i haven't confirmed this on a number of files, because it's so hard to know
just exactly which files were treated in which way, but the one file where i
found these results is clearly a very recent one, and did have cpprep on it:
>    http://www.pgdp.net/c/project.php?id=projectID481cf3f654893
>    http://www.pgdp.net/c/project.php?id=projectID481cf42c2d365

there are 2 projects listed, because this was a parallel-p1 experiment.
in case you are interested, the project just released from the p2 queue:
>    http://www.pgdp.net/c/project.php?id=projectID4836c3cc2a3f5

i will detail the results of this experiment in a later post, but for now,
i'll note that this method of marking spacey-quotes (with an asterisk)
actually seems to be _counter-productive_...   there are several cases
where there was a difference between the two parallel p1 proofings
on the spacey-quotes, where one proofing was _downright_wrong_.
it wasn't just _missed_, it was acted upon and turned into an _error_...
(there might be more where both p1 proofings made the same error.)

that's not good.

furthermore, several of the other parallel-differences indicate that the
preprocessing that is being done is fairly lame.   i make this judgment
even though -- in the project comments -- rfrank said that the project
"has undergone significant preprocessing".   that's really unfortunate,
because it indicates that d.p. still has a very long road to travel before
they have attained an adequate preprocessing tool.   in the meantime,
proofers are having to make corrections the machine could be fixing...

i certainly would have hoped that 6 months of work from rfrank would
have resulted in a lot more progress toward a good preprocessing tool.

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080617/b284ff78/attachment.htm 

From julio.reis at tintazul.com.pt  Tue Jun 17 15:25:01 2008
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis)
Date: Tue, 17 Jun 2008 23:25:01 +0100
Subject: [gutvol-d] US public domain for works published abroad
Message-ID: <1213741501.14935.50.camel@abetarda>

Hmmm... guys, probably everyone's got here before I did. And yet --

http://www.copyright.cornell.edu/public_domain/ -- is this reliable?

Under "Works Published Outside the U.S. by Foreign Nationals or U.S.
Citizens Living Abroad," see line:

* Date of Publication: 1923 through 1977
* Conditions: Published without compliance with US formalities, and in
the public domain in its home country as of 1 January 1996
* Copyright Term in the United States: In the public domain

Footnote 11 reveals that "US formalities include the requirement that a
formal notice of copyright be included in the work; registration,
renewal, and deposit of copies in the Copyright Office; and the
manufacture of the work in the US."

It follows that Gutenberg *could* clear a few more books... Rule 10? :)
Published outside the US (not in the US within 30 days) *and* from 1923
to 1977 *and* (no formal notice of copyright *or* no deposit with US
Copyright Office *or* not made in the USA) *and* PD in country of
publication as of 1 Jan 1996 = Cleared.

Rule 10-PT would clear books published in Portugal of authors dead on or
before 1925; not much more, but a bit more (already enough to clear
something in my library.)

Rule 10-BR would clear books published in Brazil of authors dead on or
before 1935; and so on and so forth.

So? Is it worth PG's effort?

J?lio.


From prosfilaes at gmail.com  Tue Jun 17 16:45:29 2008
From: prosfilaes at gmail.com (David Starner)
Date: Tue, 17 Jun 2008 19:45:29 -0400
Subject: [gutvol-d] US public domain for works published abroad
In-Reply-To: <1213741501.14935.50.camel@abetarda>
References: <1213741501.14935.50.camel@abetarda>
Message-ID: <6d99d1fd0806171645r31dd4cbbq6d9bcdd4aa7a9ac0@mail.gmail.com>

On Tue, Jun 17, 2008 at 6:25 PM, J?lio Reis <julio.reis at tintazul.com.pt> wrote:
> So? Is it worth PG's effort?

I think it's been mentioned as possible before. But there are a couple
complexities. First, it basically requires a Rule 6 check, since it's
very hard to tell whether a book was published in the US within 30
days or not.  The other big issue is "in the public domain in its home
country as of 1 January 1996". What, exactly, is a reliable source for
the state of the public domain in Portugal in 1996? Especially one in
English, so our copyright clearers can verify it? Given continuing
changes in copyright law around the world, and the fact that things
just aren't as simple as life+70 in so many cases I know of, make this
somewhat difficult. (One detail; 1996 only applies to Berne convention
countries; Vietnam is later, and other countries may have a start date
in the 21st century, when we start to retroactively recognized Iraqi
and Iranian copyrights.)

Personally, if you are interested, I think your best bet is take the
book you want, establish very carefully that it is in the public
domain under these rules, and try to get a copyright clearance. It'll
force them to look at the situation, and if they accept it, they'll
have a precedence. If PG does start clearing books this way, I
wouldn't be surprised if they make a list of countries to work from,
that they feel confident about the state of copyright law

From sly at victoria.tc.ca  Tue Jun 17 18:10:39 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue, 17 Jun 2008 18:10:39 -0700 (PDT)
Subject: [gutvol-d] US public domain for works published abroad
In-Reply-To: <1213741501.14935.50.camel@abetarda>
References: <1213741501.14935.50.camel@abetarda>
Message-ID: <Pine.GSO.4.58.0806171809230.16931@vtn1.victoria.tc.ca>

I have thought the same thing about works published in the
former Soviet union. But trying to approach the actual copyright
laws involved gets confused.

Andrew

On Tue, 17 Jun 2008, J?lio Reis wrote:

> Hmmm... guys, probably everyone's got here before I did. And yet --
>
> http://www.copyright.cornell.edu/public_domain/ -- is this reliable?
>
> Under "Works Published Outside the U.S. by Foreign Nationals or U.S.
> Citizens Living Abroad," see line:
>
> * Date of Publication: 1923 through 1977
> * Conditions: Published without compliance with US formalities, and in
> the public domain in its home country as of 1 January 1996
> * Copyright Term in the United States: In the public domain
>
> Footnote 11 reveals that "US formalities include the requirement that a
> formal notice of copyright be included in the work; registration,
> renewal, and deposit of copies in the Copyright Office; and the
> manufacture of the work in the US."
>
> It follows that Gutenberg *could* clear a few more books... Rule 10? :)
> Published outside the US (not in the US within 30 days) *and* from 1923
> to 1977 *and* (no formal notice of copyright *or* no deposit with US
> Copyright Office *or* not made in the USA) *and* PD in country of
> publication as of 1 Jan 1996 = Cleared.
>
> Rule 10-PT would clear books published in Portugal of authors dead on or
> before 1925; not much more, but a bit more (already enough to clear
> something in my library.)
>
> Rule 10-BR would clear books published in Brazil of authors dead on or
> before 1935; and so on and so forth.
>
> So? Is it worth PG's effort?
>
> J?lio.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From walter.van.holst at xs4all.nl  Tue Jun 17 22:06:46 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Wed, 18 Jun 2008 07:06:46 +0200
Subject: [gutvol-d] US public domain for works published abroad
In-Reply-To: <6d99d1fd0806171645r31dd4cbbq6d9bcdd4aa7a9ac0@mail.gmail.com>
References: <1213741501.14935.50.camel@abetarda>
	<6d99d1fd0806171645r31dd4cbbq6d9bcdd4aa7a9ac0@mail.gmail.com>
Message-ID: <485897E6.3060008@xs4all.nl>

David Starner wrote:

> days or not.  The other big issue is "in the public domain in its home
> country as of 1 January 1996". What, exactly, is a reliable source for
> the state of the public domain in Portugal in 1996? Especially one in
> English, so our copyright clearers can verify it? Given continuing

That would be Portugese copyright statutes at that date. Who are not 
very likely to be available in English and unlikely to be available online.

It nonetheless would be very useful to compile a set of EU member state 
copyright status as of the 1st of 1995 and the 1st of 1996. The first 
set since in some EU member states the date of 1st of 1995 is the marker 
date for retro-active extension from 50+ to 70+, the second for this 
rule 10 clearance.

Regards,

  Walter

From Bowerbird at aol.com  Wed Jun 18 11:20:49 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 18 Jun 2008 14:20:49 EDT
Subject: [gutvol-d] what dave did
Message-ID: <ca3.34767baa.358aac01@aol.com>

seth godin has an interesting post:
>    http://sethgodin.typepad.com/seths_blog/2008/06/what-dave-just.html

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080618/bd4175b8/attachment-0001.htm 

From Bowerbird at aol.com  Wed Jun 18 12:57:04 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 18 Jun 2008 15:57:04 EDT
Subject: [gutvol-d] across the mesa
Message-ID: <bf5.362d2cdf.358ac290@aol.com>

ok, here are the results for the parallel-p1 book that i talked about 
yesterday,
"across the mesa".

as usual, rfrank did good work here.   the scans are fairly straight and very 
clear,
and the quality of the o.c.r. is very good.   projects like this are a joy to 
work on...

and again, as usual, the p1 proofers have done a very good job as well.

there were just 234 points of difference between the parallel proofings...
on a file with over 9,200 lines -- a book of 318 pages -- that's excellent...

and the typical results continue on, because roughly 85% of those 234 diffs
could -- and should -- have been resolved with some good preprocessing
before _any_ of this text even went in front of the proofers...   sad but 
true...

you can tell the preprocessing was inferior because of some obvious gaffs,
like a line that begins with a semicolon.   that doesn't happen in real 
books.

the fact that such a line showed up in the diff results between the proofings
means, on the good news front, that one of the proofings caught this error.

of course, on the bad news front, it means that one of them also missed it,
but as long as one of the proofings caught it, our awareness of it is 
there...

still, this is the kind of thing that the computer can locate, so why not use 
it?

so, in sum, i'm glad a sharp cookie like rfrank is working on programming a
good preprocessing tool for d.p.   i just wish he was making better progress.

***

i've processed this project as z.m.l., and uploaded it to my website:
>    http://z-m-l.com/go/amesa/amesap123.html
>    http://z-m-l.com/go/amesa/amesa.zml

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080618/ad24957a/attachment.htm 

From jeroen.mailinglist at bohol.ph  Wed Jun 18 14:06:35 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Wed, 18 Jun 2008 23:06:35 +0200
Subject: [gutvol-d] US public domain for works published abroad
In-Reply-To: <1213741501.14935.50.camel@abetarda>
References: <1213741501.14935.50.camel@abetarda>
Message-ID: <485978DB.8040204@bohol.ph>


Hi Julio,

I think in almost all cases, it would be easier to go through the 
PG-Canada (or PG-Philippines, but I have not yet made that operational 
for life+50) route, which work under a life+50 regime.

Jeroen.


J?lio Reis wrote:
> Hmmm... guys, probably everyone's got here before I did. And yet --
>
> http://www.copyright.cornell.edu/public_domain/ -- is this reliable?
>
> Under "Works Published Outside the U.S. by Foreign Nationals or U.S.
> Citizens Living Abroad," see line:
>
> * Date of Publication: 1923 through 1977
> * Conditions: Published without compliance with US formalities, and in
> the public domain in its home country as of 1 January 1996
> * Copyright Term in the United States: In the public domain
>
>   


From Bowerbird at aol.com  Wed Jun 18 17:29:24 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 18 Jun 2008 20:29:24 EDT
Subject: [gutvol-d] kid rock rocks kids
Message-ID: <d01.32b3afd8.358b0264@aol.com>

atlantic records went to kid rock, who's on their label, telling him
that he needed to say something publicly against downloading,
because "people are stealing from us and stealing from you"... 

"wait a second, you've been stealing from the artists for years,"
he responded, "but now you want me to stand up for you?"

instead, he started spreading the opposite message: 
"i was telling kids -- download it illegally, i don't care.
i want you to hear my music so i can play live."

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080618/f5f1ef5a/attachment.htm 

From Bowerbird at aol.com  Thu Jun 19 09:28:47 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 19 Jun 2008 12:28:47 EDT
Subject: [gutvol-d] across the mesa
Message-ID: <bc1.33d6aa82.358be33f@aol.com>

"across the mesa" flew through p2.   48 hours, in about 10 sessions.

once again, we have solid proof just how good the p1 proofers are,
especially when they get two cracks at a book, like this parallel-p1.

p2 made just _20_ corrections, on 17 pages, in this 318-page book,
and 7 of those 20 were concerning blank lines at the top of a page...

that's pretty impressive.   it means that -- even _without_ the p2 --
the parallel-p1 took the book far past the 1-error-every-10-pages
standard i have proposed for my "continuous proofreading" method.

but the story doesn't end there...

because when we take a closer look at the 20 errors that p2 fixed,
we find that every single one could've been fixed in preprocessing.

that's right, _every_single_one_!   20 out of 20.   100%.

i've listed them below, because i'm sure some bloomin' idiot out there
doesn't believe me.

by the way, the reg-ex i've listed here turned up 2 more errors
which were missed by both rounds of p1 plus the p2 proofers...

on page 106:
>    "Game's up, Pachuca." he said, shortly. "You're
>    "Game's up, Pachuca!" he said, shortly. "You're

on page 294:
>    "It's useless, of course," grunted Scott "They'll
>    "It's useless, of course," grunted Scott. "They'll

***

as usual, i'm amazed/appalled by the huge disutility of having
human beings search for errors by comparing word-for-word,
when the computer can find them much easier.   it's a big waste.

i mean, sure, if you want the humans to search word-for-word
for things that the computer _cannot_ find, that's fine.   but first
take care of all the errors that the computer _can_ find by itself.

c'mon, people, open your eyes...

-bowerbird

p1>   flat and stilling, to a region of small hills and valleys;
p2>   flat and stifling, to a region of small hills and valleys;
bb>   auto-detectable by spell-check

p1>   did not think you would stay with the Senora Morgan."
p2>   did not think you would stay with the Se??ora Morgan."
bb>   auto-detectable by spell-check

p1>   "I wouldn't call that queer," replied Scott
p2>   "I wouldn't call that queer," replied Scott.
bb>   auto-detectable by [:lowercase:][:whitespace:]\"[:uppercase:]

p1>   are you?".
p2>   are you?"
bb>   auto-detectable by punctuation-check

p1>   Li back on one of them to-night"
p2>   Li back on one of them to-night."
bb>   auto-detectable by paragraph-termination-check

p1>   thought
p2>   thought.
bb>   auto-detectable by paragraph-termination-check

p1>   and one of the candidates for the next presidency----"said
p2>   and one of the candidates for the next presidency----" said
bb>   auto-detectable by doublequote-check

p1>   said Scott, thoughtfully. "Or break it"
p2>   said Scott, thoughtfully. "Or break it."
bb>   auto-detectable by paragraph-termination-check

p1>   here in half a minute if I don't"
p2>   here in half a minute if I don't."
bb>   auto-detectable by paragraph-termination-check

p1>   "Mr. Hellick got flend--Mrs. Conlad." said Li,
p2>   "Mr. Hellick got flend--Mrs. Conlad," said Li,
bb>   auto-detectable by \.\" [:lowercase:]

p1>   Angel Gonzales. a large, brutal-looking man, his face
p2>   Angel Gonzales, a large, brutal-looking man, his face
bb>   auto-detectable by \.\ [:lowercase:]

p1>   "Men, and horses and plunder--oh, much plunder1"
p2>   "Men, and horses and plunder--oh, much plunder!"
bb>   auto-detectable by alpha/numeric-check

p1>   a yarn, Mr. Penhallow. and then you've got to help me
p2>   a yarn, Mr. Penhallow, and then you've got to help me
bb>   auto-detectable by \.\ [:lowercase:]

***

new paragraphs on pagebreaks should be auto-detected...

pb>   Emma," said the girl. "But Mr. Adams has been telling
pb>   Just bring over a couple of blankets, will you, Mrs.
pb>   "Wearin' of the Green" and the "Long, Long Trail."
pb>   "You see, I've got mining in my blood. My grandfather
pb>   Anyhow, Mrs. Conrad married her Englishman and
pb>   Scott, whose impatience and irritation made speech unendurable.
pb>   Pachuca--apart from the raid, at least, he thinks he


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080619/1e3b3403/attachment.htm 

From Bowerbird at aol.com  Thu Jun 19 14:02:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 19 Jun 2008 17:02:08 EDT
Subject: [gutvol-d] art in the cloud and on glossy stock
Message-ID: <c63.34f20e09.358c2350@aol.com>

derek powazek -- a writer/geek out of the bay area --
has been associated with some neat stuff, including this:
>    MagCloud enables you to publish your own magazines. 
>    All you have to do is upload a PDF 
>    and we'll take care of the rest: 
>    printing, mailing, subscription management, and more.
>    http://magcloud.com/

print-on-demand magazines.   what will they think of next?         :+)

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080619/1023beb5/attachment.htm 

From Bowerbird at aol.com  Thu Jun 19 14:26:04 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 19 Jun 2008 17:26:04 EDT
Subject: [gutvol-d] art in the cloud and on glossy stock
Message-ID: <d0a.28cd51bf.358c28ec@aol.com>

oh yeah, the thing i forgot to tell you is that magcloud.com
is backed by hp labs...   derek informs why this is important:
>    there are other print-on-demand companies out there, but 
>    MagCloud is the only one designed specifically for magazines. 
>    And it's the only one created by HP, the company that makes 
>    the Indigo printers that power the print-on-demand industry.

in other words, hp labs has a good incentive to _make_this_happen_.
they actually _want_ it to work.   how many "tests" have we seen done
by some entity that we weren't all too convinced wanted it to succeed?
too many.   so it's nice to see that shoe on the other foot for a change.

derek has more insightful things to say, so go read what he wrote:
>    http://powazek.com/posts/984

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080619/f3b13ce7/attachment.htm 

From julio.reis at tintazul.com.pt  Thu Jun 19 14:45:37 2008
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis)
Date: Thu, 19 Jun 2008 22:45:37 +0100
Subject: [gutvol-d] US public domain for works published abroad
In-Reply-To: <mailman.4.1213902003.24682.gutvol-d@lists.pglaf.org>
References: <mailman.4.1213902003.24682.gutvol-d@lists.pglaf.org>
Message-ID: <1213911940.8125.30.camel@abetarda>

Thanks for your answers... they all contributed, but Jeroen's reply
nails the last nail: why complicate? The goal behind my question was not
to publish ebooks in PG-USA; but rather to have *somewhere* where I can
publish all books in the public domain in Portugal and Brazil. PG-Canada
is that place: we're life+70 they're life+50 so if it's free here, we
can publish it there.

Plus I've just PMed+PPed the first book in Portuguese in PGDP-Canada. A
sonnet book by one of our greatest poets, published in the 1930s.

J?lio.

>         I think in almost all cases, it would be easier to go through
>         the 
>         PG-Canada (or PG-Philippines, but I have not yet made that
>         operational for life+50) route, which work under a life+50
>         regime.


From Bowerbird at aol.com  Fri Jun 20 13:29:42 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 20 Jun 2008 16:29:42 EDT
Subject: [gutvol-d] across the mesa -- the conclusion
Message-ID: <bc8.30b77e57.358d6d36@aol.com>

let's put it in stark terms.

after my full-on preprocessing of "across the mesa",
there were just 11 errors that human proofers found.
an annotated list is appended...

6 were found by one parallel p1 proofing, 5 by the other.

this is the kind of accuracy you can expect when a book is:
1) carefully scanned so the scan-set is clean,
2) subjected to o.c.r. with a good o.c.r. app, and
3) preprocessed correctly...

all 11 of these errors would've been caught by engaged readers,
so need for human proofers on this book is highly questionable.

-bowerbird

======================================================================
== a>   He tried, several tunes, but the door held maddeningly. =======
== b>   He tried, several times, but the door held maddeningly. =======
== d>   ===================^^====================== tunes vs. times* ==
======================================================================
== a>   in the middle of the night, too." Mrs. Van Zandt ==============
== b>   in the middle of the night, too," Mrs. Van Zandt ==============
== d>   ===============================^====== [period]* vs. [comma] ==
======================================================================
== a>   down here," Scott was riding with his knee around =============
== b>   down here." Scott was riding with his knee around =============
== d>   =========^============================ [comma] vs. [period]* ==
======================================================================
== a>   some shickens -- netting else left." ==========================
== b>   some shickens -- notting else left." ==========================
== d>   ==================^==================== netting vs. notting* ==
======================================================================
== a>   up the community. Herrick. I want you to know Bob =============
== b>   up the community. Herrick, I want you to know Bob =============
== d>   =========================^============ [period] vs. [comma]* ==
======================================================================
== a>   "Could you ride. Henry, do you think? You and =================
== b>   "Could you ride, Henry, do you think? You and =================
== d>   ===============^====================== [period] vs. [comma]* ==
======================================================================
== a>   "But, Henry, I can't stand it! And I look so! I ===============
== b>   "But. Henry, I can't stand it! And I look so! I ===============
== d>   ====^================================= [comma]* vs. [period] ==
======================================================================
== a>   "'Twa'n't much. I took my time. You see, the ==================
== b>   "Twa'n't much. I took my time. You see, the ===================
== d>   =^=============================== singlequote* vs. (missing) ==
======================================================================
== a>   of him. Get out of my way, Hard." =============================
== b>   of him. Get out of my way. Hard." =============================
== d>   =========================^============ [comma]* vs. [period] ==
======================================================================
== a>   "Oh, laugh if you want to," said Polly, indulgently. ==========
== b>   "Oh, laugh if you want to," said Polly, indulgently, ==========
== d>   ======================== [period]* vs. [comma] ====^===========
======================================================================
== a>   against the bandits which have nourished so long ==============
== b>   against the bandits which have flourished so long =============
== d>   ===================^^================= nourish vs. flourish* ==
======================================================================
===================== 2 [period]* vs. [comma] ========================
===================== 2 [period] vs. [comma]* ========================
===================== 2 [comma]* vs. [period] ========================
===================== 1 [comma] vs. [period]* ========================
======================================================================
===================== singlequote* vs. (missing) =====================
======================================================================
===================== tunes vs. times* ===============================
===================== netting vs. notting* ===========================
===================== nourish vs. flourish* ==========================
======================================================================


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080620/5817f846/attachment.htm 

From Bowerbird at aol.com  Mon Jun 23 11:19:59 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 23 Jun 2008 14:19:59 EDT
Subject: [gutvol-d] roger's girls -- 001
Message-ID: <bc7.273d8a4d.3591434f@aol.com>

roger frank recently had 3 titles posted
that include the world "girls" in the title,
so i'm calling them "roger's girls", and
will be presenting some reports on them:
>    25870 -- A World of Girls, by L. T. Meade
>    25872 -- Girls of the Forest, by L. T. Meade
>    25873 -- The Motor Girls on Crystal Bay, by Margaret Penrose

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080623/ddbefdea/attachment.htm 

From Bowerbird at aol.com  Mon Jun 23 11:32:56 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 23 Jun 2008 14:32:56 EDT
Subject: [gutvol-d] english espresso
Message-ID: <bb9.2a413bfc.35914658@aol.com>

blackwell -- a 60-store bookstore-chain in england --
is putting the print-on-demand espresso book machine
into their shops.   the machine can print one million titles,
600,000 from a partnership with lightning source, with
the rest of them being public-domain.   (finally, a number
that we can count on as having eliminated the duplicates.)

vince gun, the c.e.o. of blackwell, said:
>    From a retailer's point of view, even allowing for the 
>    first-generation technology and publisher challenges, 
>    this is a fantastic opportunity?sell to demand with 
>    no risk to inventory and an opportunity to create 
>    incremental revenue streams for ourselves and publishers.

wow.   a bookseller with a brain.   who knew?

>    http://thebookseller.com/news/61423-blackwell-brews-up-espresso.html

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080623/ee3ce867/attachment.htm 

From Bowerbird at aol.com  Wed Jun 25 12:40:59 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 25 Jun 2008 15:40:59 EDT
Subject: [gutvol-d] how little they know
Message-ID: <d3d.24d0101b.3593f94b@aol.com>

sometimes it's just _staggering_ how little they know over at d.p.

consider this post, from roger frank:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=33945

it starts out like this:
>    The first time was an accident. Two PMs were running the same book 
>    through DP and it wasn't caught until after P1. Out of curiousity, 
>    the results were compared. It appeared that both P1 proofers, 
>    in parallel, found most of the same errors, but each of them found 
>    some that the other didn't. Letting a single merged copy go through P2 
>    showed that together, they caught nearly all the errors. Was this just 
luck?

does roger _really_ think that this was "the first time" for parallel 
proofing?

seriously?   does he have no knowledge that this methodology has a _very_
long history, going back to the earliest days of _keypunching_ -- where it
was called "double-punching" -- and likely was known even before then?

and does he really not know that parallel proofing has proven itself already?
does he really not know that i've pointed out that it could be usefully 
applied
at distributed proofreaders, and making that observation for _years_ now?

evidently not, because he thinks it's "time to explore this 
scientifically"...

and gee, i _know_ that he read at least _some_ of this d.p. forum thread:
>   http://www.pgdp.net/phpBB2/viewtopic.php?t=33945
where i discussed "revolutionary o.c.r. proofing", a methodology based
on _comparing_versions_, because he commented on the very first page.

again, roger is doing some good work:
>    http://fadedpage.com//ppgen-doc.htm

but his failure to do research -- failure to even go "next door" here to the
p.g. listserve so he'd know what's being said here-- is very disappointing...

(and those d.p. people who _are_ present, and are not telling roger that
he should be here, are doing him a disservice, and are not his friends...)

and, oh yeah, as i've said here _several_ times, in discussing the various
parallel p1 experiments i've analyzed, the _real_question_ is not "whether"
parallel proofing "works" -- because we _know_ that it works -- but rather
_whether_ it's _more_cost-effective_ than 2 _serial_ rounds of p1 proofing.

of course, at d.p., the "cost" of a p1 proofing is _literally_ next to 
nothing,
so i guess that _any_ benefit will be "cost-effective" from that 
standpoint...

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080625/5d3e2679/attachment.htm 

From Bowerbird at aol.com  Wed Jun 25 16:18:16 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 25 Jun 2008 19:18:16 EDT
Subject: [gutvol-d] how little we all know
Message-ID: <c1a.3fd988d7.35942c38@aol.com>

does it make it sound nicer if i include myself, by using "we"?

because if you want to know what _my_ research tells me,
it's that docs.google.com is the future of online documents,
including books.

wysiwyg has proven that people will chose it if they can.
25 years of desktop experience shows that, very clearly.

even though it's hardly the best workflow methodology,
wysiwyg is strongly preferred over dealing inside formats.

this includes .html, but also includes .zml -- my format --
and "least markup" -- rfrank's format -- and .xml, .epub,
.wikimarkup, .docbook, .rtf, .txt, .pdf, and whatever else...

behind the scenes, google can convert to whatever format,
but the face it presents to the user is a wysiwyg interface...

even though we might prefer a more semantic approach,
wysiwyg is going to be the interface that we're stuck with.

the project gutenberg library could be put in google-docs, 
in its entirety, formatted and styled quite nicely, with the
ability for any authorized person to make corrections to it.

any of the possible conversions that an end-user needed
could be requested and received, by that user, on the fly...

there's very little reason not to do it this way...

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080625/643f3d9d/attachment.htm 

From Bowerbird at aol.com  Thu Jun 26 10:51:22 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 26 Jun 2008 13:51:22 EDT
Subject: [gutvol-d] roger's girls -- 002
Message-ID: <ca6.2be6b992.3595311a@aol.com>

i took a look at 3 digitizations done by roger frank (rfrank):
>    world of girls
>    girls of the forest
>    motor girls at crystal bay

i thought these projects would be a little more interesting 
than they turned out to be.

although they've just been posted, they all entered d.p.
prior to december of last year, when rfrank began his
foray into preprocessing.   so all of these books were
inadequately prepared, a la the bad old days over at d.p
(which, for most content providers, are still going on)...

the absence of good preprocessing means that these books
contained hundreds and hundreds of errors for p1 to correct,
with as many as 500 of them being spacey-quotes, which --
as i have mentioned previously -- can be fixed _automatically_
by the computer, and thus shouldn't be subjected to humans...

anyway, what i found with these books is the usual pattern
-- you remember, the pattern that we've come to expect -- 
the pattern that seems to capture a "common-sense" take,
which is that p1 fixes most of the errors, p2 gets most of
the remaining ones, and p3 comes in and does clean-up.

again, this is the pattern you get on page after page,
in book after book, day after day, over in d.p.-land...

here are the number of lines changed, by round:

title --------- lines/     _p1   /     _p2   /     _p3   /     _f1   /     
_f2
world girls --- 10,000/     700   /     150   /     079   /     193   /     
010
girls forest -- 10,000/     350   /     075   /     016   /     036   /     
020
motor girls --- 07,000/     300   /     100   /     009   /     204   /     
---

due to the inadequate preprocessing, p1 made hundreds of fixes,
then p2 made about 20%-33% as many corrections as p1 had made.

the amazing thing is the small number of changes made by p3,
from about half as many as p2 down to _one_tenth_ as many...

likewise, while f1 made as many as 200 changes, f2 made 10-20.

the reason that the very small changes made by p3 and f2 are
_significant_ is that p3 and f2 are the current _big_bottlenecks_
in the workflow at distributed proofreaders.

there are very few volunteers working in p3 and f2, relative to
the other "lower-ranking" stations, so it's simply unreasonable
(some might say "stupid") to expect that p3 and f2 can handle
all of the material that's being generated by the earlier rounds.

ergo, bottleneck.

so "girls in the forest", was held up waiting for p3 to change _16_
lines, and then held up again waiting for f2 to change _20_ lines.
and this on a book that had 10,000 lines in it!

thus, many books are waiting a _long_ time for these "high-level"
rounds, where _virtually_nothing_ happens to them.   it's amazing!

the other experiments have indicated that repeated p1 rounds
could have found all of the errors that were located during p3.

when one considers all of the various resources being wasted
in _testing_ volunteers for p3 and f2, and the ill will generated
when people "fail" to pass these tests, as well as all the energy
being squandered in "round-skipping" and the like, it's simply
astounding that this tiered system hasn't already been scrapped.
but then again, the "powers that be" at d.p. do _not_ like to admit
they were wrong.   so look for them to patch over these problems.

anyway, if anyone wants to see my output for this research, say so,
and i'll post it.   otherwise, there isn't much to see here, so move on.

i will remark on one coincidence, however.   "world of girls" was the
13,000th e-text done by distributed proofreaders.   on the one hand,
hearty congratulations to the dedicated volunteers making it happen!
a big round number like that is indeed a good reason for celebration.

on the other hand, it's extremely disappointing that, 13,000 e-texts in,
d.p. still hasn't learned the value of doing a good job of preprocessing.
the "leadership" that allows this travesty to hobble their volunteers is
guilty of a serious failure to wisely utilize the labor being contributed
in good faith.   and they've been told this, repeatedly, and they ignore it,
which further compounds their culpability.   it's time to fix this problem.

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080626/191b1742/attachment.htm 

From Bowerbird at aol.com  Fri Jun 27 13:48:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 27 Jun 2008 16:48:08 EDT
Subject: [gutvol-d] the mountain
Message-ID: <d04.36957f17.3596ac08@aol.com>

over on the d.p. forums, rfrank (roger frank) announced this morning
that he is running yet another parallel-proofing experiment.   oh boy...

unfortunately, he's put the cart in front of the horse, because he _still_
isn't doing the type of decent preprocessing that d.p. should be doing.

which means he's still expecting the human volunteers to find and fix
errors that the _computer_ is much better at finding and helping us fix.

so what i've done is to show people what a well-preprocessed version
of this file would look like.   you can find it up on my website already:
>    http://z-m-l.com/go/mount/mount.zml

so, no, it doesn't take long to do the preprocessing right.   not long at 
all.

and of course, once the preprocessing has been done, then the book
is ready for "continuous proofreading", so i put those files up as well.
here are a bunch of various pages in the book:
>    http://z-m-l.com/go/mount/mountp001.html
>    http://z-m-l.com/go/mount/mountp123.html
>    http://z-m-l.com/go/mount/mountp234.html
>    http://z-m-l.com/go/mount/mountp345.html

luckily, rfrank arranged this so the pagenumber/filenames matched, so
i didn't have to do any renaming of these files...   it sure makes it more
_convenient_ when the content-provider does that from the very start...

***

over here:
>    http://www.pgdp.net/phpBB2/viewtopic.php?p=467791#467791
roger says this:
>    I know the concept of parallel or redundant processing has 
>    been around   and it's been applied effectively to many things. 
>    I'm trying to learn how it applies to proofing. 
>    "What kinds of errors are typically missed by both proofers?"

save your time, roger.   i've looked at lots and lots and lots of books,
and there's no rhyme or reason for why proofers miss what they do.

sometimes they catch errors that you'd think would be very elusive.
and at other times they can miss what is right in front of their nose.

the only thing you can count on is that, once something that you've
missed is pointed out to you, you'll wonder how you _ever_ missed it,
because it will stick out like a sore thumb.

the smartest course of action is to catch as much as you can with the
computer, in advance, and then just hope the humans catch the rest.

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080627/26418bfe/attachment.htm 

From Bowerbird at aol.com  Sat Jun 28 18:39:13 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 28 Jun 2008 21:39:13 EDT
Subject: [gutvol-d] the mountain
Message-ID: <c0c.3adcca02.359841c1@aol.com>

i decided to move "blood mountain" along a bit,
so it should be pretty much finished right now...

all the people who claim that i don't know how to
do this shit are invited to find the flaws in my work.

>    http://z-m-l.com/go/mount/mount.zml
>    http://z-m-l.com/go/mount/mountp001.html

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080628/37e3276e/attachment.htm 

From Bowerbird at aol.com  Mon Jun 30 22:35:00 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 1 Jul 2008 01:35:00 EDT
Subject: [gutvol-d] continued confusion over at distributed proofreaders
Message-ID: <d0c.36cd3e71.359b1c04@aol.com>

the question of whether o.c.r. can be "too perfect" has come up at d.p.
>    http://www.pgdp.net/phpBB2/viewtopic.php?p=468492#468492

someone actually is considering _leaving_in_ the runheads on a book,
thus forcing the proofers to remove them _manually_ instead.   sheesh!
because otherwise the pages would be "too perfect", and the proofers
would "become bored and perhaps miss something".   this is hogwash...

***

rfrank (roger frank) said:
>    If page after page goes by, does a proofer's attention fade? 
>    I believe it does. 

everyone is entitled to an "opinion", of course.   but why not test this?

_my_ opinion is that the proofer's attention will not _necessarily_ fade.

we've seen plenty of cases where a proofer who is doing page after page
of clean text finds an isolated error, one that the earlier proofers -- who
_made_ those pages so clean -- had missed...   and the fact that those
earlier proofers _missed_ an error shows that, even with their attention
sufficiently "engaged", it was still possible for them to miss an error...

i know that, from my own personal perspective, i have spotted typos in
books published by bib publishers, when i wasn't even looking for 'em,
because i _expected_ that the book was error-free.   typos just stick out.

but we _know_, from the _evidence_ of _several_earlier_experiments_,
that p1 proofers -- in their second pass, thus on much cleaner text --
perform _just_as_well_ (and sometimes even better) than p3 proofers...
this has been demonstrated, consistently, on many different occasions.
so it's a pity that rfrank wasn't paying attention to those experiments...


>    Does a proofer get satisfaction in finding and making a correction? 
>    Again, I believe that is true.

well, sure that causes satisfaction.

but do they still have that same sense of satisfaction when the computer
could have found that very same error, _immediately_, 100% of the time?

i would think that being the _engineer_ of that computer-aided process
would be _far_ more satisfying, and take _much_ less human resources.

and _i_ believe that proofers who _miss_ those errors, after they've spent
literally hours and hours proofreading a book, only to have the computer
find them _instantly_, are bound to be quite disappointed in themselves...


>    It's not proofing, but its relevant:
>    I've put several books in smoothreading 
>    and have gotten the comments 
>    "I wish you had given me something to find."

i don't think that that _is_ relevant.   (and "its" should be "it's".)

if people are smoothreading a book _only_ to find errors, then
they're wasting their time.   they should only smoothread books
if/when they are actually _interested_ in reading the content...


>    I regularly pre-process beyond what guiprep does. 
>    However, some things that I could pre-process correctly 
>    prehaps 98% of the time, I'll leave in the text because of 
>    the cost of finding and fixing a mis-correction. This has 
>    been discussed somewhere in the wayback, for example,
>    in adjusting spacing around quote marks. 

ok, this is just wack.   (and "prehaps" should be "perhaps".)

and not just because i laid out the _correct_ argument,
and juliet laid out the _incorrect_ one, and rfrank chose
the _incorrect_ argument.

but because he then _generalized_ the incorrect argument.


>    in the newcomer's only books I've started, I've observed that 
>    the number of corrections made by P2 dropped significantly 
>    once I started pre-processing.

that's because, once you start giving the p1 people cleaner text,
they are _far_ more likely to find _all_ of the errors on a page...

when you give them _dirty_ text, they will find _most_ of the
errors, but they will leave a good number of errors as well...

and this -- all by itself -- largely refutes the question at the top...

we want to move the text as close to perfection as soon as possible.
ideally, we would make it perfect in preprocessing, and then have the
first p1 pass be the first no-change confirmation that it _was_ perfect,
and the second p1 pass be the second no-change verification of that,
in which case (in my opinion), we would be able to certify it as perfect.


>    For a while I was marking suspects with small x marks 
>    before I realized that the mark wasn't showing up in guiguts, 
>    depending on the font selected. What good was that? 

it could be a _lot_ of good, so guiguts should be improved,
to take full advantage of this.   why let a tool hold you back?


>    Then I switched to asterisks, but some new proofers 
>    thought they had to leave every asterisk in place 
>    (ala a proofer's note) even though the instructions were that 
>    it was a warning in an area to be scrutinized closely 
>    and then removed.

so use a tilda~ or some other character that won't confuse them.


>    I don't have statistical data, but I know preprocessing makes for 
>    fewer P2 diffs on Newcomers Only P1s work. 

that's because p1 got it right, so there was nothing for p2 to change.


>    But that does not mean that 
>    the book is any better at the end of P2 
>    than if there were more errors to start with 
>    and both the P1 and the P2 had more to do.

but it _does_ mean p2 could've been bypassed, for many pages.
or that the second pass could have been done by (plentiful) p1,
instead of the (far less plentiful) p2, thus conserving resources.

and it _certainly_ means that those pages -- on which p2 made
no changes -- are _much_more_likely_ to be able to skip a p3...

-bowerbird


**************
Gas prices getting you down? Search AOL Autos for 
fuel-efficient used cars.
      (http://autos.aol.com/used?ncid=aolaut00050000000007)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080701/6cc7cea1/attachment.htm