From radicks at bellsouth.net  Fri May  5 07:11:39 2006
From: radicks at bellsouth.net (Dick Adicks)
Date: Fri May  5 07:11:43 2006
Subject: [gutvol-d] PG texts in South Africa
In-Reply-To: <Pine.LNX.4.60.0604130924560.7046@pglaf.org>
Message-ID: <C080D55B.5394%radicks@bellsouth.net>

The Open Book venture is indeed one to be applauded, Andrew.  I became a PG
volunteer after teaching in Zimbabwe in 1997-98 and perceiving the need for
inexpensive books throughout Africa (less so in South Africa).  Regrettably,
I have been unsuccessful in enlisting Zimbabwean faculty in a project to
print selected PG texts.  But the DVD adaptation will be a boon for teachers
and students whose access to the Internet is undependable.

Obviously Michael has found the OpenLab International e-mail address:  the
website quotes his enthusiastic welcome of Open Book.  In order to cooperate
with them, what about asking OpenLab International to transmit to PG a list
of titles needed by their academic clientele?  Such a list could be
circulated so that PG volunteers can supply the books needed.

A Nigerian friend told me that many a student in his country has to walk for
miles to borrow a book, take it home, copy it overnight by hand, and return
it the next day.  PG volunteers can help young people whose learning depends
on overcoming that kind of hardship.

Dick Adicks

. . . if vicious people are united and form a power, honest people must do
the same.
                                                   --Leo Tolstoy


> From: Michael Hart <hart@pglaf.org>
> Reply-To: "Michael S. Hart" <hart@pobox.com>, Project Gutenberg Volunteer
> Discussion <gutvol-d@lists.pglaf.org>
> Date: Thu, 13 Apr 2006 09:25:19 -0700 (PDT)
> To: Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
> Subject: Re: [gutvol-d] PG texts in South Africa
> 
> 
> I can't seem to find an email address that works for them.
> 
> Any help?
> 
> Thanks!
> 
> Michael
> 
> 
> 
> On Wed, 12 Apr 2006, Andrew Sly wrote:
> 
>> Here's another example of PG texts being used.
>> 
>> http://www.tectonic.co.za/view.php?id=961
>> 
>> 
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From jon at noring.name  Fri May  5 08:29:38 2006
From: jon at noring.name (Jon Noring)
Date: Fri May  5 08:36:04 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
Message-ID: <1052830729.20060505092938@noring.name>

[It's sort of odd that Michael posted this "news release" to Book
People, but not here to gutvol-d. So I'm reposting it here (at least I
don't recall it being posted here -- I checked the gutvol-d archive
for April and May.) David Rothman, in his TeleRead blog, posted an
article covering this news release: http://www.teleread.org/blog/?p=4798
Is this PG II resurrected? Anyway, great work, Michael, at right
justifying the text!]


2 Months To 1/3 Million eBooks


Press Release:  May 4, 2006

See:
http://www.gutenberg.org [Use any favorite browsers]
http://www.worldebookfair.com[Best With MS Explorer]

Contact:
Michael S. Hart <hart@[redacted]> Phone: 217-344-6623


*

1/3 Million eBooks Free from July 4 Through August 4

1/3 of a million books, or 10 times the number found
in the average public library, will be available for
free downloading via the Internet and World Wide Web
beginning July 4, as Project Gutenberg and the World
eBook Library act on their dreams of increased world
literacy and education.

Such a collection, if printed out in standard format
would be large enough to outweigh elephant herds and
to cover the sidelines at all 40 Super Bowl games.

Each year for one month, The World eBook Library and
Project Gutenberg will team up to be major sponsors,
planning to make ONE MILLION eBooks available in the
World eBook Fair of 2009.

"It has been our goal since the dawn of the Internet
to break down the bars of ignorance and illiteracy,"
says Michael Hart, who founded the Project Gutenberg
effort by placing the first permanent text online on
July 4, 1971, 35 years ago.  "Our projects are based
on the premise that everyone in the world could have
access to a free worldwide public library," for John
Guagliardo, founder of The World eBook Library, this
event marks the fruition of years of hard labor.

Over 100 languages will be represented for worldwide
readers and that total is expected to increase, even
as the preparations are underway.  The books are the
permanent property of whomever downloads them, but a
warning is including to check your local copyrights,
as the books are provided under U.S. copyright laws,
and other nations have different copyrights.  eBooks
still under U.S. copyright have been donated via the
permission of the copyright holders.

Project Gutenberg, perhaps the oldest Internet site,
and The World eBook Library, perhaps the largest one
of the growing number of eBook libraries, joined for
the purpose of "bringing the most eBooks to the most
people in the world."

"This is the fulfillment of a lifetime of dreams," a
sentiment shared by Greg Newby, as Project Gutenberg
CEO, he hopes his Library & Information Science PhD.
will become the last of the olde library worlde, and
perhaps the first of a new world of library science.

"We can only hope that Google, Yahoo, and the others
can also achieve their goals in the next few years--
as we hope they will each reach for a million eBooks
before the decade ends.  Our own goals, with them or
without them, are to bring the world the 1/3 million
eBooks this year, 1/2 million next year, 3/4 million
in 2008, and to reach a grand total of a one million
volume World eBook Fair on July 4, 2009."


***


Additional facts and figures:


Proposed World eBook Fair totals for upcoming years:

July 4, 2006, 1/3 Million
July 4, 2007, 1/2 Million
July 4, 2008, 3/4 Million
July 4, 2009, ONE Million

*

By 2009, the terabyte [one thousand gigabytes] boxes
we have seen enter the consumer marketplace in 2006,
now priced as low as $500, will be commonplace on an
average computer on the shelf and will easily hold a
million volumes of a million characters each for the
price of just one semester's books at a university.

*

Project Gutenberg and The World eBook Library are an
example of the 501 (c) (3) corporations that operate
on a non-profit basis to improve the world.

*

Project Gutenberg eBooks are all free of charge from
http://www.gutenberg.org or http://gutenberg.net.au,
the home of Project Gutenberg of Australia, and also
http://pge.rastko.net, Project Gutenberg of Europe.

In addition The World eBook Library sponsors Project
Gutenberg's Consortia Center where entire collection
providers around the world make many eBook libraries
available in their entirety. http://www.gutenberg.cc

The World eBook Library, a member supported service,
offers unrestricted access to its collection of over
250,000 eBooks, documents, and articles.  Individual
membership is only $8.95 a year, discounted to $1 on
a per student or FTE [full time equivalency] for the
various educational groups or institutions.  An even
greater discount is available for public libraries.

For those who cannot afford a membership, or who may
be experiencing hardships, these World eBook Library
services are provided via complimentary subscription
services via Natural Disaster Relief Programs and/or
Economic Hardship Relief Subscriptions.  Not one who
has ever qualified for complimentary subscription of
either of these programs has ever been denied.

The World eBook Fair effort at worldebookfair.com or
via www.gutenberg.org is a cooperative effort by the
World eBook Library Consortia and Project Gutenberg,
representing the largest and oldest eBook libraries.

1/3 of a million eBooks will be made available for a
period from July 4 to August 4 this year in honor of
the date of the first steps taken to make eLibraries
on July 4, 1971, on what was to become the Internet,
via what as to become Project Gutenberg.

This celebration is to promote book awareness and to
assist in efforts to increase literacy and education
all over the world.

We hope you enjoy some of your favorite books in the
editions presented here and that you will pass those
on to others we hope will enjoy them as much as you.

Please feel free to send in requests for books to be
included in future World eBook Fairs.

*

We hope you will see fit to give this press release,
and future similar releases, your consideration, and
that you will see fit to pass them on to others.

If you have any favorite media people, encouragement
to them to use this would be wonderful!


Thank you so much!

Michael S. Hart
John Guagliardo
Gregory Newby


Visit Project Gutenberg sites at:

http://gutenberg.org  ~50 languages  The original PG site
http://gutenberg.net.au   Project Gutenberg of Australia
http://pgdp.net   Original Distributed Proofreaders Site
http://gutenberg.cc  ~100 languages  PG Consortia Center
http://pge.rastko.net ~65 languages  PG of Europe
http://dp.rastko.net     Distributed Proofreaders Europe


Visit The World eBook Library at:

http://www.netlibrary.net
http://public-library.net


*

David vs. Googliath

From Bowerbird at aol.com  Fri May  5 11:43:51 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri May  5 11:44:02 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
Message-ID: <1ed.4f34cd80.318cf6e7@aol.com>

michael said:
>    2 Months To 1/3 Million eBooks

last i remember reading
-- six months back? --
the million book project
said they had 600,000
books already scanned.

but they conceded that
not all were online yet...

ok, here's the reference:
>    http://www.library.cmu.edu/Libraries/MBP_FAQ.html#current

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060505/b148faed/attachment.html
From ke at gnu.franken.de  Fri May  5 10:41:47 2006
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri May  5 11:52:00 2006
Subject: [gutvol-d] Re: PG texts in South Africa
In-Reply-To: <C080D55B.5394%radicks@bellsouth.net> (Dick Adicks's message of
	"Fri, 05 May 2006 10:11:39 -0400")
References: <C080D55B.5394%radicks@bellsouth.net>
Message-ID: <sh7j50v06c.fsf@tux.gnu.franken.de>

Dick Adicks <radicks@bellsouth.net> writes:

> PG volunteers can help young people whose learning depends
> on overcoming that kind of hardship.

Unfortunately, we cannot.  Our old books are nice, but they surely need
mostly books from last decade.

-- 
http://www.gnu.franken.de/ke/                           |      ,__o
                                                        |    _-\_<,
                                                        |   (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4  AA7F C90A 35C3 E9D0 5D1C
From hart at pglaf.org  Mon May  8 08:43:17 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon May  8 08:43:19 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
In-Reply-To: <1ed.4f34cd80.318cf6e7@aol.com>
References: <1ed.4f34cd80.318cf6e7@aol.com>
Message-ID: <Pine.LNX.4.60.0605080840220.20226@pglaf.org>


My recollection of the latest word from Brewster is that they were
just about to pass 10,000 full text eBooks, though not all had been
proofread and edited to a 99.95% level of accuracy.

So I am presuming they did actually pass 10,000 in the last month
or so, though I haven't seen any official announcements.

It would appear that one of the hardest things to find from Yahoo
or Google eLibraries is the number of well finished eBooks.

mh

On Fri, 5 May 2006 Bowerbird@aol.com wrote:

> michael said:
>>    2 Months To 1/3 Million eBooks
>
> last i remember reading
> -- six months back? --
> the million book project
> said they had 600,000
> books already scanned.
>
> but they conceded that
> not all were online yet...
>
> ok, here's the reference:
>>    http://www.library.cmu.edu/Libraries/MBP_FAQ.html#current
>
> -bowerbird
>
From jon at noring.name  Mon May  8 13:40:20 2006
From: jon at noring.name (Jon Noring)
Date: Mon May  8 13:40:30 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
In-Reply-To: <Pine.LNX.4.60.0605080840220.20226@pglaf.org>
References: <1ed.4f34cd80.318cf6e7@aol.com>
	<Pine.LNX.4.60.0605080840220.20226@pglaf.org>
Message-ID: <166276077.20060508144020@noring.name>

Michael Hart wrote:

> My recollection of the latest word from Brewster is that they were
> just about to pass 10,000 full text eBooks, though not all had been
> proofread and edited to a 99.95% level of accuracy.
>
> So I am presuming they did actually pass 10,000 in the last month
> or so, though I haven't seen any official announcements.

Is Brewster's effort (I assume you mean OCA?) doing actual proofing
(which I always interpret to mean human proofing)? I thought all they
were doing is scanning books and producing raw, unproofed text by OCR.

Jon

From Morasch at aol.com  Mon May  8 13:47:09 2006
From: Morasch at aol.com (Morasch@aol.com)
Date: Mon May  8 13:47:14 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
Message-ID: <433.49089d.3191084d@aol.com>

michael said:
>    My recollection of the latest word from Brewster is that 
>    they were just about to pass 10,000 full text eBooks, 
>    though not all had been proofread and edited 
>    to a 99.95% level of accuracy.

so it's 600,000 scanned versus 10,000 proofed...

meaning that any people who want one of those
590,000 that have been scanned but not proofed
will need to do the o.c.r. and proofing themselves,
providing they can locate the scan-set online...

sounds fair to me.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060508/5eb8a60d/attachment.html
From gbnewby at pglaf.org  Mon May  8 14:36:29 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon May  8 14:36:32 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
In-Reply-To: <166276077.20060508144020@noring.name>
References: <1ed.4f34cd80.318cf6e7@aol.com>
	<Pine.LNX.4.60.0605080840220.20226@pglaf.org>
	<166276077.20060508144020@noring.name>
Message-ID: <20060508213629.GA31020@pglaf.org>

On Mon, May 08, 2006 at 02:40:20PM -0600, Jon Noring wrote:
> Michael Hart wrote:
> 
> > My recollection of the latest word from Brewster is that they were
> > just about to pass 10,000 full text eBooks, though not all had been
> > proofread and edited to a 99.95% level of accuracy.
> >
> > So I am presuming they did actually pass 10,000 in the last month
> > or so, though I haven't seen any official announcements.
> 
> Is Brewster's effort (I assume you mean OCA?) doing actual proofing
> (which I always interpret to mean human proofing)? I thought all they
> were doing is scanning books and producing raw, unproofed text by OCR.
> 
> Jon

You could probably call them and ask for details.  When I
was there in January, they were talking about doing some
automated and semi-automated quality control (like looking
for missing pages, and aligning pages that didn't scan straight).

I don't think they're doing any human proofreading or
markup at all -- instead, they are looking to Distributed Proofreaders
to take that step (or anyone else interested).  
  -- Greg
From Bowerbird at aol.com  Mon May  8 16:14:52 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May  8 16:14:59 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
Message-ID: <2ad.4346291.31912aec@aol.com>

>    raw, unproofed text by OCR.

i love the way this sounds -- "raw unproofed text by ocr".

the implication is that the results are rather disastrous...

and sometimes, granted, they can be.

the chain is only as strong as its weakest link.

however, if the paper-book is in fairly good shape, and
its text is rather straightforward, and a best-of-breed
scanner is used, and the scans are done carefully, and
then properly treated (e.g., deskewed and regularized),
and o.c.r. is done with a best-of-breed program, then
auto-clean-up tools combined with normal spell-check
will produce quite accurate text, thank you very much...

and given advances in o.c.r. technology, and other tricks,
a need for "human" proofreading of every word on a page
could be eliminated for all but the most difficult of books.

and my bet is that that day will come long before the one
when we have machine-generated language-translations.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060508/1d5b2c11/attachment.html
From bruce at zuhause.org  Mon May  8 18:04:22 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Mon May  8 18:04:30 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
In-Reply-To: <Pine.LNX.4.60.0605080840220.20226@pglaf.org>
References: <1ed.4f34cd80.318cf6e7@aol.com>
	<Pine.LNX.4.60.0605080840220.20226@pglaf.org>
Message-ID: <17503.60054.882510.205276@celery.zuhause.org>

Michael Hart writes:
 > It would appear that one of the hardest things to find from Yahoo
 > or Google eLibraries is the number of well finished eBooks.

My searches at Google have found about 50K books, with another 42k+
books that ought to be available but are not because Google appears to
be not making books available if there's a publication date and no
copyright date.  I don't know how to find books from the OCA, or
Yahoo.
From greg at durendal.org  Tue May  9 04:44:48 2006
From: greg at durendal.org (Greg Weeks)
Date: Tue May  9 05:00:04 2006
Subject: [gutvol-d] index entry for 18346
Message-ID: <Pine.LNX.4.63.0605090742150.4556@durendal.durendal.org>


It looks like etext 18346 didn't index properly. There is an entry for it 
here:

http://www.gutenberg.org/browse/authors/m#a7954

but no entry here:

http://www.gutenberg.org/browse/authors/p#a7662

and this:

http://www.gutenberg.org/etext/18346

results in a No etext no. 18346 error.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From marcello at perathoner.de  Tue May  9 05:30:59 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue May  9 05:31:02 2006
Subject: [gutvol-d] index entry for 18346
In-Reply-To: <Pine.LNX.4.63.0605090742150.4556@durendal.durendal.org>
References: <Pine.LNX.4.63.0605090742150.4556@durendal.durendal.org>
Message-ID: <44608B83.6030208@perathoner.de>

Greg Weeks wrote:
> 
> It looks like etext 18346 didn't index properly. There is an entry for
> it here:
> 
> http://www.gutenberg.org/browse/authors/m#a7954
> 
> but no entry here:
> 
> http://www.gutenberg.org/browse/authors/p#a7662
> 
> and this:
> 
> http://www.gutenberg.org/etext/18346
> 
> results in a No etext no. 18346 error.
> 

Works for me.

Most probably you (or somebody along the way) have cached the error page
and some of the authors pages. Hit the refresh button on your browser.

The automagic cataloger starts running at 02:00 EST and may run for a
couple of hours depending on fileserver load.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From greg at durendal.org  Tue May  9 06:46:44 2006
From: greg at durendal.org (Greg Weeks)
Date: Tue May  9 07:30:05 2006
Subject: [gutvol-d] index entry for 18346
In-Reply-To: <44608B83.6030208@perathoner.de>
References: <Pine.LNX.4.63.0605090742150.4556@durendal.durendal.org>
	<44608B83.6030208@perathoner.de>
Message-ID: <Pine.LNX.4.63.0605090945500.4977@durendal.durendal.org>

On Tue, 9 May 2006, Marcello Perathoner wrote:

> Works for me.
>
> Most probably you (or somebody along the way) have cached the error page
> and some of the authors pages. Hit the refresh button on your browser.

It worked this time. I've had this problem before, where things were 
incorrectly cached.

Thanks.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From hart at pglaf.org  Tue May  9 07:33:55 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May  9 07:33:57 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
In-Reply-To: <17503.60054.882510.205276@celery.zuhause.org>
References: <1ed.4f34cd80.318cf6e7@aol.com>
	<Pine.LNX.4.60.0605080840220.20226@pglaf.org>
	<17503.60054.882510.205276@celery.zuhause.org>
Message-ID: <Pine.LNX.4.60.0605090732180.17440@pglaf.org>


On Mon, 8 May 2006, Bruce Albrecht wrote:

> Michael Hart writes:
> > It would appear that one of the hardest things to find from Yahoo
> > or Google eLibraries is the number of well finished eBooks.
>
> My searches at Google have found about 50K books, with another 42k+
> books that ought to be available but are not because Google appears to
> be not making books available if there's a publication date and no
> copyright date.  I don't know how to find books from the OCA, or
> Yahoo.

Is this the kind of search we were discussing before, searching for
commonplace words in the Google Book Search area, or have you found
a better way?  Perhaps you would be willing to post a list, or send
it to me privately?


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg


From hart at pglaf.org  Tue May  9 07:35:33 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May  9 07:35:35 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
In-Reply-To: <17503.60054.882510.205276@celery.zuhause.org>
References: <1ed.4f34cd80.318cf6e7@aol.com>
	<Pine.LNX.4.60.0605080840220.20226@pglaf.org>
	<17503.60054.882510.205276@celery.zuhause.org>
Message-ID: <Pine.LNX.4.60.0605090734230.17440@pglaf.org>


One more question, did you figure out any estimate of how many of
those 50,000 books your search found could actually be downloaded?

More thanks!

Michael

From bruce at zuhause.org  Tue May  9 19:54:38 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Tue May  9 19:54:42 2006
Subject: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
In-Reply-To: <Pine.LNX.4.60.0605090732180.17440@pglaf.org>
References: <1ed.4f34cd80.318cf6e7@aol.com>
	<Pine.LNX.4.60.0605080840220.20226@pglaf.org>
	<17503.60054.882510.205276@celery.zuhause.org>
	<Pine.LNX.4.60.0605090732180.17440@pglaf.org>
Message-ID: <17505.21998.788219.627950@celery.zuhause.org>

Michael Hart writes:
 > On Mon, 8 May 2006, Bruce Albrecht wrote:
 > > My searches at Google have found about 50K books, with another 42k+
 > > books that ought to be available but are not because Google appears to
 > > be not making books available if there's a publication date and no
 > > copyright date.  I don't know how to find books from the OCA, or
 > > Yahoo.
 > 
 > Is this the kind of search we were discussing before, searching for
 > commonplace words in the Google Book Search area, or have you found
 > a better way?  Perhaps you would be willing to post a list, or send
 > it to me privately?

They were found by doing keyword searches. Google Books now makes it
easier to determine whether the book can be viewed in full.  From the
Google Book search page, they now indicate whether the book is either
full view, snippet view, or no view.

I'm not making a full list available anymore, at least not as a single
download, because it took several minutes to download it from my site,
and I was getting too many download requests from people who were
downloading it because it was showing up at web search engines.  I am
currently working on loading MARC entries from several libraries for
the books I've found, and will be supporting standard typical MARC tag
searches (subject, author, publisher, language, etc.).  My long term
goal is to do the same for as many of the public domain image archives
as I can. 

I still need to clean up the MARC entries, and the searches have not
been implemented, but the website and displays of the Google Book
entries and the MARC entries are at http://pdbooks.zuhause.org/


From hart at pglaf.org  Wed May 10 07:36:22 2006
From: hart at pglaf.org (Michael Hart)
Date: Wed May 10 07:36:23 2006
Subject: !@!re: [gutvol-d] Press Release: 2 Months To 1/3 Million eBooks
In-Reply-To: <17505.21998.788219.627950@celery.zuhause.org>
References: <1ed.4f34cd80.318cf6e7@aol.com>
	<Pine.LNX.4.60.0605080840220.20226@pglaf.org>
	<17503.60054.882510.205276@celery.zuhause.org>
	<Pine.LNX.4.60.0605090732180.17440@pglaf.org>
	<17505.21998.788219.627950@celery.zuhause.org>
Message-ID: <Pine.LNX.4.60.0605100733330.8511@pglaf.org>


On Tue, 9 May 2006, Bruce Albrecht wrote:

> Michael Hart writes:
> > On Mon, 8 May 2006, Bruce Albrecht wrote:
> > > My searches at Google have found about 50K books, with another 42k+
> > > books that ought to be available but are not because Google appears to
> > > be not making books available if there's a publication date and no
> > > copyright date.  I don't know how to find books from the OCA, or
> > > Yahoo.
> >
> > Is this the kind of search we were discussing before, searching for
> > commonplace words in the Google Book Search area, or have you found
> > a better way?  Perhaps you would be willing to post a list, or send
> > it to me privately?
>
> They were found by doing keyword searches. Google Books now makes it
> easier to determine whether the book can be viewed in full.  From the
> Google Book search page, they now indicate whether the book is either
> full view, snippet view, or no view.
>
> I'm not making a full list available anymore, at least not as a single
> download, because it took several minutes to download it from my site,
> and I was getting too many download requests from people who were
> downloading it because it was showing up at web search engines.

Would you be willing to let pglaf.org handle those download problems?


> I am currently working on loading MARC entries from several libraries for the 
> books I've found, and will be supporting standard typical MARC tag searches 
> (subject, author, publisher, language, etc.).  My long term goal is to do the 
> same for as many of the public domain image archives as I can.

Wonderful!!!


> I still need to clean up the MARC entries, and the searches have not
> been implemented, but the website and displays of the Google Book
> entries and the MARC entries are at http://pdbooks.zuhause.org/

MARC listings for eBooks are obviously going to be one of the
"next big things" for eLibraries!


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

From hyphen at hyphenologist.co.uk  Wed May 10 23:28:41 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Wed May 10 23:28:53 2006
Subject: [gutvol-d] Please how do I get my etexts displayed like this?
Message-ID: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>

My etexts are individual stories and/or poems which are known individually
and were often published several times in their own right.  see etext no
18175

from weekly newsletter
>>>
|Emerson's Wife and Other Western Stories, by Florence Finch Kelly        18309
|   [Illus.: Stanley L. Wood]
|   [Contents: Emerson's Wife]]
|   [          Colonel Kate's Protge]
|   [          The Kid of Apache Teju]
|   [          A Blaze on Pard Huff]
|   [          How Colonel Kate Won Her Spurs]
|   [          Hollyhocks]
|   [          The Rise, Fall, and Redemption of Johnson Sides]
|   [          A Piece of Wreckage]
|   [          The Story of a Chinee Kid]
|   [          Out of Sympathy]
|   [          An Old Roman of Mariposa]
|   [          Out of the Mouth of Babes]
|   [          Posey]
|   [          A Case of the Inner Imperative]
|   [Link: http://www.gutenberg.org/1/8/3/0/18309 ]
|   [Files: 18309.txt; 18309-8.txt; 18309-h.htm; ]
<<<
-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From sly at victoria.tc.ca  Wed May 10 23:35:31 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed May 10 23:35:37 2006
Subject: [gutvol-d] Please how do I get my etexts displayed like this?
In-Reply-To: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
References: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
Message-ID: <Pine.GSO.4.58.0605102333430.2872@vtn1.victoria.tc.ca>

What is it you want to have "displayed like this"?
The posted notes and gutindex listings, the online catalog record?

I'll take a look at PG#18175
Andrew

On Thu, 11 May 2006, Dave Fawthrop wrote:

> My etexts are individual stories and/or poems which are known individually
> and were often published several times in their own right.  see etext no
> 18175
>
> from weekly newsletter
> >>>
> |Emerson's Wife and Other Western Stories, by Florence Finch Kelly        18309
> |   [Illus.: Stanley L. Wood]
> |   [Contents: Emerson's Wife]]
> |   [          Colonel Kate's Protge]
> |   [          The Kid of Apache Teju]
> |   [          A Blaze on Pard Huff]
> |   [          How Colonel Kate Won Her Spurs]
> |   [          Hollyhocks]
> |   [          The Rise, Fall, and Redemption of Johnson Sides]
> |   [          A Piece of Wreckage]
> |   [          The Story of a Chinee Kid]
> |   [          Out of Sympathy]
> |   [          An Old Roman of Mariposa]
> |   [          Out of the Mouth of Babes]
> |   [          Posey]
> |   [          A Case of the Inner Imperative]
> |   [Link: http://www.gutenberg.org/1/8/3/0/18309 ]
> |   [Files: 18309.txt; 18309-8.txt; 18309-h.htm; ]
> <<<
>
From sly at victoria.tc.ca  Wed May 10 23:45:20 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed May 10 23:45:24 2006
Subject: [gutvol-d] Please how do I get my etexts displayed like this?
In-Reply-To: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
References: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
Message-ID: <Pine.GSO.4.58.0605102341160.15990@vtn1.victoria.tc.ca>


Ok, it's longer than I usually do it with, but I've added a "formatted
contents" field for 18175. What do you think?

Also, remember that unlike traditional library catalogs, our full
texts are also parsed. Search engines will pick up all of the items
listed in a table of contents near the beginning of a text.

Andrew

On Thu, 11 May 2006, Dave Fawthrop wrote:

> My etexts are individual stories and/or poems which are known individually
> and were often published several times in their own right.  see etext no
> 18175
>
> from weekly newsletter
> >>>
> |Emerson's Wife and Other Western Stories, by Florence Finch Kelly        18309
> |   [Illus.: Stanley L. Wood]
> |   [Contents: Emerson's Wife]]
> |   [          Colonel Kate's Protge]
> |   [          The Kid of Apache Teju]
> |   [          A Blaze on Pard Huff]
> |   [          How Colonel Kate Won Her Spurs]
> |   [          Hollyhocks]
> |   [          The Rise, Fall, and Redemption of Johnson Sides]
> |   [          A Piece of Wreckage]
> |   [          The Story of a Chinee Kid]
> |   [          Out of Sympathy]
> |   [          An Old Roman of Mariposa]
> |   [          Out of the Mouth of Babes]
> |   [          Posey]
> |   [          A Case of the Inner Imperative]
> |   [Link: http://www.gutenberg.org/1/8/3/0/18309 ]
> |   [Files: 18309.txt; 18309-8.txt; 18309-h.htm; ]
> <<<
>
From Bowerbird at aol.com  Sun May 14 10:21:59 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun May 14 10:22:10 2006
Subject: [gutvol-d] on the will to scan books and digitize their text
Message-ID: <3f4.26f0823.3198c137@aol.com>

i said:
>   however, if the paper-book is in fairly good shape, and
>    its text is rather straightforward, and a best-of-breed
>   scanner is used, and the scans are done carefully, and
>   then properly treated (e.g., deskewed and regularized),
>   and o.c.r. is done with a best-of-breed program, then
>   auto-clean-up tools combined with normal spell-check
>   will produce quite accurate text, thank you very much...
...and...
>   so it's 600,000 scanned versus 10,000 proofed...
>    meaning that any people who want one of those
>   590,000 that have been scanned but not proofed
>   will need to do the o.c.r. and proofing themselves,
>   providing they can locate the scan-set online...
>   sounds fair to me.

of course, whether or not people will deem it necessary
or even desirable to _do_the_work_ to get digital text is
another question entirely.

branko ran a poll at the teleread site and in the forums
at distributed proofreaders, and the results indicate that
people have little interest in digitizing their home library,
the books they have sitting as paper-copies in their homes.

over _half_ say they'd digitize them only if it could be done
with _less_than_one_hour_per_book_.   over one-quarter say
they'd do it only if it took _less_than_ten_minutes_per_book_.

a not-insignificant number want it happen almost _magically_,
having it take it _less_than_one_minute_per_book.   (telepathy?)

since teleread specializes in creating unrealistic expectations,
it would be tempting to chalk these poll results up to that, but
alas, some respondents are people who actually digitize books.
(but yes, the teleread respondents are even more out of touch.)

it's somewhat shocking to understand that even people from
distributed proofreaders say this, some of whom have likely
spent more than ten minutes proofing _a_couple_pages_, so
they have to know that time-frame is completely unrealistic.

so this isn't just massive ignorance about the time required.

the results tell us that _if_ they have a paper copy of a book,
people seem to feel little need for a digital copy of the text.

i know that i often tend to think from a mindset that posits
that digital text has many advantages over the printed page,
but people seem not to consider those advantages important.
at least not enough to merit a non-trivial amount of their time.

it seems only natural to extend the results, that if people have
the scan-set of a book, they'd have little need for digital text...

***

meanwhile, an article by kevin kelley in the new york times:
>   http://www.nytimes.com/2006/05/12/us/12vote.html?ex=1305086400
>    &en=5b3554a76aad524a&ei=5090&partner=rssuserland&emc=rss
informs us a company in china has scanned 1.3 million unique titles
in chinese, which it estimates is about half of the books published 
in the chinese language since 1949.

that's right:   _1.3_million_.   already.   and still going strong...

while we americans can't even get to a paperless office,
and publishers sue the daylights out of the one and only
company in this country who is willing to scan our libraries,
china is moving quickly to becoming a paperless country...

so michael, in spite of the flack that people want to give you,
it looks like you've been _undercounting_, by a wide margin...

and maybe just maybe you're holding your "world e-book fair"
on the wrong side of the globe...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060514/1a09d237/attachment.html
From gbnewby at pglaf.org  Sun May 14 15:58:24 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun May 14 15:58:25 2006
Subject: [gutvol-d] WAP books
Message-ID: <20060514225824.GA21841@pglaf.org>

I remember Marcello had done some work to allow
www.gutenberg.org to work on mobile phones via the WAP
method.

We're working with someone to get a bunch more 
reformatted PG eBooks for mobile phones...copying
them to wap.readingroo.ms (not yet running) from
wap.mobilebooks.org 

Question: has anyone had success with their
WAP-enabled phone via wap.mobilebooks.org ?  Feedback
or input would be valuable!

Keep in mind, there are a *lot* of people with a *lot*
of mobile phones.  Making our eBooks "phone-friendly"
will be a great accomplishment.
  -- Greg
From sly at victoria.tc.ca  Sun May 14 17:27:04 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun May 14 17:27:07 2006
Subject: [gutvol-d] Michael Geist lecture
In-Reply-To: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
References: <8sl562145u64grkjtvf197avgsgq8ru1ga@4ax.com>
Message-ID: <Pine.GSO.4.58.0605141725460.24157@vtn1.victoria.tc.ca>


Some here may be interested to read "Our Own Creative Land: Cultural
Monopoly and the Trouble With Copyright," by Michael Geist. It is an
exploration of issues of copyright, politics, and digital content
from a Canadian perspective.

http://www.p2pnet.net/story/8776

Andrew

From bruce at zuhause.org  Sun May 14 20:15:45 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Sun May 14 20:15:49 2006
Subject: [gutvol-d] on the will to scan books and digitize their text
In-Reply-To: <3f4.26f0823.3198c137@aol.com>
References: <3f4.26f0823.3198c137@aol.com>
Message-ID: <17511.62049.910097.161473@celery.zuhause.org>

Bowerbird@aol.com writes:
 > of course, whether or not people will deem it necessary
 > or even desirable to _do_the_work_ to get digital text is
 > another question entirely.

Well, I think this really depends on the answers to questions like "Do
I have a electronic reader I would prefer to read over a paperback?",
and "Am I likely to reread this book often enough that spending an
additional couple hours converting it to something usable on my reader
worth my time?" At least with producing texts for Project Gutenberg,
even if you never read it again, presumably others will.
From brad at chenla.org  Mon May 15 04:43:55 2006
From: brad at chenla.org (Brad Collins)
Date: Mon May 15 05:38:14 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <m37j4no6mc.fsf@chenla.org>


If you haven't seen this, it's well worth the read.

For those of you who don't know, the author, Kevin Kelly used to be
the chief editor of Wired Magazine back in it's glory days during the
great Bubble.

  http://www.nytimes.com/2006/05/14/magazine/14publishing.html?_r=1&oref=slogin&pagewanted=all

b/


-- 
Brad Collins <brad@chenla.org>, Banqwao, Thailand

From Bowerbird at aol.com  Mon May 15 10:38:47 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 15 10:39:01 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <417.1c4ad86.319a16a7@aol.com>

brad said:
>    If you haven't seen this, it's well worth the read.

kevin kelly is always good for a sweeping overview.

in spite of the tone of the article, the only "newish" idea
in it is the notion that "books will read each other" and
become synergistically interlinked, and that idea is one
that is both interesting and perplexing at the same time.

how -- _exactly_ -- is this supposed to happen?

neither links nor tags, in their current form anyway,
indicate an association between two external entities.

even the most basic of building blocks in that regard
-- a clean "a.p.i." into the cyberlibrary -- is absent...

heck, the official policy at project gutenberg is that
people must _not_ "deep-link" into the _content_ of
your books per se, but rather only to a catalog page.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060515/b4f93d2d/attachment.html
From traverso at dm.unipi.it  Mon May 15 12:41:40 2006
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Mon May 15 12:31:53 2006
Subject: [gutvol-d] on the will to scan books and digitize their text
In-Reply-To: <17511.62049.910097.161473@celery.zuhause.org> (message from
	Bruce Albrecht on Sun, 14 May 2006 22:15:45 -0500)
References: <3f4.26f0823.3198c137@aol.com>
	<17511.62049.910097.161473@celery.zuhause.org>
Message-ID: <200605151941.k4FJfen18024@pico.dm.unipi.it>

>>>>> "Bruce" == Bruce Albrecht <bruce@zuhause.org> writes:

    Bruce> Bowerbird@aol.com writes:
    >> of course, whether or not people will deem it necessary or even
    >> desirable to _do_the_work_ to get digital text is another
    >> question entirely.

    Bruce> Well, I think this really depends on the answers to
    Bruce> questions like "Do I have a electronic reader I would
    Bruce> prefer to read over a paperback?", and "Am I likely to
    Bruce> reread this book often enough that spending an additional
    Bruce> couple hours converting it to something usable on my reader
    Bruce> worth my time?" At least with producing texts for Project
    Bruce> Gutenberg, even if you never read it again, presumably
    Bruce> others will.

As the question was posed, the answer also depends on how many books
you owe. Having several thousand books, even at one minute per book it
is a huge work, especially if you are not going to read many of them,
and you are not allowed to share them because of copyright reasons
(and even the copying might be illegal).

>From what I understood, however, the meaning of the pool was to
measure the difference of results between DP and telerad users.

Carlo


From hyphen at hyphenologist.co.uk  Thu May 18 09:04:24 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu May 18 09:11:15 2006
Subject: [gutvol-d] UniPad 1.20 has been released
Message-ID: <vb6p621i8cr9thfik5dgja856p3f28nj0f@4ax.com>

Just got this Unipad is a programmers text editor which works in Unicode
I have 1.0 which works well with internal huge font.

Someone here may be interested

|We are glad to inform you about the release of UniPad 1.20. 
|This upgrade is free of charge for all registered users.
|
|
|Changes, corrections and new features:
|
|Upgrade to Unicode 4.1:
|
|o This version of UniPad incorporates over 1270 new characters that 
|  have been added since Unicode 4.0.
|
|o New complete scripts are: 
|    New Tai Lue, Buginese, Glagolitic, Coptic, Tifinagh, Syloti Nagri,
|    Old Persian, Kharoshthi.
|
|o Some scripts have been extended by adding new character blocks: 
|    Arabic with Arabic Supplement, Georgian with Georgian Supplement,
|    Ethiopic with Ethiopic Supplement and Ethiopic Extented.
|
|o Some supplemental character blocks have been added: 
|    Phonetic Extensions Supplement, Combining Diacritical Marks Supplement,
|    Supplemental Punctuation.
|
|o Other new blocks are: CJK Strokes, Modifier Tone Letters, Vertical Forms,
|  Ancient Greek Numbers, Ancient Greek Musical Notation.
|
|o Over 700 new characters have been added to the following existing character
|  blocks and scripts: 
|    Latin Extended-B, Combining Diacritical Marks, Greek
|    and Coptic, Cyrillic, Hebrew, Arabic, Devanagari, Bengali
|    Tamil, Tibetan, Georgian, Ethiopic, Phonetic Extensions, General Punctuation,
|    Currency Symbols, Combining Diacritical Marks for Symbols, Letterlike 
|    Symbols, Miscellaneous Technical, Miscellaneous Symbols, Miscellaneous 
|    Mathematical Symbols-A, Miscellaneous Symbols and Arrows, Enclosed CJK 
|    Letters and Months, CJK Unified Ideographs, CJK Compatibility Ideographs, 
|    Mathematical Alphanumeric Symbols.
|
|o For more information please visit the Unicode 4.1 page at the official web
|  site of the Unicode Consortium: <http://www.unicode.org/versions/Unicode4.1.0>.
|
|
|New keyboards:
|
|o Dzongkha, Uyghur, Polish (Programer), Kannada, Dari, Pashto, Uzbek (Southern).
|
|
|UniPad 1.20 is trialware. It runs in either unregistered or registered mode. Running UniPad in 
|
|unregistered mode is free for anyone. Running UniPad in unregistred mode is "Session-Limited". After the 
|
|session time you can save your work and restart UniPad for a new session.
|
|
|If you register, you will be able to run UniPad in registered mode and this limitation will be removed.
|
|Download: 
|    http:/www.unipad.org/download
|
|Registration:
|    http:/www.unipad.org/register
-- 
Dave Fawthrop <hyphen@hyphenologist.co.uk> 

From brad at chenla.org  Fri May 19 05:06:11 2006
From: brad at chenla.org (Brad Collins)
Date: Fri May 19 05:03:34 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <417.1c4ad86.319a16a7@aol.com> (Bowerbird@aol.com's message of
	"Mon, 15 May 2006 13:38:47 EDT")
References: <417.1c4ad86.319a16a7@aol.com>
Message-ID: <m3ejyqp6bw.fsf@chenla.org>

Bowerbird@aol.com writes:

> in spite of the tone of the article, the only "newish" idea
> in it is the notion that "books will read each other" and
> become synergistically interlinked, and that idea is one
> that is both interesting and perplexing at the same time.
>
> how -- _exactly_ -- is this supposed to happen?
>
> neither links nor tags, in their current form anyway,
> indicate an association between two external entities.
>
> even the most basic of building blocks in that regard
> -- a clean "a.p.i." into the cyberlibrary -- is absent...


You're quite right.  The article talks of scanning which is the first
stage.  DP/PG takes it to the next stage by turning scans into
electronic texts, but there hasn't been anything that has stepped up
to bat to take on the next stage.

This is exactly what I've been working on these past few years.

We'll be launching the spec (open and free) at the Extreme Markup
Language conference in Montreal in August.  At the same time we will
provide an AJAX Web application, and Emacs based development
environment a set of XSLT style sheets for converting into common
formats, complete documentation and a set of test data which will
provide examples and a data set for developing applications.

I am now revising the paper which we will introduce at the conference
and will send it out to anyone who is interested for feedback.

When it's ready I'll send a blurb to the list with a berief
description of the framework and see if anyone is interested in taking
a look.

b/

-- 
Brad Collins <brad@chenla.org>, Banqwao, Thailand
From Bowerbird at aol.com  Fri May 19 11:53:50 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri May 19 11:53:59 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <36f.41eaefe.319f6e3e@aol.com>

brad said:
>    This is exactly what I've been working on these past few years.
>    We'll be launching the spec (open and free) at the

great!   i look forward to playing with your stuff.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060519/90ad9c61/attachment.html
From Bowerbird at aol.com  Sat May 20 14:24:59 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat May 20 14:25:11 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <370.3e9dbdc.31a0e32b@aol.com>

carlo said:
>    Just a possibility: 
>    if a book quotes another, 
>    and both are available 
>    a link can be created 
>    from one to the other. 

yes, an exact quote is pretty easy to locate.

and of course, in most books, an exact quote
will already have been credited explicitly there, 
so there's not even a need to do the search digitally
(although it might be easier to do it programmatically
than to try to locate all of these references "manually",
at least if the false alarms from common expressions
were manageable).


>    Easy if an exact quotation is available, 
>    more intriguing if the reference is vague.

the recent cases of plagiarism in books coming out of
major publishing houses comes to mind here, doesn't it?

it shouldn't be too hard to write a routine that would do a
relatively good job rounding up "fuzzy" versions of quotes,
as long as they had some substantial similarity...


>    Another could be to find similar content 
>    even in absence of an explicit quotation, 
>    and have a link from a section of a book to 
>    a list of sections of other books.   This might be 
>    a detection of keywords coupled with a keyword search.

well yes, this is what i would think most people have in mind
when they talk about "the pages of books reading each other".

and since we will be indexing each book anyway, it would be
rather easy to set up a "semantic profile" of each book detailing
any idiosyncratic words that are fairly common within its pages.

books that have similar profiles could be linked to each other,
and then pages or sections within one book that are similar to
those in the other book could be linked together too, certainly.

amazon already does this in a fashion with their listings of the
"statistically improbable phrases" -- sips -- within each book,
and the "capitalized phrases" -- caps -- so that's fairly obvious.

amazon's "sips" and "caps" are interesting because you can click
on them, and see a list of other books where they also appear...

amazon also does an overall concordance, but i'd guess that
that isn't as useful for earmarking a book in a set of its peers.
(well, it looks like they formulate what's called a "tag cloud",
with links to the actual occurrences, so that could be useful.
still, i think the "sips" and "caps" would be more meaningful.)

lastly, amazon also lists "books on related topics" for each book.
i don't know if the quality of these associations is up to snuff --
amazon's version of "collaborative filtering" is a very bad joke
-- but i'd imagine that it has utility for a range of book-buyers...
(ok, more exploration tells me they do indeed use an overlap of
"sips" as their main tool in discerning "books on related topics".)

for those people who've never fooled around over at amazon,
i've appended a sample of some of their info for a specific book.

of course, what we _really_ want is not just to be _informed_
about similar books, but to have actual, honest-to-goodness
hyperlinks between 'em, so we can point-and-click at our desks,
rather than just order paper-copies to be delivered to our desks.


>    Both are already possible at the present state of technology.

these and more, absolutely.

tagging and annotation are other options that get thrown around a lot.
the idea here would be that interested users would form a "folksonomy"
that would link related books, perhaps with a commentary of their own.

this would give us the type of "intelligence" that can only be exercised
by actual human brains, and which might complement and/or supersede
the "brute force" approach of automatic computerized semantic analysis.

in another vein, the cats at the institute for the future of the book seem 
fond
of author/reader interaction in the actual _writing_ of the book, in a 
process
where a book grows "organically", against the backdrop of the cyberlibrary.
in this approach, links might _predate_ the content -- in essence be a 
"cause"
of the content, rather than merely an "effect" -- which is an interesting 
view...

likewise, david rothman's hobbyhorse these days is "blogs inside of books".
the initial version of an openreader viewer-app will support this capability,
so david has been raving about this feature like it's some kind of epiphany.
he is even of the opinion that amazon -- which announced such a feature
will be available soon in mobipocket -- could save themselves "a fortune in
development costs" by using openreader instead of mobipocket.   given that
it's relatively trivial to embed this capability, i'm not sure what he's 
thinking.

i would go over to his blog and ask him, but i've been semi-officially banned
-- i'm not banned, but many of my posts have now permanently disappeared --
because i have this annoying habit of saying things that do not go along with
the official spin that he likes to hype over there.   so i would certainly 
hope that
his put-a-blog-in-your-book capability allows an author to ban any "trolls",
because we wouldn't want to experience any disagreement now, would we?

either way, amazon seems to have had no trouble implementing a "discussion"
section feature -- currently labeled as "beta" -- on its webpage for each 
book.
this is in addition to the "wiki" which it already had, the purpose of which 
i am
not all that sure, and haven't investigated, because the overwhelming nature
of all of the _stuff_ on each amazon page gives me bad information overload,
and after a while i just feel a strong need to get the heck out of there!     
    :+)

***

at any rate, these are some of the ideas that are bubbling at the surface.

and though some of them sound interesting, to be sure, i also find that i am
left wondering if all of this "books reading each other" stuff is gonna lead 
to
something immense that leapfrogs us to the next level of super-intelligence,
or whether it's all much ado about not too much...   time will tell, i 
guess...

-bowerbird

p.s.   here is some of the information that amazon gives for a 1997 book...


Internet Dreams: Archetypes, Myths, and Metaphors (Paperback)

edited by Mark Stefik, with Introduction by Vinton Cerf


First Sentence:

We are born into a world rich in art, invention, and knowledge


Statistically Improbable Phrases (SIPs):

electronic mail metaphor, digital library metaphor, electronic sketch book, 
digital tickets, electronic brokerage effect, digital property rights, 
networked libraries, digital works, marketplace metaphor, 
superhighway metaphor, digital library system, trusted systems, digital book, 

dream session, editing test, digital reality, new design methods, usage 
rights, 
digital library project, electronic hierarchies, virtual rape, warrior 
archetype, 
digital publishing, fire bringer, electronic mail address


Capitalized Phrases (CAPs):

Library of Congress, Jeremy Taylor, United States, Gutenberg Bible, 
America Online, British Library, New York, Vannevar Bush, World Wide Web, 
Boston Spa, Joshua Lederberg, Lynn Conway, Palo Alto, San Francisco, 
Turing Test, Bungle Affair, Carver Mead, Challenging Assumptions, 
The Machine Stops, Yellow Pages, Alexander Eliot, Civil War, 
Digital Property Trust, Internet Companion, Libraries of the Future


Concordance

These are the 100 most frequently used words in this book:

access another article available between book case changes come 
communication community computer copy costs course design 
different digital dream dreammc electronic even example experience 
first form get good group however idea information internet joannel2 
journals kinds know knowledge large library life market may means 
meeting members message metaphor methods might mud need network 
new now number often others own paper part participants people place 
players problem process project provide public publishers publishing 
read real repositories research rights room say see sense several should 
social society system take technology text things thus time two use used 
users virtual without work world?


Text Stats

These statistics are computed from the text of this book.

Readability -- Compared with books in All Categories
Fog Index: 15.9 -- 75% are easier, 25% are harder
Flesch Index: 40.0 -- 69% are easier, 31% are harder
Flesch-Kincaid Index: 13.0 -- 76% are easier, 24% are harder
?
Complexity
Complex Words: 18% -- 66% have fewer, 34% have more
Syllables per Word: 1.7 -- 65% have fewer, 35% have more
Words per Sentence: 21.3 -- 77% have fewer, 23% have more

Number of:
Characters: 804,122 -- 83% have fewer, 17% have more
Words: 129,664 -- 85% have fewer, 15% have more
Sentences: 6,078 -- 73% have fewer, 27% have more
?
Fun stats
Words per Dollar: 4,987 ?
Words per Ounce: 5,332 ?


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060520/a226f373/attachment.html
From hart at pglaf.org  Mon May 22 07:33:29 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon May 22 07:33:31 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <m3ejyqp6bw.fsf@chenla.org>
References: <417.1c4ad86.319a16a7@aol.com> <m3ejyqp6bw.fsf@chenla.org>
Message-ID: <Pine.LNX.4.60.0605220730520.28264@pglaf.org>


On Fri, 19 May 2006, Brad Collins wrote:

> Bowerbird@aol.com writes:
>
>> in spite of the tone of the article, the only "newish" idea
>> in it is the notion that "books will read each other" and
>> become synergistically interlinked, and that idea is one
>> that is both interesting and perplexing at the same time.
>>
>> how -- _exactly_ -- is this supposed to happen?
>>
>> neither links nor tags, in their current form anyway,
>> indicate an association between two external entities.
>>
>> even the most basic of building blocks in that regard
>> -- a clean "a.p.i." into the cyberlibrary -- is absent...
>
>
> You're quite right.  The article talks of scanning which is the first
> stage.  DP/PG takes it to the next stage by turning scans into
> electronic texts, but there hasn't been anything that has stepped up
> to bat to take on the next stage.

It is quite interesting that Kevin Kelly has ignored Project Gutenberg
so throughly since Conde Naste has taken over WIRED, and WIRED used to
mention PG several times a year, sometimes even in cover stories or in
their timeline of the greatest Millennium events.

A friend asked him why. . .he said it was due to space limitations.

~8,000 words?

No problem mentioning The Billionaire Boys Club eBooks projects.

;-)

From davidrothman at pobox.com  Mon May 22 07:21:25 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Mon May 22 08:11:04 2006
Subject: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly
	in NYT on future of digital libraries]
In-Reply-To: <20060522102055.1501635176.davidrothman@pobox.com>
References: <370.3e9dbdc.31a0e32b@aol.com>
	<20060522044422.1912481309.davidrothman@pobox.com>
	<20060522045804.1822398048.davidrothman@pobox.com>
	<20060522050215.386360944.davidrothman@pobox.com>
	<20060522050755.1420494486.davidrothman@pobox.com>
	<20060522051428.967862575.davidrothman@pobox.com>
	<20060522051648.800796541.davidrothman@pobox.com>
	<20060522051925.15344279.davidrothman@pobox.com>
	<20060522053339.229489020.davidrothman@pobox.com>
	<20060522053502.448537029.davidrothman@pobox.com>
	<20060522053913.277997531.davidrothman@pobox.com>
	<20060522054828.358784686.davidrothman@pobox.com>
	<20060522054932.358966824.davidrothman@pobox.com>
	<20060522055320.1778311864.davidrothman@pobox.com>
	<20060522060428.714989660.davidrothman@pobox.com>
	<20060522061851.1553625961.davidrothman@pobox.com>
	<20060522062404.772062695.davidrothman@pobox.com>
	<20060522062415.383795055.davidrothman@pobox.com>
	<20060522064200.1649312626.davidrothman@pobox.com>
	<20060522064251.1131791825.davidrothman@pobox.com>
	<20060522064753.1548642210.davidrothman@pobox.com>
	<20060522065004.1797726829.davidrothman@pobox.com>
	<20060522065504.1505615045.davidrothman@pobox.com>
	<20060522070022.1954849555.davidrothman@pobox.com>
	<20060522070738.663127506.davidrothman@pobox.com>
	<20060522070958.1282631953.davidrothman@pobox.com>
	<20060522071023.1391252226.davidrothman@pobox.com>
	<20060522071226.1709291641.davidrothman@pobox.com>
	<20060522071348.1904243163.davidrothman@pobox.com>
	<20060522071427.1894670678.davidrothman@pobox.com>
	<20060522071833.1920423911.davidrothman@pobox.com>
	<20060522071850.1587886708.davidrothman@pobox.com>
	<20060522071957.745063081.davidrothman@pobox.com>
	<20060522072006.917881182.davidrothman@pobox.com>
	<20060522072207.742524294.davidrothman@pobox.com>
	<20060522072510.1578459509.davidrothman@pobox.com>
	<20060522072705.24216481.davidrothman@pobox.com>
	<20060522073142.450694835.davidrothman@pobox.com>
	<20060522073253.385219765.davidrothman@pobox.com>
	<20060522073647.2116477931.davidrothman@pobox.com>
	<20060522074157.443744003.davidrothman@pobox.com>
	<20060522074246.1132228085.davidrothman@pobox.com>
	<20060522074400.1976932334.davidrothman@pobox.com>
	<20060522074704.1214913263.davidrothman@pobox.com>
	<20060522075003.605536152.davidrothman@pobox.com>
	<20060522075149.400459223.davidrothman@pobox.com>
	<20060522075155.116956168.davidrothman@pobox.com>
	<20060522075202.1081707116.davidrothman@pobox.com>
	<20060522075345.1846128214.davidrothman@pobox.com>
	<20060522075613.509322149.davidrothman@pobox.com>
	<20060522075628.1580225884.davidrothman@pobox.com>
	<20060522075657.2074880068.davidrothman@pobox.com>
	<20060522075824.497658352.davidrothman@pobox.com>
	<20060522080003.280257892.davidrothman@pobox.com>
	<20060522080254.151952726.davidrothman@pobox.com>
	<20060522080441.1099670084.davidrothman@pobox.com>
	<20060522080516.148514185.davidrothman@pobox.com>
	<20060522081044.833029587.davidrothman@pobox.com>
	<20060522081452.935214828.davidrothman@pobox.com>
	<20060522081741.1613771435.davidrothman@pobox.com>
	<20060522081906.196095457.davidrothman@pobox.com>
	<20060522082107.414609285.davidrothman@pobox.com>
	<20060522082126.36116734.davidrothman@pobox.com>
	<20060522082511.185524298.davidrothman@pobox.com>
	<20060522082539.808272106.davidrothman@pobox.com>
	<20060522082604.1326329017.davidrothman@pobox.com>
	<20060522082619.810954139.davidrothman@pobox.com>
	<20060522082659.59838956.davidrothman@pobox.com>
	<20060522082704.141781736.davidrothman@pobox.com>
	<20060522082714.1016313511.davidrothman@pobox.com>
	<20060522083446.993232898.davidrothman@pobox.com>
	<20060522101653.1962499062.davidrothman@pobox.com>
	<20060522102055.1501635176.davidrothman@pobox.com>
Message-ID: <20060522102125.1706105918.davidrothman@pobox.com>

A feathery troll with competing business interests is once again abusing this PG list to smear the OpenReader e-book standard and the TeleRead Web Log. 

The normal rule is, "Don't feed the troll." But every now and then, as cofounder of OpenReader and moderator of the TeleBlog, I just may pop up with the facts for the benefit of newbies who don't yet know what's going on.

Yes, e-book software from OSoft, our first implementer, been available for several years now to do SHARED annotations. Embedded forums and even blogs inside books will be on the way. Imagine how this could help such wonderful activities as collaborative learning in K-12.

We're talking here about dotReader, the new name for OSoft's ThoutReader, except it'll work with the OpenReader format. dotReader is a real app for real users--developed by a real company, as opposed to a troll in the basement, so to speak. 

dotReader is named for Dorothy Thompson, an early foe of fascism and the leading female news commentator of the 30s and 40s. Miss Thompson also happened to be married to Sinclair Lewis, one of my favorites. She loved to annotate her books and share her literary enthusiasm with friends. Fittingly, then, an early book for dotReader will be Peter Kurth's "American Cassandra: The Life of Dorothy Thompson," which the L.A. Times hailed as "exemplary biography, thoroughly researched and entertainingly written." The next will very likely be a food guide, also the subject of stellar reviews. Readers will be able to use dotReader's interactive capabilities to help the coauthors of the food book stay up to date. Needless to say, OSoft is particularly keen on seeing public domain literature in its format. Anyway, I hope I've made this case for the R Word--Real.

The real company behind dotReader consists of two super-hardworking guys who've bet their assets in a serious way and hired other programmers. This is American business and technology at its best. I'm reminded of Preston Tucker, who was to Ford, GM and the others what OSoft is to Adobe, Microsoft, Amazon and the rest (http://en.wikipedia.org/wiki/Preston_Tucker). It's a shame that the troll is less interested in disc brakes than in badmouthing the Tuckers of e-book software.

See the true details for yourself at http://www.dotreader.org. The site for the OpenReader standard is at http://www.openreader.org. The TeleBlog, where I've often discussed related issues, is at http://www.teleread.org/blog. You can reach OSoft CEO Mark Carey and CTO Gary Varnell through the info at http://www.dotreader.com/site/?q=node/21. Sign up for progress reports via dotReader's home page. 

Like the troll, the proprietary formatters aren't too happy. But OpenReader and dotReader are happening anyway; and I'm expecting an outcome much happier than Tucker's. OpenReader and dotReader have just returned from a highly successful visit to BookExpo America. Some major publishers have expressed serious interest. It's just a matter of getting one of them to be the first to benefit from OpenReader and dotReader--a major but not insurmountable challenge in a conservative industry such as publishing. Gary and Mark are also keen on hearing from smaller publishers, whose use of the format and compatible software will help with the larger houses. More importantly, the cause of small publishers is worthy in itself.

The first version of dotReader will hit the Net for free this summer, thereby driving the troll crazy since all along he kept claiming that the OpenReader standard would be just vaporware. Might there be delays? Of course, as with any quality-minded effort. But yes, the launch will happen soon, and it would be good if the troll surpassed low expectations and apologized for his persistent falsehoods and abuse of the PG list.

> ... amazon -- which announced such a feature
will be available soon in mobipocket -- could save themselves "a fortune in
development costs" by using openreader instead of mobipocket.  given that
it's relatively trivial to embed this capability, i'm not sure what he's thinking.

Amazon could not give me a date when Mobipocket would have the shared annotations feature, so, in a Mobi context, we're not necessarily talking "soon" at all. The shared annotations will be available for now only via a Web browser, an e-book museum kind of deal, rather than downloadable e-book files. Besides, whether the issue is standards-compliance or user customization, dotReader does indeed leave Mobipocket behind in the dust. Those guys either don't get standards or bungled Mobipocket in certain ways. I love Mobipocket compared to Adobe, but OSoft's dotReader will be much better than either, and Amazon and rivals would be damn foolish not to use this rather affordable technology, which they would be free to rebrand. 

The main credit for dotReader, of course, goes to Gary Varnell and Mark Cary at OSoft, but Jon and I have contributed hundreds of suggestions to OpenReader first implementer. What's more, another terrific implementation, FBReader, is on the way; and you can bet I'll be bragging about FBReader, too, going by the high quality of the work (I'm basing this statement on what friends have told me). Other fine implementers cherished! Catch up with jon@openreader.org for OpenReader's preliminary specs and also to provide him your own feedback. Moreover, if you're a publisher, including the public domain variety, give us your thoughts on the traits you'd like for authoring and translation tools. Feedback from small guys especially cherished!

I'd love for Gutenberg itself to offer OpenReader format and for, say, Michael Hart or Greg Newby to participate constructively in the standards-setting process. We want the OpenReader process moved to a group such as OASIS so we link up with established standard-setting. 

Yes, Michael, OpenReader-to-ASCII conversion will be trivial. Contrast that with Amazon's proprietary approach. Like the troll, by the way, Amazon reads my TeleRead blog. I hear you do, too. That's great. Despite my disagreements over various issues, such as the best way to achieve true QC, I continue to wish for Project Gutenberg's success since I don't want governments, library bureaucrats and big publishers or even small ones to dictate which books survive. We need all kind of approaches. It is unfortunate that the troll is casting issues in terms of one vs. another. I'd love to see my own library efforts reinforce PG's and vice versa.

As for the troll's competing business interests vs. OpenReader's:

1. Jon Noring tells me that the feathery troll has given his ZML-related app a $50K price tag. True? Or is the troll just doing his reader as part of an Albert Schweitzer act for the good of humanity. If the price is a mere $50K, I'm amazed. Isn't the app worth with more? Poor creature. The many hours spent trolling against OpenReader are seemingly part of the development costs of his ZML work. I'm flattered that we're such a high priority. Before the troll gave up, he'd made hundreds of posts of the TeleBlog, and many and perhaps even most of them were either diatribes against either OpenReader or those involved.

2. From what I hear--true?--the troll is refusing to open source his app. By contrast, OSoft, despite having bet hundreds of thousands of dollars on development costs, is open-sourcing everything but the rather optional DRM that it added only at the vehement insistence of publishers. 

3. If the troll's ZML surprised the cosmos and caught on, it would be trivial for the free dotReader to read it since it'll work with almost any XMLish format despite the focus on OpenReader, the new standard. Whoops. There goes the business model of either the troll or whoever buys his app.

On a very related topic, no, the troll was not banned from the TeleBlog, but I do agree with him that it would be a big time-waster for him to return to the area and continue his trolling. Then he'd go on moderation, waste a lot of people's time and perhaps end up after all being the first of many hundreds of commenters to be banned--well, with the exception of the less subtle spammers. For the troll's acknowledgment that he didn't want to play by the TeleBlog's rules, check out the message below. You'll see that when our spam filters kept eating up the troll's remarks, Jon Noring offered to work with the him to solve the problem for both him and the TeleBlog; we don't like any legit comments to vanish. Alas, however, the troll turned down the suggestion. All we wanted in return was civil conduct. For those who don't know, Jon is a TeleBlog participant and the main founder of OpenReader--someone who, unlike the troll, has been highly active in mainstream e-book standards setting efforts with both small and large publishers involved. 

Significantly, the TeleBlog is Usenet NOT. We are a community where we compare notes on e-book technology and news, and where we welcome constructive disagreement rather than the trollish kind so well defined in the Wikipedia (http://en.wikipedia.org/wiki/Internet_troll). If our ill-feathered friend changes his ways and acts civilly, rather than doing troll spams, of course he'll be welcome back--with the expectation that he'll be a full community member as opposed to working constantly for his $50K or whatever he wants for his app. Along with the other TeleBlog regulars, I love Usenet as Usenet, but we'll not alter the community nature of the TeleBlog to make it troll-friendly.

As for the PG list, which the troll has used to disparage not just me and OpenReader but also the whole TeleBlog community (supposedly we're all "unrealistic" or whatever the adjective--not just the evil moderator), it's the decision of Michael and Greg. If they want to keep the PG list unmoderated, that's fine. At the same time they would do well to remember how Joseph McCarthy took advantage of the "objective" nature of the press to engage in never-ending character assassination against opponents, just as the troll is doing against those he sees as business rivals standing in the way of his $50K and his pride. See http://en.wikipedia.org/wiki/Mccarthyism for more on McCarthy's lies, which branded a number of people as unAmerican, ranging the composer Aaron Copeland to the novelist Dashiell Hammett. 

The troll's McCarthy-style posts are a vicious and highly disreputable use of the PG list, and it's high time that more responsible members of the list asked the troll to stop them. As Joseph Welch, a U.S. Army lawyer, told the late Senator from Wisconsin: "You've done enough. Have you no sense of decency, sir, at long last?  Have you left no sense of decency?" The troll is not just an innocent clown. While he delights in jousting with those he perceives as enemies and is motivated by his product-related pride as much as anything else, he also wants cash for this competing reader app; and he doesn't care whom he smears along the way.

Michael Hart, whom the troll admires, and whom I do, too, regardless of my disagreements on certain matters, perhaps can tell the troll to stop doing his McCarthyism act. The troll's Internet McCarthyism--this repeated use of outright lies in a forum to which he has easy access--reflects badly on the PG list and lowers newbies' impression of a valuable group like PG.

Thanks,
David Rothman
Cofounder of the OpenReader Consortium
http://www.openreader.org
Moderator of the TeleBlog
http://www.teleread.org/blog
davidrothman@openreader.org | dr@teleblog.org | 703-370-6540

P.S. This list is normally devoted to PG, not TeleRead or OpenReader. I wouldn't be commenting here except that the troll keeps gratuitously introducing these topic in a negative and misleading way. 

================================================

------------Original Message------------
From: Bowerbird@aol.com
To: gutvol-d@lists.pglaf.org, Bowerbird@aol.com
Date: Thu, Apr-13-2006 7:22 PM
Subject: re: [gutvol-d] don't believe everything you read on the internets
like i said, it doesn't matter to me if i'm banned or not,
because i want to stop wasting time over there anyway.

and since i am certainly not about to change what i say
_or_ how i say it, it would only be a matter of time before
i was banned eventually, "for the good of the community".

besides, jon, with adobe plotting against openreader,
you've got much bigger fish to fry than my fat old ass...

-bowerbird

From Bowerbird at aol.com  Mon May 22 09:02:13 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 09:02:27 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <424.1cc65e1.31a33a85@aol.com>

michael said:
>    A friend asked him why. . .
>    he said it was due to space limitations.

well, my one quibble with the article was when he
mentioned the google library scanning as the main
impetus for resurging interest about a cyberlibrary.

while that is undoubtedly true, i thought that he
could have mentioned project gutenberg as well.
it would've been a nod to your historic presence.

however, if you want your library to move forward
into the future that is being discussed, you'll need
to consider the constructive criticisms i have issued.

because if "books reading other books" _does_ take off,
your library of standalone files which cannot will languish.
you have the advantage now that most all of your files are
pure ascii, so it behooves you to leverage that advantage...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/8aaa4a41/attachment.html
From Bowerbird at aol.com  Mon May 22 09:11:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 09:11:48 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <1bc.5089b96.31a33cbb@aol.com>

geez, david, you'll have to do _something_
more original than calling me a "troll" here;
that "strategy" was worn out years ago here...

as for your fantasy that i "object" to openreader
because i have "competitive business interests",
what exactly would those "business interests" be?

my source code is available for $200,000, (not $50k),
but i will tell anyone who wants to buy it not to bother,
that they should just go off and write it themselves, so
i don't think you could call me a very good businessman.

as for my app itself, i've always said it's available for free.
again, not a good businessman.   what can i say?   i'm a poet.

a good businessman would get a booth at book expo,
and hawk the product there.   just like you're doing now.

for the record, i wish the osoft people all the best.
you were very lucky they came along when they did,
and i'll be glad when they finally turn your precious
"openreader" format into something besides vapor.

at that point, we will be able to measure its benefits and
its costs, to decide how worthwhile its cost-benefit will be.

considering the costs involve application of heavy markup,
and similar benefits can be delivered with lighter markup,
i'm not sure how you will ever be able to convince anyone
that your cost-benefit ratio will be the best one available.
but hey, microsoft has been able to do that for _years_, so
don't give up...

-bowerbird

p.s.   evidently, book expo ain't keeping you very busy.
kinda slow around the openreader booth, is it?        :+)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/f413137f/attachment.html
From hart at pglaf.org  Mon May 22 10:54:42 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon May 22 10:54:44 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <424.1cc65e1.31a33a85@aol.com>
References: <424.1cc65e1.31a33a85@aol.com>
Message-ID: <Pine.LNX.4.60.0605221045230.32683@pglaf.org>


On Mon, 22 May 2006 Bowerbird@aol.com wrote:

> michael said:
>>    A friend asked him why. . .
>>    he said it was due to space limitations.
>
> well, my one quibble with the article was when he
> mentioned the google library scanning as the main
> impetus for resurging interest about a cyberlibrary.

Obviously the press coverage about "Google library scanning"
has done more "as the main impetus for resurg[ent] interest
in a cyberlibrary" than the actualy scanning itself.

We are coming up on the 18 month anniversary of the monster
press blitz that announced,

"This is the day the world changes."

And the latest estimated I have received show that Google's
total number of books has just recently passed 50,000, then
again similar reports say that 88% are neither downloadable
nor proofread to any particular level of accuracy.

If we double that number to 100,000, we could pretend these
results indicated that Google had accomplished 1% of a goal
of 10,000,000 books, in 25% of their 6 year plan.


> while that is undoubtedly true, i thought that he
> could have mentioned project gutenberg as well.
> it would've been a nod to your historic presence.

Somehow I don't think this was accidental. . . .

Same with WIRED's approach since Conde Naste. . . .


mh
From jmdyck at ibiblio.org  Mon May 22 12:26:12 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Mon May 22 12:26:15 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <Pine.LNX.4.60.0605221045230.32683@pglaf.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
Message-ID: <44721054.3000104@ibiblio.org>

Michael Hart wrote:
> 
> If we double that number to 100,000, we could pretend these
> results indicated that Google had accomplished 1% of a goal
> of 10,000,000 books, in 25% of their 6 year plan.

In 1993, PG had accomplished 1% of its goal of 10,000,
in about 70% of the total time.

-Michael

From Bowerbird at aol.com  Mon May 22 12:55:40 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 12:55:53 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <439.1c91c11.31a3713c@aol.com>

michael said:
>    Obviously the press coverage about "Google library scanning"
>    has done more "as the main impetus for resurg[ent] interest
>    in a cyberlibrary" than the actualy scanning itself.

well, d'uh, of course.
that's how it always is.


>    And the latest estimated I have received show that Google's
>    total number of books has just recently passed 50,000

i do believe you misread that.   50,000 public-domain titles,
with another 42,000 under copyright, for a total of 92,000.

but even if it is just 50,000 total, they're still on my schedule:
i predicted 10,000 after one year, 100,000 after two years,
1 million after three years, and 10 million after four years...


>    similar reports say that 88% are neither downloadable
>    nor proofread to any particular level of accuracy.

except it's not google's job to make them downloadable,
not in convenient form, nor to proofread the digitized text.

it is _our_ job to grab the scans (as nicely and neatly as possible,
courteous and respectful of the cost they entailed by scanning),
and to make them available in a convenient format for reading,
as well as to formulate automatic procedures to digitize the text
and take it to a very high degree of accuracy.

even if google did do these jobs for us, i would still replicate it,
because i don't want to have to be dependent on google forever.


>   Somehow I don't think this was accidental. . . .

the point is, if your books were _already_ "reading each other",
people would have been talking about it long before this article.

-bowerbird

p.s.   i see you're one of those old-fashioned people
who refuse to recognize "resurging" as an adjective.
it's ok.   hopefully, if i keep using it that way, i'll win.
(i'm trying to change the usage of "hopefully" with
the same strategy.)                 :+)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/b81ad113/attachment.html
From fvandrog at scripps.edu  Mon May 22 13:32:04 2006
From: fvandrog at scripps.edu (Frank van Drogen)
Date: Mon May 22 13:47:05 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital
  libraries
In-Reply-To: <439.1c91c11.31a3713c@aol.com>
References: <439.1c91c11.31a3713c@aol.com>
Message-ID: <6.2.0.8.0.20060522132732.02eaa4c8@mail.scripps.edu>


> >   And the latest estimated I have received show that Google's
> >   total number of books has just recently passed 50,000
>
>i do believe you misread that.  50,000 public-domain titles,
>with another 42,000 under copyright, for a total of 92,000.

Even that number is a misinterpretation. There's at the moment 92000 
pre-1923 books available from Google Print. The 50.000 that google has made 
fully downloadable have a clear pre-1923 copyright statement; the 42.000 
don't have a clearcut copyright statement and thus Google only gives the 
snippet option.

I've never read numbers about the post-1923 books available, Bruce doesn't 
look for those in his various searches, as far as I am aware.

Frank 

From Bowerbird at aol.com  Mon May 22 13:57:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 13:57:45 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <48c.693268.31a37fc3@aol.com>

frank said:
>    Even that number is a misinterpretation

thanks for clearing that up for us, frank...

it's clear that google has gotten their legs under them
in regard to doing the scanning.   let's hope that they'll
get their quality-control under control very soon too...

it is important to keep in mind that 100,000 books is
<1% of the 10.5 million (or more) they'll do eventually;
it's understandable if the process isn't up to speed yet.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/690ed23f/attachment.html
From Bowerbird at aol.com  Mon May 22 14:01:14 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 14:01:22 2006
Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]
Message-ID: <48b.696020.31a3809a@aol.com>

david said:
>    You gratuitously attacked OpenReader out of the blue.

no, i didn't.

but i'm sure you'd like to spin it that way.

what i said was that you were making a big deal about
"putting a blog inside an e-book", when that is actually
a somewhat trivial thing to do.

look, i've inserted a blog inside of this e-mail:
>   
http://www.buzzmachine.com/index.php/2006/05/19/the-book-is-dead-long-live-the-book/

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/3fb1584a/attachment.html
From fvandrog at scripps.edu  Mon May 22 14:19:48 2006
From: fvandrog at scripps.edu (Frank van Drogen)
Date: Mon May 22 14:19:42 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital
  libraries
In-Reply-To: <48c.693268.31a37fc3@aol.com>
References: <48c.693268.31a37fc3@aol.com>
Message-ID: <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>


>it's clear that google has gotten their legs under them
>in regard to doing the scanning.  let's hope that they'll
>get their quality-control under control very soon too...

I have found less missing pages and other problems in books from Google 
then in those from the MBP and Canadian/IA. They are, however, still far 
from perfect. When they get a report regarding a missing or wrongly scanned 
page in a PD book; it is apparently up to the providing library to get the 
problem sorted out. I've heard report of complete books being rescanned 
(with the risk of having another page missing in the end ;) ). I've also 
heard somebody mentioning that the full rescanned book was stuck behind the 
existing one (rather space consuming, but for DP purposes a lot saver.

What worries me in this is that Google doesn't seem to care whether pages 
are missing or not... as long as they get 99% of the pages from a book 
stored, changes are most search terms pointing to the particular book will 
be identified. Their interest lies in people purchasing the book via 
Amazon, Abe etc. after identifying them via book.google.com.

The best quality control I have encountered so far is on Gallica, where 
appart from missing pages due to those pages missing in the original 
scanned manuscript, I've not encountered incomplete books. I'd be actually 
interesting to see how they perfrom their quality control.

Frank 

From marcello at perathoner.de  Mon May 22 14:37:28 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon May 22 14:37:31 2006
Subject: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin
	Kelly	in NYT on future of digital libraries]
In-Reply-To: <20060522102125.1706105918.davidrothman@pobox.com>
References: <370.3e9dbdc.31a0e32b@aol.com>	<20060522080441.1099670084.davidrothman@pobox.com>	<20060522080516.148514185.davidrothman@pobox.com>	<20060522081044.833029587.davidrothman@pobox.com>	<20060522081452.935214828.davidrothman@pobox.com>	<20060522081741.1613771435.davidrothman@pobox.com>	<20060522081906.196095457.davidrothman@pobox.com>	<20060522082107.414609285.davidrothman@pobox.com>	<20060522082126.36116734.davidrothman@pobox.com>	<20060522082511.185524298.davidrothman@pobox.com>	<20060522082539.808272106.davidrothman@pobox.com>	<20060522082604.1326329017.davidrothman@pobox.com>	<20060522082619.810954139.davidrothman@pobox.com>	<20060522082659.59838956.davidrothman@pobox.com>	<20060522082704.141781736.davidrothman@pobox.com>	<20060522082714.1016313511.davidrothman@pobox.com>	<20060522083446.993232898.davidrothman@pobox.com>	<20060522101653.1962499062.davidrothman@pobox.com>	<20060522102055.1501635176.davidrothman@pobox.com>
	<20060522102125.1706105918.davidrothman@pobox.com>
Message-ID: <44722F18.4080204@perathoner.de>

David H. Rothman wrote:


> See the true details for yourself at http://www.dotreader.org. 

It says:

  www.dotreader.org

  This page is parked free, courtesy of GoDaddy.com

Did the .reader bubble already burst?


> I'd love for Gutenberg itself to offer OpenReader format

We already offer most books in plucker. That's because they are open
format, widely deployed and offer an open toolchain.

We serverd 89504 plucker books in May 2006.

We'll see about OpenReader once you'll have widely deployed your .reader
and made available an open toolchain.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Mon May 22 14:41:25 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 14:41:34 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <436.1cc35e4.31a38a05@aol.com>

frank said:
>    I'd be actually interesting to see how they perfrom their quality 
control.

must be something i'm missing, because quality control seems _easy_ to me.

you step through the scan-set making sure you've got every page-number,
and then you step through it again making sure every scan was a clean one
that captured all of the text on the entire page without any blurs anywhere.
if a page is bad, you redo the page, while the book is still right in your 
hands.

on the other hand, getting a report of a bad page later means that you must
go to all of the difficulty of fetching the book again, which is a pain in 
the ass.

so, to my mind, the "learning curve" on any scanning project is
learning to do it right the first time, so you don't have to re-do it.

-bowerbird

p.s.   the idea that google rescans the whole book if a page is reported 
missing
makes them seem downright stupid.   if they keep that up, it'll take 'em 20 
years.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/40406b42/attachment.html
From gbnewby at pglaf.org  Mon May 22 14:47:26 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon May 22 14:47:27 2006
Subject: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin
	Kelly	in NYT on future of digital libraries]
In-Reply-To: <44722F18.4080204@perathoner.de>
References: <20060522082604.1326329017.davidrothman@pobox.com>
	<20060522082619.810954139.davidrothman@pobox.com>
	<20060522082659.59838956.davidrothman@pobox.com>
	<20060522082704.141781736.davidrothman@pobox.com>
	<20060522082714.1016313511.davidrothman@pobox.com>
	<20060522083446.993232898.davidrothman@pobox.com>
	<20060522101653.1962499062.davidrothman@pobox.com>
	<20060522102055.1501635176.davidrothman@pobox.com>
	<20060522102125.1706105918.davidrothman@pobox.com>
	<44722F18.4080204@perathoner.de>
Message-ID: <20060522214726.GD4985@pglaf.org>

On Mon, May 22, 2006 at 11:37:28PM +0200, Marcello Perathoner wrote:
> David H. Rothman wrote:
> 
> 
> > See the true details for yourself at http://www.dotreader.org. 
> 
> It says:
> 
>   www.dotreader.org
> 
>   This page is parked free, courtesy of GoDaddy.com
> 
> Did the .reader bubble already burst?

That's the funniest thing I've read all month.  Thanks!

dotreader.com seems a better source to start.  I'm not sure
if it has true details or not, though.
  -- Greg
From jeroen.mailinglist at bohol.ph  Mon May 22 14:46:04 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Mon May 22 15:15:03 2006
Subject: [gutvol-d] Outsourcing scanning
In-Reply-To: <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
References: <48c.693268.31a37fc3@aol.com>
	<6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
Message-ID: <4472311C.3020408@bohol.ph>


After demonstrating PGDP to some people, I got in touch with an NGO who 
likes to scan their entire holdings, and make it available on the web.

Has anybody on this list experience with outsourcing scanning jobs (on a 
larger scale)? I am looking at a project which includes about half a 
million pages that need to be digitized. Ofcourse I am not going to scan 
that much myself, and I heard prices at that scale can be as low as a 
few cents per page when done in the Philippines. Has anybody prepared 
documents describing quality control processes, etc., for such a bulk 
process. Hopefully, much of the material will be made available on-line, 
(although it will not copyright clear with PG, I don't expect issues 
with copyright). I may even set up a 'Distributed Proofreading' system 
for it.

Jeroen.

From davidrothman at yahoo.com  Mon May 22 15:39:53 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Mon May 22 15:47:09 2006
Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re:
	[gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
Message-ID: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>

Actually that's http://www.dotreader.com , not .org --mea culpa--and if PG
really cares about open source, then it should encourage strong open source
efforts of the OSoft variety rather than just wait until they catch on.

Here's a little two-man software house in Tacoma, Washington, gambling
hundreds of thousands of dollars on an open-source reader that can do far
more than Plucker, allowing blogs and forums to be embedded inside books.

Plucker has many appreciative users, but dotReader/OpenReader will be of far
greater importance to commercial publishers, who are already starting to
show interest.

In turn, that'll be wonderful for PG works and other public domain books.
dotReader reader can work with many kinds of books while improving the user
experience.

dotReader uses a turbocharged version of existing e-book standards that
techies and publishers have thrashed aroundfor years.

It's the best of all worlds: open source for programmers and a powerful free
reader for users--and e-book standards similar to existing ones for
publishers. Plus, dotReader can handle other XML/CSS-related formats as
well.

> We serverd 89504 plucker books in May 2006.

I think you'll do much better with OpenReader available as well. OSoft's
e-reader for the format is a thing of beauty, and, as noted, it'll be free
to download.

Plus, another awesome implementation is planned via the FBReader, which,
according to the Wikipedia, is catching on among Nokia 770 users. See
http://only.mawhrin.net/fbreader/plans.html and
http://en.wikipedia.org/wiki/Plucker.

Thanks,
David

David Rothman | davidrothman@openreader.org | 703-370-6540
TeleRead: http://www.teleread.org/blog

On 5/22/06, Marcello Perathoner <marcello@perathoner.de> wrote:
>
> David H. Rothman wrote:
>
>
> > See the true details for yourself at http://www.dotreader.org.
>
> It says:
>
>   www.dotreader.org
>
>   This page is parked free, courtesy of GoDaddy.com
>
> Did the .reader bubble already burst?
>
>
> > I'd love for Gutenberg itself to offer OpenReader format
>
> We already offer most books in plucker. That's because they are open
> format, widely deployed and offer an open toolchain.
>
> We serverd 89504 plucker books in May 2006.
>
> We'll see about OpenReader once you'll have widely deployed your .reader
> and made available an open toolchain.
>
>
> --
> Marcello Perathoner
> webmaster@gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/ab9b63ce/attachment.html
From davidrothman at yahoo.com  Mon May 22 16:04:52 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Mon May 22 16:04:56 2006
Subject: dotReader ][Re: OpenReader vs. the troll in the basement [re:
	[gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
Message-ID: <5eff08fa0605221604v474771cey34c15e144a55c9b@mail.gmail.com>

Hi, Greg. Actually dotReader is ANTI-bubble. E-books won't catch on
big until they the technology improves, among other things; and the
dotReader/OpenReader combo can offer major interactivity in a big way.

I've already mentioned blogs and forums embedded inside e-books (and
available even for off-line reading).

As for the desirability of interactivity and multimedia for
consumers--that's no small factor, according to the esteemed Greg
Newby. dotReader/OpenReader will oblige in both areas.

Meanwhile see below from USA Today, especially the last paragraph: "To
be compelling enough to trigger any kind of mass migration away from
paper books, e-books will need to have compelling characteristics
regular books don't, such as interactivity and mixed-media
capabilities, Newby and others said."

Can we really trust this guy? ;-) I'd like to think so.

I hope he and others in PG will be open-minded about both the format
and the implementations, as opposed to letting the trolls and their
buddies set the tone for PG.

Cheers,
David

http://thelifeofbooks.blogspot.com/2006/04/whats-trouble-with-ebooks.html

We don't see a lot of resistance to electronic books per se," said
Gregory Newby, director of Project Gutenberg, the first electronic
library, which offers 20,000 titles for free. "What we see are
limiting factors in specialized readers and difficulty in finding good
stuff to read." Plus, "publishers are charging the same amount for an
electronic book as for a paper book."

There are other challenges too. With e-book readers, people may be
able to store numerous texts in one small device and do things to make
reading easier, such as changing type size, something that's
impossible with print. But people also like to share books with
others, resell them and hand them down to their children, he said.

"When you buy a book, you have it forever," Newby said. "With these
electronic books, you often are prevented from doing those things that
you can do with regular books. What happens when my device
breaks?...Books aren't just words on a page. They are things you can
trade, share and store for later."

To be compelling enough to trigger any kind of mass migration away
from paper books, e-books will need to have compelling characteristics
regular books don't, such as interactivity and mixed-media
capabilities, Newby and others said.


On 5/22/06, Greg Newby <gbnewby@pglaf.org> wrote:
> On Mon, May 22, 2006 at 11:37:28PM +0200, Marcello Perathoner wrote:
> > David H. Rothman wrote:
> >
> >
> > > See the true details for yourself at http://www.dotreader.org.
> >
> > It says:
> >
> >   www.dotreader.org
> >
> >   This page is parked free, courtesy of GoDaddy.com
> >
> > Did the .reader bubble already burst?
>
> That's the funniest thing I've read all month.  Thanks!
>
> dotreader.com seems a better source to start.  I'm not sure
> if it has true details or not, though.
>   -- Greg
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From joshua at hutchinson.net  Mon May 22 16:04:51 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon May 22 16:26:37 2006
Subject: [gutvol-d] Re: DotReader.com adr. [Re: OpenReader vs. the troll in
 the basement [re: [gutvo
Message-ID: <20060522230451.E698AEE422@ws6-1.us4.outblaze.com>

The show-stopper isn't the relative popularity (people have successfully argued lots of things in PG's history that didn't seem too popular at first blush).  What we *need* is a conversion utility (the "open tool-chain" that Marcello refers to).

I don't expect a conversion tool that can take our somewhat sloppy (at times) encoded plain text.  While wonderful, that is asking too much.  But I *do* want a conversion tool that can run from PGTEI to OpenReader format.

If the tool chain is open source and runs on Linux, we can probably have a quick and dirty converter up relatively quickly.  As always, I'd be happy to test the heck out of it.

Josh


> ----- Original Message -----
> From: "David H. Rothman" <davidrothman@yahoo.com>
> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
> Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement [re:	[gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
> Date: Mon, 22 May 2006 18:39:53 -0400
> 
> 
> Actually that's http://www.dotreader.com , not .org --mea culpa--and if PG
> really cares about open source, then it should encourage strong open source
> efforts of the OSoft variety rather than just wait until they catch on.
> 
> Here's a little two-man software house in Tacoma, Washington, gambling
> hundreds of thousands of dollars on an open-source reader that can do far
> more than Plucker, allowing blogs and forums to be embedded inside books.
> 
> Plucker has many appreciative users, but dotReader/OpenReader will be of far
> greater importance to commercial publishers, who are already starting to
> show interest.
> 
> In turn, that'll be wonderful for PG works and other public domain books.
> dotReader reader can work with many kinds of books while improving the user
> experience.
> 
> dotReader uses a turbocharged version of existing e-book standards that
> techies and publishers have thrashed aroundfor years.
> 
> It's the best of all worlds: open source for programmers and a powerful free
> reader for users--and e-book standards similar to existing ones for
> publishers. Plus, dotReader can handle other XML/CSS-related formats as
> well.
> 
> > We serverd 89504 plucker books in May 2006.
> 
> I think you'll do much better with OpenReader available as well. OSoft's
> e-reader for the format is a thing of beauty, and, as noted, it'll be free
> to download.
> 
> Plus, another awesome implementation is planned via the FBReader, which,
> according to the Wikipedia, is catching on among Nokia 770 users. See
> http://only.mawhrin.net/fbreader/plans.html and
> http://en.wikipedia.org/wiki/Plucker.
> 
> Thanks,
> David
> 
> David Rothman | davidrothman@openreader.org | 703-370-6540
> TeleRead: http://www.teleread.org/blog
> 
> On 5/22/06, Marcello Perathoner <marcello@perathoner.de> wrote:
> >
> > David H. Rothman wrote:
> >
> >
> > > See the true details for yourself at http://www.dotreader.org.
> >
> > It says:
> >
> >   www.dotreader.org
> >
> >   This page is parked free, courtesy of GoDaddy.com
> >
> > Did the .reader bubble already burst?
> >
> >
> > > I'd love for Gutenberg itself to offer OpenReader format
> >
> > We already offer most books in plucker. That's because they are open
> > format, widely deployed and offer an open toolchain.
> >
> > We serverd 89504 plucker books in May 2006.
> >
> > We'll see about OpenReader once you'll have widely deployed your .reader
> > and made available an open toolchain.
> >
> >
> > --
> > Marcello Perathoner
> > webmaster@gutenberg.org
> >
> > _______________________________________________
> > gutvol-d mailing list
> > gutvol-d@lists.pglaf.org
> > http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

>

From vze3rknp at verizon.net  Mon May 22 16:34:02 2006
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Mon May 22 16:33:50 2006
Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement
	[re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
In-Reply-To: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
References: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
Message-ID: <44724A6A.2030107@verizon.net>


David H. Rothman wrote:

> It's the best of all worlds: open source for programmers and a 
> powerful free reader for users--and e-book standards similar to 
> existing ones for publishers. Plus, dotReader can handle other 
> XML/CSS-related formats as well.

If dotReader can handle XML/CSS-related formats, then many of the more 
recent PG books are already available for it, since they have been 
produced in xhtml. Most of the output from DP these days comes with an 
xhtml version.

JulietS

From davidrothman at yahoo.com  Mon May 22 16:34:02 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Mon May 22 16:34:04 2006
Subject: OpenReader vs. the troll in the basement [Re: [gutvol-d] re: Kevin
	Kelly in NYT on future of digital libraries]]
Message-ID: <5eff08fa0605221634r3c2ea804l4753d5d0142badd1@mail.gmail.com>

BIRD: In the snippet below, you simply LINKED to a blog. dotReader,
OpenReader's first implementation, will CONTAIN blogs and forums and
make them readable to users even when they're offlline. They'll be
embedded INSIDE the books, as you well know from the TeleRead Web log.
It's one thing to talk about embedded blogs and forums. It's another
thing to offer them in a viable reader, as OSoft will be doing via
dotReader.

GREG: Well, as I said, beware of trolls and friends setting the tone
for PG. I trust you'll be more open-minded in choosing technologies
and formats. dotReader will offer the interactivity and multimedia
capabilities you were talking up in USA Today. Call me at 703-370-6540
if you want to begin some friendly and constructive dialogue.

Thanks,
David

Bowerbird wrote:


>
> what i said was that you were making a big deal about
>  "putting a blog inside an e-book", when that is actually
>  a somewhat trivial thing to do.
>
>  look, i've inserted a blog inside of this e-mail:
>  >   http://www.buzzmachine.com/index.php/2006/05/19/the-book-is-dead-long-live-the-book/
>
>
From Bowerbird at aol.com  Mon May 22 17:05:22 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 17:05:30 2006
Subject: [gutvol-d] re: blogs in e-books
Message-ID: <491.561e9f.31a3abc2@aol.com>

david said:
>    BIRD: In the snippet below, you simply LINKED to a blog. 

and in most cases, that will be quite good enough, thanks,
because people can then read the blog there, and comment.

but since you've brought it up, will these "blogs and forums"
that are "contained" within openreader e-books from osoft
be addressable by the general public using web-browsers?

or, like the current osoft thoutreader, will people need to use
that particular piece of software in order to view the comments?

a clear answer will tell us a lot about your attitude on "lock-in".


>    dotReader, OpenReader's first implementation, will 
>    CONTAIN blogs and forums and make them readable 
>    to users even when they're offlline.

it's not hard to implement that...

the app just downloads the content,
and saves it for offline presentation,
then uploading what is to be posted,
capabilities already included in many
r.s.s. readers and blogging software...

depending on if people actually use it,
it could end up being a neat technology,
much like all the other instantiations of it
i discussed in my earlier post in this thread.

my point was that it's not difficult to implement.

it's not.

so why are you hyping it like it's such a big deal?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/4f5a2da0/attachment-0001.html
From davidrothman at yahoo.com  Mon May 22 17:10:42 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Mon May 22 17:17:10 2006
Subject: [gutvol-d] Re: DotReader.com adr. [Re: OpenReader vs. the troll
	in the basement [re: [gutvo
In-Reply-To: <20060522230451.E698AEE422@ws6-1.us4.outblaze.com>
References: <20060522230451.E698AEE422@ws6-1.us4.outblaze.com>
Message-ID: <5eff08fa0605221710j10fadbfah391bc01a642adbf7@mail.gmail.com>

Thanks to both Juliet and Josh for their useful thoughts. Let me add
that OpenReader files from an XML/CSS-related format or another would
be highly desirable if one cares about format standards as well as
reader standards. Format standards would be one way to help public
domain books get closer to the mainstream of publishing. Many public
librararies dispense e-books only in a few formats. I hate the idea of
their paying for public domain books, and format standardization could
help. I want libraries to be able to give away public domain books for
keeps rather than just loan them. While I hope that dotReader will
catch on, let's think format as well to be safe. Either way, though,
with or without OpenReader, dotReader could be very good for PG and DP
alike. Sooner or later OpenReader will catch on through other means,
and the readers (both human and software) will be ready.

David

David Rothman
davidrothman@openreader.org
dr@teleread.org
http://www.teleread.org/blog
703-370-6540

JulietS wrote:

If dotReader can handle XML/CSS-related formats, then many of the more
recent PG books are already available for it, since they have been
produced in xhtml. Most of the output from DP these days comes with an
xhtml version.

JulietS
From Bowerbird at aol.com  Mon May 22 17:21:56 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 17:22:03 2006
Subject: [gutvol-d] re: blogs in e-books
Message-ID: <45a.12dcc5d.31a3afa4@aol.com>

by the way, if anyone wants to see an example of
an experiment aimed at eliciting reader interaction
at the stage of a "finished first draft" of a book, see:
>    http://www.futureofthebook.org/gamertheory/

this project of "the institute for the future of the book"
just went up today.   in addition to an ability to comment
on any _paragraph_ of the book, there is a general forum.

i'm of the opinion that most books probably will not be able
to find a sufficiently large number of commenters to warrant
the work that an author will have to do to open up the process
of writing to such interaction.   but it's an interesting experiment.

and then of course there will always be some _major_ exceptions,
on the order of chris anderson and the blog he has been keeping
while writing his "long tail" book.   because of the great exposure
the idea got from its description in a "wired" cover story last year,
and because anderson just happens to be the editor of "wired",
and because -- let's face it -- the idea is a _very_ compelling one
that therefore subsequently has been written up all over the place,
anderson's "long tail" blog has been a tremendously exciting space.
but my sense is that this will be the _exception_, rather than the rule.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/0c0c48db/attachment.html
From Bowerbird at aol.com  Mon May 22 17:32:53 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 17:32:58 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <30d.5ad89a0.31a3b235@aol.com>

another comment on "the pages of books reading each other"...

>    http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&
post=2006-05-22,1

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/685285df/attachment.html
From bruce at zuhause.org  Mon May 22 19:11:48 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Mon May 22 19:11:51 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <439.1c91c11.31a3713c@aol.com>
References: <439.1c91c11.31a3713c@aol.com>
Message-ID: <17522.28516.463762.723548@celery.zuhause.org>

Bowerbird@aol.com writes:
 > >    And the latest estimated I have received show that Google's
 > >    total number of books has just recently passed 50,000
 > 
 > i do believe you misread that.   50,000 public-domain titles,
 > with another 42,000 under copyright, for a total of 92,000.

My searching found 50,000 public domain titles available as complete
books, and another 42,000 that should have been available as complete
books because they were published prior to 1923, but were only visible
in snippet view.  I have no idea how many books Google scanned
published after 1922 which are probably PD because the copyright was
apparently not renewed, nor the number of books scanned even though
the book is still under copyright.
From Bowerbird at aol.com  Mon May 22 20:46:11 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 20:46:19 2006
Subject: [gutvol-d] re: blogs in e-books
Message-ID: <4a2.583199.31a3df83@aol.com>

david said:
>    You unwittingly made the point in linking via e-mail 
>    to a browser-readable blog. Same blog could appear 
>    in dotReader or another OpenReader implementation.

that doesn't answer the question.

will each and every blog/forum inside an openreader book
be accessible with a general web-browser?

because as far as i know, i can't put a link in this e-mail that
would take the user to a comment in a thoutreader e-book.

but if it can be done, then by all means, please show us.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/321052f7/attachment.html
From Bowerbird at aol.com  Mon May 22 20:52:42 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 20:52:48 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <45d.13253d6.31a3e10a@aol.com>

bruce said:
>    I have no idea how many books Google scanned
>    published after 1922 which are probably PD 
>    because the copyright was apparently not renewed

it would be foolish for google to take the risk of
showing books published after 1922.   if even one
had been renewed, it would become ammunition
for the other side.


>    nor the number of books scanned even though
>    the book is still under copyright.

we could extrapolate from the ratio of public-domain
to copyrighted titles in the libraries, but that would
assume that they aren't taking that into consideration,
and they might well be.   i believe they had said that
they would concentrate on public-domain titles first.
(or maybe that's just what michael _said_ they said.)

at any rate, i'm happy to sit back and wait while they
continue to do more scanning...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060522/d9962c85/attachment.html
From Bowerbird at aol.com  Mon May 22 23:51:57 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon May 22 23:52:09 2006
Subject: [gutvol-d] re: blogs in e-books
Message-ID: <43c.17978a8.31a40b0d@aol.com>

david said:
>    You're extrapolating from a specific setup 
>    at a specific company--one that will change in time.

i have nothing else to go on.

and i have no indication of the way -- 
if any -- that it "will change in time"...
nothing technical has been written up.

unlike many open-source programs, this one
would appear to be being developed in secret.
nothing technical about the app is written up...

and while i'm quite happy to wait until the capability
actually appears in an app that runs on my machine,
so i can see what it does and exactly how it does it,
you meanwhile are busy spewing glowing p.r.-speak.
which, when you "answer" a hard question, turns to mush.

if you can put a link in an e-mail here that takes a user
to a comment that has been made in an osoft e-book --
let's say the one that i made in the "my antonia" demo --
then do it.   otherwise, admit that -- at this point in time,
with that e-book, anyway -- that capability is not present.

or not.   the answer here is of little or no consequence,
because whether the annotations are viewable with an
ordinary web-browser or not, the capability to code it
(either way) is rather elementary to implement, which is
the point that i made at the outset.   if you'd like to see
a demo program demonstrating the ease with which this
can be done, i would be willing to code one up for you...
but you don't really want me to steal dotreader thunder,
do you?   why not let mark carey roll out his version first?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/fd2fe877/attachment.html
From davidrothman at yahoo.com  Tue May 23 01:07:06 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Tue May 23 01:07:09 2006
Subject: Bowerbird's development schedule and the $200K he's demanding [Re:
	[gutvol-d] re: blogs in e-books]
Message-ID: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com>

As for your doing a demo, hey, be my guest. OSoft and others have long
since carried out the basic concept of shared annotation, and the
SA-capable dotReader is on the way from OSoft. Of course I'm actually
concerned that your demo would hurt the cause of shared annotations by
showing it off less than optimally, whether it was brower- or
reader-based or both. And you're not going to have the related
standards infrastructure that dotReader will. Who knows, you might
even want to give us, er, "lock-in."

Now, back to some grubby details from the ZML world, such as the rival
reader app that you've spent so many hours trolling for--against
OpenReader. What's your development schedule for your reader, so we
can guard against "hype" and "vaporware"? Isn't true you've taken
forever to get your reader out? And beyond people not paying you $200K
or whatever, how come you won't share the source code for your rival
reader? Are you ashamed of it? I still don't have a satisfactory
answer. Code can be dear to one's heart, but still is a long way from
poetic musings. Why must you keep your brillance to yourself? Don't
you believe in open source?

Answer those questions, and then I suggest that we wind down this
thread in the interest of bandwidth and time--both mine and others'.

PG people are very welcome to write me privately or phone
me--especially Greg, if he's really serious about the comments he made
to U.S. Today extolling interactivity. Here's PG's chance to adopt a
powerful format (OpenReader) and enjoy readers worthy of it (dotReader
and in the future FBReader). I'm all ears as far as suggestions from
Greg or anyone else, and I know others will be as well.

David Rothman | davidrothman@openreader.org | 703-370-6540
OpenReader: http://www.openreader.org
OR's first implementer: http://www.dotreader.org
TeleBlog: http://www.teleread.org/blog
From hart at pglaf.org  Tue May 23 06:54:02 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 06:54:05 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <45d.13253d6.31a3e10a@aol.com>
References: <45d.13253d6.31a3e10a@aol.com>
Message-ID: <Pine.LNX.4.60.0605230651110.19444@pglaf.org>


On Mon, 22 May 2006 Bowerbird@aol.com wrote:

> bruce said:
>>    I have no idea how many books Google scanned
>>    published after 1922 which are probably PD
>>    because the copyright was apparently not renewed

I seem to recall an earlier report from someone who
did lots of searches for Google books and determined
that 88% of them were published after 1922.

Or at least were being treated as copyrighted books.


mh
From hart at pglaf.org  Tue May 23 06:56:53 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 06:56:55 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <45d.13253d6.31a3e10a@aol.com>
References: <45d.13253d6.31a3e10a@aol.com>
Message-ID: <Pine.LNX.4.60.0605230655270.19444@pglaf.org>


More. . .as for that 88% figure, I think that may have alluded
to the number of books Google has at their potential disposal,
rather than the number of those that have been scanned yet, or
it may also take duplications into account.

Sorry, been really busy, can't recall all the details. . . .

mh
From hart at pglaf.org  Tue May 23 07:03:40 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 07:03:41 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <17522.28516.463762.723548@celery.zuhause.org>
References: <439.1c91c11.31a3713c@aol.com>
	<17522.28516.463762.723548@celery.zuhause.org>
Message-ID: <Pine.LNX.4.60.0605230658250.19444@pglaf.org>


On Mon, 22 May 2006, Bruce Albrecht wrote:

> Bowerbird@aol.com writes:
> > >    And the latest estimated I have received show that Google's
> > >    total number of books has just recently passed 50,000
> >
> > i do believe you misread that.   50,000 public-domain titles,
> > with another 42,000 under copyright, for a total of 92,000.

Then I was probably right to count Google's total as ~100,000
in my own public estimations, though I would prefer counts of
downloadable books to avoid Google's new policy of:

"Google Book Search is a means for helping users discover
books, not to read them online and/or download them."


> My searching found 50,000 public domain titles available as complete
> books, and another 42,000 that should have been available as complete
> books because they were published prior to 1923, but were only visible
> in snippet view.  I have no idea how many books Google scanned
> published after 1922 which are probably PD because the copyright was
> apparently not renewed, nor the number of books scanned even though
> the book is still under copyright.

Are you saying that there are actually 50,000 downloadable
full text Google eBooks?

Any idea of their level of accuracy?


Please allow me to renew the request from myself and LIS PhD Greg Newby,
CEO of Project Gutenberg, for a copy of the list we can look over,
even if we cannot make it public.


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From jon at noring.name  Tue May 23 07:24:03 2006
From: jon at noring.name (Jon Noring)
Date: Tue May 23 07:24:09 2006
Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement
	[re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
In-Reply-To: <44724A6A.2030107@verizon.net>
References: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
	<44724A6A.2030107@verizon.net>
Message-ID: <1858102527.20060523082403@noring.name>

Juliet wrote:
> David H. Rothman wrote:

>> It's the best of all worlds: open source for programmers and a 
>> powerful free reader for users--and e-book standards similar to 
>> existing ones for publishers. Plus, dotReader can handle other 
>> XML/CSS-related formats as well.

> If dotReader can handle XML/CSS-related formats, then many of the more
> recent PG books are already available for it, since they have been 
> produced in xhtml. Most of the output from DP these days comes with an
> xhtml version.

The first vocabulary supported by OpenReader, called the "Basic
Content Document 1.0" (BCD) is a structurally-oriented subset of
XHTML 1.0, and compatible, as best as possible, with XHTML 2.0
currently being developed by W3C. It is also quite compatible with
OEBPS 1.2.

The draft BCD spec is located at:

   http://openreader.org/spec/bcd10.html

We plan to create an "Extended Content Document" vocabulary by
simply adding XLink support plus some OpenReader namespace tags to
markup important things that XHTML does not natively support, such as
page breaks and boundaries and numbering (e.g., for preserving where
page breaks occurred in the original paper book), line breaks as
occurred in the original (<br/> is not sufficient for this as I could
talk about another time), other noteworthy "mile markers", inline
indexing information (so OpenReader "readers" can assemble a people-
authored index on the fly), etc., etc.

Anyway. Feedback on BDD from those in DP who produce XHTML versions of
books is more than welcome! Of course, looking for those willing to do
a careful vetting of the BCD spec (and anyone who does becomes a
contributor to be added to the list of contributors in the spec.)

Thanks.

Jon Noring
OpenReader Consortium

From gbnewby at pglaf.org  Tue May 23 07:51:10 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue May 23 07:51:12 2006
Subject: [gutvol-d] USA Today;
In-Reply-To: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com>
References: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com>
Message-ID: <20060523145110.GA21391@pglaf.org>

On Tue, May 23, 2006 at 04:07:06AM -0400, David H. Rothman wrote:
> ....
> PG people are very welcome to write me privately or phone
> me--especially Greg, if he's really serious about the comments he made
> to U.S. Today extolling interactivity. Here's PG's chance to adopt a
> powerful format (OpenReader) and enjoy readers worthy of it (dotReader
> and in the future FBReader). I'm all ears as far as suggestions from
> Greg or anyone else, and I know others will be as well.

I enjoyed reading those quotes, and they're pretty
accurate from an interview I did a few weeks ago
concerning launch of the newest Sony eBook reader
with electronic ink.

(I was just in Tokyo two weeks ago, and was unable
to find one of these units for sale.  I didn't look
all that hard, but peered closely in the PDA section
of Bic Camera which is a huge electronics chain store).

They somehow recycled the article for USA Today --
nice to see.  Of course I'm serious about limitations
of eBook readers, and am against any format that
is one-way, closed, non-fixable/editable, etc.  This
is a thread in the "about" essays Michael and I worked
on:  http://www.gutenberg.org/about , with a key theme
being "unlimited distribution."

For the OpenReader format, as Marcello said there is 
no conceptual resistance to using this as a "convert
to" format at gutenberg.org, just as plucker is.  All
we need is a clear and preferably open source processing
chain that we can insert into the ibiblio.org site.  Also,
of course, a reasonable support community so that PG
help staff (me, George & Marcello) don't end up being 
too challenged in supporting the format.

In short, as you've heard before, you should feel encouraged
to "go for it."
  -- Greg


From hart at pglaf.org  Tue May 23 07:58:36 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 07:58:37 2006
Subject: DotReader.com adr. [Re: OpenReader vs. the troll in the basement
	[re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries]]
In-Reply-To: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
References: <5eff08fa0605221539t6d8ae4e4l19de369b4d5f6c08@mail.gmail.com>
Message-ID: <Pine.LNX.4.60.0605230748040.19444@pglaf.org>


On Mon, 22 May 2006, David H. Rothman wrote:

> Actually that's http://www.dotreader.com , not .org --mea culpa--and if PG
> really cares about open source, then it should encourage strong open source
> efforts of the OSoft variety rather than just wait until they catch on.

However, Mr. Rothman does not take his own advice here, but only supports
the particular open source projects that support him.


> Here's a little two-man software house in Tacoma, Washington, gambling
> hundreds of thousands of dollars on an open-source reader that can do far
> more than Plucker, allowing blogs and forums to be embedded inside books.

Again, Mr. Rothman should take his own advice. . .if he were really doing
his bit to suport this "little two-man software house in Tacoma" he would
not be mentioning them as anonymous creatures sitting behind keyboards.


> Plucker has many appreciative users, but dotReader/OpenReader will be of far
> greater importance to commercial publishers, who are already starting to
> show interest.

This is what everyone says about every project.

Let's not confuse the press releases with reality.

BTW, some people are totally amazed at how many Plucker files we send out.
I got an independent comment on that earlier this week.

However, to address Mr. Rothman's point that we should promote this one
particular piece of open source programming, with or without the hundreds
of thousands of dollars he mentioned, with or without programmers' names,
Project Gutenberg is not in the business of establising businesses.

However, on the other hand, if Mr. Rothman were to read the Newsletters,
he would know that it is only one stop to putting in an announcement.

"Better to light a single candle,
than to curse the darkness."


> In turn, that'll be wonderful for PG works and other public domain books.
> dotReader reader can work with many kinds of books while improving the user
> experience.
>
> dotReader uses a turbocharged version of existing e-book standards that
> techies and publishers have thrashed aroundfor years.

And this means. . .. ?


> It's the best of all worlds: open source for programmers and a powerful free
> reader for users--and e-book standards similar to existing ones for publishers.
> Plus, dotReader can handle other XML/CSS-related formats as well.

Similar?

Will it be able to use these similar files?


>> We serverd 89504 plucker books in May 2006.
>
> I think you'll do much better with OpenReader available as well. OSoft's
> e-reader for the format is a thing of beauty, and, as noted, it'll be free
> to download.

So is Adobe Acrobat Reader.


From hart at pglaf.org  Tue May 23 08:03:21 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 08:03:22 2006
Subject: [gutvol-d] Outsourcing scanning
In-Reply-To: <4472311C.3020408@bohol.ph>
References: <48c.693268.31a37fc3@aol.com>
	<6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
	<4472311C.3020408@bohol.ph>
Message-ID: <Pine.LNX.4.60.0605230802440.19444@pglaf.org>


Several people I know have tried oursourcing scanning, OCR, etc.,
but all with disappointing results.

Sorry,

mh

On Mon, 22 May 2006, Jeroen Hellingman (Mailing List Account) wrote:

>
> After demonstrating PGDP to some people, I got in touch with an NGO who likes 
> to scan their entire holdings, and make it available on the web.
>
> Has anybody on this list experience with outsourcing scanning jobs (on a 
> larger scale)? I am looking at a project which includes about half a million 
> pages that need to be digitized. Ofcourse I am not going to scan that much 
> myself, and I heard prices at that scale can be as low as a few cents per 
> page when done in the Philippines. Has anybody prepared documents describing 
> quality control processes, etc., for such a bulk process. Hopefully, much of 
> the material will be made available on-line, (although it will not copyright 
> clear with PG, I don't expect issues with copyright). I may even set up a 
> 'Distributed Proofreading' system for it.
>
> Jeroen.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From davidrothman at pobox.com  Mon May 22 13:41:32 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Tue May 23 08:06:44 2006
Subject: OpenReader vs. the troll in the basement [re: [gutvol-d] Kevin Kelly
	in NYT on future of digital libraries]
In-Reply-To: <20060522164050.939444851.davidrothman@pobox.com>
References: <1bc.5089b96.31a33cbb@aol.com>
	<20060522150104.1080651974.davidrothman@pobox.com>
	<20060522150149.325320550.davidrothman@pobox.com>
	<20060522150332.110483112.davidrothman@pobox.com>
	<20060522151120.333718414.davidrothman@pobox.com>
	<20060522151447.591623163.davidrothman@pobox.com>
	<20060522151743.130547599.davidrothman@pobox.com>
	<20060522151814.1734039935.davidrothman@pobox.com>
	<20060522152006.1760790518.davidrothman@pobox.com>
	<20060522152114.1379513363.davidrothman@pobox.com>
	<20060522152413.1383410050.davidrothman@pobox.com>
	<20060522152716.66366918.davidrothman@pobox.com>
	<20060522153443.1213060337.davidrothman@pobox.com>
	<20060522153518.272947584.davidrothman@pobox.com>
	<20060522153553.2079047670.davidrothman@pobox.com>
	<20060522153641.560443431.davidrothman@pobox.com>
	<20060522155003.2005252714.davidrothman@pobox.com>
	<20060522160310.484640607.davidrothman@pobox.com>
	<20060522160741.417495629.davidrothman@pobox.com>
	<20060522161008.1626692064.davidrothman@pobox.com>
	<20060522161155.1281312296.davidrothman@pobox.com>
	<20060522161648.620467403.davidrothman@pobox.com>
	<20060522163407.1897590220.davidrothman@pobox.com>
	<20060522164024.966193868.davidrothman@pobox.com>
	<20060522164045.1760429003.davidrothman@pobox.com>
	<20060522164050.939444851.davidrothman@pobox.com>
Message-ID: <20060522164132.1934380741.davidrothman@pobox.com>

> that "strategy" was worn out years ago here..

But you're STILL a troll ;-) And a censor, too. Your reverted to the old subject line without the T Word, and I don't mean "TeleRead." Doesn't this suggest that lists should be run with a little bit of order in mind? Well, blog areas, too, including the TeleBlog.

Here's the deal. You gratuitously attacked OpenReader out of the blue. After my present message, we'll both have had our say; and now I think the PG list should get back to being the PG list. Meanwhile thanks for documenting that your source code is not open, and that you're really after 200K rather than 50K, if you're serious about selling the code.

> as for my app itself, i've always said it's available for free.

but not disclosure of the source code? this is bizarre. evil corporate two-guy osoft will offer downloads of dotreader for free 
and publicly reveal the source code of the basic reader. and yet the e.e. cummings of the performance poetry circuit wants 200K for his code. i am worried. surely we are all doomed to the maw of mammon if even poets are demanding 200k. 

of course, the real reason could be that you're ashamed of the source code--hence, the 200K price, if financial gain isn't the object.

sure you don't want to share your app's code?

> they should just go off and write it themselves, so
> i don't think you could call me a very good businessman.

Oh, well, so much for your added value.

> p.s.  evidently, book expo ain't keeping you very busy.
kinda slow around the openreader booth, is it?       :+)

ROFL. Um, the show ended yesterday. I drove Jon Noring out to see Mt. Vernon (almost all work up until now), and, after a TeleBlog post, I'm gonna return to follow-up. Wait. Might do a few of the just-received notes first. Not sure.

> considering the costs involve application of heavy markup...
> i'm not sure how you will ever be able to convince anyone
> that your cost-benefit ratio will be the best one available.

Hey, we care very much about creation tools and the like to simplify things. Your own format isn't up to the range of content that OpenReader can handle, thereby reducing the chances of collecting your $200K, if that's what you want. Good luck at it, however. And now I suggest that you do the Netiquette routine and avoid a reply, now that we've both had our say. Remember, you were the one who broached these issues. Again, best of luck. Despite more than a little provocation, my big interest is in advancing OpenReader rather than harming ZML.

Thanks,
David

David Rothman | davidrothman@openreader.org | 703-370-6540
http://www.openreader.org
http://www.teleread.org/blog


------------Original Message------------
From: Bowerbird@aol.com
To: gutvol-d@lists.pglaf.org, Bowerbird@aol.com
Date: Mon, May-22-2006 12:11 PM
Subject: re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
geez, david, you'll have to do _something_
more original than calling me a "troll" here;
that "strategy" was worn out years ago here...

as for your fantasy that i "object" to openreader
because i have "competitive business interests",
what exactly would those "business interests" be?

my source code is available for $200,000, (not $50k),
but i will tell anyone who wants to buy it not to bother,
that they should just go off and write it themselves, so
i don't think you could call me a very good businessman.

as for my app itself, i've always said it's available for free.
again, not a good businessman.  what can i say?  i'm a poet.

a good businessman would get a booth at book expo,
and hawk the product there.  just like you're doing now.

for the record, i wish the osoft people all the best.
you were very lucky they came along when they did,
and i'll be glad when they finally turn your precious
"openreader" format into something besides vapor.

at that point, we will be able to measure its benefits and
its costs, to decide how worthwhile its cost-benefit will be.

considering the costs involve application of heavy markup,
and similar benefits can be delivered with lighter markup,
i'm not sure how you will ever be able to convince anyone
that your cost-benefit ratio will be the best one available.
but hey, microsoft has been able to do that for _years_, so
don't give up...

-bowerbird

p.s.  evidently, book expo ain't keeping you very busy.
kinda slow around the openreader booth, is it?       :+)
_______________________________________________ 
gutvol-d mailing list 
gutvol-d@lists.pglaf.org 
http://lists.pglaf.org/listinfo.cgi/gutvol-d 

From davidrothman at pobox.com  Mon May 22 17:38:39 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Tue May 23 08:06:45 2006
Subject: dotReader [Re: [gutvol-d] re: blogs in e-books]
Message-ID: <5eff08fa0605221738w772f25d0oe789fe9838507cc8@mail.gmail.com>

>  so why are you hyping it like it's such a big deal?

But here we're talking about these capabilities skillfully integrated
in a reader about to hit the market--and especially a standards-based
one.

> >  but since you've brought it up, will these "blogs and forums"
>  that are "contained" within openreader e-books from osoft
>  be addressable by the general public using web-browsers?

Yes. You can still have blogs and forums accessible through general
Web browsers. Jon would be a better one to discuss this, but obviously
it's a server issue. You unwittingly made the point in linking via
e-mail to a browser-readable blog. Same blog could appear in dotReader
or another OpenReader implementation.

> >  or, like the current osoft thoutreader, will people need to use
>  that particular piece of software in order to view the comments?

Thanks for helping me make the case for meaningful e-book standards
;-))))))))) This is exactly why OSoft is so keen on the OpenReader
format.

I msyelf don't want OSoft-format editions of PG books. I want
OpenReader-format books, and OSoft agrees--hence, dotReader's use of
OpenReader as the featured format.

Beyond that, I'll be disgusted if dotReader and OSoft are the only
ones able to do justice to the format. Jon's itching to work with
FBReader (a planned implementer) and others.

> >  a clear answer will tell us a lot about your attitude on "lock-in".

Well, I don't see how clearer the answer could be that than. If PG
wants to free its books from lock-in and add new capabilities,
especially interactivity, then a dotReader/OpenReader approach would
be the way to go. At the same time PG could still offer other formats,
including, yes, ZML, which would be trivial for dotReader to read. Let
the marketplace decide.

Elsewhere you write:

> i'm of the opinion that most books probably will not be able
> to find a sufficiently large number of commenters to warrant
> the work that an author will have to do to open up the process
> of writing to such interaction.  but it's an interesting experiment

It'll happen if e-books are easier for schools and libraries to use.
Tearing down the Tower of eBabel would be a start. Plus, major
publishers are talking to us about commercial uses of the
interactivity, such as for book clubs. Popularity of a book, by the
way, is only one determinant of whether there'll be commenters. The
fervor of the particpants matters, and that could happen with Long
Tail books, not just best-sellers.

As for if:book's cool work, the example you gave is Web based.
dotReader and hopefully Sophie will go beyond that. A toast to both!

David


On 5/22/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
> david said:
>  >   BIRD: In the snippet below, you simply LINKED to a blog.
>
>  and in most cases, that will be quite good enough, thanks,
>  because people can then read the blog there, and comment.
>
>  but since you've brought it up, will these "blogs and forums"
>  that are "contained" within openreader e-books from osoft
>  be addressable by the general public using web-browsers?
>
>  or, like the current osoft thoutreader, will people need to use
>  that particular piece of software in order to view the comments?
>
>  a clear answer will tell us a lot about your attitude on "lock-in".
>
>
>  >   dotReader, OpenReader's first implementation, will
>  >   CONTAIN blogs and forums and make them readable
>  >   to users even when they're offlline.
>
>  it's not hard to implement that...
>
>  the app just downloads the content,
>  and saves it for offline presentation,
>  then uploading what is to be posted,
>  capabilities already included in many
>  r.s.s. readers and blogging software...
>
>  depending on if people actually use it,
>  it could end up being a neat technology,
>  much like all the other instantiations of it
>  i discussed in my earlier post in this thread.
>
>  my point was that it's not difficult to implement.
>
>  it's not.
>
>  so why are you hyping it like it's such a big deal?
>
>  -bowerbird
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
>
From davidrothman at pobox.com  Mon May 22 22:05:42 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Tue May 23 08:06:45 2006
Subject: [gutvol-d] re: blogs in e-books
In-Reply-To: <4a2.583199.31a3df83@aol.com>
References: <4a2.583199.31a3df83@aol.com>
Message-ID: <5eff08fa0605222205k6e9acf5bm1f40f09c6e5c5fc9@mail.gmail.com>

> will each and every blog/forum inside an openreader book
> be accessible with a general web-browser

I've already answered, and you're ill-serving PG members by trimming
out my response. The publisher could arrange for the same material to
be accessible via a server. As for "each and every"--well, that's up
to the publisher. But technically, it certainly would appear to be
possible. I'll cc Jon in case I'm somehow wrong on a nuance.

What's more, if the reader has the free reader and is using it as a
plug-in, he/she could go directly to the book from the Web and, if I'm
not mistaken, even reach an anchor. So why not a blog? In other words,
the Web and the book converge. Remember, the goal is for the reader to
reflect a STANDARD. So we're not talking about the proprietary act.
Like the browser, the reader will not be limited to a proprietary
format if it honors the standard.

>because as far as i know, i can't put a link in this e-mail that
would take the user to a comment in a thoutreader e-book.
but if it can be done, then by all means, please show us.

Nice going, Bowerbird. I've already said this is a server thing re the
general-purpose browser. OSoft did not set up its server that way, but
that's hardly an indication it can't be done. What's more, I've
already described the plug-in approach.You're extrapolating from a
specific setup at a specific company--one that will change in time.

David


------------Original Message------------
From: Bowerbird@aol.com
To: davidrothman@pobox.com, gutvol-d@lists.pglaf.org
Cc: Bowerbird@aol.com
Date: Mon, May-22-2006 11:46 PM
Subject: re: [gutvol-d] re: blogs in e-books
david said:
>   You unwittingly made the point in linking via e-mail
>   to a browser-readable blog. Same blog could appear
>   in dotReader or another OpenReader implementation.

that doesn't answer the question.

will each and every blog/forum inside an openreader book
be accessible with a general web-browser?

because as far as i know, i can't put a link in this e-mail that
would take the user to a comment in a thoutreader e-book.

but if it can be done, then by all means, please show us.

-bowerbird


On 5/22/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
> david said:
>  >   You unwittingly made the point in linking via e-mail
>  >   to a browser-readable blog. Same blog could appear
>  >   in dotReader or another OpenReader implementation.
>
>  that doesn't answer the question.
>
>  will each and every blog/forum inside an openreader book
>  be accessible with a general web-browser?
>
>  because as far as i know, i can't put a link in this e-mail that
>  would take the user to a comment in a thoutreader e-book.
>
>  but if it can be done, then by all means, please show us.
>
>  -bowerbird
>
From davidrothman at pobox.com  Mon May 22 22:52:08 2006
From: davidrothman at pobox.com (David H. Rothman)
Date: Tue May 23 08:06:46 2006
Subject: [gutvol-d] re: blogs in e-books
In-Reply-To: <5eff08fa0605222205k6e9acf5bm1f40f09c6e5c5fc9@mail.gmail.com>
References: <4a2.583199.31a3df83@aol.com>
	<5eff08fa0605222205k6e9acf5bm1f40f09c6e5c5fc9@mail.gmail.com>
Message-ID: <20060523015208.1501928951.davidrothman@pobox.com>

> > will each and every blog/forum inside an openreader book
> > be accessible with a general web-browser
> 
> I've already answered, and you're ill-serving PG members by trimming
> out my response. The publisher could arrange for the same material to
> be accessible via a server. As for "each and every"--well, that's up
> to the publisher. But technically, it certainly would appear to be
> possible. I'll cc Jon in case I'm somehow wrong on a nuance.

I'll hasten to add--just so it's clear even to newbies--a WEB server. The Web and the book-based blogs/forums converge. A general brower can work just as it would on other blogs/forums reachable through the Web.

David

From hart at pglaf.org  Tue May 23 08:14:34 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 08:14:35 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital  libraries
In-Reply-To: <6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
References: <48c.693268.31a37fc3@aol.com>
	<6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
Message-ID: <Pine.LNX.4.60.0605230806330.19444@pglaf.org>


On Mon, 22 May 2006, Frank van Drogen wrote:

>
>> it's clear that google has gotten their legs under them
>> in regard to doing the scanning.  let's hope that they'll
>> get their quality-control under control very soon too...

After about 25% of their 6 year schedule to 10 million books,
it would appear they are approaching 1% or 100,000 total books,
with perhaps half of those easily downloadable, but in varying
states of completion and accuracy.

If you presume they keep up with Moore's Law, 6 years looks like:

       Totals  Dates  Doublings Years

         00  Dec 14, 2004   0    0
     50,000  Jun 14, 2006   1   1.5
    100,000  Dec 14, 2007   2    3
    200,000  Jun 14, 2009   3   4.5
    400,000  Dec 14, 2010   4    6

which continues as

    800,000  Jun 14, 2012   5   7.5
  1,600,000  Dec 14, 2013   6    9
  3,200,000  Jun 14, 2015   7  10.5
  6,400,000  Dec 14, 2016   8   12
12,800,000  Jun 14, 2018   9  13.5

which would put them at over 12 years to their 10 million books
in terms of downloadable eBooks.

However, if you presume they have 100,000 by June 14, 2006,
this would take 18 months off their total time, by counting
non-downloadable and non-readable books.


> I have found less missing pages and other problems in books from Google then 
> in those from the MBP and Canadian/IA. They are, however, still far from 
> perfect. When they get a report regarding a missing or wrongly scanned page 
> in a PD book; it is apparently up to the providing library to get the problem 
> sorted out. I've heard report of complete books being rescanned (with the 
> risk of having another page missing in the end ;) ). I've also heard somebody 
> mentioning that the full rescanned book was stuck behind the existing one 
> (rather space consuming, but for DP purposes a lot saver.
>
> What worries me in this is that Google doesn't seem to care whether pages are 
> missing or not... as long as they get 99% of the pages from a book stored, 
> changes are most search terms pointing to the particular book will be 
> identified. Their interest lies in people purchasing the book via Amazon, Abe 
> etc. after identifying them via book.google.com.

When your goal is simply the appearance of having a lot of books,
99% is a perfectly good business plan.

And if your goal is to get people to BUY the books from your other
business partners, then there is even less reason for moving to 99+%.


> The best quality control I have encountered so far is on Gallica, where 
> appart from missing pages due to those pages missing in the original scanned 
> manuscript, I've not encountered incomplete books. I'd be actually 
> interesting to see how they perfrom their quality control.

If you can give me any contact info on Gallica,
I will see if I can find out for you.


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From Bowerbird at aol.com  Tue May 23 08:14:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue May 23 08:14:43 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <49c.66a23e.31a480df@aol.com>

michael said:
>   I seem to recall an earlier report from someone 
>    who did lots of searches for Google books and 
>    determined that 88% of them were published after 1922.

i've posted this before, taken
from lorcan dempsey's weblog,
summarizing an article in d-lib.
i always find it again easily by
searching his site for "anatomy".

>    http://orweblog.oclc.org/archives/000800.html
>    The anatomy of an aggregate collection
>    September 17, 2005
>    Approximately half of the print books 
>    in the combined Google 5 collection 
>    were published after 1974. 
>    Almost three-quarters were published 
>    after the Second World War. 
>    Using the year 1923 as a rough break-off 
>    point between materials that are 
>    out of copyright and materials that are 
>    in copyright [16], more than 80 percent 
>    of the materials in the Google 5 collections are 
>    still in copyright (this is of course an upper bound).

if google has scanned roughly 100,000 pre-1923 items,
and they were taking books off the shelves randomly,
then we could assume they scanned 400,000 post-1923.

but if we assume they were doing the pre-1923 items first,
100,000 pre-1923 scanned means 100,000 total scanned.

seems to me assuming things does us absolutely no good.

but google is _going_ to scan 10+ million books, eventually,
so i'm not sure what difference it makes _how_many_ they've
done "so far".   are we really questioning their _resolve_ here?
seems to me that they've proven they are dedicated to this...

so attempts to figure out "how many books so far?" are silly.

especially since we know that many of the post-1923 items
did not have their copyrights renewed -- except that we do
_not_ know what percentage, and thus cannot even _assume_
the answer to that important question, not with any certainty.

if we say that half of the post-1923 books were not renewed,
then that means that 60% (20% and 40%) are not in copyright.
if we say that 1/3 of the post-1923 books were not renewed,
then that means that 53% (20% and 33%) are not in copyright.
if we say that 2/3 of the post-1923 books were not renewed,
then that means that 86% (20% and 66%) are not in copyright.

not that the answer would matter any, because due to the
litigious arena into which we have allowed the project to
be thrown, there's probably no way google would be likely
to take the risk of showing _any_ of the orphaned material.
so we're back to the original 20% that is pre-1923 and clear.

of course, the answer to this is to give google an immunity,
to let them serve as the "test-bed" that will act to bring out
any claims of copyrighted material that might be lurking...

in other words, let google show each book, in full, _until_
some _proof_ of copyright is rendered by another party.
(and i do mean proof, and not just some bullshit claim...)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/03351ed8/attachment.html
From hart at pglaf.org  Tue May 23 08:18:24 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 08:18:25 2006
Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]
In-Reply-To: <48b.696020.31a3809a@aol.com>
References: <48b.696020.31a3809a@aol.com>
Message-ID: <Pine.LNX.4.60.0605230815380.19444@pglaf.org>


It is quite obvious that many people view both Bowerbird's and Mr. Rothman's
comments as attacks, even Bowerbird and Mr. Rothman, though rarely would the
mention include self-reflection on this matter.

Honey versus vinegar?

The real question is whether either of them, or the issues they promote,
bring any real advances to the world of eBooks.


On Mon, 22 May 2006 Bowerbird@aol.com wrote:

> david said:
>>    You gratuitously attacked OpenReader out of the blue.
>
> no, i didn't.
>
> but i'm sure you'd like to spin it that way.
>
> what i said was that you were making a big deal about
> "putting a blog inside an e-book", when that is actually
> a somewhat trivial thing to do.
>
> look, i've inserted a blog inside of this e-mail:
>>
> http://www.buzzmachine.com/index.php/2006/05/19/the-book-is-dead-long-live-the-book/
>
> -bowerbird
>
From hart at pglaf.org  Tue May 23 08:20:51 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 08:20:53 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <48c.693268.31a37fc3@aol.com>
References: <48c.693268.31a37fc3@aol.com>
Message-ID: <Pine.LNX.4.60.0605230818590.19444@pglaf.org>

On Mon, 22 May 2006 Bowerbird@aol.com wrote:

> frank said:
>>    Even that number is a misinterpretation
>
> thanks for clearing that up for us, frank...
>
> it's clear that google has gotten their legs under them
> in regard to doing the scanning.   let's hope that they'll
> get their quality-control under control very soon too...
>
> it is important to keep in mind that 100,000 books is
> <1% of the 10.5 million (or more) they'll do eventually;
> it's understandable if the process isn't up to speed yet.

I wonder how great a percentage of Google's six year plan
will have to expire before Mr. Bowerbird will admit that
it doesn't look as if Google is even trying to make it to
10 million in 6 years.

My own projections show it taking about twice that long,
if Mr. Bowerbird is correct, and they have indeed gotten
their feet under them already.

mh
From davidrothman at yahoo.com  Tue May 23 08:37:53 2006
From: davidrothman at yahoo.com (David H. Rothman)
Date: Tue May 23 08:37:56 2006
Subject: [gutvol-d] USA Today;
In-Reply-To: <20060523145110.GA21391@pglaf.org>
References: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com>
	<20060523145110.GA21391@pglaf.org>
Message-ID: <5eff08fa0605230837n4b7272dcm917d71ec7cb1b216@mail.gmail.com>

Many thanks, Greg. Those are all extremely reasonable conditions, and
I'll forward this to the appropriate folks, so they can be in direct
touch with you. We're eager to work with PG/DP and blend in well with
everyone's workflow. I also agree with you on the need for thinking
through the support issues. You could share with us the lessons you've
learned from Plucker. - David

> For the OpenReader format, as Marcello said there is
> no conceptual resistance to using this as a "convert
> to" format at gutenberg.org, just as plucker is.  All
> we need is a clear and preferably open source processing
> chain that we can insert into the ibiblio.org site.  Also,
> of course, a reasonable support community so that PG
> help staff (me, George & Marcello) don't end up being
> too challenged in supporting the format.
>
> In short, as you've heard before, you should feel encouraged
> to "go for it."
>   -- Greg


On 5/23/06, Greg Newby <gbnewby@pglaf.org> wrote:
> On Tue, May 23, 2006 at 04:07:06AM -0400, David H. Rothman wrote:
> > ....
> > PG people are very welcome to write me privately or phone
> > me--especially Greg, if he's really serious about the comments he made
> > to U.S. Today extolling interactivity. Here's PG's chance to adopt a
> > powerful format (OpenReader) and enjoy readers worthy of it (dotReader
> > and in the future FBReader). I'm all ears as far as suggestions from
> > Greg or anyone else, and I know others will be as well.
>
> I enjoyed reading those quotes, and they're pretty
> accurate from an interview I did a few weeks ago
> concerning launch of the newest Sony eBook reader
> with electronic ink.
>
> (I was just in Tokyo two weeks ago, and was unable
> to find one of these units for sale.  I didn't look
> all that hard, but peered closely in the PDA section
> of Bic Camera which is a huge electronics chain store).
>
> They somehow recycled the article for USA Today --
> nice to see.  Of course I'm serious about limitations
> of eBook readers, and am against any format that
> is one-way, closed, non-fixable/editable, etc.  This
> is a thread in the "about" essays Michael and I worked
> on:  http://www.gutenberg.org/about , with a key theme
> being "unlimited distribution."
>
> For the OpenReader format, as Marcello said there is
> no conceptual resistance to using this as a "convert
> to" format at gutenberg.org, just as plucker is.  All
> we need is a clear and preferably open source processing
> chain that we can insert into the ibiblio.org site.  Also,
> of course, a reasonable support community so that PG
> help staff (me, George & Marcello) don't end up being
> too challenged in supporting the format.
>
> In short, as you've heard before, you should feel encouraged
> to "go for it."
>   -- Greg
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From Bowerbird at aol.com  Tue May 23 08:43:27 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue May 23 08:43:38 2006
Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]
Message-ID: <368.52ae186.31a4879f@aol.com>

michael said:
>    It is quite obvious that many people view 
>    both Bowerbird's and Mr. Rothman's comments as attacks, 
>    even Bowerbird and Mr. Rothman, though rarely would the
>    mention include self-reflection on this matter.

for the record, i do not feel i have made _any_ "attacks".

and i firmly believe if you look at what i have actually said -- 
as opposed to how david has _characterized_ what i've said 
-- you will see that that is so.   indeed, if you find otherwise,
i'd be happy for you to draw attention to it, so i can explain.

i have said _unflattering_ things.   but with good evidence.
i _never_ resort to spin-doctoring, such as his name-calling.
he flames me, and then tries to blame it on me by calling me
"a troll".   don't tell me you can't see through that transparency.

and really, take a look at the latest thread.   i made a simple point
-- which is that it is relatively easy for a programmer to "embed"
shared annotations into an e-book -- and he eventually ended up
dragging in a _myriad_ of unrelated charges, some of them _silly_.

meanwhile, it still remains easy to embed annotations in an e-book.
contrary to what a naive observer might be led to believe by teleblog
-- or even his recent posts here -- openreader is _not_ the only way
to provide "interactivity" to electronic-books.   not even close.

all these other issues that he is throwing up are a _smokescreen_
intended to divert your attention from that very simple fact, and
if anyone reading these posts doesn't realize that, then i _fear_
for their reading comprehension.   of course, david doesn't _want_
people to read these posts, that's why he's throwing up a barrage,
hoping that lurkers will just view the subject-headers (where he
promulgates his smear job by prominently using the "troll" word).

is what i'm saying here unflattering?   you betcha.   is it true?   you 
betcha.

but i don't do "attacks".   i do hard-headed analysis with a focus on fact.

and i don't get emotionally involved -- it interferes badly with the logic.
it's a lot smarter to stay on-point.

i do a lot of self-reflection.   i can look at myself in a mirror just 
fine...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/c3c641d4/attachment.html
From Bowerbird at aol.com  Tue May 23 09:05:19 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue May 23 09:05:30 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <483.8864e2.31a48cbf@aol.com>

michael said:
>    I wonder how great a percentage of Google's six year plan
>    will have to expire before Mr. Bowerbird will admit that
>    it doesn't look as if Google is even trying to make it to
>    10 million in 6 years.

michael, it certainly isn't necessary to call me "mr." bowerbird.
but hey, it sounds kinda funny and cute, so please, be my guest.

as for google's plan, i laid out my prediction last december:
    december 14, 2004 -- 0 books
    december 14, 2005 -- 10,000 books
    december 14, 2006 -- 100,000 books
    december 14, 2007 -- 1,000,000 books
    december 14, 2008 -- 10,000,000 books

so not only do i think they are still on-track, and doing well,
i actually think they'll wrap it up by the end of 2008, michael,
_if_ they stop at the 10.5 million unique titles they have now.

but i think the courts will clear a path for them by that time,
and more libraries will come on-board, and their focus will
expand from books into the wide variety of _other_ content
commonly found in libraries, including much of the local stuff
found in libraries nationwide, so that by december 14, 2012,
they will have scanned a grand total of some 100 million items,
at a cost of $10 billion.   (all the local stuff will jack up the cost,
from $1 billion to $10 billion.   but by this time, the google boys
will be worth $25 billion each, and google itself $75 billion, so
this will just be a cost of doing business written off their taxes.)

see, when you've got a ton of money, moore's law becomes
your _bottom_ bound, not your top one.   need to go faster?
all you need to do is buy more scanners and hire more people.


>    My own projections show it taking about twice that long,

so what if it does take "twice as long"?   or 3 or 4 times as long?

really, so what?

your own e-library took 35 years to get to 20,000 items,
and i think it's one of the best things in all of cyberspace...

well, except for some of those videos over on youtube.
people are really funny and creative, know what i mean?      ;+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/52784213/attachment.html
From hart at pglaf.org  Tue May 23 09:12:09 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue May 23 09:12:10 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <44721054.3000104@ibiblio.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
Message-ID: <Pine.LNX.4.60.0605230852411.19444@pglaf.org>


On Mon, 22 May 2006, Michael Dyck wrote:

> Michael Hart wrote:
>> 
>> If we double that number to 100,000, we could pretend these
>> results indicated that Google had accomplished 1% of a goal
>> of 10,000,000 books, in 25% of their 6 year plan.
>
> In 1993, PG had accomplished 1% of its goal of 10,000,
> in about 70% of the total time.

Only if you keep refusing to acknowledge that there was not an
ordinary production schedule until 1991. . . .

Mr. Dyck has been refusing to acknowlege for some time that an
ordinarly growth curve is impossible to create when the growth
is linear. . .i.e. one per year. . . .

And sometimes it was even less, as when the copyright law fell
out of favor in 1976, and was replaced with a longer one, thus
ruining our first efforts at a Complete Shakespeare.

Project Gutenberg growth curves have always been presented for
dates starting in January of 1991, the first year of scheduled
production rates:

  1 per month in 1991
  2 per month in 1992
  4 per month in 1993
  8 per month in 1994
16 per month in 1995
32 per month in 1996 97 98 99 [Survived a big financial loss]
      [A big reason we don't let money rule Project Gutenberg]


Obviously by doubling the total number of books from 1971-90
in the single year of 1991, and then again, and again in the
next few years, eliminates any usefulness of incorporating a
linear, or less, growth rate that previously existed.

Here is the current graph, also included as an attachment to
insure it is not broken by mailer margination.

12341234123412341234123412341234123412341234123412341234123412341234
-90--91--92--93--94--95--96--97--98--99--00--01--02--03--04--05--06-
                                                  *Perhaps    20K   >07/06 20K
                                                 *Estimated  19.5K >05/06 19.5K
19,020 on March 31, 2006                                   19K   > 03/06  19K
18,500 on February 13, 2006                               18.5K > 02/06  18.5K
Added ~216 from PG of Europe January 01, 2006             18K   > 01/06   18K
Added 1 from PG PrePrint Site, January, 2006           17.5K  > 11/05    17.5K
                                                         17K   > 08/05     17K
                                                        16.5K > 06/05     16.5K
                                                        16K   > 04/05      16K
                                                       15.5K > 02/05      15.5K
                                                       15K   > 01/05       15K
                                                     14.5K  > 11/04       14.5K
                                                     14K    > 10/04        14K
                                                    13.5K  > 08/04        13.5K
                                                    13K   > 06/04          13K
                                                   12.5K  > 04/04         12.5K
                                                   12K   > 03/04           12K
                                                  11.5K  > 02/04          11.5K
                                                   11K   > 01/04           11K
                                                  10.5K > 11/03           10.5K
               >>>  October 15, 2003  >>>           10K > 10/03           *10K*
                                                 9,500 > 9/03             9,500
                                                 9,000 > 8/03             9,000
                                                 8,500 > 7/03             8,500
                                                8,000 > 5/03              8,000
                                               7,500 > 3/03               7,500
    Note this graph is in 1/4 years            7,000 > 1/03               7,000
                                              6,500 > 12/02               6,500
                                             6,000 > 9/02                 6,000
                                             5,500 > 7/02                 5,500
                                            5,000 > 4/02                  5,000
                                           4,500 > 2/02                   4,500
Added PG Australia in August, 2001        4,000> 10/01                   4,000
                                         3,500 >5/01                      3,500
                                       3,000> 12/00                       3,000
                                     2,500 > 8/00                         2,500
                                  2,000 > 12/99                           2,000
                               1,500 > 10/98                              1,500
                         1,000 > 8/97                                     1,000
                      500 > 4/96                                            500
            100 >12/93                   <<<December 10, 1993               100
  10>12/90                                                                   10
-90--91--92--93--94--95--96--97--98--99--00--01--02--03--04--05--06- YEARS
12341234123412341234123412341234123412341234123412341234123412341234 QUARTERS


-------------- next part --------------
12341234123412341234123412341234123412341234123412341234123412341234

-90--91--92--93--94--95--96--97--98--99--00--01--02--03--04--05--06-

                                                 *Perhaps    20K   >07/06 20K

                                                *Estimated  19.5K >05/06 19.5K

19,020 on March 31, 2006                                   19K   > 03/06  19K

18,500 on February 13, 2006                               18.5K > 02/06  18.5K

Added ~216 from PG of Europe January 01, 2006             18K   > 01/06   18K

Added 1 from PG PrePrint Site, January, 2006           17.5K  > 11/05    17.5K

                                                        17K   > 08/05     17K

                                                       16.5K > 06/05     16.5K

                                                       16K   > 04/05      16K

                                                      15.5K > 02/05      15.5K

                                                      15K   > 01/05       15K

                                                    14.5K  > 11/04       14.5K

                                                    14K    > 10/04        14K

                                                   13.5K  > 08/04        13.5K

                                                   13K   > 06/04          13K

                                                  12.5K  > 04/04         12.5K

                                                  12K   > 03/04           12K

                                                 11.5K  > 02/04          11.5K

                                                  11K   > 01/04           11K

                                                 10.5K > 11/03           10.5K

              >>>  October 15, 2003  >>>           10K > 10/03           *10K*

                                                9,500 > 9/03             9,500

                                                9,000 > 8/03             9,000

                                                8,500 > 7/03             8,500

                                               8,000 > 5/03              8,000

                                              7,500 > 3/03               7,500

   Note this graph is in 1/4 years            7,000 > 1/03               7,000

                                             6,500 > 12/02               6,500

                                            6,000 > 9/02                 6,000

                                            5,500 > 7/02                 5,500

                                           5,000 > 4/02                  5,000

                                          4,500 > 2/02                   4,500

Added PG Australia in August, 2001        4,000> 10/01                   4,000

                                        3,500 >5/01                      3,500

                                      3,000> 12/00                       3,000

                                    2,500 > 8/00                         2,500

                                 2,000 > 12/99                           2,000

                              1,500 > 10/98                              1,500

                        1,000 > 8/97                                     1,000

                     500 > 4/96                                            500

           100 >12/93                   <<<December 10, 1993               100

 10>12/90                                                                   10

-90--91--92--93--94--95--96--97--98--99--00--01--02--03--04--05--06- YEARS

12341234123412341234123412341234123412341234123412341234123412341234 QUARTERS


From gbnewby at pglaf.org  Tue May 23 09:21:58 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue May 23 09:22:00 2006
Subject: [gutvol-d] USA Today;
In-Reply-To: <5eff08fa0605230837n4b7272dcm917d71ec7cb1b216@mail.gmail.com>
References: <5eff08fa0605230107o4769a8afofab749d48e78e244@mail.gmail.com>
	<20060523145110.GA21391@pglaf.org>
	<5eff08fa0605230837n4b7272dcm917d71ec7cb1b216@mail.gmail.com>
Message-ID: <20060523162158.GA24437@pglaf.org>

On Tue, May 23, 2006 at 11:37:53AM -0400, David H. Rothman wrote:
> Many thanks, Greg. Those are all extremely reasonable conditions, and
> I'll forward this to the appropriate folks, so they can be in direct
> touch with you. We're eager to work with PG/DP and blend in well with
> everyone's workflow. I also agree with you on the need for thinking
> through the support issues. You could share with us the lessons you've
> learned from Plucker. - David

I haven't learned any particular lesson from Plucker, which is
probably good...  Marcello might have some views on how things
integrate.

Mail to help@pglaf.org goes to me & George Davis...George
answers most of them.  We get 5 or so inquiries per day.  Frustrations
come from our .lit, .pdb and .mp3 files which, when broken
our outdated, cannot be easily fixed.

We get frequent requests to submit this and that format, 
including some people who do the work of conversion then 
send me files.  Lots of PDF, but pretty well any format you
can think of (.doc, etc.).  For the most part I don't want to
add such formats in static files to the PG collection, instead
prefering conversion on the fly.

The goal, as oft stated, is automated conversion to many
formats from XML or HTML input.  Several people have made
great progress on this, and the XML production chain at
DP is in pretty good shape....but we're not there yet.

The current catalog/download interface at gutenberg.org is
close to the ideal: just a few static files, then a selection
of conversion options.  Today, Plucker is the only one Marcello
has available, but more can be added.  Conversion to PDF,  
MP3 & Braille are at the top of my personal list.

Not all input books or types can be reasonably accurately
converted to any possible format, especially for the
older titles with no well-formed & valid HTML version.
(David Widger converts several dozen eBooks per week,
minimum, to current standards.)
  -- Greg

> >For the OpenReader format, as Marcello said there is
> >no conceptual resistance to using this as a "convert
> >to" format at gutenberg.org, just as plucker is.  All
> >we need is a clear and preferably open source processing
> >chain that we can insert into the ibiblio.org site.  Also,
> >of course, a reasonable support community so that PG
> >help staff (me, George & Marcello) don't end up being
> >too challenged in supporting the format.
> >
> >In short, as you've heard before, you should feel encouraged
> >to "go for it."
> >  -- Greg
> 
> 
> 
> 
> 
> 
> On 5/23/06, Greg Newby <gbnewby@pglaf.org> wrote:
> >On Tue, May 23, 2006 at 04:07:06AM -0400, David H. Rothman wrote:
> >> ....
> >> PG people are very welcome to write me privately or phone
> >> me--especially Greg, if he's really serious about the comments he made
> >> to U.S. Today extolling interactivity. Here's PG's chance to adopt a
> >> powerful format (OpenReader) and enjoy readers worthy of it (dotReader
> >> and in the future FBReader). I'm all ears as far as suggestions from
> >> Greg or anyone else, and I know others will be as well.
> >
> >I enjoyed reading those quotes, and they're pretty
> >accurate from an interview I did a few weeks ago
> >concerning launch of the newest Sony eBook reader
> >with electronic ink.
> >
> >(I was just in Tokyo two weeks ago, and was unable
> >to find one of these units for sale.  I didn't look
> >all that hard, but peered closely in the PDA section
> >of Bic Camera which is a huge electronics chain store).
> >
> >They somehow recycled the article for USA Today --
> >nice to see.  Of course I'm serious about limitations
> >of eBook readers, and am against any format that
> >is one-way, closed, non-fixable/editable, etc.  This
> >is a thread in the "about" essays Michael and I worked
> >on:  http://www.gutenberg.org/about , with a key theme
> >being "unlimited distribution."
> >
> >For the OpenReader format, as Marcello said there is
> >no conceptual resistance to using this as a "convert
> >to" format at gutenberg.org, just as plucker is.  All
> >we need is a clear and preferably open source processing
> >chain that we can insert into the ibiblio.org site.  Also,
> >of course, a reasonable support community so that PG
> >help staff (me, George & Marcello) don't end up being
> >too challenged in supporting the format.
> >
> >In short, as you've heard before, you should feel encouraged
> >to "go for it."
> >  -- Greg
> >
> >
> >
> >_______________________________________________
> >gutvol-d mailing list
> >gutvol-d@lists.pglaf.org
> >http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
From Bowerbird at aol.com  Tue May 23 09:51:27 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue May 23 09:51:37 2006
Subject: [gutvol-d] re: accurately converted to any possible format
Message-ID: <43c.18826a3.31a4978f@aol.com>

greg said:
>    Not all input books or types can be reasonably 
>    accurately converted to any possible format, especially 
>    for the older titles with no well-formed & valid HTML version.

as i've said for years now, with a small commitment from you
to consistent formatting, i could take plain-ascii files as input,
automatically apply the typographic niceties that are expected,
and output the results to .pdf and to .html, such that the .html 
can be converted to a large number of other auxiliary formats.

of course, i'm not unique.   david moynihan has done it for years.

david was willing to make a small commitment to edit the files
himself so as to obtain that consistent formatting.   i think it is
more important to teach you how to fish than to give you fish.

check with 3 tool-makers from distributed proofreaders --
thundergnat, donovan, and bill flis -- and they'll confirm that
a clear path for ascii-to-(x)html conversion is quite workable
-- due to the fact that d.p. now has the required consistency --
even with their current programs, and that if they worked on it
a bit more, they could make it into a regular part of the workflow.

there is no need for the more-complex switch to a .tei workflow.

-bowerbird

p.s.   if you only would have accepted moynihan's offer of his files
when he made it to you, you'd already _have_ a consistent library.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/aa153d27/attachment.html
From marcello at perathoner.de  Tue May 23 13:10:32 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue May 23 13:10:37 2006
Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]
In-Reply-To: <368.52ae186.31a4879f@aol.com>
References: <368.52ae186.31a4879f@aol.com>
Message-ID: <44736C38.104@perathoner.de>

Bowerbird@aol.com wrote:


> it is relatively easy for a programmer to "embed"
> shared annotations into an e-book

How would you know?


> i do a lot of self-reflection.   i can look at myself in a mirror just 
> fine...

Snip. Another one for "The Showcase of Pudd'nhead Bowerbird" at:

  http://www.gnutenberg.de/bowerbird/


-- 
Marcello Perathoner
webmaster@gutenberg.org

From walter.van.holst at xs4all.nl  Tue May 23 14:39:01 2006
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Tue May 23 14:44:08 2006
Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]
In-Reply-To: <44736C38.104@perathoner.de>
References: <368.52ae186.31a4879f@aol.com> <44736C38.104@perathoner.de>
Message-ID: <447380F5.6020204@xs4all.nl>

Marcello Perathoner wrote:
>
>> i do a lot of self-reflection.   i can look at myself in a mirror just 
>> fine...
>>     
>
> Snip. Another one for "The Showcase of Pudd'nhead Bowerbird" at:
>
>   http://www.gnutenberg.de/bowerbird/
>
>   
Aren't you honouring this, ehm, charachter a bit too much by putting 
this much effort into having a collection of his delusions online? I am 
not to judge your pastimes, but personally I would have preferred a 
thorough discussion on the number of angels that fits on the tip of a 
needle instead.

Regards,

 Walter
From Bowerbird at aol.com  Tue May 23 16:37:51 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue May 23 16:50:05 2006
Subject: [gutvol-d] re: Kevin Kelly in NYT on future of digital libraries]
Message-ID: <429.203f0c1.31a4f6cf@aol.com>

i said:
>    >    it is relatively easy for a programmer to "embed"
>    >    shared annotations into an e-book

marcello said:
>    >    How would you know?

um, because i've built a number of prototypes that do it.

of course.

before i make bold assertions, i've done years of research...

didn't i post a message saying i could do a demo-app for david,
to show how simple it is?,   and didn't david say, "ok, go ahead",
and didn't i say "ok, i will, you can expect it by thursday, i'd say?"

just a minute, let me check...

ok, the first two messages did indeed post ("i can do it", "ok, do it"),
but i never got around to sending the third one this morning, sorry...

i'll send it first thing tomorrow, since i think we've had enough for 
today...

still, this is fairly easy to program.   here's the code that sends out a 
note:
>    dim abcd as dictionary
>    dim socket1 as new httpsocket
>    dim lec as integer
>    abcd=new dictionary
>    abcd.value("myname")=editfield1.text
>    abcd.value("myemail")=editfield2.text
>    abcd.value("mycomment")=editfield3.text
>    socket1.setformdata abcd
>    socket1.post "http://users.aol.com/cgi-bin/guestbook/bowerbird/bbb.html"

that's it.   all the code you need.

that's a real-live form, on the web right now, found at:
>    http://users.aol.com/bowerbird/bbb.html

you can post from the page itself, filling out the form right there,
using any web-browser on any internet-connected machine...

or you can compose your comment on your own machine, and,
using the program that contains the code listed above, post that
comment to the website.   (the downstream app is interacting with
the code that runs the guestbook script on the .html page, so you
will find the same variable names as above if you look at the code
that creates the .html form on that page.   fairly easy to figure out.)

***

and here's the code that fetches the text of the webpage and
puts it in an editfield.   this is realbasic source-code, by the way.
>    dim http as new httpsocket
>    readit.text=http.get("users.aol.com/bowerbird/bbb.html",30)

open another window in the app on your machine, and the two-line
function above loads and displays the comments from the webpage.

well, _that_ was certainly simple, wasn't it?

and those are our two functions, one to post, the other to read.

***

ok, wrap a g.u.i. around it, and you've got the demo app, pronto.
with what?, a dozen lines of code?, all copied from the manual?
like i said, pretty elementary.   realbasic does all the heavy lifting.
your mileage, using your language and your compiler, may vary.

so, from my vantage-point, yes, this is dirt-simple.

do you have any questions?

-bowerbird

p.s.   marcello, thanks for creating my shrine.   i can point to it
in the future as holding a good number of my prognostications,
not to mention my phat and sassy attitude, donchajustadoreit?
thank goodness for the internet archive, right?   brewster rocks!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/fd3d0e68/attachment.html
From gbnewby at pglaf.org  Tue May 23 23:08:08 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue May 23 23:08:11 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <43c.18826a3.31a4978f@aol.com>
References: <43c.18826a3.31a4978f@aol.com>
Message-ID: <20060524060808.GD5644@pglaf.org>

On Tue, May 23, 2006 at 12:51:27PM -0400, Bowerbird@aol.com wrote:
> greg said:
> >    Not all input books or types can be reasonably 
> >    accurately converted to any possible format, especially 
> >    for the older titles with no well-formed & valid HTML version.
> 
> as i've said for years now, with a small commitment from you
> to consistent formatting, i could take plain-ascii files as input,
> automatically apply the typographic niceties that are expected,
> and output the results to .pdf and to .html, such that the .html 
> can be converted to a large number of other auxiliary formats.

Here's an eBook that should meet your requirements:
  http://www.gutenberg.org/etext/18257

You already have server space, to provide a conversion utility.
Looking forward the pudding...


> of course, i'm not unique.   david moynihan has done it for years.
>
> david was willing to make a small commitment to edit the files
> himself so as to obtain that consistent formatting.   i think it is
> more important to teach you how to fish than to give you fish.
>
> check with 3 tool-makers from distributed proofreaders --
> thundergnat, donovan, and bill flis -- and they'll confirm that
> a clear path for ascii-to-(x)html conversion is quite workable
> -- due to the fact that d.p. now has the required consistency --
> even with their current programs, and that if they worked on it
> a bit more, they could make it into a regular part of the workflow.

You make it sound like I'm saying "no," when all I've ever
said is "yes."  If people are put off by trying to get things
to "fit" in the existing www.gutenberg.org infrastructure,
I have two other servers (snowy.arsc.alaska.edu and readingroo.ms)
with complete copies of the PG collection for development.

Currently, two different people are pursing their dreams/ideas
on the readingroo.ms server.  You're already on snowy, and
there is room for more.

> there is no need for the more-complex switch to a .tei workflow.

To each their own pudding.  If I'm not saying "no" to you,
why would I say, "no" to someone with a different approach?

If people don't like the way DP does things, they can start
their own DP (they can even get help!!).

If people don't like the way PG postprocesses & posts eBooks,
they can grab 'em and do their own postings.  As long as there's
a reasonable adherance to the principle of unlimited distribution
etc. (http://www.gutenberg.org/about), we'll even link to 'em!

> -bowerbird
> 
> p.s.   if you only would have accepted moynihan's offer of his files
> when he made it to you, you'd already _have_ a consistent library.

I don't recall turning down David, but might have on the
grounds of being unable to effectively ingest & manage the files
he was producing.  Today, I'd offer him his own server space
to help him do things his way.
  -- Greg

From nwolcott2ster at gmail.com  Tue May 23 09:05:45 2006
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Tue May 23 23:23:28 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
References: <439.1c91c11.31a3713c@aol.com><17522.28516.463762.723548@celery.zuhause.org>
	<Pine.LNX.4.60.0605230658250.19444@pglaf.org>
Message-ID: <00d901c67e83$e42e6860$650fa8c0@gw98>

There is no doubt that the Open Book Project of Brewster Kahle has the most
accurate books online. However they  have only 1000 books. The million books
project I would rate just barely above google books in quality and
completeness. It also helps if the page is not turned before the scan is
complete. The US is doing very well in providing a large number of useless
images online.

Does anyone  know how to get a book into Open Book Project? do you have to
be a library?

nwolcott2@post.harvard.edu
----- Original Message -----
From: "Michael Hart" <hart@pglaf.org>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Tuesday, May 23, 2006 10:03 AM
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries


>
> On Mon, 22 May 2006, Bruce Albrecht wrote:
>
> > Bowerbird@aol.com writes:
> > > >    And the latest estimated I have received show that Google's
> > > >    total number of books has just recently passed 50,000
> > >
> > > i do believe you misread that.   50,000 public-domain titles,
> > > with another 42,000 under copyright, for a total of 92,000.
>
> Then I was probably right to count Google's total as ~100,000
> in my own public estimations, though I would prefer counts of
> downloadable books to avoid Google's new policy of:
>
> "Google Book Search is a means for helping users discover
> books, not to read them online and/or download them."
>
>
> > My searching found 50,000 public domain titles available as complete
> > books, and another 42,000 that should have been available as complete
> > books because they were published prior to 1923, but were only visible
> > in snippet view.  I have no idea how many books Google scanned
> > published after 1922 which are probably PD because the copyright was
> > apparently not renewed, nor the number of books scanned even though
> > the book is still under copyright.
>
> Are you saying that there are actually 50,000 downloadable
> full text Google eBooks?
>
> Any idea of their level of accuracy?
>
>
> Please allow me to renew the request from myself and LIS PhD Greg Newby,
> CEO of Project Gutenberg, for a copy of the list we can look over,
> even if we cannot make it public.
>
>
> Thanks!!!
>
> Give the world eBooks in 2006!!!
>
> Michael S. Hart
> Founder
> Project Gutenberg
>
> Blog at http://hart.pglaf.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From nwolcott2ster at gmail.com  Tue May 23 08:37:10 2006
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Tue May 23 23:23:29 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital  libraries
References: <48c.693268.31a37fc3@aol.com>
	<6.2.0.8.0.20060522141040.02ed6a48@mail.scripps.edu>
Message-ID: <00d801c67e83$e3da7c00$650fa8c0@gw98>

My success with google pd books is about 30%. Some books are so dark they
are unreadable, let alone ocr, these seem to appear as jpegs. Others have
whole sides of pages clipped off throughout the entire book. When the images
are pretty good they seem to appear as png's. I have found most of the books
with the png extension are pretty good. All seemed to have the occasional
missing pages. I have sent many errors in to google and get a nice canned
reply, but no improvement in the output is visible nor further feedback. I
have found these books most useful when I already have a copy of the book,
and can use the google scan to help speed up the scanning/ocr process.

In fact I don't see how DP is coping with these google texts giving their
now stricter requirements that a perfect scan of every page and illustration
must be provided before the book can even get into their processing queue. A
missing part of a page or illegible word cannot be corrected from another
edition, due to their high standard of perfection. With the average book now
requiring 2 years to go through their four levels of proofreading, one does
wonder.

nwolcott2@post.harvard.edu
----- Original Message -----
From: "Frank van Drogen" <fvandrog@scripps.edu>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@pglaf.org>
Sent: Monday, May 22, 2006 5:19 PM
Subject: re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries


>
> >it's clear that google has gotten their legs under them
> >in regard to doing the scanning.  let's hope that they'll
> >get their quality-control under control very soon too...
>
> I have found less missing pages and other problems in books from Google
> then in those from the MBP and Canadian/IA. They are, however, still far
> from perfect. When they get a report regarding a missing or wrongly
scanned
> page in a PD book; it is apparently up to the providing library to get the
> problem sorted out. I've heard report of complete books being rescanned
> (with the risk of having another page missing in the end ;) ). I've also
> heard somebody mentioning that the full rescanned book was stuck behind
the
> existing one (rather space consuming, but for DP purposes a lot saver.
>
> What worries me in this is that Google doesn't seem to care whether pages
> are missing or not... as long as they get 99% of the pages from a book
> stored, changes are most search terms pointing to the particular book will
> be identified. Their interest lies in people purchasing the book via
> Amazon, Abe etc. after identifying them via book.google.com.
>
> The best quality control I have encountered so far is on Gallica, where
> appart from missing pages due to those pages missing in the original
> scanned manuscript, I've not encountered incomplete books. I'd be actually
> interesting to see how they perfrom their quality control.
>
> Frank
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From nwolcott2ster at gmail.com  Tue May 23 19:40:24 2006
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Tue May 23 23:23:30 2006
Subject: [gutvol-d] changing email addresses
Message-ID: <007a01c67edb$d9ef15e0$650fa8c0@gw98>

Can whoever reads this please aarrange or tell me how to get reconnected with gutvol-d
Thanks
nwolcott2@post.harvard.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060523/b5e8f790/attachment-0001.html
From gbnewby at pglaf.org  Tue May 23 23:32:09 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue May 23 23:32:10 2006
Subject: [gutvol-d] changing email addresses
In-Reply-To: <007a01c67edb$d9ef15e0$650fa8c0@gw98>
References: <007a01c67edb$d9ef15e0$650fa8c0@gw98>
Message-ID: <20060524063209.GC6248@pglaf.org>

On Tue, May 23, 2006 at 10:40:24PM -0400, Norm Wolcott wrote:
> Can whoever reads this please aarrange or tell me how to get reconnected with gutvol-d

(done)

> Thanks
> nwolcott2@post.harvard.edu
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From jon.ingram at gmail.com  Wed May 24 01:22:01 2006
From: jon.ingram at gmail.com (Jon Ingram)
Date: Wed May 24 01:28:42 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
In-Reply-To: <00d901c67e83$e42e6860$650fa8c0@gw98>
References: <439.1c91c11.31a3713c@aol.com>
	<17522.28516.463762.723548@celery.zuhause.org>
	<Pine.LNX.4.60.0605230658250.19444@pglaf.org>
	<00d901c67e83$e42e6860$650fa8c0@gw98>
Message-ID: <4baf53720605240122j7cc1aaf2l3ea01f691f2820a5@mail.gmail.com>

On 5/23/06, Norm Wolcott <nwolcott2ster@gmail.com> wrote:
> There is no doubt that the Open Book Project of Brewster Kahle has the most
> accurate books online. However they  have only 1000 books. The million books
> project I would rate just barely above google books in quality and
> completeness. It also helps if the page is not turned before the scan is
> complete. The US is doing very well in providing a large number of useless
> images online.
>
> Does anyone  know how to get a book into Open Book Project? do you have to
> be a library?

I wish I knew. I've scanned almost a thousand books for Distributed
Proofreaders, and the Internet Archive would be a great place to
permanently store the images. Every time I've asked them on their
website, however, they either haven't replied, or have said that
letting outside people contribute material is something that they're
planning on setting up, but with no firm date.

-- 
Jon Ingram
From Bowerbird at aol.com  Wed May 24 03:03:30 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed May 24 03:03:46 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
Message-ID: <42a.2124045.31a58972@aol.com>

norm said:
>    My success with google pd books is about 30%.
and
>    The US is doing very well in providing a large number of useless images 
online.

see, now _that_ is the shame.
_that_ is what the complainers should be complaining about.
bad scans do _nobody_ any good.

***

jon ingram said:
>   I've scanned almost a thousand books for Distributed Proofreaders,
>    and the Internet Archive would be a great place to permanently store
>    the images.  Every time I've asked them on their website, however, 
>    they either haven't replied, or have said that letting outside people 
>    contribute material is something that they're planning on setting up, 
>    but with no firm date.

see, this is bad too.   this needs to be fixed.
when you people are willing to do this work,
something like _diskspace_ needs to become
a solved problem, not a recurring nightmare.

so, who can solve this problem for you guys?
what could i do to help you guys get it solved?

amazon just announced a new storage system.
the rates seemed pretty low to me, but i'd guess
we're looking for so much space that it'd add up.
especially since they charge you for pushing it in.
we need some concrete figures to discern pricing,
could you give us a ballpark number on that, jon?

another alternative would be to store it distributedly.
we could chop it up into a thousand pieces and have
a network of two thousand people storing it at home.
michael keeps telling us how cheap terabyte disks are.
maybe we can recreate fidonet with terabytes and d.s.l.

but face facts, if we've got a complete scan-set, it has to
be saved.   it has to.

and saved without the waste of even a second thought about it.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060524/c6abed44/attachment.html
From Bowerbird at aol.com  Wed May 24 03:16:44 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed May 24 03:16:53 2006
Subject: [gutvol-d] re: accurately converted to any possible format
Message-ID: <365.53a95d7.31a58c8c@aol.com>

i said:
>    >    as i've said for years now, with a small commitment from you
>    >    to consistent formatting, i could take plain-ascii files as input,
>    >    automatically apply the typographic niceties that are expected,
>    >    and output the results to .pdf and to .html, such that the .html
>    >    can be converted to a large number of other auxiliary formats.

greg said:
>    Here's an eBook that should meet your requirements:
>    http://www.gutenberg.org/etext/18257

well yes, that one works just fine.   already did it for that one.    :+)


>    You already have server space, 

yes, thank you very much, snowy has been very kind to me...


>   to provide a conversion utility.
>    Looking forward the pudding...

well, but first there is that minor matter of the small commitment
to ensure that all future files will conform to consistency as well...

you know, first you plug the leak, and only then clean up the mess.

and that's not a new request.   that was the original precondition.

but perhaps there's another way out of this impasse.


>   To each their own pudding.? If I'm not saying "no" to you,
>    why would I say, "no" to someone with a different approach?

it would be silly of me to expect _you_ to tell them "no",
as if they couldn't bloody well do it themselves anyway.

no, what _i_ am doing is telling them that .tei would be 
a waste of their time.   i'm giving them a friendly notice,
a heads-up, saying "hey i got it covered, you go on", but
they don't take it that way, they get all pissed off at me...
so, ok, you want to be gruff, i can be gruff, no problem.

oh yes, greg, _you_ are one of "them".   so you would be
telling yourself "no".   and i don't expect you to do that, no.
indeed, i don't even expect you to listen to me telling you
that .tei is a waste of your time.   that's fine.    you'll learn.


>    I have two other servers (snowy.arsc.alaska.edu and readingroo.ms)
>    with complete copies of the PG collection for development.

ok, now we're getting somewhere.   this could break through.


>    If people don't like the way PG postprocesses & posts eBooks,
>    they can grab 'em and do their own postings.? As long as there's
>    a reasonable adherance to the principle of unlimited distribution
>    etc. (http://www.gutenberg.org/about), we'll even link to 'em!

i have no stealaway desires, i'm happy to do things under your wing,
my intention is to try and show you -- project gutenberg -- how you
can save yourself a lot of work, and increase your unlimited distribution.
i want to help the best cyberlibrary get better, i don't want to tear it 
apart.

i would definitely agree to demonstrate some automatic transformations
of your e-texts on a library-wide basis that could show you some shit...

at first this would just be for experimental purposes.   no promises.

however -- if the effort continues, and it should ever come to that --
my "shaping" of the files by progressive transformations would result in
a substantial fork of the library, but once you have satisfied yourself that
it had retained all important data so the integrity of the books was intact
and that only inconsequential inconsistencies had been removed, you will
be amenable to _considering_ a wholesale replacement in one fell swoop.

the benefits will be quite obvious, though.
it won't be hard for you to be the decider...


>    I don't recall turning down David, but might have on the grounds of 
>    being unable to effectively ingest & manage the files he was producing.? 

>    Today, I'd offer him his own server space to help him do things his way.

ok, now you're _really_ talking.

because david already knows how to do this.

unfortunately, as we all know, he's kinda busy right now.

with any luck, though, he's bored, and restless because
the website that has been his business has been his life
has taken up a lot of his time in the last many years or so
is shut down, with any luck he's itching for some diskspace
to play with.   he might be jonesing for an ftp-interaction...

but you know, honestly, all _i_ would really need from him 
is his ascii files.   he's already made all of them consistent...

i've never asked him for them before.   but maybe now's the time?
and if you and i ask him together, so i can work on the files for you?

i've never seen his ascii files for sale, but i'd be willing to pay some.
after all, the reason i want those files is they'd save me a lot of time.
it seems quite reasonable to reimburse him for a little of that time.
especially because he's got lawyer bills, i'm sure.   (actually, i'd hope
a lawyer has taken this case pro bono for the great exposure, but
you know there are always lawsuit-related bills that have to be paid.
i consider him a trooper, and i do believe in supporting our troops!)

anyway...

with _consistent_ files, i can start turning neat tricks right soon now.
i might have to reshape david-consistency into zml-consistency, but
that'll be a lot easier than reshaping p.g.-inconsistency into anything.
so even if it's not immediate, it would be soon.

of course, i'm not taking david's files for granted, no sir,
as we haven't even yet asked.   and he _is_ busy these days.
he might even say yes, but have no time to fill the request.

but, to sum up, i would be most grateful for:
1.   on snowy, a copy of the p.g. library that i can start "shaping".
2.   also on snowy, a copy of moynihan's ascii files for experiments?
3.   diskspace for david to play, now if he wants, or sometime later.

the understanding is i'm just playing with your files.
no promises, no expectations, no guarantees.   ok?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060524/8c1991b9/attachment.html
From bruce at zuhause.org  Wed May 24 07:54:21 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Wed May 24 07:54:28 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
In-Reply-To: <00d901c67e83$e42e6860$650fa8c0@gw98>
References: <439.1c91c11.31a3713c@aol.com>
	<17522.28516.463762.723548@celery.zuhause.org>
	<Pine.LNX.4.60.0605230658250.19444@pglaf.org>
	<00d901c67e83$e42e6860$650fa8c0@gw98>
Message-ID: <17524.29597.975166.45531@celery.zuhause.org>

Norm Wolcott writes:
 > There is no doubt that the Open Book Project of Brewster Kahle has the most
 > accurate books online. However they  have only 1000 books. The million books
 > project I would rate just barely above google books in quality and
 > completeness. It also helps if the page is not turned before the scan is
 > complete. The US is doing very well in providing a large number of useless
 > images online.

What is the URL for this archive?  When I searched Google, I found an
"Open Book Project" at ibiblio, but it seemed to have almost nothing
there, and it looked like it wasn't from scans anyway.  If you're
referring to the "Open Content Alliance", I'd love to see a URL for
find its archives.  Internet Archive doesn't seem to have a category
for OCA texts yet.
From hart at pglaf.org  Wed May 24 08:08:19 2006
From: hart at pglaf.org (Michael Hart)
Date: Wed May 24 08:08:21 2006
Subject: [gutvol-d] changing email addresses
In-Reply-To: <007a01c67edb$d9ef15e0$650fa8c0@gw98>
References: <007a01c67edb$d9ef15e0$650fa8c0@gw98>
Message-ID: <Pine.LNX.4.60.0605240807590.26391@pglaf.org>


On Tue, 23 May 2006, Norm Wolcott wrote:

> Can whoever reads this please aarrange or tell me how to get reconnected with gutvol-d
> Thanks
> nwolcott2@post.harvard.edu

You asked about subcribing or unsubscribing from one of the
Project Gutenberg Newsletters.  Please save for reference:

This is the information from:

www.gutenberg.org/howto/subscribe-howto

Please check this site once in a while for updates:


Mailing Lists

    Various mailing lists for Project Gutenberg exist. A brief description
    of  each  follows,  along  with  a  link  to  visit  or  subscribe (or
    unsubscribe).  All  lists  live  at  http://lists.pglaf.org,  and  are
    moderated except for the discussion lists:
      * Newsletters,  with  new  eBook  listings,  calls  for  assistance,
        general information, and announcements:
           + gweekly:   Project   Gutenberg   Weekly  Newsletter.  Traffic
             consists mostly of one weekly newsletter.
           + gmonthly:   Project  Gutenberg  Monthly  newsletter.  Traffic
             consists mostly of one monthly newsletter.
      * Notification as new eBooks are posted:
           + posted:  receive  book  postings  as  they happen, along with
             other PG related internally-focused discussion (high traffic,
             over 10 postings per day)
      * Discussion for active volunteers:
           + gutvol-d:  general unmoderated volunteer discussion (moderate
             traffic)
           + gutvol-p:  programming  volunteers,  for software development
             (light traffic)
           + gutvol-w:  website  volunteers,  for website development (new
             list)
           + glibrary:  library  help,  for physically tracking down books
             and   copyright   research.   Low  traffic,  with  occasional
             requests.
      * Other lists:
           + gutvol-l: moderated volunteer announcements (light traffic)

    If  you  would  like  to  subscribe  to a mailing list simply select a
    mailing  list  name  above.  All  lists  require  a password and email
    confirmation to subscribe as part of the Lyris anti-spam measures.

    Copyright ? 1971-2004 Project Gutenberg -- All Rights Reserved.

    Most recently updated: 2004-08-07 16:33:32.
From hart at pglaf.org  Wed May 24 09:05:16 2006
From: hart at pglaf.org (Michael Hart)
Date: Wed May 24 09:05:18 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <483.8864e2.31a48cbf@aol.com>
References: <483.8864e2.31a48cbf@aol.com>
Message-ID: <Pine.LNX.4.60.0605240856380.26391@pglaf.org>


On Tue, 23 May 2006 Bowerbird@aol.com wrote:

> michael said:
>>    I wonder how great a percentage of Google's six year plan
>>    will have to expire before Mr. Bowerbird will admit that
>>    it doesn't look as if Google is even trying to make it to
>>    10 million in 6 years.
>
> michael, it certainly isn't necessary to call me "mr." bowerbird.
> but hey, it sounds kinda funny and cute, so please, be my guest.
>
> as for google's plan, i laid out my prediction last december:
>    december 14, 2004 -- 0 books
>    december 14, 2005 -- 10,000 books
>    december 14, 2006 -- 100,000 books
>    december 14, 2007 -- 1,000,000 books
>    december 14, 2008 -- 10,000,000 books
>
> so not only do i think they are still on-track, and doing well,
> i actually think they'll wrap it up by the end of 2008, michael,
> _if_ they stop at the 10.5 million unique titles they have now.

Of course a lot of this depends on what you think and eBook is.

The Library Of Congress set their own standard a decade ago as
a 99.95% accurate full text.

I don't think Google is even trying to get close to this.

Then again, some people think pictures of pages are as good
as full text, but that would entail a different definition.

1.  Making scans is trivial

2.  Making raw OCR is trivial

3.  Making a 99.95% accurate full text eBook from those is not.

1 and 2 are quick and dirty to most people, and the results
make that all too obvious, with so many errors and missing pages.

Not to mention that the "books reading each other" requires
full text at a reasonable level of accuracy.

Google is just getting to 1% of their goal, and they seem to
be heading in directions other than these systems and standards
would require to be defined as eBooks.

Of course, as you mention, when you have over 100 billion dollars,
some people are willing to fudge things for you.


"Ceci n'est pas une pipe."
Rene Magritte

"Don't confuse the map with the territory."


However, if Google manages to put 10 million books online,
even if just searchable in snippets, by December 14, 2010,
I will be glad to buy you dinner.

And it's a bet that would help the world at large!


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From jmdyck at ibiblio.org  Wed May 24 12:15:15 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed May 24 12:15:18 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <Pine.LNX.4.60.0605230852411.19444@pglaf.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
	<Pine.LNX.4.60.0605230852411.19444@pglaf.org>
Message-ID: <4474B0C3.1040906@ibiblio.org>

Michael Hart wrote:
> 
> On Mon, 22 May 2006, Michael Dyck wrote:
>
>> Michael Hart wrote:
>>>
>>> If we double that number to 100,000, we could pretend these
>>> results indicated that Google had accomplished 1% of a goal
>>> of 10,000,000 books, in 25% of their 6 year plan.
>>
>> In 1993, PG had accomplished 1% of its goal of 10,000,
>> in about 70% of the total time.
> 
> Only if you keep refusing to acknowledge that there was not an
> ordinary production schedule until 1991. . . .

Hm. I made my statement based on this data:
   1971: 1 ebook
   1993: 100 ebooks
   2003: 10,000 ebooks
So it took  about 22 years to do the first 100, and about 32 years to do
the first 10,000. That is, 22/32 of the time to do 100/10,000 of the
books, or about 70% of the time to do the first 1%.

Another way to look at it is that the average production up to 1993 was
4.5 books per year, and after 1993 was 90 books per year, 20 times
faster. I.e., the production schedule for the first two decades was
significantly slower than that of the subsequent decade.

So, far from "refusing to acknowledge that there was not an ordinary
production schedule until 1991", this data (and my statement) actually 
*support* the claim of a radical change in the production schedule in 
the early 90's.

But if you like, we can ignore the pre-1991 data:
    1991 Jan:    10 ebooks
    1994 Jan:   110 ebooks
    2003 Oct: 10010 ebooks
So it took 3 years to do the "first" 100, and about 13 years to do the
"first" 10,000. That is, 3/13 of the time to do 100/10,000 of the books,
or around 23% of the time to do the first 1%. Which is remarkably close
to the "25% of the time to do the first 1%" that you gave for Google, above.


> Mr. Dyck has been refusing to acknowlege for some time that an
> ordinarly growth curve is impossible to create when the growth
> is linear. . .i.e. one per year. . . .

Huh? I'm pretty sure I've never refused to acknowlege that. I think you
have me confused with someone else, possibly Marcello. See, e.g.
    http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001262.html
(which had to do with picking a reference point for "Moore's Law"
growth). You should be more careful before casting aspersions.

However, if it makes things easier, I'll gladly refuse to acknowlege
that statement *now*. A linear growth curve *is* a growth curve, and a
pretty ordinary one at that. It certainly doesn't make it impossible to
create an ordinary growth curve. But that's just disagreeing with what
you said, rather than what you meant.

I think what you meant is something more like "a period of linear growth
makes it impossible to fit an exponential growth curve". My response
depends on how you interpret "fit".

If, by "fit a curve", you mean "rigidly conform to a curve", then I
agree: you can't make linear data conform to a exponential curve. But
then no real-world phenomenon will rigidly conform to an exponential
curve; there will always be some deviation.

So alternatively, if by "fit a curve", you mean "approximate with a
curve" or "model with a curve", then I disagree: you can certainly
approximate linear data with an exponential curve. Whether it's useful
depends on what you're trying to accomplish, but it's certainly not
impossible.

-Michael Dyck


From hart at pglaf.org  Wed May 24 13:07:53 2006
From: hart at pglaf.org (Michael Hart)
Date: Wed May 24 13:07:54 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <4474B0C3.1040906@ibiblio.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
	<Pine.LNX.4.60.0605230852411.19444@pglaf.org>
	<4474B0C3.1040906@ibiblio.org>
Message-ID: <Pine.LNX.4.60.0605241256560.31889@pglaf.org>


On Wed, 24 May 2006, Michael Dyck wrote:

> However, if it makes things easier, I'll gladly refuse to acknowlege
> that statement *now*. A linear growth curve *is* a growth curve, and a
> pretty ordinary one at that. It certainly doesn't make it impossible to
> create an ordinary growth curve. But that's just disagreeing with what
> you said, rather than what you meant.

No, a line is not a curve.

Linear growth would have just been

    100 eBooks in 100 years.
  1,000 eBooks in 1,000 years.
10,000 eBooks in 10,000 years.
20,000 eBooks in 20,000 years.

This is not a growth curve, it is a growth line.


However, the case is even more drastic than that, as there was a period
of over a decade when 0 eBooks were added, due to the hassles of the US
Copyright Act of 1976, which took us forever to find out about, and the
truth is that we would probably NEVER have figured them out without the
help of one of the top dozen copyright lawyers in the US.

Thus, if you INSIST on talking about curves, the curve was downward.

Just one more obvious reason why you can't talk about growth curves
for this period of Project Gutenberg's history.

*

When There ARE Growth Curves:


You also mentioned that you can't fit real world items into such
growth curves, but you never mentioned that the graph I included
is a remarkably good overall fit, with deviations so small it is
hard to see them at all on such a graph representing eBooks at a
range of 500 eBook increments.

It's a much more impressive growth curve than anyone predicted--
except for some of us crazy people.


From joshua at hutchinson.net  Wed May 24 13:23:43 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed May 24 13:23:58 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <20060524202343.70C88DA5F8@ws6-6.us4.outblaze.com>


> ----- Original Message -----
> From: "Michael Hart" <hart@pglaf.org>
> On Wed, 24 May 2006, Michael Dyck wrote:
> 
> > However, if it makes things easier, I'll gladly refuse to acknowlege
> > that statement *now*. A linear growth curve *is* a growth curve, and a
> > pretty ordinary one at that. It certainly doesn't make it impossible to
> > create an ordinary growth curve. But that's just disagreeing with what
> > you said, rather than what you meant.
> 
> No, a line is not a curve.
> 

Here we go again.

1 - A growth curve doesn't necessarily curve nor does it necessarily grow.  A growth curve can be flat (no change), it can be negative (lose value like my stocks recently), it can be linear (grows at a constant rate) or it can be exponential (what we typically think of as a growth curve).

2 - Just because you don't WANT to include certain periods of PG history in your growth curve analysis doesn't mean no one else can or that they are wrong for doing so.  If someone wants to plot growth from the start of PG to present (which certainly seems reasonable to me), then that is a valid growth curve plot.  If you want to plot it in the "modern PG era" from circa 1993 to present, then that is valid, too.  Both plots need to specify the time frame they are plotting.

Since we argued this to death already (as Michael Dyck so nicely linked to previously), can we let it drop now?

Josh
From grythumn at gmail.com  Wed May 24 13:26:38 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Wed May 24 13:26:41 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <Pine.LNX.4.60.0605241256560.31889@pglaf.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
	<Pine.LNX.4.60.0605230852411.19444@pglaf.org>
	<4474B0C3.1040906@ibiblio.org>
	<Pine.LNX.4.60.0605241256560.31889@pglaf.org>
Message-ID: <15cfa2a50605241326s5caa3578lbad2d63cb1eda73e@mail.gmail.com>

Right. I think what needs to be reiterated here is that any extrapolation is
just that, an extrapolation, and that even a very complex growth curve
fitted to PG's output will vary greatly from reality based on various
environmental factors (how many volunteers are available, free time
available, major reorgs at DP, etc) and the biases built into the model.
Taking a standard exponential growth curve and extrapolating it is not
feasible for the long term. (Extend it a few hundred years out.. baring a
major population explosion, major AI improvements, or other unforeseen
circumstance, it's plainly not tenable. You run out of PD works and human
beings very quickly.)  It may be accurate enough for the short term,
however.

I know a statician on another forum.. perhaps he can explain it more
clearly.

R C

On 5/24/06, Michael Hart <hart@pglaf.org> wrote:
>
>
> On Wed, 24 May 2006, Michael Dyck wrote:
>
> > However, if it makes things easier, I'll gladly refuse to acknowlege
> > that statement *now*. A linear growth curve *is* a growth curve, and a
> > pretty ordinary one at that. It certainly doesn't make it impossible to
> > create an ordinary growth curve. But that's just disagreeing with what
> > you said, rather than what you meant.
>
> No, a line is not a curve.
>
> Linear growth would have just been
>
>     100 eBooks in 100 years.
>   1,000 eBooks in 1,000 years.
> 10,000 eBooks in 10,000 years.
> 20,000 eBooks in 20,000 years.
>
> This is not a growth curve, it is a growth line.
>
>
> However, the case is even more drastic than that, as there was a period
> of over a decade when 0 eBooks were added, due to the hassles of the US
> Copyright Act of 1976, which took us forever to find out about, and the
> truth is that we would probably NEVER have figured them out without the
> help of one of the top dozen copyright lawyers in the US.
>
> Thus, if you INSIST on talking about curves, the curve was downward.
>
> Just one more obvious reason why you can't talk about growth curves
> for this period of Project Gutenberg's history.
>
> *
>
> When There ARE Growth Curves:
>
>
> You also mentioned that you can't fit real world items into such
> growth curves, but you never mentioned that the graph I included
> is a remarkably good overall fit, with deviations so small it is
> hard to see them at all on such a graph representing eBooks at a
> range of 500 eBook increments.
>
> It's a much more impressive growth curve than anyone predicted--
> except for some of us crazy people.
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060524/45c19288/attachment.html
From Bowerbird at aol.com  Wed May 24 13:33:14 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed May 24 13:33:20 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <489.b8cd35.31a61d0a@aol.com>

michael said:
>    Then again, some people 
>    think pictures of pages are 
>    as good as full text

those are probably the people who
want to "just" _read_ the words of the book,
and don't want to copy out its text.


>   but that would entail a different definition.

yes, what a pity that their "definition" is so constrained.

when black ink is splashed onto a white page of paper,
the result is nothing more than a "picture" of the book.
but somehow, for over 500 years, that has been enough.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060524/2e7198c3/attachment.html
From jmdyck at ibiblio.org  Wed May 24 14:38:22 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed May 24 14:38:27 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <Pine.LNX.4.60.0605241256560.31889@pglaf.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
	<Pine.LNX.4.60.0605230852411.19444@pglaf.org>
	<4474B0C3.1040906@ibiblio.org>
	<Pine.LNX.4.60.0605241256560.31889@pglaf.org>
Message-ID: <4474D24E.2050409@ibiblio.org>

Michael Hart wrote:
> 
> On Wed, 24 May 2006, Michael Dyck wrote:
> 
>> However, if it makes things easier, I'll gladly refuse to acknowlege
>> that statement *now*. A linear growth curve *is* a growth curve, and a
>> pretty ordinary one at that. It certainly doesn't make it impossible to
>> create an ordinary growth curve. But that's just disagreeing with what
>> you said, rather than what you meant.
> 
> 
> No, a line is not a curve.

Well, it is to a mathematician. But if you want to use that definition
of "curve", fine, I agree, linear growth is straight, not bendy. I'm
*quite* positive that I never refused to acknowlege such a thing.

(If you look close enough, PG's ebook count is actually a step function,
piecewise constant: after a book is posted, the number of books is
constant until the next one is posted. Dunno if that satisfies your
definition of "curve".)

> However, the case is even more drastic than that, as there was a period
> of over a decade when 0 eBooks were added, due to the hassles of the US
> Copyright Act of 1976, which took us forever to find out about, and the
> truth is that we would probably NEVER have figured them out without the
> help of one of the top dozen copyright lawyers in the US.
> 
> Thus, if you INSIST on talking about curves, the curve was downward.

Uh, if no books are added, the number is constant, which I would
consider "flat" rather than "downward". But if you want to define it as
"downward", fine.

> Just one more obvious reason why you can't talk about growth curves
> for this period of Project Gutenberg's history.

A downward curve is still a curve. Though if you want to say it isn't a
"growth curve", fine. (Mind you, it's not hard to find talk of "negative
growth".)


> You also mentioned that you can't fit real world items into such
> growth curves, but you never mentioned that the graph I included
> is a remarkably good overall fit,

(Well, now you're blurring a distinction I made between two meanings of
"fit": real world data doesn't "fit" (= rigidly conform to) exponential
curves, but you can "fit" (=approximate) it to an exponential curve. But
anyway.)

And in fact, I HAVE mentioned how closely the PG numbers are
approximated by an exponential curve. See, e.g.,
    http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001263.html
and
    http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001456.html

But now we (well, you, really) have strayed from the topic that brought 
me in, the comparison between Google's progress and PG's (and dang, I 
wish I'd changed the subject line at that point), so my interest in this 
discussion is probably fading.

-Michael


From sly at victoria.tc.ca  Wed May 24 22:00:07 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed May 24 22:00:12 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <20060524060808.GD5644@pglaf.org>
References: <43c.18826a3.31a4978f@aol.com> <20060524060808.GD5644@pglaf.org>
Message-ID: <Pine.GSO.4.58.0605242157210.20432@vtn1.victoria.tc.ca>


On Tue, 23 May 2006, Greg Newby wrote:

> > p.s.   if you only would have accepted moynihan's offer of his files
> > when he made it to you, you'd already _have_ a consistent library.
>
> I don't recall turning down David, but might have on the
> grounds of being unable to effectively ingest & manage the files
> he was producing.  Today, I'd offer him his own server space
> to help him do things his way.

If I remember correctly, a big issue with the David Moynihan
files was that many of them were not copyright-cleared for
PG, and incorporating them into the PG collection would have
taken a lot of effort.

Andrew
From gbnewby at pglaf.org  Wed May 24 22:02:50 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed May 24 22:02:51 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <365.53a95d7.31a58c8c@aol.com>
References: <365.53a95d7.31a58c8c@aol.com>
Message-ID: <20060525050250.GG6694@pglaf.org>

On Wed, May 24, 2006 at 06:16:44AM -0400, Bowerbird@aol.com wrote:
> i said:
> >    >    as i've said for years now, with a small commitment from you
> >    >    to consistent formatting, i could take plain-ascii files as input,
> >    >    automatically apply the typographic niceties that are expected,
> >    >    and output the results to .pdf and to .html, such that the .html
> >    >    can be converted to a large number of other auxiliary formats.
> 
> greg said:
> >    Here's an eBook that should meet your requirements:
> >    http://www.gutenberg.org/etext/18257
> 
> well yes, that one works just fine.   already did it for that one.    :+)


I don't understand.  Where is the URL that does conversion on the
fly from that file to arbitrary formats?  PDF, HTML...others...with
user-specified settings.

More response, way at the bottom:


> 
> >    You already have server space, 
> 
> yes, thank you very much, snowy has been very kind to me...
> 
> 
> >   to provide a conversion utility.
> >    Looking forward the pudding...
> 
> well, but first there is that minor matter of the small commitment
> to ensure that all future files will conform to consistency as well...
> 
> you know, first you plug the leak, and only then clean up the mess.
> 
> and that's not a new request.   that was the original precondition.
> 
> but perhaps there's another way out of this impasse.
> 
> 
> >   To each their own pudding.? If I'm not saying "no" to you,
> >    why would I say, "no" to someone with a different approach?
> 
> it would be silly of me to expect _you_ to tell them "no",
> as if they couldn't bloody well do it themselves anyway.
> 
> no, what _i_ am doing is telling them that .tei would be 
> a waste of their time.   i'm giving them a friendly notice,
> a heads-up, saying "hey i got it covered, you go on", but
> they don't take it that way, they get all pissed off at me...
> so, ok, you want to be gruff, i can be gruff, no problem.
> 
> oh yes, greg, _you_ are one of "them".   so you would be
> telling yourself "no".   and i don't expect you to do that, no.
> indeed, i don't even expect you to listen to me telling you
> that .tei is a waste of your time.   that's fine.    you'll learn.
> 
> 
> >    I have two other servers (snowy.arsc.alaska.edu and readingroo.ms)
> >    with complete copies of the PG collection for development.
> 
> ok, now we're getting somewhere.   this could break through.
> 
> 
> >    If people don't like the way PG postprocesses & posts eBooks,
> >    they can grab 'em and do their own postings.? As long as there's
> >    a reasonable adherance to the principle of unlimited distribution
> >    etc. (http://www.gutenberg.org/about), we'll even link to 'em!
> 
> i have no stealaway desires, i'm happy to do things under your wing,
> my intention is to try and show you -- project gutenberg -- how you
> can save yourself a lot of work, and increase your unlimited distribution.
> i want to help the best cyberlibrary get better, i don't want to tear it 
> apart.
> 
> i would definitely agree to demonstrate some automatic transformations
> of your e-texts on a library-wide basis that could show you some shit...
> 
> at first this would just be for experimental purposes.   no promises.
> 
> however -- if the effort continues, and it should ever come to that --
> my "shaping" of the files by progressive transformations would result in
> a substantial fork of the library, but once you have satisfied yourself that
> it had retained all important data so the integrity of the books was intact
> and that only inconsequential inconsistencies had been removed, you will
> be amenable to _considering_ a wholesale replacement in one fell swoop.
> 
> the benefits will be quite obvious, though.
> it won't be hard for you to be the decider...
> 
> 
> >    I don't recall turning down David, but might have on the grounds of 
> >    being unable to effectively ingest & manage the files he was producing.? 
> 
> >    Today, I'd offer him his own server space to help him do things his way.
> 
> ok, now you're _really_ talking.
> 
> because david already knows how to do this.
> 
> unfortunately, as we all know, he's kinda busy right now.
> 
> with any luck, though, he's bored, and restless because
> the website that has been his business has been his life
> has taken up a lot of his time in the last many years or so
> is shut down, with any luck he's itching for some diskspace
> to play with.   he might be jonesing for an ftp-interaction...
> 
> but you know, honestly, all _i_ would really need from him 
> is his ascii files.   he's already made all of them consistent...
> 
> i've never asked him for them before.   but maybe now's the time?
> and if you and i ask him together, so i can work on the files for you?
> 
> i've never seen his ascii files for sale, but i'd be willing to pay some.
> after all, the reason i want those files is they'd save me a lot of time.
> it seems quite reasonable to reimburse him for a little of that time.
> especially because he's got lawyer bills, i'm sure.   (actually, i'd hope
> a lawyer has taken this case pro bono for the great exposure, but
> you know there are always lawsuit-related bills that have to be paid.
> i consider him a trooper, and i do believe in supporting our troops!)
> 
> anyway...
> 
> with _consistent_ files, i can start turning neat tricks right soon now.
> i might have to reshape david-consistency into zml-consistency, but
> that'll be a lot easier than reshaping p.g.-inconsistency into anything.
> so even if it's not immediate, it would be soon.
> 
> of course, i'm not taking david's files for granted, no sir,
> as we haven't even yet asked.   and he _is_ busy these days.
> he might even say yes, but have no time to fill the request.
> 
> but, to sum up, i would be most grateful for:
> 1.   on snowy, a copy of the p.g. library that i can start "shaping".

?? It's an official mirror, and you have a server login.  Help yourself.

/data1/ftp/mirrors/gutenberg
http://snowy.arsc.alaska.edu/gutenberg
ftp://snowy.arsc.alaska.edu/

There's enough space on /data1, where your home directory is, to
make your own copy of as much the collection as you want in your
own directory.

> 2.   also on snowy, a copy of moynihan's ascii files for experiments?

I just bought his PDF files, and would buy ASCII if they're for
sale.  Or, let's ask.

> 3.   diskspace for david to play, now if he wants, or sometime later.

Of course.

> the understanding is i'm just playing with your files.
> no promises, no expectations, no guarantees.   ok?

Have fun :)
  -- Greg

> -bowerbird

> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From gbnewby at pglaf.org  Wed May 24 22:10:24 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed May 24 22:10:26 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
In-Reply-To: <42a.2124045.31a58972@aol.com>
References: <42a.2124045.31a58972@aol.com>
Message-ID: <20060525051024.GH6694@pglaf.org>

> jon ingram said:
> >   I've scanned almost a thousand books for Distributed Proofreaders,
> >    and the Internet Archive would be a great place to permanently store
> >    the images.  Every time I've asked them on their website, however, 
> >    they either haven't replied, or have said that letting outside people 
> >    contribute material is something that they're planning on setting up, 
> >    but with no firm date.

Woah there, cowboy.

I've been waiting for DP to provide raw page scans for *years*.  This is
something I discussed with Charles & Juliet years ago.  The whitewashers
are ready.  iBiblio is ready.

We have other servers if growth is too fast.  Yes, that includes the
Internet Archive, where we have several usernames...plus our official
backup mirror.

I've also been pressing to get preprints from DP...scans before the
postprocessing is done, to release "to the wild" before they're quite
ready.  (Last count there are over 800 of these.)  There's even a new
preprints section (though this might not be the way we'd to DP
preprints) at http://preprints.readingroo.ms

If you could help to move things forward on either scans or preprints,
I'd be very grateful!  (Ditto for anyone else reading.)
  -- Greg
From gbnewby at pglaf.org  Wed May 24 22:12:27 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed May 24 22:12:29 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <Pine.GSO.4.58.0605242157210.20432@vtn1.victoria.tc.ca>
References: <43c.18826a3.31a4978f@aol.com> <20060524060808.GD5644@pglaf.org>
	<Pine.GSO.4.58.0605242157210.20432@vtn1.victoria.tc.ca>
Message-ID: <20060525051227.GA7507@pglaf.org>

On Wed, May 24, 2006 at 10:00:07PM -0700, Andrew Sly wrote:
> 
> 
> On Tue, 23 May 2006, Greg Newby wrote:
> 
> > > p.s.   if you only would have accepted moynihan's offer of his files
> > > when he made it to you, you'd already _have_ a consistent library.
> >
> > I don't recall turning down David, but might have on the
> > grounds of being unable to effectively ingest & manage the files
> > he was producing.  Today, I'd offer him his own server space
> > to help him do things his way.
> 
> If I remember correctly, a big issue with the David Moynihan
> files was that many of them were not copyright-cleared for
> PG, and incorporating them into the PG collection would have
> taken a lot of effort.
> 
> Andrew

That's true.  Today, we have preprints.readingroo.ms, and
www.gutenberg.us that have easier procedures for copyright
clearance.

But very many of David's eBooks are PG titles, just reformatted,
and with the PG header/footer/license stripped.
  -- Greg
From hyphen at hyphenologist.co.uk  Wed May 24 23:16:58 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Wed May 24 23:17:11 2006
Subject: [gutvol-d] Subject: Kingkong was: [gweekly] PT1b Weekly Project
	Gutenberg Newsletter
Message-ID: <qpia725aeetdrm1hpb72i8tv99hibglsha@4ax.com>

On Wed, 24 May 2006 09:50:50 -0700 (PDT),  Michael Hart <hart@pglaf.org>
wrote:

|General Catalog of Old Books and Authors

Including Dates of Death.!!!!!
For the rest of the world on Life+70 or Life+50, i.e. non USA, date of
death is crucial for copyright.

|http://www.kingkong.demon.co.uk/ngcoba/ngcoba.htm
|
|which now indexes 24,000 books available free online, including all
|PG(US) & PG(Aus)'s books, along with some basic date information
|about them and their authors where you can find more.

IME a great site.   
He missed a couple of ?my? authors and added them quickly when I informed
him.   Check that he has your favourite authors.
-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From Bowerbird at aol.com  Thu May 25 01:34:46 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu May 25 01:34:54 2006
Subject: [gutvol-d] re: accurately converted to any possible format
Message-ID: <42f.2268306.31a6c626@aol.com>

greg said:
>    I don't understand.? Where is the URL that 
>    does conversion on the fly from that file 
>    to arbitrary formats?? PDF, HTML...others...
>    with user-specified settings.

there is no u.r.l.
conversions are done by my viewer-app.
you don't need to be involved, since
your users can do it all by themselves.


>    There's enough space on /data1, 
>    where your home directory is, 
>    to make your own copy of 
>    as much the collection as you want 
>    in your own directory.

great!   i'll figure it out.   thank you!

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/2d3acbbe/attachment.html
From greg at durendal.org  Thu May 25 04:39:10 2006
From: greg at durendal.org (Greg Weeks)
Date: Thu May 25 05:00:03 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
In-Reply-To: <20060525051024.GH6694@pglaf.org>
References: <42a.2124045.31a58972@aol.com> <20060525051024.GH6694@pglaf.org>
Message-ID: <Pine.LNX.4.63.0605250737090.16424@durendal.durendal.org>

On Wed, 24 May 2006, Greg Newby wrote:

> If you could help to move things forward on either scans or preprints,
> I'd be very grateful!  (Ditto for anyone else reading.)

I don't have everything on DP, but I have personal copies of everything 
I've ever scanned. What format do you want them in and where do you want 
them uploaded to? There are a number of other people who would do this 
also, even if it's not an official DP thing.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From joshua at hutchinson.net  Thu May 25 05:26:22 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu May 25 05:26:23 2006
Subject: !@!re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
Message-ID: <20060525122622.C6712DA5C3@ws6-6.us4.outblaze.com>


> ----- Original Message -----
> From: "Greg Newby" <gbnewby@pglaf.org>
> 
> Woah there, cowboy.
> 
> I've been waiting for DP to provide raw page scans for *years*.  This is
> something I discussed with Charles & Juliet years ago.  The whitewashers
> are ready.  iBiblio is ready.
> 

Have we ironed out HOW the images should be organized?  I remember Marcello put forth a pretty good proposal for image naming/organization, but the response was luke-warm at best.

I don't have nearly the backlog of images available that Jon does (the man is a scanning machine) ... but I'd be happy to start submitting images with my books from here on out if we've got the organization worked out.

Josh
From hart at pglaf.org  Thu May 25 10:17:33 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu May 25 10:17:36 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <4474D24E.2050409@ibiblio.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
	<Pine.LNX.4.60.0605230852411.19444@pglaf.org>
	<4474B0C3.1040906@ibiblio.org>
	<Pine.LNX.4.60.0605241256560.31889@pglaf.org>
	<4474D24E.2050409@ibiblio.org>
Message-ID: <Pine.LNX.4.60.0605250932330.18093@pglaf.org>


On Wed, 24 May 2006, Michael Dyck wrote:

> Michael Hart wrote:
>> 
>> On Wed, 24 May 2006, Michael Dyck wrote:
>> 
>>> However, if it makes things easier, I'll gladly refuse to acknowlege
>>> that statement *now*. A linear growth curve *is* a growth curve, and a
>>> pretty ordinary one at that. It certainly doesn't make it impossible to
>>> create an ordinary growth curve. But that's just disagreeing with what
>>> you said, rather than what you meant.
>> 
>> 
>> No, a line is not a curve.
>
> Well, it is to a mathematician.

OK, back to basics, I have consulted with some mathematicians,
not that I think you didn't know this, but you are pressuring
me to make the point, so I will, as gently as possible mostly
with the presentation aids provided by the mathematicians.


Presumed givens:

A 9 year period in which 1 title was added each year to the
index of what would later become known as Project Gutenberg.
1971-1979

A ~15 year period of increasing growth as represented by the
graph previously presented.
1991-2006


Results:

First for:

A 9 year period in which 1 title was added each year to the
index of what would later become known as Project Gutenberg.
1971-1979


The equation that would describe this is known as a
"Linear Equation" of the variety y = mx + b

When plotted on the normal "x,y" plane this would be a straight line,
no matter what numeric variables you plugged into the equation.

When describing such results the term "line" is used to represent a
straight line of this nature, while various terms describing curves
are used in higher order equations.

Very often the term "exponential" is used in common parlance to
describe what is really just a "multiplicative" or "geometric"
growth pattern, but there is no need to go into that here.

When describing such a linear equation in opposition to curved
equations, the usual terms are "line" and "curve."

We would normally say that a line "intersects" with a curve,
in such a case where the equations have a common solution.

Trying to fit a straight line into an equation for a curve
has been one of the mathematical problems of the ages.

Look up "squaring the circle" for some history on this.

However, generally speaking, a curve could not contain
a portion of a graph that was identical to the portion
of a straight line.

In this case it has been speculated that the growth of
Project Gutenberg listings could be approximated via a
curve, and in particular that the end result should be
in some total comparison to the curve known as

Moore's Law

Which specifies that x should double every 18 months.

Obviously it would be very rare indeed for real world
curves to exactly match mathematical equations in the
sense of human endeavors, but a quick look at what we
have seen as the report of the dates of each 500 book
level passed by Project Gutenberg, would indicate the
approximate match to several well known curves.

I'll leave it to you to choose which is the best fit.


> But if you want to use that definition of "curve",
> fine, I agree, linear growth is straight, not bendy.
> I'm *quite* positive that I never refused to acknowlege such a thing.

You certainly seemed to be yesterday, apparently demanding
the above explanations of the difference between lines and
curves when describing graphs, intersections, etc.


> (If you look close enough, PG's ebook count is actually a step function,
> piecewise constant: after a book is posted, the number of books is
> constant until the next one is posted. Dunno if that satisfies your
> definition of "curve".)

I think that is why our mathematical friends above said "approximates"
a curve. . .since we are only using the "counting numbers."

A true curve would include many other kinds of numbers.

However, in this case, using counting numbers as input,
and 1/4 year increments on the graph, you do get graphs
that would normally be described as curves.

Growth curves is the term normally used.

In this case the growth "line" does not fit the growth "curve."


>> However, the case is even more drastic than that, as there was a period
>> of over a decade when 0 eBooks were added, due to the hassles of the US
>> Copyright Act of 1976, which took us forever to find out about, and the
>> truth is that we would probably NEVER have figured them out without the
>> help of one of the top dozen copyright lawyers in the US.
>> 
>> Thus, if you INSIST on talking about curves, the curve was downward.
>
> Uh, if no books are added, the number is constant, which I would
> consider "flat" rather than "downward". But if you want to define it as
> "downward", fine.

The number used in the equation to create a graph in approximation
to the performance would decrease, hence the term "downward" might
be applicable; a line with no growth lies "downward" of lines that
represent growth statistics.


>> Just one more obvious reason why you can't talk about growth curves
>> for this period of Project Gutenberg's history.
>
> A downward curve is still a curve. Though if you want to say it isn't a
> "growth curve", fine. (Mind you, it's not hard to find talk of "negative
> growth".)

In this case it would literally be a "negative growth" of the
slope of the line.

Technically the second order derivative, which is where these
terms probably go beyond what is appropriate here.


>> You also mentioned that you can't fit real world items into such
>> growth curves, but you never mentioned that the graph I included
>> is a remarkably good overall fit,
>
> (Well, now you're blurring a distinction I made between two meanings of
> "fit": real world data doesn't "fit" (= rigidly conform to) exponential
> curves, but you can "fit" (=approximate) it to an exponential curve. But
> anyway.)

I think all this was anticipated in the help I received above.

If not, ask for more detailed explanations.


> And in fact, I HAVE mentioned how closely the PG numbers are
> approximated by an exponential curve. See, e.g.,
>   http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001263.html
> and
>   http://lists.pglaf.org/private.cgi/gutvol-d/2005-January/001456.html

This was also apparently anticipated by our mathematical friends,
when they mentioned "approximate match to several well known curves."

and

"I'll leave it to you to choose which is the best fit."

Obviously there are a number of equations that make approximate fits,
obvious even to someone who hadn't seen your example in the URLs.


> But now we (well, you, really) have strayed from the topic that brought me 
> in, the comparison between Google's progress and PG's (and dang, I wish I'd 
> changed the subject line at that point), so my interest in this discussion is 
> probably fading.

Ah, it would appear that you already knew you were painting us into a corner.

Then I hope that the great effort spent in replying to your messages was
not a total waste for either yourself or the rest of us.

It is only a waste to me if no one gains an apprecation of how seriously
I take your messages, and of my willingness to provide the best answers.

However, as I stated in my opening paragraph, I presumed you already
knew all of this and thus presumed you were only asking the question
for other reasons.

May I ask what those reasons were?

>
> -Michael


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From hart at pglaf.org  Thu May 25 10:51:59 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu May 25 10:52:00 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <489.b8cd35.31a61d0a@aol.com>
References: <489.b8cd35.31a61d0a@aol.com>
Message-ID: <Pine.LNX.4.60.0605251046270.18093@pglaf.org>


It's so simple that even Mr. Bowerbird's rhetoric cannot confuse the issue:

A picture of the pages of a book, even if complete and OCRable,
is simply not a a full text eBook.

1.  It takes many times the drive space.

2.  It takes much more download wire time.

3.  You can't do ANY of the things you can do with full text,

EXCEPT THE MOST IMPORTANT. . .YOU CAN READ IT.

But the expense in time and money is much larger,
and it's much harder to write research papers.

By Mr. Bowerbird's logic, a pre-Gutenerg book
would be as useful as a post-Gutenberg book.

On Wed, 24 May 2006 Bowerbird@aol.com wrote:

> michael said:
>>    Then again, some people
>>    think pictures of pages are
>>    as good as full text
>
> those are probably the people who
> want to "just" _read_ the words of the book,
> and don't want to copy out its text.
>
>
>>   but that would entail a different definition.
>
> yes, what a pity that their "definition" is so constrained.
>
> when black ink is splashed onto a white page of paper,
> the result is nothing more than a "picture" of the book.
> but somehow, for over 500 years, that has been enough.
>
> -bowerbird
>
From hart at pglaf.org  Thu May 25 10:53:39 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu May 25 10:53:40 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <15cfa2a50605241326s5caa3578lbad2d63cb1eda73e@mail.gmail.com>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
	<Pine.LNX.4.60.0605230852411.19444@pglaf.org>
	<4474B0C3.1040906@ibiblio.org>
	<Pine.LNX.4.60.0605241256560.31889@pglaf.org>
	<15cfa2a50605241326s5caa3578lbad2d63cb1eda73e@mail.gmail.com>
Message-ID: <Pine.LNX.4.60.0605251052360.18093@pglaf.org>


My own predictions have always changed from eBook production to
eBook translation at about the 10 million eBook mark.

If you haven't seen those prediction, I will repost on request.

mh


On Wed, 24 May 2006, Robert Cicconetti wrote:

> Right. I think what needs to be reiterated here is that any extrapolation is
> just that, an extrapolation, and that even a very complex growth curve
> fitted to PG's output will vary greatly from reality based on various
> environmental factors (how many volunteers are available, free time
> available, major reorgs at DP, etc) and the biases built into the model.
> Taking a standard exponential growth curve and extrapolating it is not
> feasible for the long term. (Extend it a few hundred years out.. baring a
> major population explosion, major AI improvements, or other unforeseen
> circumstance, it's plainly not tenable. You run out of PD works and human
> beings very quickly.)  It may be accurate enough for the short term,
> however.
>
> I know a statician on another forum.. perhaps he can explain it more
> clearly.
>
> R C
>
> On 5/24/06, Michael Hart <hart@pglaf.org> wrote:
>> 
>> 
>> On Wed, 24 May 2006, Michael Dyck wrote:
>> 
>> > However, if it makes things easier, I'll gladly refuse to acknowlege
>> > that statement *now*. A linear growth curve *is* a growth curve, and a
>> > pretty ordinary one at that. It certainly doesn't make it impossible to
>> > create an ordinary growth curve. But that's just disagreeing with what
>> > you said, rather than what you meant.
>> 
>> No, a line is not a curve.
>> 
>> Linear growth would have just been
>> 
>>     100 eBooks in 100 years.
>>   1,000 eBooks in 1,000 years.
>> 10,000 eBooks in 10,000 years.
>> 20,000 eBooks in 20,000 years.
>> 
>> This is not a growth curve, it is a growth line.
>> 
>> 
>> However, the case is even more drastic than that, as there was a period
>> of over a decade when 0 eBooks were added, due to the hassles of the US
>> Copyright Act of 1976, which took us forever to find out about, and the
>> truth is that we would probably NEVER have figured them out without the
>> help of one of the top dozen copyright lawyers in the US.
>> 
>> Thus, if you INSIST on talking about curves, the curve was downward.
>> 
>> Just one more obvious reason why you can't talk about growth curves
>> for this period of Project Gutenberg's history.
>> 
>> *
>> 
>> When There ARE Growth Curves:
>> 
>> 
>> You also mentioned that you can't fit real world items into such
>> growth curves, but you never mentioned that the graph I included
>> is a remarkably good overall fit, with deviations so small it is
>> hard to see them at all on such a graph representing eBooks at a
>> range of 500 eBook increments.
>> 
>> It's a much more impressive growth curve than anyone predicted--
>> except for some of us crazy people.
>> 
>> 
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>> 
>
From jon at noring.name  Thu May 25 11:17:32 2006
From: jon at noring.name (Jon Noring)
Date: Thu May 25 11:17:37 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
In-Reply-To: <Pine.LNX.4.60.0605251046270.18093@pglaf.org>
References: <489.b8cd35.31a61d0a@aol.com>
	<Pine.LNX.4.60.0605251046270.18093@pglaf.org>
Message-ID: <1779074412.20060525121732@noring.name>

Michael Hart wrote:
> Bowerbird wrote:

>> when black ink is splashed onto a white page of paper,
>> the result is nothing more than a "picture" of the book.
>> but somehow, for over 500 years, that has been enough.

> It's so simple that even Mr. Bowerbird's rhetoric cannot confuse the
> issue:
>
> A picture of the pages of a book, even if complete and OCRable, is
> simply not a a full text eBook.
>
> 1.  It takes many times the drive space.
>
> 2.  It takes much more download wire time.
>
> 3.  You can't do ANY of the things you can do with full text,
>
> EXCEPT THE MOST IMPORTANT. . .YOU CAN READ IT.
>
> But the expense in time and money is much larger, and it's much
> harder to write research papers.
>
> By Mr. Bowerbird's logic, a pre-Gutenerg book would be as useful as
> a post-Gutenberg book.

Michael hits the nail on the head.

The important thing is that in the digital realm, we are no longer
constrained to the physical limitations of ink on pressed sheets of
pulped cellulosic materials (also known as paper.)

Thus, it makes no sense to be constrained in our thinking to the
pre-digital world. Nor should we be satisfied with only trying to
mimic that world.

That is, we need to think of what digital texts could be, and all the
various things that they may accomplish, when not constrained as paper
books have to be constrained.

Therefore, the most important question we should ask is: "What are ALL
the things we'd like digitized books to enable?"

The full answer to this question establishes a clear list of
requirements that our digitizing processes, formats, metadata, and
reading systems need to meet.

Of course, what I just said is patently obvious. But many of these
discussions tend to digress back to "how to emulate paper books"
rather than on "how to surpass paper books." I'm happy to see Michael
try to push the discussion back into "what can digital books do that
paper books cannot do."

Jon Noring
OpenReader Consortium

From bruce at zuhause.org  Thu May 25 11:30:12 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Thu May 25 11:30:15 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
In-Reply-To: <Pine.LNX.4.60.0605250932330.18093@pglaf.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
	<Pine.LNX.4.60.0605230852411.19444@pglaf.org>
	<4474B0C3.1040906@ibiblio.org>
	<Pine.LNX.4.60.0605241256560.31889@pglaf.org>
	<4474D24E.2050409@ibiblio.org>
	<Pine.LNX.4.60.0605250932330.18093@pglaf.org>
Message-ID: <17525.63412.804984.36605@celery.zuhause.org>

My biggest problem with your growth extrapolations, Michael, is that
in the last few years, Distributed Proofreaders has been the primary
source of new Project Gutenberg books, and it's clear from their
statistics that they're not putting out an exponentially increasing
number of books.  It may well be that PG had an exponential growth
rate in the past (which is easy to do when you're starting out with
production in the single digits per year, and grow to thousands per
year), but that doesn't mean it's sustainable.  Explain to us how
you're going to get the labor to validate the ebooks produced, or how
the OCR will become reliable enough to skip the validation process,
and then we'll believe that you can sustain the growth rates seen when
Project Gutenberg's main source of new books when from a few hundred
(if that many) dedicated individuals, to about 5000 Distributed
Proofreaders.
From gbnewby at pglaf.org  Thu May 25 11:56:51 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu May 25 11:56:52 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <42f.2268306.31a6c626@aol.com>
References: <42f.2268306.31a6c626@aol.com>
Message-ID: <20060525185651.GC20994@pglaf.org>

On Thu, May 25, 2006 at 04:34:46AM -0400, Bowerbird@aol.com wrote:
> greg said:
> >    I don't understand.? Where is the URL that 
> >    does conversion on the fly from that file 
> >    to arbitrary formats?? PDF, HTML...others...
> >    with user-specified settings.
> 
> there is no u.r.l.
> conversions are done by my viewer-app.
> you don't need to be involved, since
> your users can do it all by themselves.

If it's not online and on the fly, then it's
not what I've been talking about.  Sorry.

You seem to be saying that there is exactly
one application in the world that can change
a ZML-formatted eBook into HTML, PDF and a variety
of other formats.  That application is your
viewer application, which has already been discussed
in gutvol-d.

If/when there is such an application that can
run on our Unix/Linux servers, operate on the
fly, and integrate with the Web back end, it
will be great to provide access to PG readers.

We've done this for plucker.  We have not done
it for, for example, text-to-speech because nearly
all of the products are for standalone WinPCs.  (I've
wrestled with Festival somewhat and know it can do
the job, but can't figure it out myself.  Help
invited!).  I do think we can do it for Braille
with nfbtrans, and I've been negligent in helping
Marcello to set it up.  

Where can users who might be interested download
your viewer app from?  I don't see it on
the snowy site.  If it's out there, you could
write a little blurb for the PG newsletter inviting
people to try it.
  -- Greg
From Bowerbird at aol.com  Thu May 25 13:01:24 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu May 25 13:01:42 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <37a.30fde4a.31a76714@aol.com>

michael said:
>    It's so simple that even Mr. Bowerbird's rhetoric 
>    cannot confuse the issue:

it's even so simple that mr. hart's rhetoric cannot confuse it.       :+)


>    A picture of the pages of a book, even if complete and OCRable,
>    is simply not a a full text eBook.

a picture, by definition, is not text.

whether or not a _scan-set_ qualifies as an _e-book_,
however, is more of a semantic issue than anything else.


>    1.? It takes many times the drive space.

there's no question about that.   and no need to reiterate it.


>    2.? It takes much more download wire time.

there's no question about that.   and no need to reiterate it.


>    3.? You can't do ANY of the things you can do with full text,

except...


>    EXCEPT THE MOST IMPORTANT. . .YOU CAN READ IT.

and here we have the most important concession, finally --
that a person can indeed _read_ an e-book that is a scan-set.

so, for the person who _only_ wants to _read_ a book,
a scan-set of that book is all that that person needs...

nobody, least of all me, is going to argue with the position that
digital text is _better_ than a scan-set in a multitude of ways...
so if that's what you think this is about, michael, you're wrong.

nobody, least of all me, is saying that people should _settle_ for
a scan-set instead of digital text, especially for our cyberlibrary.
so if that's what you think this is about, michael, you're wrong.


>    But the expense in time and money is much larger,

i can figure out some interpretations of this that make sense,
thinking along the lines of file-storage and bandwidth costs.
but both of those things are cheap now, and getting cheaper.

and a look at the whole picture shows that _scanning_ incurs
the biggest cost, in terms of human labor and machine-costs.
so it's a good thing a rich company like google is doing _that_.

as for the next step -- the o.c.r. followed by the proofing --
that incurs more cost, mostly in the form of human labor costs.

so -- actually -- it would _cost_ less to just "settle" for the scans.


that would be a false economy, though, because the extra time
that it takes to convert a scan-set into digital-text is _worth_it_
(i.e., the benefits of having digital text _and_ the scan-set are
significantly greater than those of having _only_ the scan-set,
and the increase in benefits is greater than the digitization costs,
and this will become increasingly so as we automate the proofing.)

so there's no question that we should keep doing the digitization.
so if that's what you think this is about, michael, you're wrong.

it is important to keep in mind, though, that we have no funds
for doing this digitization, so we are relying on _volunteers_,
and the number of volunteers we have now, and can reasonably
anticipate having in the near future, is not even _close_ to being
enough to keep up with the rate at which google is now scanning.

and when google kicks up their rate, and the new scanning projects
get going as well, and the ones currently operating cumulate their
results, the number of undigitized scan-sets will become _huge_...

so it's counterproductive -- to say the least -- to continue to foster
some kind of unrealistic attitude about these scan-sets as being
totally without value.   we need to see they are of _immense_ value.

to continue to throw rocks at them because "they can't be searched"
or "you can't copy-and-paste their text" is silly to the point of stupid.
people lived with paper-books -- which cannot be searched and
from which you cannot copy-and-paste text -- for 500 years, man!

and maybe it's good training for me as an anarchist to have my hero
say something silly to the point of stupid.   but it sure is disillusioning.

the task before us now is to find a way to make use of the millions of
scan-sets that are out there, standing in line, waiting to be digitized.


>    and it's much harder to write research papers.

i'll let the researchers worry about writing their papers.
after all, that's why they get paid the big bucks, right?


>    By Mr. Bowerbird's logic, a pre-Gutenerg book
>    would be as useful as a post-Gutenberg book.

only if you try and twist my logic to say things that i don't.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/d5273add/attachment.html
From Bowerbird at aol.com  Thu May 25 13:04:11 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu May 25 13:04:28 2006
Subject: [gutvol-d] Kevin Kelly in NYT on future of digital libraries
Message-ID: <46b.180c14d.31a767bb@aol.com>

jon said:
>    Of course, what I just said is patently obvious.

of course it is.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/c9e7058e/attachment.html
From Bowerbird at aol.com  Thu May 25 13:46:56 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu May 25 13:47:05 2006
Subject: [gutvol-d] re: accurately converted to any possible format
Message-ID: <328.526c6b7.31a771c0@aol.com>

greg said:
>    If it's not online and on the fly, then it's
>    not what I've been talking about.? Sorry.

your model and my model differ.

my model calls for online maintenance of one file
per book -- the master version -- in z.m.l. format.

once a person has downloaded that master version,
a program running on their personal machine is used
to convert that master version into auxiliary versions.

i can certainly port my source-code over to perl, so it
would run on the web.   but that's not what my model is.

my model is to put the power into the hands of the user.
i don't like methods that _require_ access to the internet.


>   You seem to be saying that there is exactly
>    one application in the world that can change
>    a ZML-formatted eBook into HTML, PDF and 
>    a variety of other formats.?

one so far, yep.

but i don't see any reason why a multitude of other programmers
couldn't build their own programs that would do the same thing.

how many such programs do you think people need?

it is also of importance to keep in mind my main orientation,
which is to provide a reader-program that is so superior that
nobody even _wants_ to do a conversion to another format...

realistically, i can't see anyone ever using my app to make a .pdf,
because why would they want to use acrobat as their viewer-app?
and they won't convert to .html to read the e-book in a _browser_,
that's for sure.   the only conversion i can see them doing is .html
so they can put it on a rocketbook or one of the other handhelds,
and those machines will all be on the scrapheap before too long...


>    If/when there is such an application that can
>    run on our Unix/Linux servers, operate on the fly, 
>    and integrate with the Web back end, 
>    it will be great to provide access to PG readers.

i don't program much in any of the scripting web-languages,
so you'd have better luck trying those three d.p. programmers
that i pointed you to -- thundergnat, donovan, and bill flis.

as a said, their tools are already doing the vast majority of the
work involved in ascii-to-html conversions for post-processors.

and it would be _great_ to standardize your .html versions.
the one-off nature of your current .html versions means that
all of them will have to be replaced eventually, which is gonna
break the hearts of the post-processors who worked so hard
on them and expected that hard work to last many decades...

i would be happy to consult with anyone who wants to do an
open-source version of these converters.   i could certainly offer
pseudo-code (and even realbasic code, if it helps) that would
serve as a guide in programming routines.   for the most part,
however, i think the translation of z.m.l. structures to (x).h.t.m.l.
should be rather obvious and totally straightforward.   i surely
have encountered no difficulties in doing exactly what is needed.

examples of some .zml files with .html conversions can be seen here:

>   http://www.greatamericannovel.com/ahmmw/ahmmw.zml
>   http://www.greatamericannovel.com/ahmmw/ahmmwc001.html

>   http://www.greatamericannovel.com/mabie/mabie.zml
>    http://www.greatamericannovel.com/mabie/mabiep001.html

>    http://www.greatamericannovel.com/myant/myant.zml
>    http://www.greatamericannovel.com/myant/myantc001.html

>    http://www.greatamericannovel.com/sgfhb/sgfhb.zml
>    http://www.greatamericannovel.com/sgfhb/sgfhbc001.html


>   Where can users who might be interested 
>    download your viewer app from?

they can get a beta version by signing on to the beta-test listserve:
>    zml_talk-subscribe@yahoogroups.com

that beta-version is very old, though, and doesn't have the
converter routines that we have been talking about here...

i'll be bringing the program out of beta in the next month or two;
people will be able to download a copy from the z.m.l. website (new!):
>    http://www.z-m-l.com

i now have one brave linux alpha-tester, so watch for a linux version!

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/0a7df199/attachment.html
From gbnewby at pglaf.org  Thu May 25 14:39:45 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu May 25 14:39:46 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <328.526c6b7.31a771c0@aol.com>
References: <328.526c6b7.31a771c0@aol.com>
Message-ID: <20060525213945.GB24081@pglaf.org>

On Thu, May 25, 2006 at 04:46:56PM -0400, Bowerbird@aol.com wrote:
> greg said:
> >    If it's not online and on the fly, then it's
> >    not what I've been talking about.? Sorry.
> 
> your model and my model differ.

I guess so.  The main PG server 280000 files in dozens of formats and
languages, with millions of hits per months.  You're not offering
anything that can help them, except for a mailing list subscription for
an outdated beta.

Your earlier message said you could offer reformatting into any
reasonable format, as long as the input was sufficiently well-formed to
your standards.  Sounds like you don't actually have any such thing.
Put it up for free public download, and I'll change my tune in a
heartbeat.

> my model calls for online maintenance of one file
> per book -- the master version -- in z.m.l. format.
> 
> once a person has downloaded that master version,
> a program running on their personal machine is used
> to convert that master version into auxiliary versions.

So you're against storing (or creating) WAP versions, Braille versions
and MP3 versions on our server?  That's cutting out a whole lot of
potential readers I would like to support.

Nobody's stopping anyone from running a program on their own computer to
do whatever conversion they like.  Why are you trying to stop me from
enabling various conversions on the server, for people who want or need
to get conversion done there?

  -- Greg


> i can certainly port my source-code over to perl, so it
> would run on the web.   but that's not what my model is.
> 
> my model is to put the power into the hands of the user.
> i don't like methods that _require_ access to the internet.
> 
> 
> >   You seem to be saying that there is exactly
> >    one application in the world that can change
> >    a ZML-formatted eBook into HTML, PDF and 
> >    a variety of other formats.?
> 
> one so far, yep.
> 
> but i don't see any reason why a multitude of other programmers
> couldn't build their own programs that would do the same thing.
> 
> how many such programs do you think people need?
> 
> it is also of importance to keep in mind my main orientation,
> which is to provide a reader-program that is so superior that
> nobody even _wants_ to do a conversion to another format...
> 
> realistically, i can't see anyone ever using my app to make a .pdf,
> because why would they want to use acrobat as their viewer-app?
> and they won't convert to .html to read the e-book in a _browser_,
> that's for sure.   the only conversion i can see them doing is .html
> so they can put it on a rocketbook or one of the other handhelds,
> and those machines will all be on the scrapheap before too long...
> 
> 
> >    If/when there is such an application that can
> >    run on our Unix/Linux servers, operate on the fly, 
> >    and integrate with the Web back end, 
> >    it will be great to provide access to PG readers.
> 
> i don't program much in any of the scripting web-languages,
> so you'd have better luck trying those three d.p. programmers
> that i pointed you to -- thundergnat, donovan, and bill flis.
> 
> as a said, their tools are already doing the vast majority of the
> work involved in ascii-to-html conversions for post-processors.
> 
> and it would be _great_ to standardize your .html versions.
> the one-off nature of your current .html versions means that
> all of them will have to be replaced eventually, which is gonna
> break the hearts of the post-processors who worked so hard
> on them and expected that hard work to last many decades...
> 
> i would be happy to consult with anyone who wants to do an
> open-source version of these converters.   i could certainly offer
> pseudo-code (and even realbasic code, if it helps) that would
> serve as a guide in programming routines.   for the most part,
> however, i think the translation of z.m.l. structures to (x).h.t.m.l.
> should be rather obvious and totally straightforward.   i surely
> have encountered no difficulties in doing exactly what is needed.
> 
> examples of some .zml files with .html conversions can be seen here:
> 
> >   http://www.greatamericannovel.com/ahmmw/ahmmw.zml
> >   http://www.greatamericannovel.com/ahmmw/ahmmwc001.html
> 
> >   http://www.greatamericannovel.com/mabie/mabie.zml
> >    http://www.greatamericannovel.com/mabie/mabiep001.html
> 
> >    http://www.greatamericannovel.com/myant/myant.zml
> >    http://www.greatamericannovel.com/myant/myantc001.html
> 
> >    http://www.greatamericannovel.com/sgfhb/sgfhb.zml
> >    http://www.greatamericannovel.com/sgfhb/sgfhbc001.html
> 
> 
> >   Where can users who might be interested 
> >    download your viewer app from?
> 
> they can get a beta version by signing on to the beta-test listserve:
> >    zml_talk-subscribe@yahoogroups.com
> 
> that beta-version is very old, though, and doesn't have the
> converter routines that we have been talking about here...
> 
> i'll be bringing the program out of beta in the next month or two;
> people will be able to download a copy from the z.m.l. website (new!):
> >    http://www.z-m-l.com
> 
> i now have one brave linux alpha-tester, so watch for a linux version!
> 
> -bowerbird

> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From Bowerbird at aol.com  Thu May 25 15:28:01 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu May 25 15:28:09 2006
Subject: [gutvol-d] re: accurately converted to any possible format
Message-ID: <219.16282fa8.31a78971@aol.com>

Skipped content of type multipart/alternative-------------- next part --------------
An embedded message was scrubbed...
From: Greg Newby <gbnewby@pglaf.org>
Subject: Re: [gutvol-d] re: accurately converted to any possible format
Date: Thu, 25 May 2006 14:39:45 -0700
Size: 7655
Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/1fad2692/attachment.mht
From donovan at abs.net  Thu May 25 15:18:33 2006
From: donovan at abs.net (D Garcia)
Date: Thu May 25 15:41:31 2006
Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...]
In-Reply-To: <20060525051024.GH6694@pglaf.org>
References: <42a.2124045.31a58972@aol.com> <20060525051024.GH6694@pglaf.org>
Message-ID: <200605251818.34015.donovan@abs.net>

By way of forking the discussion, on Thursday 25 May 2006 at 01:10 am, Greg 
Newby responded to Jon Ingram with:
> Woah there, cowboy.
>
> I've been waiting for DP to provide raw page scans for *years*.  This is
> something I discussed with Charles & Juliet years ago.  The whitewashers
> are ready.  iBiblio is ready.

And the volunteer is ready. I volunteered nearly two months ago to take up 
this task and am simply waiting on various action items from a few people. 
Charles always intended to have the scans from DP available to the general 
public whenever possible.

> I've also been pressing to get preprints from DP...scans before the
> postprocessing is done, to release "to the wild" before they're quite
> ready.  (Last count there are over 800 of these.)

It's an interesting idea, but initially I'd like to focus on getting the 
existing projects in order. :)

> If you could help to move things forward on either scans or preprints,
> I'd be very grateful!  (Ditto for anyone else reading.)
>   -- Greg

-- David
From Bowerbird at aol.com  Thu May 25 15:47:21 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu May 25 15:47:29 2006
Subject: [gutvol-d] re: accurately converted to any possible format
Message-ID: <439.23d6dc6.31a78df9@aol.com>

sorry, i screwed up that last post, and forgot to include this:

greg said:
>    Your earlier message said you could offer 
>    reformatting into any reasonable format, 
>    as long as the input was sufficiently 
>    well-formed to your standards.? 

actually, as the header shows, i said "any possible format",
which is kind of ludicrous in retrospect, isn't it?

so let's be precise about exactly _what_ formats we mean,
in the future, and let us further provide _samples_ so that
people can evaluate the _quality_ of the conversions we do.

otherwise, it's just a hype war, and that does no one any good.


>    Sounds like you don't actually have any such thing.

it's not released yet, no.   but i have pointed to samples.


>    Put it up for free public download, 
>    and I'll change my tune in a heartbeat.

would that you were so demanding of the .tei folks.

***

now, let me restate, just to remind everybody again,
i have no objection to the .tei folks, or the .xml folks.

i don't even have an objection that .tei is the "official"
position on how project gutenberg moves to the future.

i merely wish to assert _my_ opinion, which i will back up
with solid evidence, that a much simpler methodology will
give substantially similar (if not better) benefits, at a cost
(both initial and maintenance) that is _significantly_ lower.

even then, if people want to stick with the .tei/.xml method,
that's fine with me, as it is no skin off my nose.   comprenez?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/a168da38/attachment.html
From Bowerbird at aol.com  Thu May 25 15:56:18 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu May 25 15:56:23 2006
Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...]
Message-ID: <3b6.2f08aab.31a79012@aol.com>

donovan said:
>    It's an interesting idea, but initially I'd like to focus 
>    on getting the existing projects in order. :)

the volunteer who does the work sets the agenda.         :+)

but putting up existing scan-sets, so people could read them
while they are standing in line waiting to be digitized, seems
to me to be the best possible way to put them to use _now_...

and if anyone needs another good idea, i think it would be
a good idea to round up the scan-sets from google that
represent books that are already in the p.g. library...

a list of such books has been compiled on the d.p. forums.
the focus of that list is now "...so don't bother with these...",
but i think they could instead represent a rare opportunity...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060525/77ac427f/attachment.html
From jmdyck at ibiblio.org  Thu May 25 16:18:43 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Thu May 25 16:18:51 2006
Subject: !@!Re: [gutvol-d] Kevin Kelly in NYT on future of digital
	libraries
In-Reply-To: <Pine.LNX.4.60.0605250932330.18093@pglaf.org>
References: <424.1cc65e1.31a33a85@aol.com>
	<Pine.LNX.4.60.0605221045230.32683@pglaf.org>
	<44721054.3000104@ibiblio.org>
	<Pine.LNX.4.60.0605230852411.19444@pglaf.org>
	<4474B0C3.1040906@ibiblio.org>
	<Pine.LNX.4.60.0605241256560.31889@pglaf.org>
	<4474D24E.2050409@ibiblio.org>
	<Pine.LNX.4.60.0605250932330.18093@pglaf.org>
Message-ID: <44763B53.5080801@ibiblio.org>

Michael Hart wrote:
> 
> OK, back to basics, I have consulted with some mathematicians,
> not that I think you didn't know this, but you are pressuring
> me to make the point,

I'm not pressuring you to make any point about lines and curves. If you
feel pressure to do so, that's unfortunate. If you feel I demanded it,
you're mistaken.

In your first reply to my posting, you made a statement about "an
ordinarly [sic] growth curve", assuming a particular definition for
"curve". I disagreed, based on a different definition of "curve". But I
also (in the same message!) agreed with you, based on my best guess at
what you actually *meant*. (And also disagreed again, based on another
possibility for what you meant.)

So it's just a little confusion over one's choice of terms. E.g., if you
had instead said "an exponential growth curve" (or "geometric growth
curve" or "Moore's Law curve"), there wouldn't have been that confusion.

(Of course, there still would have been another problem, namely that you 
ascribed to me a position I had never taken. It would be nice if you 
apologized for that.)


>> But now we (well, you, really) have strayed from the topic that 
>> brought me in, the comparison between Google's progress and PG's,
>> so my interest in this discussion is probably fading.
> 
> Ah, it would appear that you already knew you were painting us into a 
> corner.

I disagree that we're painted into a corner. There's still a chance that 
this could go in a useful direction, though it does seem slim.

> Then I hope that the great effort spent in replying to your messages was
> not a total waste for either yourself or the rest of us.

For myself, the replies (yours and mine) feel like mostly a waste so 
far. (Although if people gained some insight into Google's progress by 
my comparing it with PG's, then that would be a bright spot.) For the 
rest, I cannot say.

> However, as I stated in my opening paragraph, I presumed you already
> knew all of this

Correctly presumed.

> and thus presumed you were only asking the question for other reasons.
> 
> May I ask what those reasons were?

Sure, but what question are you talking about? I looked over my last
three messages for a question, and the only one I found was (and I
quote) "Huh?".

(If you're talking about a question in which I ask you to explain lines
and curves, that question does not exist.)

-Michael


From ricardofdiogo at gmail.com  Thu May 25 21:42:06 2006
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Thu May 25 21:48:48 2006
Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...]
In-Reply-To: <3b6.2f08aab.31a79012@aol.com>
References: <3b6.2f08aab.31a79012@aol.com>
Message-ID: <9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com>

> but putting up existing scan-sets, so people could read them
> while they are standing in line waiting to be digitized, seems
> to me to be the best possible way to put them to use _now_...
>

In this case, all DP's Content Providers must be explicitly warned
that ALL scans CAN be released to the public.
Some foreign legislation may allow people to scan a book and release
it to DP (since it's a "closed" website), but may forbid them to allow
those scans to be released to the general public.
Some volunteers may be scanning books at this moment expecting that
only the final E-text will be released. (According to my national law
(EU), for instance, I can theoretically scan a 1960's edition
(respecting the life+70 rule) and upload it to PGDP/DPE. But having
the images freely available online can be very compromising because
editors may still have typographic copyright. I don't know how it
works all around the world... That's why some special warning would be
advisable.

2006/5/25, Bowerbird@aol.com <Bowerbird@aol.com>:
>
> donovan said:
> >   It's an interesting idea, but initially I'd like to focus
> >   on getting the existing projects in order. :)
>
> the volunteer who does the work sets the agenda.        :+)
>
> but putting up existing scan-sets, so people could read them
> while they are standing in line waiting to be digitized, seems
> to me to be the best possible way to put them to use _now_...
>
> and if anyone needs another good idea, i think it would be
> a good idea to round up the scan-sets from google that
> represent books that are already in the p.g. library...
>
> a list of such books has been compiled on the d.p. forums.
> the focus of that list is now "...so don't bother with these...",
> but i think they could instead represent a rare opportunity...
>
> -bowerbird
> donovan said:
> >   It's an interesting idea, but initially I'd like to focus
> >   on getting the existing projects in order. :)
>
> the volunteer who does the work sets the agenda.        :+)
>
> but putting up existing scan-sets, so people could read them
> while they are standing in line waiting to be digitized, seems
> to me to be the best possible way to put them to use _now_...
>
> and if anyone needs another good idea, i think it would be
> a good idea to round up the scan-sets from google that
> represent books that are already in the p.g. library...
>
> a list of such books has been compiled on the d.p. forums.
> the focus of that list is now "...so don't bother with these...",
> but i think they could instead represent a rare opportunity...
>
> -bowerbird
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
>


-- 
?Vi de que noite ? feita a luz do dia!?

(Antero de Quental)

D? livros electr?nicos ao Mundo. Ajude em http://www.pgdp.net e em
http://dp.rastko.net
From prosfilaes at gmail.com  Thu May 25 22:22:49 2006
From: prosfilaes at gmail.com (David Starner)
Date: Thu May 25 22:29:31 2006
Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...]
In-Reply-To: <9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com>
References: <3b6.2f08aab.31a79012@aol.com>
	<9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com>
Message-ID: <6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com>

On 5/25/06, Ricardo F Diogo <ricardofdiogo@gmail.com> wrote:
> In this case, all DP's Content Providers must be explicitly warned
> that ALL scans CAN be released to the public.

That's always been the assumption, as far as I know.

> Some foreign legislation may allow people to scan a book and release
> it to DP (since it's a "closed" website), but may forbid them to allow
> those scans to be released to the general public.

DP lets anyone sign up and download the complete scans for a book. I
wouldn't be too trusting that that would cover you under any legal
system. Note that DP-EU is different; they don't let just anyone look
at more pages then they need to proof, which they expect will cover
them for offering US-cleared works from a life+50 server.
From gbnewby at pglaf.org  Thu May 25 22:33:57 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu May 25 22:33:59 2006
Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...]
In-Reply-To: <6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com>
References: <3b6.2f08aab.31a79012@aol.com>
	<9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com>
	<6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com>
Message-ID: <20060526053357.GA31126@pglaf.org>

On Fri, May 26, 2006 at 12:22:49AM -0500, David Starner wrote:
> On 5/25/06, Ricardo F Diogo <ricardofdiogo@gmail.com> wrote:
> >In this case, all DP's Content Providers must be explicitly warned
> >that ALL scans CAN be released to the public.
> 
> That's always been the assumption, as far as I know.
> 
> >Some foreign legislation may allow people to scan a book and release
> >it to DP (since it's a "closed" website), but may forbid them to allow
> >those scans to be released to the general public.
> 
> DP lets anyone sign up and download the complete scans for a book. I
> wouldn't be too trusting that that would cover you under any legal
> system. Note that DP-EU is different; they don't let just anyone look
> at more pages then they need to proof, which they expect will cover
> them for offering US-cleared works from a life+50 server.

If an eBooks is public domain in the US, then the scans are too.
Even people outside of the US cannot (or at least, not successfully)
claim PG can't redistribute them...or anyone else for that matter.
I've saved some of our escapades on such issues:
	http://cand.pglaf.org/

If DP agrees to not redistribute, that's another matter...  this
is sometimes done for particular sources of content, even if
it's public domain.  I think it's in everyone's interest to
not violate such agreements.  We have a little about this in the
"Harvesting" section at www.gutenberg.org/howto

  -- Greg

From mlockey at magma.ca  Thu May 25 22:22:43 2006
From: mlockey at magma.ca (Michael Lockey)
Date: Thu May 25 22:35:35 2006
Subject: [gutvol-d] Distributed Proofreaders Canada
Message-ID: <200605260522.k4Q5Mfox010349@mail2.magma.ca>

First, let me apologize for the delays; in my case, the word 'health' has
been a singleton oxymoron.

 
-Despite all this, DP-CAN should be up in a week.  It is currently a very
small organization, and there's a whole buncha positions available.

 
Anyone want to moderate a forum, or help in any administrative position; or
willing to help CP, is welcome and needed.  (Please note that anyone
involved in helping us with the generation or processing of Life+50 work
must be legal to do so.)

 
Michael Lockey (note new email)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060526/f392c895/attachment.html
From tb at baechler.net  Thu May 25 22:53:59 2006
From: tb at baechler.net (Tony Baechler)
Date: Thu May 25 22:58:19 2006
Subject: [gutvol-d] re: accurately converted to any possible format
In-Reply-To: <20060525185651.GC20994@pglaf.org>
References: <42f.2268306.31a6c626@aol.com> <20060525185651.GC20994@pglaf.org>
Message-ID: <7.0.1.0.2.20060525224931.03e641d0@baechler.net>

Greg wrote:


>invited!).  I do think we can do it for Braille
>with nfbtrans, and I've been negligent in helping
>Marcello to set it up.

Yes, NFBTrans will do conversion to Braille on the fly.  I did try to 
get volunteers to check the output but got no takers.  The problem is 
trying to guess at what formatting commands to send.  You can format 
the Braille files a certain way but it requires trial and error.  I'm 
not a programmer and know little about setting it up.  It may be that 
you have to cat a file of dot commands to set formatting with the PG 
text to get useful output.  There are examples to produce textbooks 
and other embosser files, but that kind of formatting is not helpful 
for people using PDAs etc.  Yes, this can most definitely be done 
though automatically. 

From prosfilaes at gmail.com  Thu May 25 23:03:50 2006
From: prosfilaes at gmail.com (David Starner)
Date: Thu May 25 23:03:53 2006
Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...]
In-Reply-To: <20060526053357.GA31126@pglaf.org>
References: <3b6.2f08aab.31a79012@aol.com>
	<9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com>
	<6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com>
	<20060526053357.GA31126@pglaf.org>
Message-ID: <6d99d1fd0605252303w711d67d0j5b6a3440c21c1e9a@mail.gmail.com>

On 5/26/06, Greg Newby <gbnewby@pglaf.org> wrote:
> If an eBooks is public domain in the US, then the scans are too.
> Even people outside of the US cannot (or at least, not successfully)
> claim PG can't redistribute them...or anyone else for that matter.
> I've saved some of our escapades on such issues:
>         http://cand.pglaf.org/

I think he was more worried about the content provider's liability. I
suspect a copyright holder could get very annoyed about one or two of
the books I have provided to DP-EU, but I've personally chosen to have
no public connection to those books.

However

> We are unaware of any case where copyright laws of another country impacts public domain status in the US.

is not 100% true. To be pedantic, there are rule 6 clearances possible
based on a non-US work not being in copyright in its home country when
the URAA was passed.
From gbnewby at pglaf.org  Thu May 25 23:07:46 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu May 25 23:07:47 2006
Subject: [gutvol-d] Re: DPF images archives [Was: Re: Kevin Kelly ...]
In-Reply-To: <200605251818.34015.donovan@abs.net>
References: <42a.2124045.31a58972@aol.com> <20060525051024.GH6694@pglaf.org>
	<200605251818.34015.donovan@abs.net>
Message-ID: <20060526060746.GA31780@pglaf.org>

On Thu, May 25, 2006 at 06:18:33PM -0400, D Garcia wrote:
> By way of forking the discussion, on Thursday 25 May 2006 at 01:10 am, Greg 
> Newby responded to Jon Ingram with:
> > Woah there, cowboy.
> >
> > I've been waiting for DP to provide raw page scans for *years*.  This is
> > something I discussed with Charles & Juliet years ago.  The whitewashers
> > are ready.  iBiblio is ready.
> 
> And the volunteer is ready. I volunteered nearly two months ago to take up 
> this task and am simply waiting on various action items from a few people. 
> Charles always intended to have the scans from DP available to the general 
> public whenever possible.

Responding to Joshua's point about the desired format, as
well as Greg W's inquery.  

There were several messages and some proposals about the details of how
to handle page scans.  Stuff like whether individual pages should each
have their own file, and what format...

I will forward a message from Jim Tinsley about that in a moment,
from July 2004.  There was subsequent discussionn.  I don't think
we quite got closure, but will ask the WWs if they remember anything
specific.

My suggestion is to do a few dozen of these, and work out the workflow
as we go.  If you can upload a .zip or .tar or somesuch to the 
pglaf server via FTP (not via http://upload.pglaf.org), then
email me, I'll push them to the archive.  Let me know if you don't
have the (non-anonymous) upload/outgoing password for pglaf.org.

Ideally, zipped with the eBook #, and with everthing in a page-images,
xxxxx-page-images/ subdir:

	12345/12345-page-images/

that will allow our automated "push" script to put it in the right
place.  

If things seem to work OK, I'll set things up so I won't
need to intervene.

I think it's fine to experiment with different ways of doing
the images -- that will help us to know what's workable for
our readers, and useful for other purposes.  Rather than
rehashing all of the questions, options and issues, I'd just
as soon see some stuff get posted so we can invite folks
to try it.  (I'm not trying to quell discussion, just trying to 
avoid the discussion getting in the way of the work.)

Thanks for stepping up and trying this!  We do want to make
images part of the regular workflow, but because the whitewashers
tend to download the eBooks to their home/office systems for
final processing, we'll probably want to have the page scans
flow somewhat separately than everything else.

Whoopee, this is great!!  Yippee-ei-ayyyyyyyy!!
  -- Greg

> > I've also been pressing to get preprints from DP...scans before the
> > postprocessing is done, to release "to the wild" before they're quite
> > ready.  (Last count there are over 800 of these.)
> 
> It's an interesting idea, but initially I'd like to focus on getting the 
> existing projects in order. :)
> 
> > If you could help to move things forward on either scans or preprints,
> > I'd be very grateful!  (Ditto for anyone else reading.)
> >   -- Greg
> 
> -- David
From gbnewby at pglaf.org  Thu May 25 23:13:40 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu May 25 23:13:43 2006
Subject: [gutvol-d] Page scans draft policy
Message-ID: <20060526061340.GA32022@pglaf.org>

As I said, there was subsequent discussion about the details
of formatting...Here's some info about page scan formats.  I
note that point 4 is somewhat different than what I just typed in my
other message, and seems a whole lot smarter.
  -- Greg

----- Forwarded message from Jim Tinsley <jtinsley@pobox.com> -----

 From: "Jim Tinsley" <jtinsley@pobox.com>
 To: "Posted Etexts for Project Gutenberg" <posted@listserv.unc.edu>
 Subject: [posted] Posted (#12973, Butler) !
 Date: Tue, 20 Jul 2004 20:24:32 -0700 (PDT)


Personal Recollections of Pardee Butler, by Pardee Butler                12973
  [Editor: Mrs. Rosetta B. Hastings]
  [Contributor: Mrs. Rosetta B. Hastings]
  [Contributor: Elder John Boggs]
  [Contributor: Elder J. B. McCleery]
  [Link: http://www.gutenberg.net/1/2/9/7/12973 ]
  [Files: 12973.txt; 12973-h.htm; 12973-page-images]


Thanks to Roger for finding and scanning this book.


This is the first PG book to be posted with page images. We are now
beginning to accept page images along with the regular postings.
Of course, DP has always preserved its page images, and those will
eventually be uploaded in a big batch, or series of batches, but 
non-DP contributions may now begin adding page images.


For now, we're setting the following guidelines for page image postings:

1. PG is now accepting page images of books posted. Page images will
be posted _only_ as an addition to an etext posted in the normal way
-- we will not post page images without plain text.


2. Page images are an option; they are not and will not be required
for the posting of a text.


3. All page images should be good enough to work reasonably well with
OCR packages, up to 600 dpi, and should be stored as black-and-white
TIFFs with CCITT-4 (aka ITU-G4 or Fax Group 4) compression. This is
important, so that we keep the overall file size down to a sustainable
level. With this compression, a typical 600dpi page can be stored for
about 40KB. Our ability to post these images depends on the file sizes
staying fairly reasonable. Pages such as color pictures or greyscale
photos that cannot reasonably be stored as black-and-white only should
be stored as TIFF or JPEG with the best compression you can get for
that image.

(Note: Irfanview for Windows does this nicely individually or in batch.
ImageMagick v 6.x: convert myimage.png -compress group4 myimage.tif )


4. Each page image should be a separate file and named with the page
number within the set; e.g. 001.tif, 002.tif, etc. Separate, non-page
images, such as covers or color images scanned separately from the
pages, should have suitable names, such as "cover.jpg" or "072-image.tif"
All page images for the book will be zipped into one file, to be called
FILENUMBER-page-images, e.g. 12345-page-images.piz (reverse the extension)
for etext #12345, and stored in the main directory for that etext. It will
unzip to a subdirectory ./page-images, but we will not post separate page
images in that directory, since that would double the space used, and we
believe that people who want to consult the images will probably want
them all. So, for now at least, if you want the images, you download
the PIZ (backwards again) file.


jim


----- End forwarded message -----
From gbnewby at pglaf.org  Thu May 25 23:19:08 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu May 25 23:19:09 2006
Subject: [gutvol-d] DPF images archives [Was: Re: Kevin Kelly ...]
In-Reply-To: <6d99d1fd0605252303w711d67d0j5b6a3440c21c1e9a@mail.gmail.com>
References: <3b6.2f08aab.31a79012@aol.com>
	<9c6138c50605252142h592580fbg7186503ae12e769c@mail.gmail.com>
	<6d99d1fd0605252222m6e197ac6hbaaa6c9d3771f4f5@mail.gmail.com>
	<20060526053357.GA31126@pglaf.org>
	<6d99d1fd0605252303w711d67d0j5b6a3440c21c1e9a@mail.gmail.com>
Message-ID: <20060526061908.GA32191@pglaf.org>

On Fri, May 26, 2006 at 01:03:50AM -0500, David Starner wrote:
> On 5/26/06, Greg Newby <gbnewby@pglaf.org> wrote:
> >If an eBooks is public domain in the US, then the scans are too.
> >Even people outside of the US cannot (or at least, not successfully)
> >claim PG can't redistribute them...or anyone else for that matter.
> >I've saved some of our escapades on such issues:
> >        http://cand.pglaf.org/
> 
> I think he was more worried about the content provider's liability. I
> suspect a copyright holder could get very annoyed about one or two of
> the books I have provided to DP-EU, but I've personally chosen to have
> no public connection to those books.

Understood, and that's a good approach.  Laws in other
countries (France & Germany leap right to mind) can be pretty
different than the US about their approach to such things.

> However
> 
> >We are unaware of any case where copyright laws of another country impacts 
> >public domain status in the US.
> 
> is not 100% true. To be pedantic, there are rule 6 clearances possible
> based on a non-US work not being in copyright in its home country when
> the URAA was passed.

Sure, I understand about that, and didn't type a very
precise message.  A more precise attempt:

If a book is public domain in the US (including under GATT/URAA etc.),
then we (PG and US-based persons) get to treat it as public domain.
This is regardless of whether it might still be copyrighted elsewhere.


This is upsetting to many copyright holders (including those with
copyrights in the US, but for items that are public domain elsewhere).
http://cand.pglaf.org for a few examples.
  -- Greg
From Bowerbird at aol.com  Fri May 26 00:43:00 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri May 26 00:43:08 2006
Subject: [gutvol-d] Page scans draft policy
Message-ID: <37c.416bf76.31a80b84@aol.com>

someone said:
>    4. Each page image should be a separate file 
>    and named with the page number within the set; 
>    e.g. 001.tif, 002.tif, etc. Separate, non-page images, 
>    such as covers or color images scanned separately 
>    from the pages, should have suitable names, 
>    such as "cover.jpg" or "072-image.tif"

this is a bad policy.

a wonky naming convention screws everything up.
(and an inconsistent convention is a wonky one.)

also, it's absolutely imperative that a library have
_unique_filenames_ for every single file within it.
naming files "001.tif" and expecting their _folder_
to differentiate them is a disaster-in-the-making.

for a better way of doing things, check any of the
links to the .html files i gave in an earlier message.
all of them incorporate the image-scans.

you want an alphabetical sort of the filenames to give you
the _exact_ order in which the p-book pages were bound...


>    All page images for the book will be zipped into one file

this is a bad policy too.

you need to have each image individually accessible.
it is _tremendously_ important that this be the case...


>    we will not post separate page images in that directory, 
>    since that would double the space used, and we believe that 
>    people who want to consult the images will probably want them all.

that's a bad assumption.

there are all kinds of situations
where people want a single scan.
(ponder your own experience...)

and these scans _need_ to be accessible via the web,
not just as a download package.   again, very important!

if it comes down to a choice between saving them in a zip file
and saving them individually, toss the zip file _immediately_...

***

your policy on the way in which the scans are saved simply must
reflect the realities about how those scans will eventually be used.
spend some time analyzing those uses so you make good decisions.

-bowerbird

p.s.   here are those links again, straight into their subdirectories:
>   http://snowy.arsc.alaska.edu/bowerbird/mabie/
>   http://snowy.arsc.alaska.edu/bowerbird/myant/
>    http://snowy.arsc.alaska.edu/bowerbird/tolbk/
>    http://snowy.arsc.alaska.edu/bowerbird/sgfhb/
>    http://snowy.arsc.alaska.edu/bowerbird/henty/
>    http://snowy.arsc.alaska.edu/bowerbird/bachwm/
>    http://snowy.arsc.alaska.edu/bowerbird/ahmmw/

the .html files are on another machine; you can't see them as a group,
but you can view-source on any one of them to see how they operate.
in each case, the .html files were auto-generated from the .zml file,
and the image-file is spliced in to facilitate "continuous proofreading".

>    http://www.greatamericannovel.com/ahmmw/ahmmw.zml
>    http://www.greatamericannovel.com/ahmmw/ahmmw.html

>    http://www.greatamericannovel.com/mabie/mabie.zml
>    http://www.greatamericannovel.com/mabie/mabiep001.html

>    http://www.greatamericannovel.com/myant/myant.zml
>    http://www.greatamericannovel.com/myant/myantc001.html

>    http://www.greatamericannovel.com/sgfhb/sgfhb.zml
>    http://www.greatamericannovel.com/sgfhb/sgfhbc001.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060526/4aa6344c/attachment.html