From ke at gnu.franken.de  Sat Dec  1 00:17:55 2007
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Sat, 1 Dec 2007 09:17:55 +0100 (CET)
Subject: [gutvol-d] The Advent Calendar will be up tomorrow
In-Reply-To: <2vt1l3dhg8beqp31guqq8aaltc06198r91@4ax.com>
References: <pplvk3djhaup8sc7selq89qrgatiehkhdp@4ax.com>	<748ba8e50711300336g6066d752h6ba65f375fdeb3ed@mail.gmail.com>
	<3n60l3devcg49a88g6h5gu6m46gedtt5mb@4ax.com>
	<47509F69.20301@netronome.com>
	<2vt1l3dhg8beqp31guqq8aaltc06198r91@4ax.com>
Message-ID: <51940.83.171.144.232.1196497075.squirrel@www.franken.de>

> On Fri, 30 Nov 2007 18:40:25 -0500, you wrote:
>
>>Will the links be enabled separately day by day?
>
> No. Just like a chocolate calendar you should be able to abuse it.

This is a good one :-D But those who behave that badly have
to be punished somehow...


From robert_marquardt at gmx.de  Sat Dec  1 03:24:52 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Sat, 01 Dec 2007 12:24:52 +0100
Subject: [gutvol-d] The Advent Calendar will be up tomorrow
In-Reply-To: <51940.83.171.144.232.1196497075.squirrel@www.franken.de>
References: <pplvk3djhaup8sc7selq89qrgatiehkhdp@4ax.com>	<748ba8e50711300336g6066d752h6ba65f375fdeb3ed@mail.gmail.com>
	<3n60l3devcg49a88g6h5gu6m46gedtt5mb@4ax.com>
	<47509F69.20301@netronome.com>
	<2vt1l3dhg8beqp31guqq8aaltc06198r91@4ax.com>
	<51940.83.171.144.232.1196497075.squirrel@www.franken.de>
Message-ID: <53h2l3hh7lkh41m8eqnl494qc148t4utus@4ax.com>

On Sat, 1 Dec 2007 09:17:55 +0100 (CET), you wrote:

>This is a good one :-D But those who behave that badly have
>to be punished somehow...

Punishment in Christmas time? That is the job of Santa.
-- 
Robert Marquardt (Team JEDI)  http://delphi-jedi.org

From ralf at ark.in-berlin.de  Sun Dec  2 04:01:03 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sun, 2 Dec 2007 13:01:03 +0100
Subject: [gutvol-d] The Advent Calendar will be up tomorrow
In-Reply-To: <51940.83.171.144.232.1196497075.squirrel@www.franken.de>
References: <pplvk3djhaup8sc7selq89qrgatiehkhdp@4ax.com>
	<748ba8e50711300336g6066d752h6ba65f375fdeb3ed@mail.gmail.com>
	<3n60l3devcg49a88g6h5gu6m46gedtt5mb@4ax.com>
	<47509F69.20301@netronome.com>
	<2vt1l3dhg8beqp31guqq8aaltc06198r91@4ax.com>
	<51940.83.171.144.232.1196497075.squirrel@www.franken.de>
Message-ID: <20071202120103.GB14608@ark.in-berlin.de>

> >>Will the links be enabled separately day by day?
> >
> > No. Just like a chocolate calendar you should be able to abuse it.
> 
> This is a good one :-D But those who behave that badly have
> to be punished somehow...

Intellectual obstipation? Happens often enough ...


ralf


From Bowerbird at aol.com  Mon Dec  3 09:35:56 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 3 Dec 2007 12:35:56 EST
Subject: [gutvol-d] still waiting
Message-ID: <cdf.1d170c28.3485987c@aol.com>

still waiting for carlo to weigh in with that wdiff tutorial...

-bowerbird


**************************************
Check out AOL's list of 2007's hottest 
products.

(http://money.aol.com/special/hot-products-2007?NCID=aoltop00030000000001)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071203/d25b1b0e/attachment.htm 

From Bowerbird at aol.com  Tue Dec  4 09:44:39 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 4 Dec 2007 12:44:39 EST
Subject: [gutvol-d] the cost of an e-book authoring system
Message-ID: <d58.1b0404b9.3486ec07@aol.com>

some people have taken issue with the price i put on the _source_code_
for my e-book applications.   (the programs themselves are free of cost.)

evidently, they don't know what the market will bear...

here's the latest piece of information they might want to consider:
>    
http://www.futureofthebook.org/blog/archives/2007/12/a_grant_for_sophie_from_the_ma.html

sophie, an authoring-tool from the institute for the future of the book, just 

received a grant of $400,000, which will "ensure" that v1.0 gets out the 
door.

note that this is on top of gobs of other cash sophie has received in the 
past,
not to mention the money that went toward earlier iterations (notably tk3)...

-bowerbird


**************************************
Check out AOL's list of 2007's hottest 
products.

(http://money.aol.com/special/hot-products-2007?NCID=aoltop00030000000001)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071204/28634511/attachment.html 

From Bowerbird at aol.com  Fri Dec  7 16:01:26 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 7 Dec 2007 19:01:26 EST
Subject: [gutvol-d] 2000 books digitized
Message-ID: <cb8.1f38f5dd.348b38d6@aol.com>

distributed proofreaders has passed the 2000-book milestone for 2007...
congratulations to the hard-working volunteers for their accomplishment.

-bowerbird


**************************************
Check out AOL's list of 2007's hottest 
products.

(http://money.aol.com/special/hot-products-2007?NCID=aoltop00030000000001)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071207/a0b3ec85/attachment.htm 

From Bowerbird at aol.com  Mon Dec 10 12:19:38 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 10 Dec 2007 15:19:38 EST
Subject: [gutvol-d] that proofing challenge
Message-ID: <d60.1850a37e.348ef95a@aol.com>

i've made headway on that proofing challenge.

to refresh your memory:
>    http://z-m-l.com/misc/thechallenge.png
>    http://z-m-l.com/misc/thechallenge2.png

this table depicts one way to unravel the thing:
>    http://z-m-l.com/misc/thechallenge3.html

if anyone here is actually interested enough to
ask questions about it, i'll be happy to answer...
otherwise, i won't bore you with implementation.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071210/0d866238/attachment.htm 

From radicks at bellsouth.net  Mon Dec 10 18:34:12 2007
From: radicks at bellsouth.net (Dick Adicks)
Date: Mon, 10 Dec 2007 21:34:12 -0500
Subject: [gutvol-d] Amazon's Kindle
Message-ID: <C3836154.B048%radicks@bellsouth.net>

Does anybody know if Amazon's new Kindle is capable of downloading PG books?
The Amazon website does not mention that applicability, so I assume that the
device is limited only to what Amazon sells.

Dick Adicks


From Bowerbird at aol.com  Mon Dec 10 20:30:09 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 10 Dec 2007 23:30:09 EST
Subject: [gutvol-d] Amazon's Kindle
Message-ID: <d6b.160218b4.348f6c51@aol.com>

you can send a file -- text, .html, .rtf -- to amazon to have it kindle-ized, 
yes.

but probably the best bet is to get the 10000-book d.v.d. from silkpagoda.com
(formerly blackmask.com) wherein much of the p.g. library has been converted.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071210/54eaa15b/attachment.htm 

From desrod at gnu-designs.com  Mon Dec 10 21:33:57 2007
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Tue, 11 Dec 2007 00:33:57 -0500 (EST)
Subject: [gutvol-d] Amazon's Kindle
In-Reply-To: <20071211035609.GN5916@localhost>
References: <C3836154.B048%radicks@bellsouth.net>
	<20071211035609.GN5916@localhost>
Message-ID: <Pine.LNX.4.64.0712110030080.30649@neptune.gnu-designs.com>


> The device certainly is not limited to Amazon's offerings.  The 
> Kindle reads natively Mobibook (no-drm) *.mobi and *.pdb and *.txt 
> formats.

> It permits you to install such files free via USB.  (Or you can send 
> them wireless to the device for 10 cents a file).  And Kindle 
> supports SD cards up to 4GB.

It still submits the titles, length, bookmarks, annotation, etc. of 
ANY title you happen to be reading, back upstream (their TOS is clear 
on this point).


From greg at durendal.org  Thu Dec 13 11:22:50 2007
From: greg at durendal.org (Greg Weeks)
Date: Thu, 13 Dec 2007 14:22:50 -0500 (EST)
Subject: [gutvol-d] pre-press
Message-ID: <Pine.LNX.4.63.0712131421520.5771@durendal.durendal.org>


What was the address for the pre-press site? Posting the texts before they 
are ready for full posting has come up in a discussion on DP.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From hart at pglaf.org  Thu Dec 13 12:22:36 2007
From: hart at pglaf.org (Michael Hart)
Date: Thu, 13 Dec 2007 12:22:36 -0800 (PST)
Subject: [gutvol-d] pre-press
In-Reply-To: <Pine.LNX.4.63.0712131421520.5771@durendal.durendal.org>
References: <Pine.LNX.4.63.0712131421520.5771@durendal.durendal.org>
Message-ID: <Pine.LNX.4.64.0712131222110.19410@pglaf.org>


You want to SEND to that site, or RECEIVE from that side???


mh


On Thu, 13 Dec 2007, Greg Weeks wrote:

>
> What was the address for the pre-press site? Posting the texts before they
> are ready for full posting has come up in a discussion on DP.
>
> -- 
> Greg Weeks
> http://durendal.org:8080/greg/
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From greg at durendal.org  Thu Dec 13 12:27:04 2007
From: greg at durendal.org (Greg Weeks)
Date: Thu, 13 Dec 2007 15:27:04 -0500 (EST)
Subject: [gutvol-d] pre-press
In-Reply-To: <Pine.LNX.4.64.0712131222110.19410@pglaf.org>
References: <Pine.LNX.4.63.0712131421520.5771@durendal.durendal.org>
	<Pine.LNX.4.64.0712131222110.19410@pglaf.org>
Message-ID: <Pine.LNX.4.63.0712131525580.6180@durendal.durendal.org>

On Thu, 13 Dec 2007, Michael Hart wrote:

>
> You want to SEND to that site, or RECEIVE from that side???

Receive from it. I just wanted to show some people at DP what PG is doing 
with other sources material that not quite ready for full release.

Greg Weeks

> On Thu, 13 Dec 2007, Greg Weeks wrote:
>
>>
>> What was the address for the pre-press site? Posting the texts before they
>> are ready for full posting has come up in a discussion on DP.
>>
>> --
>> Greg Weeks
>> http://durendal.org:8080/greg/
>>
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d at lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

-- 
Greg Weeks
http://durendal.org:8080/greg/


From gbnewby at pglaf.org  Thu Dec 13 12:39:33 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu, 13 Dec 2007 12:39:33 -0800
Subject: [gutvol-d] pre-press
In-Reply-To: <Pine.LNX.4.63.0712131525580.6180@durendal.durendal.org>
References: <Pine.LNX.4.63.0712131421520.5771@durendal.durendal.org>
	<Pine.LNX.4.64.0712131222110.19410@pglaf.org>
	<Pine.LNX.4.63.0712131525580.6180@durendal.durendal.org>
Message-ID: <20071213203932.GA24302@mail.pglaf.org>

On Thu, Dec 13, 2007 at 03:27:04PM -0500, Greg Weeks wrote:
> On Thu, 13 Dec 2007, Michael Hart wrote:
> 
> >
> > You want to SEND to that site, or RECEIVE from that side???
> 
> Receive from it. I just wanted to show some people at DP what PG is doing 
> with other sources material that not quite ready for full release.
> 
> Greg Weeks

http://preprints.readingroo.ms

Enjoy :)


> >
> >>
> >> What was the address for the pre-press site? Posting the texts before they
> >> are ready for full posting has come up in a discussion on DP.
> >>
> >> --
> >> Greg Weeks
> >> http://durendal.org:8080/greg/
> >>
> >> _______________________________________________
> >> gutvol-d mailing list
> >> gutvol-d at lists.pglaf.org
> >> http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >>
> > _______________________________________________
> > gutvol-d mailing list
> > gutvol-d at lists.pglaf.org
> > http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
> 
> -- 
> Greg Weeks
> http://durendal.org:8080/greg/
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From jon at noring.name  Thu Dec 13 13:07:20 2007
From: jon at noring.name (Jon Noring)
Date: Thu, 13 Dec 2007 14:07:20 -0700
Subject: [gutvol-d] Announcing: The Digital Text Community mailing list
Message-ID: <522962738.20071213140720@noring.name>

Everyone,

This is the second and final formal announcement in other forums on
the launch of "The Digital Text Community" (DTC), a public mailing
list (run on YahooGroups) devoted to serious discussion of digitizing
"ink-on-paper" publications.

The full group description is found at the group's "home page" at:

   http://groups.yahoo.com/group/digital-text/

The group was launched a month ago, and already has over 170
subscribers, including notables from a number of text digitization
projects. (The subscriber list includes, of course, a few people
involved with Project Gutenberg and Distributed Proofreaders.)

Discussion is beginning to become quite active. Anyone interested in
any aspect of the digitization of texts is invited to subscribe and
participate.

I will be happy to manually subscribe those who don't want to go
through the process of getting Yahoo accounts to subscribe -- just
let me know in private email and I'll add you to the list, no muss,
no fuss!

Note that posted messages to the group are lightly moderated, intended
only to remove spam, off-topic messages, and messages whose tone or
substance violate the group's "Prime Directive" of cordiality and
respect towards others (refer to the group description for further
information on the moderation policy.)

*****

(Further info taken from the first announcement)

The primary reason why we started DTC is that there is, surprisingly,
no independent, cross-project forum to discuss the various technical
and non-technical issues of digitizing "ink-on-paper" publications.

Current discussion on digitizing paper publications is disjointly
spread around in various nooks and crannies of the Internet. For
example, there are forums for particular digitization projects such
as those run by Project Gutenberg (e.g. "gutvol-d") and Distributed
Proofreaders (they maintain a web-based forum.)

And then there are forums which touch upon various issues of text
digitization but which is not their main focus. Examples include the
The eBook Community (TeBC) and Book Futures (BF, note that I am a
moderator for both of these Yahoo-based groups.)

The summary purpose of DTC is given in the last paragraph of the
DTC group description:

   "This group is not affiliated with any particular project or
   organization, but rather is independent. It is hoped this group
   will be a bridge between the various text digitization projects,
   enabling information exchange for everyone?s benefit."

Do consider subscribing to DTC. If you need any help with subscribing
to the group, let me know. Look forward to seeing you there!


Jon Noring
The Digital Text Community Administrator


From lee at novomail.net  Fri Dec 14 13:12:53 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 14 Dec 2007 14:12:53 -0700
Subject: [gutvol-d] Normalizing XML and text files.
Message-ID: <4762F1D5.4000705@novomail.net>

A couple of weeks ago I suggested that normalizing different files is 
the required first step in any automated attempt to correct them.

On the off chance that anyone is interested, I now have written a 
program which will normalize XML and non-XML ASCII files without losing 
critical markup. I now have it working to a point where, although it is 
not complete of error-free, I feel comfortable sharing it with others.

If anyone is interested in obtaining a copy of the source code and a 
Windows executable, contact me back-channel and I'll send you a copy. To 
compile and link, the program requires the domcapi source 
(http://sourceforge.net/projects/domcapi/) and the expat source 
(http://sourceforge.net/projects/expat/).

From Bowerbird at aol.com  Fri Dec 14 16:47:30 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 14 Dec 2007 19:47:30 EST
Subject: [gutvol-d] the word for the weekend is "widget"
Message-ID: <d38.19dba83a.34947e22@aol.com>

there are a lot of funny ideas floating around,
about systems of proofing, and tools to do it...

some are funny "ha-ha", some are funny "stupid",
and some are funny "ha-ha, that's _really_ stupid".

but no matter...

because the word for the weekend is "widget".

that's right, a widget, which grabs the page-scan for a page,
and the o.c.r. text for that page, and presents them to users,
and lets them fix errors in the text _or_ confirm it as correct;
the widget then scoots the results back up to your website...

a simple widget.   that's all your proofer ever needs to see...

you know what they say -- "a page a day".

but i betcha can't eat just one...

w-i-d-g-e-t.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071214/f54a1846/attachment.htm 

From richfield at telkomsa.net  Fri Dec 14 00:12:18 2007
From: richfield at telkomsa.net (Jon Richfield)
Date: Fri, 14 Dec 2007 10:12:18 +0200
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
Message-ID: <47623AE2.1010501@telkomsa.net>

I have several books that to my mind are well worth preserving, but 
still in copyright and unlikely to be conserved by less eccentric 
spirits (not in time, anyway! Have you noticed the totally unnecessarily 
transient nature of many books nowadays?)  Some paperbacks that are 
still well in copyright, are almost unscannable already. 

Now, I realise that there are good reasons for observing copyright 
rules, but on the also reasonable assumption that as amorphous a 
structure as PG should survive the time of many of us, what is the view 
on accumulating books or other publications that only emerge from 
protection in a decade or two?  At present, should one submit  a 
candidate that dates from too close to 1922, be it never so out of 
print, all you get is "Not OK".  Then all one can do is to sit on the 
product for years or till one loses one's own collection or marbles.  (I 
tried to contact PG Aus some months ago, but got no response at all.  I 
might try again soon.  Does anyone have a sure-fire address for getting 
their attention?) 

However, it seems to me that if we established a repository, a sort of 
PG Purgatory, or at least PG Limbo, in which properly prepared books 
could rest for a few years or decades, hibernating till copyright 
lapsed, this would be harmless at worst.  If a list of dormant titles 
were available, that would enable persons in a position to waive 
copyright, to contact PG to authorise expedited eclosion.  Given the 
current price of bulk data storage, it sounds doable to me.  How about 
"Mr Belloc Objects" By Wells, eg?  (1926)  How about "Comic and curious 
verse" by J.M. Cohen 1952?  Limbo's full of such, many of them sliding 
towards oblivion, and sliding much faster than old copies of Punch or 
Scientific American.

Have I missed anything in the faqs, that deals with this point? 

Cheers,

Jon


From Bowerbird at aol.com  Sat Dec 15 00:12:20 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 15 Dec 2007 03:12:20 EST
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
Message-ID: <d20.1b951a76.3494e664@aol.com>


the supreme court _did_ rule that "time-shifting" _is_ legal, after all...    
    ;+)

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071215/ed4cc82b/attachment.htm 

From johnson.leonard at gmail.com  Sat Dec 15 08:25:34 2007
From: johnson.leonard at gmail.com (Leonard Johnson)
Date: Sat, 15 Dec 2007 11:25:34 -0500
Subject: [gutvol-d] Publishing founding father papers held up
Message-ID: <748ba8e50712150825o67e29e6dr6cef0828ed8c6925@mail.gmail.com>

This seems like a project that Project Gutenberg should get involved.

Excerpted from article in "Washington Post" article _In the Course of
Human Events, Still Unpublished_.

http://www.washingtonpost.com/wp-dyn/content/article/2007/12/14/AR2007121402119.html?wpisrc=newsletter

"More than 200 years after they were written, huge portions of the
papers of America's founding fathers are still decades away from being
published, prompting a distinguished group of scholars and federal
officials to pressure Congress to speed the process along."

"But the Pew-led lobbyists are not satisfied that enough has been
accomplished, especially McCullough (Pulitzer Prize winning author of
_John Adams_), who does not believe that a quicker completion would
sacrifice quality. Instead, he blames the slow progress on "the little
fiefdoms of each project, which have been working in their own way in
their world for over two generations."

Note: "Little fiefdoms", I think, refers to various institutions of
higher learning that hold the copies of this material.

While we, PG, would be unable to annotate the papers, if they were
scanned and made publicly available, we could digitize what is there,
and a wider number of scholars could be invited to help with the
additional annotation necessary to more fully understand the papers in
the context in which they were written. Even unannotated, these papers
must be very valuable.

Anyway, this is an interesting article.

Len Johnson
-- 
http://members.cox.net/leaonarddjohnson/

From jeroen.mailinglist at bohol.ph  Sat Dec 15 09:14:34 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Sat, 15 Dec 2007 18:14:34 +0100
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <47623AE2.1010501@telkomsa.net>
References: <47623AE2.1010501@telkomsa.net>
Message-ID: <47640B7A.60601@bohol.ph>


Well, I think Google and a couple of others are scanning entire
libraries wholesale, and I guess they too will sit on their non-PD scans
until freedom comes. Others can do similar things....

On the other hand, we still have so many things in the PD to do, that
even with a concentrated effort, when finally the next year of stuff becomes
available in the US in 2018 or so, we probably have not eaten through it
all....

In my opinion, PG is less about preservation, but more about
accessibility (although to keep works a part of living culture also
requires accessibility)

Jeroen.

Jon Richfield wrote:
> I have several books that to my mind are well worth preserving, but 
> still in copyright and unlikely to be conserved by less eccentric 
> spirits (not in time, anyway! Have you noticed the totally unnecessarily 
> transient nature of many books nowadays?)  Some paperbacks that are 
> still well in copyright, are almost unscannable already. 
>
> Now, I realise that there are good reasons for observing copyright 
> rules, but on the also reasonable assumption that as amorphous a 
> structure as PG should survive the time of many of us, what is the view 
> on accumulating books or other publications that only emerge from 
> protection in a decade or two?  At present, should one submit  a 
> candidate that dates from too close to 1922, be it never so out of 
> print, all you get is "Not OK".  Then all one can do is to sit on the 
> product for years or till one loses one's own collection or marbles.  (I 
> tried to contact PG Aus some months ago, but got no response at all.  I 
> might try again soon.  Does anyone have a sure-fire address for getting 
> their attention?) 
>
> However, it seems to me that if we established a repository, a sort of 
> PG Purgatory, or at least PG Limbo, in which properly prepared books 
> could rest for a few years or decades, hibernating till copyright 
> lapsed, this would be harmless at worst.  If a list of dormant titles 
> were available, that would enable persons in a position to waive 
> copyright, to contact PG to authorise expedited eclosion.  Given the 
> current price of bulk data storage, it sounds doable to me.  How about 
> "Mr Belloc Objects" By Wells, eg?  (1926)  How about "Comic and curious 
> verse" by J.M. Cohen 1952?  Limbo's full of such, many of them sliding 
> towards oblivion, and sliding much faster than old copies of Punch or 
> Scientific American.
>
> Have I missed anything in the faqs, that deals with this point? 
>
> Cheers,
>
> Jon
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>   


From Bowerbird at aol.com  Sat Dec 15 12:34:17 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 15 Dec 2007 15:34:17 EST
Subject: [gutvol-d] Publishing founding father papers held up
Message-ID: <c22.278a800f.34959449@aol.com>

len said:
>    Anyway, this is an interesting article.

sure is.

when the library of congress accuses you of being slow,
that means you're being _very_ slow...

this is just a cozy cash cow for a small group of scholars...
and as such, yes, it'd be great fun to rip it from their hands.

unfortunately, libraries all over the place are starting to view
their unique volumes as something they can withhold for cash
in the coming digital cyberlibrary, an attitude that is only gonna
bankrupt all of them in the long run, and shortchange our access
to the cultural heritage that we _thought_ they were saving for us...

i'm convinced the only way we're gonna shake some sense into them
is to declare eminent domain on the institutions that are now holding
our cultural heritage for ransom, starting with the academic journals...

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071215/6f06b3e1/attachment.htm 

From Bowerbird at aol.com  Sat Dec 15 12:38:03 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 15 Dec 2007 15:38:03 EST
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
Message-ID: <cf8.2278c23b.3495952b@aol.com>

jeroen said:
>    PG is less about preservation, but more about accessibility

that's a false dichotomy we should remove from our thinking.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071215/9f9c1e04/attachment.htm 

From jon at noring.name  Sat Dec 15 12:41:41 2007
From: jon at noring.name (Jon Noring)
Date: Sat, 15 Dec 2007 13:41:41 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <cf8.2278c23b.3495952b@aol.com>
References: <cf8.2278c23b.3495952b@aol.com>
Message-ID: <01165190.20071215134141@noring.name>

Bowerbird wrote:
> jeroen said:

>>?PG is less about preservation, but more about accessibility

>  that's a false dichotomy we should remove from our thinking.

Agreed.


Jon Noring


From Bowerbird at aol.com  Sat Dec 15 13:29:30 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 15 Dec 2007 16:29:30 EST
Subject: [gutvol-d] two overarching thoughts on a roundless system of
	proofing
Message-ID: <c71.1f7d4854.3495a13a@aol.com>

i'll have a lot more later -- it's written already, but i think i will wait
until monday to send it to this list -- but here are two overarching
thoughts about implementing a _roundless_ system of proofing...

(in case you're wondering why, this is a topic that is being discussed
over on the d.p. forums, presently, and often over the past few years.
and it's a shame it never moves past the discussion phase, since the
current system -- where _every_page_of_every_book_ is slated to go
through a specific number of rounds -- is grossly inefficient, and has
led to a huge waste of time and energy, plus endless discussions and 
a wide array of experiments to overcome its obvious shortcomings.
however, the discussion is marred by a bunch of people who simply
don't know what they're talking about, and by the fact that no one
over there seems to be able to separate the wheat from the chaff...)

anyway, here are those two overarching thoughts.

1.   it's unnecessary to "formulate some kind of metric" to inform you
when a specific page can be considered "finished".   it is _done_ when
a certain number of people -- say 2 to 4 -- can't find any errors in it.
at that point, even if there _are_ still errors in it, it has simply become 
unproductive to schedule yet _another_ set of eyes to look for them...
but, for the vast majority of pages, there just won't be any errors left.
you don't have to believe me.   just try it -- as the simplest thing that
_might_ work -- and you will happily discover it does indeed work...

2.   it's unnecessary to "formulate some kind of metric" to inform you
about the proofing skills of each volunteer.   it's easy enough to use
the obvious measures to determine a score, but it's unnecessary to
_use_ that score in order to assign pages to the proofer, since the
measure of whether a page is "finished" or not is impervious to the
skill levels of the proofers.   if 2-4 "average" proofers find no errors
left on a page, then the odds are that a "great" proofer won't either.
and -- once again -- you don't have to believe me that this is true;
try it -- as the simplest thing that _might_ work -- and find it does...

in other words, don't make it more complicated that it has to be...

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071215/db02ce07/attachment.htm 

From julio.reis at tintazul.com.pt  Sun Dec 16 07:36:56 2007
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio_Reis?=)
Date: Sun, 16 Dec 2007 15:36:56 +0000
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <mailman.2.1197748802.29671.gutvol-d@lists.pglaf.org>
References: <mailman.2.1197748802.29671.gutvol-d@lists.pglaf.org>
Message-ID: <47654618.4050900@tintazul.com.pt>

> Now, I realise that there are good reasons for observing copyright 
> rules, but on the also reasonable assumption that as amorphous a 
> structure as PG should survive the time of many of us, what is the 
> view on accumulating books or other publications that only emerge from 
> protection in a decade or two?
Well, PG should survive the time of many of us... really? Only if they 
stick to the laws. If PG starts saving copies of copyrighted books in 
2008, then I'm afraid it might not see the light of 2009.

Scanning a book is copying it. Proof? If you destroy the paper book, you 
still can read the scans. Therefore scanning is making a copy, therefore 
not legal. Supposing it were legal to scan a book and keep the copy 
personal, then perhaps Gutenberg could scan its own paper books and not 
show them to anyone else; when the time came they could release it in 
the rounds. Not before, as I think making the scans available in the 
rounds is actually allowing anyone who bothers to register in PGDP, to 
read the scans.

Supposing PG or PGDP could keep copies of books waiting for copyright. 
What would you do if the copyright were extended in the USA? 
Frustrating. So in these terms what can be done is campaign for more 
public domain, less profiteering over works which should be freely given 
over to Humanity.

I even find the concept that an author should be able to live his whole 
life out of the earnings of one book very strange; let alone that his 
grandchildren, 69 years after death would still be living off it. People 
who write books -- can't they have 50 years of rights from first 
publication, and save money for old age should they live after that 
period, like the rest of us? Wouldn't it be nice to pay homage to an 
80-year old writer who wrote a great book 50 years ago, and which is 
passing to the whole of Humanity right now? Wouldn't she enjoy it? 
Seeing a spurt of publication and of reading / use of her work? But I'm 
straying from topic.

More food for thought: if you don't draw the line in legality, where do 
you draw the line? I liked the movie /Eragon/ -- shouldn't we scan 
Christopher Paolini's book now when we can? And what about some of the 
nice books which are being sold this month?

I say we have our plates full when it comes to releasing works in the 
public domain. And getting back to lobbying for shorter copyright terms 
-- I see my part in it as being a contributor to PG, to make it more and 
more relevant. I urge you to do the same. So we can say in a few years 
"200 thousand books says people CARE for public domain!" And the 
Electronic Frontier Foundation et al. will do the legal and campaigning 
stuff.

> I tried to contact PG Aus some months ago, but got no response at 
> all.  I might try again soon.  Does anyone have a sure-fire address 
> for getting their attention? 
That's the way forward, I think. Publish where it's legal to. If it's 
not legal, then don't publish. My opinion.

J?lio aka Tintazul.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071216/4077edea/attachment.htm 

From Bowerbird at aol.com  Sun Dec 16 10:52:49 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 16 Dec 2007 13:52:49 EST
Subject: [gutvol-d] more complicated that
Message-ID: <d16.1be0155a.3496ce01@aol.com>

i said:
>    in other words, don't make it more complicated that it has to be...

ha-ha!         :+)

note to self:   in your text clean-up programs, insert a check for 
"more complicated that" as an error for "more complicated than".

***

along with the not/now confusion -- the most pernicious of all,
since it changes the meaning, and is often hard to diagnose --
and he/be (which always gives o.c.r. apps the heebie-jeebies),
than/that is a mix-up where both options are high-frequency...

so, in general, it might be good to program some checks for it...
(although i don't remember how often it shows up empirically...)

***

but let's play with google to check the specifics of _my_ goof...

>    0,081,200 = "more complicated that"
>    3,950,000 = "more complicated than"
>    4,031,200 = total cases
>    0,031,200 = estimated cases where "that" was correct
>    4,000,000 = cases where "that" was incorrect for "than"

so, roughly:
>    80,000/4,000,000
>    8,000/400,000
>    800/40,000
>    80/4,000
>    8/400
>    2/100 = 2%

***

i looked at roughly 100 hits, to see how often they were indeed errors...
(and the vast majority of them were.)

first, it happens to the best of us, like jon udell, writing for o'reilly:
>    Capturing a screencast needn't be much more complicated that 
>    capturing static screenshots.

and second of all, there's this page:
>    OK, the real world is more complicated that 
>    what is envisioned in the Heckscher-Ohlin model,
>    but it is also more complicated that 
>    what is envisioned in the Ricardian model as well.
where the error appears twice within a single sentence...

or this interesting variant:
>    You're more complicated that that.
which also appears here:
>    It can get more complicated that that, so when you're ready, you can
and, altogether, represents 20% (16,300) of the total 81,200 cases...

the phrase _can_ and _does_ appear in a correct form, of course:
>    It is a little more complicated that way (you have to back track). 

or as in this particularly compelling quote:
>    This is, effectively, a slam on Microsoft for making everything so much 
>    more complicated that with thousands of times more capability it still 
>    takes the same amount of time to do things.   Moore?s law marches on, 
>    but the amount of time you spend waiting on your computer remains 
static.

but in almost all of the _correct_ cases that google located, it was because 
the phrase contained some punctuation between "complicated" and "that":
>    other means would have been more complicated.   That is what he is 
saying
(google's blindness to punctuation often makes it inappropriate for this 
task.)

and, in one case out of the 100 i looked at, the "correct" form was vague:
>    Many problems that occur with lawn mowers are 
>    more complicated that fixing it yourself may be 
>    too much trouble.
(in this passage, "more" should probably have been "so", but so be it.)

***

curiously, 
>    "less complicated that that" = 578 (roughly 5% of the total "that" 
cases)
>    "less complicated that" = 10,300
>    "less complicated than" = 129,000

it seems to be less prevalent that things are "less complicated than"
-- as opposed to "more" -- but it's also the case that, when they are,
it's less likely that the phrase will be used correctly.   the numbers say
the glitch occurs in 10,000 out of 130,000 cases, a whopping 7.7%...

***

it's also interesting to look at "complex", as opposed to "complicated":
>    0,186,000 = "more complex that"
>    4,050,000 = "more complex than"
>    4,236,000 = total cases
>    0,236,200 = estimated cases where "that" was correct
>    4,000,000 = cases where "that" was incorrect for "than"

so, roughly:
>    186,000/4,000,000
>    186/4,000
>    200/4000
>    100/2000
>    10/200
>    1/20 = 5%

so "complex" looks to be twice as error-prone as "complicated"...
even though things _generally_ seem "less complex" more rarely,
glitch-wise it doesn't seem to matter whether it's "more" or "less"...

>    "less complex that that" = 1,400 (about 12% of the overall total)
>    "less complex that" = 11,400
>    "less complex than" = 190,000

so, roughly:
>    10,000/200,000
>    10/200
>    1/20 or 5%, the same as "more complex that"

***

it sure is fun to play with words, isn't it?              :+)

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071216/f2c4c438/attachment-0001.htm 

From lee at novomail.net  Sun Dec 16 11:17:24 2007
From: lee at novomail.net (Lee Passey)
Date: Sun, 16 Dec 2007 12:17:24 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <01165190.20071215134141@noring.name>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
Message-ID: <476579C4.9000200@novomail.net>

Jon Noring wrote:
> Bowerbird wrote:
>> jeroen said:
> 
>>>  PG is less about preservation, but more about accessibility
> 
>>  that's a false dichotomy we should remove from our thinking.
> 
> Agreed.

Disagreed.

The distinction between preservation and accessibility (in the sense of 
making a work available and usable) is very real, and has a serious 
impact on processes.

Both Google and the Open Content Alliance are engaged in efforts to 
"digitize" books, many of which are in the public domain. If I 
understand correctly, these efforts consist of making scan sets of the 
books, then doing an uncorrected OCR stage which references points in 
the image files. This makes it possible to search for "text" to the 
extent that the OCR is correct, but all you get back is an image of the 
page. In other words, the Open Content Alliance is about as useful as 
microfilm. Preservation: 10 - Accessibility: 2.

At the other end of the spectrum we have Project Gutenberg, which 
records nothing about the provenance of a work, strips public domain 
works of all but the alphabetic text, silently corrects "obvious" 
typographical errors, and occasionally creates hybrid works by combining 
various editions without comment or justification. Preservation: 2 - 
Accessibility: 6 (it would get higher marks for accessibility were it 
not for the strong bias against markup).

The obvious differences between these two projects is due to their 
underlying priorities; the dichotomy between accessibility and 
preservation is very real, and has profound practical consequences.

Now this does not mean that I think that the two philosophies are 
fundamentally irreconcilable. I think that thorough TEI encoding of a 
text can capture the meaning of the physical artifact almost as well as 
a high-resolution image. And TEI is not so esoteric that it cannot be 
easily used both as a presentation format using CSS, as a conversion 
format using XSLT or perl, or for other automated data processing 
functions which have nothing to do with presentation.

However, I do think there may be a point where the goals of "warts and 
all" is so fundamentally inconsistent with "no warts at all" that some 
choice between the two may be inevitable.

From jon at noring.name  Sun Dec 16 11:23:23 2007
From: jon at noring.name (Jon Noring)
Date: Sun, 16 Dec 2007 12:23:23 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <476579C4.9000200@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net>
Message-ID: <796087170.20071216122323@noring.name>

Lee wrote:
> Jon Noring wrote:
>> Bowerbird wrote:
>>> jeroen said:

>>>>  PG is less about preservation, but more about accessibility

>>>  that's a false dichotomy we should remove from our thinking.

>> Agreed.

> Disagreed.
>
> The distinction between preservation and accessibility (in the sense of
> making a work available and usable) is very real, and has a serious 
> impact on processes.

Er, ok. <smile/>

I interpreted Bowerbird's comment, maybe incorrectly, that one really
can't separate preservation from access. One preserves in order to
make something accessible at a future date. And PG is certainly about
both. (Whether they do it effectively is another matter, but I'm
talking about the intent.)

Certainly, digital preservation of content (making something
accessible in the distant future) adds requirements to those for
making something accessible just for today. For example, there have
to be requirements to assure the digital content even makes it to the
future, and if the digital content makes it to the future, that it is
accessible to the degree desired.

Hopefully that clarifies my "agreed".

Jon


From jeroen.mailinglist at bohol.ph  Sun Dec 16 14:24:08 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Sun, 16 Dec 2007 23:24:08 +0100
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <476579C4.9000200@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
Message-ID: <4765A588.3000003@bohol.ph>


Lee, you phrased nicely a line of thinking I follow, and even when I say
PG is more about accessibility,
I fully agree with your 6... improving metadata, cataloging, and tagging
could increase that to a 7, and adding an
integrated library/reading system around it may turn it to a 10.

Accessibility is important for preservation. For only those works that
are part of a vibrant culture are ultimately preserved, as part
of our cultural heritage, and not just dust collecting on shelves. Only
works that can be freely cited, re-used, and re-purposed will survive.
That is why copyright has become the biggest enemy of cultural heritage,
as it starves works to death in a prison, only to release them when
they've starved and have become disconnected from cultural life after
having been forgotten for three or four generations. You can't preserve
what you do not love, and you do not love what you do not know.

I have been a "believer" in TEI for over 10 years now... For me it
works, and helps me to produce ebooks by large numbers, without
having to dive in HTML trouble all of the time.

Jeroen.

Lee Passey wrote:
> Jon Noring wrote:
>   
>> Bowerbird wrote:
>>     
>>> jeroen said:
>>>       
>>>>  PG is less about preservation, but more about accessibility
>>>>         
>>>  that's a false dichotomy we should remove from our thinking.
>>>       
>> Agreed.
>>     
>
> Disagreed.
>
> The distinction between preservation and accessibility (in the sense of 
> making a work available and usable) is very real, and has a serious 
> impact on processes.
>   


From prosfilaes at gmail.com  Sun Dec 16 17:41:58 2007
From: prosfilaes at gmail.com (David Starner)
Date: Sun, 16 Dec 2007 20:41:58 -0500
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <476579C4.9000200@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name> <476579C4.9000200@novomail.net>
Message-ID: <6d99d1fd0712161741l2d8f85c0waeb187b8712e80f@mail.gmail.com>

On Dec 16, 2007 2:17 PM, Lee Passey <lee at novomail.net> wrote:
> At the other end of the spectrum we have Project Gutenberg, which
> records nothing about the provenance of a work, ...
> it would get higher marks for accessibility were it
> not for the strong bias against markup).

I think Michael Hart has a point that WordStar files may not be
readable in a few years... oh, wait, did you mean now? The way you
were talking, I thought you were talking about the 20th century.

From Bowerbird at aol.com  Mon Dec 17 00:23:04 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 17 Dec 2007 03:23:04 EST
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
Message-ID: <c19.2967a216.34978be8@aol.com>

if you do not have _both_ "access" and "preservation",
somebody has cheated you along the line somewhere,
to the point where you'll have absolutely nothing at all.

so beware the mindset that tries to make you _choose_.
it is little more than a trick designed to make you _lose_.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071217/84d76b73/attachment.htm 

From ralf at ark.in-berlin.de  Mon Dec 17 02:45:48 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 17 Dec 2007 11:45:48 +0100
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <476579C4.9000200@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
Message-ID: <20071217104548.GB7788@ark.in-berlin.de>

> At the other end of the spectrum we have Project Gutenberg, which 
> records nothing about the provenance of a work, strips public domain 
> works of all but the alphabetic text, silently corrects "obvious" 
> typographical errors, and occasionally creates hybrid works by combining 
> various editions without comment or justification. Preservation: 2 - 
> Accessibility: 6 (it would get higher marks for accessibility were it 
> not for the strong bias against markup).

The strong bias of proofreaders and uploaders, you should say,
because that's where the stats are made.

But! The line between preservation and accessibility will be
redrawn when PG's relatively accessible works get bibliographic
metadata AND this metadata is referred to by library projects
like http://firstsearch.oclc.org or http://www.eromm.org
I think that, in addition to scans, these projects will at some
time index text versions, too. And PG/pglaf should be prepared, then,
to provide metadata of the books of which the texts are made.


ralf


From lee at novomail.net  Mon Dec 17 09:49:29 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 17 Dec 2007 10:49:29 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <20071217104548.GB7788@ark.in-berlin.de>
References: <cf8.2278c23b.3495952b@aol.com>	<01165190.20071215134141@noring.name>	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
Message-ID: <4766B6A9.5080400@novomail.net>

Ralf Stephan wrote:

>> At the other end of the spectrum we have Project Gutenberg, which 
>> records nothing about the provenance of a work, strips public domain 
>> works of all but the alphabetic text, silently corrects "obvious" 
>> typographical errors, and occasionally creates hybrid works by combining 
>> various editions without comment or justification. Preservation: 2 - 
>> Accessibility: 6 (it would get higher marks for accessibility were it 
>> not for the strong bias against markup).
> 
> The strong bias of proofreaders and uploaders, you should say,
> because that's where the stats are made.

Nice spin! I see a bright future for you as a commentator for Fox News.

I am under the impression from your comments that you view enhanced, 
accurate metadata, including accurate historical data, as A Good Thing. 
It is not quite as clear, but it seems that you also view the increased 
use of markup as A Good Thing.

I find it slightly offensive that you would attempt to blame the PG 
volunteers for the inadequacies of the PG corpus. I believe that if 
proofreaders and uploaders have been contributing inadequate work 
product which has become part of the PG corpus, it is either because 
they have been encouraged to do things that way by the PG FAQ and 
culture, or because they have not been educated as to best known 
practices for digitization of printed works. As I see it, the ultimate 
cause of the current state of the PG corpus is either bad leadership or 
the failure of leadership.

But, wherever you choose to assign blame, that does nothing to change 
the state of the corpus. If PG gets a rating of 6 out of 10 for 
accessibility, the practices that have caused it only to get 6 out of 10 
are pretty much irrelevant, unless there is some commitment to change 
those practices--and I don't see /that/ happening any time soon.

> But! The line between preservation and accessibility will be
> redrawn when PG's relatively accessible works get bibliographic
> metadata AND this metadata is referred to by library projects
> like http://firstsearch.oclc.org or http://www.eromm.org
> I think that, in addition to scans, these projects will at some
> time index text versions, too. And PG/pglaf should be prepared, then,
> to provide metadata of the books of which the texts are made.

In my earlier message I tried to make the distinction between the 
ability to obtain a particular work and the ability to make use of a 
particular work once it had been obtained. Both of these components make 
up the overall principle of accessibility.

If the ability to find and download a work were the only consideration, 
Google and OCA would both get a higher score than Project Gutenberg. But 
having downloaded a book from Google one is pretty much limited to 
having a photo album of pictures of pages. Given the need for a 
relatively powerful computer with a reasonably large display to see this 
pictures, these Google photo albums are actually less useful than a 
paper book. Easier to get, perhaps, but less useful once obtained.

Project Gutenberg's e-texts, on the other hand, can be viewed on just 
about every piece of equipment ever manufactured, including S-100 bus 
machines running CP/M attached to VT-52 dumb terminals. The pure ASCII 
representation (well, not so pure now that LATIN-1 encoding is 
permitted) can easily be repurposed for other applications not 
envisioned by the original creators, such as defeating spam filters. (I 
don't see this as a bad thing, rather it is a real-life example of the 
usefulness of flexible, repurposable text).

This commitment to the least common denominator, however, has also led 
to documents which are not pleasing to read on modern equipment, and 
which, in many ways, prevents other useful repurposing, such as the 
automated creation of catalogs. Because these documents are less useful 
than they easily could be, they are also less accessible than they could be.

Even if PG could accurately recapture much of the metadata which it has 
lost, and even if it could become better referenced by certain library 
projects, that would not do much to increase the usability or 
repurposability of the existing e-texts; there would be improvements to 
their accessibility, but they would not be large scale improvements.

--

From piggy at netronome.com  Mon Dec 17 11:47:12 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Mon, 17 Dec 2007 14:47:12 -0500
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <47623AE2.1010501@telkomsa.net>
References: <47623AE2.1010501@telkomsa.net>
Message-ID: <4766D240.8000801@netronome.com>

Jon Richfield wrote:
> I have several books that to my mind are well worth preserving, but 
> still in copyright and unlikely to be conserved by less eccentric 
> spirits (not in time, anyway! Have you noticed the totally unnecessarily 
> transient nature of many books nowadays?)  Some paperbacks that are 
> still well in copyright, are almost unscannable already.
>   

Scan them now and keep them on your own machine. You are unlikely to 
find yourself prosecuted over it. Depending on how rational the laws in 
your country are, it might even be legal.

I've had moderate success looking up the authors of such books and 
asking them to either make their book available under a Creative Commons 
license or make a license especially for PG.

In principle it is possible to get books added to PG under these conditions.

I'm still waiting on action from PGLAF for my most recent such effort. I 
included a scan of the letter from the author granting a CC license. The 
clearance inspector told me that we needed a letter from the author. I 
pointed out that there WAS a letter from the author in the clearance 
request. I don't know what the problem is at this point.


From hart at pglaf.org  Mon Dec 17 12:00:47 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 17 Dec 2007 12:00:47 -0800 (PST)
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <4766B6A9.5080400@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
Message-ID: <Pine.LNX.4.64.0712171156470.6029@pglaf.org>


Nobody forces anybody to do eBooks any particular way at PG.

If you would like to write your own FAQs about how you think
eBooks should be done, please do so, and we will try to find
as many volunteers for your methodology as possible, and 90%
of all our volunteers just might go that way, who knows?

We have found that the best way to gain such volunteers is a
set of examples made to your own specifications and posted--
then you can just point to them and as people if they should
not like this kind better than what else is available.

We'd be only too glad to post your FAQ in various locations,
including the Newsletters, etc.


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg

Recommended Books:

Dandelion Wine, by Ray Bradbury:  For The Right Brain
Atlas Shrugged, by Ayn Ran,:  For The Left Brain [or both]
Diamond Age, by Neal Stephenson:  To Understand The Internet
The Phantom Toobooth, by Norton Juster:  Lesson of Life. . .


From gbnewby at pglaf.org  Mon Dec 17 13:51:36 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 17 Dec 2007 13:51:36 -0800
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <4766D240.8000801@netronome.com>
References: <47623AE2.1010501@telkomsa.net> <4766D240.8000801@netronome.com>
Message-ID: <20071217215136.GA9598@mail.pglaf.org>

> Jon Richfield wrote:
> > I have several books that to my mind are well worth preserving, but 
> > still in copyright and unlikely to be conserved by less eccentric 
> > spirits (not in time, anyway! Have you noticed the totally unnecessarily 
> > transient nature of many books nowadays?)  Some paperbacks that are 
> > still well in copyright, are almost unscannable already.

Project Gutenberg regularly receives such items (sometimes in the
hopes that they'll be judged as public domain in the US, under
our copyright clearance procedures).

As a library (under the US's tax ruling), we are legally able to archive
such items indefinitely, but not redistribute them. 

In short, you can send such items to me (or to Michael Hart) and
we'll do our best to hold them until they become public domain in
the US.
  -- Greg

From Bowerbird at aol.com  Mon Dec 17 15:08:54 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 17 Dec 2007 18:08:54 EST
Subject: [gutvol-d] a rare breath of fresh air
Message-ID: <caa.20a7890f.34985b86@aol.com>

one of my most important messages to the people over at d.p. was that
the use of preprocessing of their text would save _lots_ of proofer time.

they didn't want to hear it, but i nonetheless told them over and over...

so it's nice when the topic re-emerges, and a new preprocessing tool,
no matter how simple, is truly a welcome and rare breath of fresh air:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=30903&start=0

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071217/ea666e99/attachment.htm 

From Bowerbird at aol.com  Tue Dec 18 02:56:16 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 18 Dec 2007 05:56:16 EST
Subject: [gutvol-d] stop it
Message-ID: <d47.1a13bf32.34990150@aol.com>

stop taking bits and pieces of my messages here
and posting them to other lists.   stop it right now.
you know who you are, and so do i, so stop it now.

i've stopped talking to you here because "dialog"
with you is worthless.   so stop exporting my posts.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/db93a585/attachment.htm 

From bzg at altern.org  Tue Dec 18 03:33:09 2007
From: bzg at altern.org (Bastien)
Date: Tue, 18 Dec 2007 12:33:09 +0100
Subject: [gutvol-d] stop it
In-Reply-To: <d47.1a13bf32.34990150@aol.com> (Bowerbird@aol.com's message of
	"Tue, 18 Dec 2007 05:56:16 EST")
References: <d47.1a13bf32.34990150@aol.com>
Message-ID: <874pegb5l6.fsf@bzg.ath.cx>

Bowerbird at aol.com writes:

> stop taking bits and pieces of my messages here
> and posting them to other lists.  stop it right now.
> you know who you are, and so do i, so stop it now.

So why don't you just send him a message?  

Please don't use the list to put pressure on people that you're not
directly naming, it looks suspicious and it's worth nothing.

-- 
Bastien

From Bowerbird at aol.com  Tue Dec 18 08:10:10 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 18 Dec 2007 11:10:10 EST
Subject: [gutvol-d] stop it
Message-ID: <c4d.25c6cc65.34994ae2@aol.com>


it's jon noring and lee passey.   and -- as i said -- they know who they are.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/aa68fc04/attachment.htm 

From Bowerbird at aol.com  Tue Dec 18 10:12:51 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 18 Dec 2007 13:12:51 EST
Subject: [gutvol-d] quick note on "roundless proofing"
Message-ID: <bf4.177b5945.349967a3@aol.com>

i'll be posting a longish message later today summarizing my approach
to a methodology of "roundless proofing".   although i will not bother to
accommodate the whole of the baggage over at distributed proofreaders
-- because, frankly, most of it is unnecessary -- my post will nonetheless
be enlightening to anyone from there whose head is not stuck in the sand.

of course, this is just one more step along a path where i've been creating
all the tools to put together a system that's integrated across the workflow.

one of the common excuses given over at d.p. for not moving forward on
a roundless system is that their developers are overloaded.   that might be,
but it also indicates they don't have a clue how simple this programming is.

at $100/hour, which they've acknowledged as the going rate, my estimate is
that i could code a working demo in approximately 50 hours, meaning that
the expense would be within the range of reason.   of course, to anyone who
knows our happy little history, the notion that d.p. would pay me to design a
system for them is cause for chuckles.   nonetheless, the offer is open...    
   :+)

-bowerbird

p.s.   here's another way for d.p. to proceed on this, offered 
_free_of_charge:_
put every page of the book into a wiki, and turn your volunteers lose on 
it...


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/ad2a48ba/attachment.htm 

From Bowerbird at aol.com  Tue Dec 18 10:16:31 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 18 Dec 2007 13:16:31 EST
Subject: [gutvol-d] thoughts on a wiki
Message-ID: <c30.20883d7b.3499687f@aol.com>

i said:
>    p.s.   here's another way for d.p. to proceed on this, offered 
_free_of_charge:_
>    put every page of the book into a wiki, and turn your volunteers lose on 
it...

oh yeah, one of the things on my list for 2008 is to wikify the entire p.g. 
library.
let's invite the general public in and see if they can help clean up those 
e-texts...

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/0ec7c57e/attachment.htm 

From Bowerbird at aol.com  Tue Dec 18 15:27:22 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 18 Dec 2007 18:27:22 EST
Subject: [gutvol-d] stop it
Message-ID: <c21.1131953b.3499b15a@aol.com>

uh, lee, don't bother posting _here_ to my threads either...
i don't read your posts; i do not care what you have to say.
i do not want to have dialog with you, none, on any topic...
i don't want you taking my fragments from here elsewhere;
that's all that needs to be said in this thread, and i've said it.
rewrite your posts for other listserves, and leave me out of it.

you and noring just confuse the issues.   so leave me out of it.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/34c58d87/attachment.htm 

From Bowerbird at aol.com  Tue Dec 18 16:58:47 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 18 Dec 2007 19:58:47 EST
Subject: [gutvol-d] my thoughts on a roundless system of proofing
Message-ID: <cf7.23ef7c3a.3499c6c7@aol.com>

here are those notes on a "roundless" system of proofreading...

i've done a lot of research that indicates to me unequivocally that
the "old" style of proofreading -- verifying each word in each line
against a scan -- will come to be seen as _hopelessly_inefficient_,
so keep that in mind as you read this.   but i'll proceed nonetheless.

oh, and i don't care to hear how this and/or that will or will not
fit into the d.p. system, since   i don't care about the d.p. system.
this is _my_ system, which you might not care about either...         :+)

***

i'm just going to talk about this from the standpoint of the proofer.

there _is_ back-end work involved (e.g., keeping track of the files),
and a lot of front-end peripheral stuff as well, but i'll ignore all that.

we'll just look at it in the same way that the proofers will look at it.

there are several "selection criteria" that each proofer would specify,
including what they would proof -- which languages, genres, etc. --
but let's also assume that we've surmounted all of those obstacles...

the final selection criteria that you would have to specify would be to
say whether you want to proof pages that are "raw", "better", or "done".

from this point on, you can absorb this more like a list of instructions,
and a description to the proofer on how the system works as a whole.

***

you'll be presented with a text-field, and the scan-image of the page.

fix all errors you can find, then say if the page is "better" or it's "done".

if -- after the last change is made on a page -- 2-4 proofers in a row
_confirm_ the page as being "done", then it is considered as "finished".
but any _change_ automatically resets the confirmation counter to zero.

(the question as to whether the pages need 2 or 3 or 4 _confirmations_
is one that will need to be subjected to some real-world empirical testing,
balancing the benefits against the costs of doing the extra confirmations.
we'd assume, of course, that more confirmations mean greater accuracy,
but 4 confirmations is _twice_ as costly as 2 confirmations, so we need to
gauge whether the extra accuracy is worth the cost of extra confirmations.
we'd start out requiring 4 confirmations, but will measure the number of
errors that are discovered after 2 and 3 confirmations, and their _nature_.
after some time, we should able to rationally assess the cost-benefit ratio.
i have a guess about what's optimal, but there's no need for guess-work.
note that it's also the case that people who want "higher-quality books"
can choose to work at the 3- and 4-confirmation levels to ensure that,
whereas people who think the 2-confirmation level is fine can do _that_.
and the people who like to tackle raw o.c.r. can do that totally guilt-free.)

ok, back to the instructions now...

you will be informed of all pages that are changed after you proofed them.

you can _dispute_ a change that's made after you, or _confirm_ it's correct.

any pages in dispute can be specifically requested by any other proofer, so
disputes will be resolved quickly, and proofers can come to full agreement.
(after all, the point of disputes is to answer the question of "best 
practice".)

if you've marked a page as "done", and someone later makes a change to it,
and that correction is valid (i.e., it is later confirmed by 2-4 other 
proofers),
your _accuracy_score_ goes down.   take pride in your accuracy score, folks!

if you have marked a page as "done", and it goes on to become "finished" too,
that will raise your _accuracy_score_.   again, take pride in your accuracy 
score!

at the same time, if you never declare a page as "done", any remaining errors
will _not_ be held against you.   so if you'd rather be safe than sorry, 
_be_safe_.

confirmations _raise_ your accuracy score, as long as they are _valid_ 
ones...
inappropriate confirmations, on the other hand, _lower_ your accuracy score.

the specific point-values for all the relevant actions haven't been decided, 
but
correcting the last error on a page is the best way to raise your accuracy 
score!

here are some rough approximations, though:
7 points -- making the last change to a page (i.e., none made after you)
6 points -- declaring a page "done" and having it confirmed as "finished"
5 points --   being the 2nd confirmation that a page is actually "finished"
4 points --   being the 3rd confirmation that a page is actually "finished"
3 points -- being the 4th confirmation that a page is actually "finished"
2 points -- making a page better by correcting one or more errors
1 point -- proofing a page, but not finding errors, because it has none
0 points -- not finding any errors, even though there are one or more
-1 points -- declaring a page as "done" when it still contains an error
-2 points -- 2nd confirmation as "done" when the page contains an error
-3 points -- 3rd confirmation as "done" when the page contains an error
-4 points -- 4th confirmation as "done" when the page contains an error
-5 points -- introducing two or more errors onto a page

the main idea here is that a person will get points for sticking their neck 
out,
_providing_ that they were right.   but if they're wrong, they get 
decapitated...

your accuracy score is _public_, as is the number of pages you have proofed.

(but neither of the variables has an effect on the pages you are given to 
proof!
our initial assumption is that everyone is capable of proofing every single 
page.
at the same time, however, no one is _forced_ to do any page either, so if 
_you_
decide any one page is "too difficult" for you at that moment, just don't 
proof it.)

again, that was just a rough pass at the point-values, and nobody should get
too obsessed with the points, because they're just there for "bragging 
rights",
and to remind people that the _quality_ is just as meaningful as the 
_quantity_.

you make _your_own_judgment_ on the importance of "quantity" and "quality".
it's not necessary that you move a page to "done", just that you make it 
"better".
(but if you _do_ call a page "done", then you had better make sure it really 
_is_.)

differing priorities on the value of "quality" will not adversely effect the 
output,
because every page will keep getting proofed until a consensus emerges on it.
even if you do make a goof on a page, it's no big deal, because _every_ 
change
needs to be confirmed by other eyes, so there's no danger of errors 
persisting.

although we call everyone "proofers", a page should _not_ be marked as "done"
until it is free of all scanning and printer errors, _and_ properly formatted 
too!

(so, again, if all you want to do is _proof_, then just mark the page as 
"better".
any action improving the page is something you can, and should, be proud of.
you don't have to do formatting if you don't want, and you're not 
"penalized".)

formatting is done with z.m.l., so once a page is formatted, it's completely 
done,
and when all of the pages in the book are done, the _book_ is finished as 
well...
(we'll still want one person to finalize it by checking it at the "whole-book 
level".)

a book's pages are structured like a wiki, so every revision to each page is 
saved.
it's possible for any proofer to "step through" the consecutive edits of each 
page.
again, when 2-4 people call a page "done" -- with no changes -- it's 
"finished".
but any time another change is made, the confirmation counter is re-set to 
zero.

when a page is displayed, any questionable aspects about it are 
_flagged_in_red_.

if the page is marked as "done" while it still contains questionable _words_ 
on it,
those words are automatically added to the _vocabulary_ for that book, which 
is
the text-file used to determine which words will be flagged on subsequent 
pages.
this means that the next time that page is displayed, the word will not be 
flagged.

however, proofers are able to constantly view the vocabulary for a specific 
book,
to check if a questionable word has been added to it, and those proofers can 
then
specifically call up the page containing that word, so as to _verify_ that 
it's correct.

proofers can also do a "find" on a keyword/phrase, and proof pages containing 
it.

if a "badword" is corrected, you can search for other instances of "badword", 
and
view them on a book-wide basis, then approve the correction on those other 
pages.
if you find and correct a scanno that appears on 20 different pages, you'll 
get credit
for making every one of those 20 pages "better".   but check each of them 
carefully, 
because you'll also get docked if one of those changes shouldn't have been 
made!

when you mark a page as "better", it means that it's not "done" yet, and 
needs either
(1) more eyes to look at it, and/or (2) specialized treatment from a certain 
"expert".
the specific type of "expert" that is required can be noted in a series of 
checkboxes,
like "greek", "tables", "index", "ads", and so on.   specialists can request 
such pages...

proofers can jump around in the book randomly, or step through it 
page-by-page.

a page will move from "raw" to "finished" over the course of hours, maybe 
_minutes_.
so the viewing of "diffs" will be quite _immediate_, to get everyone on the 
same page.
one point of a roundless system is to move a book from start to finish within 
_days_.
let the books jostle for entrance into the system; but once in, get 'em out 
right away!

since a book will come to "completion" so quickly, proofers interested in 
their "diffs"
should plan on examining them within 24 hours, while their experience is 
still fresh.

as for a "meaningless diff", there's no such thing.   there's one
-and-only-one _correct_
way to proof and format a page, and our sole object is to move the page to 
that state...
that doesn't mean that any one particular proofer will be _able_ to attain 
that goal, and
there's no shame in being unable to do so.   just mark the page as "better", 
and go on...

once all pages in a book are considered "finished", the book moves into the 
stage of
"continuous proofreading", where it essentially sits in the wiki for about 
3-6 months,
during which time anyone can "smooth-read" it, and make any necessary 
corrections.
(and yes, any later corrections ramify back on your "accuracy score" too, so 
be sharp!)


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071218/9542842e/attachment.htm 

From ralf at ark.in-berlin.de  Wed Dec 19 01:34:16 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Wed, 19 Dec 2007 10:34:16 +0100
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <4766B6A9.5080400@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
Message-ID: <20071219093416.GA29329@ark.in-berlin.de>

> I am under the impression from your comments that you view enhanced, 
> accurate metadata, including accurate historical data, as A Good Thing. 
> It is not quite as clear, but it seems that you also view the increased 
> use of markup as A Good Thing.

That is so. However, catalog.rdf is a start and 'just' needs to be
augmented with 1) original publisher, 2) publishing year, 3) publishing
place 4) possible edition and series information, 5) possible editor/
translator info, and 6) a working link to the etext page instead of
simply 'etext21990'. In short, metadata about the etext should 'link'
to that info which is given to pglaf when a clearance is requested,
the metadata of the book.

Now, it's possible there is/will be a format tailored for holding
both info (MODS?). OTOH, Dublin Core RDF should be able to do that, too.

So, where can pglaf metadata be accessed?

> Project Gutenberg's e-texts, on the other hand, can be viewed on just 
> about every piece of equipment ever manufactured, including S-100 bus 
> machines running CP/M attached to VT-52 dumb terminals. The pure ASCII 
> representation (well, not so pure now that LATIN-1 encoding is 
> permitted) can easily be repurposed for other applications not 
> envisioned by the original creators, such as defeating spam filters. (I 
> don't see this as a bad thing, rather it is a real-life example of the 
> usefulness of flexible, repurposable text).

And just *how* repurposable it is you can see from the fact that all
those old ASCII-only versions can be potentially transformed into
texts that are 'pleasing to read' as you say, by any volunteer who
wants to upload an HTML or even TEI version with bibliographic markup
of it.

You write in another mail:
> At the other end of the spectrum we have Project Gutenberg, which records 
> nothing about the provenance of a work, strips public domain works of all but 
> the alphabetic text, silently corrects "obvious" typographical errors, and 
> occasionally creates hybrid works by combining various editions without 
> comment or justification.

I take this as hyperbole as quite a few people now upload HTML and
include metadata in the title page. People using TEI use markup for
corrections from which original spelling can be reconstructed. And
the fingers of two hands suffice to count hybrid works.

Ah, but now I see, you talk about the backlog! Well, backlog *always*
is something to whine about, isn't it?


ralf


From schultzk at uni-trier.de  Wed Dec 19 01:00:45 2007
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Wed, 19 Dec 2007 10:00:45 +0100
Subject: [gutvol-d] my thoughts on a roundless system of proofing
In-Reply-To: <cf7.23ef7c3a.3499c6c7@aol.com>
References: <cf7.23ef7c3a.3499c6c7@aol.com>
Message-ID: <550CCD3D-E029-4A1A-AC5A-809FC7F61AD0@uni-trier.de>

Hi Bowerbird,

	An interresting concept. My gut feeling that there
	are still many questions that should/need be worked out.

Am 19.12.2007 um 01:58 schrieb Bowerbird at aol.com:

> here are those notes on a "roundless" system of proofreading...
>
> [Snip, Snip]
> you'll be presented with a text-field, and the scan-image of the page.
>
> fix all errors you can find, then say if the page is "better" or  
> it's "done".
>
> if -- after the last change is made on a page -- 2-4 proofers in a row
> _confirm_ the page as being "done", then it is considered as  
> "finished".
> but any _change_ automatically resets the confirmation counter to  
> zero.
	This assumes you can get several proofers for a particular page/text.
	What do you do then? Do you set a time limit? Also, how do you deal  
with
	the fact that 2/several proofers battle it out and keep recorrecting  
to what they feel
	is correct? The page/text may never get done!
>
>
> (the question as to whether the pages need 2 or 3 or 4 _confirmations_
> is one that will need to be subjected to some real-world empirical  
> testing,
> balancing the benefits against the costs of doing the extra  
> confirmations.
> we'd assume, of course, that more confirmations mean greater accuracy,
> []
[snip, snip]
>

> here are some rough approximations, though:
> 7 points -- making the last change to a page (i.e., none made after  
> you)
> 6 points -- declaring a page "done" and having it confirmed as  
> "finished"
> 5 points --  being the 2nd confirmation that a page is actually  
> "finished"
> 4 points --  being the 3rd confirmation that a page is actually  
> "finished"
> 3 points -- being the 4th confirmation that a page is actually  
> "finished"
> 2 points -- making a page better by correcting one or more errors
> 1 point -- proofing a page, but not finding errors, because it has  
> none
> 0 points -- not finding any errors, even though there are one or more
> -1 points -- declaring a page as "done" when it still contains an  
> error
> -2 points -- 2nd confirmation as "done" when the page contains an  
> error
> -3 points -- 3rd confirmation as "done" when the page contains an  
> error
> -4 points -- 4th confirmation as "done" when the page contains an  
> error
	OOOOppppps!!!! After a page has be confirmed as DONE who catches the  
error???
>
> -5 points -- introducing two or more errors onto a page
	How do you determine this. If a correction is made and others  
consider it not correct
	is that introduced error. What if the original has a typo in it and  
is corrected, yet
	others say no way jose!! How do detect this and deal with it?

	regards
		Keith.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071219/d4af9629/attachment.htm 

From Bowerbird at aol.com  Wed Dec 19 04:59:26 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 19 Dec 2007 07:59:26 EST
Subject: [gutvol-d] my thoughts on a roundless system of proofing
Message-ID: <c53.26071b60.349a6fae@aol.com>

keith said:
>    Hi Bowerbird,

and hello to you, mr. keith.   i've been wondering where you were at...


>    An interresting concept. My gut feeling that there are 
>    still many questions that should/need be worked out.

that could be...   but i do believe i've got all of the main bases covered...


>    This assumes you can get several proofers for a particular page/text.

oh yeah.   i expect that a half-dozen or more people might read each page.
my system makes the perusal of pages quite swift, which is very important.
i will encourage my proofers to go at large consecutive swatches of pages,
and to _read_for_content_, because that's a good way to find subtle errors.
(i realize some people disagree with that.   fine.   but let's not have that 
talk,
because it is an _empirical_ question, not a matter of _opinion_, thank you.
i expect the quality coming out of my system to be the proof of its pudding.)


>    What do you do then? Do you set a time limit? 

well, as i said, i expect to move every page in a book from raw to "finished"
in a matter of a day or two, and perhaps a mere matter of a couple hours...

it depends on how many proofers you have, of course.   but it also depends
on how long you _decide_ to let a book linger before you introduce another.
d.p. right now is charting that a good many of its books might take 
_years_...
we can debate whether or not that's a good form of architecture.   i wouldn't
_necessarily_ say that it's a bad way of proceeding, but still, i wouldn't do 
it.
my modus operandi would be to get a book through as quickly as possible.

on the other hand, as i also said, i'd leave a book to "simmer" for 3-6 
months,
where the _only_ way for end-users to read it is the hybrid text/image mode
of "continuous proofreading", a not-so-subtle "invitation" for 'em to proof 
it,
one that makes it dirt-simple to report an error on-the-spot, if 
encountered...


>    Also, how do you deal with the fact that 2/several proofers battle it 
out 
>    and keep recorrecting to what they feel is correct? The page/text may 
>    never get done!

i should have made clear that they can't "battle it out".   all they can do 
is to
create a situation of "dispute", which other people must come in and settle.

this is _not_ wikipedia.   neither proofer would benefit from "a revision 
war".
it will help neither their _quantity_ -- i.e., their pages proofed -- nor 
their
"quality" -- i.e., their accuracy score -- to go "back and forth" over a 
point.

if there's a disagreement, it's simply _policy_ to settle it one way or the 
other.

however, if you start with the notion that there's one-and-only-one "correct"
way to do the page, you won't have much argument about _how_ to obtain it.

some of my policies could well be ones that might cause volunteers to decide
_not_ to proof on my site.   bully for them, i say, standing up for their 
opinion.
let them go to -- or start up -- another site that does things how _they_ 
want.
there's plenty of room for everyone.

for instance, my philosophical position is to correct mistakes, and do it 
silently.
this fits with my vision of myself as a _republisher_, since this is almost 
always
what publishers have done in the past, and continue to do up to this very 
day...

but, you know, if someone else wants to elaborately annotate every correction
that they've made, saving information on what it was, and the change they 
made,
i'd wish them good luck and godspeed.   like i said, there's plenty of room 
for all...

but, given a firm orientation, there'll be no need for a "revision war" in 
this task...


>    OOOOppppps!!!! After a page has be confirmed as DONE who catches the 
error???

a page starts out as "raw".   everyone who changes it after that marks it as 
"better",
until _someone_ sticks their neck out and courageously declares it as being 
"done".

at that point, the page would be served up to any proofers who have indicated
they want to handle "confirmations", and they would try their best to find 
errors.

if they do _not_ find an error, they will issue a "confirmation" the page is 
"done".
depending upon how quickly they get their confirmation in, they might receive
anywhere from 3-5 points.

however, if they _do_ find an error, they will _correct_ it, and -- since 
their proof
is the latest to make a change to the page -- they'd get _7_ points for their 
action.
so even though "confirmations" are good, a negative-confirmation is even 
better.
that's what these eagle-eyes are _hoping_ to find, a page that was so good 
that
_someone_ thought it was worth sticking their neck out on, but oops! -- 
gotcha!

the positive points assume that "confirmations" of the page are indeed 
_correct_.

if _incorrect_ "confirmations" are issued -- that is, if someone comes along 
and
finds an error _after_ you had declared a confirmation -- you will _lose_ 
points...

of course, we won't be confident that the "error" they "corrected" _was_ an 
error,
at least not until _their_ version of the page receives 2-4 confirmations 
itself...

so -- from the standpoint of the system -- all we do is just sit back and 
_wait_
until a specific version of a page has received 2-4 confirmations, and then 
we
trace its history.   anyone who endorsed the page in that form gets positive 
points,
and anyone who endorsed _another_ version of the page (as "done" or 
"confirmed")
gets _negative_ points.   this even applies during the 3- to 6-month "simmer" 
period.

think of it as a parlor game, where you have fun with your friends by 
"beating" them.
the slow and steady route is to collect small amounts of points making pages 
better.
the only way to win big is to declare pages as "done" and/or do some 
confirmations,
but that route also exposes you to a risk of _losing_ points big-time.   it's 
a gamble...


>    -5 points -- introducing two or more errors onto a page
>    How do you determine this. 

the same way you determine whether the page is "correct" or not, i.e., 
whether it
receives 2-4 confirmations by other proofers.   so if i introduce errors onto 
a page,
and another person corrects 'em, and 2-4 other people confirm those 
corrections,
then i'm charged with negative points for introducing the errors.   it's 
quite simple.


>    If a correction is made and others consider it not correct
>    is that introduced error. What if the original has a typo in it 
>    and is corrected, yet others say no way jose!! 

well, since you've said straight out that "the original has a typo in it",
and i've said straight out that "the official policy is we correct typos",
then there is no question that this typo _should_ have been corrected,
and therefore there is no way you can say no way jose!   dispute solved.

the more interesting question, of course, is when it is _unclear_ whether
there is -- or is not -- a typo.   if you feel that a specific word _is_ a 
typo,
then you would stick your neck out and correct it.   and if the next person
agrees with you, they will stick their neck out and declare the page "done".
and if the person after that agrees with you both, they'll stick their neck 
out
and issue a "confirmation".   and when you finally get enough confirmations,
the page will be marked as "finished".   and that settles the whole 
question...

on the other hand, if someone disagrees with you, they will revert your edit,
so then you'll challenge their reversion, and the page will become a 
"dispute".
and everyone will waste gobs of time fighting about what it _should_ be, 
until
they all realize that until they settle this dispute, none of them are 
improving
_either_ their quantity (number of pages proofed) or quality (accuracy 
score),
so they're all losing ground to other proofers smart enough to avoid 
disputes.

so, as proofers grow into adults, and formulate solid policies, disputes go 
away.


>    How do detect this and deal with it?

well, it's quite easy to _detect_ if a page has reverted to an earlier 
version --
you just compare every "new" version to each of the previous saved versions.

as for _dealing_ with it, i just gave the reasoning how the problem solves 
itself.
at some point, people realize that they're not getting anything done while 
they
are engaged in a dispute, so they settle it and move on.   i mean, 
_realistically_,
the number of cases that are both _vague_ enough and _important_ enough
that people will engage in a long-running dispute becomes vanishingly small.
(we have ink-on-paper, for one, and a whole raft of grammar rules as guides.)
this becomes _especially_ true when there are clear guiding policies in 
place...

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071219/bb79caff/attachment-0001.htm 

From lee at novomail.net  Wed Dec 19 15:22:34 2007
From: lee at novomail.net (Lee Passey)
Date: Wed, 19 Dec 2007 16:22:34 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <20071219093416.GA29329@ark.in-berlin.de>
References: <cf8.2278c23b.3495952b@aol.com>	<01165190.20071215134141@noring.name>	<476579C4.9000200@novomail.net>	<20071217104548.GB7788@ark.in-berlin.de>	<4766B6A9.5080400@novomail.net>
	<20071219093416.GA29329@ark.in-berlin.de>
Message-ID: <4769A7BA.1070102@novomail.net>

Ralf Stephan wrote:

> So, where can pglaf metadata be accessed?

/Can/ pglaf metadata be accessed? Has it been preserved in any form? If
it is recreated from other sources, is it sufficiently reliable?

>> Project Gutenberg's e-texts, on the other hand, can be viewed on 
>> just about every piece of equipment ever manufactured, including 
>> S-100 bus machines running CP/M attached to VT-52 dumb terminals. 
>> The pure ASCII representation (well, not so pure now that LATIN-1 
>> encoding is permitted) can easily be repurposed for other 
>> applications not envisioned by the original creators, such as 
>> defeating spam filters. (I don't see this as a bad thing, rather it
>> is a real-life example of the usefulness of flexible, repurposable
>> text).
> 
> And just *how* repurposable it is you can see from the fact that all
> those old ASCII-only versions can be potentially transformed into 
> texts that are 'pleasing to read' as you say, by any volunteer who 
> wants to upload an HTML or even TEI version with bibliographic markup
> of it.

What you're talking about here is not repurposability, it is
re-creatability. Of course, the old ASCII-only versions can be edited,
but any electronic document can be edited; even old Wordstar documents,
can be still be converted and edited. But can they be used for
unintended purposes /without/ editing?

A few examples, based primarily, this time, on navigation:

Let's suppose I wrote the killer e-reader application. It can, of
course, display PG e-texts in their native state, with the exception
that it will ignore single newline sequences, and double newline
sequences will be converted to a single newline and 5 spaces of indentation.

What I want to do is auto-generate a table of contents for my users. How
can I do this reliably? If every PG text had the chapters labeled <h3
class="chapter"> I could search for these headers. If every PG text were
encoded in TEI I could scan for the <head> tags and create an outline
table of contents from there. If every PG text were marked in z.m.l. I
could search for the 3 or 4 or 5 blank lines (I have no recollection of
what the exact number really is) that indicate a chapter break and
select the first following block as the chapter title. And while there
are individual instances of each of these files in the PG corpus, 1.
they are a vast minority (is that an oxymoron?) 2. they are not reliably
marked in a way that allows me to distinguish them from other files
which have no markup, and 3. they are not even internally consistent
(most of the TEI files available from PG use the <p> tag to mark chapter
headers, making them indistinguishable from real paragraphs).

Let's suppose that I am an English teacher in Kenya. We are studying
George Eliot's book, _Silas Marner_. Half of my class has tattered
copies of the various editions of the book, and half of my class has a
new OLPC laptop, some set to various font sizes because of uncorrected
vision problems. I want to call my students attention to a particular
passage in the middle of the 5th paragraph in chapter 3. How do I do
that for the portion of the class that is using a PG e-text on their
laptops?

The only way I can think of is to expose enough of the text that the
students can do a word search for the phrase; and hope that they don't
mistype any part of which could be a lengthy passage, including
punctuation, or that the phrase is unique enough that they won't be led
to some other point in the text, or that the edition I'm using matches
the (unknown, unspecified) edition(s) used to create the PG edition.
This method would probably also be required for PG-TEI texts, because
due to the fact that /every/ block of text frequently seems to be marked
as a <p>aragraph, I can't even find the start of a chapter and count
paragraphs from there.

Let's suppose I'm a researcher doing research in American Naval History.
I have obtained a PG version of a 19th century American History
textbook. I want to find all the references to the warship USS
Constitution ("Old Ironsides") but not any references to the
Constitution of the United States.

Now this last one is a bit of a stretch; I would expect to be able to do
this a text marked up with level-three (complete) version of TEI, but
probably not with basic-level markup. But it /is/ one more example of
repurposing.

This is what I mean by repurposing: taking an existing text, without
manual edits (conversions using automated methods is acceptable), and
using it for a purpose other than that for which it was originally
intended. Some of these purposes may have been foreseen, some of them
may have been foreseeable if unforeseen, and some of them may have been
practically unforeseeable. But even in this last category, if we are
committed to saving as much data as possible, and doing so in a
structured way, making it amenable to automated data processing, we
might be able to satisfy even some of those unforeseeable purposes.

It seems to me that the vast majority of PG e-texts were designed to be
read by scrolling down a screen having a fixed, immutable, font of 80
characters per line. These files really are inadequate for just about
any other purpose. They have to be edited just to make them pleasing to
read on hand-held devices (just look at how many suggestions appear from
time to time on how to edit PG e-texts so that they word-wrap properly).
These files are editable, but they are not repurposable.

I have redone, or am in the process of redoing, in highly structured
markup, two or three texts that also exist in the PG corpus. I have
found that trying to add markup and metadata to existing PG texts is
more difficult that simply starting from scratch. So while PG texts are
clearly editable, there's no reason you would even want to upgrade them
in that fashion.

I /have/ found that having OCRed a new scan, the PG e-texts /are/ useful
in helping me find scan errors; but even then I have to edit the text by
normalizing it to match it against the scanned text.

> You write in another mail:
>> At the other end of the spectrum we have Project Gutenberg, which 
>> records nothing about the provenance of a work, strips public 
>> domain works of all but the alphabetic text, silently corrects 
>> "obvious" typographical errors, and occasionally creates hybrid 
>> works by combining various editions without comment or 
>> justification.
> 
> I take this as hyperbole as quite a few people now upload HTML and 
> include metadata in the title page. People using TEI use markup for 
> corrections from which original spelling can be reconstructed. And 
> the fingers of two hands suffice to count hybrid works.

Well, about 20% of the PG e-texts I have examined are hybrid works --
but then I've only looked at about 5 e-texts closely enough to make a
determination one way or the other. So I suppose my conclusion that 20%
of PG e-texts are hybrids is about as accurate as your conclusion that
less than 10 e-texts are hybrids. In reality, we just don't know, do we?
No one has kept any records of where any particular e-text came from, or
what changes have been made over time (except for some rare
acknowledgments of the sort of "John Doe was responsible for chapters
1-10 and Jane Roe added chapters 11-33.") We /do/ know that some
percentage of the files are hybrids or otherwise altered (not
necessarily intentionally or maliciously) but lacking that evidence I
believe the percentage is significant enough that I am unwilling to give
the e-texts of unknown provenance the benefit of the doubt.

> Ah, but now I see, you talk about the backlog! Well, backlog *always*
> is something to whine about, isn't it?

Yes, and this may be approaching the nub of the whole issue.

I have no doubt that there are gems amidst the dross, and that the
percentage of gems is increasing in later submissions; this is probably
due to the increasing sophistication of the volunteers who are producing
e-texts. Distributed Proofreaders has obviously had a significant
impact, but this impact tends to be restricted to the accuracy of the
text rather than the overall quality or repurposability of the structure
of the texts.

My particular management bias is that quality control is not so much an
issue of quality workers as it is an issue of quality processes. Thus,
while I think that the PG corpus, as a whole, is quite poorly done, I
don't think that's the fault of the volunteers who have been
contributing the work. Rather, it is a failure of the processes that
are, or should be, in place to /encourage/ and /enable/ quality work.

In an earlier response to this thread, Michael Hart said:

> Nobody forces anybody to do eBooks any particular way at PG.

He could just as easily have said:

"Nobody requires anybody to do eBooks in any particular way at PG."

or

"Nobody recommends that anybody do eBooks in any particular way at PG."

or

"Nobody encourages anybody to do eBooks in any particular way at PG."

or

"Nobody suggests that anybody do eBooks in any particular way at PG."

This is what I was referring to when I mentioned a failure of
leadership. There are /no/ explicit processes in place at PG to ensure
quality, and no indications that any will be forthcoming.

The quality of texts being included in the PG corpus is slowing 
increasing over the years, I believe, as I stated earlier, because 
volunteers contributing to PG are becoming more sophisticated in their 
knowledge of how to create e-books. I also believe that this increasing 
quality is occurring in spite of the efforts of The Powers That Be at 
PG, not because of them.

In a later portion of his message, Mr. Hart suggested that

> If you would like to write your own FAQs about how you think
> eBooks should be done, please do so, and we will try to find
> as many volunteers for your methodology as possible, and 90%
> of all our volunteers just might go that way, who knows?

In other words, Mr. Hart's approach to the problem of quality lies 
exclusively within the province of the volunteers. If you want to 
improve the quality of the PG corpus, just go out and find every 
volunteer who contributes to PG and try to convince him or her 
individually of the value of a quality product, and explain to him or 
her how that can be achieved. Then, to the extent you do succeed, PG 
will take that quality product, degrade it to a feature-less plain text 
edition, and then throw both products into the bin with the other rotten 
apples. The quality edition may be there, but you won't know it until 
you run across it.

But for those who want to see PG e-texts, as a whole, improve in 
quality, I don't think that relying on the increasing sophistication of 
the contributors is the right way to go. Rather, PG should, as an 
organization not as a diffuse group of contributors, adopt some 
practices and guidelines which will tend to increase quality.

Albert Einstein is credited with defining insanity as "doing the same 
thing over and over again and expecting different results." If PG 
continues to operate as it has over the past 15 years, I don't see any 
reason to believe that there will be any significant change in the 
overall quality of the corpus, despite the efforts of a few highly 
competent and highly motivated individuals.


From Bowerbird at aol.com  Wed Dec 19 16:47:34 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 19 Dec 2007 19:47:34 EST
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
Message-ID: <cc6.21d0a247.349b15a6@aol.com>

ralf said:
>    catalog.rdf is a start and 'just' needs to be augmented with 
>    1) original publisher, 
>    2) publishing year, 
>    3) publishing place 
>    4) possible edition and series information, 
>    5) possible editor/translator info, and 
>    6) a working link to the etext page instead of simply 'etext21990'. 
>    In short, metadata about the etext should 'link' to that info which is 
>    given to pglaf when a clearance is requested, the metadata of the book.

at last, somebody gets down to the _specifics_, instead of simply uttering 
the
supposedly-magical "metadata" word and expecting everyone to genuflect...

but -- as usual -- once we get down to the specifics, we _do_ have clarity.

the "original publisher" for a p.g. e-text is -- ta-da! -- project gutenberg.

the "publishing year" is whatever year the e-text is posted.

the "publishing place" is cyberspace.

the "possible edition and series information" is carried in the name.

and the "possible editor/translator info" is usually contained in the text.

_that_ is your metadata.   and you had it all along.   you had it all 
along...

***

now, of course, _none_ of this will make the anal-complusives happy.
they want to match up that e-text with some long-ago p-book twin...

(and then they want to _criticize_ it because it's not an _exact_ match!)

but that misses the point that this _electronic_ book is a _new_ edition...
there is no external requirement imposed that it has to match a p-book.
it simply must match itself, and it's even free to morph to something else.
and that's the very _essence_ of what it means to "be in the public domain".
the public-domain isn't a long-dead creature you've stuffed and mounted...
it's a living, breathing animal, as up-to-date as today (without any hyphen),
and as fully capable of ensnaring the mind as it has done for many decades.

the only question remaining is "why do these critics continue their efforts 
to
change p.g. into something which it has never been, and will never become?"

why don't they go off and create the kind of library that _they_ want 
instead?

the answer is obvious, of course.   they have no volunteers, and it takes 
them
_years_ -- evidently -- to create just a _handful_ of books in their 
manner...

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071219/0d20f184/attachment.htm 

From jon at noring.name  Wed Dec 19 17:40:18 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 19 Dec 2007 18:40:18 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <cc6.21d0a247.349b15a6@aol.com>
References: <cc6.21d0a247.349b15a6@aol.com>
Message-ID: <1933820418.20071219184018@noring.name>

Bowerbird wrote:

>  now, of course, _none_ of this will make the anal-complusives happy.
>  they want to match up that e-text with some long-ago p-book twin...

The term "anal-compulsive" is uncalled for.


About the rest of Bowerbird's comments, it is clear that PG is really
a repository of whatever anyone wants to put into it. The "YouTube" of
the public domain text world. About the only rigor exercised by PG is
its copyright clearance procedure. PG is simply a kind of barely-
organized anarchy, for better and for worse.

So calling PG a republisher is a real stretch, because PG does not do
anything editorial, or selective, or anything else even remotely what
a "publisher" does. (And of course, note that officially PG, via its
non-profit organization for donation purposes, calls itself a "literary
archive," not a publisher.)

So PG is simply a public repository of "stuff". (O.k., I'll accept
"literary archive" as a descriptor even if that is also a stretch
based on my personal views of what a literary archive should do.)

Now in the past, both Michael and Greg have put on the flag of
"republisher", and it is possible they may reassert that claim in
response to this message. Anyone can claim anything they want. I'm
simply calling PG as I see it is today, and I'll let the silent
majority reading this message decide for themselves what they believe
PG is.

About metadata issues, well maybe for another time...

Jon Noring


From schultzk at uni-trier.de  Thu Dec 20 00:56:09 2007
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 20 Dec 2007 09:56:09 +0100
Subject: [gutvol-d] my thoughts on a roundless system of proofing
In-Reply-To: <c53.26071b60.349a6fae@aol.com>
References: <c53.26071b60.349a6fae@aol.com>
Message-ID: <0E23DA80-78B8-4D25-A389-5138F66B6A5C@uni-trier.de>

Hi Bowerbird,


Am 19.12.2007 um 13:59 schrieb Bowerbird at aol.com:

> keith said:
> >   Hi Bowerbird,
>
> and hello to you, mr. keith.  i've been wondering where you were at...
	Thanks, for the mr. but Keith will do just fine. Actually, Schultz  
is the name,
	but I am mostly know as Keith, or Kies as the Germans pronounce it.
	Life has been giving a rough time, so I could only linger.
	
>
>
>
> >   An interresting concept. My gut feeling that there are
> >   still many questions that should/need be worked out.
>
> that could be...  but i do believe i've got all of the main bases  
> covered...
	From your other remarks, I can see that there is a lot that you have  
not said.
	Lets see if I understand things now:
		1) Your time line is quite short, but that is not a real problem.
		2) Work on the page is kind of anarchal in the sense that the proofers
		    must somehow agree or solve their disputes
		3) "Confirmers" seem to have the deciding vote, yet are not  
almighty since
	             another "confirmer" can refute
		4) Disputes have no mediator:
			If the disputees do not agree they may simply leave the page better.
			Others may step in and get the job "done"
			Proofers are expected to grow up ????
				This list is proof that adults enjoy and indulge in
				kindergarten games.
		5) I see no one who is there to enforce the rules, except peer  
presure.

	I see that you are basing the system on the psychological game that  
is very popular.
	I forgot its name. The problem is the game is not realistic. Its  
rules and parameters
	are far to restricted and do not reflect human nature. Yours is a  
nice implementation
	and allows more flexiblity. Also, seriuous proofers do not want to  
play games and may well
	be turned off by overly enthusiatic ones. There has to be some kind  
of authority.

	Like I said seems O.K. still some rough edges. Yet, as someone you  
know quite well said
	the proof is in eating the pudding.

	regards
		Keith.

	P.S. Season Greetings and to all a good night

>
>
> >   This assumes you can get several proofers for a particular page/ 
> text.
>
> oh yeah.  i expect that a half-dozen or more people might read each  
> page.
> my system makes the perusal of pages quite swift, which is very  
> important.
> i will encourage my proofers to go at large consecutive swatches of  
> pages,
> and to _read_for_content_, because that's a good way to find subtle  
> errors.
> (i realize some people disagree with that.  fine.  but let's not  
> have that talk,
> because it is an _empirical_ question, not a matter of _opinion_,  
> thank you.
> i expect the quality coming out of my system to be the proof of its  
> pudding.)
>
>
> >   What do you do then? Do you set a time limit?
>
> well, as i said, i expect to move every page in a book from raw to  
> "finished"
> in a matter of a day or two, and perhaps a mere matter of a couple  
> hours...
>
> it depends on how many proofers you have, of course.  but it also  
> depends
> on how long you _decide_ to let a book linger before you introduce  
> another.
> d.p. right now is charting that a good many of its books might take  
> _years_...
> we can debate whether or not that's a good form of architecture.  i  
> wouldn't
> _necessarily_ say that it's a bad way of proceeding, but still, i  
> wouldn't do it.
> my modus operandi would be to get a book through as quickly as  
> possible.
>
> on the other hand, as i also said, i'd leave a book to "simmer" for  
> 3-6 months,
> where the _only_ way for end-users to read it is the hybrid text/ 
> image mode
> of "continuous proofreading", a not-so-subtle "invitation" for 'em  
> to proof it,
> one that makes it dirt-simple to report an error on-the-spot, if  
> encountered...
>
>
> >   Also, how do you deal with the fact that 2/several proofers  
> battle it out
> >   and keep recorrecting to what they feel is correct? The page/ 
> text may
> >   never get done!
>
> i should have made clear that they can't "battle it out".  all they  
> can do is to
> create a situation of "dispute", which other people must come in  
> and settle.
>
> this is _not_ wikipedia.  neither proofer would benefit from "a  
> revision war".
> it will help neither their _quantity_ -- i.e., their pages proofed  
> -- nor their
> "quality" -- i.e., their accuracy score -- to go "back and forth"  
> over a point.
>
> if there's a disagreement, it's simply _policy_ to settle it one  
> way or the other.
>
> however, if you start with the notion that there's one-and-only-one  
> "correct"
> way to do the page, you won't have much argument about _how_ to  
> obtain it.
>
> some of my policies could well be ones that might cause volunteers  
> to decide
> _not_ to proof on my site.  bully for them, i say, standing up for  
> their opinion.
> let them go to -- or start up -- another site that does things how  
> _they_ want.
> there's plenty of room for everyone.
>
> for instance, my philosophical position is to correct mistakes, and  
> do it silently.
> this fits with my vision of myself as a _republisher_, since this  
> is almost always
> what publishers have done in the past, and continue to do up to  
> this very day...
>
> but, you know, if someone else wants to elaborately annotate every  
> correction
> that they've made, saving information on what it was, and the  
> change they made,
> i'd wish them good luck and godspeed.  like i said, there's plenty  
> of room for all...
>
> but, given a firm orientation, there'll be no need for a "revision  
> war" in this task...
>
>
> >   OOOOppppps!!!! After a page has be confirmed as DONE who  
> catches the error???
>
> a page starts out as "raw".  everyone who changes it after that  
> marks it as "better",
> until _someone_ sticks their neck out and courageously declares it  
> as being "done".
>
> at that point, the page would be served up to any proofers who have  
> indicated
> they want to handle "confirmations", and they would try their best  
> to find errors.
>
> if they do _not_ find an error, they will issue a "confirmation"  
> the page is "done".
> depending upon how quickly they get their confirmation in, they  
> might receive
> anywhere from 3-5 points.
>
> however, if they _do_ find an error, they will _correct_ it, and --  
> since their proof
> is the latest to make a change to the page -- they'd get _7_ points  
> for their action.
> so even though "confirmations" are good, a negative-confirmation is  
> even better.
> that's what these eagle-eyes are _hoping_ to find, a page that was  
> so good that
> _someone_ thought it was worth sticking their neck out on, but  
> oops! -- gotcha!
>
> the positive points assume that "confirmations" of the page are  
> indeed _correct_.
>
> if _incorrect_ "confirmations" are issued -- that is, if someone  
> comes along and
> finds an error _after_ you had declared a confirmation -- you will  
> _lose_ points...
>
> of course, we won't be confident that the "error" they "corrected"  
> _was_ an error,
> at least not until _their_ version of the page receives 2-4  
> confirmations itself...
>
> so -- from the standpoint of the system -- all we do is just sit  
> back and _wait_
> until a specific version of a page has received 2-4 confirmations,  
> and then we
> trace its history.  anyone who endorsed the page in that form gets  
> positive points,
> and anyone who endorsed _another_ version of the page (as "done" or  
> "confirmed")
> gets _negative_ points.  this even applies during the 3- to 6-month  
> "simmer" period.
>
> think of it as a parlor game, where you have fun with your friends  
> by "beating" them.
> the slow and steady route is to collect small amounts of points  
> making pages better.
> the only way to win big is to declare pages as "done" and/or do  
> some confirmations,
> but that route also exposes you to a risk of _losing_ points big- 
> time.  it's a gamble...
>
>
> >   -5 points -- introducing two or more errors onto a page
> >   How do you determine this.
>
> the same way you determine whether the page is "correct" or not,  
> i.e., whether it
> receives 2-4 confirmations by other proofers.  so if i introduce  
> errors onto a page,
> and another person corrects 'em, and 2-4 other people confirm those  
> corrections,
> then i'm charged with negative points for introducing the errors.   
> it's quite simple.
>
>
> >   If a correction is made and others consider it not correct
> >   is that introduced error. What if the original has a typo in it
> >   and is corrected, yet others say no way jose!!
>
> well, since you've said straight out that "the original has a typo  
> in it",
> and i've said straight out that "the official policy is we correct  
> typos",
> then there is no question that this typo _should_ have been corrected,
> and therefore there is no way you can say no way jose!  dispute  
> solved.
>
> the more interesting question, of course, is when it is _unclear_  
> whether
> there is -- or is not -- a typo.  if you feel that a specific word  
> _is_ a typo,
> then you would stick your neck out and correct it.  and if the next  
> person
> agrees with you, they will stick their neck out and declare the  
> page "done".
> and if the person after that agrees with you both, they'll stick  
> their neck out
> and issue a "confirmation".  and when you finally get enough  
> confirmations,
> the page will be marked as "finished".  and that settles the whole  
> question...
>
> on the other hand, if someone disagrees with you, they will revert  
> your edit,
> so then you'll challenge their reversion, and the page will become  
> a "dispute".
> and everyone will waste gobs of time fighting about what it  
> _should_ be, until
> they all realize that until they settle this dispute, none of them  
> are improving
> _either_ their quantity (number of pages proofed) or quality  
> (accuracy score),
> so they're all losing ground to other proofers smart enough to  
> avoid disputes.
>
> so, as proofers grow into adults, and formulate solid policies,  
> disputes go away.
>
>
> >   How do detect this and deal with it?
>
> well, it's quite easy to _detect_ if a page has reverted to an  
> earlier version --
> you just compare every "new" version to each of the previous saved  
> versions.
>
> as for _dealing_ with it, i just gave the reasoning how the problem  
> solves itself.
> at some point, people realize that they're not getting anything  
> done while they
> are engaged in a dispute, so they settle it and move on.  i mean,  
> _realistically_,
> the number of cases that are both _vague_ enough and _important_  
> enough
> that people will engage in a long-running dispute becomes  
> vanishingly small.
> (we have ink-on-paper, for one, and a whole raft of grammar rules  
> as guides.)
> this becomes _especially_ true when there are clear guiding  
> policies in place...

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071220/1a3bde42/attachment-0001.htm 

From ralf at ark.in-berlin.de  Thu Dec 20 03:40:33 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Thu, 20 Dec 2007 12:40:33 +0100
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <4769A7BA.1070102@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
	<20071219093416.GA29329@ark.in-berlin.de>
	<4769A7BA.1070102@novomail.net>
Message-ID: <20071220114033.GA31581@ark.in-berlin.de>

> > So, where can pglaf metadata be accessed?

Is it possible to get offcial info on this, Mister Hart?


Regards,
R Stephan


From ralf at ark.in-berlin.de  Thu Dec 20 08:08:25 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Thu, 20 Dec 2007 17:08:25 +0100
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <4769A7BA.1070102@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
	<20071219093416.GA29329@ark.in-berlin.de>
	<4769A7BA.1070102@novomail.net>
Message-ID: <20071220160825.GA32015@ark.in-berlin.de>

You wrote 
> No one has kept any records of where any particular e-text came from, or
> what changes have been made over time (except for some rare
> acknowledgments of the sort of "John Doe was responsible for chapters

In an etext with number as low as 4080 I found this snippet:

Corrected EDITIONS of our etexts get a new NUMBER, 8gyge11.txt
VERSIONS based on separate sources get new LETTER, 8gyge10a.txt

So, you're saying changes to texts aren't documented--I find this
a bit strong. You should give evidence that this happens frequently
if you don't want to be named liar.

However, even if there were such cases, how often do you think will
they lead to a hybrid because both the corrector and the WW don't
get it that there are two significantly different versions?
Most correctors, I'd assume, are after typos they found while reading.
Why would someone submit half a text, anyway.


ralf


From Bowerbird at aol.com  Thu Dec 20 12:56:47 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 20 Dec 2007 15:56:47 EST
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
Message-ID: <d33.1e032420.349c310f@aol.com>

i said:
>? now, of course, _none_ of this will make the anal-complusives happy.

i'm sorry.

i shouldn't have said that.

because, of course, it's actually spelled like this:
>? now, of course, _none_ of this will make the anal-compulsives happy.

so let's see what google says about this:
>    compulsive:   12,000,000 hits
>    complusive:   57,600 hits

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071220/ec4ba2fb/attachment.htm 

From hart at pglaf.org  Thu Dec 20 13:34:49 2007
From: hart at pglaf.org (Michael Hart)
Date: Thu, 20 Dec 2007 13:34:49 -0800 (PST)
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
Message-ID: <Pine.LNX.4.64.0712201333050.16611@pglaf.org>


On Wed, 19 Dec 2007, Lee Passey wrote:

[snip]


> In an earlier response to this thread, Michael Hart said:
> 
>> Nobody forces anybody to do eBooks any particular way at PG.
> 
> He could just as easily have said:

Trying to force words into my mouth isn't getting you anywhere,
it just makes you look silly in the eyes of YOUR volunteers, or
potential volunteers.

Have you considered writing as though someone might be reading
what you are saying 10, 20, 30 years from now?


> "Nobody requires anybody to do eBooks in any particular way at 
> PG."
> 
> or
> 
> "Nobody recommends that anybody do eBooks in any particular way 
> at PG."
> 
> or
> 
> "Nobody encourages anybody to do eBooks in any particular way at 
> PG."
> 
> or
> 
> "Nobody suggests that anybody do eBooks in any particular way at 
> PG."
> 
> This is what I was referring to when I mentioned a failure of 
> leadership. There are /no/ explicit processes in place at PG to 
> ensure quality, and no indications that any will be forthcoming.

Different people view leadership in different ways, and you are
totally correct that the kind of leadership you promote here in
this thread, and that others, including Jon Noring, promoted in
past threads, will not "be forthcoming."

The answer is, as it always has been, that YOU the, volunteers,
provide that kind of leadership when and if you want it, but WE
do not. . .we leave the doors open to all volunteers:  not just
those who toe YOUR political party line.

Everyone is welcome to lead, or to follow, and you have to make
your own leaders and followers, if you want them to exist.

Greg Newby, The Board of Directors, and I will not do that kind
of thing for you. . .you want to be a boss, find/create an army
of followers on your own. . .by example. . .we will not specify
YOUR plans as the "official" plans of Project Gutenberg.

This has been tried time and time again, and all support is the
constant offer to help YOU get YOUR ideas into practice.

The results are YOURS, the volunteers'.

If YOU don't like those results, YOU do it the way YOU want it.

We will gladly help.

But if all you want to do is stand on the sidelines yelling the
instructions for everyone else to follow, I cannot say you have
great chance of success, given the years of history of persons,
much like yourself, trying this kind of thing over and over.

"Just DO it!". . .the way YOU want it, and see who follows.

If no one follows, try again, and again, and again.

If no one EVER follows, perhaps you are just no a leader.

WE are not going to just CALL you a leader because you ask.

You seem to think that because there is a vacuum in the kind of
leadership YOU think there should be that it would be easy from
YOUR point of view to step in and take over.

Volunteers don't repond to leadership the same way as others.

Get used to it, and learn how to lead them.

Run your ideas/ideals up the flagpole and see who salutes.

That's the way Project Gutenberg has always worked.

Work with it, or work against it, the choices are ALL YOURS.

No one chooses for YOU, no one chooses for THEM.

Each volunteer can do what s/he wants to do.


> The quality of texts being included in the PG corpus is slowing 
> increasing over the years, I believe, as I stated earlier, 
> because volunteers contributing to PG are becoming more 
> sophisticated in their knowledge of how to create e-books. I 
> also believe that this increasing quality is occurring in spite 
> of the efforts of The Powers That Be at PG, not because of them.

It is the freedoms listed above that allowed for such growth.

If we held the reins as tightly as others would have done. . .
who knows if our volunteers should have felt free enough to do
what you admire, or anything else, for that matter.

It's hard, if not impossible, to do something "in spite of the
efforts of The Powers That Be at PG, not because of them, when
"The Powers That Be at PG" encourage everyone to do what their
own desires direct them to do.

Your cutting off your nose to "spite" your face here in public
does little to increase your image with the volunteers.

If you really want to be one of The Powers That Be at PG," all
you have to do it pick up "The Power" and run with it.

It's just laying there for ANYONE to pick up and run with it.

And _I_, personally, think THAT is what you complain about!!!

_I_ think YOU do NOT want PG to be so grassroots. . . YOU want
some kind of CONTROL. . .that isn't there, never has been.

BUT!!!

If you REALLY want to DO something, the door is always open.

If you just want to kibitz, the breezes just pass you thru.


> In a later portion of his message, Mr. Hart suggested that
> 
>> If you would like to write your own FAQs about how you think
>> eBooks should be done, please do so, and we will try to find
>> as many volunteers for your methodology as possible, and 90%
>> of all our volunteers just might go that way, who knows?
> 
> In other words, Mr. Hart's approach to the problem of quality 
> lies exclusively within the province of the volunteers.

This is the way Project Gutenberg always has been.

Ruled by the volunteers.

If 90% of them like YOUR idea/ideal YOU become "The Power."

Just look at "Distributed Proofreaders" as an example.

Or any of the other national or regional Gutenberg sites.

No one "knighted" them and proclaimed them "The Powers."

"Just DO It!" was their motto, and they just did it.

Yes, we helped, but we help everyone. . . .


> If you want to improve the quality of the PG corpus, just go out 
> and find every volunteer who contributes to PG and try to 
> convince him or her individually of the value of a quality 
> product, and explain to him or her how that can be achieved.

You are welcome to use our own "Powers That Be" tools to do it,
via the Newsletter, or what have you.

However, it is obvious from the way you state your case that it
is more of a sarcastic comment than anything real, though we of
"The Powers That Be" would certainly hope otherwise. . . .


> Then, to the extent you do succeed, PG will take that quality 
> product, degrade it to a feature-less plain text edition, and 
> then throw both products into the bin with the other rotten 
> apples. The quality edition may be there, but you won't know it 
> until you run across it.

More sarcasm, which makes me wonder if I just wasted an hour--
perhaps "the thin veneer" has worn off, and it was only such a
sarcastic message all along. . . .


> But for those who want to see PG e-texts, as a whole, improve in 
> quality, I don't think that relying on the increasing 
> sophistication of the contributors is the right way to go.

Don't forget that the tools at our disposal are also indreasing
in sophistication, and it is much easier to increase quality or
quantity than ever before, even with the same volunteers.

All YOU have to do is LEAD THE WAY!!!


> Rather, PG should, as an organization not as a diffuse group of 
> contributors, adopt some practices and guidelines which will 
> tend to increase quality.

"PG" "as an organziation" should not adopt YOUR "practices and
guidlines" and more than anyone elses. . .YOU have to convince
"a diffuse group of contributors" just as we all had to do.

"I am not a number, I am a free man!"


> Albert Einstein is credited with defining insanity as "doing the 
> same thing over and over again and expecting different results." 
> If PG continues to operate as it has over the past 15 years, I 
> don't see any reason to believe that there will be any 
> significant change in the overall quality of the corpus, despite 
> the efforts of a few highly competent and highly motivated 
> individuals.

Perhaps you would like to make a wager based on that???


;-)


Thank You!!!


Give the world eBooks for 2008!!!


Michael S. Hart
Founder
Project Gutenberg

100,000 eBooks easy to download at:
http://www.gutenberg.org [already passed 25,500 eBooks]
http://www/gutenberg.cc [already passed 75,000 eBooks]
http://gutenberg.net.au   Project Gutenberg of Australia 1570+
http://pge.rastko.net 65 languages  PG of Europe ~500
http://gutenberg.ca  Project Gutenberg of Canada
http://preprints.readingroo.ms  Not Primetime Ready ~400

>>> Your Project Gutenberg Site Could Be Listed Here <<<

Blog at http://hart.pglaf.org


From gbnewby at pglaf.org  Thu Dec 20 15:27:49 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu, 20 Dec 2007 15:27:49 -0800
Subject: [gutvol-d] PGLAF metadata
In-Reply-To: <20071220114033.GA31581@ark.in-berlin.de>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
	<20071219093416.GA29329@ark.in-berlin.de>
	<4769A7BA.1070102@novomail.net>
	<20071220114033.GA31581@ark.in-berlin.de>
Message-ID: <20071220232748.GC20405@mail.pglaf.org>

On Thu, Dec 20, 2007 at 12:40:33PM +0100, Ralf Stephan wrote:
> > > So, where can pglaf metadata be accessed?
> 
> Is it possible to get offcial info on this, Mister Hart?
> 
> 
> Regards,
> R Stephan

(I changed the subject line because I had been mostly ignoring
the thread)

What official PGLAF metadata do you want to access?  If
you're just looking for copyright clearance info that identifies
print volumes, David Price's list is a good place to start:

  http://www.dprice48.freeserve.co.uk/GutIP.html

  -- Greg

From ralf at ark.in-berlin.de  Fri Dec 21 00:12:36 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Fri, 21 Dec 2007 09:12:36 +0100
Subject: [gutvol-d] PGLAF metadata
In-Reply-To: <20071220232748.GC20405@mail.pglaf.org>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
	<20071219093416.GA29329@ark.in-berlin.de>
	<4769A7BA.1070102@novomail.net>
	<20071220114033.GA31581@ark.in-berlin.de>
	<20071220232748.GC20405@mail.pglaf.org>
Message-ID: <20071221081236.GA782@ark.in-berlin.de>

> What official PGLAF metadata do you want to access?  If
> you're just looking for copyright clearance info that identifies
> print volumes, David Price's list is a good place to start:
> 
>   http://www.dprice48.freeserve.co.uk/GutIP.html

I'm sorry to say that the info does not identify print volumes.
Especially the well known books have several editions. So, what's
missing is

- original publishing place
- original publishing year

Let's say we don't need the publisher because it's highly unlikely
different editions have the same place and year. No one would need
this info if we could access the cleared title pages, however, from
the etext page, for example.

So, is it possible to access place/year for a work? If not, is it
possible to get at the title scan?


Thanks for your time,
R Stephan

From richfield at telkomsa.net  Fri Dec 21 00:57:48 2007
From: richfield at telkomsa.net (Jon Richfield)
Date: Fri, 21 Dec 2007 10:57:48 +0200
Subject: [gutvol-d] Bookworm frass
Message-ID: <476B800C.3020700@telkomsa.net>

To Greg in particular, who said:
 >

Project Gutenberg regularly receives such items (sometimes in the
hopes that they'll be judged as public domain in the US, under
our copyright clearance procedures).

As a library (under the US's tax ruling), we are legally able to archive
such items indefinitely, but not redistribute them. 

In short, you can send such items to me (or to Michael Hart) and
we'll do our best to hold them until they become public domain in
the US.
<

Thanks Greg, that was the definitive answer to my question.  I'll be back.  
Meanwhile, greetings (seasonal and otherwise) and thanks to everyone else who responded.

Cheers,

Jon


From Bowerbird at aol.com  Fri Dec 21 09:06:03 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 21 Dec 2007 12:06:03 EST
Subject: [gutvol-d] chinese
Message-ID: <d68.1b3109d5.349d4c7b@aol.com>

love to see all the books in chinese being posted!

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/2221b50a/attachment.htm 

From Bowerbird at aol.com  Fri Dec 21 09:19:48 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 21 Dec 2007 12:19:48 EST
Subject: [gutvol-d] my thoughts on a roundless system of proofing
Message-ID: <bef.1c596bdf.349d4fb4@aol.com>

keith said:
>    3) "Confirmers" seem to have the deciding vote, yet 
>    are not almighty since another "confirmer" can refute

"consensus" is the operative concept here...

the question in a roundless system is "when is a page done?"

in a system that consists of a known number of rounds, a page 
is considered "finished" when it goes through all those rounds.
even though it still might contain errors, it's considered "done".

in a roundless system, you need to have some mechanism that
tells you that the page -- though it might still contain errors --
is nonetheless to be considered "done".

the answer that's been in the forefront at distributed proofreaders
-- for some time now, with apparently no one to challenge it --
is some combination of the measures of the difficulty of the page
and the competence of the various proofers who have attacked it.

as you can imagine, it's pretty difficult to _obtain_ those measures.

what i propose, in contrast, is a measure that is starkly simple...
and -- perhaps more importantly -- will work as well, or better.


>    4) Disputes have no mediator:
>    If the disputees do not agree they may simply leave the page better.
>    Others may step in and get the job "done"

disputes _do_ have a mediator.   that mediator is _consensus_.

i didn't detail the processes that might lead to a consensus when
we are in a "dispute" situation, but they might take _many_ forms,
assuming that the two sides of the dispute couldn't work it all out.
at the most basic, you could have people vote on the two options.
or the person running the site could decide on the official policy...
(over at d.p. presently, the "match-the-page" motto can trump all.)

however, the fact of the matter is that there are very few "issues"
that lead to people digging in their heels for an extended conflict.

in deciding what those ink-marks _mean_ on the page of a book,
it's not all that convoluted.   if it was, readers would have rebelled.

i've followed the forums over at distributed proofreaders for years,
literally, and i cannot say that i remember even one such situation.
there are a lot of fights about the system, but not the basic task...

in most situations that come up, there is _uncertainty_ about them
-- is that a comma or a semicolon?, is this a printer-error or not? --
which is quite different from two sides locked in an intractable fight.


>    Proofers are expected to grow up ????
>    This list is proof that adults enjoy and indulge in   kindergarten 
games.

the "job" of this list isn't straightforward, like digitizing a page of a 
book.


>    5) I see no one who is there to enforce the rules, except peer presure.

it's my system, so i'll enforce the rules.           :+)

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/d0060e20/attachment-0001.htm 

From lee at novomail.net  Fri Dec 21 10:29:43 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 21 Dec 2007 11:29:43 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <Pine.LNX.4.64.0712201333050.16611@pglaf.org>
References: <Pine.LNX.4.64.0712201333050.16611@pglaf.org>
Message-ID: <476C0617.1000503@novomail.net>

Michael Hart wrote:

[snip]

> Different people view leadership in different ways, and you are
> totally correct that the kind of leadership you promote here in
> this thread, and that others, including Jon Noring, promoted in
> past threads, will not "be forthcoming."

[snip]

> Greg Newby, The Board of Directors, and I will not do that kind
> of thing for you. . .you want to be a boss, find/create an army
> of followers on your own. . .by example. . .we will not specify
> YOUR plans as the "official" plans of Project Gutenberg.

[snip]

> Each volunteer can do what s/he wants to do.

[snip]

> It's hard, if not impossible, to do something "in spite of the
> efforts of The Powers That Be at PG, not because of them, when
> "The Powers That Be at PG" encourage everyone to do what their
> own desires direct them to do.

[snip]

>> In a later portion of his message, Mr. Hart suggested that
>>
>>> If you would like to write your own FAQs about how you think
>>> eBooks should be done, please do so, and we will try to find
>>> as many volunteers for your methodology as possible, and 90%
>>> of all our volunteers just might go that way, who knows?
>> In other words, Mr. Hart's approach to the problem of quality 
>> lies exclusively within the province of the volunteers.
> 
> This is the way Project Gutenberg always has been.

[snip]

> "PG" "as an organziation" [sic] should not adopt YOUR "practices and
> guidlines" [sic] and [sic] more than anyone elses [sic] . . .YOU have to convince
> "a diffuse group of contributors" just as we all had to do.

For the record, I have absolutely no desire to become one of "The Powers 
That Be" at Project Gutenberg. And I certainly haven't ever advocated 
the adoption by PG any particular standard or guideline, let alone my own.

I /do/ believe that the existence of well-publicized standards and 
guidelines is a necessary prerequisite to effect quality and control, 
and I believe that PG would benefit from the adoption of such standards 
and guidelines, no matter what they may be, but then I'm not 
particularly interested in improving Project Gutenberg, either. I long 
ago realize the futility of any such attempt.

Like BowerBird, I /would/ like the PG corpus (body of works stored in 
the PG database) to be internally consistent; that way I can steal some 
of its components and write software to convert them into something more 
useful. But internal consistency would require standards, and we should 
all recognize by now that /that/ ain't gonna happen.

[In a message posted this last October Mr. Hutchinson suggested that the 
PG white-washers would reject any submission that was not accompanied by 
a degraded text version, or at least something that could be confused 
for a degraded text version (I'm assuming something using z.m.l. or 
reStructured text would probably pass muster). If this is true, PG does, 
in fact, have some sort of standards, it's just that no one in the PG 
organization is willing to admit it and those wielding the power haven't 
been identified.]

My comments are mostly intended to reinforce the message of Michael Hart:

PG has no standards, PG will never have standards, PG won't even make 
suggestions to help the volunteers learn how to create e-texts for fear 
that they might be misconstrued as organizational standards.

If you are a volunteer who feels s/he could work better in an 
organization that provides guidance and quality control, you should find 
some other organization to work with. (Any suggestions as to what other 
organizations meet these requirements would be welcome; Distributed 
Proofreaders is an obvious option).

If you are a volunteer who does not want the quality work you have 
performed to be subverted and degraded, you should try to find a more 
appropriate repository for your work. (Internet Archive?)

If you are a volunteer who has ideas about how e-texts can be improved, 
or how the process of creating e-texts can be improved, you are welcome 
and encouraged to drag your soap box to any forum on the internet you 
can find (including this one) but don't expect any support from PG 
beyond the obvious "we support your right to express your opinion."

>> Albert Einstein is credited with defining insanity as "doing the 
>> same thing over and over again and expecting different results." 
>> If PG continues to operate as it has over the past 15 years, I 
>> don't see any reason to believe that there will be any 
>> significant change in the overall quality of the corpus, despite 
>> the efforts of a few highly competent and highly motivated 
>> individuals.
> 
> Perhaps you would like to make a wager based on that???

Sure. All we have to do is agree on the standards by which the perceived 
increase in quality will be judged.

> ;-)


From Bowerbird at aol.com  Fri Dec 21 11:59:47 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 21 Dec 2007 14:59:47 EST
Subject: [gutvol-d] obsessive-compulsive delightful
Message-ID: <d15.18a55cb8.349d7533@aol.com>

i am informed that jon noring said:
>   The term "anal-compulsive" is uncalled for.

first of all, _i_ can decide for myself what words are "called for"
to describe my positions.   but hey jon, thanks for the feedback.

alas, jon noring has used the term "anal" himself, in a way that
one can only come to believe he thinks it applies _to_ himself,
so i'm not sure why he would make a fuss when i use the term.
(and note that i didn't point any fingers when i used the term.)

for example, here:
>    http://groups.yahoo.com/group/distscan/message/31
jon noring said this:
>    There are lots of people who love to scan, and who are 
>    super-meticulous and into high quality -- some would 
>    call them anal. I'd rather have five people who are anal

and here:
>    http://groups.yahoo.com/group/distscan/message/33
jon noring said this:
>   some volunteer in this group who has a lot of experience 
>    with scanning, and preferably who describes themselves
>    as a quality fanatic (if they admit they are "anal" about 
>    doing it right, that's the person I want)

and here:
>    http://groups.yahoo.com/group/distscan/message/35
jon noring said this:
>   This is another reason why the scanner needs to be thorough 
>    almost to an anal level

it's obvious that jon noring considers "anal" as a compliment...

yet when i use the same term, he considers it to be an insult...

and such, my good friends, is the beauty of a rorschach blot:
by telling you what _it_ is, a person tells you who _they_ are...

or perhaps, since jon noring considers "anal" as a compliment,
then it must be that "compulsive" is the "uncalled for" word here.

but, you know what they say: the first step in solving a problem
is admitting that you _have_ a problem...   happy solstice, folks!

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/f44f33a8/attachment.htm 

From lee at novomail.net  Fri Dec 21 12:05:29 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 21 Dec 2007 13:05:29 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <20071220160825.GA32015@ark.in-berlin.de>
References: <cf8.2278c23b.3495952b@aol.com>	<01165190.20071215134141@noring.name>	<476579C4.9000200@novomail.net>	<20071217104548.GB7788@ark.in-berlin.de>	<4766B6A9.5080400@novomail.net>	<20071219093416.GA29329@ark.in-berlin.de>	<4769A7BA.1070102@novomail.net>
	<20071220160825.GA32015@ark.in-berlin.de>
Message-ID: <476C1C89.7000809@novomail.net>

Ralf Stephan wrote:

> You wrote
 >
>> No one has kept any records of where any particular e-text came from, or
>> what changes have been made over time (except for some rare
>> acknowledgments of the sort of "John Doe was responsible for chapters
> 
> In an etext with number as low as 4080 I found this snippet:
> 
> Corrected EDITIONS of our etexts get a new NUMBER, 8gyge11.txt
> VERSIONS based on separate sources get new LETTER, 8gyge10a.txt
> 
> So, you're saying changes to texts aren't documented--I find this
> a bit strong. You should give evidence that this happens frequently
> if you don't want to be named liar.

No, all I should have to do is give evidence that it happens 
occasionally. If it happens occasionally, and there is no way of knowing 
when it did or did not happen, then nothing can be relied on.

Let me be clear, here, that I'm not saying reliability is necessary. 
Most of the e-texts at PG can carry the same amount of enjoyment whether 
they are reliable or not, and in most cases the lack of reliability is 
just not that important. It's only if reliability is important to you 
that PG e-texts fall short (at least in this regard).

There is a saying in the Intelligence community that "knowing there is a 
secret is half the secret." So, if you find a text whose name is a 
string of characters followed by a number greater than 10 (I believe 10 
represents version 1.0) you have discovered half of the secret: you now 
know that there /is/ a secret. (I have no idea how you would discover 
this secret for the newer e-texts above 10000 which are named by number 
alone.)

But while you have discovered that there is a secret, you haven't 
discovered what the secret actually is. What are the changes, who made 
them, and why were they made?

> However, even if there were such cases, how often do you think will
> they lead to a hybrid because both the corrector and the WW don't
> get it that there are two significantly different versions?
> Most correctors, I'd assume, are after typos they found while reading.
> Why would someone submit half a text, anyway.

I dunno, but it happens.

Consider the infamous case of _Frankenstein_. In the header of the most 
recent PG edition (15) it states:

<quote>
Release Date: October 31, 1993  [eBook #84]
[Most recently updated: May 30, 2005]

...

[Chapters 1-6:  mostly scanned by David Meltzer,
Meltzer at ..., proofread, partially typed and submitted by
Christy Phillips, Caphilli at ..., submitted on 9/24/93.
Proofread by Lynn Hanninen, submitted 10/93.]

Frankenstein, continued (Chapters 20-24)
Scanned by Judy Boss (boss at ...)
Proofread by Christy Phillips (caphilli at ...)
Reproofed by Lynn Hanninen (leh1 at ...)
Margination and last proofing by anonymous volunteers
</quote>

It would appear from this entry that Mr. Meltzer and colleagues scanned 
and submitted the first 6 chapters in the fall of '93. It also appears 
that sometime there after the project was taken up again by Ms. Boss and 
colleagues, at least for chapters 20-24. How chapters 7 through 19 got 
into the text is a mystery.

In February 2005 it was reported that by empirical investigation, the PG 
edition of _Frankenstein_ differed, mostly in punctuation, from most 
paper editions of _Frankenstein_, but it was virtually identical to the 
1981 Penguin Classics edition of that work, which had "modernized" both 
the punctuation and spelling. 
(http://groups.yahoo.com/group/ebook-community/message/22105).

Later, it was discovered (I thought the message was posted here, but I 
can't seem to find it in the archives) that at some point the PG texts 
began to exhibit the more archaic spelling and punctuation styles. It 
could have been as early as Chapter 7 in the book, but may have been as 
late as chapter 24.

At any rate, it is by now fairly clear that the PG e-text of 
_Frankenstein_ started as a transcription of the 1981 Penguin Classics 
edition, and was completed using some other, more archaic, edition -- 
the classic definition of a hybrid work.

The earliest version of _Frankenstein_ I was able to find on the PG 
website was version 10, claimed to have been released on October 31, 
1993. It was not until version 14 that the texts even contained any 
mention that edits had occurred (version 14 indicates "[Date last 
updated: May 15, 2004]").

So, knowing that there is a secret (the numbers changed), and knowing 
how to find older versions of texts in the PG archive, we could probably 
  normalize and "diff" the texts and find out what changes were made 
between any to versions. But even that will only tell us /what/ has 
changed, it won't tell us /why/ it was changed, /who/ made the change, 
or what the justification for the change was.

In May of 2005, on this listserv, Michael Hart stated:

 > Recently just such a discussion ocurred [sic] about Frankenstein and
 > about The Memoirs of Sherlock Holmes, and new editions have already
 > appeared for each of these, and yet another new edition is already in
 > progress for each of them, each with significan [sic] improvements
 > from different sources, as well as improvements we tend to make along
 > the way.

While I have not done the research necessary to verify this, it would 
appear from Mr. Hart's comments that the PG edition of _The Memoirs of 
Sherlock Holmes_ is yet another of these hybrid editions.

Lastly, in August of this year Mr. Newby wrote:

 > We don't enforce any adherance to a particular printed edition,...and
 > do have a number of frankentexts that have benefitted from different
 > sources.

I conclude that there are an unknown but significant number of these 
hybrid texts included in the PG corpus, for whatever that is worth.

Again, I want to point out that I do not necessarily think that these 
hybrid texts are a bad thing. The Penguin Classics edition of 
_Frankenstein_ was clearly an edited, modernized version, but there is 
no acknowledgment of that in their book, nor any identification of 
/their/ source materials.  I think that BowerBird was absolutely correct 
when he replied that the metadata for any PG text should be:

Publisher: Project Gutenberg
Publishing year: whenever you downloaded it
Publishing place: http://www.gutenberg.org
Possible edition and series information: the file name
Possible editor/translator info: Anonymous Project Gutenberg volunteers.

This is as good as you're going to get from any print publisher, why 
should you expect better metadata from e-texts published by Project 
Gutenberg? Now it may be that you have misconstrued Project Gutenberg as 
an electronic archive or electronic library. Rather than rendering my 
own opinion, let me just encourage you to consider the hallmarks of an 
electronic archive, an electronic library, and an electronic publisher, 
and come to your own conclusions.


From hart at pglaf.org  Fri Dec 21 12:33:18 2007
From: hart at pglaf.org (Michael Hart)
Date: Fri, 21 Dec 2007 12:33:18 -0800 (PST)
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <476C1C89.7000809@novomail.net>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
	<20071219093416.GA29329@ark.in-berlin.de>
	<4769A7BA.1070102@novomail.net>
	<20071220160825.GA32015@ark.in-berlin.de>
	<476C1C89.7000809@novomail.net>
Message-ID: <Pine.LNX.4.64.0712211226310.9510@pglaf.org>


As far as I know, every version of every Project Gutenberg 
eBook is still in the Project Gutenberg archives.

In cases where I have found missing eBooks right after the
books were posted even, I have been able to retrieve these
missing eBooks from various overnight archives.

While I am sure someday someone MIGHT find a missing one--
I can also tell you that NOT ONCE has ONE person asked for
a 1988 copy of Alice In Wonderland, etc.

While I do not engage in dancing on pinheads, I am not the
kind of person who forbids it, either.

There is a better record of the history of our eBooks than
in any other eLibrary, once completed. . .but as for those
who did which pages, the answer is simple. . .a very large
amount of the early work was done by. . .anonyous.

In the particular case mentioned, I may even know who that
anonymous volunteer was.

But I'm not going to ever violate that confidence in these
current times, or perhaps even after.

mh

From hart at pglaf.org  Fri Dec 21 12:36:54 2007
From: hart at pglaf.org (Michael Hart)
Date: Fri, 21 Dec 2007 12:36:54 -0800 (PST)
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm frass?
Message-ID: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>


On Fri, 21 Dec 2007, Lee Passey wrote:

> For the record, I have absolutely no desire to become one of 
> "The Powers That Be" at Project Gutenberg. And I certainly 
> haven't ever advocated the adoption by PG any particular 
> standard or guideline, let alone my own.

You certainly want Project Gutenberg to be more like YOUR
image if what Project Gutenberg SHOULD be. . .period.

However, you think that you can hide behind NOT making an
effective statement of your own. . .thus simply adding an
assortment of verbiage to the noise level, no signal just
more noise. . . .

Just because your statements of desire are fuzzy does not
make them non-existent.


> I /do/ believe that the existence of well-publicized standards 
> and guidelines is a necessary prerequisite to effect quality and 
> control,

This is the advocation of a standard, however fuzzy those
advocations may be. . . .


> and I believe that PG would benefit from the adoption of such 
> standards and guidelines, no matter what they may be,

Everyone could run Project Gutenberg better than we do.

Obviously.

And we invite you to do it.

However, if fuzzy woolgathering is all you have in mind,
I think we have better things to do.


> but then I'm not particularly interested in improving Project 
> Gutenberg, either.

Then what are you doing here?

Is naysaying all you have in mind?

If you are "not particularly interested in improving Project
Gutenberg, either," then all you are doing is making noise.


> I long ago realize the futility of any such attempt.

This presumes that you were here "long ago," but I don't see
any books contributed from the direction you are proposing--
or indeed anything with your name on it--for the past year.

Should I presume you may have contributed anonymously?

Or is your resistance really totally futile?


> Like BowerBird, I /would/ like the PG corpus (body of works 
> stored in the PG database) to be internally consistent; that way 
> I can steal some of its components and write software to convert 
> them into something more useful.

Then you /should/ do something to exemplify your desires.

How can anyone evaluate what you say without such examples?

Perhaps that is your reason for avoiding being specific?


> But internal consistency would require standards, and we should 
> all recognize by now that /that/ ain't gonna happen.

It certainly won't happen if you won't even create examples.


> [In a message posted this last October Mr. Hutchinson suggested 
> that the PG white-washers would reject any submission that was 
> not accompanied by a degraded text version, or at least 
> something that could be confused for a degraded text version 
> (I'm assuming something using z.m.l. or reStructured text would 
> probably pass muster). If this is true, PG does, in fact, have 
> some sort of standards, it's just that no one in the PG 
> organization is willing to admit it and those wielding the power 
> haven't been identified.]

The WhiteWashers are not the only way eBooks get donated.


Need examples?


Just look at all the books coming out this week. . . .


> My comments are mostly intended to reinforce the message of 
> Michael Hart:

OH OH!!!

I sense more words being stuffed into my mouth.


> 
> PG has no standards, PG will never have standards,

Not standards than can be forced on volunteers, only standards
that are more on the order of suggestions, excepting legal and
other standards of a minimal nature.


> PG won't even make suggestions to help the volunteers learn how 
> to create e-texts for fear that they might be misconstrued as 
> organizational standards.

What Mr. Passey fears is that there are too many standards,
not that there aren't any at all. . . .

Any volunteer can simply pick out a favorite book and decide,
for him/herself that this is how they want THEIR books to be.

But no one else gets to decide for them, unless they WANT to
be a member of a certain group. . .still voluntary.


> If you are a volunteer who feels s/he could work better in an 
> organization that provides guidance and quality control, you 
> should find some other organization to work with.

Actually, you should just form your own group, if none of the
already existing sub-groups of Project Gutenberg suit you.

Mr. Passey is confusing freedom and independence with the lack
of any standards whatsoever. . .I think this happened before.


> (Any suggestions as to what other organizations meet these 
> requirements would be welcome; Distributed Proofreaders is an 
> obvious option).

Distributed Proofreaders came about in exactly the manner _I_
have been describing, from WITHIN Project Gutenberg and quite
WITHOUT any need for such nasty commentaries.

Distributed Proofreaders is a GREAT example of how volunteers
create their own standards, their own groups, etc., with lots
of help from "The Powers That Be."


> If you are a volunteer who does not want the quality work you 
> have performed to be subverted and degraded, you should try to 
> find a more appropriate repository for your work.

All any volunteer has to do is SAY inside their eBook that they
don't want it changed. . .period.

Mr. Passey is sooo off base here, it's not even a red herring.


> (Internet Archive?)

Project Gutenberg has worked with The Internet Archive all along, 
but their goals are not nearly identical.

Project Gutenberg is much more about full text documents--
well proofread--with user generated error correction.

Mr. Passey will not find this in Google, Yahoo, Amazon, or
even Sony, or too many other eBook sources.

Just WHY is Mr. Passey saying all this, anyway?

He SAYS he has no hope, want to make no contribution.

Is his ONLY goal the destruction of Project Gutenberg
by getting all the volunteers to desert?


> If you are a volunteer who has ideas about how e-texts can be 
> improved, or how the process of creating e-texts can be 
> improved, you are welcome and encouraged to drag your soap box 
> to any forum on the internet you can find (including this one) 
> but don't expect any support from PG beyond the obvious "we 
> support your right to express your opinion."

At least you won't get censored here, as Mr. Noring proposed,
as Mr. Ockerbloom was famous for. . .we even let Mr. Passey
rant and rave to his heart's content.

Why?

We don't believe in censorship.

Censorship of you.

Censorship of him.

Personally, I think he is complaining because we don't pick
one standard and censor all the others.

Personally, I don't think THE standard for eBooks exists yet.

But I think it will become obvious when it does.

I was unwilling to force HTML on our volunteers and readers
when Sir Tim Berners-Lee invited me, and Project Gutenberg,
to be one of the charter members of The World Wide Web, and
I stand behind that decision today, simply because I am not
a purveyor of standards, and neither is Project Gutenberg.

Whatever standards emerge from the real world are just fine.

If Mr. Passey is unwilling to provide examples of standards,
then it is highly unlikely that he will ever get exactly the
standards he wants, or anything close to it.

I was involved with Unicode people when it was being set up,
and I can tell it it was a zoo, same with TEI, ZML, ZML, and
all the rest.

The time wasted was enormous.

I'm glad I didn't get involved more than I did.

When I realized NONE of my suggestions would be taken I just
decided to do it on my own, which is exactly what Mr. Passey
and Mr. Noring should be doing. . .should have done long ago.

If YOU won't put YOUR time, effort, and money where the mouth
has put in so much mileage. . .why should anyone else?

The reason The Web succeeded, browsers succeeded, and eBooks,
is the the people doing them didn't wait for approval. . . .

"Just DO It!"


From vze3rknp at verizon.net  Fri Dec 21 13:23:28 2007
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Fri, 21 Dec 2007 16:23:28 -0500
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
 frass?
In-Reply-To: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
Message-ID: <476C2ED0.5030406@verizon.net>


Michael Hart wrote:
> Distributed Proofreaders came about in exactly the manner _I_
> have been describing, from WITHIN Project Gutenberg and quite
> WITHOUT any need for such nasty commentaries.
>
> Distributed Proofreaders is a GREAT example of how volunteers
> create their own standards, their own groups, etc., with lots
> of help from "The Powers That Be."
>   
Well, that's one version of history.

Charles Franks (founder of DP) was a PG volunteer who decided to try to 
make a better way of proofreading books for PG. Aside from the usual 
"let lots of flowers bloom" statements from PG there was NO early 
support. Charles did all the coding, ran the software on his own server, 
etc. When I went looking for DP in April of 2002, having remembered that 
it had been mentioned on gutvol-d several years before (2000), I could 
find no mention of it at all on the PG website. From PG it sure looked 
like DP didn't exist. In the public interviews and other publicity that 
Michael Hart did for PG in 2002 and 2003, at least those that I was 
aware of, Michael never once mentioned DP.

However, in the summer of 2002, PGLAF did provide a high-speed, 
destructive scanner setup for Charles Franks. And at the beginning of 
2003 provided a second setup. The Internet Archive provided our next 
server, around Sept. 2002 or so. When we could no longer afford "free" 
service from IA, PGLAF bought DP a server and paid for hosting. So I'm 
not saying that PG has been unsupportive of DP. Once it was apparent 
that DP was thriving and would produce lots of material for PG, there 
was plenty of support. But for those first 2.5 years it was different. 
Today, most of the DP volunteers still arrive via the banner at PG and 
most are strongly supportive of PG and its mission.

JulietS
Distributed Proofreaders


From jon at noring.name  Fri Dec 21 13:44:02 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 21 Dec 2007 14:44:02 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <476C0617.1000503@novomail.net>
References: <Pine.LNX.4.64.0712201333050.16611@pglaf.org>
	<476C0617.1000503@novomail.net>
Message-ID: <191898394.20071221144402@noring.name>

Lee wrote:

> I /do/ believe that the existence of well-publicized standards and
> guidelines is a necessary prerequisite to effect quality and
> control, and I believe that PG would benefit from the adoption of
> such standards and guidelines, no matter what they may be, but then
> I'm not particularly interested in improving Project Gutenberg,
> either. I long ago realize the futility of any such attempt.

Lee summarizes the two approaches to running a "movement" like PG:

1) It is Not Good for a "movement" like PG to require any standards/
   guidelines beyond the absolute minimum necessary to define the
   project and to protect it from any legal problems.

   In the case of PG, the absolute minimum necessary is

      a) to receive and make available for free download textual
         content in human readable plain text form. (This defines
         the historical core of PG's vision.)

      b) to copyright clear all texts submitted to the archive.
         (This protects PG from its enemies, real and imagined.)

   From what Michael has said over the years, I have the impression,
   right or wrong, that he believes *any* organizational standard/
   requirement beyond the above will harm the vision and goals of PG.

2) It is Good for a "movement" like PG to have a clear-cut set of
   standards/guidelines necessary to assure collection uniformity,
   consumer quality, and to enhance the collection's repurposeability.


> Like BowerBird, I /would/ like the PG corpus (body of works stored
> in the PG database) to be internally consistent; that way I can
> steal some of its components and write software to convert them into
> something more useful. But internal consistency would require
> standards, and we should all recognize by now that /that/ ain't
> gonna happen.

It's interesting that I think everyone, including Michael, would love
for the PG corpus to be internally consistent to *something*.

However, I would have to assume that Michael believes it is preferable
for PG to have major inconsistencies in the collection rather than for
PG to lay down a set of requirements (beyond the current minimalist
ones) which would have been necessary to greatly improve the internal
consistency of the collection.

Now the supporters of PG's minimalist approach will point to the
collection and say "look at the size of it! The minimalist approach
works!"

Those who believe in having at least a few requirements to meet the
goal of consistency will point to the collection and say "Look at the
size of this mess! The minimalist approach has failed."

Since we can't rewind the clock to the early 90's and restart a
project like PG with a few more requirements, all we can do is
speculate where the collection would have been today -- alternative
history sort of thing.

However, we can certainly lend data to the speculation by looking at
Distributed Proofreaders. DP has by now passed all other sources of PG
texts, and they collectively work under a set of guidelines that are
stricter than PG's itself. (Now we can argue whether or not their work
product is consistent enough, but that's not germane to this
discussion -- they clearly have a few requirements beyond PG's, and
they clearly are producing a hell of a lot of texts at a pace that is
far outstripping everyone else. And, I don't see a problem with DP
from an organizational sense -- they are not moving to the "dark side"
or anything.)

So several of us believe that PG's corpus could have been just as
large today, yet be at a higher level of consistency, usability, and
repurposeability, had PG from the beginning issued a few more
requirements for the texts submitted to it.

Since hindsight is 20-20, and we can't rewind the clock, there's no
need to beat a dead horse. We can only look to the future and decide
what is best to do. I see two options for PG:

1) No change in the collection requirements and keep collecting it
   from anywhere and everywhere.

2) Require all new texts to meet a few new requirements, and then
   encourage the older texts to be reworked as necessary to meet the
   new requirements.

The problem I see in all the discussions the last few years is that we
tend to confuse "option #1 vs. option #2" with "if option #2, what
should be the requirements."

Until Michael and Greg decide that they are serious *to do something*
so as to improve the PG collection's long-term consistency, we can
talk all we want about what #2 should entail.

But it is sort of useless (from the perspective of improving PG at
least) until those who control access to the archive (Michael and
Greg) get serious and clear as to what they really want. By their
seeming silence on what they want I can only assume that either they
are satisfied with the way things are, or they are hoping someone will
come along (e.g. Bowerbird) and wave a magic wand without them having
to establish any more "requirements", and all will be made well.

*****

As an aside, maybe what is needed is a new text archive (and Michael
and Greg enthusiastically support others to do this!) which sets a few
more collection requirements, and invites contributions.

So those who now contribute to PG can consider striving to make the
texts they produce *also* meet the requirements of the new archive
(and I think the requirements will not be that difficult to meet nor
conflict with PG's minimal requirements), and submit their texts to
*both* PG and to the new archive. PG gets what they want, and those of
us who believe a text archive should meet a set of certain
requirements beyond what PG sets, get what we want.

This is really not a competition, but rather should be something where
everybody wins.


> PG has no standards, PG will never have standards, PG won't even
> make suggestions to help the volunteers learn how to create e-texts
> for fear that they might be misconstrued as organizational standards.

This is my conclusion, too.


> If you are a volunteer who feels s/he could work better in an
> organization that provides guidance and quality control, you should
> find some other organization to work with. (Any suggestions as to
> what other organizations meet these requirements would be welcome;
> Distributed Proofreaders is an obvious option).

You know, just as I have been advocating for a while that we separate
the digital text master format from the delivery format(s), maybe we
need to separate in our mind "digital text archive" from "digital text
production".

In essence, PG is evolving towards this model where it simply is the
YouTube equivalent for textual content: "just dump your stuff here!".

DP has played a major role in this paradigm shift.


> If you are a volunteer who has ideas about how e-texts can be
> improved, or how the process of creating e-texts can be improved,
> you are welcome and encouraged to drag your soap box to any forum on
> the internet you can find (including this one) but don't expect any
> support from PG beyond the obvious "we support your right to express
> your opinion."

Of course, I suggest the "Digital Text Community" to be the place to
discuss new ideas for the digitization of "ink-on-paper" texts. Here's
the URL to the home page:

   http://groups.yahoo.com/group/digital-text/

The gutvol-* groups are really for internal discussions relating to
the operation of PG itself. They overall support a specific project,
and are not neutral meeting places for other projects which digitize
and archive texts totally apart from PG.

It's amazing how many projects have joined DTC. There are people from
PG and DP there, of course.

Jon Noring


From jon at noring.name  Fri Dec 21 14:01:09 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 21 Dec 2007 15:01:09 -0700
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
	frass?
In-Reply-To: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
Message-ID: <1103068917.20071221150109@noring.name>

[posted publicly to gutvol-d and cc: to Michael]


Michael wrote:
> Lee Passey wrote:

>> (Any suggestions as to what other organizations meet these 
>> requirements would be welcome; Distributed Proofreaders is an 
>> obvious option).

> Distributed Proofreaders came about in exactly the manner _I_
> have been describing, from WITHIN Project Gutenberg and quite
> WITHOUT any need for such nasty commentaries.
>
> Distributed Proofreaders is a GREAT example of how volunteers
> create their own standards, their own groups, etc., with lots
> of help from "The Powers That Be."

Michael and Greg:

Supposing someone were to:

1) set up an *online archive*, totally independent from PG, and
   invited submissions of digital texts which meet a certain set of
   guidelines the new archive has established, and

2) Personally contacted many of the current contributors of texts to
   PG, asking them to contribute their texts to the new and
   independent archive *in addition* to contributing them to PG, and

3) The new archive will NEVER ever mention PG nor mention that some of
   the texts may also have been contributed to PG,

Would you make a request to the volunteer contributors not to
contribute to the new archive so long as they contribute to PG?

[Note, it would be made TOTALLY clear to the volunteer contributors
that the new archive has no ties to PG in any manner whatsoever, so
don't worry about that issue in your reply to my question.]


Jon Noring


From jon at noring.name  Fri Dec 21 14:17:09 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 21 Dec 2007 15:17:09 -0700
Subject: [gutvol-d] obsessive-compulsive delightful
In-Reply-To: <d15.18a55cb8.349d7533@aol.com>
References: <d15.18a55cb8.349d7533@aol.com>
Message-ID: <114738108.20071221151709@noring.name>

Bowerbird wrote:
> Jon Nirng wrote:

> i am informed that jon noring said:

>>?? The term "anal-compulsive" is uncalled for.

>  first of all, _i_ can decide for myself what words are "called for"
>  to describe my positions. but hey jon, thanks for the feedback.

Well, ultimately it is not up to you or me to decide whether the
disparaging use of the phrase "anal-compulsive" is uncalled for. It
is up to the rest of the readers here to decide that for themselves.

All I know from running a lot of lists over the years is that those
who oppose hostile-tone speech (which is what I believe your speech
amounted to) far outnumbers those who believe it to be acceptable (and
some believe it to even be useful.)

Since those who run this list apparently believe it's alright for
people to express themselves in a quite hostile manner (so long as
they don't go way off the deep end), I don't plan to email them and
ask them to do something. I don't administer or moderate this list,
I'm simply a guest in their house as we all are...


Jon Noring


From gbnewby at pglaf.org  Fri Dec 21 14:24:38 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri, 21 Dec 2007 14:24:38 -0800
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
	frass?
In-Reply-To: <1103068917.20071221150109@noring.name>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
	<1103068917.20071221150109@noring.name>
Message-ID: <20071221222438.GA14265@mail.pglaf.org>

On Fri, Dec 21, 2007 at 03:01:09PM -0700, Jon Noring wrote:
> [posted publicly to gutvol-d and cc: to Michael]
> 
> 
> Michael wrote:
> > Lee Passey wrote:
> 
> >> (Any suggestions as to what other organizations meet these 
> >> requirements would be welcome; Distributed Proofreaders is an 
> >> obvious option).
> 
> > Distributed Proofreaders came about in exactly the manner _I_
> > have been describing, from WITHIN Project Gutenberg and quite
> > WITHOUT any need for such nasty commentaries.
> >
> > Distributed Proofreaders is a GREAT example of how volunteers
> > create their own standards, their own groups, etc., with lots
> > of help from "The Powers That Be."
> 
> Michael and Greg:
> 
> Supposing someone were to:
> 
> 1) set up an *online archive*, totally independent from PG, and
>    invited submissions of digital texts which meet a certain set of
>    guidelines the new archive has established, and
> 
> 2) Personally contacted many of the current contributors of texts to
>    PG, asking them to contribute their texts to the new and
>    independent archive *in addition* to contributing them to PG, and
> 
> 3) The new archive will NEVER ever mention PG nor mention that some of
>    the texts may also have been contributed to PG,
> 
> Would you make a request to the volunteer contributors not to
> contribute to the new archive so long as they contribute to PG?

Of course not.

* We'll even help someone do #s 1-3, if desired *

How much more encouragement do you expect to get?  It's surprising to
see this questions asked, given that over and over, everyone has been
encouraged to do it his or her own way.  The encouragement hasn't
included stipulations or limitations.
  -- Greg

> [Note, it would be made TOTALLY clear to the volunteer contributors
> that the new archive has no ties to PG in any manner whatsoever, so
> don't worry about that issue in your reply to my question.]
> 
> 
> Jon Noring
> 
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From Bowerbird at aol.com  Fri Dec 21 14:49:13 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 21 Dec 2007 17:49:13 EST
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
	frass?
Message-ID: <d09.25dd5cf3.349d9ce9@aol.com>

juliet said:
>    Once it was apparent that DP was thriving 
>    and would produce lots of material for PG, 
>    there was plenty of support. 

"plenty of support" from an organization that has _zero_budget_,
and hasn't paid its own founder much (if anything?) in several years
is probably not something that should be sneezed at...


>    But for those first 2.5 years it was different.

how many books did you digitize in those first 2.5 years?

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/57fab6be/attachment.htm 

From Bowerbird at aol.com  Fri Dec 21 14:59:09 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 21 Dec 2007 17:59:09 EST
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
	frass?
Message-ID: <d16.1c7ebf51.349d9f3d@aol.com>

greg said:
>   It's surprising to see this questions asked, given that over and over, 
>    everyone has been encouraged to do it his or her own way.? 

and i've noted here, several times, that this offer is quite genuine.

i've given project gutenberg tons and tons of constructive criticism,
and they've responded by offering me free webspace and bandwidth.

this is even after i've explicitly said that it is my fullest expectation and
specific intention that -- upon my "repurposing" of the p.g. e-texts --
my cyberlibrary will usurp project gutenberg as the most useful one...

and they're, like, "great! how can we help?"

i know, i know, it seems strange to me too.   but what can you say?        
:+)


>    The encouragement hasn't included stipulations or limitations.

nope, it hasn't.

-bowerbird

p.s.   besides, it's a stupid question because everyone knows that you
can repurpose _any_ public-domain e-text from project gutenberg,
simply by stripping off the p.g. legalese.   and every e-text tells you so.


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/67a8d42d/attachment.htm 

From gbnewby at pglaf.org  Fri Dec 21 14:59:17 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri, 21 Dec 2007 14:59:17 -0800
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <191898394.20071221144402@noring.name>
References: <Pine.LNX.4.64.0712201333050.16611@pglaf.org>
	<476C0617.1000503@novomail.net>
	<191898394.20071221144402@noring.name>
Message-ID: <20071221225917.GB14265@mail.pglaf.org>

I'm just responding to a few points, but leaving in the whole
context.  Skip down 100 lines for my responses:

On Fri, Dec 21, 2007 at 02:44:02PM -0700, Jon Noring wrote:
> Lee wrote:
> 
> > I /do/ believe that the existence of well-publicized standards and
> > guidelines is a necessary prerequisite to effect quality and
> > control, and I believe that PG would benefit from the adoption of
> > such standards and guidelines, no matter what they may be, but then
> > I'm not particularly interested in improving Project Gutenberg,
> > either. I long ago realize the futility of any such attempt.
> 
> Lee summarizes the two approaches to running a "movement" like PG:
> 
> 1) It is Not Good for a "movement" like PG to require any standards/
>    guidelines beyond the absolute minimum necessary to define the
>    project and to protect it from any legal problems.
> 
>    In the case of PG, the absolute minimum necessary is
> 
>       a) to receive and make available for free download textual
>          content in human readable plain text form. (This defines
>          the historical core of PG's vision.)
> 
>       b) to copyright clear all texts submitted to the archive.
>          (This protects PG from its enemies, real and imagined.)
> 
>    From what Michael has said over the years, I have the impression,
>    right or wrong, that he believes *any* organizational standard/
>    requirement beyond the above will harm the vision and goals of PG.
> 
> 2) It is Good for a "movement" like PG to have a clear-cut set of
>    standards/guidelines necessary to assure collection uniformity,
>    consumer quality, and to enhance the collection's repurposeability.
> 
> 
> > Like BowerBird, I /would/ like the PG corpus (body of works stored
> > in the PG database) to be internally consistent; that way I can
> > steal some of its components and write software to convert them into
> > something more useful. But internal consistency would require
> > standards, and we should all recognize by now that /that/ ain't
> > gonna happen.
> 
> It's interesting that I think everyone, including Michael, would love
> for the PG corpus to be internally consistent to *something*.
> 
> However, I would have to assume that Michael believes it is preferable
> for PG to have major inconsistencies in the collection rather than for
> PG to lay down a set of requirements (beyond the current minimalist
> ones) which would have been necessary to greatly improve the internal
> consistency of the collection.
> 
> Now the supporters of PG's minimalist approach will point to the
> collection and say "look at the size of it! The minimalist approach
> works!"
> 
> Those who believe in having at least a few requirements to meet the
> goal of consistency will point to the collection and say "Look at the
> size of this mess! The minimalist approach has failed."
> 
> Since we can't rewind the clock to the early 90's and restart a
> project like PG with a few more requirements, all we can do is
> speculate where the collection would have been today -- alternative
> history sort of thing.
> 
> However, we can certainly lend data to the speculation by looking at
> Distributed Proofreaders. DP has by now passed all other sources of PG
> texts, and they collectively work under a set of guidelines that are
> stricter than PG's itself. (Now we can argue whether or not their work
> product is consistent enough, but that's not germane to this
> discussion -- they clearly have a few requirements beyond PG's, and
> they clearly are producing a hell of a lot of texts at a pace that is
> far outstripping everyone else. And, I don't see a problem with DP
> from an organizational sense -- they are not moving to the "dark side"
> or anything.)
> 
> So several of us believe that PG's corpus could have been just as
> large today, yet be at a higher level of consistency, usability, and
> repurposeability, had PG from the beginning issued a few more
> requirements for the texts submitted to it.
> 
> Since hindsight is 20-20, and we can't rewind the clock, there's no
> need to beat a dead horse. We can only look to the future and decide
> what is best to do. I see two options for PG:
> 
> 1) No change in the collection requirements and keep collecting it
>    from anywhere and everywhere.
> 
> 2) Require all new texts to meet a few new requirements, and then
>    encourage the older texts to be reworked as necessary to meet the
>    new requirements.
> 
> The problem I see in all the discussions the last few years is that we
> tend to confuse "option #1 vs. option #2" with "if option #2, what
> should be the requirements."


You're mischaracterizing #1.
#2 has happened, and continues to happen, in many ways.

You're also not distinguishing a few important things:

a. collection development policy versus technical requirements for submissions
b. within-eBook quality versus cross-collection consistency


> Until Michael and Greg decide that they are serious *to do something*
> so as to improve the PG collection's long-term consistency, we can
> talk all we want about what #2 should entail.

I think you mostly care about consistency.  Sorry, but Michael
and I really don't, in terms of within-eBook content.  (We do
have consistent headers, filenames, etc., as mentioned below.)

> But it is sort of useless (from the perspective of improving PG at
> least) until those who control access to the archive (Michael and
> Greg) get serious and clear as to what they really want. By their
> seeming silence on what they want I can only assume that either they
> are satisfied with the way things are, or they are hoping someone will
> come along (e.g. Bowerbird) and wave a magic wand without them having
> to establish any more "requirements", and all will be made well.


Please don't interpret my silence on this thread, or others,
as meaningful.  As mentioned frequently, I almost never read
anything authored by you, or by a few other people who post to
gutvol-d.  This is my choice, and means I'm sometimes not tuned
into whatever discussion is going on.  

Even when I'm following a thread, I sometimes stop myself from
responding...mostly because I don't want to write something that
will be interpreted as policy, when it was really just an opinion.

More on this theme:


> As an aside, maybe what is needed is a new text archive (and Michael
> and Greg enthusiastically support others to do this!) which sets a few
> more collection requirements, and invites contributions.

You keep asking, and we keep saying, "yes," 
then the cycle repeats of expressing dissatisfaction with the
way things are.  There has never yet been substantial action.
We've been doing this for years.

This is not positive reinforcement for me to continue to engage
in the discussion.  If you were DOING something, you'd get a lot
more of my attention (for whatever that might be worth).
  (Yes, I know you've DONE a few things!  But mostly it's just talk...and
  essentially the same talk, over and over.)

A fact to consider that gutvol-d only has a few hundred people on it.
The subscribership is relatively flat (a few people coming & going
over time, but basically the same # of subscribers since the start
of the list 7ish years ago).

That's fewer than the unique # of individuals who have submitted eBooks,
just in 2007 (around 300).  Outside of those contexts, Michael and I
are in ongoing discussions with any number of people who want to do
interesting stuff with the PG content, or start their own affiliated
site, or have other ideas.  Recycling the same tired discussions on 
gutvol-d is just not very compelling to me, and it has not helped
PG achieve much.

> So those who now contribute to PG can consider striving to make the
> texts they produce *also* meet the requirements of the new archive
> (and I think the requirements will not be that difficult to meet nor
> conflict with PG's minimal requirements), and submit their texts to
> *both* PG and to the new archive. PG gets what they want, and those of
> us who believe a text archive should meet a set of certain
> requirements beyond what PG sets, get what we want.

So do it!!!!

> This is really not a competition, but rather should be something where
> everybody wins.

Are you sure you believe that?  It seems above that you're implying
that everyone LOSES if PG doesn't follow the types of guidelines
you've advocated.

> > PG has no standards, PG will never have standards, PG won't even
> > make suggestions to help the volunteers learn how to create e-texts
> > for fear that they might be misconstrued as organizational standards.
> 
> This is my conclusion, too.


Your definition of standards is not my definition of standards.
PG has quite a few.

But they don't cover a variety of items you seem to be interested
in, such as:
- maintaining provenance of sources
- particular markup / layout standards

And are very permissive for things you would like less permissive:
- various formats accepted
- various content types
- and yes, varying quality in presentation, proofreading, etc.

And yet:
- we have a fixed format & set of rules for copyright clearances
- we have gutcheck and a variety of other automated programs
- we have a set of file naming procedures
- we have a unified catalog
- we produce valid HTML, reasonably sized images (in subdirectories),
  provide conversion on the fly to various formats
and lots more.

"No standards" and "will never have standards" is presumably a reference
to some sorts of standards you care about.  I don't accept it as
accurate for the PG collection.

As you said, DP is even stricter in what they produce...standards
for items to pass the final PPV check, as well as per-book standards
that producers apply.


> > If you are a volunteer who feels s/he could work better in an
> > organization that provides guidance and quality control, you should
> > find some other organization to work with. (Any suggestions as to
> > what other organizations meet these requirements would be welcome;
> > Distributed Proofreaders is an obvious option).
> 
> You know, just as I have been advocating for a while that we separate
> the digital text master format from the delivery format(s), maybe we
> need to separate in our mind "digital text archive" from "digital text
> production".

So do it!!!

> In essence, PG is evolving towards this model where it simply is the
> YouTube equivalent for textual content: "just dump your stuff here!".
> 
> DP has played a major role in this paradigm shift.


I don't think it's fair or accurate to say that DP has helped
PG shift towards a "just dump your stuff here" model, nor that
it is the model PG has.

I don't really know what you're talking about with this,
actually.  The "just dump your stuff here" model is, as far as
I can tell, being pursued by eBook initiatives from Google, Yahoo, 
the IA and spinoffs (including OCA), where the emphasis has been
on mass scanning of entire shelves, then raw OCR + page scans
are presented as an eBook.  "Dump" is an overstatement, but 
not much of one.


> > If you are a volunteer who has ideas about how e-texts can be
> > improved, or how the process of creating e-texts can be improved,
> > you are welcome and encouraged to drag your soap box to any forum on
> > the internet you can find (including this one) but don't expect any
> > support from PG beyond the obvious "we support your right to express
> > your opinion."
> 
> Of course, I suggest the "Digital Text Community" to be the place to
> discuss new ideas for the digitization of "ink-on-paper" texts. Here's
> the URL to the home page:
> 
>    http://groups.yahoo.com/group/digital-text/
> 
> The gutvol-* groups are really for internal discussions relating to
> the operation of PG itself. They overall support a specific project,
> and are not neutral meeting places for other projects which digitize
> and archive texts totally apart from PG.
> 
> It's amazing how many projects have joined DTC. There are people from
> PG and DP there, of course.
> 
> Jon Noring


  -- Greg

From jon at noring.name  Fri Dec 21 16:35:05 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 21 Dec 2007 17:35:05 -0700
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <20071221225917.GB14265@mail.pglaf.org>
References: <Pine.LNX.4.64.0712201333050.16611@pglaf.org>
	<476C0617.1000503@novomail.net> <191898394.20071221144402@noring.name>
	<20071221225917.GB14265@mail.pglaf.org>
Message-ID: <659760181.20071221173505@noring.name>

Greg wrote:
> Jon wrote:

First, I appreciate your reply, Greg!


> You're also not distinguishing a few important things:
>
> a. collection development policy versus technical requirements for
>    submissions
> b. within-eBook quality versus cross-collection consistency

Well, everything gets even finer than this, but I did not want to
delve into the specifics since my discussion wanted to look at the
more global aspects. For example, some aspects of "within e-book
quality" definitely have an impact on "cross-collection consistency."

(Of course, I can already see that we can have different definitions
as to what "cross-collection consistency" means. I think this aspect
of the discussion should probably not continue.)

I would like to understand PG's "official collection development
policy." If this is spelled out at the PG site (a Google search turned
up nothing using that phrase), a link to it would be appreciated. I
have an idea what it is, but since a collection development policy is
clearly an organizational policy, the official policy has to originate
from PGLAF.


>> As an aside, maybe what is needed is a new text archive (and
>> Michael and Greg enthusiastically support others to do this!) which
>> sets a few more collection requirements, and invites contributions.

> You keep asking, and we keep saying, "yes," 
> then the cycle repeats of expressing dissatisfaction with the
> way things are.  There has never yet been substantial action.
> We've been doing this for years.

Well, I did say above "and Michael and Greg enthusiastically support
others to do this!", so I got the message. <laugh/>

My other message, where I ask a couple hypotheticals about starting a
project and inviting contributions of texts which are also being
contributed to PG, was asked to absolutely clarify your "yes" because
of concerns shared with me in private by a couple other individuals.
So just because you say "yes" does not mean everyone interprets "yes"
in the same way.

I hope that the positive answer you provided to me in private will be
also posted to gutvol-d since there are a number of people who like to
hear it as a sort of "official" statement.

I request that it be written and posted to the PG site. A simple
paragraph would sufficie. Of course, I have my own suggested wording,
but whatever is put down in writing has to come from PGLAF:

   "An important mission of Project Gutenberg is to assure the Public
   Domain is completely digitized and freely available to all. Thus,
   Project Gutenberg fully and enthusiastically encourages others to
   start independent archives which freely distribute public domain
   texts. PG invites all who contribute public domain texts to other
   archives, to also contribute them to Project Gutenberg's archive.
   This assures wider distribution and archival redundancy. Likewise,
   PG also supports contributors to PG's archive to also consider
   donating their texts to other archives. PG has minimal format
   requirements, but does require copyright clearance."

Or something like that. (I'm sure the intent can be written much more
cleanly and succinctly than I wrote it.)


> And are very permissive for things you would like less permissive:
>
> - various formats accepted
> - various content types
> - and yes, varying quality in presentation, proofreading, etc.

Well, since we are trying to understand each other, I have no
difficulty with multiple formats. In fact the more derivatives
the better.


> I don't think it's fair or accurate to say that DP has helped PG
> shift towards a "just dump your stuff here" model, nor that it is
> the model PG has.

The word "dump" is probably harsh, but it cannot be denied that as
an organization, PG is not itself digitizing texts. At the most, on
the text production end, it is encouraging people to digitize texts
and then asking them to donate the texts so long as they meet some
minimal requirements of format, plus pass copyright clearance.

Thus, I group PG along with IA/OCA, Google Books and YouTube, among
other content archives. It is interesting that YouTube is, in some
respects, not much different from PG in terms of how they collect
and distribute content. By its marketing, YouTube is saying "please
give us your video!" and seems to have pretty minimal requirements
regarding format (video is much more repurposeable than texts,
though), and to meet copyright law (if PG is rigorous on anything,
though, it is with copyright clearance, and this is one thing which PG
does very very well, at least as seen from my vantage point.)

PG, by its well-understood mission, is asking people to donate texts
to it for archiving and distribution. In fact, the lack of anything
other than minimal text format requirements only reinforces the view
that PG is primarily a text archive. Even PGLAF, by its name "literary
text archive" acknowledges that the real focus of PG is on archival
and delivery of texts, and not on production of texts itself.

Now there is NOTHING wrong with PG being considered primarily a text
archive. It's a good thing. It is through the archive that the texts
make it to the world.

Jon Noring


From Bowerbird at aol.com  Fri Dec 21 17:18:30 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 21 Dec 2007 20:18:30 EST
Subject: [gutvol-d] is it december 10th yet?
Message-ID: <c0f.17fa1a5c.349dbfe6@aol.com>

hmm...

it appears i missed my annual december 10th post to michael hart.

darn!

so...   sorry this is late, michael, i guess i was busy;
you know how time flies when you're having fun...

anyway...

michael hart, thank you for bringing project gutenberg into the world!

you birthed electronic-books, michael, and -- along with them --
boosted the very concept of _unlimited_distribution_via_cyberspace_.

let the record show -- clearly -- that many of the earliest e-texts were 
created when you -- michael hart, a semi-dyslexic? -- _typed_them_in._

and the first 1,000 e-texts from others were _proofed_ by you too, until
you simply could not keep up with the increasing pace of submissions...

why?   because you assimilated a community of people around you who
were eager to share, and help you create, your envisioned cyberlibrary.

you built it.   and they came.

and is it really only 4 years ago we were celebrating number 10,000?
when -- this year -- you are celebrating (give or take some) 25,000?

not that numbers matter anymore.   it's the idea, and the idea is loose,
and the reason the idea is loose is that you put the idea into motion...

while lots of people might have been talking about electronic-libraries
way back when -- i know _i_ was -- _you_ sat down and set to typing,
and you and your typing made all the difference, my friend, _all_ of it...

google may have more books than you, but they got the idea from you.

lots of people have a finger in the pie now -- the google boys, brewster,
bezos, adobe's sharks, you name it -- but you _baked_ that pie, michael.

so god bless you, michael hart.   god bless you for what you've done.

i love you, man.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071221/5abdbdd1/attachment.htm 

From lee at novomail.net  Fri Dec 21 18:04:07 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 21 Dec 2007 19:04:07 -0700
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
 frass?
In-Reply-To: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
Message-ID: <476C7097.9070106@novomail.net>

Michael Hart wrote:

> On Fri, 21 Dec 2007, Lee Passey wrote:
> 
>> For the record, I have absolutely no desire to become one of 
>> "The Powers That Be" at Project Gutenberg. And I certainly 
>> haven't ever advocated the adoption by PG any particular 
>> standard or guideline, let alone my own.
> 
> You certainly want Project Gutenberg to be more like YOUR
> image if what Project Gutenberg SHOULD be . . . period.

Well, in the sense of "it sure would be nice if Project Gutenberg could 
furnish e-texts that I might find useful," that might be true. In the 
sense of "Project Gutenberg should remake itself in whatever image it 
thinks I'm advocating," certainly not. The success or failure of Project 
Gutenberg is of virtually no consequence to me, and I certainly wouldn't 
presume to substitute my judgment for that of those to whom Project 
Gutenberg /is/ of consequence.

No, really. I'm not suggesting you do anything at all. I may express 
opinions as to the likelihood of certain consequences following from 
certain behaviors, but I have no opinion at all as to whether those 
consequences are desirable or not, or should be sought or avoided. Those 
kind of judgments are yours, and yours alone.

[snip]

>> I /do/ believe that the existence of well-publicized standards 
>> and guidelines is a necessary prerequisite to effect quality and 
>> control,
> 
> This is the advocation of a standard, however fuzzy those
> advocations may be. . . .

No, it is the statement of a causal relationship. I make no value judgments.

>> and I believe that PG would benefit from the adoption of such 
>> standards and guidelines, no matter what they may be,

Ok, /this/ is a value judgment, but it's only my values. I'm not in 
charge, you are, and I respect your right to run Project Gutenberg 
according to your values.

[snip]

>> but then I'm not particularly interested in improving Project 
>> Gutenberg, either.
> 
> Then what are you doing here?

An interesting question. I am deeply committed to e-books, and making 
quality digital versions of important literature available to the 
public. And to be honest, I'm getting tired of hearing things in the 
press like "I downloaded an e-book from Project Gutenberg, and obviously 
e-books will never be anywhere near as good as paper books."

So one thing I want to do is speak out whenever I see the emperor 
without clothes. I want to be sure everyone understands what Project 
Gutenberg is, and is not, and if I can accomplish that in your own 
words, so much the better. I don't want people coming to Project 
Gutenberg and thinking that the quality of its work product is 
indicative of e-books in general.

And of course, just about anyone who is interested in e-books is going 
to pass through so poking my head up here from time to time is a way to 
do a little social networking, and maybe make some contacts with 
like-minded individuals. If I can find some individual who is interested 
in producing quality e-books, and I can help him or her understand how 
to build a better mousetrap, then I have accomplished something.

> Should I presume you may have contributed anonymously?

No, you should presume that I'm not particularly interested in throwing 
my work product into the PG mill. My stuff is out on the internet, and 
if it interests you enough to find it you're welcome to it.

>> Like BowerBird, I /would/ like the PG corpus (body of works 
>> stored in the PG database) to be internally consistent; that way 
>> I can steal some of its components and write software to convert 
>> them into something more useful.
> 
> Then you /should/ do something to exemplify your desires.

I have, and I continue to do so. I have frequently offered markup advice 
to people who are faced with particularly knotty problems, and in 
general it has been well received. I have written all sorts of software 
designed to facilitate the creation of quality e-books, and have offered 
here on the mailing list.

I've recreated several works that PG also has it its corpus and placed 
them on my web site for critique and evaluation.

The fact that I'm not doing any of the things /you/ want me to be doing 
does not mean I'm doing nothing.

[snip]

> The WhiteWashers are not the only way eBooks get donated.
> 
> Need examples?
> 
> Just look at all the books coming out this week. . . .

Rather than pointing to examples of books which have managed to avoid 
the white-washer gauntlet, it would be more useful to explain the 
process used to avoid that gauntlet. I'm sure I'm not the only one 
interested in the answer to /that/ question.

>> My comments are mostly intended to reinforce the message of 
>> Michael Hart:
> 
> OH OH!!!
> 
> I sense more words being stuffed into my mouth.
> 
>> PG has no standards, PG will never have standards,
> 
> Not standards than can be forced on volunteers, only standards
> that are more on the order of suggestions, excepting legal and
> other standards of a minimal nature.
> 
>> PG won't even make suggestions to help the volunteers learn how 
>> to create e-texts for fear that they might be misconstrued as 
>> organizational standards.
> 
> What Mr. Passey fears is that there are too many standards,
> not that there aren't any at all. . . .

Now who's putting words into whose mouth? :-)

But although you still seem a little unclear on the concept, you are 
mostly right.

A standard has to be explicitly definable, if not defined, and can be 
defined as well by what it is not as by what it is. A "standard" that 
includes everything is, patently, no standard at all.

Now what I really would like (just to be clear, I'm speaking in general; 
I'm not asking for PG to do anything to satisfy this desire) is a corpus 
of works which are susceptible to repurposing via automated data 
processing. Works which are usable /by/ computers, not just by humans 
who happen to be using computers.

Not only is the concept of thousands of "standards" meaningless, it's 
virtually impossible to do anything with. If I want to automatically 
extract the author and title from every work in my mythical corpus, and 
every one of them follows a different standard in identifying those two 
data points, it's impractical, if not impossible, to accomplish my 
desired task. Even if I /were/ able to write a thousand programs to 
match my thousand "standards" I would need some mechanism to know which 
text follows which standard so I know which program to apply to the 
particular file. In other words, I would need one, or some limited 
number, of meta-standards.

Project Gutenberg is not the corpus I need. I'm not saying that it 
should be, I'm just saying that it's not. And anyone who comes to PG 
thinking that PG e-texts /can/ be used by computers, and not just 
humans, should be rapidly disabused of that notion.

[snip]

> Mr. Passey is confusing freedom and independence with the lack
> of any standards whatsoever . . . 

I don't think so. If a "standard" isn't published it can't be a 
standard, and if everyone is not only free to ignore a published 
"standard", and does, in fact, ignore that "standard" then in fact there 
are no standards. Anarchy and conformism are not necessarily 
diametrically opposed, but they're pretty darn close.

[snip]

> Project Gutenberg has worked with The Internet Archive all along, 
> but their goals are not nearly identical.

Indeed. That is why it may be a more appropriate repository for those 
people interested in preserving the Public Domain. I was proposing the 
Internet Archive as an alternative to Project Gutenberg, not as a companion.

[snip]

> Is his ONLY goal the destruction of Project Gutenberg
> by getting all the volunteers to desert?

I certainly would not encourage /anyone/ committed to digitizing paper 
books to abandon Project Gutenberg unless they had found some other 
organization which better suited their own preferences.

> Personally, I think he is complaining because we don't pick
> one standard and censor all the others.

I'm sorry, but I can't help thinking here about BowerBird's Rorschach 
blots ...

> Personally, I don't think THE standard for eBooks exists yet.

Nor do I. I /do/ think that several GOOD standards for e-books exist, 
anyone of which would be reasonable to adopt. I even think it would be 
acceptable to choose a half-dozen of them with the requirement that 
whichever one you choose you clearly identify your choice then use it 
exclusively.

> But I think it will become obvious when it does.

I disagree. I think e-book standards will continue to evolve as 
hardware, software, and our understanding of natural languages improve.

> I was unwilling to force HTML on our volunteers and readers
> when Sir Tim Berners-Lee invited me, and Project Gutenberg,
> to be one of the charter members of The World Wide Web, and
> I stand behind that decision today, simply because I am not
> a purveyor of standards, and neither is Project Gutenberg.

Hopefully, you've made that perfectly clear. I understand this, and I 
hope everyone else does as well.

> Whatever standards emerge from the real world are just fine.
> 
> If Mr. Passey is unwilling to provide examples of standards,
> then it is highly unlikely that he will ever get exactly the
> standards he wants, or anything close to it.

XHTML, TEI, DocBook, z.m.l., OEB ... take your pick.

> I was involved with Unicode people when it was being set up,
> and I can tell it it was a zoo, same with TEI, ZML, ZML, and
> all the rest.
> 
> The time wasted was enormous.

And yet, some pretty good standards emerged. It was obviously not a 
waste of time for /everyone/ involved. And now I can leverage all the 
good work they did! Time well spent, if you ask me.

[snip]

> The reason The Web succeeded, browsers succeeded, and eBooks,
> is the the people doing them didn't wait for approval. . . .

The reason the web and browsers succeeded is because Sir Tim Berners-Lee 
invented the HyperText Markup Language and the HyperText Transfer 
Protocol, and everyone agreed to use it. The reason e-books /haven't/ 
succeeded is because everyone insists on doing things their own way.

> "Just DO It!"

Sounds like good advice to me.

From hart at pglaf.org  Fri Dec 21 20:02:43 2007
From: hart at pglaf.org (Michael Hart)
Date: Fri, 21 Dec 2007 20:02:43 -0800 (PST)
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
 frass?
In-Reply-To: <476C7097.9070106@novomail.net>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
	<476C7097.9070106@novomail.net>
Message-ID: <Pine.LNX.4.64.0712211909470.18344@pglaf.org>


On Fri, 21 Dec 2007, Lee Passey wrote:

> Michael Hart wrote:
>
>> On Fri, 21 Dec 2007, Lee Passey wrote:
>>
>>> For the record, I have absolutely no desire to become one of
>>> "The Powers That Be" at Project Gutenberg. And I certainly
>>> haven't ever advocated the adoption by PG any particular
>>> standard or guideline, let alone my own.
>>
>> You certainly want Project Gutenberg to be more like YOUR
>> image if what Project Gutenberg SHOULD be . . . period.
>
> Well, in the sense of "it sure would be nice if Project 
> Gutenberg could furnish e-texts that I might find useful," 
> that might be true. In the sense of "Project Gutenberg should 
> remake itself in whatever image it thinks I'm advocating," 
> certainly not. The success or failure of Project Gutenberg is 
> of virtually no consequence to me, and I certainly wouldn't 
> presume to substitute my judgment for that of those to whom 
> Project Gutenberg /is/ of consequence.

Well said!


> No, really. I'm not suggesting you do anything at all. I may 
> express opinions as to the likelihood of certain consequences 
> following from certain behaviors, but I have no opinion at 
> all as to whether those consequences are desirable or not, or 
> should be sought or avoided. Those kind of judgments are 
> yours, and yours alone.

Well. . .I try to leave that to the judgement of those who do
the actual choosing and working to create the books.

I would not be comfortable telling someone how to prepare the
favorite books of and entire life time, it shoud be, "a labor
of love," so to speak, not merely following a checklist.

If making eBooks were only following a standards checkist for
xxx number of pages, then it could easily be done by programs
and we could just dump all the text in those programs but the
result would be much like artificial reading voices.

[Footnote:  as many of you know, _I_ don't personally care in
any way about the "appearance" or "look and feel" of eBooks--
I tend to SEE the book as a vision rather than seeing a words
on the page or screen kind of thing. . .hence my inhibitions,
as it were to tell be what/how to do are more on the lines of
personal respect for individual human beings.

Perhaps that last part said it best after all these years.]


> [snip]
>
>>> I /do/ believe that the existence of well-publicized 
>>> standards and guidelines is a necessary prerequisite to 
>>> effect quality and control,
>>
>> This is the advocation of a standard, however fuzzy those 
>> advocations may be. . . .
>
> No, it is the statement of a causal relationship. I make no 
> value judgments.

Some people make statements they do no view as judgemental in
cases where others see them as VERY judgemental.

That is also perhaps better said than I managed before, but I
hesitate to include every case in that statement.


>>> and I believe that PG would benefit from the adoption of 
>>> such standards and guidelines, no matter what they may be,
>
> Ok, /this/ is a value judgment, but it's only my values.

That's all I was trying to say. . . .

Sorry, it took so much time and effort for us to manage it.


> I'm not in charge, you are, and I respect your right to run 
> Project Gutenberg according to your values.

I still don't think I am running Project Gutenberg according,
as you say, to my values, other than that my values are value
the people who volunteer to help Project Gutenberg.

After all, I have nothing to offer them than my thanks and my
respect, along with a chance to perhaps change the world in a
way not seen since The Gutenberg Press.

>
> [snip]
>
>>> but then I'm not particularly interested in improving 
>>> Project Gutenberg, either.
>>
>> Then what are you doing here?
>
> An interesting question. I am deeply committed to e-books, 
> and making quality digital versions of important literature 
> available to the public. And to be honest, I'm getting tired 
> of hearing things in the press like "I downloaded an e-book 
> from Project Gutenberg, and obviously e-books will never be 
> anywhere near as good as paper books."

You'll probably have plenty of time to get even more tired of
comments such as those, as it appears to me that most of such
comments come from people who WANT eBooks to fail. . . !

I'm absolutely sure the same comments were made during shifts
in paradigms from stone to parchment to papyrus to paper, and
from tablets to scrolls to books to eBooks.

Naysaying is just part of their tactics and strategies.

The answer lies, as always, in the cost/benefit ratio.

The reason Project Gutenberg has survived better than eBooks
from commercial sources is just that. . .cost benefit ratio.

Project Gutenberg can't fail. . .at least until someone else
figures out a way to make eBooks ubiquitous.


> So one thing I want to do is speak out whenever I see the 
> emperor without clothes. I want to be sure everyone 
> understands what Project Gutenberg is, and is not, and if I 
> can accomplish that in your own words, so much the better. I 
> don't want people coming to Project Gutenberg and thinking 
> that the quality of its work product is indicative of e-books 
> in general.

I wouldn't either. . .but for the opposite reasons.

I don't like the other eBooks as well.

Tell me, does anyone else hand out 35 million eBooks per year?

Even those who claim to have millions to hand out?

Not to mention that those 35 million are from 25,000 titles.


> And of course, just about anyone who is interested in e-books 
> is going to pass through so poking my head up here from time 
> to time is a way to do a little social networking, and maybe 
> make some contacts with like-minded individuals. If I can 
> find some individual who is interested in producing quality 
> e-books, and I can help him or her understand how to build a 
> better mousetrap, then I have accomplished something.

Everyone has a different idea of the ideal mouse trap, so I am
trying to leave the door open for all of them.

I certainly to not expect eBooks to LOOK the same in 100 years
or perhaps even in 10 years.

But I should hope the underlying text would still be 99.99 the
same characters that the original sources had.


>> Should I presume you may have contributed anonymously?
>
> No, you should presume that I'm not particularly interested 
> in throwing my work product into the PG mill. My stuff is out 
> on the internet, and if it interests you enough to find it 
> you're welcome to it.

OK, I'll see what I can do.

Any particular credit line you would like?


>>> Like BowerBird, I /would/ like the PG corpus (body of works 
>>> stored in the PG database) to be internally consistent; 
>>> that way I can steal some of its components and write 
>>> software to convert them into something more useful.

If it's TOO consistent, then it's just preprogrammed output.

I was hoping for something better than a Xerox machine.

However, I would actually accept eBooks made by machine
as long as they were 99.99% accurate.

After all, it's the books that matter, not our pride.


>> Then you /should/ do something to exemplify your desires.
>
> I have, and I continue to do so. I have frequently offered 
> markup advice to people who are faced with particularly 
> knotty problems, and in general it has been well received.

There is one place we differ, I don't care much about markup
and I apologize if that causes a rift between us or others--
I hate to say it in at least one manner. . .but some friends
I really like are VERY into markup and appearance, but I may
be just "old school" enough NOT to want to judge a book by a
collection of appearance variables. . .rather than content.

What I see everywhere are comments on FORM, APPEARANCE, that
stuff that goes down to JUDGING A BOOK BY ITS COVER.

I'm just not that sort of person. . . .


> I have written all sorts of software designed to facilitate 
> the creation of quality e-books, and have offered here on the 
> mailing list.
>
> I've recreated several works that PG also has it its corpus 
> and placed them on my web site for critique and evaluation.
>
> The fact that I'm not doing any of the things /you/ want me 
> to be doing does not mean I'm doing nothing.

Again my apologies, those kinds of standards are just not the
part of books that interest me.

I hope you realize I'm not being judgemental here, I just see
something else than that stuff when I read a book.


> [snip]
>
>> The WhiteWashers are not the only way eBooks get donated.
>>
>> Need examples?
>>
>> Just look at all the books coming out this week. . . .
>
> Rather than pointing to examples of books which have managed 
> to avoid the white-washer gauntlet, it would be more useful 
> to explain the process used to avoid that gauntlet. I'm sure 
> I'm not the only one interested in the answer to /that/ 
> question.

The simple answer, as always, is just contect Newby or myself.


>>> My comments are mostly intended to reinforce the message of 
>>> Michael Hart:
>>
>> OH OH!!!
>>
>> I sense more words being stuffed into my mouth.
>>
>>> PG has no standards, PG will never have standards,
>>
>> Not standards than can be forced on volunteers, only 
>> standards that are more on the order of suggestions, 
>> excepting legal and other standards of a minimal nature.
>>
>>> PG won't even make suggestions to help the volunteers learn 
>>> how to create e-texts for fear that they might be 
>>> misconstrued as organizational standards.
>>
>> What Mr. Passey fears is that there are too many standards, 
>> not that there aren't any at all. . . .
>
> Now who's putting words into whose mouth? :-)

I am interpreting what you have said as best I know now, but I
am the first to admit that the speaker is the ONLY one who can
know exactly what they meant.

However, when I disagree, I do ask for literal translations.

;-)


> But although you still seem a little unclear on the concept, 
> you are mostly right.

Thank you for being so kind as to say that so publicly.

More thanks!


> A standard has to be explicitly definable, if not defined, 
> and can be defined as well by what it is not as by what it 
> is. A "standard" that includes everything is, patently, no 
> standard at all.

As you may know, I tried to keep the standards obvious so it
woud be possible for anyone on any hardware/software combo--
past, present or future--to create PG eBooks.

I wasn't about to rule out whole portions of the population.


> Now what I really would like (just to be clear, I'm speaking 
> in general; I'm not asking for PG to do anything to satisfy 
> this desire) is a corpus of works which are susceptible to 
> repurposing via automated data processing. Works which are 
> usable /by/ computers, not just by humans who happen to be 
> using computers.

Actually, I agree here more than you might think.

However, I think what Bowerbird, you, and others, wanted was
something that could be totally automated.

I wanted something that required the preservation of just an
exemplary touch of humanity. . .not 100% automation.

However, you will probably get your wish in your lifetime.

MY wish was simply to break down the bars of ignorance and
of illiteracy for the world at large, not for computers.

Even though I was adamant that computers without a special
software or hardware committement should be able to read.


> Not only is the concept of thousands of "standards" 
> meaningless, it's virtually impossible to do anything with.

Well, you might think that "thousands of `standards'" might
be enough of a "reductio ad absurdum/infinitum" to ward off
any possible contradiction. . .

. . .the truth is that with minimal standards you can never
rule out how many thousands of standards might fit in a way
that both people and machines can easily read.


> If I want to automatically extract the author and title from 
> every work in my mythical corpus, and every one of them 
> follows a different standard in identifying those two data 
> points, it's impractical, if not impossible, to accomplish my 
> desired task. Even if I /were/ able to write a thousand 
> programs to match my thousand "standards" I would need some 
> mechanism to know which text follows which standard so I know 
> which program to apply to the particular file. In other 
> words, I would need one, or some limited number, of 
> meta-standards.

Sorry, you lost me there.

I'm just talking about reading the books.


> Project Gutenberg is not the corpus I need. I'm not saying 
> that it should be, I'm just saying that it's not. And anyone 
> who comes to PG thinking that PG e-texts /can/ be used by 
> computers, and not just humans, should be rapidly disabused 
> of that notion.

I diagree, as do many programmers who use our eBooks.

>
> [snip]
>
>> Mr. Passey is confusing freedom and independence with the 
>> lack of any standards whatsoever . . .
>
> I don't think so. If a "standard" isn't published it can't be 
> a standard, and if everyone is not only free to ignore a 
> published "standard", and does, in fact, ignore that 
> "standard" then in fact there are no standards. Anarchy and 
> conformism are not necessarily diametrically opposed, but 
> they're pretty darn close.

It's just that the standards are so simple, not that they were
never published. . .and that we don't force them on volunteers.

>
> [snip]
>
>> Project Gutenberg has worked with The Internet Archive all 
>> along, but their goals are not nearly identical.
>
> Indeed. That is why it may be a more appropriate repository 
> for those people interested in preserving the Public Domain. 
> I was proposing the Internet Archive as an alternative to 
> Project Gutenberg, not as a companion.

Well, we've been companions since before The Internet Archive
even got famous, so it's a littel late for that.


>
> [snip]
>
>> Is his ONLY goal the destruction of Project Gutenberg by 
>> getting all the volunteers to desert?
>
> I certainly would not encourage /anyone/ committed to 
> digitizing paper books to abandon Project Gutenberg unless 
> they had found some other organization which better suited 
> their own preferences.

Well said!


>> Personally, I think he is complaining because we don't pick 
>> one standard and censor all the others.
>
> I'm sorry, but I can't help thinking here about BowerBird's 
> Rorschach blots ...

Even Bowerbird would prefer just one standard, though he will
work with the simple standards mentioned above, and gives the
credit for them where credit is due.


>> Personally, I don't think THE standard for eBooks exists 
>> yet.
>
> Nor do I. I /do/ think that several GOOD standards for 
> e-books exist, anyone of which would be reasonable to adopt. 
> I even think it would be acceptable to choose a half-dozen of 
> them with the requirement that whichever one you choose you 
> clearly identify your choice then use it exclusively.

I think one has to be VERY careful when assigning standards.

VERY careful.

More than I could do with anything that wasn't VERY simple.

And don't forget the time factor. . . .


>> But I think it will become obvious when it does.
>
> I disagree. I think e-book standards will continue to evolve 
> as hardware, software, and our understanding of natural 
> languages improve.

I think that eventually eBooks will settle into patterns quite
much the way paper books did.

Look at the early ones, all over the place, in size, paper and
binding, fonts, inks, and everything else.

That's the way pioneers are.

Later on comes the pressure for everyone to be alike. . . .

And the pioneers either die out or move on.


>> I was unwilling to force HTML on our volunteers and readers 
>> when Sir Tim Berners-Lee invited me, and Project Gutenberg, 
>> to be one of the charter members of The World Wide Web, and 
>> I stand behind that decision today, simply because I am not 
>> a purveyor of standards, and neither is Project Gutenberg.
>
> Hopefully, you've made that perfectly clear. I understand 
> this, and I hope everyone else does as well.

So glad.


>> Whatever standards emerge from the real world are just fine.
>>
>> If Mr. Passey is unwilling to provide examples of standards, 
>> then it is highly unlikely that he will ever get exactly the 
>> standards he wants, or anything close to it.
>
> XHTML, TEI, DocBook, z.m.l., OEB ... take your pick.

Sadly to say, at least SOME of the people behind those WANT the
standards THEY developed to ELIMINATE all other standards.

I've asked them in person. . . .


>> I was involved with Unicode people when it was being set up, 
>> and I can tell it it was a zoo, same with TEI, ZML, ZML, and 
>> all the rest.
>>
>> The time wasted was enormous.
>
> And yet, some pretty good standards emerged. It was obviously 
> not a waste of time for /everyone/ involved. And now I can 
> leverage all the good work they did! Time well spent, if you 
> ask me.

If only you said the same about Project Gutenberg. . . eh?

;-)


>
> [snip]
>
>> The reason The Web succeeded, browsers succeeded, and 
>> eBooks, is the the people doing them didn't wait for 
>> approval. . . .
>
> The reason the web and browsers succeeded is because Sir Tim 
> Berners-Lee invented the HyperText Markup Language and the 
> HyperText Transfer Protocol, and everyone agreed to use it. 
> The reason e-books /haven't/ succeeded is because everyone 
> insists on doing things their own way.

Actually, I think it was as much the invention of browsers,
search engines, etc., that did it. . . .

It could have been ANY markup system. . .well not ANY, but,

MANY. . . .


>
>> "Just DO It!"
>
> Sounds like good advice to me.


I would be more than happy to assist you in doing it,
if you would allow me. . . .


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg

Recommended Books:

Dandelion Wine, by Ray Bradbury:  For The Right Brain
Atlas Shrugged, by Ayn Ran,:  For The Left Brain [or both]
Diamond Age, by Neal Stephenson:  To Understand The Internet
The Phantom Toobooth, by Norton Juster:  Lesson of Life. . .


From hart at pglaf.org  Fri Dec 21 20:16:07 2007
From: hart at pglaf.org (Michael Hart)
Date: Fri, 21 Dec 2007 20:16:07 -0800 (PST)
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
 frass?
In-Reply-To: <476C2ED0.5030406@verizon.net>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
	<476C2ED0.5030406@verizon.net>
Message-ID: <Pine.LNX.4.64.0712212004460.18344@pglaf.org>


Sorry, but Juliet wasn't there when I went to see Charles
in Las Vegas [where he lived, but since has moved] to see
about building foundations for Distributed Proofreaders.

I asked Charles on any number of occasions what he wanted
from both myself and Project Gutenberg and my silence was
at his request or I would have said much more.

I asked again and again, and cannot recall even on time a
request made by Charles was not honored.

The truth is that he wanted DP to appear to stand alone--
as much as possible--and Greg and I did as requested to a
point of keeping much quieter than at least _I_ would.

Personally, I am very proud of Distributed Proofreaders--
more than anyone but myself could possibly know, and I do
mention DP just about every chance I give a presentation.

However, I was asked NOT to say much in the Newsletter or
other similar media. . .so I didn't. . . .

THAT is how independent we allow the volunteers to be.

Something Mr. Noring may have just read in a reply PG CEO
Greg Newby just made to his remarks of today.

We are MORE than glad to help such project behind scenes,
or to give them maximum PR, or anywhere in between.

Thanks!!!

Michael S. Hart
Founder
Project Gutenberg

Recommended Books:

Dandelion Wine, by Ray Bradbury:  For The Right Brain
Atlas Shrugged, by Ayn Ran,:  For The Left Brain [or both]
Diamond Age, by Neal Stephenson:  To Understand The Internet
The Phantom Toobooth, by Norton Juster:  Lesson of Life. . .


On Fri, 21 Dec 2007, Juliet Sutherland wrote:

>
>
> Michael Hart wrote:
>> Distributed Proofreaders came about in exactly the manner _I_
>> have been describing, from WITHIN Project Gutenberg and quite
>> WITHOUT any need for such nasty commentaries.
>>
>> Distributed Proofreaders is a GREAT example of how volunteers
>> create their own standards, their own groups, etc., with lots
>> of help from "The Powers That Be."
>>
> Well, that's one version of history.
>
> Charles Franks (founder of DP) was a PG volunteer who decided to try to
> make a better way of proofreading books for PG. Aside from the usual
> "let lots of flowers bloom" statements from PG there was NO early
> support. Charles did all the coding, ran the software on his own server,
> etc. When I went looking for DP in April of 2002, having remembered that
> it had been mentioned on gutvol-d several years before (2000), I could
> find no mention of it at all on the PG website. From PG it sure looked
> like DP didn't exist. In the public interviews and other publicity that
> Michael Hart did for PG in 2002 and 2003, at least those that I was
> aware of, Michael never once mentioned DP.
>
> However, in the summer of 2002, PGLAF did provide a high-speed,
> destructive scanner setup for Charles Franks. And at the beginning of
> 2003 provided a second setup. The Internet Archive provided our next
> server, around Sept. 2002 or so. When we could no longer afford "free"
> service from IA, PGLAF bought DP a server and paid for hosting. So I'm
> not saying that PG has been unsupportive of DP. Once it was apparent
> that DP was thriving and would produce lots of material for PG, there
> was plenty of support. But for those first 2.5 years it was different.
> Today, most of the DP volunteers still arrive via the banner at PG and
> most are strongly supportive of PG and its mission.
>
> JulietS
> Distributed Proofreaders
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From hart at pglaf.org  Fri Dec 21 20:22:12 2007
From: hart at pglaf.org (Michael Hart)
Date: Fri, 21 Dec 2007 20:22:12 -0800 (PST)
Subject: [gutvol-d] obsessive-compulsive delightful
In-Reply-To: <114738108.20071221151709@noring.name>
References: <d15.18a55cb8.349d7533@aol.com>
	<114738108.20071221151709@noring.name>
Message-ID: <Pine.LNX.4.64.0712212017140.19218@pglaf.org>


On Fri, 21 Dec 2007, Jon Noring wrote:

> Bowerbird wrote:
>> Jon Norng wrote:
>
>> i am informed that jon noring said:
>
>>> ?? The term "anal-compulsive" is uncalled for.
>
>>  first of all, _i_ can decide for myself what words are 
>> "called for"
>>  to describe my positions. but hey jon, thanks for the 
>> feedback.
>
> Well, ultimately it is not up to you or me to decide whether 
> the disparaging use of the phrase "anal-compulsive" is 
> uncalled for. It is up to the rest of the readers here to 
> decide that for themselves.
>
> All I know from running a lot of lists over the years is that 
> those who oppose hostile-tone speech (which is what I believe 
> your speech amounted to) far outnumbers those who believe it 
> to be acceptable (and some believe it to even be useful.)
>
> Since those who run this list apparently believe it's alright 
> for people to express themselves in a quite hostile manner 
> (so long as they don't go way off the deep end), I don't plan 
> to email them and ask them to do something. I don't 
> administer or moderate this list, I'm simply a guest in their 
> house as we all are...
>
>
> Jon Noring


Jon Noring is as hostile as anyone on this list, still welcome, 
after all these years of hostility, mixed with other things.

Mr. Noring was responsible for the ONLY "moderation" of anyone,
ever, on this list, and yet is still welcome.

Greg Newby and I still hope he will take that energy and focus,
so we can assist him in something more productive.


Michael

From hart at pglaf.org  Fri Dec 21 20:29:57 2007
From: hart at pglaf.org (Michael Hart)
Date: Fri, 21 Dec 2007 20:29:57 -0800 (PST)
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
 frass?
In-Reply-To: <1103068917.20071221150109@noring.name>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
	<1103068917.20071221150109@noring.name>
Message-ID: <Pine.LNX.4.64.0712212022410.19218@pglaf.org>


I would LOVE it if Mr. Noring would make such an effort.

I would LOVE to help, and for PG to help.

I wouldn't want to be called on to lie about helping him.

In fact, after Juliet's remarks earlier today about early
support of Distributed Proofreaders being lacking, I must
state that feel I should be cautious in the future.

However, as far as I am concerned, and from what I read a
moment of from Dr. Newby, there is no question a project,
or projects would be Mr. Noring's, and he will be welcome
to the lion's share of the credit.

Again, I don't want to actually LIE and say I didn't help
in any way whatsoever, but, as with Juliet's example I am
willing to stand so far in the background that I might be
accused by those who come later, of not helping.

Michael


On Fri, 21 Dec 2007, Jon Noring wrote:

> [posted publicly to gutvol-d and cc: to Michael]
>
>
> Michael wrote:
>> Lee Passey wrote:
>
>>> (Any suggestions as to what other organizations meet these
>>> requirements would be welcome; Distributed Proofreaders is an
>>> obvious option).
>
>> Distributed Proofreaders came about in exactly the manner _I_
>> have been describing, from WITHIN Project Gutenberg and quite
>> WITHOUT any need for such nasty commentaries.
>>
>> Distributed Proofreaders is a GREAT example of how volunteers
>> create their own standards, their own groups, etc., with lots
>> of help from "The Powers That Be."
>
> Michael and Greg:
>
> Supposing someone were to:
>
> 1) set up an *online archive*, totally independent from PG, and
>   invited submissions of digital texts which meet a certain set of
>   guidelines the new archive has established, and
>
> 2) Personally contacted many of the current contributors of texts to
>   PG, asking them to contribute their texts to the new and
>   independent archive *in addition* to contributing them to PG, and
>
> 3) The new archive will NEVER ever mention PG nor mention that some of
>   the texts may also have been contributed to PG,
>
> Would you make a request to the volunteer contributors not to
> contribute to the new archive so long as they contribute to PG?
>
> [Note, it would be made TOTALLY clear to the volunteer contributors
> that the new archive has no ties to PG in any manner whatsoever, so
> don't worry about that issue in your reply to my question.]
>
>
> Jon Noring
>
>

From jon at noring.name  Fri Dec 21 22:06:35 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 21 Dec 2007 23:06:35 -0700
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
	frass?
In-Reply-To: <Pine.LNX.4.64.0712211909470.18344@pglaf.org>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
	<476C7097.9070106@novomail.net>
	<Pine.LNX.4.64.0712211909470.18344@pglaf.org>
Message-ID: <15110491966.20071221230635@noring.name>

Michael wrote in reply to Lee:

> There is one place we differ, I don't care much about markup
> and I apologize if that causes a rift between us or others--
> I hate to say it in at least one manner. . .but some friends
> I really like are VERY into markup and appearance, but I may
> be just "old school" enough NOT to want to judge a book by a
> collection of appearance variables. . .rather than content.

Well, the markup proponents here on gutvol-d, in general, fall into
two camps as Joshua has observed:

1) Use of markup in a presentational sense to emulate a particular
   look/feel in presentation, usually some sort of reproduction of
   the original.

2) Use of markup to specify the various document structures and
   important inline semantics.

#2 is what makes the text highly repurposeable when done right.

Even Bowerbird's ZML falls into #2. His rules for formatting the plain
text using white space characters is intended to communicate certain
document structures, and thus is classifiable as "markup camp #2".

I fall into camp #2 quite firmly, as everyone knows by now.


> What I see everywhere are comments on FORM, APPEARANCE, that
> stuff that goes down to JUDGING A BOOK BY ITS COVER.
>
> I'm just not that sort of person. . . .

Actually, Michael, in that regards you, Lee and I think much alike.
(And so does Bowerbird with his ZML where nearly all original
typography is stripped out, leaving only content and document
structure.)

Give me the content (structured is better) and then I can repurpose it
any way I want, including viewing it (or listening to it) formatted
the way *I want*.

And imagine being blind. What value does typography have for books
where typography itself is only there for visual presentational
purposes?

I've actually described a test to show the real importance of
typography to content comprehension. Read the book to your child, or
to a blind person. How often do you have to explain the typography
itself in order to communicate the content of the book? How often do
you say "oh, and this paragraph is in 12-pt Garamond"?  Or, "and this
paragraph is indented 1.0 em"? This is a very useful test for helping
make a number of decisions when massaging digital texts, such as
markup.

(Yes, one role of visual typography is to communicate structure by
understood conventions. But communicating structure is NOT content,
it describes what the content is, and that can be communicated in
other ways depending upon the milieu.)

Now I understand the love some people have for the typography of ye
olde books. I've been planning for a while the typography for the
limited print edition of the "1001 Arabian Nights" project I've
been working on for a number of years (I've been playing with
InDesign, for example.) So I understand the joy of visual typography,
and of beautifully crafted paper books. It's a lot of fun... Btw,
I probably will use Adobe Garamond for the font of this book. :^)

As a final note, and which Lee explained a while back quite
convincingly: when we have properly structured text, it is possible
to repurpose it into an exact or near-exact digital reproduction
of the original typography -- if one wants. And without a lot of
work.

But if one starts with highly-presentational markup, the text is NOT
very repurposeable without having to expend a lot of work. (E.g, the
HTML version of Burton's Arabian Nights in the PG collection is a
nightmare to repurpose and I gave up using it as my starting point.
Since then I've learned that the source used for the PG version is
not the original source, and it made significant amendments to the
original text, such as removal of non-Latin characters in footnotes!,
etc.)

Now it is possible to mix both presentational markup and structural
markup, and seemingly have it both ways. In the "master-derivative"
approach later described, embedding some original typography is
alright, but I tend to be ambivalent about it, mainly because I think
that those interested in a true reproduction are in the minority of
users, so I think about the added work for adding presentational
stuff during the mastering process. Enuf said for the moment...)


>> If I want to automatically extract the author and title from 
>> every work in my mythical corpus, and every one of them 
>> follows a different standard in identifying those two data 
>> points, it's impractical, if not impossible, to accomplish my 
>> desired task. Even if I /were/ able to write a thousand 
>> programs to match my thousand "standards" I would need some 
>> mechanism to know which text follows which standard so I know 
>> which program to apply to the particular file. In other 
>> words, I would need one, or some limited number, of 
>> meta-standards.

> Sorry, you lost me there.
>
> I'm just talking about reading the books.

Well, this is an important topic I think most everyone producing
e-books, even non-programmers, should understand what Lee is
saying -- and it is especially of importance to long-term digital
preservation of e-books.

But it is also a topic that to explain is a whole long message in
itself (I started writing up something and realized it was getting
very long very quickly, and not sure how to restate it in a brief
paragraph. Maybe Lee can recast what he said without using too
many paragraphs.)


> Even Bowerbird would prefer just one standard, though he will
> work with the simple standards mentioned above, and gives the
> credit for them where credit is due.

With regards to Bowerbird's ZML, I've probably been the #1 supporter
of his ZML (to his chagrin), since it is an attempt at regularizing
plain text. (Where we differ is that he envisions it being *the*
format for nearly all texts under the sun -- to be both master and
derivative, etc. -- and I see it as being insufficient for that
purpose.)

But if PG is to continue to include a plain text version for each
book where a plain text is even possible (and that's nearly all
books), I do think it a good idea if the plain text is regularized
in some fashion. And ZML is the only candidate out there at this
time. (There are problems with relying upon any text regularization
that Lee elaborated upon in the past, but I won't repeat it here.
But if the regularized plain text is intended just for direct reading
purposes, and not as a critical component in machine repurposing, I
don't have any problems.)


> Sadly to say, at least SOME of the people behind those WANT the
> standards THEY developed to ELIMINATE all other standards.
>
> I've asked them in person. . . .

Well, I hope you don't think I'm in that group.

I take a different tact. I'd like to see the "master-derivative"
approach taken, where the book is mastered in TEI (and properly done
per camp #2), then use that to auto-convert to a wide-range of
end-user derivative formats for reading purposes.

Certainly we could and should, as a matter of policy, create one or
more permanent derivative versions to sit side-by-side with the
master, such as regularized plain text. In fact, I've been intrigued
with the "raw text master" approach, something I may discuss here
another time.

Jon Noring


From ralf at ark.in-berlin.de  Fri Dec 21 23:45:14 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sat, 22 Dec 2007 08:45:14 +0100
Subject: [gutvol-d] TEI corpora with public collaboration
In-Reply-To: <476C0617.1000503@novomail.net>
References: <Pine.LNX.4.64.0712201333050.16611@pglaf.org>
	<476C0617.1000503@novomail.net>
Message-ID: <20071222074514.GA26939@ark.in-berlin.de>

Lee writes:
> If you are a volunteer who feels s/he could work better in an 
> organization that provides guidance and quality control, you should find 
> some other organization to work with. (Any suggestions as to what other 
> organizations meet these requirements would be welcome; 

There may probably even one that meets your standards, if one is
interested in Irish history, literature, or politics:

http://celt.ucc.ie

It combines proofreading with TEI SGML formatting.


ralf


From ralf at ark.in-berlin.de  Fri Dec 21 23:53:17 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sat, 22 Dec 2007 08:53:17 +0100
Subject: [gutvol-d] Why wait till we have to work from bookworm frass?
In-Reply-To: <Pine.LNX.4.64.0712211226310.9510@pglaf.org>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
	<20071219093416.GA29329@ark.in-berlin.de>
	<4769A7BA.1070102@novomail.net>
	<20071220160825.GA32015@ark.in-berlin.de>
	<476C1C89.7000809@novomail.net>
	<Pine.LNX.4.64.0712211226310.9510@pglaf.org>
Message-ID: <20071222075317.GB26939@ark.in-berlin.de>

Michael Hart wrote 
> There is a better record of the history of our eBooks than
> in any other eLibrary, once completed.

Not so. While I won't state that history info can't be
researched with PG, such info can be accessed much more
easily with Wikisource texts. They have other problems, however.


ralf


From Bowerbird at aol.com  Sat Dec 22 11:06:21 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 22 Dec 2007 14:06:21 EST
Subject: [gutvol-d] some things never change (winter solstice edition)
Message-ID: <ca4.2304016a.349eba2d@aol.com>

some things never change...

david rothman has returned to labeling the o.l.p.c. machine as
"the $100 laptop" in the headline on his latest blog entry on it
>    
http://www.teleread.org/blog/2007/12/21/needed-asap-on-the-100-laptop-fbreader-and-easy-opera-installation/
because, as he "explains" in the text, that is the price that he
"hopes" that it'll reach "some day".   some people never learn.

wayne vota, from the unofficial o.l.p.c. news site, said this recently:
>    A lot of OLPC's problems today date back to the original buzz 
>    about the "$100 laptop," in Vota's opinion. With the price point 
>    capturing attention, OLPC didn't speak to the concept of this 
>    being a revolution in education, Vota said. He contends that 
>    now that OLPC has successfully created interest in the technology, 
>    it should focus on empowering education with technology.

you can find that quote at:
>    http://www.pcworld.com/article/id,140698-c,notebooks/article.html

so it seems that vota agrees with me that david's price-spin is a bad idea.

***

meanwhile, jon noring is inventing a new need for a new format,
this one involving annotations.   find the write-up on the teleblawg:
>    
http://www.teleread.org/blog/2007/12/21/be-my-pal-call-for-annotationlinking-open-standard/

(the long u.r.l. titles are a cheap way to enhance david's google-juice.)

and no, of course we don't need a new format for this kind of thing.
we can use the existing infrastructure to make it happen quite easily,
if we ignore format junkies and have programmers make the tools...

meanwhile, the clumsy kids over at the teleblawg enact a funny skit
that specifically involves _annotations_ (and yes, that _is_ so ironic),
in that -- after robert nagel and jon noring did an "upgrade" to the
wordpress templates -- the blawg lost its comment-summary page.

these guys -- who are constantly telling us that an x.m.l./c.s.s. setup
will solve all our problems -- can't even make the _templates_ work!

and then they have the gall to tell us that this stuff is not complicated.

it's like a keystone-cops routine over there.

man, you would think that as much as these guys _hate_my_sarcasm_,
they wouldn't give me such ample opportunity to use it, wouldn't you?

well, after a time, they've finally got the comment-summary page back:
>    http://www.teleread.org/blog/wp-stats.php

but they still haven't gotten it to work correctly.   if you click on a link,
you are taken to the _page_ that contains the entry, but _not_ to the
specific _comment_ on that page.   and, as if that wasn't bad enough,
that has the terrible side-effect that once you have clicked on the link
to go to _one_ comment on a page, then all of the subsequent links
-- on new comments that are made -- appear as "previously-visited",
meaning you have to _manually_ track if you've viewed that comment
by going to view it again.   it's such a mess that i rarely view comments
any more after scanning the headlines, and the comments have been
the only worthwhile thing on the teleblawg for quite some time now.
(you already know what david is gonna blog, as he's said it all before;
it used to be fun to see just how he'd work in the same old point again
-- especially in the "openreader" epoch (you remember that, right?) --
but lately even _that_ particular tack has become boring. c'est la vie...)

so even though noring can't do the simple hacking that's necessary to
make comments -- which are "annotations to the original blog entry" --
work correctly on the teleblawg, he nonetheless wants to "assemble" the
"brain power" needed to foist a new "annotation standard" on all of us...

yeah, right...

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071222/4b6544d8/attachment-0001.htm 

From jon at noring.name  Sat Dec 22 11:43:10 2007
From: jon at noring.name (Jon Noring)
Date: Sat, 22 Dec 2007 12:43:10 -0700
Subject: [gutvol-d] some things never change (winter solstice edition)
In-Reply-To: <ca4.2304016a.349eba2d@aol.com>
References: <ca4.2304016a.349eba2d@aol.com>
Message-ID: <18610032348.20071222124310@noring.name>

Bowerbird wrote:

>  meanwhile, jon noring is inventing a new need for a new format,
>  this one involving annotations.?  find the write-up on the teleblawg:

>>?http://www.teleread.org/blog/2007/12/21/be-my-pal-call-for-annotationlinking-open-standard/

Thanks for cross-posting the URL here!


>  that specifically involves _annotations_ (and yes, that _is_ so ironic),
>  in that -- after robert nagel and jon noring did an "upgrade" to the
>  wordpress templates -- the blawg lost its comment-summary page.

I was not involved in any way with the "upgrade". The last time I
helped out with the TeleRead blog was with some CSS during the
template change a number of weeks ago. Otherwise I have no idea what
they are doing.


Again, thanks for posting the link to my annotation article to
gutvol-d.

Jon Noring


From richfield at telkomsa.net  Sat Dec 22 13:31:36 2007
From: richfield at telkomsa.net (Jon Richfield)
Date: Sat, 22 Dec 2007 23:31:36 +0200
Subject: [gutvol-d] December the how-manyth???  DUCK everybody!!!!
Message-ID: <476D8238.4050100@telkomsa.net>

Ruin!  Destruction! Run for your sanity!  Guys are being nice to each 
other!  In gutvol-d yet!  What to do?  What to DOOOOO???  Oh no! It is 
overtaking me too!  I feel it coming over me and I'll never enjoy Swift 
again!  I can't resist it!  Bowerbird?  Next you will be building ornate 
nests to pull the chicks, you traitor! 

Still, while the spirit is on me, you lot may be variously maniacal, 
backbiting, opinionated, logical... errr...  illogical... err... make 
that incomprehensible, people who come up with such nonsense that it 
sometimes seems alarmingly like sense, but somehow some good things 
happen around and among you.  Some of you have been very nice about my 
efforts to produce things, and PG has enabled me to download large 
volumes of materials sometimes very valuable to me.  (That is why I am 
grateful for the opportunity to contribute; one-way benefits bother me!) 

So, please do not get so teed off with each other as to eject dissenters 
or waste time on them.  Just carry on and go on working miracles (those 
of you that do, anyway.) 

As for MH, welll...  sainthood is not very profitable this side of the 
grave anyway, and sometimes I think that is a pity.  But sometimes I 
wonder how long his name will be inflicted on the high-school and 
sociology and IT historians of the future.

I won't say compliments of the season, because many of us probably take 
the season very lightly, but I might as well follow BB's dating:  
Compliments of 12/10 all round!

Cheers,

Jon

From davidrothman at pobox.com  Sat Dec 22 15:21:31 2007
From: davidrothman at pobox.com (David H. Rothman)
Date: Sat, 22 Dec 2007 18:21:31 -0500
Subject: [gutvol-d] Troll Bird's $350K ZML goal vs. OLPC and e-book
	standards [Re: some things...]
Message-ID: <5eff08fa0712221521s372e7ea8j55e2a56763b00be6@mail.gmail.com>

As the main perp of the TeleRead blog, I'm always amused by negative PR from
Bowerbird---considering his famous usefulness as a contrary indicator. It
takes a very special kind of humanitarian poet to attack people for sticking
up for the One Laptop Per Child Project.

To quote a joke from a poet friend of mine, "Poets make the best liars," and
the Bird's trollish post is rather misleading at best. BOTH Wayan and I have
vigorously criticized the business side of OLPC and other current aspects of
it, but long term, we're less excited over the immediate details than over
the attention that the project is focusing on needs and solutions for
developing countries, not to mention the awesome low-cost display
technology. Yes, it's great that the OLPC example has gotten others to come
up with their own solutions. Here's to diversity! But Wayan and I are still
cheering on OLPC while asking the necessary questions.  We can't wait for
our own XO laptops---and those of the recipient children in the Give One Get
One program---to arrive from OLPC. Besides, I doubt that Wayan would
challenge my belief that XO-style machines will eventually go for less than
$100. Remember when electronic calculators used to cost hundreds or
thousands of dollars? Today the basic models are given away to help move
magazine subscriptions. The attack on me for my long-term optimism on OLPC
is really part of Bowerbird's campaign to discredit critics of his ZML
standard. More on this later.

Meanwhile I'm growing nostalgic for the time when the Bird and some others
dissed the OLPC machine as mere vaporware. My vaporware will probably be
arriving in the next week or so--I won't have a heart attack if it's
longer--and along the way I'll be helping a child in a developing country
get his/her XO laptop from OLPC. Wayan, too, can't wait for his XO. We'll
enjoy near-E Ink screen quality in the reflective mode, and I wouldn't be
surprised if FBReader will eventually be ported over to the XO make it
possible for me to read books in the standard .epub format that Hachette and
other publishing giants will be creating, not to mention the existing .epub
public domain efforts of Feedbooks (time for PG to catch up, especially
since an open source validation project has been started for .epub?).

What's really charming is the way the humanitarian poet loves to beat up on
a linux-powered open-source machine whose creators want to spread around
free public domain text. Not the wisest or most consistent strategy for
someone associated with Project Gutenberg. Especially since the XO has a
better screen for e-reading than rivals do and still costs less.

Of course, as noted, the Bird's real goal isn't to harm the XO, but rather
to discredit Jon Noring, me and other advocates of .epub, a nonproprietary
e-book standard--while meanwhile the humanitarian poet is hoping that
someone will pay him $350,000 for "full rights" to HIS ZML efforts ("just
10% of what amazon paid for mobipocket, it's a fair price"). See a Bird
quote below. Of course, somewhat more than greed, the real driver here is
Birdish vanity.

As for the TeleRead blog, it's doing great despite little flaws that are
inevitable. Our traffic often exceeds that of libraryjournal.com. Robert
Nagle is busy with his latest gig, and I'm busy posting at TeleRead and
elsewhere; and if anyone wants to pitch in on wp-stats, we'll relish the
assistance. Unlike the humanitarian poet, we're not holding ourselves out as
real programmers. But like Jon Noring, we have a damned good idea of what we
want in an e-publishing standard, and unfortunately Bowerbird's ZML would be
a DISASTER for many kinds of publications. I don't see HarperCollins, Simon
& Schuster or other .epub supporters begging the Bird for ZML rights. ZML is
Kiddiesville. Laughable for advanced scientific publishing, for example.

This time I won't directly repeat TeleRead's URL even though Bowerbord was
kind enough to give an address with our domain. Instead, for genuinely
humanitarian PG folks, here's the address to visit to buy an XO for yourself
and a child in a developing country:

http://www.laptopgiving.org/en/index.php

Phone number for Give One Get One is 1-877-70-LAPTOP (1-877-705-2786), and
the deadline for orders is the end of the year.

A relevant AP story appears at:

http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story?ctrack=2&cset=true

Lead is, "Doubts about whether poor, rural children really can benefit from
quirky little computers evaporate as quickly as the morning dew in this
hilltop Andean village, where 50 primary school children got machines from
the One Laptop Per Child project six months ago."

If anything the AP story is much gentler on OLPC than Wayan and I have been,
but we still appreciate the immense good that the project has done. Perhaps
it's time for Bowerbird to shut his beak about OLPC and squawk instead about
Iraq. The United States is torturing and killing innocent people by the
thousands, but the humanitarian poet can find nothing better to do than to
try to beat up on me for observing the obvious---that the price of laptops
for kids will be coming down. Pathetic.

Looking beyond the Bird's latest message, it might be time for PG to examine
the real purpose of this list. Is it to further PG's goal of promoting
genuine mass literacy or to provide the humanitarian poet with a platform
for performance trolling? I'd love to see PG involved with .epub and
checking up on every line of code going into the validation tool--it IS
important to monitor the International Digital Publishing Forum and ask
skeptical questions. Pure .epub could be a real blessing for PG. But
Bowerbird doesn't want you to care about .epub or the XO project; ZML must
be the real show. And who cares if the Bird makes it impossible for list
participants to stand up for standards without getting flamed? ZML, vanity
and the $350K first!

David Rothman for TeleRead

=================================

From: Bowerbird at aol.com <Bowerbird at aol.com>
Date: Oct 22, 2007 11:44 PM
Subject: [gutvol-d] nice weekend
To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com

[...]

for a long time now, the price for anyone wanting to buy the rights to z.m.l
.
has been "six figures".  and i think it was a couple years back that i
raised it
to $200,000 minimum.  now, with fairly solid conversions to .html and .pdf,
an offline-standalone authoring tool, and a to-be-announced-quite-soon
web-based authoring tool, plus viewer-apps, i'll be raising the price
again...

as of november 1, 2007, the price for full rights to z.m.l. will be
$350,000.
since this is just 10% of what amazon paid for mobipocket, it's a fair
price...

preference will be given to buyers who will make the package open-source,
and such buyers can negotiate for a substantial discount, maybe up to 50%...

of course, you know, you could just figure it all out for yourself.  it's
simple...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071222/4726734e/attachment.htm 

From gbnewby at pglaf.org  Sat Dec 22 18:35:13 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat, 22 Dec 2007 18:35:13 -0800
Subject: [gutvol-d] Master format
In-Reply-To: <15110491966.20071221230635@noring.name>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
	<476C7097.9070106@novomail.net>
	<Pine.LNX.4.64.0712211909470.18344@pglaf.org>
	<15110491966.20071221230635@noring.name>
Message-ID: <20071223023513.GA3659@mail.pglaf.org>

On Fri, Dec 21, 2007 at 11:06:35PM -0700, Jon Noring wrote:
> ...
> I take a different tact. I'd like to see the "master-derivative"
> approach taken, where the book is mastered in TEI (and properly done
> per camp #2), then use that to auto-convert to a wide-range of
> end-user derivative formats for reading purposes.

That's been a long-term goal.  There are a lot of capable
TEI tools that were rolled out by a number of PG volunteers,
and we've posted a few.

While there a variety of reasons why this isn't the default
not for submissions from DP and other sources, I do expect
the transformation to a TEI master to take place.

(I think most of the people on gutvol-d already know this, but
just in case... also, it's an opportunity for anyone who cares to
to drill deeper into the state of the art and outstanding "to do"
items in this effort.  I don't have such a list.)

  -- Greg


From lee at novomail.net  Sat Dec 22 21:30:45 2007
From: lee at novomail.net (Lee Passey)
Date: Sat, 22 Dec 2007 22:30:45 -0700
Subject: [gutvol-d] !@!Re: Why wait till we have to work from bookworm
 frass?
In-Reply-To: <Pine.LNX.4.64.0712211909470.18344@pglaf.org>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>	<476C7097.9070106@novomail.net>
	<Pine.LNX.4.64.0712211909470.18344@pglaf.org>
Message-ID: <476DF285.8030503@novomail.net>

Michael Hart wrote:

[snip]

> Any particular credit line you would like?

If "Project Gutenberg" is going to go to the trouble to seek these 
things out and republish them, then I think "Project Gutenberg" should 
get the credit. I certainly /wouldn't/ want it implied that I did the 
work for or on behalf of PG.

[snip]

> What I see everywhere are comments on FORM, APPEARANCE, that
> stuff that goes down to JUDGING A BOOK BY ITS COVER.

At dinner last week, a contractor friend of mine mentioned that in 
interviewing two candidates for a job as a framing carpenter, all else 
being equal he would hire the one with a college degree, even in a 
totally unrelated field, because it indicated that s/he was someone who 
could follow instructions and carry through to the end.

One of my supervisors once pointed out to me that while clothes do not 
make the man, they do announce him.

When I see a text file that looks like the kind of automated OCR texts 
we see from OCA or Google I think to myself, "If they couldn't figure 
out how to put pages together, or make a file that word wraps (or if 
they weren't interested in taking the time to do so) what else have they 
missed? Am I going to get halfway through and discover a missing page? 
Will the scannos be so distracting that I won't be able to concentrate 
on the content? I'm not going to bother with this."

If PG e-texts /claim/ to be word perfect, but are uncomfortable to read 
due to their inability to word wrap, or unexplained _ * or # characters, 
I get much the same feeling. Am I petty to judge the quality of a work 
by the quality of its presentation? Maybe. But I do it anyway.

> I'm just not that sort of person. . . .

And I guess I am. I'm comfortable with that.

[snip]

> Again my apologies, those kinds of standards are just not the
> part of books that interest me.

Understandable. But why would you assume that they don't interest anyone 
else?

[snip]

>> Rather than pointing to examples of books which have managed 
>> to avoid the white-washer gauntlet, it would be more useful 
>> to explain the process used to avoid that gauntlet. I'm sure 
>> I'm not the only one interested in the answer to /that/ 
>> question.
> 
> The simple answer, as always, is just contect Newby or myself.

Wouldn't it just be easier to instruct the white-washers to not reject 
texts that aren't accompanied by markup-free, simple ASCII texts?

[snip]

> Sorry, you lost me there.
> 
> I'm just talking about reading the books.

And I'm taking about everything else.

[snip]

> I diagree, as do many programmers who use our eBooks.

Well, would you have some of them contact me because /I/ can't figure it 
out.

[snip]

> It's just that the standards are so simple, not that they were
> never published. . .and that we don't force them on volunteers.

Now /this/ I understand. My wife's always expecting me to read /her/ 
mind too. "But honey, it was so /obvious/!"

[snip]

> I think that eventually eBooks will settle into patterns quite
> much the way paper books did.

I agree. I would hope it would happen sooner rather than later, but I'm 
sure it will happen. Like Mr. Newby, I'm thinking the short-term 
standard for digital text masters will probably end up being TEI. The 
long term standard will probably be something that hasn't been invented 
yet, but there will certainly be an automated migration path from TEI to 
that new standard.

> Look at the early ones, all over the place, in size, paper and
> binding, fonts, inks, and everything else.
> 
> That's the way pioneers are.
> 
> Later on comes the pressure for everyone to be alike. . . .
> 
> And the pioneers either die out or move on.

And everything the pioneers did that can't be converted to the new 
system dies out with them.

[snip]

> Sadly to say, at least SOME of the people behind those WANT the
> standards THEY developed to ELIMINATE all other standards.
> 
> I've asked them in person. . . .

This sounds just so bizarre to me that I have a hard time relating to 
it. I suppose that sort of megalomania exists, if you say so, but I 
can't imagine anyone taking them seriously.

>> The reason the web and browsers succeeded is because Sir Tim 
>> Berners-Lee invented the HyperText Markup Language and the 
>> HyperText Transfer Protocol, and everyone agreed to use it. 
>> The reason e-books /haven't/ succeeded is because everyone 
>> insists on doing things their own way.
> 
> Actually, I think it was as much the invention of browsers,
> search engines, etc., that did it. . . .
> 
> It could have been ANY markup system. . .well not ANY, but,
> 
> MANY. . . .

Sorry, I just don't buy it. It /could/ have been any markup system (at 
least any markup system that met the requirements -- text reflow, 
alternate presentations, hyperlinking, etc.) but it could be only /one/. 
The victorious system may have been arbitrary, but it was victorious.

>>> "Just DO It!"
>> Sounds like good advice to me.
> 
> 
> I would be more than happy to assist you in doing it,
> if you would allow me. . . .

Thanks . . . but I think I'll try to muddle along on my own -- or with 
anyone else who would care to join me.

From Bowerbird at aol.com  Sat Dec 22 23:24:07 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 23 Dec 2007 02:24:07 EST
Subject: [gutvol-d] realism
Message-ID: <cc3.206da342.349f6717@aol.com>

gee, i thought i had david in my spam folder.   he will be from now on.
but hey, no reason for all of you not to enjoy his )inadvertent) humor.

for the record, i _love_ the o.l.p.c. project.   even ordered one myself,
first day i could (heck, i waited a _month_ before buying the iphone),
and will be contributing some software to the kids if i'm able to do so.
negroponte himself told me that he'd love to have some of my apps...

i know tech.   and i know you don't do it any favors by spinning vapor, 
and hype that can't be attained, since that leads to false expectations,
which only come to bite you in the butt and rob you of your credibility.

rothman raved about openreader for years...   literally years (no joke!)...
and where is it now?   on the scrapheap.   along with rothman's credibility.
and it didn't get there because of anything _i_ said.   it's his _own_ 
fault...

it's enough to make me chuckle.

-bowerbird

p.s.   look for the next update in my "some things never change" series
on the spring equinox...   it'll be the same story.   because (wait for 
it...)
some things never change...


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071223/805a6b18/attachment-0001.htm 

From ralf at ark.in-berlin.de  Sun Dec 23 00:05:33 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sun, 23 Dec 2007 09:05:33 +0100
Subject: [gutvol-d] Troll Bird's $350K ZML goal vs. OLPC and
	e-book	standards [Re: some things...]
In-Reply-To: <5eff08fa0712221521s372e7ea8j55e2a56763b00be6@mail.gmail.com>
References: <5eff08fa0712221521s372e7ea8j55e2a56763b00be6@mail.gmail.com>
Message-ID: <20071223080533.GA16701@ark.in-berlin.de>

> A relevant AP story appears at:
> 
> http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story?ctrack=2&cset=true

Thanks for the link but this here is registration-free:

http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story


ralf


From davidrothman at pobox.com  Sun Dec 23 02:47:04 2007
From: davidrothman at pobox.com (David H. Rothman)
Date: Sun, 23 Dec 2007 05:47:04 -0500
Subject: [gutvol-d] Troll Bird's $350K ZML goal vs. OLPC and e-book
	standards [Re: some things...]
In-Reply-To: <20071223080533.GA16701@ark.in-berlin.de>
References: <5eff08fa0712221521s372e7ea8j55e2a56763b00be6@mail.gmail.com>
	<20071223080533.GA16701@ark.in-berlin.de>
Message-ID: <5eff08fa0712230247n4e2934cew3a3249c08a273a07@mail.gmail.com>

Ralf: Thanks! Within the bounds of fair use, I'll repro a little more of the
AP piece below from the Chicago Tribune--just in case the Trib later yanks
the reg-free version. A very quick Google act doesn't seem to point me to
other full texts right now.

Bowerbird: It's great you see positives in the OLPC project. Too bad they
were lost in your eagerness to carry on your jihad against e-book standards,
as part of your effort to market your $350,000 project by trying to
discredit the skeptics. As for the OpenReader standard, we got co-opted---by
the IDPF, which, after years of  dragging its feed, finally woke up as a
result of our efforts, as Adobe's Bill McCoy acknowledged. Not the worst
tragedy. While I had serious issues with the implementation side of
OpenReader, I don't think we did too badly in the end. The big lesson I
learned is that publishers want to deal with the IDPF, so that's where I'm
focusing my efforts--while encouraging people to monitor the group's
standards initiative for purity. It really pains me to see
newbies--potential PG readers, kids included!--so confused by the Tower of
eBabel of 20+ warring e-book formats. And that's just part of the damage
from eBabel.

Well, enough. In honor of the holidays, during which Bird supposedly was
taking a break from his troll act (he's the one who started the round and
intends to continue his jihad next year), I'll stop after this paragraph.
May 2008 be the year when this list finally stands up against Bird-style
trolling! Can you imagine a potential PG funder tuning in? It's great to
question other people's statements to arrive at the truth while being civil
about it; but there's a difference between that and sustained personal
attacks that go on for years. While people can flame Troll Bird right back,
as they do in self-defense, it gets rather boring. It's a little like
burning up dry bamboo--not to mention the time stolen from the advancement
of PG's goals.

HH and thanks,

David

On Dec 23, 2007 3:05 AM, Ralf Stephan <ralf at ark.in-berlin.de> wrote:

> > A relevant AP story appears at:
> >
> >
> http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story?ctrack=2&cset=true
>
> Thanks for the link but this here is registration-free:
>
>
> http://www.chicagotribune.com/news/nationworld/chi-laptop_webdec22,1,6878223.story
>
>
========================

These offspring of peasant families whose monthly earnings rarely exceed the
cost of one of the $188 laptops ? people who can ill afford pencil and paper
much less books ? can't get enough of their "XO" laptops.

At breakfast, they're already powering up the combination
library/videocam/audio recorder/music maker/drawing kits. At night, they're
dozing off in front of them ? if they've managed to keep older siblings from
waylaying the coveted machines.

   "It's really the kind of conditions that we designed for," Walter Bender,
president of the Massachusetts Institute of Technology spinoff, said of this
agrarian backwater up a precarious dirt road.

Founded in 2005 by former MIT Media Lab director Nicholas Negroponte, the
One Laptop program has retreated from early boasts that developing-world
governments would snap up millions of the pint-sized laptops at $100 each.

In a backhanded tribute, One Laptop now faces homegrown competitors
everywhere from Brazil to India ? and a full-court press from Intel Corp.'s
more power-hungry Classmate.

But no competitor approaches the XO in innovation. It is hard drive-free,
runs on the Linux operating system and stretches wireless networks with
"mesh" technology that lets each computer in a village relay data to the
others.

Mass production began last month and Negroponte says he expects at
least 1.5million machines to be sold by next November. Even that would
be far less
than Negroponte originally envisioned. The higher-than-initially-advertised
price and a lack of the Windows operating system, still being tested for the
XO, have dissuaded many potential government buyers.

Peru made the single biggest order to date ? more than 272,000 machines ? in
its quest to turn around a primary education system that the World Economic
Forum recently ranked last among 131 countries.

[...]

"Some tell me that they don't want to be like their parents, working in the
fields," first-grade teacher Erica Velasco says of her pupils. She had just
sent them to the Internet to seek out photos of invertebrates ? animals
without backbones.

Antony, 12, wants to become an accountant.

Alex, 7, aspires to be a lawyer.

Kevin, 9, wants to play trumpet.

Saida, 10, is already a promising videographer, judging from her artful
recording of the town's recent Fiesta de la Virgen.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071223/7c797d54/attachment.htm 

From jon at noring.name  Sun Dec 23 07:53:47 2007
From: jon at noring.name (Jon Noring)
Date: Sun, 23 Dec 2007 08:53:47 -0700
Subject: [gutvol-d] Update on OpenReader...
In-Reply-To: <cc3.206da342.349f6717@aol.com>
References: <cc3.206da342.349f6717@aol.com>
Message-ID: <1116604583.20071223085347@noring.name>

Bowerbird wrote:

>  rothman raved about openreader for years...?  literally years (no joke!)...
>  and where is it now??  on the scrapheap.?  along with rothman's credibility.
>  and it didn't get there because of anything _i_ said.?  it's his _own_ fault...
>  
>  it's enough to make me chuckle.

Thanks for opening this opportunity to provide an update of OpenReader
to gutvol-d! I am always glad when Bowerbird brings up new topics we
can discuss that might be of interest to some in the gutvol community.

To start off this update, it's been said "I'd rather see a man try and
fail, then not try anything at all and fail by default."

For example, I believe ZML will fail for the purpose it is being
designed for, but despite that belief I admire Bowerbird for trying and
for his general persistence and chutzpah (some would call what he says
about ZML is hype, but he really believes in what he is doing so I call
it enthusiastic promotion.)

And Bowerbird's efforts will not go for naught -- good things will
come of them to benefit e-books, but what those benefits will be are
unknown at this time. We'll only know after a few years...

And note when ZML does not develop as Bowerbird hopes it will, I will
NOT point my finger at Bowerbird and say he lost credibility or he
failed. Rather, I will admire what he tried to do and said "job well
done!" and focus on the benefits which come out of that effort.

*****

Regarding OpenReader, the OR web page starkly says that OpenReader was a
rousing success and a complete, utter failure. And you know, I'm proud
of that failure, because *we tried*. Draft specs were written which
are still being mined by others for the ideas they contain (I spent
about two man-months of almost full time effort hammering down these
drafts, and I've had a lot of praise for their thoroughness, quality
and consistency.)

Also, untold hours were spent communicating with publishers (large and
small), technology developers, building the consortium, etc. We lobbied
hard, fought hard, and spec'd hard -- and we fought some power players
and I believe we came close to reaching critical mass despite it being
quite an uphill battle:

   http://www.openreader.org/

Bill McCoy at Adobe, in public, stated that OpenReader was a success
since it spurred IDPF to take action and do *exactly* what I lobbied it
should do for several years (more on this right below):

   http://www.teleread.org/blog/2007/01/05/openreader-victorious/

(Now OpenReader is not yet dead, but still has innovations I believe
the next version of OPS/EPub should incorporate. We'll see...)

I started OpenReader (with the help of David Rothman who brought in a
lot of energy -- and David came up with the name OpenReader) because I
was frustrated with IDPF not taking action to develop a universal
consumer e-book format per the requirements of the article I wrote back
in 2003:

   http://www.teleread.org/blog/2007/08/29/e-book-standards-article-redux-a-comparison-between-2003-dreams-and-2007-reality/

(Let's say I spent a few years trying to work in the "system" lobbying
for change -- finally I had to do something myself to force the issue.)

Much of IDPF's non-responsiveness was due to the power players in IDPF
not wanting this -- each had their own proprietary format solution they
wanted to dominate in the marketplace. But shortly after OpenReader was
announced, two major events occurred:

1) IDPF had a fairly major power realignment (e.g., Microsoft left)
   which left a vacuum, and

2) Adobe (the only remaining "power player" in IDPF) had a 180 degree
   turnaround (which surprised me), through the efforts of Bill McCoy,
   and decided there was a pressing need for a reflowable, open standards
   e-book format. (Up to then, Adobe believed PDF to be the solution.)

This led to IDPF making a fast 180 degree turn, and put renewed energy
into implementing the *exact* things my 2003 article recommended. And
they worked *very* fast -- too fast in my opinion, but nevertheless
ETI and Adobe drove the wagon very hard.

And I proudly contributed to the new "EPub" standard that resulted.
("EPub" is not yet an official name -- the standards that underlie
"EPub" are OPS, OPF, and OCF -- sorry for the acronym soup.)

Anyway, I'll let others decide whether or not OpenReader was a
success, or failure, or something else. Bill McCoy, who is a General
Manager of ePublishing at Adobe, has weighed in his thoughts: success.
Bowerbird said it was an utter failure and destroyed my credibility in
the eyes of the world (btw, does he speak for the world?) What are
your thoughts, dear reader?

Jon Noring


From Bowerbird at aol.com  Sun Dec 23 10:31:45 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 23 Dec 2007 13:31:45 EST
Subject: [gutvol-d] ample opportunities
Message-ID: <d01.1fb13682.34a00391@aol.com>

greg said:
>    I do expect the transformation to a TEI master to take place.

as i said earlier, it's nice of all you people to just _hand_me_
such ample opportunities to employ my infamous sarcasm...

t.e.i. has been "the official policy" of project gutenberg for years.
it had already been "the official policy" for a few years in _2003_,
when i first came to this listserve.   and now, 4 years later, it still is.
but yet, look for yourself how many e-texts are marked up in t.e.i.
(and then look at the "derivative formats", and laugh at this "future".)

moreover, since the library has grown 150% in those last 4 years, 
the "backlog" of e-texts needing markup has grown considerably.

the whitewashers can't even maintain the library in its current form;
look at how long it's taking to move to the "new" directory structure.
and you're saying that backlog will be marked up in t.e.i. soon?   ha!

but hey, bring on "the official policy".   the quicker the p.g. library
gets its t.e.i. combover, the quicker it will start to deteriorate from
the lack of maintenance which will inevitably follow, and therefore
the quicker which my z.m.l. mirror will supplant the t.e.i. mutant...

***

and over on the teleblawg, david rothman makes it easy to mock him
by talking out of every side of his mouth.   he's against d.r.m., but he's
_for_ "social" d.r.m., but he "recognizes" that publishers will "require"
some form of d.r.m., so "grudgingly" accepts it, but he demands a type
of d.r.m. that would be "interoperable", which he evidently believes is
_possible_, which means he must know more than steve jobs, who has
said (in his famous letter) that it's basically impossible, because once
the "secret sauce" has to be shared, you just can't keep it in the bottle,
but meanwhile rothman wants "open-source" solutions which _require_
that the "secret sauce" be not just _shared_, but openly available to all...

david is also busy being a lap-dog for the publishing houses other ways.
he acts like it's important that they be "on-board" for all "his" 
initiatives...
listen up, people.   the publishing houses are idiots.   and they're 
dinosaurs.
they're idiots because they are actually trying to _follow_the_footsteps_ of 
recording companies, who, as you know, are now waist-deep in quicksand.
that's right, they think that by ignoring digital distribution, it will "go 
away".
yeah, that's really smart.   meanwhile, you can get nearly any book you want
from "pirate" networks.   if you didn't know this, you probably haven't 
tried,
which is no surprise.   very few people even _want_ to read books anymore.
and if you take a good look at the "bestseller lists", it's very easy to see 
why,
because 8 out of the top 10 books (and 34 of the top 40) are pure garbage.
word up, it's _the_publishing_industry_ that ruined the publishing industry.
the recording industry has a teensy bit of success "blaming" p2p networks,
even though their demise has been largely their own fault.   but publishers?
they've got no one to blame but themselves.   no one even wants to _steal_
8/10ths of the garbage they put out for sale.

maybe you think some segments are still worthy.   like perhaps textbooks?
yeah, right, they've been gouging school districts and college students for
so long, and in such an obvious way, that they no longer get any respect...

academic journals?   even worse.   and libraries have started to _stand_up_
against them, and inform them in no uncertain terms they will crush them.

so who cares what format the publishing industry decides upon?   not me!

because the future of books involves artists going directly to their 
audience.
the publishing houses have been disintermediated, and that's a good thing.

i'm interested in a format that's simple enough that you don't have to hire a
"consultant" to negotiate the technoid obstacle-course to create an e-book.
a format that gives readers the text -- "out in the clear", as the saying 
goes,
a format that also gives them the ability to set options the way they want 
'em
-- and do that quickly and easily -- and remix text to their heart's content,
including repurposing it into any other format that they desire at any 
time...

that's what my format does.   that's why the authors of tomorrow will use it.

-bowerbird

p.s.   here's a little present for you, some interview segments with david 
byrne:
>    
http://www.wired.com/entertainment/music/magazine/16-01/ff_byrne?currentPage=all#s


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071223/88b1a3c9/attachment-0001.htm 

From gbnewby at pglaf.org  Sun Dec 23 23:21:58 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 23 Dec 2007 23:21:58 -0800
Subject: [gutvol-d] Whitewashers (was Re: !@!Re: Why wait till we have to
	work from bookworm frass?)
In-Reply-To: <476DF285.8030503@novomail.net>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
	<476C7097.9070106@novomail.net>
	<Pine.LNX.4.64.0712211909470.18344@pglaf.org>
	<476DF285.8030503@novomail.net>
Message-ID: <20071224072158.GB25293@mail.pglaf.org>

On Sat, Dec 22, 2007 at 10:30:45PM -0700, Lee Passey wrote:
> >> Rather than pointing to examples of books which have managed 
> >> to avoid the white-washer gauntlet, it would be more useful 
> >> to explain the process used to avoid that gauntlet. I'm sure 
> >> I'm not the only one interested in the answer to /that/ 
> >> question.
> > 
> > The simple answer, as always, is just contect Newby or myself.
> 
> Wouldn't it just be easier to instruct the white-washers to not reject 
> texts that aren't accompanied by markup-free, simple ASCII texts?

There seems to be a mysticism about the whitewashers.  Drop it.
They're volunteers, like pretty well everyone else who ever does
anything for PG.  

Those very few individuals do not "reject texts that aren't
accompanied by markup-free, simple ASCII texts."  Since Michael
already told you a way to get around any possible "gauntlet"
you might see before you, quit yer bitchin'.

Anyway, how would you know, Lee?  I can't find any record of you
submitting any etexts, or even a copyright clearance request,
or having ever communicated with the whitewashers via their 
mail list, or ever emailing me with a text you want to submit
or a question about one.  You're an outsider who is not only
casting criticism without construct, but also besmirching the
efforts of those who actually *do* the work.

I'm leaping to the WWers defense because they don't subscribe
to this list.  Meanwhile, I'll go back to ignoring your prattle.
 -- Greg

From lee at novomail.net  Mon Dec 24 08:44:10 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 24 Dec 2007 09:44:10 -0700
Subject: [gutvol-d] Whitewashers
In-Reply-To: <20071224072158.GB25293@mail.pglaf.org>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>	<476C7097.9070106@novomail.net>	<Pine.LNX.4.64.0712211909470.18344@pglaf.org>	<476DF285.8030503@novomail.net>
	<20071224072158.GB25293@mail.pglaf.org>
Message-ID: <476FE1DA.9030603@novomail.net>

On 8 Oct 2007 Ralf Stephan wrote:

> This may be a FAQ. Would PG accept files that are a superset of PGTEI
> and a subset of TEI? If so, which ending should the file have to not
> confuse it with a possible PGTEI file?

On 8 Oct 2007 I wrote:

> It should, by now, be well established that Mr. Hart and the PGPTB 
> are strongly opposed to the establishment of any file format as the 
> "preferred" format, regardless of its capabilities. If you look 
> carefully at the PG FAQ you will note that while an ASCII text 
> version is requested, it is not required. Thus, you should be able to
> submit a valid TEI file to PG, and no other format.

On 8 Oct 2007 joshua at hutchinson.net wrote:

> Well, yes and no.  The FAQ does not say it is required ... but none 
> of the whitewashers will post it without a text file.  You'd have to
> go through Greg Newby and get a special dispensation from on high.
> :) And there has to be a "read good reason" to not post a text
> version.

On 23 Dec 2007 Greg Newby wrote:

[snip]

> There seems to be a mysticism about the whitewashers.  Drop it. 
> They're volunteers, like pretty well everyone else who ever does 
> anything for PG.
> 
> Those very few individuals do not "reject texts that aren't 
> accompanied by markup-free, simple ASCII texts."  Since Michael 
> already told you a way to get around any possible "gauntlet" you 
> might see before you, quit yer bitchin'.

So we now have two conflicting answers to the question, one of which I 
would deem authoritative, the other of which I would deem speculative. 
Since no one had challenged Mr. Hutchinson's earlier answer, I was just 
trying to clear up any confusion on this matter.


From lopez2 at netscorp.net  Mon Dec 24 09:11:49 2007
From: lopez2 at netscorp.net (Kevin Edward Lopez)
Date: Mon, 24 Dec 2007 11:11:49 -0600 (CST)
Subject: [gutvol-d] ample opportunities
In-Reply-To: <d01.1fb13682.34a00391@aol.com>
References: <d01.1fb13682.34a00391@aol.com>
Message-ID: <3184.216.150.45.176.1198516309.squirrel@216.150.45.176>

what kind of opportuities?

Ed


> greg said:
>>    I do expect the transformation to a TEI master to take place.
>
> as i said earlier, it's nice of all you people to just _hand_me_
> such ample opportunities to employ my infamous sarcasm...
>
> t.e.i. has been "the official policy" of project gutenberg for years.
> it had already been "the official policy" for a few years in _2003_,
> when i first came to this listserve.   and now, 4 years later, it still
> is.
> but yet, look for yourself how many e-texts are marked up in t.e.i.
> (and then look at the "derivative formats", and laugh at this "future".)
>
> moreover, since the library has grown 150% in those last 4 years,
> the "backlog" of e-texts needing markup has grown considerably.
>
> the whitewashers can't even maintain the library in its current form;
> look at how long it's taking to move to the "new" directory structure.
> and you're saying that backlog will be marked up in t.e.i. soon?   ha!
>
> but hey, bring on "the official policy".   the quicker the p.g. library
> gets its t.e.i. combover, the quicker it will start to deteriorate from
> the lack of maintenance which will inevitably follow, and therefore
> the quicker which my z.m.l. mirror will supplant the t.e.i. mutant...
>
> ***
>
> and over on the teleblawg, david rothman makes it easy to mock him
> by talking out of every side of his mouth.   he's against d.r.m., but he's
> _for_ "social" d.r.m., but he "recognizes" that publishers will "require"
> some form of d.r.m., so "grudgingly" accepts it, but he demands a type
> of d.r.m. that would be "interoperable", which he evidently believes is
> _possible_, which means he must know more than steve jobs, who has
> said (in his famous letter) that it's basically impossible, because once
> the "secret sauce" has to be shared, you just can't keep it in the bottle,
> but meanwhile rothman wants "open-source" solutions which _require_
> that the "secret sauce" be not just _shared_, but openly available to
> all...
>
> david is also busy being a lap-dog for the publishing houses other ways.
> he acts like it's important that they be "on-board" for all "his"
> initiatives...
> listen up, people.   the publishing houses are idiots.   and they're
> dinosaurs.
> they're idiots because they are actually trying to _follow_the_footsteps_
> of
> recording companies, who, as you know, are now waist-deep in quicksand.
> that's right, they think that by ignoring digital distribution, it will
> "go
> away".
> yeah, that's really smart.   meanwhile, you can get nearly any book you
> want
> from "pirate" networks.   if you didn't know this, you probably haven't
> tried,
> which is no surprise.   very few people even _want_ to read books anymore.
> and if you take a good look at the "bestseller lists", it's very easy to
> see
> why,
> because 8 out of the top 10 books (and 34 of the top 40) are pure garbage.
> word up, it's _the_publishing_industry_ that ruined the publishing
> industry.
> the recording industry has a teensy bit of success "blaming" p2p networks,
> even though their demise has been largely their own fault.   but
> publishers?
> they've got no one to blame but themselves.   no one even wants to _steal_
> 8/10ths of the garbage they put out for sale.
>
> maybe you think some segments are still worthy.   like perhaps textbooks?
> yeah, right, they've been gouging school districts and college students
> for
> so long, and in such an obvious way, that they no longer get any
> respect...
>
> academic journals?   even worse.   and libraries have started to
> _stand_up_
> against them, and inform them in no uncertain terms they will crush them.
>
> so who cares what format the publishing industry decides upon?   not me!
>
> because the future of books involves artists going directly to their
> audience.
> the publishing houses have been disintermediated, and that's a good thing.
>
> i'm interested in a format that's simple enough that you don't have to
> hire a
> "consultant" to negotiate the technoid obstacle-course to create an
> e-book.
> a format that gives readers the text -- "out in the clear", as the saying
> goes,
> a format that also gives them the ability to set options the way they want
> 'em
> -- and do that quickly and easily -- and remix text to their heart's
> content,
> including repurposing it into any other format that they desire at any
> time...
>
> that's what my format does.   that's why the authors of tomorrow will use
> it.
>
> -bowerbird
>
> p.s.   here's a little present for you, some interview segments with david
> byrne:
>>
> http://www.wired.com/entertainment/music/magazine/16-01/ff_byrne?currentPage=all#s
>
>
>
> **************************************
> See AOL's top rated recipes
> (http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From lopez2 at netscorp.net  Mon Dec 24 10:32:44 2007
From: lopez2 at netscorp.net (Kevin Edward Jordan)
Date: Mon, 24 Dec 2007 12:32:44 -0600 (CST)
Subject: [gutvol-d] ample opportunities
In-Reply-To: <d01.1fb13682.34a00391@aol.com>
References: <d01.1fb13682.34a00391@aol.com>
Message-ID: <3951.216.150.45.189.1198521164.squirrel@216.150.45.189>


what kind of opportunities ?


> greg said:
>>    I do expect the transformation to a TEI master to take place.
>
> as i said earlier, it's nice of all you people to just _hand_me_
> such ample opportunities to employ my infamous sarcasm...
>
> t.e.i. has been "the official policy" of project gutenberg for years.
> it had already been "the official policy" for a few years in _2003_,
> when i first came to this listserve.   and now, 4 years later, it still
> is.
> but yet, look for yourself how many e-texts are marked up in t.e.i.
> (and then look at the "derivative formats", and laugh at this "future".)
>
> moreover, since the library has grown 150% in those last 4 years,
> the "backlog" of e-texts needing markup has grown considerably.
>
> the whitewashers can't even maintain the library in its current form;
> look at how long it's taking to move to the "new" directory structure.
> and you're saying that backlog will be marked up in t.e.i. soon?   ha!
>
> but hey, bring on "the official policy".   the quicker the p.g. library
> gets its t.e.i. combover, the quicker it will start to deteriorate from
> the lack of maintenance which will inevitably follow, and therefore
> the quicker which my z.m.l. mirror will supplant the t.e.i. mutant...
>
> ***
>
> and over on the teleblawg, david rothman makes it easy to mock him
> by talking out of every side of his mouth.   he's against d.r.m., but he's
> _for_ "social" d.r.m., but he "recognizes" that publishers will "require"
> some form of d.r.m., so "grudgingly" accepts it, but he demands a type
> of d.r.m. that would be "interoperable", which he evidently believes is
> _possible_, which means he must know more than steve jobs, who has
> said (in his famous letter) that it's basically impossible, because once
> the "secret sauce" has to be shared, you just can't keep it in the bottle,
> but meanwhile rothman wants "open-source" solutions which _require_
> that the "secret sauce" be not just _shared_, but openly available to
> all...
>
> david is also busy being a lap-dog for the publishing houses other ways.
> he acts like it's important that they be "on-board" for all "his"
> initiatives...
> listen up, people.   the publishing houses are idiots.   and they're
> dinosaurs.
> they're idiots because they are actually trying to _follow_the_footsteps_
> of
> recording companies, who, as you know, are now waist-deep in quicksand.
> that's right, they think that by ignoring digital distribution, it will
> "go
> away".
> yeah, that's really smart.   meanwhile, you can get nearly any book you
> want
> from "pirate" networks.   if you didn't know this, you probably haven't
> tried,
> which is no surprise.   very few people even _want_ to read books anymore.
> and if you take a good look at the "bestseller lists", it's very easy to
> see
> why,
> because 8 out of the top 10 books (and 34 of the top 40) are pure garbage.
> word up, it's _the_publishing_industry_ that ruined the publishing
> industry.
> the recording industry has a teensy bit of success "blaming" p2p networks,
> even though their demise has been largely their own fault.   but
> publishers?
> they've got no one to blame but themselves.   no one even wants to _steal_
> 8/10ths of the garbage they put out for sale.
>
> maybe you think some segments are still worthy.   like perhaps textbooks?
> yeah, right, they've been gouging school districts and college students
> for
> so long, and in such an obvious way, that they no longer get any
> respect...
>
> academic journals?   even worse.   and libraries have started to
> _stand_up_
> against them, and inform them in no uncertain terms they will crush them.
>
> so who cares what format the publishing industry decides upon?   not me!
>
> because the future of books involves artists going directly to their
> audience.
> the publishing houses have been disintermediated, and that's a good thing.
>
> i'm interested in a format that's simple enough that you don't have to
> hire a
> "consultant" to negotiate the technoid obstacle-course to create an
> e-book.
> a format that gives readers the text -- "out in the clear", as the saying
> goes,
> a format that also gives them the ability to set options the way they want
> 'em
> -- and do that quickly and easily -- and remix text to their heart's
> content,
> including repurposing it into any other format that they desire at any
> time...
>
> that's what my format does.   that's why the authors of tomorrow will use
> it.
>
> -bowerbird
>
> p.s.   here's a little present for you, some interview segments with david
> byrne:
>>
> http://www.wired.com/entertainment/music/magazine/16-01/ff_byrne?currentPage=all#s
>
>
>
> **************************************
> See AOL's top rated recipes
> (http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From lee at novomail.net  Mon Dec 24 10:34:30 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 24 Dec 2007 11:34:30 -0700
Subject: [gutvol-d] ample opportunities
In-Reply-To: <d01.1fb13682.34a00391@aol.com>
References: <d01.1fb13682.34a00391@aol.com>
Message-ID: <476FFBB6.7090800@novomail.net>

Bowerbird at aol.com wrote:

> greg said:
>
>>    I do expect the transformation to a TEI master to take place.
> 
> as i said earlier, it's nice of all you people to just _hand_me_
> such ample opportunities to employ my infamous sarcasm...
> 
> t.e.i. has been "the official policy" of project gutenberg for years.
> it had already been "the official policy" for a few years in _2003_,
> when i first came to this listserve.   and now, 4 years later, it still is.
> but yet, look for yourself how many e-texts are marked up in t.e.i.
> (and then look at the "derivative formats", and laugh at this "future".)

I'm sorry, didn't you get the memo? Project Gutenberg has no "official 
policies," except as it involves copyright clearances, and certainly 
none that involve file formatting or markup. While TEI files are 
permitted, they are not required, nor even encouraged beyond the generic 
"give us whatever you want." And as near as I can tell, TEI files are 
not even preferred over, say, z.m.l.

Mr. Hart has noted earlier that, "Whatever standards emerge from the 
real world are just fine." I seem to recall you saying something to the 
effect that most of the files in the PG corpus already conform to 
z.m.l., so if you are correct it would appear that z.m.l. is as close to 
an "official policy" as we're going to get.

So hop on it, and get us that tool that will distinguish valid z.m.l. 
from invalid z.m.l. so we can start making a list of that which does, 
and does not, conform to the "official policy."

From hart at pglaf.org  Mon Dec 24 10:40:15 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 24 Dec 2007 10:40:15 -0800 (PST)
Subject: [gutvol-d] Whitewashers (was Re: !@!Re: Why wait till we have
 to work from bookworm frass?)
In-Reply-To: <20071224072158.GB25293@mail.pglaf.org>
References: <Pine.LNX.4.64.0712211233330.9510@pglaf.org>
	<476C7097.9070106@novomail.net>
	<Pine.LNX.4.64.0712211909470.18344@pglaf.org>
	<476DF285.8030503@novomail.net>
	<20071224072158.GB25293@mail.pglaf.org>
Message-ID: <Pine.LNX.4.64.0712241030360.3896@pglaf.org>


This is the opposite of the way things usually are here.

Usually _I_ am the one to respond so strongly to persons
who are so obviously trying to push the buttons of PG, &
Greg is the one who then responds more clamly later.

Yes it is obvious that Mr. Passey is pushing PG buttons.

So what?

If Mr. Passey really has a goal in mind, it will become,
eventually, pretty clear to us all, even if that goal is
merely to muddy the PG waters and waste our time.

However, I would like to think better of Mr. Passey, and
of ourselves, and hope this will all lead us to stronger
positions in the future, as we have seen from the others
who have taken similar positions in the past.

As I have so often said, everyone can run PG better than
I/we do, which is why we run it so very little.

When it comes to eBooks, to each his own, even Passey or
Noring or anyone else. . . .

Thanks!!!

Michael S. Hart
Founder
Project Gutenberg

Recommended Books:

Dandelion Wine, by Ray Bradbury:  For The Right Brain
Atlas Shrugged, by Ayn Ran,:  For The Left Brain [or both]
Diamond Age, by Neal Stephenson:  To Understand The Internet
The Phantom Toobooth, by Norton Juster:  Lesson of Life. . .


On Sun, 23 Dec 2007, Greg Newby wrote:

> On Sat, Dec 22, 2007 at 10:30:45PM -0700, Lee Passey wrote:
>>>> Rather than pointing to examples of books which have managed
>>>> to avoid the white-washer gauntlet, it would be more useful
>>>> to explain the process used to avoid that gauntlet. I'm sure
>>>> I'm not the only one interested in the answer to /that/
>>>> question.
>>>
>>> The simple answer, as always, is just contect Newby or myself.
>>
>> Wouldn't it just be easier to instruct the white-washers to not reject
>> texts that aren't accompanied by markup-free, simple ASCII texts?
>
> There seems to be a mysticism about the whitewashers.  Drop it.
> They're volunteers, like pretty well everyone else who ever does
> anything for PG.
>
> Those very few individuals do not "reject texts that aren't
> accompanied by markup-free, simple ASCII texts."  Since Michael
> already told you a way to get around any possible "gauntlet"
> you might see before you, quit yer bitchin'.
>
> Anyway, how would you know, Lee?  I can't find any record of you
> submitting any etexts, or even a copyright clearance request,
> or having ever communicated with the whitewashers via their
> mail list, or ever emailing me with a text you want to submit
> or a question about one.  You're an outsider who is not only
> casting criticism without construct, but also besmirching the
> efforts of those who actually *do* the work.
>
> I'm leaping to the WWers defense because they don't subscribe
> to this list.  Meanwhile, I'll go back to ignoring your prattle.
> -- Greg
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From Bowerbird at aol.com  Mon Dec 24 12:10:01 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 24 Dec 2007 15:10:01 EST
Subject: [gutvol-d] ample opportunities
Message-ID: <c1f.271c2d6b.34a16c19@aol.com>

well, an opportunity to mock t.e.i. as a needlessly complicated form of 
markup,
which hasn't been implemented even though it's been "official policy" for 
years,
for instance...

or the opportunity to mock rothman's mushy-mouthed proclamations where
he tries to honor the win-friends-and-influence-people doctrine which says
you should always give your opponents some wiggle-room to "come around",
which essentially means then that you cannot take an unequivocal stand, even
against something that's purely evil, like putting locks on our cultural 
heritage,
or trying to turn e-books into cash-registers that ka-ching on every 
page-turn.

it's important to mock such things.   humor -- even the sarcasm form, which
allows your antagonists to _spin_ your criticism as being "mean-spirited" --
is one of the best ways of pointing out failures of rhetoric when they occur,
so it's extremely kind of my sparring partners to spout their various 
silliness,
because it gives me _ample_opportunities_ for humor.   i mean, otherwise,
i might have to get _serious_, and what fun is that?                 :+)

people like to laugh.   so it's good when my opponents say ridiculous 
things...

***

heck, it's even funny when my _friends_ say ridiculous things, like michael 
here:
>   If Mr. Passey really has a goal in mind, it will become,
>    eventually, pretty clear to us all, even if that goal is
>    merely to muddy the PG waters and waste our time.

for whom is this still unclear?            :+)

and if it hasn't become clear to you _yet_, then what -- pray tell --
would mr. passey have to do in order to _make_ it "become clear"?
(this is a serious question.   i mean, it might be funny, but it's serious 
too.)

michael continues:
>    However, I would like to think better of Mr. Passey, 
>    and of ourselves, and hope this will all lead us to 
>    stronger positions in the future, as we have seen from 
>    the others who have taken similar positions in the past.

it appears michael is filled with the holiday spirit.   or perhaps
he has merely been drinking too many of those holiday spirits.

whatever the case, it's certainly a nice change of pace...

and i, too, am filled with _joy_ today.   because santa claus
-- cleverly disguised as a fed-ex delivery person, sneaky! --
just brought an o.l.p.c. machine to my front door, oh yes...

my girlfriend, bless her heart, is gift-wrapping it for me,
and i'm just like a little kid looking forward to opening it!

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071224/e988b1f0/attachment.htm 

From gbnewby at pglaf.org  Mon Dec 24 14:05:54 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 24 Dec 2007 14:05:54 -0800
Subject: [gutvol-d] Collection development policy (Re: Why wait till we have
	to work from bookworm frass?)
In-Reply-To: <659760181.20071221173505@noring.name>
References: <Pine.LNX.4.64.0712201333050.16611@pglaf.org>
	<476C0617.1000503@novomail.net>
	<191898394.20071221144402@noring.name>
	<20071221225917.GB14265@mail.pglaf.org>
	<659760181.20071221173505@noring.name>
Message-ID: <20071224220554.GA8269@mail.pglaf.org>

Apologies to those who get annoyed by the changing subject
line, but I find it easier to track the different asynchronous
discussion themes.

On Fri, Dec 21, 2007 at 05:35:05PM -0700, Jon Noring wrote:
>.
> I would like to understand PG's "official collection development
> policy." If this is spelled out at the PG site (a Google search turned
> up nothing using that phrase), a link to it would be appreciated. I
> have an idea what it is, but since a collection development policy is
> clearly an organizational policy, the official policy has to originate
> from PGLAF.

(For those who don't know, "collection development" is a term from
librarianship...you can even take full graduate level courses in it!  A
google search turned up lots of examples.)

There is none.  There won't be one any time soon.  When I started
writing one, two years ago, I discovered that the various "FAQ" items
that Michael and I wrote obviate the need for a separate collection
development policy.


I'll paste in FAQ #0 below, from
http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Mission_Statement_by_Michael_Hart
Some of the other essays in the set (under "About Us" from
the main page at www.gutenberg.org) reinforce the idea that
PG is quite open about what materials are added.

There are some things we choose not to add to the collection...  and
sometimes those simply go to other collections (such as
preprints.readingroo.ms or gutenberg.us).  Sometimes that's due to
format, or not wanting to be a vanity press, or a few other reasons that
are scattered in the HOWTO section of www.gutenberg.org.  Assembling
those reasons together would serve to itemize the few things we
generally don't add to the collection...but the better "collection
development policy" is below:


The mission of Project Gutenberg is simple:

    To encourage the creation and distribution of eBooks.

This mission is, as much as possible, to encourage all those who are
interested in making eBooks and helping to give them away.

In fact, Project Gutenberg approves about 99% of all requests from those
who would like to make our eBooks and give them away, within their
various local copyright limitiations.

Project Gutenberg is powered by ideas, ideals, and by idealism.

Project Gutenberg is not powered by financial or political power.

Therefore Project Gutenberg is powered totally by volunteers.

Because we are totally powered by volunteers we are hesitant to be very
bossy about what our volunteers should do, or how to do it.

We offer as many freedoms to our volunteers as possible, in choices of
what books to do, what formats to do them in, or any other ideas they
may have concerning "the creation and distribution of eBooks."

Project Gutenberg is not in the business of establishing standards. If
we were, we would have gladly accepted the request to convert an
exemplary portion of our eBooks into HTML when World Wide Web was a
brand new idea in 1993; we are happy to bring eBooks to our readers in
as many formats as our volunteers wish to make.

In addition, we do not provide standards of accuracy above those as
recommended by institutions such as the U.S. Library of Congress at the
level of 99.95%.

While most of our eBooks exceed these standards and are presented in the
most common formats, this is not a requirement; people are still
encouraged to send us eBooks in any format and at any accuracy level and
we will ask for volunteers to convert them to other formats, and to
incrementally correct errors as times goes on.

Many of our most popular eBooks started out with huge error levels--only
later did they come to the more polished levels seen today. In fact,
many of our eBooks were done totally without any supervision--by people
who had never heard of Project Gutenberg--and only sent to us after the
fact.

We want to continue to encourage everyone to send us eBooks, even if
they have already created some without any knowledge of who we were,
what we were doing, or how we were doing it.

Everyone is welcome to contribute to Project Gutenberg.

Thus, there are no dues, no membership requirements: and still only the
most general guidelines to making eBooks for Project Gutenberg.

We want to provide as many eBooks in as many formats as possible for the
entire world to read in as many languages as possible.

Thus, we are continually seeking new volunteers, whether to make one
single favorite book available or to make one new language available or
to help us with book after book.

Everyone is welcome here at Project Gutenberg.

Everyone is free to do their own eBooks their own way.

Written by Michael S. Hart June 20, 2004. Updated October 23, 2004.

From gbnewby at pglaf.org  Mon Dec 24 14:17:52 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 24 Dec 2007 14:17:52 -0800
Subject: [gutvol-d] PGLAF metadata
In-Reply-To: <20071221081236.GA782@ark.in-berlin.de>
References: <cf8.2278c23b.3495952b@aol.com>
	<01165190.20071215134141@noring.name>
	<476579C4.9000200@novomail.net>
	<20071217104548.GB7788@ark.in-berlin.de>
	<4766B6A9.5080400@novomail.net>
	<20071219093416.GA29329@ark.in-berlin.de>
	<4769A7BA.1070102@novomail.net>
	<20071220114033.GA31581@ark.in-berlin.de>
	<20071220232748.GC20405@mail.pglaf.org>
	<20071221081236.GA782@ark.in-berlin.de>
Message-ID: <20071224221752.GF8269@mail.pglaf.org>

On Fri, Dec 21, 2007 at 09:12:36AM +0100, Ralf Stephan wrote:
> > What official PGLAF metadata do you want to access?  If
> > you're just looking for copyright clearance info that identifies
> > print volumes, David Price's list is a good place to start:
> > 
> >   http://www.dprice48.freeserve.co.uk/GutIP.html
> 
> I'm sorry to say that the info does not identify print volumes.
> Especially the well known books have several editions. So, what's
> missing is
> 
> - original publishing place
> - original publishing year

We keep that info in the copyright clearance system (it's part of a
clearance request).  We do not redistribute it with eBooks or put it in
our catalog, because we're not creating analogs to particular print
editions, and don't claim that our eBooks match the particular print
source(s) they were derived from.

> Let's say we don't need the publisher because it's highly unlikely
> different editions have the same place and year. No one would need
> this info if we could access the cleared title pages, however, from
> the etext page, for example.
> 
> So, is it possible to access place/year for a work? If not, is it
> possible to get at the title scan?

There is no public front-end to the scans submitted to the copyright
clearance system.  However, I can provide them on request.  Ditto with
the other metadata submitted at clearance time.

But instead, why not just contact the eBook's producer?  The
credit line in virtually every eBook tells you who it came from.
PG can help you get in touch with a producer, if needed.

Today, many eBook producers choose to put the text from the title page
etc. in their eBooks.  Check there first.

Also, we are more frequently posting the page scans along with
eBooks.  Producers are encouraged to submit the scans, but most
have not.  Eventually, we anticipate most items from DP will have
their page scans uploaded.

To summarize, there are MANY ways of finding out more about print
sources used for particular eBooks.

We do not try to track such info in the PG online catalog or
in the eBooks themselves.
  -- Greg

PS: If someone has the skills & inclination to create a system that
provides public access to the copyright clearance metadata and scans,
linking that to released eBooks, I'd be willing to help.  We should
redirect such conversation to gutvol-p .  It's a lot harder than it
looks, due to widely varying quality & consistency in the
metadata...also due to changes to how the metadata are listed between
the cleared item, the eBook itself, and the PG catalog.  Accuracy in
such metadata is probably not achievable solely through automation.

From hart at pglaf.org  Tue Dec 25 17:39:25 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 25 Dec 2007 17:39:25 -0800 (PST)
Subject: [gutvol-d] ample opportunities
In-Reply-To: <c1f.271c2d6b.34a16c19@aol.com>
References: <c1f.271c2d6b.34a16c19@aol.com>
Message-ID: <Pine.LNX.4.64.0712251733100.1198@pglaf.org>


On Mon, 24 Dec 2007, Bowerbird at aol.com wrote:

> well, an opportunity to mock t.e.i. as a needlessly complicated form of 
> markup, which hasn't been implemented even though it's been "official 
> policy" for years, for instance...


This is not really in the Christmas spirit, nor have I been
imbibing the other kinds of spirits, but I should add at an
opportune moment that the TEI founders I spoke with down at
Oak Ridge, Tennessee [once the nations' largest electricity
consumption per building], actually SAID, right out loud in
my face that the goal of TEI was to eliminate plain text.

This was back in the early 1990's as I recall, and may have
begun the incipient warfare between markup and plain text.

I have never been against markup for those who want it from
a personal perspective, but WAY too much perspective in the
whole markup evolution has been profit and power.

It's one thing to have a preference.

It's another thing to want that preference applied to all!

mh


>
> or the opportunity to mock rothman's mushy-mouthed proclamations where
> he tries to honor the win-friends-and-influence-people doctrine which says
> you should always give your opponents some wiggle-room to "come around",
> which essentially means then that you cannot take an unequivocal stand, even
> against something that's purely evil, like putting locks on our cultural
> heritage,
> or trying to turn e-books into cash-registers that ka-ching on every
> page-turn.
>
> it's important to mock such things.   humor -- even the sarcasm form, which
> allows your antagonists to _spin_ your criticism as being "mean-spirited" --
> is one of the best ways of pointing out failures of rhetoric when they occur,
> so it's extremely kind of my sparring partners to spout their various
> silliness,
> because it gives me _ample_opportunities_ for humor.   i mean, otherwise,
> i might have to get _serious_, and what fun is that?                 :+)
>
> people like to laugh.   so it's good when my opponents say ridiculous
> things...
>
> ***
>
> heck, it's even funny when my _friends_ say ridiculous things, like michael
> here:
>>   If Mr. Passey really has a goal in mind, it will become,
>>    eventually, pretty clear to us all, even if that goal is
>>    merely to muddy the PG waters and waste our time.
>
> for whom is this still unclear?            :+)
>
> and if it hasn't become clear to you _yet_, then what -- pray tell --
> would mr. passey have to do in order to _make_ it "become clear"?
> (this is a serious question.   i mean, it might be funny, but it's serious
> too.)
>
> michael continues:
>>    However, I would like to think better of Mr. Passey,
>>    and of ourselves, and hope this will all lead us to
>>    stronger positions in the future, as we have seen from
>>    the others who have taken similar positions in the past.
>
> it appears michael is filled with the holiday spirit.   or perhaps
> he has merely been drinking too many of those holiday spirits.
>
> whatever the case, it's certainly a nice change of pace...
>
> and i, too, am filled with _joy_ today.   because santa claus
> -- cleverly disguised as a fed-ex delivery person, sneaky! --
> just brought an o.l.p.c. machine to my front door, oh yes...
>
> my girlfriend, bless her heart, is gift-wrapping it for me,
> and i'm just like a little kid looking forward to opening it!
>
> -bowerbird
>
>
>
> **************************************
> See AOL's top rated recipes
> (http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
>

From Bowerbird at aol.com  Thu Dec 27 11:52:37 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 27 Dec 2007 14:52:37 EST
Subject: [gutvol-d] ample opportunities
Message-ID: <c0a.26ae7dde.34a55c85@aol.com>

michael said:
>    This is not really in the Christmas spirit, nor have I been
>    imbibing the other kinds of spirits, but I should add at an
>    opportune moment that the TEI founders I spoke with down 
>    at Oak Ridge, Tennessee [once the nations' largest electricity
>    consumption per building], actually SAID, right out loud in
>    my face that the goal of TEI was to eliminate plain text.

well, i'm sure they meant it in the nicest way.   really.   i'm totally 
serious.
you know, most successful competitor, nothing personal -- all that rot.

it's just too bad they lost all of their credibility in that little poker 
game...


>    This was back in the early 1990's as I recall, and may have
>    begun the incipient warfare between markup and plain text.

nah, i don't believe that...   simplicity and complexity have been warring
ever since human beings were granted their first touch of intelligence...

and actually, had i not been stirring the hornet's nest of heavy-markup
on a regular basis here, the topic would've come up quite infrequently...

plus, aside from the merry-go-rounders repeating their same old lines,
none of the heavy-markup people even bother to defend any turf here,
because they cannot dispute the inexorable move away from the model.

it's quite clear, to anyone with eyes, that light-markup is our future...

***

further, the _next_ evolutionary jump has come rather quickly upon us.

i hinted at it years ago, in some listserve messages over on bookpeople.
and a year ago, on christmas day, i unveiled solid research over on d.p.
(they didn't use the information, so i'm now re-gifting it to p.g. proper.)

to sum it in a sentence, o.c.r. from the large-scale digitization initiatives
is approaching a rather impressive accuracy, especially when combined
with post-o.c.r. book-wide correction routines, including _comparison_
with pre-existing digitizations (among them, many of the p.g. e-texts)...

our _new_ world is one where we don't need the "proofing" of our past.

***

i'm currently analyzing "moby dick", as digitized by the o.c.a., thank you.

the o.c.r. is -- on the whole -- very good.   it's not anywhere _near_ the
many-9's figures that some projects like to _claim_ as their "standard".

but, then again, very little of the actual _text_ in those projects _meets_
the standard they claim.   the university of michigan, for example, _says_
that it requires 99.95% accuracy.   but i defy you to find _any_ texts there
matching that level.   while i can point to _hundreds_ that fail to meet 
it...

so i'm not going to claim big numbers on accuracy.   not from raw o.c.r.

but at the same time, i can vouch that most of the words on most pages
are correct.   and of the things that are incorrect, many can be fixed with
_automatic_ post-o.c.r. cleanup routines.   i will give solid examples here
in the coming days, so don't bother presenting some "theoretical" counter
(i.e., examples you pull out of your butt).   and once you compare that text
with a pre-existing digitization -- like, say an e-text in project gutenberg,
which is what i'm using right now -- the list of things to look at gets 
small.
there's _certainly_ no need to eye every word, when so many of 'em match.

so -- at the end of the day -- the accuracy that you obtain is _excellent_.
it starts very good, and then escalates, rather quickly, toward perfection...

just to give you an idea, using the data that i laid out in detail over at 
d.p.
in a thread last christmas, out of a book that contained some 5700 lines,
5500+ of the lines were correctly recognized by the o.c.r. from the o.c.a.,
and the remaining 200 lines being isolated for review, via a comparison
with two other digitizations (one from google, the other a p.g. e-text)...

in other words, it wasn't just that i obtained 96.5% accuracy on the text;
it's that i had info pointing _exactly_ to the 3.5% that i needed to check...
obviously, if you only have to check 4% of a book, it goes _much_ faster.

and this is what _i_ know _i_ can do, based on the knowledge _i_ have...
i'm sure that other people have other knowledge that could jack accuracy
even higher, until we're getting phenomenal results with very little effort.

what does this mean?   and why is it of relevance to the markup question?

well, the first thing that it means is that you will be able to grab a 
scan-set
from the o.c.a. or google, and o.c.r. it, and obtain extremely accurate text.
(and this assumes that someone didn't beat you to the punch, and post it.)

more to the point, _anyone_ can do that.   on _any_ scan-set_   at _any_ 
time.
so there'll be no compelling need to "digitize" books "en masse" any more.
it can be done as a one-off, by anyone, at any time.   "digitize on demand".
and if we _do_ continue to do it "en masse", that project will go very 
fast...

distributed proofreaders -- and efforts like it -- will become unnecessary.

and that becomes relevant to the markup question because the best time
to apply heavy-markup _would_have_been_ during a digitization process,
at least when that process was being enacted by _knowledgeable_ people.

for instance, over at d.p. they've broken down the workflow into two parts,
proofing and formatting.   the proofing rounds get the _characters_ right,
and the formatting rounds "do the formatting"; that is, they apply markup.

plainly put, it is the formatting volunteers who would do the heavy-markup.

but if the entirety of distributed proofreaders is tossed out as 
"unnecessary"
-- because the o.c.r. and post-o.c.r. cleanup gets all the characters right 
--
then there won't be any formatting rounds, and markup will not get applied...

so really, it's only been in this particular timeframe -- let's say 2000-2010 
--
where heavy-markup had a "window of opportunity" for meaningful uptake...
in this "window", we were "manually" correcting scans, using human eyeballs.
and as long as they were doing _that_, they could've applied heavy-markup...

but by 2010, a mere button-click will have a scan-set digitized 
automatically,
with no humans involved, and therefore no one who could apply any markup.

so heavy-markup doesn't stand a chance any more.
it's never had a cost-benefit ratio that was any good,
but as long as you had _volunteers_ paying the costs
(or, at the very least, a _hope_ that they would do so)
you could afford to ignore the poor cost-benefit ratio.

but with no suckers left to apply heavy-markup for you,
that means you'll have to _pay_ to have it done, and that
means you'll have to have a _budget_, up-front no less,
and that complicates your implementation immensely...

heavy-markup is doomed.   i won't even bother arguing
against it in 2008, because it just ain't worth the time...

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071227/1ffa0212/attachment.htm 

From lee at novomail.net  Fri Dec 28 11:55:43 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 28 Dec 2007 12:55:43 -0700
Subject: [gutvol-d] Preservation of line endings
Message-ID: <477554BF.6000508@novomail.net>

There has been some discussion here in the past about whether or not it 
is important to preserve line endings when OCRing new texts. Personally, 
I'm ambivalent, but I recognize that others have strong feelings on the 
subject.

When using ABBYY FineReader it is possible to ask for line endings to be 
preserved when selecting HTML output (and really, there is no reason to 
make any other selection). When doing so, a <br> tag is output 
everywhere a line ends in the source image.

ABBYY is quite good at recognizing when line-ending hyphens are a result 
of line wrapping (soft hyphenation) as opposed to being part of a 
compound word (hard hyphenation). Unfortunately, when selecting the 
"keep line breaks" option in FR, the recognition of soft hyphens is lost.

In order to preserve my cake and eat it to, I have written a program 
which compares two otherwise identical output files from ABBYY (one with 
"Keep line breaks" selected and one with "Keep line breaks" unselected) 
and merges the two, resulting in a file which preserves line breaks but 
which flags all hard hyphens with the extra notation '<br 
class="hardhyphen">' when line-ending hyphenation exists in the output 
file where "Keep line breaks" was unselected.

As usual, if anyone wants a copy of this (Win32, console) program, with 
accompanying source code, contact me back channel.

From joshua at hutchinson.net  Fri Dec 28 12:18:03 2007
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri, 28 Dec 2007 20:18:03 +0000 (GMT)
Subject: [gutvol-d] Preservation of line endings
Message-ID: <783243558.231031198873083729.JavaMail.mail@webmail01>

An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071228/468a0199/attachment.htm 

From cannona at fireantproductions.com  Fri Dec 28 12:52:01 2007
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri, 28 Dec 2007 14:52:01 -0600
Subject: [gutvol-d] bitlet and the CD/DVD images
Message-ID: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Hi All.

I added a link to the PG Wiki on the page
http://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project .  (Actually,
to be precise, three links.)  These links should allow one to download the
CD and DVD images via BitTorrent without having to install a BitTorrent
client.

Anyway, if anyone wants to test them out, I would be interested to hear how
it goes.

Thanks!

Aaron


- --
Skype: cannona
MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFHdWIcI7J99hVZuJcRA+cCAKCgSLMUOaVayMgOjy/AJuZRouXBJACdFSnI
oBhE8Xr9GhfdQSwj/6FyeMs=
=hLnZ
-----END PGP SIGNATURE-----

From lee at novomail.net  Fri Dec 28 14:00:43 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 28 Dec 2007 15:00:43 -0700
Subject: [gutvol-d] Preservation of line endings
In-Reply-To: <783243558.231031198873083729.JavaMail.mail@webmail01>
References: <783243558.231031198873083729.JavaMail.mail@webmail01>
Message-ID: <4775720B.5070307@novomail.net>

Joshua Hutchinson wrote:

> Congrats, Lee.  You've successfully reinvented the wheel.  ;)

Perhaps, but this one's a steel-belted radial, not 4-ply polyester.

> DP created such a utility (which uses TEXT instead of HTML output) many moons 
> ago (WinPrep, I believe is the name).

And there's the rub. Anything that requires that I discard significant 
markup to use is worthless to me. My program merges the two files 
together in such a way that I can subsequently create a file where the 
soft hyphens are displayed or not, according to style sheet settings, 
/without losing any of the other markup that ABBYY has provided/.

It seems to me that the current DP process for creating e-texts is to 
save all OCR output as simple text, discarding whatever markup the OCR 
engine is able to provide. The text is then checked for accuracy, and 
finally the markup is laboriously re-applied by hand. I'm trying to 
develop some tools that will allow me to carry the markup forward 
throughout the entire process in order to simplify the final result.

From donovan at abs.net  Fri Dec 28 15:21:06 2007
From: donovan at abs.net (D Garcia)
Date: Fri, 28 Dec 2007 18:21:06 -0500
Subject: [gutvol-d] Preservation of line endings
In-Reply-To: <4775720B.5070307@novomail.net>
References: <783243558.231031198873083729.JavaMail.mail@webmail01>
	<4775720B.5070307@novomail.net>
Message-ID: <200712281821.06799.donovan@abs.net>

On Friday 28 December 2007 17:00, Lee Passey wrote:
> Joshua Hutchinson wrote:
> > DP created such a utility (which uses TEXT instead of HTML output) many
> > moons ago (WinPrep, I believe is the name).
>
> And there's the rub. Anything that requires that I discard significant
> markup to use is worthless to me. My program merges the two files
> together in such a way that I can subsequently create a file where the
> soft hyphens are displayed or not, according to style sheet settings,
> /without losing any of the other markup that ABBYY has provided/.

I believe it actually uses RTF for its input files, so no markup is actually 
discarded unless you have changed settings in FineReader.

> It seems to me that the current DP process for creating e-texts is to
> save all OCR output as simple text, discarding whatever markup the OCR
> engine is able to provide. The text is then checked for accuracy, and
> finally the markup is laboriously re-applied by hand. I'm trying to
> develop some tools that will allow me to carry the markup forward
> throughout the entire process in order to simplify the final result.

The problem is that the markup output by FineReader is frequently wrong (false 
bold and false italic being the most common, along with false superscript 
letters/numbers in place of quotation marks) and nearly as frequently 
mispositioned around punctuation, depending on the initial quality of the 
printing and the scans.  For those cases, there's really no point in carrying 
that markup over. Adding the markup back to a clean text at least insures 
that you haven't any wrong markup, although you may miss some.

It's usually more laborious to have to remove and/or correct the incorrect 
markup, and there's a higher risk of accidentally deleting surrounding text.

D

From Bowerbird at aol.com  Fri Dec 28 16:25:49 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 28 Dec 2007 19:25:49 EST
Subject: [gutvol-d] let's get this over with
Message-ID: <c4c.26bd53e7.34a6ee0d@aol.com>

ok, let's dispense of this during 2007, so we 
won't have to sully 2008 with such nonsense.

i've noticed, from some bits of their messages
that were quoted by other people recently, that
the merry-go-rounders have tried to imply that
there are some arenas where i agree with them...

while it might be true that there are a few specific
issues where it can appear we agree on a position,
i can assure you that this is nothing more than that
proverbial stopped clock that's "correct" twice a day.

the merry-go-rounders do not _love_ project gutenberg.
i do.   that's a huge difference of opinion right at the start.

i mostly want project gutenberg to follow its _own_ "rules",
the ones it has laid out in its guidelines for its contributors
(e.g., 4 blank lines before a heading and 2 blank lines after).
the merry-go-rounders want p.g. to switch to heavy markup,
a workflow that is immensely more complicated and difficult,
one that basically stands on its head a longtime p.g. doctrine,
the very doctrine that has made p.g. the premiere cyberlibrary.

i believe that project gutenberg's dedicated focus on _readers_,
and not "scholars", gives it the most effective cost-benefit ratio.
the merry-go-rounders want to increase p.g. digitization costs
considerably, for a range of benefits that's completely unproven.
(academics will _never_ cotton to something done by volunteers,
since that would be a virtual denial of their "professional" status.)

i am comfortable with the p.g. policy that allows an e-text to be
created from multiple sources.   the merry-go-rounders hate it...

furthermore, i like the fact that p.g. uses contemporary standards,
e.g., removing the hyphen from olde-tyme words like "to-day" and
closing up the spacey punctuation , which is common in old books.
the merry-go-rounders would like to impose a cumbersome system
where changes like these would be laboriously "annotated" as such...

i'm _proud_ of michael hart, and i cherish the wisdom he has shown.
the merry-go-rounders try to paint michael as some kind of buffoon,
and imply that project gutenberg would have been better without him.
(as if project gutenberg would've even _existed_ without michael hart!)

the merry-go-rounders might want you to believe that i am like them.
but i can assure you that i am not.   nope, i've gone to extreme lengths
to explain, perfectly clearly, why i do not want to be confused as them.

in a nutshell, i am _disgusted_ by the merry-go-rounders.

yes, it's a strong word.   but it's _the_ word that describes my feelings,
and describes them _accurately_, so i'm not going to "sugar-coat" it...

are there things about project gutenberg that i'd like to see changed?
you betcha.   and i'm gonna continue telling you what those things are.

on some of these things -- such as "preservation of line-endings" --
the merry-go-rounders might even agree with me, at least according
to a subject-header that i see in my spam-folder at this very minute...
(i'm not going to read it to confirm that, because based on experience
spanning a decade, reading the merry-go-rounders is a waste of time.)

but make no mistake, this superficial kind of "agreement" means nada.
i'm disgusted by the merry-go-rounders, on the most _fundamental_
of levels, way down deep in the gut at the philosophical cornerstone...

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071228/4a234f08/attachment.htm 

From joshua at hutchinson.net  Sat Dec 29 05:41:53 2007
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat, 29 Dec 2007 13:41:53 +0000 (GMT)
Subject: [gutvol-d] Preservation of line endings
Message-ID: <1071916590.313441198935713858.JavaMail.mail@webmail01>

An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071229/669a1824/attachment.htm 

From walter.van.holst at xs4all.nl  Sat Dec 29 09:15:04 2007
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Sat, 29 Dec 2007 18:15:04 +0100
Subject: [gutvol-d] bitlet and the CD/DVD images
In-Reply-To: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox>
References: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox>
Message-ID: <47768098.4060608@xs4all.nl>

Aaron Cannon wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
> 
> Hi All.
> 
> I added a link to the PG Wiki on the page
> http://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project .  (Actually,
> to be precise, three links.)  These links should allow one to download the
> CD and DVD images via BitTorrent without having to install a BitTorrent
> client.
> 
> Anyway, if anyone wants to test them out, I would be interested to hear how
> it goes.

A truly splendid idea. It prompts one on how to save it and I happen to 
know that the DVD might be a .iso file. However, other users may not 
know this.

A related question, is there a chance of this bitlet client being 
available to be embedded in pages in stead of getting redirected to 
bitlet.org?

Regards,

  Walter

From walter.van.holst at xs4all.nl  Sat Dec 29 09:17:51 2007
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Sat, 29 Dec 2007 18:17:51 +0100
Subject: [gutvol-d] bitlet and the CD/DVD images
In-Reply-To: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox>
References: <4113C7CC6DF0474AA76DB2C130EE499A@blackbox>
Message-ID: <4776813F.5070908@xs4all.nl>

Aaron Cannon wrote:

> Anyway, if anyone wants to test them out, I would be interested to hear how
> it goes.

Slight correction, it is a zip file of course, but the client 
nonetheless prompts the user for a filename.

Regards,

  Walter

From ricardofdiogo at gmail.com  Sat Dec 29 12:23:26 2007
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Sat, 29 Dec 2007 20:23:26 +0000
Subject: [gutvol-d] RIP Robert Marquardt
Message-ID: <9c6138c50712291223q221ce242kd531969adf3819c7@mail.gmail.com>

Robert Marquardt, PG's wiki sysop, died this morning. His work must
remain unfinished and questions to him will remain unanswered. This
was his last will.
This info was added by his brother Rolf Marquardt today at PG's wiki.
Robert joined Project Gutenberg in December 2006 to create the Science
Fiction Bookshelf. He has completed the Project Gutenberg Science
Fiction CD, which was a tremendously huge success. He also made a
promotional video for PG in Esperanto. He was now working in other
bookshelves.
Robert always worked very hard at PG, even while he was doing his hard
cancer treatments.
I'm sure that these are sad news to all volunteers. PGLAF should
perhaps take some symbolic condolences action.

Ricardo

From julio.reis at tintazul.com.pt  Sun Dec 30 17:06:45 2007
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis)
Date: Mon, 31 Dec 2007 01:06:45 +0000
Subject: [gutvol-d] RIP Robert Marquardt
In-Reply-To: <mailman.2.1199044802.26318.gutvol-d@lists.pglaf.org>
References: <mailman.2.1199044802.26318.gutvol-d@lists.pglaf.org>
Message-ID: <1199063205.6607.31.camel@abetarda>

> RIP Robert Marquardt

May all his friends and loved ones also find peace in this time of
grief.

J?lio aka Tintazul.


From richfield at telkomsa.net  Sun Dec 30 22:26:45 2007
From: richfield at telkomsa.net (Jon Richfield)
Date: Mon, 31 Dec 2007 08:26:45 +0200
Subject: [gutvol-d] Robert Marquardt:  A thought from Piet Hein
Message-ID: <47788BA5.8030308@telkomsa.net>

Those of you who knew the Grooks of the late (and though I am far from 
being Danish, I affirm: great) Piet Hein, may remember this one:

Giving in is no defeat
Passing on is no retreat
Selves were made to rise above
You shall live in what you love.

In gratitude to Robert and all those who labour for no greater reward 
than the satisfaction of lighting candles, whether they curse the dark 
or not, I suggest that this Grook encapsulates a concept that might help 
support such people while they work, and offer some comfort their 
friends, families, and beneficiaries after they have gone.

Go well,

Jon


From ricardofdiogo at gmail.com  Mon Dec 31 15:46:26 2007
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Mon, 31 Dec 2007 23:46:26 +0000
Subject: [gutvol-d] Etext #24073 copyrighted?
Message-ID: <9c6138c50712311546h1f439984w3c81c542a3ae8dc6@mail.gmail.com>

Does the TP&V of etext #24073 states it is pre-1923? By reading the
etext I can't see any evidence of that.
1654 is _NOT_ the edition date. It's just the year when the speech was
first said.

Ricardo

From gbnewby at pglaf.org  Mon Dec 31 17:40:26 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 31 Dec 2007 17:40:26 -0800
Subject: [gutvol-d] Etext #24073 copyrighted?
In-Reply-To: <9c6138c50712311546h1f439984w3c81c542a3ae8dc6@mail.gmail.com>
References: <9c6138c50712311546h1f439984w3c81c542a3ae8dc6@mail.gmail.com>
Message-ID: <20080101014025.GA18049@mail.pglaf.org>

On Mon, Dec 31, 2007 at 11:46:26PM +0000, Ricardo F Diogo wrote:
> Does the TP&V of etext #24073 states it is pre-1923? By reading the
> etext I can't see any evidence of that.
> 1654 is _NOT_ the edition date. It's just the year when the speech was
> first said.
> 
> Ricardo

Hi, Ricardo.  I didn't see a posted note for this, so am cc'ing
the pgww list.  The file does not state that it's copyrighted, and
I don't see why it would be.

  http://www.gutenberg.org/files/2/4/0/7/24073/

  -- Greg

From ricardofdiogo at gmail.com  Mon Dec 31 17:49:54 2007
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Tue, 1 Jan 2008 01:49:54 +0000
Subject: [gutvol-d] Etext #24073 copyrighted?
In-Reply-To: <20080101014025.GA18049@mail.pglaf.org>
References: <9c6138c50712311546h1f439984w3c81c542a3ae8dc6@mail.gmail.com>
	<20080101014025.GA18049@mail.pglaf.org>
Message-ID: <9c6138c50712311749h432119fcs13543af93bad5fc4@mail.gmail.com>

According to the page images, this IS NOT _ definitively_ a pre-1923
edition (I know it because of the spelling).
It is possible, however, that this _may be_ an official compilation of
Priest Vieira's sermons, in which case it _could_ have been released
into the public domain by the Brazilian insitution that published it.
If  the physical book was an ordinary edition, I'm affraid it's not in
public domain in the US under the pre-1923 rule.

Ricardo

2008/1/1, Greg Newby <gbnewby at pglaf.org>:
> On Mon, Dec 31, 2007 at 11:46:26PM +0000, Ricardo F Diogo wrote:
> > Does the TP&V of etext #24073 states it is pre-1923? By reading the
> > etext I can't see any evidence of that.
> > 1654 is _NOT_ the edition date. It's just the year when the speech was
> > first said.
> >
> > Ricardo
>
> Hi, Ricardo.  I didn't see a posted note for this, so am cc'ing
> the pgww list.  The file does not state that it's copyrighted, and
> I don't see why it would be.
>
>   http://www.gutenberg.org/files/2/4/0/7/24073/
>
>   -- Greg
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>