From jmdyck at ibiblio.org Mon Nov 1 00:16:45 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Mon Nov 1 00:17:06 2004 Subject: [gutvol-d] The release of PG etext #7000 Message-ID: <4185F0ED.D061AF37@ibiblio.org> For those of you who wish to celebrate the release of PG etext #7000 ("The Kalevala"), today would appear to be the day: http://www.gutenberg.org/etext/7000 -Michael Dyck From Bowerbird at aol.com Mon Nov 1 09:34:56 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Nov 1 09:35:09 2004 Subject: [gutvol-d] linking page-scans to the text Message-ID: <8b.18fe1d2c.2eb7cdc0@aol.com> greg said: > We're planning to include the scanned page images > along with eBooks. In fact, this is part of the intent > with the new directory structure for the PG servers > (the /1/0/8/0/... structure). > > We haven't done any (or many, anyway) because > we're still trying to figure out how to best name the page files, > and how to link them on a page-by-page basis into the > (marked up?) eBooks. Jim Tinsley drafted some general guidelines > for the image files themselves, but linking them to the eBooks > is something we need to figure out still. > > (BTW, the Million Books project at archive.org uses djvu > for this purpose. It's not bad, but I like our intended solution > of XML markup much better. Plus, of course, the MBP is mostly > working with relatively poor quality proofreading. For PG, > the text has taken the main emphasis, not the appearance.) > > My notion is that the PGTEI and TEI lite solutions I've been > reading about in this list will be easily adaptable to > including links to specific page image files, so I've > not mentioned it until now. sometimes i feel like i'm talking to a wall... 
greg, i can give you this capability _right_now_, with your plain-text files (i.e., the whole library), if you would only make it your policy to: (1) include page-break information in the files, and (2) use a sensible and consistent naming standard; neither of these is difficult to realize in the slightest. (if you need some input on them, i'll be happy to give it.) if you'd like to see a demo program that does this -- using the page-scans and text-files over at d.p. -- say so publicly (before thursday) and i'll put one up. or continue delaying, it makes no difference to me... -bowerbird From traverso at dm.unipi.it Mon Nov 1 10:53:45 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Mon Nov 1 10:53:54 2004 Subject: [gutvol-d] (no subject) Message-ID: <200411011853.iA1IrjMD018781@posso.dm.unipi.it> Bowerbird> greg said: >> We're planning to include the scanned page images along with >> eBooks. In fact, this is part of the intent with the new >> directory structure for the PG servers (the /1/0/8/0/... >> structure). >> >> We haven't done any (or many, anyway) because we're still >> trying to figure out how to best name the page files, and how >> to link them on a page-by-page basis into the (marked up?) >> eBooks. Jim Tinsley drafted some general guidelines for the >> image files themselves, but linking them to the eBooks is >> something we need to figure out still. >> Bowerbird> greg, i can give you this capability _right_now_, with Bowerbird> your plain-text files (i.e., the whole library), if you Bowerbird> would only make it your policy to: (1) include Bowerbird> page-break information in the files, How do you include the information in the files if it has been removed? This can at best be valid for future production. And moreover, how do you find the correct page when some material (e.g. the footnotes) has been moved, and the page contents are no longer consecutive? 
I have a solution to both problems for DP-produced books, using the files output by DP before the post-processing stage; these files correspond to individual pages of the original book, and you can find the image corresponding to a fragment of text through a grep on the DP file. The concept has been implemented recently by a student, and a test of 300 recently posted PG ebooks should be publicly available before the end of this week. This is part of a system for ebook maintenance (a user can submit a proposed correction to a text through a web page, after consulting the original images, and an administrator can later accept or reject the proposals and automatically obtain a corrected version). Carlo From hmacdougall at stny.rr.com Mon Nov 1 11:05:42 2004 From: hmacdougall at stny.rr.com (Hugh MacDougall) Date: Mon Nov 1 11:05:40 2004 Subject: [gutvol-d] Page Breaks References: <200411011853.iA1IrjMD018781@posso.dm.unipi.it> Message-ID: <00b301c4c045$cae75090$331a1842@Hugh> I don't often enter this discussion, though I have put a number of items by James Fenimore Cooper and Susan Fenimore Cooper on Gutenberg. Today I generally put them on the James Fenimore Cooper Society website, in html, in part because of my frustration with italics and foreign accents. However, on page breaks, I have for some time (I'm my own webmaster) adopted the practice, in putting books (not short articles) on our website, of inserting the page numbers of the original in {curly brackets}, which I generally don't use for other purposes. This not only identifies the page of the original one is reading (helpful both for checking and for bibliographic reference) but also, because it is surrounded by {curly brackets}, is easy to search for without finding other material. Anyhow, it's a thought.
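[Editorial sketch: Hugh's {curly-bracket} convention is also easy for software to exploit. A minimal illustration in Python; the helper name and sample text are invented, not part of any PG tool.]

```python
import re

# Hugh's convention: the original page number appears in the text as {123}.
# split_by_page is a hypothetical helper, not anything PG actually ships.
PAGE_MARKER = re.compile(r"\{(\d+)\}")

def split_by_page(text):
    """Split an etext on {page-number} markers.

    Returns (page_number, text) pairs; text before the first marker
    gets page number None.
    """
    pieces = PAGE_MARKER.split(text)
    pages = []
    if pieces[0].strip():
        pages.append((None, pieces[0]))
    for i in range(1, len(pieces) - 1, 2):
        pages.append((int(pieces[i]), pieces[i + 1]))
    return pages

sample = "Front matter {1} First page text. {2} Second page text."
print(split_by_page(sample))
```

Because the braces are rarely used for anything else, the same pattern can strip the markers out for readers who do not want them.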
Hugh MacDougall, Secretary/Treasurer James Fenimore Cooper Society 8 Lake Street, Cooperstown, NY 13326-1016 http://www.oneonta.edu/external/cooper From Bowerbird at aol.com Mon Nov 1 11:48:59 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Nov 1 11:49:09 2004 Subject: [gutvol-d] re: talking to the walls Message-ID: <25.5149e49d.2eb7ed2b@aol.com> carlo said: > How do you include the information in the files > if it has been removed? go back to a source and get it, that's how. where applicable, information about page-breaks can be obtained from the d.p. proofed text-files; it's a simple matter of matching up image-scans with the text they contained. (see the page that contains a table of the scans with their text-files.) that's why i offered a demo using that specifically. (nonetheless, it's positively _criminal_ that we should even have to do _anything_ to re-gain this information, since it was _willfully_ discarded. when is this bad practice going to be halted?) for books not done by distributed proofreaders, it's as easy as loading the text-file into my viewer and clicking on each word that starts a new page as you get that information by viewing a paper-copy. (my viewer will then save an updated copy of the file.) this process can be facilitated by setting the leading so the lines-per-page is equivalent to the paper-copy, making the task almost trivially easy (but still useful!). > And moreover, how do you find the correct page > when some material (e.g. the footnotes) has been moved, > and the page contents are no longer consecutive? footnotes are easy. (my viewer displays them on the page where they are called anyway, so there's no problem there.) and if you point me to some examples of the other "material" that is moved, i'll be happy to tell you how i'd deal with that. > I have a solution of both problems for DP-produced books > using the files output by DP before the post-processing stage; right. 
> these files correspond to individual pages of the original book, > and you can find the image corresponding to a fragment of text > through a grep on the DP-file. that's one way of doing it. but why not run the process systematically, one time, restoring the page-break information in the text-files, and incorporating the ability to grab the image-scans -- automatically and simply -- using that information. i'm sure you know that the eyes of most users glaze over when you start talking about "grep". besides, what needs to be done is to _thoroughly_incorporate_ the error-reporting process _into_ the end-user's reading-experience, so as to maximize the eyeballs of all the people reading the e-texts. it's just a shame that -- at the same time readers are condemning the e-texts because "they are full of errors" -- practically _nothing_ is being done to harness their ability to _catch_ and _report_ errors. > The concept has been implemented recently by a student, > and a test of 300 recently posted PG ebooks should be > publicly available before the end of this week. This is > a part of a system for ebook maintenance (an user can > submit a proposal of correction of a text through a web page, > after consulting the original images, and an administrator later > can accept - or reject - the proposals and obtain automatically > a corrected version). sounds like a process i described in great detail months ago here. i'm glad somebody is programming it for you guys, because i'll be leaving here shortly. but i intend to write the app anyway, because users who want to grab content from the million-book-project will need it to turn those scans into nicely-proofed and formatted text... 
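[Editorial sketch: the one-time restoration pass described above could match the first distinctive line of each per-page DP file against the posted etext and record where each page begins. The file contents and matching strategy here are assumptions for illustration, not DP's actual workflow.]

```python
def restore_page_breaks(etext, pages):
    """Locate where each per-page DP text file begins in the posted etext.

    Returns (page_index, offset) pairs; a page whose first line was
    changed during post-processing is reported with offset -1.
    """
    offsets = []
    search_from = 0
    for i, page in enumerate(pages):
        # Match on the first non-blank line of the page.
        first_line = next((ln.strip() for ln in page.splitlines() if ln.strip()), "")
        pos = etext.find(first_line, search_from) if first_line else -1
        offsets.append((i, pos))
        if pos >= 0:
            search_from = pos + len(first_line)
    return offsets

# Tiny invented example; real DP page files are much larger.
etext = "CHAPTER I\nIt was a dark night.\nThe rain fell.\nCHAPTER II\nMorning came."
pages = ["CHAPTER I\nIt was a dark night.", "The rain fell.\nCHAPTER II", "Morning came."]
print(restore_page_breaks(etext, pages))
```

The recorded offsets are exactly the page-break information needed to link each span of text back to its image scan, with no grep required of the end user.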
-bowerbird From Bowerbird at aol.com Mon Nov 1 12:09:20 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Nov 1 12:09:42 2004 Subject: [gutvol-d] Page Breaks Message-ID: <96.18f280d1.2eb7f1f0@aol.com> hugh said: > However, on page breaks, I have for some time (I'm my own webmaster) > adopted the practice, in putting books (not short articles) on our website, > of inserting the page numbers of the original in {curly brackets} which I > generally don't use for other purposes. This not only identifies the page > from the original one is reading (helpful both for checking and for > bibliographic reference) and, because it is surrounded by {curly brackets} > is easy to search for without finding other materials. that's a pretty good strategy. it's still fairly obtrusive on the reading experience -- we need to recognize most users don't want to see this info -- but i could live with that method if it became the policy. what i would do, with my viewer-program, is to "vanish" them; that is, i'd display them only if the user specified to show them, and even then, i'd move them out to the margins to be discreet... what i suggest, instead, for a standard for project gutenberg, would be to use some of the under-32 ascii-characters to indicate the various types of page-breaks, so they would be invisible to an ordinary end-user with an ordinary text-viewer, while a savvy viewer-program would be able to discern them and show them to the occasional reader that might want them. also -- and i'm sorry i haven't mentioned this up until now -- i think it's very important that these page-break indicators not gunk up the text. again, for the average user, they will be a nuisance in most cases, and we need to minimize that nuisance. for instance, consider the current practice in the .html versions coming out of distributed proofreaders with page-number info. 
even when the page-number display is moved out to the margin, with c.s.s., they are still there right in the middle of the text! so when a person selects a range of text including a page-number, and copies it out of the browser-window, boom, that page-number is sitting right in the middle of it. and it's a hassle to get rid of it. (and, as far as i know, that's the case even when you have elected to "turn off" display of the page-numbers, but i could be wrong on that.) one of the basic aspects of project gutenberg e-texts has always been that you could easily copy out the text and repurpose it, and i believe that is an important asset to protect... -bowerbird From bkeir at pgdp.net Mon Nov 1 20:24:43 2004 From: bkeir at pgdp.net (bkeir@pgdp.net) Date: Mon Nov 1 20:25:00 2004 Subject: [gutvol-d] pglaf.org settings might block some messages In-Reply-To: <20041101071016.GA30421@pglaf.org> References: <20041101043133.GA26281@pglaf.org> <20041101071016.GA30421@pglaf.org> Message-ID: <21029.203.11.112.2.1099369483.squirrel@203.11.112.2> I had repeated bounces of the following message, as described. This was sent to catalog twice and help once... Hi Sorry, I know this isn't the correct address, but this mail has bounced twice now from catalog AT pglaf.org My original message was: Hi Perhaps Long, William Joseph (1866 - 1952) http://www.gutenberg.net/catalog/world/authrec?fk_authors=744 and Long, William J. (1866 - 1952) http://www.gutenberg.net/catalog/world/authrec?fk_authors=3505 are the same person? Cheers! Bill Here's the second bounce report: Your message did not reach some or all of the intended recipients. Subject: FW: Duplicate author? 
Sent: 27/09/2004 12:51 PM The following recipient(s) could not be reached: 'catalog@pglaf.org' on 29/09/2004 12:51 PM The message was undeliverable because the recipient specified in the recipient postal address was not known at this address The MTS-ID of the original message is: c=AU;a= ;p=Matrikon;l=EXCHANGE-NCS-040927025037Z-16573 From sly at victoria.tc.ca Tue Nov 2 10:43:17 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Nov 2 10:43:26 2004 Subject: [gutvol-d] pglaf.org settings might block some messages In-Reply-To: <21029.203.11.112.2.1099369483.squirrel@203.11.112.2> References: <20041101043133.GA26281@pglaf.org> <20041101071016.GA30421@pglaf.org> <21029.203.11.112.2.1099369483.squirrel@203.11.112.2> Message-ID: From the catalog point of view, this has been settled, but it does bring up a point... I've seen in Wikipedia and a few other places URLs constructed like the two below, using an "author number". After amalgamating the two author records below, one of them will no longer link to William Joseph Long. So this is just a warning that URLs formed like this are not necessarily permanent. Andrew On Tue, 2 Nov 2004 bkeir@pgdp.net wrote: > I had repeated bounces of the following message, as described. This was > sent to catalog twice and help once... > > > Hi > > Sorry, I know this isn't the correct address, but this mail has bounced > twice now from catalog AT pglaf.org > > My original message was: > > Hi > > Perhaps > > Long, William Joseph (1866 - 1952) > > http://www.gutenberg.net/catalog/world/authrec?fk_authors=744 > > and > > Long, William J. (1866 - 1952) > > http://www.gutenberg.net/catalog/world/authrec?fk_authors=3505 > > are the same person? > > Cheers! > > Bill > > Here's the second bounce report: > > > Your message did not reach some or all of the intended recipients. > > Subject: FW: Duplicate author?
> Sent: 27/09/2004 12:51 PM > > The following recipient(s) could not be reached: > > 'catalog@pglaf.org' on 29/09/2004 12:51 PM > The message was undeliverable because the recipient specified > in the recipient postal address was not known at this address > The MTS-ID of the original message is: c=AU;a= > ;p=Matrikon;l=EXCHANGE-NCS-040927025037Z-16573 > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From joel at oneporpoise.com Tue Nov 2 11:06:45 2004 From: joel at oneporpoise.com (Joel A. Erickson) Date: Tue Nov 2 11:20:41 2004 Subject: [gutvol-d] author lookup [was: pglaf.org settings might block some messages] References: <20041101043133.GA26281@pglaf.org><20041101071016.GA30421@pglaf.org><21029.203.11.112.2.1099369483.squirrel@203.11.112.2> Message-ID: <000601c4c10f$189d2ac0$6601a8c0@JOEL> I'm assuming numbers are used because it's easier on the programming side. But wouldn't it be easier for the users if it were the name instead? It probably has its downsides, but if author lookup was based on a name, then when the name was modified, the system could just look for the closest match(es). Or, I suppose, the author number could be forwarded, 3505 -> 744. On a side note, the cookies for personalizing the PG skin seem to terminate rather quickly. Usually they last less than a day, it seems. Has anyone else noticed this? Perhaps I should ask Marcello. Joel ----- Original Message ----- From: "Andrew Sly" To: "Project Gutenberg Volunteer Discussion" Sent: Tuesday, November 02, 2004 10:43 AM Subject: Re: [gutvol-d] pglaf.org settings might block some messages > > > From the catalog point of view, this has been settled, but it does bring > up a point... > > I've seen in wikipedia and a few other places URLs constructed like the > two below, using an "author number".
After amalgamating the two > author records below, one of them will no longer link to > William Joseph Long. > > So this is just a warning that URLs formed like this are not necessarily > permanant. > > Andrew > > On Tue, 2 Nov 2004 bkeir@pgdp.net wrote: > >> I had repeated bounces of the following message, as described. This was >> sent to catalog twice and help once... From joshua at hutchinson.net Tue Nov 2 11:35:27 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Nov 2 11:35:39 2004 Subject: [gutvol-d] author lookup [was: pglaf.org settings might block some messages] Message-ID: <20041102193528.14AD99E775@ws6-2.us4.outblaze.com> But what happens when you have two different authors
that share the same name? It will happen (if it hasn't already). Using a forward from a deprecated author number is probably not a bad idea... Josh ----- Original Message ----- From: "Joel A. Erickson" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] author lookup [was: pglaf.org settings might block some messages] Date: Tue, 2 Nov 2004 11:06:45 -0800 > > I'm assuming numbers are used because it's easier on the programming side. > But wouldn't it be easier for the users if it were the name instead? It > probably has its downsides, but if author lookup was based on a name, then > when the name was modified, the system could just look for the closest > match(es). Or, I suppose, the author number could be forwarded, 3505 -> 744. > > On a side note, the cookies for personalizing the PG skin seem to terminate > rather quickly. Usually they last less than a day, it seems. Has anyone else > noticed this? Perhaps I should ask Marcello. > > Joel > > On Tue, 2 Nov 2004 bkeir@pgdp.net wrote: > > >> I had repeated bounces of the following message, as described. This was > >> sent to catalog twice and help once...
From hyphen at hyphenologist.co.uk Tue Nov 2 12:13:39 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Nov 2 12:13:59 2004 Subject: [gutvol-d] Test from Dave F In-Reply-To: <41768369.6050204@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> Message-ID: Test -- Dave F From joshua at hutchinson.net Tue Nov 2 13:26:05 2004 From: joshua at hutchinson.net (Joshua
Hutchinson) Date: Tue Nov 2 13:26:13 2004 Subject: [gutvol-d] PG TEI Message-ID: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> Good afternoon, everyone! I have a few things I want to try to get a consensus on as to HOW we want to handle some aspects of the PG TEI master document. Some of the questions will pertain to specific people (such as the automatic inclusion of the PG header/footer) and some will pertain to everyone interested in PG TEI. The TEI master I am using for the basis of this discussion is available at http://home.alltel.net/hutch2000/sunny/start.xml. So, without further ado! 1 - Currently, Marcello's online converter (TEI -> HTML) automatically adds a PG standard header and footer. (http://home.alltel.net/hutch2000/sunny/sunny.html) It looks nicer (to my eyes anyway) than the monospaced header and footer that the whitewashers currently use. However, is this a "bad thing" in the eyes of the whitewashers? As near as I can tell, the only piece of information that will need to be manually added by the whitewashers is the EBook number that is assigned to this text. If this is placed in the TEI master, then it is automatically put into the HTML version when it is run through the TEI -> HTML converter. If this is an "ok thing" but needs some work ... what needs to be changed? Jim, you're a vocal whitewasher! Rip this apart! (This question also includes suggestions for style improvements to the header/footer, too.) *** 2 - The version I have posted above has two rather significant CSS changes from the style used in Marcello's converter. A) The margins have been set to 10% whitespace on the right and left. This is a fairly arbitrary number arrived at because it is the "de facto" standard at DP. Suggestions/comments? B) The paragraph markup has been changed back to HTML standard. Marcello's original style more closely resembles TeX formatting, where there is no white space between paragraphs and each paragraph is indented.
This was jarring to me, hence the change. Again, suggestions/comments? The rest of the style is as Marcello's converter made it. It is a bit verbose by some people's standards (almost everything has a class attribute), but this can be a very good thing because it now allows CSS to affect the layout/look of nearly every aspect of the document. *** 3 - The TEI master uses rend="indent" markup in the poetry. This validates fine, but currently the TEI -> HTML converter basically ignores the indent markup. What I want to address here is how we want to have those indents converted. TEI master markup:

<lg>
<l>"I thank the goodness and the grace</l>
<l rend="indent">That on my birth have smiled,</l>
<l>And made me in these Christian days</l>
<l rend="indent">A happy English child."</l>
</lg>

Option #1 - Convert the rend="indent" markup to & emsp ; & emsp ; (remove spaces for use). Pro: Degrades gracefully on non-CSS enabled browsers like Lynx. Con: Treats the indent as content. Option #2 - Convert the rend="indent" markup to its CSS equivalent (my mind is going blank right now or I'd give an example). Option #3 - Any other ideas on how to handle this? *** 4 - I used <q rend="display"> markup for blockquotes. This looks fine to me. However, in previous discussions, some people did not like the rend="display" for this purpose. As far as I am concerned, it works and doesn't seem to be a problem, but I'm willing to hear opposing arguments. *** 5 - I used <lb/> to indicate a blank line of text (commonly called a thoughtbreak over at DP). Marcello's documentation indicates this isn't what it is truly meant for, though. Anyone see a problem with this implementation? Or see an improvement we should use instead? *** 6 - This work has a small example of drama markup. It is very simple markup (verse with no partial lines), but it seems to work well. I don't have any problems with it, but I also know that my experience with drama markup is extremely limited. Any suggestions/concerns?
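[Editorial sketch for item 3: the "CSS markup equivalent" would be for the converter to emit a class that a stylesheet can indent, instead of literal spaces. This is a toy illustration, not Marcello's converter; the output markup and class name are invented.]

```python
import re

# Toy mapping from a TEI verse line to HTML carrying a CSS class.
# A stylesheet rule such as  .indent { margin-left: 2em; }  then does
# the indenting, so no spaces get baked into the content.
def line_to_html(tei_line):
    m = re.match(r'<l(?:\s+rend="indent")?>(.*)</l>$', tei_line.strip())
    if not m:
        raise ValueError("not a simple <l> element: %r" % tei_line)
    cls = ' class="indent"' if 'rend="indent"' in tei_line else ''
    return "<div%s>%s</div>" % (cls, m.group(1))

print(line_to_html('<l rend="indent">That on my birth have smiled,</l>'))
```

The trade-off against Option #1 is the same one discussed above: the class keeps the indent out of the content, at the cost of graceless degradation in non-CSS browsers.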
*** 7 - The only other thing I can remember that was at all out of the ordinary with this text was the retention of small caps. I used the rend="sc" markup and it worked just as I expected it to in the TEI -> HTML converter. Any suggestions/comments/improvements? *** I'm sure I'll remember something on the way home tonight that I forgot to mention, but that's what I can think of right now for discussion. I'm looking forward to everyone's input. Josh From marcello at perathoner.de Tue Nov 2 13:46:12 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Nov 2 13:46:24 2004 Subject: [gutvol-d] author lookup [was: pglaf.org settings might block some messages] In-Reply-To: <000601c4c10f$189d2ac0$6601a8c0@JOEL> References: <20041101043133.GA26281@pglaf.org><20041101071016.GA30421@pglaf.org><21029.203.11.112.2.1099369483.squirrel@203.11.112.2> <000601c4c10f$189d2ac0$6601a8c0@JOEL> Message-ID: <41880024.1020002@perathoner.de> Joel A. Erickson wrote: > I'm assuming numbers are used because it's easier on the programming > side. But wouldn't it be easier for the users if it was the name > instead. It probably has its downsides, but if author lookup was based > on a name, then when the name was modified, the system could just look > for the closest match(es). Or, I suppose, the author number could be > forwarded, 3505 -> 744. The canonical url for linking to an author is http://www.gutenberg.org/author/Mark_Twain This is described in http://www.gutenberg.org/howto-link > On a side note, the cookies for personalizing the PG skin seem to > teminate rather quickly. Usually they last less than a day, it seems. > Has anyone else noticed this? Perhaps I should ask Marcello. They should terminate a year from issue. Maybe your browser limits the duration to one session? 
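[Editorial sketch: Joel's earlier suggestion of forwarding retired author numbers (3505 -> 744) amounts to keeping a redirect table alongside the catalog. The table and resolver below are hypothetical, not the actual catalog code.]

```python
# Deprecated author numbers map to the surviving record, so old URLs
# like .../authrec?fk_authors=3505 can still resolve after a merge.
FORWARDS = {3505: 744}  # William J. Long -> William Joseph Long

def resolve_author(author_id, forwards=FORWARDS, max_hops=10):
    """Follow forwards, handling chains produced by repeated merges."""
    for _ in range(max_hops):
        if author_id not in forwards:
            return author_id
        author_id = forwards[author_id]
    raise ValueError("forwarding loop for author %d" % author_id)

print(resolve_author(3505))
print(resolve_author(744))
```

A bounded hop count guards against an accidental cycle if two records are ever merged into each other.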
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Tue Nov 2 13:58:51 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Nov 2 13:59:02 2004 Subject: [gutvol-d] PG TEI In-Reply-To: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> References: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> Message-ID: <4188031B.5020205@perathoner.de> Joshua Hutchinson wrote: > The rest of the style is as Marcello's converter made it. It is a > bit verbose by some people's standards (almost everything has a class > attribute), but this can be a very good thing because it now allows > CSS to affect the layout/look of nearly every aspects of the > document. Everything has a class attribute because this way you can use the generated html in a web site -- e.g. for an online reader -- and the book and site style will not clash. TODO: all generated styles should have the same prefix: pgtei. > 3 - The TEI master uses rend="indent" markup in the poetry. This > validates fine, but currently the TEI -> HTML converter basically > ignores the indent markup. What I want to address here is how we > want to have those indents converted. I'm working on implementing indent and a few other rend attribute gimmicks. It will understand rend="indent" and rend="indent(n)" where n can be any positive or negative number. > 5 - I used <lb/> to indicate a blank line of text (commonly called a > thoughtbreak over at DP). Marcello's documentation indicates this > isn't what it is truly meant for, though. Anyone see a problem with > this implementation? Or see an improvement we should use instead? <lb/> is meant to record line breaks present in a certain edition, not to output them. To get a thought break, enclose both "thoughts" in <div>s.
<div>
  <head>1.</head>
  <p>...</p>
</div>
<div>
  <p>...</p>
</div>
-- Marcello Perathoner webmaster@gutenberg.org From joel at oneporpoise.com Tue Nov 2 19:25:55 2004 From: joel at oneporpoise.com (Joel A. Erickson) Date: Tue Nov 2 19:25:50 2004 Subject: [gutvol-d] author lookup References: <20041101043133.GA26281@pglaf.org><20041101071016.GA30421@pglaf.org><21029.203.11.112.2.1099369483.squirrel@203.11.112.2> <000601c4c10f$189d2ac0$6601a8c0@JOEL> <41880024.1020002@perathoner.de> Message-ID: <001901c4c154$d44ebc80$6601a8c0@JOEL> Marcello Perathoner wrote: > The canonical url for linking to an author is > > http://www.gutenberg.org/author/Mark_Twain But why couldn't that lead to the author record, instead of search results? From a user point of view, since http://www.gutenberg.org/etext/12345 leads to the etext record 12345, according to some reasoning the Mark Twain link should lead to the Mark Twain author record. If there is no direct match, then it should be forwarded to the search results. Joel From scott_bulkmail at productarchitect.com Tue Nov 2 20:02:28 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Tue Nov 2 21:37:10 2004 Subject: [gutvol-d] PG TEI In-Reply-To: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> References: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> Message-ID: >1 - Currently, Marcello's online converter (TEI -> HTML) automatically adds a PG standard header and footer. (http://home.alltel.net/hutch2000/sunny/sunny.html) It looks nicer (to my eyes anyway) than the monospaced header and footer that the whitewashers currently use. One advantage of monospaced: it clearly distinguishes the long PG footer from the book's content. One could instead use a smaller size and sans serif font. (Personally, I would prefer omitting the license and just including a link.) Also, as noted in http://classicosm.com/xml/feedbackonpgtei.html: In the PG license, section numbers such as "1.A."
should appear on the same line as the text that follows -- per the original and to avoid wasting space. > A) The margins have been set to 10% whitespace on the right and left. This is a fairly arbitrary number arrived at because it is the "defacto" standard at DP. Suggestions/comments? Looks good to me. > B) The paragraph markup has been changed back to HTML standard. As you say, it's the HTML standard and thus appropriate for the default CSS. > The rest of the style is as Marcello's converter made it. It is a bit verbose by some people's standards (almost everything has a class attribute), but this can be a very good thing because it now allows CSS to affect the layout/look of nearly every aspects of the document. A few notes based on a quick look: - class=dgp does seem to be overused. - span class="hi" style="font-variant: small-caps;" is a bit much; how about span class="smallCaps"? I also hate that the HTML is wrapped at 78 (or whatever) chars. I suppose few people will edit the output, but it seems like a wasteful throwback. Don't people have editors that wrap text??? >3 - The TEI master uses rend="indent" markup in the poetry. This validates fine, but currently the TEI -> HTML converter basically ignores the indent markup. What I want to address here is how we want to have those indents converted. > > TEI master markup: > > >"I thank the goodness and the grace >That on my birth have smiled, >And made me in these Christian days >A happy English child." > > > Option #1 - Convert the rend="indent" markup to & emsp ; & emsp ; (remove spaces for use). Pro: Degrades gracefully on non-CSS enabled browsers like Lynx. Con: Treats the indent as content. I think the XHTML version should be completely modern, e.g. here's one way to indent using CSS: .indent {margin-left:40px; margin-right:40px} There are benefits to an "old fashioned HTML" version, but let's make that a different file, probably 4.01 transitional. >4 - I used <q rend="display"> markup for blockquotes. This looks fine to me.
However, in previous discussions, some people did not like the rend="display" for this purpose. As far as I am concerned, it works and doesn't seem to be a problem, but I'm willing to hear opposing arguments. The issue as I understand it: q is for words spoken, quote is for text attributed to an outside source. Either may occur inline or set off in an indented block. So, a long "speech" by a character should (I think) be . I think the TEI tags and explanation are confusing, but that's perhaps a different issue. >5 - I used to indicate a blank line of text (commonly called a thoughtbreak over at DP). Marcello's documentation indicates this isn't what it is truly meant for, though. Anyone see a problem with this implementation? Or see an improvement we should use instead? Marcello suggested that a closing and opening div creates a blank line; I'm not convinced that's a good idea in general. ===== Misc. questions: * The following looks like a (minor) error: Letter XVII

LETTER XVII.

The latter looks redundant. * Were the italics in the original here?

Andover, May 30, 1854.

* Does the original really have several pages with no paragraph breaks? -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From joshua at hutchinson.net Wed Nov 3 05:30:07 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 3 05:30:11 2004 Subject: [gutvol-d] PG TEI Message-ID: <20041103133007.BB1C04F441@ws6-5.us4.outblaze.com> ----- Original Message ----- From: Scott Lawton > > > Also, as noted in http://classicosm.com/xml/feedbackonpgtei.html: In the PG license, section numbers such as "1.A." should appear on the same line as the text that follows -- per the original and to avoid wasting space. > I agree. I'll work on an updated footer to forward on to Marcello. > > > A few notes based on a quick look: > - class=dgp does seem to be overused. > - span class="hi" style="font-variant: small-caps;" is a bit much; how about span class="smallCaps"? > You've got a point. I'll add a todo item to go through the default style and make it more intuitive (ie change the class names where appropriate) and create classes for things like small caps. > I also hate that the HTML is wrapped at 78 (or whatever) chars. I suppose few people will edit the output, but it seems like a wasteful throwback. Don't people have editors that wrap text??? > That you can blame on me. I used Tidy to rewrap everything because of the lots and lots of playing I did with the source code. Marcello's converter leaves the line breaks that were in the original XML source in place. > > Option #1 - Convert the rend="indent" markup to & emsp ; & emsp ; (remove spaces for use). Pro: Degrades gracefully on non-CSS enabled browsers like Lynx. Con: Treats the indent as content. > > I think the XHTML version should be completely modern, e.g. 
here's one way to indent using CSS: > .indent {margin-left:40px; margin-right:40px} > > There are benefits to an "old fashioned HTML" version, but let's make that a different file, probably 4.01 transitional. > I'm hoping for more discussion on this. I've had fairly heated discussion at DP on it. > > > > >5 - I used to indicate a blank line of text (commonly called a thoughtbreak over at DP). Marcello's documentation indicates this isn't what it is truly meant for, though. Anyone see a problem with this implementation? Or see an improvement we should use instead? > > Marcello suggested that a closing and opening div creates a blank line; I'm not convinced that's a good idea in general. > I don't like it definitely from the point of view of having to create the markup. > ===== > > Misc. questions: > > * The following looks like a (minor) error: > Letter XVII >

LETTER XVII.

> > The latter looks redundant. > My bad. Must have done a copy/paste into the head tag instead of a cut/paste there. > * Were the italics in the original here? > >

Andover, May 30, 1854.

> I went by the text provided by DP on this. I didn't check closely to the original on this type of thing. > * Does the original really have several pages with no paragraph breaks? Yep. Makes for easy reading, huh? ;) Josh From Bowerbird at aol.com Wed Nov 3 12:33:31 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 3 12:33:48 2004 Subject: [gutvol-d] thoroughly depressed, and off to collect thoughts in blog Message-ID: well, i am thoroughly depressed by the specter of four more years of george w. anyway, i will be heading out of here shortly -- off to collect my thoughts and criticisms in a blog, instead of attempting to share them with you here, to be free of the incessant swirl of noise and flack that my detractors here seem to love to throw around me -- so does anyone have any questions for me before i leave? -bowerbird From nwolcott2 at kreative.net Thu Nov 4 11:30:12 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Thu Nov 4 13:21:27 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders Message-ID: <006201c4c2b4$2d233380$0e9495ce@net> A new team Jules Verne 2005 has been set up at DP to help get Verne online by the 100th anniversary in March 2005. DP members can either join the team or read and post messages without joining at http://www.pgdp.net/c/stats/teams/tdetail.php?tid=353 Team members can assist with a number of activities connected with the project. Bi-lingual persons are especially needed for the french texts, and clarification of obscure references. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041104/5f3c8921/attachment.html From jeroen at bohol.ph Thu Nov 4 14:34:50 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Thu Nov 4 14:33:48 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders In-Reply-To: <006201c4c2b4$2d233380$0e9495ce@net> References: <006201c4c2b4$2d233380$0e9495ce@net> Message-ID: <418AAE8A.5040107@bohol.ph> Norm Wolcott wrote: > A new team Jules Verne 2005 has been set up at DP to help get Verne > online by the 100th anniversary in March 2005. DP members can either > join the team or read and post messages without joining at > http://www.pgdp.net/c/stats/teams/tdetail.php?tid=353 > We are also planning to do a number of works of Jules Verne in Dutch translations, hopefully to go on the wave... Jeroen Hellingman. From hyphen at hyphenologist.co.uk Thu Nov 4 19:07:02 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Nov 4 19:07:33 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders In-Reply-To: <006201c4c2b4$2d233380$0e9495ce@net> References: <006201c4c2b4$2d233380$0e9495ce@net> Message-ID: On Thu, 4 Nov 2004 14:30:12 -0500, "Norm Wolcott" wrote: | This is a multi-part message in MIME format. | | --===============1330105761== | Content-Type: multipart/alternative; | boundary="----=_NextPart_000_0053_01C4C27A.CB04BAE0" | | This is a multi-part message in MIME format. | | ------=_NextPart_000_0053_01C4C27A.CB04BAE0 And so looks a shambles on Agent set to a80. -- Dave F From shalesller at writeme.com Thu Nov 4 21:20:49 2004 From: shalesller at writeme.com (D. Starner) Date: Thu Nov 4 23:36:48 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders Message-ID: <20041105052049.923344BE64@ws1-1.us4.outblaze.com> Dave Fawthrop writes: > And so looks a shambles on Agent set to a80. MIME has been a standard - an RFC - for quite some time now. I'm not sure that mail agents that don't support that should still be a big concern. 
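[Editor's sketch] Since MIME multipart messages keep coming up in this thread, here is a small stdlib sketch of what a well-formed multipart/alternative message looks like; the addresses and body text are invented for illustration. The point is that a compliant message always carries a text/plain part that a plain-text-only agent can fall back to.

```python
# Sketch: a well-formed multipart/alternative message via Python's stdlib.
# Addresses and body text are invented for illustration.
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "[gutvol-d] example"
msg["From"] = "sender@example.org"
msg["To"] = "gutvol-d@lists.pglaf.org"
msg.set_content("Plain-text body, readable everywhere.")  # text/plain part
msg.add_alternative("<p>HTML body for capable clients.</p>", subtype="html")

# A text-only agent can always fall back to the text/plain alternative:
plain = msg.get_body(preferencelist=("plain",)).get_content()
```

A reader that ignores MIME structure entirely, of course, still sees the boundary lines and both parts as raw text, which is exactly the "shambles" described above.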
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From nwolcott2 at kreative.net Fri Nov 5 06:13:27 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 5 06:29:43 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders References: <006201c4c2b4$2d233380$0e9495ce@net> <418AAE8A.5040107@bohol.ph> Message-ID: <006501c4c343$d5fefea0$2d9495ce@net> If you can send copyright clearance and a single file text and html that would speed things up immeasurably! nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Jeroen Hellingman" To: "Project Gutenberg Volunteer Discussion" Sent: Thursday, November 04, 2004 5:34 PM Subject: Re: [gutvol-d] New Jules Verne Team at Dist Proofreaders > Norm Wolcott wrote: > > > A new team Jules Verne 2005 has been set up at DP to help get Verne > > online by the 100th anniversary in March 2005. DP members can either > > join the team or read and post messages without joining at > > http://www.pgdp.net/c/stats/teams/tdetail.php?tid=353 > > > > We are also planning to do a number of works of Jules Verne in Dutch > translations, hopefully to go on the wave... > > Jeroen Hellingman. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott2 at kreative.net Fri Nov 5 06:15:05 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 5 06:29:43 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders References: <006201c4c2b4$2d233380$0e9495ce@net> Message-ID: <006601c4c343$d6c86920$2d9495ce@net> What do you think happened to the message? 
nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Dave Fawthrop" To: "Project Gutenberg Volunteer Discussion" Sent: Thursday, November 04, 2004 10:07 PM Subject: Re: [gutvol-d] New Jules Verne Team at Dist Proofreaders > On Thu, 4 Nov 2004 14:30:12 -0500, "Norm Wolcott" > wrote: > > | This is a multi-part message in MIME format. > | > | --===============1330105761== > | Content-Type: multipart/alternative; > | boundary="----=_NextPart_000_0053_01C4C27A.CB04BAE0" > | > | This is a multi-part message in MIME format. > | > | ------=_NextPart_000_0053_01C4C27A.CB04BAE0 > > And so looks a shambles on Agent set to a80. > > -- > Dave F > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott2 at kreative.net Fri Nov 5 06:16:39 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 5 06:29:45 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders References: <20041105052049.923344BE64@ws1-1.us4.outblaze.com> Message-ID: <006701c4c343$d79974c0$2d9495ce@net> The message went to other sites ok. looks like none of my posts are getting through why? nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "D. Starner" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 05, 2004 12:20 AM Subject: Re: [gutvol-d] New Jules Verne Team at Dist Proofreaders > Dave Fawthrop writes: > > > And so looks a shambles on Agent set to a80. > > MIME has been a standard - an RFC - for quite some time > now. I'm not sure that mail agents that don't support that > should still be a big concern. 
> -- > ___________________________________________________________ > Sign-up for Ads Free at Mail.com > http://promo.mail.com/adsfreejump.htm > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott2 at kreative.net Fri Nov 5 06:28:27 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 5 06:29:53 2004 Subject: [gutvol-d] Test message why not going through Message-ID: <006901c4c343$dcf845e0$2d9495ce@net> This is a test message which gets scrambled somehow on the way why? nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041105/df54f5c5/attachment.html From hyphen at hyphenologist.co.uk Fri Nov 5 08:05:24 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Nov 5 08:05:46 2004 Subject: [gutvol-d] Test message why not going through In-Reply-To: <006901c4c343$dcf845e0$2d9495ce@net> References: <006901c4c343$dcf845e0$2d9495ce@net> Message-ID: <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> On Fri, 5 Nov 2004 09:28:27 -0500, "Norm Wolcott" wrote: | This is a test message which gets scrambled somehow on the way why? | | nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest Because it had an attachment, and so gets deleted by spam filters? -- Dave F From nwolcott2 at kreative.net Sat Nov 6 07:30:51 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Nov 6 08:12:11 2004 Subject: [gutvol-d] Test message why not going through References: <006901c4c343$dcf845e0$2d9495ce@net> <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> Message-ID: <00b101c4c41b$505c3dc0$5b9495ce@net> The message had no attachments. the Mime attachments must have been automatically generated as the message was sent. I have sent other test messages, none arrive. 
I believe someone has removed me from the listserve. I received 3 messages saying my messages have been received at lists dot pglaf dot org. But none of them have shown up in my mailbox, nor have they been returned to me by a bot. Who is in charge of the listserve now that g newby has moved? nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Dave Fawthrop" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 05, 2004 11:05 AM Subject: Re: [gutvol-d] Test message why not going through > On Fri, 5 Nov 2004 09:28:27 -0500, "Norm Wolcott" > wrote: > > | This is a test message which gets scrambled somehow on the way why? > | > | nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest > > Because it had an attachment, and so gets deleted by spam filters? > > -- > Dave F > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Sat Nov 6 10:32:26 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 6 10:32:35 2004 Subject: [gutvol-d] Test message why not going through In-Reply-To: <00b101c4c41b$505c3dc0$5b9495ce@net> References: <006901c4c343$dcf845e0$2d9495ce@net> <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> <00b101c4c41b$505c3dc0$5b9495ce@net> Message-ID: <418D18BA.1080707@perathoner.de> Norm Wolcott wrote: > I have sent other test messages, none arrive. I believe someone has removed > me from the listserve. I received 3 messages saying my messages have been > received at lists dot pglaf dot org. But none of them have shown up in my > mailbox, nor have they been returned to me by a bot. Go to lists.pglaf.org and change your settings. There is an option you must set if you want to get your own messages. 
-- Marcello Perathoner webmaster@gutenberg.org From hyphen at hyphenologist.co.uk Sat Nov 6 11:06:32 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Nov 6 11:06:52 2004 Subject: [gutvol-d] Test message why not going through In-Reply-To: <418D18BA.1080707@perathoner.de> References: <006901c4c343$dcf845e0$2d9495ce@net> <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> <00b101c4c41b$505c3dc0$5b9495ce@net> <418D18BA.1080707@perathoner.de> Message-ID: On Sat, 06 Nov 2004 19:32:26 +0100, Marcello Perathoner wrote: | Norm Wolcott wrote: | | > I have sent other test messages, none arrive. I believe someone has removed | > me from the listserve. I recieved 3 messages saying my messages have been | > received at lists dot pglaf dot org . But none of them have showed up in my | > mailbox, nor have the been returned to me by a bot. | | Go to lists.pglaf.org and change your settings. | | There is an option you must set if you want to get your own messages. Now that is a totally *daft* idea. -- Dave F From hyphen at hyphenologist.co.uk Sat Nov 6 11:05:15 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Nov 6 11:07:49 2004 Subject: [gutvol-d] Test message why not going through In-Reply-To: <00b101c4c41b$505c3dc0$5b9495ce@net> References: <006901c4c343$dcf845e0$2d9495ce@net> <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> <00b101c4c41b$505c3dc0$5b9495ce@net> Message-ID: On Sat, 6 Nov 2004 10:30:51 -0500, "Norm Wolcott" wrote: | The message had no attachments. the Mime attachments must have been | automatically generated as the message was sent. When it got to me it had an *html* attachment. Had my spam trap system not caught it as something interesting and put it in my Gutenberg Directory, It would have gone straight into trash with all the other html rubbish. Had I seen it in the inbox, I would have punched "delete" on seeing the html bit without reading the subject line. 400 spam per day here. :-( Only plain text emails get through reliably. 
-- Dave F From sly at victoria.tc.ca Tue Nov 9 10:23:05 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Nov 9 10:23:12 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: == Resent message; It was bounced the first time == On Tue, 9 Nov 2004, Steve Thomas wrote: > This was all well and good, and eventually we ended up with > around 3,800 records for PG titles in our catalogue. > > However, the advent of DP put paid to all that. The volume of > works appearing each month very quickly overwhelmed me, and I > was forced to abandon the effort, so that an unfortunate side > effect of DP was that I could no longer add MARC records to our > catalogue. I believe something like this is also faced by John Mark Ockerbloom, who maintains the Online Books page. He has cataloged a large portion of PG, as well as thousands of online books from other sources. However, as you say, one person cannot keep up with the increasing number of old books being digitized. > I believe that recent changes and enhancements to the PG archive > may make a similar effort possible once more. First, I am told > that there is now an XML file of the PG database, and that this > contains much more and better detail than the old GUTINDEX list. I would qualify this with a "yes, but..." Yes, this does exist (see the link Greg gave, or here's a link directly to the compressed rdf file: http://www.gutenberg.org/feeds/catalog.rdf.bz2) But, as is PG custom, it has its own inconsistencies. All new records are generated automatically from information in the headers of newly posted files (and this is not always accurate). Many older records were copied from the old catalog from promo.net, which sometimes had "interesting" variations. Many records have additional information such as subject headings, LOC classifications, and sometimes other material of bibliographical interest in a "notes" field. But many records have only very basic information. 
Additional information is generally added when one of the volunteers who has write access to the catalog takes an interest in looking it up. So this happens somewhat irregularly. Taken all together, the PG online catalog does present plenty of information that can help people interact with the collection in meaningful ways; but it may make professional librarians roll their eyes. > Second, PG now has a neater way of accessing texts, > using a simple URL like http://www.gutenberg.org/etext/1234 > Previously, one could only link directly to the individual files > in the archive, and this complicated matters, since every title > has at least two files (.txt and .zip) and often there are > multiple versions and formats. Yes. In my own opinion, the ability to do this is perhaps the best thing to have happened for PG in the last year. This provides a much more ideal way to link to a PG title from any place such as newsgroups, websites, catalogs, whatever. (Thanks Marcello!) This also makes it easier to present selections from PG, organized by whatever criteria you choose. (eg, Marcello's list of "Top 100" downloads, my list of Canadiana.) All of this only encourages more exposure for PG, and a greater chance that some computer user will come across (perhaps by accident) a PG text that interests him. > Of course, one has to ask whether the effort of creating and > *maintaining* catalogue records for PG is worth while. We live > in the age of Google, and it is a lament frequently heard from > librarians that the user is more often likely to search the 'net > with Google than to use the Library catalogue. I believe the effort is worth while. Good cataloging can lead to a user finding an item of interest that may have been missed otherwise. And yes, google does index the PG "bibrec" pages, so any additional work done in cataloging could possibly lead to a text being found from someone searching with google. 
> However, redundancy is no bad thing with information, and the > more ways of getting at it the better -- so long as those ways > remain accurate. So I believe many libraries would welcome the > chance to load marc records pointing at PG texts -- provided > that they can be sure the record contents are accurate and the > links remain so. At this point in time, I would say a good deal of manual tweaking would be needed to get a result that would be somewhat satisfactory for librarians. Links should not be a problem, as the canonical URLs discussed above show every sign of being much more permanent than most. Andrew From marcello at perathoner.de Tue Nov 9 11:06:20 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Nov 9 11:06:34 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: Message-ID: <4191152C.9080702@perathoner.de> Andrew Sly wrote: > Taken all together, the PG online catalog does present plently > of information that can help people interact with the collection > in meaningful ways; but it may make professional librarians > roll their eyes. The design philosophy of the catalog database is: To help people find a book they may want to read. That includes both, people who already know which book they want and people who want a suggestion. The catalog database was not designed to be a tool for professionals. But this doesn't mean that I'm not willing to add some functions to help them out, so long as those functions don't get in the way of the primary functionality. Producing MARC records out of existing catalog entries seems to be a pretty forward thing. Importing other people's MARC into our database will be much hairier. 
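[Editor's sketch] The claim that producing MARC records out of existing catalog entries is straightforward can be illustrated in a few lines. The field tags follow MARC 21 conventions (100 = main author entry, 245 = title statement, 856 = electronic location), but the entry dict, the indicator values, and the text serialization below are illustrative assumptions, not PG's actual export code.

```python
# Sketch: serializing a PG catalog entry as a MARC-mnemonic text record.
# MARC 21 tags: 100 = main author, 245 = title, 856 = electronic location.
# The entry dict and indicators are illustrative, not PG's export code.
def catalog_to_marc_text(entry: dict) -> str:
    lines = [
        f"=100  1\\$a{entry['author']}",  # "\" marks a blank indicator
        f"=245  10$a{entry['title']}",
        f"=856  40$uhttp://www.gutenberg.org/etext/{entry['etext']}",
    ]
    return "\n".join(lines)

record = catalog_to_marc_text({
    "author": "Lönnrot, Elias, 1802-1884.",
    "title": "The Kalevala",
    "etext": 7000,
})
```

The reverse direction is, as noted, much hairier: importing arbitrary MARC means mapping free-form cataloguing back onto the catalog's own schema.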
-- Marcello Perathoner webmaster@gutenberg.org From aakman at csufresno.edu Tue Nov 9 11:26:21 2004 From: aakman at csufresno.edu (Alev Akman) Date: Tue Nov 9 11:26:29 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <4191152C.9080702@perathoner.de> References: <4191152C.9080702@perathoner.de> Message-ID: <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> At 11:06 AM 11/9/2004, you wrote: >Andrew Sly wrote: > >>Taken all together, the PG online catalog does present plently >>of information that can help people interact with the collection >>in meaningful ways; but it may make professional librarians >>roll their eyes. > >The design philosophy of the catalog database is: > > To help people find a book they may want to read. > >That includes both, people who already know which book they want and >people who want a suggestion. > >The catalog database was not designed to be a tool for professionals. But >this doesn't mean that I'm not willing to add some functions to help them >out, so long as those functions don't get in the way of the primary >functionality. > >Producing MARC records out of existing catalog entries seems to be a >pretty forward thing. Obviously it is not an _easy_ pretty forward thing! Otherwise, the whole thing would be in place by now. On the other hand, PG database may not be capable of the Z39.50 imports but there are many MANY (if not all!) library cataloging software packages that will do it in a short time. The advantage of importing from the existing catalog entries is that we have our pick of what fits our needs for especially the subject fields. Of course there is always work to edit and customize them for the PG user database. I don't see why we can't have a commercial software to do most of the work and keep the existing catalog as a backup. 
And for the record, I have been involved in the PG cataloging effort for more than six years and anyone who says I am not interested in it any more is clearly not aware of the full facts. It may be quite disappointing when one's years of volunteer efforts have been deleted with the "new improvements"! Alev. an "official" librarian > Importing other people's MARC into our database will be much hairier. > > > >-- >Marcello Perathoner >webmaster@gutenberg.org > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 -------------- next part -------------- --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 From lynne at rhodesresearch.biz Tue Nov 9 18:11:28 2004 From: lynne at rhodesresearch.biz (Lynne Anne Rhodes) Date: Tue Nov 9 18:10:21 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> Message-ID: <200411091911.28485.lynne@rhodesresearch.biz> I'm new around here so please forgive me if I go over old ground. I have subscribed to the RSS Recently Posted or Updated feeds and it is truely amazing to see the way the entries roll in every night. However, it is frustrating to see when one of the entries is opened up there is little information apart from the author (with or without dates) and a title. In most cases I have no idea what the book is about and whether I am interested in it. I, and I am sure many others would love to see a bit more detail such as the original date of publication and a brief synopsis of the work. 
Obviously to enter such information day after day with such a rush of material is far beyond the resources of a small group of volunteers, however dedicated. Would it not be possible to devise a distributed cataloguing system following the model of DP? For each book "in the frame" a form would be provided with spaces for the required items. When these were completed (and checked) the data would then be transferred, in an agreed format--MARC or otherwise--to a file held within the books directory tree. In many cases this information is provided at the time of proofreading and then it seems to be lost. Obviously some of the information might be easy to complete, such as book or serial. However, other fields might need research, such as key dates, author bio, etc. Also a meaningful synopsis would mean most likely reading the text or abstracting a portion from another work. I could also see that multilingual versions might be needed. I would think there are many who would rise to the challenge of helping in such an endeavour, Lynne On Tuesday 09 November 2004 12:26 pm, Alev Akman wrote: > At 11:06 AM 11/9/2004, you wrote: > >Andrew Sly wrote: > >>Taken all together, the PG online catalog does present plenty > >>of information that can help people interact with the collection > >>in meaningful ways; but it may make professional librarians > >>roll their eyes. > > > >The design philosophy of the catalog database is: > > > > To help people find a book they may want to read. > > > >That includes both, people who already know which book they want and > >people who want a suggestion. > > > >The catalog database was not designed to be a tool for professionals. But > >this doesn't mean that I'm not willing to add some functions to help them > >out, so long as those functions don't get in the way of the primary > >functionality. > > > >Producing MARC records out of existing catalog entries seems to be a > >pretty forward thing. 
> > Obviously it is not an _easy_ pretty forward thing! Otherwise, the whole > thing would be in place by now. > > On the other hand, PG database may not be capable of the Z39.50 imports but > there are many MANY (if not all!) library cataloging software packages that > will do it in a short time. The advantage of importing from the existing > catalog entries is that we have our pick of what fits our needs for > especially the subject fields. Of course there is always work to edit and > customize them for the PG user database. > > I don't see why we can't have a commercial software to do most of the work > and keep the existing catalog as a backup. > > And for the record, I have been involved in the PG cataloging effort for > more than six years and anyone who says I am not interested in it any more > is clearly not aware of the full facts. It may be quite disappointing when > one's years of volunteer efforts have been deleted with the "new > improvements"! > > Alev. > an "official" librarian > > > Importing other people's MARC into our database will be much hairier. > > > > > > > >-- > >Marcello Perathoner > >webmaster@gutenberg.org > > > >_______________________________________________ > >gutvol-d mailing list > >gutvol-d@lists.pglaf.org > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > > > > > >--- > >Incoming mail is certified Virus Free. > >Checked by AVG anti-virus system (http://www.grisoft.com). > >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 From shalesller at writeme.com Tue Nov 9 15:07:38 2004 From: shalesller at writeme.com (D. Starner) Date: Tue Nov 9 20:33:48 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> Marcello Perathoner writes: > The design philosophy of the catalog database is: > > To help people find a book they may want to read. It does a pretty horrid job at that, then. If you don't know what you're looking for, it's very hard to find it. 
One step might be making the list of LoC classifications available, so you can scroll down to the list of histories. When I'm looking for something to read, I often look for a list of science-fiction or mysteries. Being in a college library, I miss the spine stickers loudly identifying the genre of the fiction. PG's catalog has nothing in that direction. Another thing I will do is to browse the stacks. I guess if the LoC classifications are available, that would be possible. The thing I would honestly like is the Amazon-style "if you liked this, you might like ...". I don't mean to be harsh in this email, but I'm having a real hard time believing your statement, because the catalog so badly sucks at it. Not that most of the library catalogs I've dealt with have been good at it, but it's never been stated as a design philosophy for them. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From sly at victoria.tc.ca Tue Nov 9 21:23:16 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Nov 9 21:23:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> Message-ID: On Tue, 9 Nov 2004, D. Starner wrote: > > When I'm looking for something to read, I often look > for a list of science-fiction or mysteries. Being in > a college library, I miss the spine stickers loudly > identifying the genre of the fiction. PG's catalog > has nothing in that direction. If it helps, I've assembled a small list of PG books that would fall under the heading of science fiction. I haven't done anything with it yet, as I feel it's rather on the small side, and surely misses many of the examples which we have. Another catagory that could be of interest to some is cook books, of which there are now quite a decent number in PG. 
Andrew

From shalesller at writeme.com  Tue Nov 9 19:46:32 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Nov 9 21:27:07 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
Message-ID: <20041110034632.5655F164005@ws1-4.us4.outblaze.com>

 writes:

> Isn't sco gaelic?

No. sco has the name "Scots"; gd has the name "Scottish Gaelic".
Since they're distinct codes, sco must be the Germanic language.

> I use foreign exclusively as a holder for the lang attribute.

My problem with that is that it means there's no way to transform a
document such that the foreign words are marked differently from
the emphasized words, or not marked at all.
-- 
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm

From traverso at dm.unipi.it  Tue Nov 9 22:08:18 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Tue Nov 9 22:08:36 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <200411091911.28485.lynne@rhodesresearch.biz> (message from Lynne Anne Rhodes on Tue, 9 Nov 2004 19:11:28 -0700)
References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <200411091911.28485.lynne@rhodesresearch.biz>
Message-ID: <200411100608.iAA68I6P016938@posso.dm.unipi.it>

>>>>> "Lynne" == Lynne Anne Rhodes writes:

    Lynne> I, and I am sure many others would love to see a bit more
    Lynne> detail such as the original date of publication and a brief
    Lynne> synopsis of the work. Obviously to enter such information
    Lynne> day after day with such a rush of material is far beyond
    Lynne> the resources of a small group of volunteers, however,
    Lynne> dedicated.

DP would be delighted to preserve these data. Most books that pass
through DP are accompanied by a small HTML page that describes the
author, the book, etc.; and the data on the original book are preserved
in proofreading, and often deleted in post-processing.
We have also discussed keeping a catalogue of our books, with this
kind of additional information. One of the problems is copyright: most
of the info on the authors is taken from sources that would not survive
a clearance procedure (i.e. it is raided from other sites). So this
cannot be integrated with the PG catalogue; but it might build the core
of an added-value site that maintains a PG catalogue adding information
and classification data. The PG catalogue remains authoritative and
terse, but you can get additional features. Exactly as with many
etexts, for which sites exist that add formats for PG ebooks.

The first step, however, is to have better PG records, and a method to
avoid losing information from DP to the PG catalogue.

Carlo

From Bowerbird at aol.com  Tue Nov 9 23:42:11 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov 9 23:42:35 2004
Subject: [gutvol-d] [BP] The Future of eBooks
Message-ID: <6EBFB319.08D194C6.023039A8@aol.com>

david starner said:
> It does a pretty horrid job at that

hey, before i skip on out of here,
i get to agree with david for once.

the catalog just ain't gonna help someone
know what kind of book they'd like to read.
(and a marc record won't help them either.)

that's a job that collaborative filtering
will eventually do much better than anything
you can do in the form of a catalog of any type.
(and the collaborative filtering that amazon uses
is absolutely primitive compared to what it could be.)

just make e-texts. clean, consistent, clear e-texts.

do that, and the rest will take care of itself...

-bowerbird

From marcello at perathoner.de  Wed Nov 10 01:12:39 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov 10 01:13:01 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com>
References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com>
Message-ID: <4191DB87.7020904@perathoner.de>

D. Starner wrote:

> It does a pretty horrid job at that, then.
> If you don't know what you're looking for, it's very hard
> to find it. One step might be making the list of
> LoC classifications available, so you can scroll down
> to the list of histories.

We already have LoC class as a search criterion. What we lack is the
data.

Are you volunteering to type the data in?

-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Wed Nov 10 02:29:14 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov 10 02:29:36 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu>
References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu>
Message-ID: <4191ED7A.3020203@perathoner.de>

Alev Akman wrote:

> Obviously it is not an _easy_ pretty forward thing! Otherwise, the whole
> thing would be in place by now.

Nobody requested that feature before. And, to be exact, nobody is
requesting that feature now. It's just that some of us *think* that
libraries could use that.

As a rule, I'm not putting work into features that maybe nobody will use.

> On the other hand, PG database may not be capable of the Z39.50 imports
> but there are many MANY (if not all!) library cataloging software
> packages that will do it in a short time. The advantage of importing
> from the existing catalog entries is that we have our pick of what fits
> our needs for especially the subject fields. Of course there is always
> work to edit and customize them for the PG user database.
>
> I don't see why we can't have a commercial software to do most of the
> work and keep the existing catalog as a backup.

- Does it provide web access for users?
- For catalogers?
- How much will an unlimited worldwide public access license cost?
- Will it run on Linux/Apache?
- Will it manage our files?
- Will it provide download links for the files?
- Do we get the source code to adapt it to our particular needs?
I think any commercial library-use-oriented catalog software will fall
far short of what we have now. We don't need so much of a catalog
system. What we need is a web shop system à la Amazon. But I have my
doubts they will give us theirs.

The problems with MARC are:

- the standard is not free.
- the records are not free.
- the technology is obsolete.

I don't know what the copyright status of the LoC MARC records is. They
are a US government agency, so they should be free. But do we know?

To request a MARC record I have to implement an obscure Z39.50
protocol. And I get back a record full of numeric codes that I have to
look up before knowing what they are. Why can't I simply post an HTTP
request and get an XML/RDF answer?

Which MARC record should we import for a book? If you search through
the LoC catalog you'll find many examples of works that have got
different MARC subject classifications for the different copies held by
the LoC.

LoC class codes have shifted semantically over the years. What was XY
in 1970 will not necessarily be XY in 2000. So you'll have to keep the
LoC class code, the year the classification was made and the list of
class codes that was authoritative in that year. Of course the same
goes for Dewey etc.

> And for the record, I have been involved in the PG cataloging effort for
> more than six years and anyone who says I am not interested in it any
> more is clearly not aware of the full facts.

I didn't say that. I said Greg and I wanted to get you as manager of
the catalog team but last time I mailed Greg about it he said he got no
answer from you. Your last post on this list was on 3/18.

> It may be quite
> disappointing when one's years of volunteer efforts have been deleted
> with the "new improvements"!

I don't know of any data that has willfully been deleted. Please give
an example.
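To make the wished-for alternative concrete, here is a minimal Python sketch of consuming an XML/RDF answer for one catalog record. No such endpoint is assumed to exist; the response is an inline sample (etext #7000, "The Kalevala") rather than a live HTTP fetch, and the Dublin Core fields shown are illustrative.

```python
# Sketch: parse a hypothetical XML/RDF catalog answer with the standard
# library. A real client would fetch this over HTTP; here the response
# is a canned string so the example is self-contained.
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"

SAMPLE_RESPONSE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.gutenberg.org/etext/7000">
    <dc:title>The Kalevala</dc:title>
    <dc:creator>Lonnrot, Elias</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

def parse_record(xml_text):
    """Extract Dublin Core title/creator from one RDF/XML record."""
    desc = ET.fromstring(xml_text).find(RDF + "Description")
    return {"title": desc.findtext(DC + "title"),
            "creator": desc.findtext(DC + "creator")}

record = parse_record(SAMPLE_RESPONSE)
print(record["creator"], "-", record["title"])
```

Compared with a binary MARC record fetched over Z39.50, everything here is human-readable and needs nothing beyond an XML parser.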
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Nov 10 02:33:48 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 02:34:11 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> Message-ID: <4191EE8C.7030109@perathoner.de> Andrew Sly wrote: > If it helps, I've assembled a small list of PG books that > would fall under the heading of science fiction. > I haven't done anything with it yet, as I feel it's > rather on the small side, and surely misses many of > the examples which we have. > > Another catagory that could be of interest to some > is cook books, of which there are now quite a decent number > in PG. You could add a subject "Science Fiction" or "Cooking" entry to all those books. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Nov 10 02:38:47 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 02:39:10 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <200411100608.iAA68I6P016938@posso.dm.unipi.it> References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <200411091911.28485.lynne@rhodesresearch.biz> <200411100608.iAA68I6P016938@posso.dm.unipi.it> Message-ID: <4191EFB7.6080203@perathoner.de> Carlo Traverso wrote: > The first step however is to have better PG records, and a method to > avoid losing information from DP to the PG catalogue. If you put a complete ... somewhere in the files, maybe at the back where it won't hurt much, I can easily pick it out and parse it into the database. Of course it has to stay in the file after being posted. What is happening now is that I parse the tiny header at the top of the file and I get just what's there. 
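The "tiny header at the top of the file" lends itself to a simple key-value scan. A sketch, with sample header lines that are illustrative rather than copied from any particular etext:

```python
# Sketch: collect "Key: value" metadata lines from the top of a
# plain-text etext. The header layout below is illustrative; real
# files may use different keys, so the scan is a loose heuristic.
SAMPLE_TOP = """\
Title: The Kalevala

Author: Lonnrot, Elias

Release Date: November, 2004  [EBook #7000]

Language: Finnish
"""

def parse_pg_header(text, max_lines=50):
    """Return a dict of "Key: value" pairs found in the first lines."""
    meta = {}
    for line in text.splitlines()[:max_lines]:
        key, sep, value = line.partition(": ")
        # Accept short, Title-Cased keys only, to skip ordinary prose.
        if sep and key and key == key.title() and len(key.split()) <= 2:
            meta[key] = value.strip()
    return meta

header = parse_pg_header(SAMPLE_TOP)
print(header["Title"], "/", header["Language"])
```

Anything not in the header simply never makes it into the dict, which matches the "I get just what's there" situation described above.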
-- Marcello Perathoner webmaster@gutenberg.org From M.J.Farmer at bham.ac.uk Wed Nov 10 03:02:16 2004 From: M.J.Farmer at bham.ac.uk (Malcolm Farmer) Date: Wed Nov 10 03:35:56 2004 Subject: [gutvol-d] Re: catalog data In-Reply-To: <4191EE8C.7030109@perathoner.de> References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> <4191EE8C.7030109@perathoner.de> Message-ID: <4191F538.5030201@bham.ac.uk> Marcello Perathoner wrote: > Andrew Sly wrote: > >> If it helps, I've assembled a small list of PG books that >> would fall under the heading of science fiction. >> I haven't done anything with it yet, as I feel it's >> rather on the small side, and surely misses many of >> the examples which we have. >> >> Another catagory that could be of interest to some >> is cook books, of which there are now quite a decent number >> in PG. > > > You could add a subject "Science Fiction" or "Cooking" entry to all > those books. Is there a simple process for doing this? For historical novels, there's a book in PG, "A Guide to the Best Historical Novels and Tales" (#1359) which lists hundreds of such, also listing their time and place settings: an ever-increasing number of the titles are in PG, so it would make an interesting project for someone with write access to the catalog data to go through the listing and add this classification to those titles. At present, there are only six titles in the catalog categorised as historical fiction. Distributed Proofreaders projects are classified under various headings (history, cooking, children's etc. ) when they're first started: it may be worth working out with DP a way of passing that data on to the catalog when the work is submitted. That only covers DP books, and probably doesn't match proper library classifications, but it should help in giving some information to the prospective reader. 
From marcello at perathoner.de  Wed Nov 10 04:06:52 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov 10 04:07:19 2004
Subject: [gutvol-d] Re: catalog data
In-Reply-To: <4191F538.5030201@bham.ac.uk>
References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> <4191EE8C.7030109@perathoner.de> <4191F538.5030201@bham.ac.uk>
Message-ID: <4192045C.4070908@perathoner.de>

Malcolm Farmer wrote:

>> You could add a subject "Science Fiction" or "Cooking" entry to all
>> those books.
>
> Is there a simple process for doing this?

First you have to agree with Andrew on the subject headings you want to
tackle. Then you can build an ASCII list like this:

Subject: Cooking
1234
2345
3456

Subject: Science Fiction
7777
8888
9999

The numbers are the etext numbers. I will then import that data into
the database. That's the easiest way to get a *lot* of data into the
catalog.

-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Nov 10 05:25:48 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Nov 10 05:25:52 2004
Subject: [gutvol-d] [BP] The Future of eBooks
Message-ID: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com>

I've heard this suggestion before, but I think it bears repeating...

This sounds an awful lot like a Wikipedia entry. I think a loose
partnership between us and Wikipedia would be useful here. We have the
book, and a link could be made from the catalog page to that book's
entry in Wikipedia.

Then, we have a place to put all sorts of information about the book,
the author, where it was published, the historical context it was
conceived in, ... just about anything someone wants to add.

Wikipedia's hyperlinked nature would also allow someone looking up
information on, say, 20,000 Leagues Under the Sea, to get information
on all sorts of related Jules Verne books and even perhaps other
Science Fiction books in the PG collection ... provided someone takes
the time to enter the information.
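Picking up Marcello's ASCII subject-list proposal from a few messages back: the format is simple enough that the import side can be sketched mechanically. A minimal parser, assuming exactly the layout of his example:

```python
# Sketch: parse the "Subject: ..." / etext-number list format into a
# mapping ready for a database import. Layout per Marcello's example.
def parse_subject_list(text):
    """Map each subject heading to the etext numbers listed under it."""
    subjects = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Subject:"):
            current = line[len("Subject:"):].strip()
            subjects.setdefault(current, [])
        elif line.isdigit() and current is not None:
            subjects[current].append(int(line))
    return subjects

sample = """\
Subject: Cooking
1234
2345
3456

Subject: Science Fiction
7777
8888
9999
"""

imported = parse_subject_list(sample)
print(imported)
```

Blank lines are ignored, so volunteers can space the lists however they like; only "Subject:" lines and bare numbers carry meaning.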
This doesn't solve any catalog problems we may have, but it does
address some of the concerns raised by Lynne. And, the only change
needed on our side would be a link to the Wikipedia article on each
book (something that could be implemented piecemeal as someone makes a
Wikipedia article available).

Thoughts?

Josh

----- Original Message -----
From: Lynne Anne Rhodes
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] [BP] The Future of eBooks
Date: Tue, 9 Nov 2004 19:11:28 -0700

> I'm new around here so please forgive me if I go over old ground.
>
> I have subscribed to the RSS Recently Posted or Updated feeds and it is truly
> amazing to see the way the entries roll in every night. However, it is
> frustrating to see when one of the entries is opened up there is little
> information apart from the author (with or without dates) and a title. In
> most cases I have no idea what the book is about and whether I am interested
> in it.
>
> I, and I am sure many others, would love to see a bit more detail such as the
> original date of publication and a brief synopsis of the work. Obviously to
> enter such information day after day with such a rush of material is far
> beyond the resources of a small group of volunteers, however dedicated.
>
> Would it not be possible to devise a distributed cataloguing system following
> along the model of DP? For each book "in the frame" a form would be provided
> with spaces for the required items. When these were completed (and checked)
> the data would then be transferred, in an agreed format--MARC or
> otherwise--to a file held within the book's directory tree. In many cases
> this information is provided at the time of proofreading and then it seems to
> be lost.
>
> Obviously some of the information might be easy to complete, such as book or
> serial. However other fields might need research, such as key dates, author
> bio etc.
Also a meaningful synopsis would mean most likely reading the text > or abstracting a portion from another work. I could also see that > multilingual versions might be needed. I would think there are many who would > rise to the challenge of helping in such an endevour, > > Lynne > > > > On Tuesday 09 November 2004 12:26 pm, Alev Akman wrote: > > At 11:06 AM 11/9/2004, you wrote: > > >Andrew Sly wrote: > > >>Taken all together, the PG online catalog does present plently > > >>of information that can help people interact with the collection > > >>in meaningful ways; but it may make professional librarians > > >>roll their eyes. > > > > > >The design philosophy of the catalog database is: > > > > > > To help people find a book they may want to read. > > > > > >That includes both, people who already know which book they want and > > >people who want a suggestion. > > > > > >The catalog database was not designed to be a tool for professionals. But > > >this doesn't mean that I'm not willing to add some functions to help them > > >out, so long as those functions don't get in the way of the primary > > >functionality. > > > > > >Producing MARC records out of existing catalog entries seems to be a > > >pretty forward thing. > > > > Obviously it is not an _easy_ pretty forward thing! Otherwise, the whole > > thing would be in place by now. > > > > On the other hand, PG database may not be capable of the Z39.50 imports but > > there are many MANY (if not all!) library cataloging software packages that > > will do it in a short time. The advantage of importing from the existing > > catalog entries is that we have our pick of what fits our needs for > > especially the subject fields. Of course there is always work to edit and > > customize them for the PG user database. > > > > I don't see why we can't have a commercial software to do most of the work > > and keep the existing catalog as a backup. 
> > > > And for the record, I have been involved in the PG cataloging effort for > > more than six years and anyone who says I am not interested in it any more > > is clearly not aware of the full facts. It may be quite disappointing when > > one's years of volunteer efforts have been deleted with the "new > > improvements"! > > > > Alev. > > an "official" librarian > > > > > Importing other people's MARC into our database will be much hairier. > > > > > > > > > > > >-- > > >Marcello Perathoner > > >webmaster@gutenberg.org > > > > > >_______________________________________________ > > >gutvol-d mailing list > > >gutvol-d@lists.pglaf.org > > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > > > > > > > > > > >--- > > >Incoming mail is certified Virus Free. > > >Checked by AVG anti-virus system (http://www.grisoft.com). > > >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Wed Nov 10 05:42:26 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 05:42:30 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com> References: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com> Message-ID: <41921AC2.20107@perathoner.de> Joshua Hutchinson wrote: > This sounds an awful lot like a Wikipedia entry. I think a loose > partnership between us and wikipedia would be useful here. We have > the book and a link could be made from the catalog page to that books > entry in Wikipedia. We already have many links into wikipedia from the author pages. I could implement that functionality for the bibrec pages too. Still somebody has to enter all the links... 
-- Marcello Perathoner webmaster@gutenberg.org From M.J.Farmer at bham.ac.uk Wed Nov 10 05:48:22 2004 From: M.J.Farmer at bham.ac.uk (Malcolm Farmer) Date: Wed Nov 10 06:20:46 2004 Subject: [gutvol-d] Re: catalog data In-Reply-To: <4192045C.4070908@perathoner.de> References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> <4191EE8C.7030109@perathoner.de> <4191F538.5030201@bham.ac.uk> <4192045C.4070908@perathoner.de> Message-ID: <41921C26.9090100@bham.ac.uk> Marcello Perathoner wrote: > Malcolm Farmer wrote: > >>> You could add a subject "Science Fiction" or "Cooking" entry to all >>> those books. >> >> >> Is there a simple process for doing this? > > > First you have to agree with Andrew on the subject headings you want > to tackle. Then you can build an ASCII-list like this: > > > Subject: Cooking > 1234 > 2345 > 3456 > [snip] > The numbers are the etext numbers. I will then import that data into > the database. That's the easiest way to get a *lot* of data into the > catalog. Oh, right then. It really *is* simple. In that case I'd be happy to volunteer to look up the numbers for the historical fiction texts listed in the bibliography I mentioned. That won't cover every book in this category (post-1900 works, for example, will be missing), but it should considerably expand that category's listing. From sly at victoria.tc.ca Wed Nov 10 07:14:54 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Nov 10 07:14:59 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com> References: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com> Message-ID: On Wed, 10 Nov 2004, Joshua Hutchinson wrote: > This sounds an awful lot like a Wikipedia entry. I think a loose partnership between us and wikipedia would be useful here. We have the book and a link could be made from the catalog page to that books entry in Wikipedia. 
> Then, we have a place to put all sorts of information about the book,
> the author, where it was published, the historical context it was
> conceived in, ... just about anything someone wants to add.

Yes, I think that in some ways Project Gutenberg and Wikipedia can
complement each other very well. As I see it, PG is about preserving
the original content of the printed material, and Wikipedia appears to
be an ideal place for all that extra information that we may have.

As someone (I believe Carlo) has mentioned, very often the people
involved in the scanning and digitizing of texts have more knowledge
about the author, the text itself, etc., which could be passed on to
either the PG online catalog or Wikipedia, as appropriate.

In the last few months, I have added countless links between Wikipedia
and the PG online catalog, sometimes creating new Wikipedia articles
for authors I think worthy of mention. However, it's still only a small
portion of what could be done. Anyone else interested?

Andrew

From Jeroen.Hellingman at kabelfoon.nl  Wed Nov 10 08:11:52 2004
From: Jeroen.Hellingman at kabelfoon.nl (Jeroen.Hellingman@kabelfoon.nl)
Date: Wed Nov 10 08:11:59 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
Message-ID: <20041110161152.95CC555639@betazoid.kabelfoon.nl>

On 10-11-2004 04:46, you wrote:

> writes:
>
> > I use foreign exclusively as a holder for the lang attribute.
>
> My problem with that is that it means there's no way to transform a
> document such that the foreign words are marked differently from
> the emphasized words, or not marked at all.

Well, the issue of course is that you first have to tag foreign words
as foreign, and emphasized words as emphasized; now what if a foreign
word is emphasized? The original typography will not always allow you
to distinguish those cases. That is the core reason for me using
foreign purely as a holder for the lang attribute rather than as
something semantic: I don't always know the intended semantics.
However, it is fairly easy to, for example, color all German words in a text green, based on the value of the lang attribute, or to print in roman all italic words in a certain language. Jeroen From tb at baechler.net Wed Nov 10 08:27:22 2004 From: tb at baechler.net (Tony Baechler) Date: Wed Nov 10 08:26:13 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <4191152C.9080702@perathoner.de> References: Message-ID: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> At 08:06 PM 11/9/2004 +0100, you wrote: >The design philosophy of the catalog database is: > > To help people find a book they may want to read. > >That includes both, people who already know which book they want and >people who want a suggestion. Hello list. Sorry if I seem to be complaining, but I must say that I find the current PG catalog to be mostly useless. I should qualify that. I can easily search through GUTINDEX.ALL to find a certain title or author. I've found that grep works great for that. However, there are no clues anywhere that tell me what a book is about, whether it's mystery, drama, nonfiction or something else, or even a basic subject classification. I admit that some of this might be found by using the search form or the gutenberg.org/etext1234 url, but from the standpoint of a user who is in a hurry and just wants something to read it's still inconvenient. Let's pick a random example of something which has been recently discussed. http://gutenberg.org/etext/1473 First, the link for in-depth information takes you to the volunteer pages. This is misleading since it looks like I would be able to find more information on the book. More than once I have followed that link only to find myself in the wrong place and I had to go back in my browser. Second, let's look at the subject. All it says is "fiction." OK, but about what? What category of fiction? 
While bookshare.org has a catalog not designed for professionals either, most books have a synopsis and are sorted by category. I have a possible suggestion for solving part of this. Put something in the newsletter asking people who read PG etexts to write summaries of them and categorize them. Somehow create a form which only allows books to be reviewed or summarized, maybe like a wiki but more confined. Someone would still manually approve the summary ("good" isn't helpful) and add it to the catalog. That would at least give the end user some idea of what a book is about first. Just for clarity, I would suggest that this summary, synopsis, categorization etc. would show up on the etext/1234 page and be added to the rdf feed but not appear in GUTINDEX.ALL. From marcello at perathoner.de Wed Nov 10 08:44:37 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 08:44:44 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> Message-ID: <41924575.2030708@perathoner.de> Tony Baechler wrote: > Second, let's look at the subject. All it says is "fiction." > OK, but about what? What category of fiction? While bookshare.org has > a catalog not designed for professionals either, most books have a > synopsis and are sorted by category. Everybody is complaining about the missing subject information. Complaining won't help. Stepping up and volunteering to enter the data would help. 
-- 
Marcello Perathoner
webmaster@gutenberg.org

From aakman at csufresno.edu  Wed Nov 10 08:59:20 2004
From: aakman at csufresno.edu (Alev Akman)
Date: Wed Nov 10 08:59:25 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <41924575.2030708@perathoner.de>
References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de>
Message-ID: <6.1.2.0.2.20041110085132.08917318@zimmer.csufresno.edu>

At 08:44 AM 11/10/2004, you wrote:
>Tony Baechler wrote:
>
>>Second, let's look at the subject. All it says is "fiction."
>>OK, but about what? What category of fiction? While bookshare.org has a
>>catalog not designed for professionals either, most books have a synopsis
>>and are sorted by category.
>
>Everybody is complaining about the missing subject information.
>
>Complaining won't help. Stepping up and volunteering to enter the data
>would help.

Maybe if the computer people stuck to "computering" and listened to how
the library world does it? After all, the library systems and
conventions have been in place for a while.

And, Marcello, my dear, don't give me that line about not having been
on the list since 3/18. Just because I don't believe in the diarrhea of
the mouth like some people we know : ) does not mean I do not care!

It would be good if the people who know the technical side would listen
to library requirements (whether _they_ think MARC records are needed,
or not!) once in a while. Otherwise, PG will be sentenced to being a
whoever, whatever kind of project.

Alev.

>-- 
>Marcello Perathoner
>webmaster@gutenberg.org
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>---
>Incoming mail is certified Virus Free.
>Checked by AVG anti-virus system (http://www.grisoft.com).
>Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004

From joshua at hutchinson.net  Wed Nov 10 09:09:52 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Nov 10 09:09:57 2004
Subject: [gutvol-d] [BP] The Future of eBooks
Message-ID: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com>

----- Original Message -----
From: Alev Akman
>
> Maybe if the computer people stuck to "computering" and listened to how the
> library world does it? After all, the library systems and conventions have
> been in place for a while.

Great! Answer my earlier question then. What fields should be mandatory
for us and which fields should be optional?

i.e.,

Author, title, Original publisher = mandatory.

Optional? Author birth/death dates? Which printing of the original
source we derived from? Others?

I'm not a librarian. I need someone knowledgeable to answer these
questions.

Josh

PS If we define good teiHeader information for each work, it becomes a
much simpler task for Marcello's cataloging scripts to find all sorts
of fun information for the reader.

From sly at victoria.tc.ca  Wed Nov 10 09:24:14 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov 10 09:24:21 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com>
References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com>
Message-ID: 

Something very similar to this has been attempted before, with rather
dismal results. Hardly anyone seemed interested in writing a little
synopsis (or "blurb").

On a few records in the online catalog, you will see a link labeled
"Reviews" which contains these. Many of them are actually only brief
excerpts from the text in question.

Andrew

On Wed, 10 Nov 2004, Tony Baechler wrote:

> I have a possible suggestion for solving part of this.
> Put something in the newsletter asking people who read PG etexts to
> write summaries of them and categorize them. Somehow create a form
> which only allows books to be reviewed or summarized, maybe like a
> wiki but more confined. Someone would still manually approve the
> summary ("good" isn't helpful) and add it to the catalog. That would
> at least give the end user some idea of what a book is about first.
> Just for clarity, I would suggest that this summary, synopsis,
> categorization etc. would show up on the etext/1234 page and be added
> to the rdf feed but not appear in GUTINDEX.ALL.

From sly at victoria.tc.ca  Wed Nov 10 09:32:23 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov 10 09:32:30 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <41924575.2030708@perathoner.de>
References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de>
Message-ID: 

On Wed, 10 Nov 2004, Marcello Perathoner wrote:

> Everybody is complaining about the missing subject information.
>
> Complaining won't help. Stepping up and volunteering to enter the data
> would help.

I don't believe we are ready. There is right now no agreement about
what form this data would take, or what standard to try to comply with.
If various volunteers all get to enter their own idea of what
categories and subject headings appeal to them, we will end up with a
mish-mash of conflicting and overlapping data.

I am no expert here, but I have read enough to know that doing subject
cataloging _well_ is more involved than most people realise.
Andrew

From brad at chenla.org  Wed Nov 10 09:56:33 2004
From: brad at chenla.org (Brad Collins)
Date: Wed Nov 10 09:58:29 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <6EBFB319.08D194C6.023039A8@aol.com> (Bowerbird@aol.com's message of "Wed, 10 Nov 2004 02:42:11 -0500")
References: <6EBFB319.08D194C6.023039A8@aol.com>
Message-ID: 

Bowerbird@aol.com writes:

> david starner said:
> the catalog just ain't gonna help someone
> know what kind of book they'd like to read.
> (and a marc record won't help them either.)
>
> that's a job that collaborative filtering
> will eventually do much better than anything
> you can do in the form of a catalog of any type.

But this won't be of any help to brick-and-mortar libraries who want to
integrate PG etexts into their existing catalogs. MARC is the best way
to accomplish this. This would also let PG offer a Z39.50 gateway to
the catalog, which would be very cool.

I like the distributed cataloging idea, but it's not the same as DP or
Wikipedia, which are brilliant at making it as simple and easy to
contribute as possible. Cataloging is not simple and it's not easy, and
if it's not correct and consistent it will result in a mess which will
do more harm than good.

That said, there are a number of steps in the process that can't be
easily automated but can be done in a distributed environment by people
fairly easily; these steps have to be identified, and then a mechanism
provided for people to contribute.

The catalog as it stands represents a lot more effort than a lot of
people realize. I hope people keep that in mind when they slam the
existing catalog.
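For a feel of what the MARC suggestion involves at the data level, here is a sketch of a record as MARC-style tagged fields. The tags follow common MARC 21 usage (100 personal name, 245 title, 856 electronic location), but this is a readable mock-up, not the ISO 2709 serialization a real Z39.50 gateway would exchange.

```python
# Sketch: a PG etext as MARC-style tagged fields. Tags 100, 245 and
# 856 are the usual MARC 21 fields for personal name, title and
# electronic location; subfield codes are shown in "$x value" form.
# This is a mock-up for readability, not a conforming binary record.
def format_field(tag, subfields):
    """Render one field as: TAG $a value $b value ..."""
    return tag + " " + " ".join(
        "$%s %s" % (code, value) for code, value in subfields)

record = [
    ("100", [("a", "Lonnrot, Elias")]),
    ("245", [("a", "The Kalevala")]),
    ("856", [("u", "http://www.gutenberg.org/etext/7000")]),
]

for tag, subs in record:
    print(format_field(tag, subs))
```

Even this toy form shows why correctness matters: a library system will index on these field numbers, so an inconsistent record pollutes the importing catalog.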
b/ -- Brad Collins , Bangkok, Thailand From Bowerbird at aol.com Wed Nov 10 10:10:16 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 10:10:25 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <128.4f68b436.2ec3b388@aol.com> i offered, many months ago, to shepherd a system that would provide a wiki for every e-text, which would contain all the info about that e-text (in a part of the wiki that was "locked") such as its change-log, and also allow users to write summaries and comments, make error-reports, and hold discussions. nobody provided the disk-space i would've needed to make it work. steve sakoman built a system of his own to do something similar. nobody provided him with the support he needed to make it work. now it appears that the gutvol-d merry-go-round has once again come to this spot as it runs around in circles. at this time, though, i'm heading out of here, because nothing seems to get done here, other than throwing more and more inconsistent e-texts on the pile. (and i might suggest, humbly, that a pause in _that_ machinery to rethink and plan might be a very good idea at this point in time.) all the good suggestions being made now have been made before, but the process for implementing them seems to be badly broken. if you want to fix _anything_, you will need to fix _that_ first... a wiki for each e-text is still a good idea, but someone else will have to step up and take responsibility for it.
-bowerbird From gbnewby at pglaf.org Wed Nov 10 10:28:28 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 10:28:30 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> References: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> Message-ID: <20041110182828.GA27968@pglaf.org> On Wed, Nov 10, 2004 at 12:09:52PM -0500, Joshua Hutchinson wrote: > > ----- Original Message ----- > From: Alev Akman > > > > Maybe if the computer people stuck to "computering" and listened to how the > > library world does it? After all, the library systems and conventions have > > been in place for a while. > > > Great! Answer my earlier question then. What fields should be mandatory for us and which fields should be optional? (I don't know) > ie, > > Author, title, Original publisher = mandatory. Quick note: "Author" isn't the only term. The categories are: Author Annotator Commentator Compiler Editor Illustrator Translator Unknown role These are the fields we use for copyright clearances & the online catalog. I think they match the MARC format too. They're used somewhat unevenly in the current eBook metadata field. -- Greg > Optional? Author birth/death dates? Which printing of the original source we derived from? Others? > > I'm not a librarian. I need someone knowledgeable to answer these questions. > > Josh > > PS If we define good teiHeader information for each work, it becomes a much simpler task for Marcello's cataloging scripts to find all sorts of fun information for the reader. Yes, this is the intent. Although the details are a little elusive right now, I think that including the authoritative catalog information in the XML file makes a lot of sense. The cataloging scripts are already ready for this. -- Greg From shalesller at writeme.com Wed Nov 10 10:28:53 2004 From: shalesller at writeme.com (D.
Starner) Date: Wed Nov 10 10:29:04 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> Andrew Sly writes: > Something very similar to this has been attempted before, with > rather dismal results. Hardly anyone seemed interested in writing > a little synopsis (or "blurb") What do you mean "has been attempted before"? If you mean the newsletter, you've got less than a week to write it up, and last time I submitted something to the newsletter, it got dropped into the void. If you're interested in synopses, then express that interest and say where we can direct the results, and I'm sure people will respond. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From Bowerbird at aol.com Wed Nov 10 10:32:56 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 10:33:08 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <9e.191bcb1d.2ec3b8d8@aol.com> brad said: > But this won't be of any help to brick and mortar libraries > who want to integrate PG etexts into their existing catalogs. why is this a priority of any kind? but perhaps i don't understand. just precisely what would it mean for a "brick and mortar library" to "integrate" this e-library into their catalog? that if i walk into the place and go to the catalog to look for a book, it will tell me that it's available online? d'uh, next time i'll stay home, and search google. why should a brick-and-mortar library want this? i thought the goal here was to create a global library, one that is available 24/7 from anyplace in the world, with millions of books that are never "unavailable" because they are "checked out" or "mis-shelved" or "awaiting reshelving" or "going through re-binding" or because "this branch has never had a copy of that book, sorry, you'll have to go to the main library downtown." am i the one who's not seeing things clearly?
or are you? > The catalog as it stands represents a lot more effort > than a lot of people realize. I hope people keep that in mind > when they slam the existing catalog. i agree. conversely, there's something perverse about giving people "credit" for working in a way that is clearly not efficient or productive... i could spend hours and hours and hours making a flyer, for instance, telling people how wonderful project gutenberg is, a flyer that would produce little effect out in the world. would you pat me on the back? or would you suggest instead that there is a better use for my energy? i humbly and respectfully suggest there is a better use for your energy. -bowerbird From gbnewby at pglaf.org Wed Nov 10 10:36:10 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 10:36:11 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <41924575.2030708@perathoner.de> References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de> Message-ID: <20041110183610.GB27968@pglaf.org> On Wed, Nov 10, 2004 at 05:44:37PM +0100, Marcello Perathoner wrote: > Tony Baechler wrote: > > >Second, let's look at the subject. All it says is "fiction." > >OK, but about what? What category of fiction? While bookshare.org has > >a catalog not designed for professionals either, most books have a > >synopsis and are sorted by category. > > Everybody is complaining about the missing subject information. > > Complaining won't help. Stepping up and volunteering to enter the data > would help. It's a little more complicated than that. I'll send a few messages more about this in a few minutes. The basic story is that the FIRST approach to cataloging our stuff will be "copy" cataloging. This includes adding subject terms, as well as regularizing the titles, authors and other data. This involves finding an existing catalog record in MARC format via OCLC or similar resources. 
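For anyone unfamiliar with how such existing records are actually retrieved, a Z39.50 lookup against the Library of Congress with the yaz-client tool (which Karl Eichwalder points to later in this digest) might look roughly like the session below. The endpoint, database name, and query are illustrative examples from this era, not guaranteed to be current:

```
$ yaz-client
Z> open tcp:z3950.loc.gov:7090/voyager
Z> format usmarc
Z> find @attr 1=4 "lady of the lake"
Z> show 1
Z> quit
```

Here "@attr 1=4" selects a title search in the Bib-1 attribute set, and "format usmarc" asks the server to return MARC 21 records, which can then be saved and edited locally before being shipped back into the PG catalog.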
Alev thinks this is possible for the majority of our works, even the very obscure ones and non-US items. The SECOND approach will be original cataloging, to create a record from scratch (or based on existing info like author records). This is something we'd like to do only when necessary. In either case, adding a new record requires looking at consistency with other records and other uses of the subject information, because these things tend to change over time. My view is that we will be able to get a corps of "distributed catalogers" to work on the first approach, though just as with distributed proofreaders, there will probably be different levels at which people feel comfortable/confident/competent in creating or changing records. ** I'll send some further info about how this could get underway. ** At some point soon, though, let's move this to the "gutcat" ** list. http://lists.pglaf.org to join -- Greg From ke at suse.de Wed Nov 10 04:00:28 2004 From: ke at suse.de (Karl Eichwalder) Date: Wed Nov 10 10:37:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <4191ED7A.3020203@perathoner.de> (Marcello Perathoner's message of "Wed, 10 Nov 2004 11:29:14 +0100") References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <4191ED7A.3020203@perathoner.de> Message-ID: Marcello Perathoner writes: > To request a MARC record I have to implement an obscure Z39.50 > protocol. You can use yaz-client as it comes with the YAZ toolkit (http://www.indexdata.dk/yaz/). Index Data also offers a database system: http://www.indexdata.dk/zebra/ (GPL). -- Key fingerprint = B2A3 AF2F CFC8 40B1 67EA 475A 5903 A21B 06EB 882E From shalesller at writeme.com Wed Nov 10 10:49:25 2004 From: shalesller at writeme.com (D. 
Starner) Date: Wed Nov 10 10:49:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110184925.013A24BE65@ws1-1.us4.outblaze.com> "Joshua Hutchinson" writes: > This sounds an awful lot like a Wikipedia entry. I > think a loose partnership between us and wikipedia > would be useful here. We have the book and a link > could be made from the catalog page to that books > entry in Wikipedia. Then, we have a place to put all > sorts of information about the book, the author, > where it was published, the historical context it was > conceived in, ... just about anything someone wants to add. But does every single book in the catalog deserve a Wikipedia entry? And a lot of details wanted are about the specific edition, when it was published and what not, that would never fit for a Wikipedia entry. That's a lot of short entries we'd be adding to Wikipedia. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From shalesller at writeme.com Wed Nov 10 11:02:11 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 11:02:18 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110190211.BC8A14BE64@ws1-1.us4.outblaze.com> Brad Collins writes: > Cataloging is not simple and it's not easy, and if it's not correct > and consistent it will result in a mess which will do more harm than > good. I've heard that about producing an ebook and about producing an operating system. I don't buy it. An incomplete list of science fiction books still helps. The fact that some of the computer books in the library are sorted under 510 and some under 000, like in the professionally catalogued Oklahoma State Library does not cause the roof to fall in. The worst thing a catalog can do is force you to try and handle things without it, which is what you'd be forced to do if you had no catalog. 
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From sly at victoria.tc.ca Wed Nov 10 11:02:29 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Nov 10 11:02:36 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> References: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> Message-ID: On Wed, 10 Nov 2004, D. Starner wrote: > Andrew Sly writes: > > > Something very similar to this has been attempted before, with > > rather dismal results. Hardly anyone seemed interested in writing > > a little synopsis (or "blurb") > > What do you mean "has been attempted before"? If you mean the newsletter, > you've got less than a week to write it up, and last time I submitted > something to the newsletter, it got dropped into the void. If you're > interested in synopses, then express that interest and where we can > direct the result to, and I'm sure people will respond. I mean that this has been tried before. (admittedly, it was a while ago, late in 2000) And, after the initial contributions, the interest died out. If enough people would like to contribute a brief synopsis for texts in the collection, we already have a place in the catalog where they can go. (although I don't know about the mechanics behind it) When I tried to make a few of these myself, I found that writing a good brief synopsis of a novel was harder than I would have thought. Andrew From gbnewby at pglaf.org Wed Nov 10 11:04:24 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 11:04:25 2004 Subject: [gutvol-d] MARC to the catalog Message-ID: <20041110190424.GA29073@pglaf.org> Here's some information to try to get subject cataloging moving forward. As you've seen, Alev (who cataloged our first 3500 or so books) has stepped up to try to help shape this project.
Andrew Sly has also stepped up, and has already been doing a lot of editing of existing catalog data. (I'm sending this to gutvol-d, but hope we can soon take this conversation to gutcat@lists.pglaf.org. Visit http://lists.pglaf.org to subscribe) One of our goals is to get proper subject headings into the Project Gutenberg catalog. ("Proper" means that they come from the Library of Congress Subject Headings corpus or similar authoritative source, and were generated by librarians or similarly clued-in people.) Currently, less than 1/4 of the Project Gutenberg collection has subject headings. Furthermore, the names we use for authors and titles are not always consistent. There are other limitations with the current catalog data, too. This message is partially to let people know how I think we'd like to start, and partially to ask Marcello and others (like Steve Thomas) to look at what it will take to move things forward. The basic scenario is that the easiest way to get authoritative catalog data (including subject headings) for our holdings is to find existing library catalog entries. There are some great resources and software for doing this, and a data interchange format called MARC. MARC stands for Machine Readable Cataloging, and it actually has a few variations. Essentially, it's delimited fields with data about an item: author, title, etc. (Many, many fields and subfields - most of which are not needed for a particular item.) ** What I'd like to enable is import of MARC records to the catalog, to update, augment or replace existing catalog entry for a particular item. This is harder than it might sound, for a variety of reasons. I'm appending a few MARC records from some recently released PG titles that Alev was able to find (yes, she found existing catalog records for these items, even though some are obscure and non-English). I don't want to over-specify how I think the workflow should happen. I think that's still to be determined. 
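As a concrete picture of the "delimited fields" structure described above, here is a toy encoder/decoder for the MARC transmission format (ISO 2709): a 24-byte leader, a directory of 12-byte entries, then the data fields themselves. This is a simplified sketch for illustration only (the leader uses placeholder status values, and real work would use an established MARC library, not hand-rolled code like this):

```python
FT, ST, RT = "\x1e", "\x1f", "\x1d"  # field, subfield and record terminators

def encode(fields):
    """fields: list of (tag, content) pairs; content uses ST before subfield codes."""
    data = "".join(content + FT for _, content in fields)
    directory, pos = "", 0
    for tag, content in fields:
        length = len(content) + 1              # +1 for the field terminator
        directory += f"{tag}{length:04d}{pos:05d}"
        pos += length
    base = 24 + len(directory) + 1             # leader + directory + terminator
    record_len = base + len(data) + 1          # + record terminator
    # Leader: record length, placeholder status/type codes, base address, '4500' map.
    leader = f"{record_len:05d}nam  22{base:05d}   4500"
    return leader + directory + FT + data + RT

def decode(record):
    base = int(record[12:17])                  # base address of data, from the leader
    directory = record[24:base - 1]            # 12-byte entries: tag, length, start
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields
```

The point is only that MARC's physical layout is mechanical: once a record is parsed into (tag, content) pairs, comparing or merging it with an existing PG catalog entry is ordinary data processing.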
But the overall flow needs to be somewhat circular: librarians need to import existing PG catalog records, preferably in MARC format, to existing software. (Alev has a couple of programs for this; PGLAF can probably acquire software for other folks who'd like to work a lot on this activity.) Then, updated records would need to be shipped back into the catalog. Below are some MARC records, also the listing of info in the PG catalog and our clearance records (which are incomplete - though we usually do have the page scans sent for clearance) -- Greg One format: Title Author Date Publisher Note Punch, or, The London Charivari. Punch (London, England) 1841 Published for the proprietors by R. Bryant "No. 1. For the week ending July 17, 1941. Price Threepence."--At head of cover. Lippincott's magazine of popular literature and science (none) 1871; 1871-1880 J.B. Lippincott and Co. Edited in Philadelphia for the first seventeen years by John Foster Kirk, Lippincott's Magazine published many notable English and American writers including Henry James, Oscar Wilde, Amelie Rive, Conan Doyle, and Rudyard Kipling. In addition to long and short fiction, there was much literary criticism and many book reviews and illustrated travel articles. Although the contents were of high quality, competition with popular New York magazines eventually caused Lippincott's to be sold in 1914 to McBride, Nast and Company who moved it to New York and changed the name to McBride's Magazine. After a short time, however, it was merged with Scribner's; Title from caption.; Microfilm. The Lady of the Lake Scott, Walter; Rolfe, W. J. 1922 Houghton Mifflin company (none) The authoritative life of General William Booth Railton, George Scott 1912; c1912 Hodder & Stoughton, George H. Doran company (none) Camp and trail Hornibrook, Isabel Katherine 1897 Lothrop publishing company (none) The Outdoor Girls in army service, or, Doing their bit for the soldier boys Hope, Laura Lee. 
1918; c1918 Grosset & Dunlap (none) Grace Harlowe's second year at Overton College Flower, Jessie Graham. 1914; c1914 Henry Altemus (none) Les trois mousquetaires Dumas, Alexandre; Le Courrier des ?tats-Unis 1846 P. Gaillardet At head of title: Semaine litt?raire du Courrier des ?tats-Unis. George Borrow Thomas, Edward. 1912 Chapman & Hall, ltd. "Bibliography of George Borrow": p.[323]-333. Another: 00796nam 2200217 a 45M0001001300000003000400013005001700017008004100034040001300075043001200088050002000100130002800120245003700148246002100185260006500206300003300271500008500304650004000389650003900429852011000468 NYUb11968217 NYU 19990310183125.0 990310s1841 enka 000 0 eng d aNNUcNNU ae-uk-en 4aAP101b.P8 1841 0 aPunch (London, England) 10aPunch, or, The London Charivari. 30aLondon Charivari aLondon :bPublished for the proprietors by R. Bryant,c1841. a14, [2] p. :bill. ;c30 cm. a"No. 1. For the week ending July 17, 1941. Price Threepence."--At head of cover. 0aEnglish wit and humorvPeriodicals. 0aPopular literaturezGreat Britain. aNNUbNYUbBobstbSpecColhAP101i.P8 1841712081221mNon-circulatingpN10964924t1yAvailable3no.15no.1 02256cas 2200373 a 45M0001001300000003000400013005001700017007001400034008004100048010001600089035002600105035001800131040002700149042000800176090005100184245007400235246002600309260005700335300002800392310001200420362004900432500069900481500002401180533015201204760004201356776006001398780006301458785002601521830005401547866007901601950001001680998010101690852009101791 NYUb10726168 NYU 19940713181853.0 hduafu---buca 890810d18711880miuuu p a 0uuua0eng d asn 85060910 a(CStRLIN)NYUG89-S4496 aGLIS007261686 aOAkUcOAkUdNdMHdNNU alcd i06/03/93 Th10/26/92 Th09/19/89 Th08/10/89 T 00aLippincott's magazine of popular literature and scienceh[microform]. 14aLippincott's magazine aPhiladelphia :bJ.B. Lippincott and Co.,c1871-1880. a20 v. :bill. ;c28 cm. aMonthly 0 aVol. 7, no. 1 (Jan. 1871)-v. 26 (Dec. 1880). 
aEdited in Philadelphia for the first seventeen years by John Foster Kirk, Lippincott's Magazine published many notable English and American writers including Henry James, Oscar Wilde, Amelie Rive, Conan Doyle, and Rudyard Kipling. In addition to long and short fiction, there was much literary criticism and many book reviews and illustrated travel articles. Although the contents were of high quality, competition with popular New York magazines eventually caused Lippincott's to be sold in 1914 to McBride, Nast and Company who moved it to New York and changed the name to McBride's Magazine. After a short time, however, it was merged with Scribner'szCf. American periodicals, 1741-1900. aTitle from caption. aMicrofilm.bAnn Arbor, Mich. :cXerox University Microfilms,d1972.e8 microfilm reels ; 4 in., 35 mm.f(American periodicals, 1850-1900 ; 317-324) 0 tAmerican periodical series, 1850-1900 1 tLippincott's magazine of popular literature and science 00tLippincott's magazine of literature, science and education 00tLippincott's magazine 0aAmerican periodical series, 1850-1900 ;v317-324. lBobst Microform dFilm 277 APS III R317-322e8908f0g5hj7-26k1871-1880 lBMICR a06/03/93tcs9110nNNUwDCLCSF8999097Sd08/10/89cMJDbSKHi930603h921026h890919h890810lNYUG aNNUbNYUbBobstbMicroform711635364mNon-circulatingpN10396809yAvailable5N10396809 00953cam 22002531 4500001000800000005001700008008004100025035002100066906004500087010001700132035001900149040001800168050002100186100003700207245009700244250002200341260006300363300005600426490004300482650005200525700005200577985002100629991004900650 9668953 20031210181225.0 830715s1922 msuab 000 0 eng 9(DLC) 25005333 a7bcbccoclcrplduencipf19gy-gencatlg a 25005333 a(OCoLC)9706316 aDLCcMsJdDLC 00aPR5308b.A1 1922 1 aScott, Walter,cSir,d1771-1832. 14aThe Lady of the Lake,cby Sir Walter Scott, Bart.; edited with notes by William J. Rolfe ... aRev. and enl. ed. aBoston,aNew York [etc.]bHoughton Mifflin companyc[1922] axvi, 272, [2] p. incl. 
front., illus., map.c17 cm. 0 a[Riverside literature series,vno. 53] 0aLady of the Lake (Legendary character)vPoetry. 1 aRolfe, W. J.q(William James),d1827-1910,eed. eOCLC REPLACEMENT bc-GenCollhPR5308i.A1 1922tCopy 1wOCLCREP 00614nam 2200157I 4500001000800000008004100008010001300049035001500062050001800077100003900095245014900134260006800283300005300351600003200404610002000436 1828178 830316s1912 nyucf 00010beng a13000924 a0313-23760 0 aBX9743.B7bR3 1 aRailton, George Scott,d1849-1913. 04aThe authoritative life of General William Booth,bfounder of the Salvation army,cby G. S. Railton ... with a preface by General Bramwell Booth. aNew York,bHodder & Stoughton, George H. Doran companyc[c1912] a7 p. l., 331 p.bfront., ports., facsim.c20 cm. 10aBooth, William,d1829-1912. 20aSalvation Army. 00595nam 2200181u 4500001000800000005001700008008004100025035002100066906004500087010001700132040001900149050001600168100006000184245002100244260004800265300004600313991005400359 5859494 00000000000000.0 810904s1897 mauf j 000 0 eng 9(DLC) 04016828 a0bcbccpremunvduencipf19gy-gencatlg a 04016828 aDLCcCarPdDLC 00aPZ7.H784bC 1 aHornibrook, Isabel Katherine,d1859- [from old catalog] 10aCamp and trail; aBoston,bLothrop publishing companyc[1897] a2 p.bl., 5-305 p. front. plates.c20 cm. bc-GenCollhPZ7.H784iCp00024749368tCopy 1wPREM 00478nam 2200145Ia 4500001000900000005001700009008004100026040002300067090002200090100002100112245010200133260004200235300002900277490002600306 10004676 19880111095007.0 831012s1918 nyua j 00011 eng d aNGUcNGUdm/cdBGU aPS3515.O585bO846 10aHope, Laura Lee. 14aThe Outdoor Girls in army service, or, Doing their bit for the soldier boys /cby Laura Lee Hope. 0 aNew York :bGrosset & Dunlap,cc1918. a212 p. :bill. ;c20 cm. 0 aOutdoor girls series. 
00497nam 2200157Ii 4500001000800000005001700008008004100025040001800066090002100084100002700105245007900132260004300211300002900254490003000283830002600313 2810423 19880329145958.0 770317s1914 xx j 00011 eng d aMNLcMNLdBGU aPS3511.L78bG758 10aFlower, Jessie Graham. 10aGrace Harlowe's second year at Overton College /cby Jessie Graham Flower. 0 aPhiladelphia :bHenry Altemus,cc1914. a248 p. :bill. ;c19 cm. 1 aThe college girls series. 0aCollege girls series. 00765nam 22002291 4500001000800000005001700008008004100025035002100066906004500087010001700132035002000149040001900169050002100188100003400209245005100243260003700294300001900331500007100350700004400421985002100465991004900486 9603832 19980421190136.0 850703s1846 nyu 000 1 fre 9(DLC) 03029683 a7bcbccoclcrplduencipf19gy-gencatlg a 03029683 a(OCoLC)12231807 aDLCcNBuUdDLC 00aPQ2228b.A1 1846 1 aDumas, Alexandre,d1802-1870. 14aLes trois mousquetaires,cpar Alexandre Dumas. aNew York,bP. Gaillardet,c1846. a268 p.c26 cm. aAt head of title: Semaine litt?eraire du Courrier des ?Etats-Unis. 2 aLe Courrier des ?Etats-Unis,cNew York. eOCLC REPLACEMENT bc-GenCollhPQ2228i.A1 1846tCopy 1wOCLCREP 00568nam 2200169I 4500001000800000005001700008008004100025010001300066040002400079050001400103100002000117245006200137260004200199300006900241500005000310600003800360 4929310 19880421065446.0 790504s1912 enkcfh b 00110 eng a13012350 aDLCcAMHdm.c.dm/c 0 aPR4156.T5 10aThomas, Edward. 10aGeorge Borrow,bthe man and his books,cby Edward Thomas. 0 aLondon,bChapman & Hall, ltd.,c1912. axi, 333, viii p., 1 ?.bfront., plates, ports., facsims.c23 cm. a"Bibliography of George Borrow": p.[323]-333. 10aBorrow, George Henry,d1803-1881. The above are for these entries: 1. Celsissimus (German) http://www.gutenberg.org/etext/13953 gbn0403071608: Arthur Achleitner, Celsissimus (german). user@host. 1902p. 3/21/2004. ok. (that's a cleared2.gbn clearance line) 2. 
The Pocket George Borrow http://www.gutenberg.org/etext/13957 OK 20041030020123thomas The Pocket George Borrow Edward Thomas 1912:c 3. Les trois mousquetaires (French) http://www.gutenberg.org/etext/13951 OK 20041019125907dumas Les trois mousquetaires Alexandre Dumas 1844:p 4. Grace Harlowe's Second Year at Overton College http://www.gutenberg.org/etext/6858 gbn520: Grace Harlowe's Second Year at Overton College, Jessie Graham Flower user@host. 1914c. 9/13/2002. ok. (that's a cleared.gbn , really old clearance line) 5. The Outdoor Girls in Army Service, Or, doing their bit for the soldier boys http://www.gutenberg.org/etext/7494 gbn560g: The Outdoor Girls in Army Service, Laura Lee Hope. user@host. 1918c. 9/10/2002. ok. gbn568: Laura Lee Hope, The Outdoor Girls in Army Service. user@host. 1918c. 9/13/2002. ok. (cleared twice, but looks like the same edition) 6. Camp and Trail, A Story of the Maine Woods http://www.gutenberg.org/etext/13946 OK 20040825223614hornibrook Camp and Trail Isabel Hornibrook 1897:c 7. The Authoritative Life of General William Booth http://www.gutenberg.org/etext/13958 gbn0403190519: G[eorge] S[cott] Railton, The Authoritative Life of General William Booth. user@host. 1912c. 3/23/2004. ok. 8. The Lady of the Lake http://www.gutenberg.org/etext/3011 The Lady of the Lake Walter Scott J. C. Byers 11/23/99 ok 82-83c (this cleared line is from Michael's Xeroxes) 9. Lippincott's Magazine of Popular Literature and Science, Vol. XVII. No. 101. May, 1876. http://www.gutenberg.org/etext/13956 gbn0405261819: various, Lippincott's Magazine v. 17 Jan-June 1875. user@host. 1876c. 5/26/2004. ok. OK 20040808141522various Lippincott's Magazine v. 17 Jan-Jun 1876 various 1876:c (two clearances for this, too. We often clear entire year-long or multi-year volumes for periodicals based on a single TP&V scan) 10. Punch, or the London Charivari, Vol. 152, June 27, 1917 1917 Almanack http://www.gutenberg.org/etext/13954 gbn0402060846: Various, Punch - Vol. 152.. 
user@host. 1917p. 2/6/2004. ok. And, just for fun: Title Author Date Publisher Note Editor Call Number Corporate Author Description Edition Illustrator ISBN ISSN Language LC Call Number Main Series Subject Heading Gone with the wind Mitchell, Margaret; Herman Finkelstein Collection (Library of Congress); Alfred Whital Stern Collection of Lincolniana (Library of Congress) 1936 Macmillan "Published May 1936"--Verso of t.p. Actual publication of the 1st ed. was delayed to June 30, 1936. Cf. Gone with the wind as book and film / Richard Harwell. c1983. P. [xv].; LC copy has dust jacket. Newspaper clipping from the Parade section, Oct. 31, 1976 and magazine clipping from Publishers weekly, Sept. 6, 1976 on author laid in.; Source: Gift of Herman Finkelstein, Dec. 30, 1980. (none) PS3525.I972 (none) 1037 p. 22 cm. (none) (none) (none) (none) eng (none) (none) Women From aakman at csufresno.edu Wed Nov 10 11:14:19 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 11:14:28 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> References: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> Message-ID: <6.1.2.0.2.20041110092605.02074690@zimmer.csufresno.edu> Joshua, I think the minimum mandatory fields should include: Author (including birth/death dates) Title (and/or Uniform Title) Subtitle (when it exists) Editor/Translator/Illustrator Date the book was published Physical properties of the "original print work" like number of pages, size of the book, illustrations, etc. Notes (Contents for collections, for example) Call numbers (LC/Dewey) Subjects Genre (that's where the Mystery, Historical Fiction, etc would come in) That's what I can think of now. Does the list help? Alev. At 09:09 AM 11/10/2004, you wrote: >----- Original Message ----- >From: Alev Akman > > > > Maybe if the computer people stuck to "computering" and listened to how > the > > library world does it? 
After all, the library systems and conventions have > > been in place for a while. > > >Great! Answer my earlier question then. What fields should be mandatory >for us and which fields should be optional? > >ie, > >Author, title, Original publisher = mandatory. > >Optional? Author birth/death dates? Which printing of the original >source we derived from? Others? > >I'm not a librarian. I need someone knowledgeable to answer these questions. > >Josh > >PS If we define good teiHeader information for each work, it becomes a >much simpler task for Marcello's cataloging scripts to find all sorts of >fun information for the reader. >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 -------------- next part -------------- --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 From joshua at hutchinson.net Wed Nov 10 11:15:46 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 10 11:15:53 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110191546.5ED692F8F7@ws6-3.us4.outblaze.com> ----- Original Message ----- From: Bowerbird@aol.com > > a wiki for each e-text is still a good idea, but someone else will have to > step up and take responsibility for it. As with everything else around here, someone has to step up. You act like you tried before... You didn't. You basically said, someone set up the wiki, someone decide this is how we are doing it, and then you would "shepherd" it. No one is going to be able to wave a magic wand to make things happen. If no one feels strongly enough about something to take control and make it happen ... it ain't gonna happen.
Josh From shalesller at writeme.com Wed Nov 10 11:17:58 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 11:18:27 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110191758.73D9C4BE64@ws1-1.us4.outblaze.com> Alev Akman writes: > Physical properties of the "original print work" like number of pages, size > of the book, illustrations, etc. How can this be mandatory? We've got a few composite books, that don't have a single print analogue, and many books where it would be hard or arbitrary to find an edition to get this information from. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From joshua at hutchinson.net Wed Nov 10 11:22:33 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 10 11:22:40 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110192233.0332C4F550@ws6-5.us4.outblaze.com> ----- Original Message ----- From: "D. Starner" > "Joshua Hutchinson" writes: > > > This sounds an awful lot like a Wikipedia entry. I > > think a loose partnership between us and wikipedia > > would be useful here. We have the book and a link > > could be made from the catalog page to that books > > entry in Wikipedia. Then, we have a place to put all > > sorts of information about the book, the author, > > where it was published, the historical context it was > > conceived in, ... just about anything someone wants to add. > > But does every single book in the catalog deserve a Wikipedia > entry? And a lot of details wanted are about the specific > edition, when it was published and what not, that would never > fit for a Wikipedia entry. That's a lot of short entries we'd > be adding to Wikipedia. I was thinking more of the summary/synopsis, author info, genre, etc. The stuff that everyone was saying we need in order to better use the collection. Author, publication date, etc belongs in MARC records or teiHeaders. 
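As a rough illustration of where fields like these could live in a teiHeader, here is a hedged sketch populated from the Lady of the Lake record quoted earlier in this digest. The element choices are generic TEI, not a statement of what PGTEI actually prescribes:

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Lady of the Lake</title>
      <author>Scott, Walter, Sir, 1771-1832</author>
      <editor>Rolfe, W. J. (William James), 1827-1910</editor>
    </titleStmt>
    <publicationStmt>
      <publisher>Project Gutenberg</publisher>
    </publicationStmt>
    <sourceDesc>
      <bibl>
        <publisher>Houghton Mifflin company</publisher>
        <date>1922</date>
        <extent>xvi, 272 p. : ill., map ; 17 cm</extent>
      </bibl>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <textClass>
      <keywords scheme="LCSH">
        <term>Lady of the Lake (Legendary character) -- Poetry</term>
      </keywords>
      <classCode scheme="LCC">PR5308.A1 1922</classCode>
    </textClass>
  </profileDesc>
</teiHeader>
```

Mapping elements like these to and from MARC fields (245, 100, 260, 650 and so on) is mechanical enough that scripts could do most of the work either direction.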
As far as I know, neither really has a mechanism for holding things like a summary or review information (I suppose since TEI is XML based, that functionality could be added, but I don't think that is the proper place for it). Josh From aakman at csufresno.edu Wed Nov 10 11:26:13 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 11:26:21 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110191758.73D9C4BE64@ws1-1.us4.outblaze.com> References: <20041110191758.73D9C4BE64@ws1-1.us4.outblaze.com> Message-ID: <6.1.2.0.2.20041110111917.08b7e760@zimmer.csufresno.edu> At 11:17 AM 11/10/2004, you wrote: >Alev Akman writes: > > > Physical properties of the "original print work" like number of pages, > size > > of the book, illustrations, etc. > >How can this be mandatory? We've got a few composite books that don't have >a single print analogue, and many books where it would be hard or arbitrary >to find an edition to get this information from. I was speaking for our future records. I am aware that some of our files are even compilations of various editions. Hopefully we are getting away from works obtained that way, maybe even redoing them. If we want dependable works, we should be able to prove our source. No more chickening out for copyright reasons. Part of the reason PG does not have the power it should with the libraries is that the information required for citation purposes is a mishmash. We should be ready to present the best replica of the work at hand and the information that goes with it. Alev. >-- >___________________________________________________________ >Sign-up for Ads Free at Mail.com >http://promo.mail.com/adsfreejump.htm > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com).
From Bowerbird at aol.com Wed Nov 10 11:28:59 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 11:29:14 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <193.328d8437.2ec3c5fb@aol.com> josh said: > As with everything else around here, someone has to step up. > You act like you tried before... You didn't. i not only "tried", i _did_ "step up". i developed a whole system to use. all i required was a place to put it. steve even took things a step further, and put up his system in his own space. and nobody went there to support him or got anyone else to go there to do it. (go there, right now, and you'll see it's true.) maybe that's because you are all talk and no action. or maybe it's because there is no demand from users. (certainly not to justify a huge expenditure of effort.) either way, there's no future in _this_ "future of e-books". so i'll be making my way out of here... -bowerbird From joshua at hutchinson.net Wed Nov 10 11:31:52 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 10 11:32:00 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110193152.1E5479E829@ws6-2.us4.outblaze.com> Thank you! One quick question jumped out at me. You mention Illustrations. Do you just mean a total number of illustrations or a list of illustrations in the work? Total will be easy... a list of illustrations is a lot of work!
:) Josh ----- Original Message ----- From: Alev Akman To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] [BP] The Future of eBooks Date: Wed, 10 Nov 2004 11:14:19 -0800 > > Joshua, > > I think the minimum mandatory fields should include: > > Author (including birth/death dates) > Title (and/or Uniform Title) > Subtitle (when it exists) > Editor/Translator/Illustrator > Date the book was published > Physical properties of the "original print work" like number of pages, size > of the book, illustrations, etc. > Notes (Contents for collections, for example) > Call numbers (LC/Dewey) > Subjects > Genre (that's where the Mystery, Historical Fiction, etc would come in) > > That's what I can think of now. Does the list help? > > Alev. > > > At 09:09 AM 11/10/2004, you wrote: > > > >----- Original Message ----- > >From: Alev Akman > > > > > > Maybe if the computer people stuck to "computering" and listened to how > > the > > > library world does it? After all, the library sytems and conventions have > > > been in place for a while. > > > > > >Great! Answer my earlier question then. What fields should be mandatory > >for our and which fields should be optional? > > > >ie, > > > >Author, title, Original publisher = mandatory. > > > >Optional? Author birth/death dates? Which printing of the original > >source we derived from? Others? > > > >I'm not a librarian. I need someone knowledgeable to answer these questions. > > > >Josh > > > >PS If we define good teiHeader information for each work, it becomes a > >much simpler task for Marcello's cataloging scripts to find all sorts of > >fun information for the reader. > >_______________________________________________ > >gutvol-d mailing list > >gutvol-d@lists.pglaf.org > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > >--- > >Incoming mail is certified Virus Free. > >Checked by AVG anti-virus system (http://www.grisoft.com). 
From marcello at perathoner.de Wed Nov 10 11:33:01 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 11:33:12 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <4191ED7A.3020203@perathoner.de> Message-ID: <41926CED.1080900@perathoner.de> Karl Eichwalder wrote: > You can use yaz-client as it comes with the YAZ toolkit > (http://www.indexdata.dk/yaz/). Ok. I got this far. For all of you who wondered what a MARC record looks like, here is an example:

000 01109cam 2200277 a 4500
001 708964
005 19980710092633.8
008 970604s1997 inuab b 001 0 eng
035 $9(DLC) 97023698
906 $a7$bcbc$corignew$d1$eocip$f19$gy-gencatlg
955 $apc16 to ja00 06-04-97; jd25 06-05-97; jd99 06-05-97; jd11 06-06-97;aa05 06-10-97; CIP ver. pv08 11-05-97
010 $a 97023698
020 $a0253333490 (alk. paper)
040 $aDLC$cDLC$dDLC
050 00 $aQE862.D5$bC697 1997
082 00 $a567.9$221
245 04 $aThe complete dinosaur /$cedited by James O. Farlow and M.K. Brett-Surman ; art editor, Robert F. Walters.
260 $aBloomington :$bIndiana University Press,$cc1997.
300 $axi, 752 p. :$bill. (some col.), maps ;$c26 cm.
504 $aIncludes bibliographical references and index.
650 0 $aDinosaurs.
700 1 $aFarlow, James Orville.
700 2 $aBrett-Surman, M. K.,$d1950-
920 $a**LC HAS REQ'D # OF SHELF COPIES**
991 $bc-GenColl$hQE862.D5$iC697 1997$tCopy 1$wBOOKS
991 $br-SciRR$hQE862.D5$iC697 1997$tCopy 1$wGenBib bi 98-003434

The first problem is: how do we relate existing and new books to LoC MARC records.
Meaning: we have to find out the Control Number (001) or the LoC Control Number (010) of every book we have. We need a few volunteers to build a list: etext-number => Control Number. Then we can import that list into the database. -- Marcello Perathoner webmaster@gutenberg.org From aakman at csufresno.edu Wed Nov 10 11:41:57 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 11:42:08 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110193152.1E5479E829@ws6-2.us4.outblaze.com> References: <20041110193152.1E5479E829@ws6-2.us4.outblaze.com> Message-ID: <6.1.2.0.2.20041110114008.08af1420@zimmer.csufresno.edu> What I meant was whether the book has any illustrations. Usually the catalog record also indicates if any or all are in color. Never the number/list of illustrations. Alev. At 11:31 AM 11/10/2004, you wrote: >Thank you! > >One quick question jumped out at me. > >You mention Illustrations. Do you just mean a total number of >illustrations or a list of illustrations in the work? Total will be >easy... a list of illustrations is a lot of work! :) > >Josh > >----- Original Message ----- >From: Alev Akman >To: Project Gutenberg Volunteer Discussion >Subject: Re: [gutvol-d] [BP] The Future of eBooks >Date: Wed, 10 Nov 2004 11:14:19 -0800 > > > > > Joshua, > > > > I think the minimum mandatory fields should include: > > > > Author (including birth/death dates) > > Title (and/or Uniform Title) > > Subtitle (when it exists) > > Editor/Translator/Illustrator > > Date the book was published > > Physical properties of the "original print work" like number of pages, > size > > of the book, illustrations, etc. > > Notes (Contents for collections, for example) > > Call numbers (LC/Dewey) > > Subjects > > Genre (that's where the Mystery, Historical Fiction, etc would come in) > > > > That's what I can think of now. Does the list help? > > > > Alev. 
> > > > > > At 09:09 AM 11/10/2004, you wrote: > > > > > > >----- Original Message ----- > > >From: Alev Akman > > > > > > > > Maybe if the computer people stuck to "computering" and listened to > how > > > the > > > > library world does it? After all, the library sytems and > conventions have > > > > been in place for a while. > > > > > > > > >Great! Answer my earlier question then. What fields should be mandatory > > >for our and which fields should be optional? > > > > > >ie, > > > > > >Author, title, Original publisher = mandatory. > > > > > >Optional? Author birth/death dates? Which printing of the original > > >source we derived from? Others? > > > > > >I'm not a librarian. I need someone knowledgeable to answer these > questions. > > > > > >Josh > > > > > >PS If we define good teiHeader information for each work, it becomes a > > >much simpler task for Marcello's cataloging scripts to find all sorts of > > >fun information for the reader. > > >_______________________________________________ > > >gutvol-d mailing list > > >gutvol-d@lists.pglaf.org > > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > > > > >--- > > >Incoming mail is certified Virus Free. > > >Checked by AVG anti-virus system (http://www.grisoft.com). > > >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 > > > > > > > > > > > --- > > Outgoing mail is certified Virus Free. > > Checked by AVG anti-virus system (http://www.grisoft.com). > > Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 > > > > > > > > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). 
From marcello at perathoner.de Wed Nov 10 12:01:07 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 12:01:16 2004 Subject: [gutvol-d] MARC to the catalog In-Reply-To: <20041110190424.GA29073@pglaf.org> References: <20041110190424.GA29073@pglaf.org> Message-ID: <41927383.9030308@perathoner.de> Greg Newby wrote: > I don't want to over-specify how I think the workflow should > happen. I think that's still to be determined. But the overall > flow needs to be somewhat circular: librarians need to import > existing PG catalog records, preferably in MARC format, to > existing software. (Alev has a couple of programs for this; PGLAF > can probably acquire software for other folks who'd like to > work a lot on this activity.) Then, updated records would need > to be shipped back into the catalog. I think an easier solution would be to build an ASCII list containing the etext-number and the LoC Call Number for all etexts we have. We would then import the LoC Call Number into a field in the database. The catalog software could then update a number of fields (Subject, LoC Class, Unified Title) automatically from the LoC database (TODO Check copyright status of LoC database !!!) Then we could do a manual pass over the database with the MARC record at hand and fix the author / coauthor attributions, link into wikipedia if an article exists, add summaries etc.
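[Editorial sketch] The "ASCII list" import Marcello describes would only need a few lines of scripting. This is a minimal sketch under assumptions the thread does not fix: a tab-separated file of etext-number and LoC number, and a hypothetical helper name `parse_etext_list`. It also sets aside etext numbers that map to more than one LoC number, since Greg's follow-up notes that matching is often ambiguous and needs a human pass:

```python
def parse_etext_list(text):
    """Parse 'etext-number<TAB>loc-number' lines into a mapping.

    Blank lines and '#' comments are skipped.  An etext number that
    appears with two different LoC numbers is treated as ambiguous
    and left for a librarian to resolve by hand.
    """
    mapping = {}
    ambiguous = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        etext, _, loc = line.partition("\t")
        if etext in mapping and mapping[etext] != loc:
            ambiguous.add(etext)
        else:
            mapping[etext] = loc
    # Drop the ambiguous entries from the clean mapping entirely;
    # they go on the "still unmatched" review list instead.
    for etext in ambiguous:
        del mapping[etext]
    return mapping, sorted(ambiguous)

if __name__ == "__main__":
    sample = "7000\t97023698\n7001\t98001234\n7000\t99005678\n"
    clean, needs_review = parse_etext_list(sample)
    print(clean)         # only the unambiguous matches
    print(needs_review)  # etext numbers a librarian must check
```

The clean mapping could then be bulk-imported into the catalog database, while the review list becomes the periodic "still unmatched books" report Marcello mentions later in the thread.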
-- Marcello Perathoner webmaster@gutenberg.org From brad at chenla.org Wed Nov 10 12:37:53 2004 From: brad at chenla.org (Brad Collins) Date: Wed Nov 10 12:39:58 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <9e.191bcb1d.2ec3b8d8@aol.com> (Bowerbird@aol.com's message of "Wed, 10 Nov 2004 13:32:56 EST") References: <9e.191bcb1d.2ec3b8d8@aol.com> Message-ID: Bowerbird@aol.com writes: > brad said: >> But this won't be of any help to brick and mortar libraries >> who want to integrate PG etexts into their existing catalogs. > > why is this a priority of any kind? > > but perhaps i don't understand. just precisely what would it mean for a > "brick and mortar library" to "integrate" this e-library into their catalog? > > that if i walk into the place and go to the catalog to look for a book, > it will tell me that it's available online? d'uh, next time i'll stay home, > and search google. why should a brick-and-mortar library want this? > Are you kidding? Check any of the major online catalogs. They all try to integrate records for electronic works. The ISBD spends a very large amount of spec-space on how to format records for electronic-format works. I'm sorry, I live in SE Asia, and in the libraries I've visited in Hong Kong, Thailand, Malaysia, Japan, Singapore and mainland China--even the smaller libraries have terminals which provide an electronic catalog. The rows of wooden drawers with paper catalog cards are still there but most people use the computer catalog. I haven't been back to the States much in the last 15 years but when I have I always spend time in some university library. They all use online catalogs. A growing number of brick and mortar libraries are now adding etexts to their collections. Sometimes they only provide links to websites but often they are local copies of the etexts which correspond to their catalog entry.
> i thought the goal here was to create a global library, one that is > available 24/7 from anyplace in the world, with millions of books > that are never "unavailable" because they are "checked out" or > "mis-shelved" or "awaiting reshelving" or "going through re-binding" > or because "this branch has never had a copy of that book, sorry, > you'll have to go to the main library downtown." > The goal here is to create etexts which can be used anywhere--in your home, in a high school, in a public library as well as over the Internet. A remote library in northern India may not be able to afford the bandwidth to download PG texts. But they can provide access to a CDROM collection of the PG texts. Librarians would love to be able to say "all copies have been checked out, but the etext is available in pdf, html and plain text" > am i the one who's not seeing things clearly? or are you? > I think what I'm seeing and saying is very clear. MARC is the present standard for the vast majority of bibliographic data for libraries. Libraries fill a very real need for their communities which will change over time but will not vanish because of the Internet. I'd like to hear one good reason why the catalog shouldn't be available in as many different formats as is needed for everyone to find and access PG texts? b/ -- Brad Collins , Bangkok, Thailand From gbnewby at pglaf.org Wed Nov 10 12:52:46 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 12:52:47 2004 Subject: [gutvol-d] MARC to the catalog In-Reply-To: <41927383.9030308@perathoner.de> References: <20041110190424.GA29073@pglaf.org> <41927383.9030308@perathoner.de> Message-ID: <20041110205246.GA457@pglaf.org> On Wed, Nov 10, 2004 at 09:01:07PM +0100, Marcello Perathoner wrote: > Greg Newby wrote: > > >I don't want to over-specify how I think the workflow should > >happen. I think that's still to be determined. 
But the overall > >flow needs to be somewhat circular: librarians need to import > >existing PG catalog records, preferably in MARC format, to > >existing software. (Alev has a couple of programs for this; PGLAF > >can probably acquire software for other folks who'd like to > >work a lot on this activity.) Then, updated records would need > >to be shipped back into the catalog. > > I think an easier solution would be to build an ASCII list containing > the etext-number and the LoC Call Number for all etexts we have. > > We would then import the LoC Call Number into a field in the database. > > The catalog software could then update a number of fields (Subject, LoC > Class, Unified Title) automatically from the LoC database (TODO Check > copyright status of LoC database !!!) > > Then we could do a manual pass over the database with the MARC record at > hand and fix the author / coauthor attributions, link into wikipedia if > an article exists, add summaries etc. I like this idea, but am concerned that there will still need to be human oversight. Just importing records will only work if there are unambiguous matches, and it seems that matching is often ambiguous. >From doing lots of copyright clearances, I know that many items are not in the LoC database (most of our non-English is not in there). But this would be a good start, and there are other national library catalogs that offer Z39.50 access to their records. -- Greg From shalesller at writeme.com Wed Nov 10 12:56:24 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 12:56:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110205624.5FAD64BE64@ws1-1.us4.outblaze.com> "Joshua Hutchinson" writes: > I was thinking more of the summary/synopsis, author info, genre, etc. Genre must be in our system to be useful; we want to search on that. Author birth and death is part of the MARC info. 
But still, is Wikipedia the right place for an article on every book ever written and every author who ever wrote a book? We should collaborate with them on the encyclopedia-worthy entries, but "The Influence of Old Norse Literature on English" isn't really encyclopedia-worthy, and neither is its author, Nordby, who wrote but one book. However, it might be worth extracting the author information from the book for PG's author page. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From hart at pglaf.org Wed Nov 10 12:57:29 2004 From: hart at pglaf.org (Michael Hart) Date: Wed Nov 10 12:57:29 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110184925.013A24BE65@ws1-1.us4.outblaze.com> References: <20041110184925.013A24BE65@ws1-1.us4.outblaze.com> Message-ID: Let's change the subject header to PG catalog so I can find the relevant messages easily. Thanks! Michael From shalesller at writeme.com Wed Nov 10 13:15:35 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 13:15:44 2004 Subject: [gutvol-d] MARC to the catalog Message-ID: <20041110211535.0B2644BE64@ws1-1.us4.outblaze.com> Marcello Perathoner writes: > TODO Check > copyright status of LoC database !!!) It was created by employees of the US federal government in the course of their work for the government. It's public domain. 
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From marcello at perathoner.de Wed Nov 10 13:47:55 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 13:48:06 2004 Subject: [gutvol-d] MARC to the catalog In-Reply-To: <20041110205246.GA457@pglaf.org> References: <20041110190424.GA29073@pglaf.org> <41927383.9030308@perathoner.de> <20041110205246.GA457@pglaf.org> Message-ID: <41928C8B.3010600@perathoner.de> Greg Newby wrote: >>The catalog software could then update a number of fields (Subject, LoC >>Class, Unified Title) automatically from the LoC database (TODO Check >>copyright status of LoC database !!!) > I like this idea, but am concerned that there will still need > to be human oversight. Just importing records will only work > if there are unambiguous matches, and it seems that matching > is often ambiguous. We can start to match the easy ones and leave the hard ones to our librarians. We can periodically output a list of still unmatched books. The fields I propose to import (Subject, LoC, Unified Title) should not be ambiguous. It doesn't matter which edition of a work we match. OTOH for new books still in the DP queue it might be wiser to match the exact edition down to the format and number of pages and the coffee stain on page 42. -- Marcello Perathoner webmaster@gutenberg.org From shalesller at writeme.com Wed Nov 10 13:51:37 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 13:51:54 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110215137.69DFD4BE64@ws1-1.us4.outblaze.com> Alev Akman writes: > >How can this be mandatory? We've got a few composite books, that don't have > >a single print analogue, and many books where it would be hard or arbitrary > >to find an edition to get this information from. > > I was speaking for our future records. 
I am aware that some of our files > are even compilations of various additions. Hopefully we are getting away > from works obtained that way, maybe even redoing them. If we want > dependable works, we should be able to prove our source. No more chickening > out for copyright reasons. It still can't be mandatory until we're willing to seek out editions for every single book we've done, either the exact same edition or be willing to update the text to an edition we can find. More towards what I was thinking, there are books that are compilations of several printed books. PG recently posted The Fifteen Comforts of Matrimony: Responses from Men and The Fifteen Comforts of Matrimony: Responses from Women. We could break those down into the individual 8 page pamphlets, but PG has generally discouraged that. Likewise, if I can ever find the papers, a book of short stories by Edna St. V. Millay will go to DP, consisting of several stories written for magazines but never collected as a book. There's no reason for us to break that up, either. If you enter "Het Esperanto" into the LoC catalog, you get "[Language, languages, and writing pamphlets]./1819-1947/51 items". So this is a practice that LoC engages in, in its own way. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From aakman at csufresno.edu Wed Nov 10 14:20:51 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 14:21:05 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110215137.69DFD4BE64@ws1-1.us4.outblaze.com> References: <20041110215137.69DFD4BE64@ws1-1.us4.outblaze.com> Message-ID: <6.1.2.0.2.20041110141155.08ce5598@zimmer.csufresno.edu> At 01:51 PM 11/10/2004, you wrote: >Alev Akman writes: > > > >How can this be mandatory?
We've got a few composite books, that don't > have > > >a single print analogue, and many books where it would be hard or > arbitrary > > >to find an edition to get this information from. > > > > I was speaking for our future records. I am aware that some of our files > > are even compilations of various additions. Hopefully we are getting away > > from works obtained that way, maybe even redoing them. If we want > > dependable works, we should be able to prove our source. No more > chickening > > out for copyright reasons. > >It still can't be mandatory until we're willing to seek out editions for >every single book we've done, either the exact same edition or be willing to >update the text to an edition we can find. > >More towards what I was thinking, there are books that are compilations of >several printed books. PG recently posted The Fifteen Comforts of Matrimony: >Responses from Men and The Fifteen Comforts of Matrimony: Responses from >Women. >We could break those down into the individual 8 page pamphlets, but PG has >generally discouraged that. Likewise, if I can ever find the papers, a book >of short stories by Edna St. V. Millay will go to DP, consistenting of several >stories written for magazines but never collected as a book. There's no reason >for us to break that up, either. Here's the tabbed text format of the title you mentioned: The Fifteen comforts of matrimony. La Sale, Antoine de 1795 [by Isaiah Thomas] and sold at the Worcester bookstore. Translation of: Les quinze joyes de mariage, sometimes attributed to Antoine de La Sale.; Printer's name supplied by Evans.; Evans; Microform version available in the Readex Early American Imprints series.; Electronic text and image data.; EvansDigital. (none) 144 p. ill. 16 cm. (12mo) (none) 16 cm. (12mo) (none) (none) (none) (none) (none) (none) eng (none) (none) 144 p. Printed at Worcester, Massachusetts Marriage.; Women; Women in literature. With an addition of three comforts more. 
: Wherein the various miscarriages of the wedded state, and the miserable consequences of rash and inconsiderate marriages are laid open and detected. (none) (none) (none) Obviously, fields shown as (none) would be shown as blank in the database. If necessary, we can have a catalog entry for the combined works but give the information for the individual works within the Notes field, indicating the titles, dates, whatever. And, here's the MARC version:

041 1 $a eng$h fre
245 04$a The Fifteen comforts of matrimony.$h [electronic resource] :$b With an addition of three comforts more. : Wherein the various miscarriages of the wedded state, and the miserable consequences of rash and inconsiderate marriages are laid open and detected.
260 $a Printed at Worcester, Massachusetts, :$b [by Isaiah Thomas] and sold at the Worcester bookstore.,$c 1795.
300 $a 144 p. :$b ill. ;$c 16 cm. (12mo)
500 $a Translation of: Les quinze joyes de mariage, sometimes attributed to Antoine de La Sale.
500 $a Printer's name supplied by Evans.
510 4 $a Evans$c 28948.
530 $a Microform version available in the Readex Early American Imprints series.
533 $a Electronic text and image data.$b [Chester, Vt. :$c Readex, a division of Newsbank, Inc.,$d 2002-2004.$e Includes files in TIFF, GIF and PDF formats with inclusion of keyword searchable text.$f (Early American imprints. First series ; no. 28948)
590 $a EvansDigital.
650 0$a Marriage.
650 0$a Women $v Anecdotes.
650 0$a Women in literature.
655 7$a ebooks$2 local.
655 7$a eresource$2 local.
655 7$a Facetiae.$2 rbgenr.
690 $a Evans Digital Edition.
690 $a Early American Imprints.
700 1 $a La Sale, Antoine de ,$d b. 1388?
752 $a United States$b Massachusetts$d Worcester.
830 0$a Early American imprints.$n First series ;$v no. 28948.
856 41$u http://libproxy.unm.edu/login?url=http://opac.newsbank.com/select/evans/28948 $y Click here $z to access Evans Digital Edition

You see also where we would attach our files? On field 856? Yup, it is doable.
You think a record like that where we can harvest what we need is useful? Alev. >If you enter "Het Esperanto" into the LoC catalog, you get >"[Language, languages, and writing pamphlets]./1819-1947/51 items". So this >is a practice that LoC engages in, in its own way. > >-- >___________________________________________________________ >Sign-up for Ads Free at Mail.com >http://promo.mail.com/adsfreejump.htm > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 -------------- next part -------------- --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 From Bowerbird at aol.com Wed Nov 10 15:22:26 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 15:22:42 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <190.32b05033.2ec3fcb2@aol.com> brad said: > Are you kidding? no. are you? > Check any of the major online catalogs. > They all try to integrate records for electronic works. yeah. but so what? i don't get the impression that the people who are looking for e-texts go to a library's catalog to try and find 'em... > The rows of wooden drawers with paper catalog cards are > still there but most people use the computer catalog. um, yes, i quite realize that. as far as i know, the "rows of wooden drawers" were effectively replaced a decade or more ago. they're still kind of quaint, though, don't you think? in spite of their digitized equivalents, however, google (and the others) are the new "card catalog". 
so i would think it would be _far_ more productive to implement a strategy that leverages the search engines (because, honestly, the system you have now does not), instead of plays into all of the antiquated systems. heck, i'd even like to see a decent system on the website. the one there now is good if you know the title and/or the author-name. that's a start. but it doesn't go very far, not in _recommending_ a book to a reader... > A growing number of brick and mortar libraries > are now adding etexts to their collections. > Sometimes they only provide links to websites > but often they are local copies of the etexts > which correspond to their catalog entry. well, that's all very nice. and if they did all of the work to integrate this e-library into their system, i'd say "thanks". but i don't see much purpose in doing that work myself, not when a whole host of other capabilities would be _far_ more useful, in my eyes, like full-text search... if getting to the patrons of a specific brick-and-mortar library could _only_ be done by getting myself in that specific catalog, it might be a completely different story. but when those patrons can just as easily use their computer to find my e-texts in google as to find them in the library's catalog, i don't see much difference. besides, show me one good system -- from any library out there! -- that helps a person find a book that they will like. show me! please! i said it before, and i'll say it again. collaborative filtering does this. spend time _productively_, building a collaborative filtering system. people don't decide what to read based on the info in a marc record. > A remote library in northern India may not be able to > afford the bandwidth to download PG texts. But they can > provide access to a CDROM collection of the PG texts. ok, except now you're talking about something different. putting a c.d. 
of the e-texts into a brick-and-mortar library -- or indeed, in every residence in india with a computer -- is a great idea. but that has little to do with marc records. > Librarians would love to be able to say > "all copies have been checked out, > but the etext is available in pdf, html and plain text" you'd think so. but michael reports he has encountered resistance. say what you will, i don't think "the absence of marc records" is why. (and when librarians finally decide they _do_ want a copy of the c.d., an absence of those marc records will be of zero consequence to them.) nonetheless, i'm sure their intelligent patrons will be able to find a copy. online. using google. to download. and burn copies for their neighbors. > I'd like to hear one good reason why the catalog shouldn't be > available in as many different formats as is needed > for everyone to find and access PG texts? the best reason of all is because no one seems to want to do the work. most of you say "it's a good idea" (although i haven't heard one single compelling reason _why_), but very few have done anything about it. (kudos to andrew. and what is his experience? "it's hard work!") now, what is your counterargument to that? yeah, that's what i thought. -bowerbird From marcello at perathoner.de Wed Nov 10 15:38:25 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 15:38:40 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> References: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> Message-ID: <4192A671.4010002@perathoner.de> Joshua Hutchinson wrote: > What fields should be mandatory for our and which fields > should be optional? Start with something like this: Note that the sourceDesc should always accurately describe the physical source. If you collect items from more than one phys. source all sources should be listed. 
Also if you split one physical source into multiple etexts, the source should appear in all etexts. Note the DP project number and the LoC Call Number. The LCCN should be given in the sourceDesc if it matches the physical source exactly. It should be given in the publicationStmt if it matches a different edition of the same work. Note: in the HTML file you'll have to enclose the teiHeader in comments and make sure you replace all occurrences of -- inside the header with — (a double hyphen is not allowed inside a comment). You don't have to provide the and stuff if you don't want to. I really just need the LCCN and can pull the rest from the LoC database. An absolutely minimal header should look like: DP Project Number goes here Same work LCCN goes here. Only exact source LCCN goes here. A full header should look like: Common sense Paine, Thomas (1737-1809) Project Gutenberg DP Project Number goes here Same work LCCN goes here. First PG Edition Brief notes on the text. Foner, Philip S. The collected writings of Thomas Paine New York Citadel Press January 1945 19 pp. Only exact source LCCN goes here. Library of Congress Subject Headings Library of Congress Classification Library of Congress Call Number Distributed Proofreaders Project Number Date text was created goes here: 1774 English. Political science United States — Politics and government — Revolution, 1775-1783 JC 177 January 2004 Joshua Hutchinson Tonya Allen Distributed Proofreaders Team Scanned and proofed it. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Nov 10 15:47:28 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 15:47:40 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <190.32b05033.2ec3fcb2@aol.com> References: <190.32b05033.2ec3fcb2@aol.com> Message-ID: <4192A890.6040201@perathoner.de> Bowerbird@aol.com wrote: > besides, show me one good system -- from any library out there! -- > that helps a person find a book that they will like. show me! please!
I think you should program one over the weekend in BASIC to keep up your record of phenomenal software success stories. > i said it before, and i'll say it again. collaborative filtering does this. > spend time _productively_, building a collaborative filtering system. Oh, no! He learned a new buzzword. He'll never let us hear the end of it. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Wed Nov 10 16:25:25 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 16:25:45 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <1da.2f1c0458.2ec40b75@aol.com> marcello said: > I think you should program one over the weekend in BASIC > to keep up your record of phenomenal software success stories. i'm busy this weekend. but i'll get to it next month, i promise... but it's appropriate that -- on my way out -- you'd try to smear me because i use _basic_ -- goodness gracious -- since that was exactly what you did when i first appeared. you also thought it was fashionable to try and ridicule me because i'm on a.o.l., so do try and work that in now too... because you can bet money that when i start up my blog and tell the world what a bunch of idiots are running the show here, i will make fun of you because you're a bunch of script kiddies who think that you'll rule the world because you know reg-ex. > Oh, no! He learned a new buzzword. > He'll never let us hear the end of it. actually, i've served my one-year sentence here, much of it in solitary, so i'll be leaving very shortly. oh, i learned that "buzzword" a long, long time ago. as far as "finding" things goes, it's a silver bullet... and if anyone wants to see it applied to books, the guy at alexlit was doing it many years ago. might wanna talk to him about it... -bowerbird From shalesller at writeme.com Wed Nov 10 17:21:38 2004 From: shalesller at writeme.com (D. 
Starner) Date: Wed Nov 10 17:21:52 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041111012138.989774BE64@ws1-1.us4.outblaze.com> Alev Akman writes: > > At 01:51 PM 11/10/2004, you wrote: > > >More towards what I was thinking, there are books that are compilations of > >several printed books. PG recently posted The Fifteen Comforts of Matrimony: > >Responses from Men and The Fifteen Comforts of Matrimony: Responses from > >Women. > > Here's the tabbed text format of the title you mentioned: > > The Fifteen comforts of matrimony. La Sale, Antoine de 1795 [by > Isaiah Thomas] and sold at the Worcester bookstore. But it's not. A book that included part of the current PG title was published and printed then; but the PG title also includes various other publications that weren't combined with it in book form. (It's probably a moot point, but that's not the text; that part of PG text was printed in 1706 in England. It may be a reprint, but it could be a separate translation or a different text trading on the popularity of the first.) -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From gbnewby at pglaf.org Wed Nov 10 17:30:08 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 17:30:10 2004 Subject: [gutvol-d] iPod transformation to try Message-ID: <20041111013008.GA10145@pglaf.org> I'm forwarding the information below in the hopes that some folks with iPods could check this out. Provided it works, I'll link it in our PDA "howto" and maybe we'll put a note about it in the newsletters. Please Cc any feedback to Dan, as well as gutvol-d. Thanks! Greg ----- Forwarded message from Dan Duris ----- From: Dan Duris To: Greg Newby Subject: Re: Converting Project Gutenberg's books for read on iPod Date: Thu, 11 Nov 2004 02:10:34 +0100 iPod eBook Creator allows you to convert any text file to notes (note files for iPod).
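[A split-and-link conversion of the kind Dan describes can be sketched in a few lines. This is only an illustration: the 4,000-byte note budget, the file naming, and the anchor-style links are my assumptions, not details of the actual service.]

```python
def split_into_notes(text, limit=4000, stem="book"):
    """Split `text` into note bodies of at most `limit` bytes,
    breaking on whitespace, and chain the resulting note files
    together with prev/next links appended to each body."""
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if current and len(candidate.encode("utf-8")) > limit:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)

    notes = []
    for i, body in enumerate(chunks):
        links = []
        if i > 0:
            links.append('<a href="%s%03d.txt">prev</a>' % (stem, i - 1))
        if i < len(chunks) - 1:
            links.append('<a href="%s%03d.txt">next</a>' % (stem, i + 1))
        notes.append(("%s%03d.txt" % (stem, i), body + "\n" + " ".join(links)))
    return notes
```

[Each note stays under the byte budget for its text, and the trailing links let a reader page forward and back through the chunks the way the message describes.]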
You can then read almost any book, magazine or message available in plain text format on your iPod. It's a free online service that basically splits large text files into small ones (due to the iPod's limitation on note files). It also links the notes, so you can easily browse them the same way as turning pages in a book. URL: http://www.ambience.sk/ipod-ebook-creator/ipod-book-notes-text-conversion.php Any comments are welcome at: dusoft@staznosti.sk (I am not subscribed to this mailing list) From stephen.thomas at adelaide.edu.au Wed Nov 10 17:51:24 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Nov 10 17:51:46 2004 Subject: [gutvol-d] PG Catalog In-Reply-To: <4191EFB7.6080203@perathoner.de> References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <200411091911.28485.lynne@rhodesresearch.biz> <200411100608.iAA68I6P016938@posso.dm.unipi.it> <4191EFB7.6080203@perathoner.de> Message-ID: <4192C59C.4080203@adelaide.edu.au> The central problem, if I've understood all the posts, is that the catalog entry is generated from the final header, which as we all know omits lots of detail which the volunteers have. Would it be possible to add manual cataloguing to the posting workflow? By which I mean, when a person (whitewasher?) posts a new text, they also edit the catalog to add whatever level of detail for the work is to hand. I understand that we don't want to add to the whitewasher's workload, but -- thanks to Marcello's web interface -- it is really quite easy to add to a catalog entry, so probably not a great deal of work in comparison to the work they already do. Of course, once all the TEI stuff is in place, this won't be necessary, but in the meantime ... Steve Marcello Perathoner wrote: > Carlo Traverso wrote: > >> The first step however is to have better PG records, and a method to >> avoid losing information from DP to the PG catalogue. > > > If you put a complete ...
somewhere in the > files, maybe at the back where it won't hurt much, I can easily pick it > out and parse it into the database. Of course it has to stay in the file > after being posted. > > What is happening now is that I parse the tiny header at the top of the > file and I get just what's there. > > > -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From stephen.thomas at adelaide.edu.au Wed Nov 10 17:51:34 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Nov 10 17:51:52 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de> Message-ID: <4192C5A6.3020006@adelaide.edu.au> Andrew Sly wrote: > > I don't believe we are ready. There is right now no agreement > about what form this data would take, or what standard to try > to comply with. > > If various volunteers all get to enter their own idea of what > categories and subject headings appeal to them, we will end up > with a mish-mash of conflicting and overlapping data. > > I am no expert here, but I have read enough to know that > doing subject cataloging _well_ is more involved than most > people realise. Yes indeed. Library systems use what's known as an authority file for subject headings (and also for authors). This lists only headings that are "authorised" -- e.g. for LCSH, conform to the LCSH standards. Now, PG is *never* going to have such a file (it would be huge) and I don't think it should -- LCSH is famously arcane and often seems rather arbitrary. (Although there are teams of librarians working day and night in a dark tower somewhere making sure that only the "correct" terms are used.
;-) Ideally though, there should be some guidelines about what terms should be used in the subject field, otherwise it will be less than useful. For example, if we are going to apply the term "Fiction" to some works of fiction, then it should be applied to all. Otherwise, its usefulness as a search term is diminished. The key problem is one of scale. Do you limit the field to a short list of valid terms ("fiction", "history", ...) and risk them being too broad to be useful, or do you allow a longer list with greater precision, and risk the list being too long to be manageable? Sorry, I don't have an answer to that. Needs debate. Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From aakman at csufresno.edu Wed Nov 10 18:39:54 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 18:40:10 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <276b28274fcb.274fcb276b28@cvip.net> ----- Original Message ----- From: "D. Starner" Date: Wednesday, November 10, 2004 5:21 pm Subject: Re: [gutvol-d] [BP] The Future of eBooks > Alev Akman writes: > > > > > At 01:51 PM 11/10/2004, you wrote: > > > > >More towards what I was thinking, there are books that are > compilations of > > >several printed books. PG recently posted The Fifteen Comforts > of Matrimony: > > >Responses from Men and The Fifteen Comforts of Matrimony: > Responses from > > >Women. > > > > Here's the tabbed text format of the title you mentioned: > > > > The Fifteen comforts of matrimony. La Sale, Antoine de 1795 [by > > Isaiah Thomas] and sold at the Worcester bookstore. > > But it's not. A book that included part of the current PG title > was published > and printed then; but the PG title also includes various other > publications that weren't combined with it in book form.
I was just trying to make a point and give an example. I had no way of knowing what the publication date for the title in question was. Did I? : ) Alev. > > (It's probably a moot point, but that's not the text; that part of > PG text was > printed in 1706 in England. It may be a reprint, but it could be a > separate > translation or a different text trading on the popularity of the > first.) From shalesller at writeme.com Wed Nov 10 20:01:10 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 20:01:29 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041111040110.D9C664BE64@ws1-1.us4.outblaze.com> Alev Akman writes: > > But it's not. A book that included part of the current PG title > > was published > > and printed then; but the PG title also includes various other > > publications that weren't combined with it in book form. > > I was just trying to make a point and give an example. I had no way of knowing what the publication date for the title in question was. Did I? : ) What point? It's obvious we're at cross purposes. The publication date had nothing to do with it. My point was that that data was wrong, because the PG book was a unique compilation of several original volumes; the fact that the publication date was wrong was a red herring.
From sly at victoria.tc.ca Wed Nov 10 23:59:57 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Nov 11 00:00:16 2004 Subject: [gutvol-d] PG Catalog In-Reply-To: <4192C59C.4080203@adelaide.edu.au> References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <200411091911.28485.lynne@rhodesresearch.biz> <200411100608.iAA68I6P016938@posso.dm.unipi.it> <4191EFB7.6080203@perathoner.de> <4192C59C.4080203@adelaide.edu.au> Message-ID: Hi Steve. The problem with this proposition is that at the time a whitewasher is working on the final posting of a text, there is no catalog record to edit yet. New records are only generated once a day, when the directories are automatically scanned to find any new files. Also, I do have the impression that the whitewashers would rather not deal with cataloging issues. (where a small change can suddenly require further following up in order to keep the catalog somewhat consistent, deal with further issues, etc.) As the closest thing we have to a "Catalog content supervisor" I will volunteer to work with additional information if we can find some way to get it to me--preferably via catalog[at]pglaf.org--from the people producing the texts. And I must add here that simply having a tei template in place will not remove the advisability of still manually looking through every record. With the amount of less-than-ideal modifications that can creep in when just dealing with a Title and Author, I can only think I would see more if more fields are included. Andrew On Thu, 11 Nov 2004, Steve Thomas wrote: > The central problem, if I've understood all the posts, is that > the catalog entry is generated from the final header, which as > we all know omits lots of detail which the volunteers have. > > Would it be possible to add manual cataloguing to the posting > workflow?
By which I mean, when a person (whitewasher?) posts a > new text, they also edit the catalog to add whatever level of > detail for the work is to hand. > > I understand that we don't want to add to the whitewasher's > workload, but -- thanks to Marcello's web interface -- it is > really quite easy to add to a catalog entry, so probably not a > great deal of work in comparison to the work they already do. > > Of course, once all the TEI stuff is in place, this won't be > necessary, but in the meantime ... > > > Steve From gld199 at yahoo.com Thu Nov 11 07:07:56 2004 From: gld199 at yahoo.com (Gemma Dearing) Date: Thu Nov 11 07:08:02 2004 Subject: [gutvol-d] browse books published in a specific year? Message-ID: <20041111150756.26633.qmail@web50803.mail.yahoo.com> Hi, I want to read books which were published in a specific range of years (1800-1820); is there any way I can browse/search for this? I realise that specific dates might not always be known, but even approximate date ranges would help a lot. TIA, Gemma. From scott_bulkmail at productarchitect.com Thu Nov 11 08:17:19 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Thu Nov 11 08:19:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> Message-ID: >If enough people would like to contribute a brief synopsis for texts >in the collection, we already have a place in the catalog they can >go.
(although I don't know about the mechanics behind it) What we don't have are links on the book's Web page which say: Add a summary Add a review When/if this gets implemented, I strongly recommend that the person's contribution is posted automatically and immediately. People want to see an immediate benefit from their effort (however modest), rather than remembering to check back to see if their voice was heard. To minimize spam, the software could email a copy to a gut-comments-verification list (or some such), and any authorized person could go in and delete/edit if needed. Note that I'm inverting the usual process: instead of requiring every contribution to be approved, only require extra effort in the rare case of spam or other problem. FWIW, I also think a simple registration system would be fine, e.g. verify that a commenter is a member of any gut* list, or just do a simple round-trip to verify that they are supplying a valid email address. >When I tried to make a few of these myself, I found that >writing a good brief synopsis of a novel was harder than I >would have thought. True, but we shouldn't let that stand in the way of easy cases. Sometimes it's enough to copy or excerpt the preface. For example: 13032 The Book of Noodles From the Preface: My design has been to bring together, from widely scattered sources, many of which are probably unknown or inaccessible to ordinary readers, the best of this class of humorous narratives, in their oldest existing Buddhist and Greek forms as well as in the forms in which they are current among the people in the present day. It will, perhaps, be thought by some that a portion of what is here presented might have been omitted without great loss; but my aim has been not only to compile an amusing story-book, but to illustrate to some extent the migrations of popular fictions from country to country. -- Cheers, Scott S.
Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From marcello at perathoner.de Thu Nov 11 08:35:51 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Nov 11 08:35:57 2004 Subject: [gutvol-d] browse books published in a specific year? In-Reply-To: <20041111150756.26633.qmail@web50803.mail.yahoo.com> References: <20041111150756.26633.qmail@web50803.mail.yahoo.com> Message-ID: <419394E7.9060607@perathoner.de> Gemma Dearing wrote: > I want to read books which were published in a > specific range of years (1800-1820); is there any way > I can browse/search for this? Not in the online catalog. There are many dates that would come in useful if we knew them: - date the text was created - date the text was first published - date the text was first published in the USA - date the edition was first published - date the last contributor died -- Marcello Perathoner webmaster@gutenberg.org From traverso at dm.unipi.it Thu Nov 11 09:27:55 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Thu Nov 11 09:28:07 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: (message from Scott Lawton on Thu, 11 Nov 2004 11:17:19 -0500) References: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> Message-ID: <200411111727.iABHRtHL008551@posso.dm.unipi.it> >>>>> "Scott" == Scott Lawton writes: >> If enough people would like to contribute a brief synopsis for >> texts in the collection, we already have a place in the catalog >> they can go. (although I don't know about the mechanics behind >> it) Scott> What we don't have are links on the book's Web page which Scott> say: Scott> Add a summary Scott> Add a review Scott> the person's contribution is posted automatically and Scott> immediately. People want to see an immediate benefit from Scott> their effort (however modest), rather than remembering to Scott> check back to see if their voice was heard. 
I see a risk in this: the easiest way to add a summary is to grab one somewhere on the net, or from a book cover, and paste it. But it may be copyrighted, and PG would risk being sued over it. Not that such a suit would necessarily succeed, but some publisher might use it as an excuse to try to shut down PG. This should be handled by another web site, completely separate, and not itself an interesting target. Access to that other site from the book catalogue page could then be arranged in a legally safe way. Carlo From sly at victoria.tc.ca Thu Nov 11 13:23:42 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Nov 11 13:23:51 2004 Subject: [gutvol-d] Sources for PG texts Message-ID: On Thu, 11 Nov 2004, Jon Noring wrote: > Let me clarify (again) below, what I wrote in a separate message. You > may still reject it, but PG's past carelessness and looseness leads to > legitimate questions about the accuracy and acceptance of the pre-DP- > era texts. "Corrupt" may be a strong word (and inaccurate), but not > placing "textual integrity" as #1 (including the perception of textual > integrity) is simply wrong. Note that perceptions are just as real as > reality itself. I would be careful about making a distinction between pre and post DP-era texts. Since the creation of DP, there have been countless texts added to the PG collection from other sources, without page images saved, or indication of exact editions used. Project Gutenberg has always been open to accepting texts from any source, as long as they can be copyright-cleared. I don't see any likelihood of this changing.
Andrew From sly at victoria.tc.ca Thu Nov 11 14:51:04 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Nov 11 14:51:15 2004 Subject: [gutvol-d] Cataloging In-Reply-To: <4192C5A6.3020006@adelaide.edu.au> References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de> <4192C5A6.3020006@adelaide.edu.au> Message-ID: (Yes, there is a mailing list for discussing cataloging issues, but it seems to have very little traffic, and I feel I may have a better chance of sharing my ideas with people here.) On Thu, 11 Nov 2004, Steve Thomas wrote: > The key problem is one of scale. Do you limit the field to a > short list of valid terms ("fiction", "history", ...) and risk > them being too broad to be useful, or do you allow a longer list > with greater precision, and risk the list being too long to be > manageable? > > Sorry, I don't have an answer to that. Needs debate. I don't have an answer either. So I'll ask a question: Is it possible to have both large and small scale? Here is one possible way this could be approached: In the recent discussion on this list regarding cataloging, I've seen mention of different things that I might label genre, form and subject. Genre would be examples such as Science Fiction, Mystery, Historical Fiction, etc. Form would be examples such as novel, essays, drama, poetry, short stories, etc., as Steve mentioned is coded in the MARC 008 field. Subject would be the subject headings one could find in a traditional library's catalog. For example: Legends--British Columbia--Vancouver We already have some examples creeping into the PG catalog of trying to cover all of these in the Subject field. (i.e. a collection of poems with "Subject: Poetry". This should be used for a book which is _about_ poetry, not one which merely contains poetry.)
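[Andrew's three divisions could live in separate fields of a catalog record rather than being crowded into a single Subject field. A minimal sketch; the field names and the tiny controlled vocabularies here are illustrative assumptions, not a proposal for actual PG practice:]

```python
from dataclasses import dataclass, field

# Illustrative controlled vocabularies; a real list would need
# the debate Steve asks for.
GENRES = {"Science Fiction", "Mystery", "Historical Fiction"}
FORMS = {"novel", "essays", "drama", "poetry", "short stories"}

@dataclass
class CatalogRecord:
    title: str
    author: str
    genre: set = field(default_factory=set)      # e.g. "Mystery"
    form: set = field(default_factory=set)       # e.g. "poetry"
    subject: list = field(default_factory=list)  # LCSH-style strings

    def validate(self):
        # A collection of poems gets form "poetry"; subject "Poetry"
        # stays reserved for books *about* poetry.
        unknown = (self.genre - GENRES) | (self.form - FORMS)
        if unknown:
            raise ValueError("uncontrolled terms: %r" % unknown)

rec = CatalogRecord("Example Title", "Example Author",
                    form={"poetry"},
                    subject=["Legends--British Columbia--Vancouver"])
rec.validate()  # passes: "poetry" is in the controlled form list
```

[Keeping genre and form as small controlled lists while leaving subject free-form is one way to have both the "small scale" and the "large scale" at once.]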
All three of these divisions could really be of great use to people using the catalog; however, having enough volunteer effort to have them consistently entered is of course a sticking point. Andrew From gbnewby at pglaf.org Thu Nov 11 18:24:13 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Nov 11 18:24:13 2004 Subject: Did this get slipped in without discussion? (was Re: [gutvol-d] Cleaning up messes) In-Reply-To: References: Message-ID: <20041112022413.GB8242@pglaf.org> (I redirected to gutvol-d@lists.pglaf.org. Who sent this to Lyris @ listserv.unc.edu? That server is broken, the list there is defunct. I have been trying to delete the list there for months, but the software is perpetually non-responsive) This will probably be my last response to Jon. Clearly, some people can't take "yes, go for it!" as an answer. Jon wants to tell other people how they should do things, but is unwilling to make things happen himself. He insists there is a "right" way of doing things, and belittles the efforts of those who don't fit his notions. My view is that Jon will not be content until all the people working on PG are ousted, in favor of his preferred organization, governance, fundraising, production rules, and collection guidelines. This is not going to happen anytime soon, and other than being critical of the status quo, Jon has contributed nothing towards making it happen anyway. Instead, Jon has repeatedly been offered the ability -- with support and encouragement -- to create the organization or content he so strongly desires. Some people can't take "yes" for an answer, or are not content with the ability to control their own domain without controlling others. A few more comments: On Thu, Nov 11, 2004 at 01:35:04PM -0700, Jon Noring wrote: > Greg Newby wrote: > > > I have only a few brief things to say about this. 
Jon, and other > > interested persons, are very much welcome to start their own projects, > > sub-projects or related activities to pursue this agenda, or other > > agendas. We (the messy ones) will provide encouragement and support. > > > > We have pretty extensive wording on this philosophy and encouragement > > in the "FAQ" items Michael and I wrote, online at > > > > http://gutenberg.org/about > > I urge all the PG people reading this to read Michael Hart's statement > of the principles of PG governance given in > > http://www.gutenberg.org/about/faq1 > > (Notice the date of it from June, and edited in October, after much of > the discussion about the organization and governance of PG. As far as > I know, this was silently put up without any announcement to the > group.) > > Was this statement of principles run by the actual owners of PG, the > thousands and thousands of volunteers who have donated their untold > hours of time to further Project Gutenberg? Did they get a chance to > discuss and approve of this statement? There were announcements with requests for feedback in about 6 *months* of weekly & monthly newsletters, with advance copies going back to around May. There was a posting to the front page of gutenberg.org, for months and months. There were at least a few mentions on gutvol-d. > Or is PG a "benevolent" dictatorship, where the volunteers-at-large > are not given any real say? You know better. > So much for democracy and decentralization, where "less is more." > (Orwell?) I see PG primarily as a meritocracy. Always, the pattern is to enable, empower, support and encourage those who want to do things to further the mission - or related activities. The people who do the most are the most active in shaping policy and future direction. Your insinuation that there are central power brokers who are insulated from the many people who are contributing is inconsistent with how things -- *all* things -- get done. > Who owns Project Gutenberg, anyway? 
Until that is clarified, nothing can be resolved. You know the answer to this, too. You are simply trying to stir up discontent and create an "us vs. them" atmosphere. For those who, unlike Jon, don't know: visit http://gutenberg.org/fundraising for a quick rundown. An even quicker rundown: - Michael created Project Gutenberg, and owns the trademarked name, "Project Gutenberg" - PGLAF was formed in 2001 as the legal entity that operates Project Gutenberg - PGLAF has four board members, including me. I'm also the CEO. - I am a volunteer for PGLAF, and have worked with PG since 1992. The extent to which Michael, or I, or PGLAF, has sway over the daily activities of PG is limited. Set direction: yes. Control some of the technologies: to some extent. Get people to do stuff: only as they agree & desire. The ability of Jon or anyone else to take leadership and make things happen is just as strong as mine, or anyone's. Flinging mud because so few people subscribe to your view of reality is certainly not going to create progress towards your goals. > > Finally and most importantly, I utterly reject Jon's accusation that > > the lack of source matter or other metadata (or formatting, or > > anything else) makes the Project Gutenberg content of today or > > yesterday "corrupt." > > Let me clarify (again) below, what I wrote in a separate message. You > may still reject it, but PG's past carelessness and looseness leads to > legitimate questions about the accuracy and acceptance of the pre-DP- > era texts. "Corrupt" may be a strong word (and inaccurate), but not > placing "textual integrity" as #1 (including the perception of textual > integrity) is simply wrong. Note that perceptions are just as real as > reality itself. > > I take it PG's official position, then, is that PG will continue > with the policy of not requiring the source information to be included > in the metadata associated with each PG text? If this policy is to > continue, why? Yes. There is very little that is required, and as the FAQs mentioned above say quite clearly, we intend to keep it that way. > If this policy has been changed, then that calls into > question those texts where the pedigree is unknown. Question them all you want. Or don't even read them. But if you want to fix them, get started, rather than talking about carelessness, inaccuracy, lack of textual integrity, etc. As I mentioned, I'm tired of saying "yes" to you, and then having you argue about it. You have all the freedom you could possibly want to do things your way. What you cannot have is control of the past or present of PG. > I'm pretty certain that the vast, overwhelming majority of PG > volunteers who do take a position either way on this issue want the > full source information to be included in the metadata. > > As I said in the followup clarification: > > "A clarification... > > "Note that certainly any third-party can attempt to verify the > authenticity of a PG text even if the source information is not known > and no scans are given. However, not giving the source work (and not > making the scans immediately available), the third-party has a much > more difficult time in verifying the text. You are envisioning frustrated scholars and others who care about such things. Those are not our target audience, and never have been. While it's likely that some such scholars have "turned off" to PG, I can tell you that there are close to zero requests for such pedigree information that come in on a monthly basis. In short, you are trying to portray your pet peeve as a universal truth, desired by all. First, again (and as stated in literally *every* PG header, for decades): we do not try to keep our books in accord with any particular print edition. We are not catering to scholars who care about particular dead trees sources. Second, I do not accept your idea that this is a major impediment to use and acceptance by scholars, or anyone else.
This is pure speculation on your part (regardless of whether it's backed up by a few personal stories), and counter-indicated by the uses and support requests we hear about. Finally, and perhaps most importantly from your point of view, I'm still saying "yes," not "no." I will be perfectly happy, overjoyed even, to have better tracking of source information, richer markup, and available scans for more of our eBooks. I expect that part of our cataloging discussion outcomes will be better facilities for doing this -- as will the outcomes of the PGTEI markup. But as I keep saying, (a) the lack of pedigree, scans, etc. is not going to stop us from adding submitted eBooks; (b) people who want to retroactively work on existing eBooks are welcome to do so. > "PG, by identifying the source document, *and* providing scans, adds a > lot of credibility (and greater usability) to the digitized texts it > produces and distributes. This action effectively says: "We are proud > of our work, and stand behind it fully. We even provide you, the user, > with full information about its pedigree, and the original page scans > are available for your use and easy verification." Once again, you are belittling the efforts of everyone who created these works. Did you ever hear the story about flies, honey & vinegar? > "Of course, it also aids in copyright clearance having the original > scans and full source information available. Scholars and researchers, > too, will now find the collection to be sufficiently authoritative for > their purposes, where now it is NOT. If PG wishes to become Big League, > it has to begin playing Big League ball." Your view of Big League ball for eBooks seems to include the following: - stating that all the work of past & current volunteers is crap. Or was it just "corrupt," or "careless?" Or "loose" and "messy?"
- dictating that all new content from all sources must include pedigree information and scans, and may only remain true to the printed dead trees edition - accepting only complete markup allowing for re-creation of the original printed word In my final words, I again encourage you to start your own effort to make such things happen. Use the PG mailing lists & newsletter to solicit like-minded participants. Work with DP to spin off your own projects there, or your own independent DP-like effort. Play in the big league. Cater to scholars. Include only the works you think pass muster. Build your own constituency. Meanwhile, you might want to review the documents in http://gutenberg.org/about and see again why your efforts to belittle past efforts or pursue your agenda to restrict current activities are rejected. -- Greg From Bowerbird at aol.com Thu Nov 11 20:00:52 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Nov 11 20:01:11 2004 Subject: [gutvol-d] a few questions that i don't know the answer to Message-ID: <9d.529136af.2ec58f74@aol.com> here are a few questions before i skip on out of here... first, for jon, and/or any other c.s.s. supporters out there: my finding is that, at least using internet explorer, when i copy c.s.s.-indented text (such as that in a block-quote) out of the browser-window, the indentation gets lost... (i'm not sure whether other browsers are able to retain the indentation.) do you have a solution to this problem? second, for greg. people over at distributed proofreaders have reported that the f.a.q. here at project gutenberg do not state that styled text (specifically, italics and bold) be marked with underbars and asterisks in the text files. the understanding i have from you is that this has become the official policy of project gutenberg. if that's not the case, would you please inform people here? and if it _is_ the case, when you next update the f.a.q., could you include this policy? thank you.
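(for the record, here's a minimal sketch of how the convention i'm describing could be handled mechanically -- the function name and regexes are mine, illustrative only, not any official p.g. or d.p. tool:)

```python
import re

def styled_to_html(text):
    """convert the plain-text styling convention under discussion
    (_underbars_ for italics, *asterisks* for bold) into html tags.
    an illustrative sketch only -- not an official pg/dp tool."""
    text = re.sub(r'_([^_]+)_', r'<i>\1</i>', text)    # _word_ -> <i>word</i>
    text = re.sub(r'\*([^*]+)\*', r'<b>\1</b>', text)  # *word* -> <b>word</b>
    return text

print(styled_to_html("a _truly_ *bold* claim"))
# a <i>truly</i> <b>bold</b> claim
```

the point being: if the convention were actually stated in the f.a.q., a dozen lines like these would let any downstream tool recover the styling from the plain-text files.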
the next questions involve long-standing and often-repeated requests that i have made for other changes to p.g. policy. having received no satisfaction, in spite of the reasonableness of these requests, i will pursue a strategy of lobbying for them to a wider audience, but i make them again here for the record. 1. could you _please_ strive for consistency in your e-books? 2. could you please ensure the policy on styled text is upheld? 3. could you please start including graphic-file-names in your plain-text versions, so my viewer-app knows what to display where? 4. could you please start including page-break information in your plain-text versions, so my viewer-app can use original page-breaks for those end-users that might desire that capability? 5. could you please start including line-break information in your plain-text versions, for the same reason? my documentation on zen markup language (z.m.l.) will demonstrate how you can incorporate these requests into your plain-text files... i'll continue to monitor this listserve until the end of the year, so you can respond here, or backchannel, whichever you prefer. not that i expect a response, since i've never gotten one so far. david widger and jim tinsley, i salute you for all the hard work you do. j. michael, thanks for being the one person who always treated me fairly here. and carlo, your record was almost as good; and as i've told you backchannel, you are one of the very few people here who quite consistently demonstrates a solid grasp on the problems and has good ideas about how to build the solutions to those problems, so keep on asserting yourself, because these people really need you. michael hart, if there's anything i can do for you, let me know. 
-bowerbird From lofstrom at lava.net Thu Nov 11 20:45:02 2004 From: lofstrom at lava.net (Karen Lofstrom) Date: Thu Nov 11 20:45:18 2004 Subject: [gutvol-d] PG audience In-Reply-To: <20041112022413.GB8242@pglaf.org> References: <20041112022413.GB8242@pglaf.org> Message-ID: On Thu, 11 Nov 2004, Greg Newby wrote: > You are envisioning frustrated scholars and others who care > about such things. Those are not our target audience, and never > have been. While it's likely that some such scholars have "turned off" > to PG, I can tell you that there are close to zero requests for > such pedigree information that come in on a monthly basis. Well, yes, because scholars AREN'T using PG, that's why you don't get any requests. At DP, we're processing things that no one but a scholar will ever read. Ever. I'm proofreading one of Canon Sells' books about Islam. No one who is interested in current, up-to-date information is going to read this book. It's antiquated. However, some scholar working on a book re "history of Western perceptions of Islam" might be thrilled to get access to an old out-of-print work. If he/she feels the work is reliable, that is. If you don't want to cater to scholars, you're throwing away much of DP's work. -- Karen Lofstrom {Zora on DP} From traverso at dm.unipi.it Thu Nov 11 21:37:34 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Thu Nov 11 21:37:53 2004 Subject: [gutvol-d] PG audience In-Reply-To: (message from Karen Lofstrom on Thu, 11 Nov 2004 18:45:02 -1000 (HST)) References: <20041112022413.GB8242@pglaf.org> Message-ID: <200411120537.iAC5bYHk016230@posso.dm.unipi.it> One of the problems is that, until recently, the whitewashers removed the information on the origin of the book, like the date of the edition, publisher, etc.; and the change in policy has not been sufficiently advertised, so some people (even at DP) remove the information to conform to the perceived PG policy.
We should at least change the official policy to recommend including the full information on the sources (as well as information on e.g. page numbers when it is useful, e.g. when there is an index or cross-references by pages, or when the origin is a standard reference). I believe that PG has space for everything: combined editions, abridged editions (provided they are stated to be abridged editions...), scholarly editions. Which is which should, however, be stated, and be accessible through the catalogue. Cataloguing work may be distributed. I am sure that at DP a cataloguing step done by specialized volunteers might be added, and probably extended to non-DP submissions. The same team might be willing to update the existing items, starting from past DP contributions but extending to the other PG items. But please let us start to have sound cataloguing procedures for the future. For example, PG should have a separate whitewashing step for the catalogue (that might be done by a separate team, the competences required being different). Carlo From gbnewby at pglaf.org Thu Nov 11 23:33:29 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Nov 11 23:33:31 2004 Subject: Marking bold & italics in .txt (was Re: [gutvol-d] a few questions that i don't know the answer to) In-Reply-To: <9d.529136af.2ec58f74@aol.com> References: <9d.529136af.2ec58f74@aol.com> Message-ID: <20041112073329.GA18841@pglaf.org> On Thu, Nov 11, 2004 at 11:00:52PM -0500, Bowerbird@aol.com wrote: > second, for greg. people over at distributed proofreaders > have reported that the f.a.q. here at project gutenberg > do not state that styled text (specifically, italics and bold) > be marked with underbars and asterisks in the text files. > the understanding i have from you is that this has become > the official policy of project gutenberg. if that's not the case, > would you please inform people here? and if it _is_ the case, > when you next update the f.a.q., could you include this policy? > thank you.
Jim maintains the FAQ, and DP has their own style guides that sometimes vary for different texts. So, I'm not really the right guy to ask. I don't think there was agreement on how to handle bold & italics, but I do think everyone I heard from agreed it should be indicated somehow in plain text. So, I don't think there is an official policy on handling bold & italics in plain text files. But if DP has an official policy I'm unaware of, then it should probably be reflected in the FAQ as a recommendation. Sorry I don't know the current state on this, but perhaps Jim or some of the DP project managers can contribute the latest thinking. -- Greg From Bowerbird at aol.com Thu Nov 11 23:41:53 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Nov 11 23:42:14 2004 Subject: [gutvol-d] goodbye Message-ID: <1d1.2be96420.2ec5c341@aol.com> nearly exactly one year ago, i came on the gutvol listserves, challenging the gutvol-p markup wonks as all-talk/no-action, noting that in spite of years of discussion, they hadn't made any of the promises of a heavy-markup strategy materialize, and indeed hadn't even gotten many of the e-texts marked up! a year later, after an exchange of literally hundreds of messages -- maybe thousands, i don't know, i wasn't really counting them -- many of them inflammatory, the situation is essentially the same. i tried to wake people up to a simpler way that was still effective. but nobody here was willing to hear it. in fact, i was "moderated", blamed for the flaming that my detractors used to victimize me. i don't play the victim role though, so i'm off now, to work on my own. i'll drop by again in another year, and see if y'all have made any progress by then, or if you're still stuck in the same old merry-go-round circles... in the meantime, i'll start up a blog, telling the world all the things that i think you're doing wrong. because i think they deserve to learn them. 
i tried to tell you all this personally, as friends, here on your own lists, but you weren't willing to listen. so now i'm going public. i tell you this so you know i'm not going "behind your backs". heck, you might want to read the blog yourself; maybe it'll help you see what you've been missing. here's a message i posted a while back. you can call it "21 steps to happiness". ------------------------------------------------------------------------------ here's a little _overview_ to help you get your bearings, at least in regard to _my_ work, _my_ viewer-program, _my_ format, _my_ markup system, and _my_ philosophy. 1. the e-texts -- as they are now -- must be regularized. 2. i can write programs to do most of that automatically. 3. the results need to be checked for quality control, and 4. some missing information will need to be re-inserted. 5. once that is done, the files will be _finished_, in that 6. my viewer will present them as high-powered e-books. 7. users can push a button to create high-end .html files, 8. or save text as an .rtf file, or print out to paper or .pdf, 9. in a way that gives 'em customized high-quality output. 10. my program will do text-to-speech, and screenshots, 11. and let people explore the project gutenberg library, 12. and easily report errors they encounter in any e-text. 13. those error-correction reports will be automatically 14. routed to a system that presents all the material, so 15. a human only has to say "yes" to approve the mod, and 16. change-logs will be updated and a notice distributed. 17. this e-text standardization and ease of handling will 18. nurture a flowering of synergistic uses of the library 19. by an array of creative and imaginative programmers 20. that will engender a book-driven revolution in thought. 21. and everyone will live happily ever after. the end. 
------------------------------------------------------------------------------ in the absence of a more compelling vision from anyone else, i now depart... -bowerbird From bill at truthdb.org Fri Nov 12 00:04:48 2004 From: bill at truthdb.org (bill jenness) Date: Fri Nov 12 00:05:31 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <200411111727.iABHRtHL008551@posso.dm.unipi.it> References: <200411111727.iABHRtHL008551@posso.dm.unipi.it> Message-ID: <32924.134.117.137.186.1100246688.squirrel@134.117.137.186> >>>>>> "Scott" == Scott Lawton >>>>>> writes: > > >> If enough people would like to contribute a brief synopsis for > >> texts in the collection, we already have a place in the catalog > >> they can go. (although I don't know about the mechanics behind > >> it) > > Scott> What we don't have are links on the book's Web page which > Scott> say: > Scott> Add a summary > Scott> Add a review > > > Scott> the person's contribution is posted automatically and > Scott> immediately. People want to see an immediate benefit from > Scott> their effort (however modest), rather than remembering to > Scott> check back to see if their voice was heard. > > I see a risk in this: the easiest way to add a summary is to grab one > somewhere in the net, or in a book cover, and paste it. But it may be > copyrighted, and PG will risk being sued for this. Not because of > this, but some publisher might try to shut down PG with this excuse. > > This should be made by another web site, completely separated, and not > being an interesting target. The way of accessing this other web site > from the book catalogue page might be done in a legally safe way. > > Carlo > > I think the way to go is to have a pg wiki linked to the catalog page where the users could input reviews, literary commentary, author biographical details, and etc. This would allow DP and other producers to concentrate on producing and not get bogged down with researching extraneous useful facts. 
I am certain there are some open source wikis available that could be adapted. Perhaps the documentation side could be set up as a separate foundation. From hyphen at hyphenologist.co.uk Fri Nov 12 00:33:10 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Nov 12 00:33:49 2004 Subject: [gutvol-d] Sources for PG texts In-Reply-To: References: Message-ID: <44t8p05ohkte7iufgtckqnr3kbikevmkcn@4ax.com> On Thu, 11 Nov 2004 13:23:42 -0800 (PST), Andrew Sly wrote: | | | On Thu, 11 Nov 2004, Jon Noring wrote: | | > Let me clarify (again) below, what I wrote in a separate message. You | > may still reject it, but PG's past carelessness and looseness leads to | > legitimate questions about the accuracy and acceptance of the pre-DP- | > era texts. "Corrupt" may be a strong word (and inaccurate), but not | > placing "textual integrity" as #1 (including the perception of textual | > integrity) is simply wrong. Note that perceptions are just as real as | > reality itself. | | | I would be careful about making a distinction between pre and post | DP-era texts. Since the creation of DP, there have been countless | texts added to the PG collection from other sources, without page | images saved, or indication of exact editions used. | | Project Gutenberg has always been open to accepting texts from | any source, as long as they can be copyright-cleared. I don't | see any likelihood of this changing. So, all you volunteers who have specialized interests and want Out of Copyright books about your special interest available forever: get working. -- Dave F From jon at noring.name Fri Nov 12 01:23:49 2004 From: jon at noring.name (Jon Noring) Date: Fri Nov 12 01:24:24 2004 Subject: [gutvol-d] My spanking, and my reply In-Reply-To: <20041112022413.GB8242@pglaf.org> References: <20041112022413.GB8242@pglaf.org> Message-ID: <109886230796.20041112022349@noring.name> Greg wrote: > Jon wrote: Wow, my backside is really sore from the spanking Greg just administered to me.
Some of the spanking was deserved, but some of it was not, imho. More on that later, but first I'd like to give some thoughts on the problems with the gutvol-d list and archive before answering several of Greg's comments. (Walking slowly...) > (I redirected to gutvol-d@lists.pglaf.org. Who sent this > to Lyris @ listserv.unc.edu? That server is broken, the list > there is defunct. I have been trying to delete the list there > for months, but the software is perpetually non-responsive) I'm not sure. I think I directed all my replies to the right place. Btw, I tried to search the gutvol-d archives with regards to the FAQ0 and FAQ1 issue (that is, how much was it really discussed on the lists as Greg said it was?), and noticed that indeed the archive appears broken -- everything before August is gone. James Linden told me that the older archives may be lost for good, at least the Lyris version. Did anyone here keep their own copy of the gutvol-d (and I suppose other gut*) archives? I've kept full backup archives of the several dozen mailing lists I've run since 1992 (by simply collecting all the emails sent out in plain text unix mbox format), but not lists I don't run, thinking that those who administer them do as I do and create redundant backups in a universal plain text format (as Michael would approve!) Since I've lately been sticking my nose into various affairs here that some think I should not, I may as well do it one more time and give another opinion that the various Gutenberg lists be moved to YahooGroups (with 2-3 people designated as backup archivists in unix mbox format -- I'll gladly volunteer to be one of the backup archivists since I already do that for over twenty lists I run and co-administer.) Why YahooGroups and not some listserv software running on PG's own server? 1) I've had experience running various listserv software since 1992, and I find a lot of time is saved when someone else does it for me as YahooGroups does.
2) YahooGroups is actually very good and reliable, and since so many people now subscribe to one or more YG lists, it's easy to subscribe to one more. My decision to move The eBook Community, now with over 2400 subscribers, to YahooGroups in 1999 has proven to be the right decision. With a custom listserv run by PG, it's just another list I have to separately subscribe to, and if I have to change my email address, it's another separate service I have to contend with. YahooGroups consolidates all my subscribed lists into an easy, manageable form that no other listserv software comes close to in power and convenience. 3) YahooGroups includes other useful services, such as a Files Area and facilitated YahooIM Chat. 4) It's free! It doesn't take up any diskspace or bandwidth on the local server. (There are the insufferable ads, though, but these are easily ignored.) 5) It is possible to extract plain text (with full headers) for every posted message to any YahooGroup. 6) It archives messages for quite a while back. The eBook Community presently has 21289 messages available in the online archive dating from 1999 -- I don't know when YahooGroups will begin lopping off the oldest ones to save space, but it hasn't yet. I have separate archives for the mirrored unix mbox archive. 7) It has a web access for those who prefer that over receiving email. 8) Administration by the moderators is a breeze. ********************************************************************** O.k., to address some of the issues Greg brings up. He is certainly angry with several of my comments today. As noted above, some of it I deserved, either in what I said or how I said something... > My view is that Jon will not be content until all the people > working on PG are ousted, in favor of his preferred organization, > governance, fundraising, production rules, and collection > guidelines. 
This is not going to happen anytime soon, and > other than being critical of the status quo, Jon has contributed > nothing towards making it happen anyway. Instead, Jon has > repeatedly been offered the ability -- with support and encouragement -- > to create the organization or content he so strongly desires. There are several related points I'd like to address here, since Greg brings up a couple I didn't really want to talk about (who otherwise cares about my motivation for being here and for what I've brought up recently?): ***** My motivation is certainly not to "take over" PG and build a dictatorship, and to kick out the old guard. Those who know me know that I'm the opposite and in fact fear the same things Michael does with respect to proprietary interests trying to defang the growth of a robust and fully available digital public domain. The OpenReader Project, which I co-founded, clearly shows my focus on open standards, open source, and creating an ebook future founded on these principles. In personality type, I am definitely a Fighting Idealist, for better and for worse. I am definitely not very politically savvy and not very diplomatic with my words, again for better and oftentimes for worse. For example, I commented earlier today, in response to a message Juliet posted, that maybe DP should consider a policy that if they don't get unencumbered page scans to put freely online (because some group is anal about their beloved source document of a public domain work), then they should not accept that situation and work around it. Who's the idealist here? (referring to PG's FAQ0 or FAQ1.) (But DP has their way of doing things and policies, which is fine. I greatly admire DP for what they have accomplished, are now doing, and fully support their vision for going to the next-level with an XML-based system. Juliet is doing an extraordinary job and has not been thanked enough for what she and her volunteers have accomplished, which borders on the remarkable. 
I am working with Juliet and Charles (who's currently on "sabbatical") to help them, as I can, with the organizational challenges in their wish to move to the next level, both in XML implementation, and in increasing their capacity to meet the challenges for the intriguing "Million Digital Texts Project.") I make no bones about having strong feelings based on the bigger picture as I see it -- and I honestly believe my vision is even bigger than Michael's. I don't believe the ad-hoc, everyone-does-it-their-own-way approach for producing etexts is sufficient any more to accomplish this Big Vision, and in fact will work against the Big Vision. Greg no doubt disagrees with me as FAQ0/1/3 outlines, but so be it -- history will be the ultimate arbiter of our differing world views. I see how inadequate the current PG collection is for the future. This evaluation is based upon three different ventures I've been involved with since 1999 (including one now in development) where this Big Vision has been, and is now being researched, by some really sharp technical people who are nailing down the many architectural and technical requirements. There are many more subtle requirements than one would at first imagine -- I'm only now beginning to understand them in a holistic sense -- and they reflect themselves all the way back to the fundamental structure of the texts themselves, and the associated metadata/catalog information. I see millions of high-quality, uniform digital texts, both public domain and Creative Commons, in a single repository which allows people to access them, annotate them, and link them together and with other texts and with other types of multimedia content in other repositories in very powerful ways that would take too long to describe here. That's one reason I state the master texts must be in well-structured XML, since that will enable the advanced features this repository will have. Properly done XML also confers many other benefits too numerous to mention here.
Both DP and PG have blessed the right XML approach (e.g., as exemplified by Marcello's PGTEI), which is very encouraging. But there's more. For reasons I won't go into here (again for brevity's sake), this Big Vision also sets slightly more stringent requirements on both metadata and cataloging than is currently done in PG, and it's the spinning wheels of the current discussion on metadata and cataloging that led to my posts this afternoon out of sheer frustration. I see no *requirements* mentioned, and no vision as to *what* the metadata/catalog information is to be used for. How can one fix the metadata requirements without a discussion of what the metadata will be used for, and useful for? It is frustrating to see all this ad-hoc activity happening with no guidance as to the who, what, when, where, why and to what extent -- the purpose of the metadata -- being resolved based on general requirements, which in turn are derived from the full and detailed vision (which is NOT given in the FAQs) of why PG exists and what it produces. Certainly I could try to force my way further into the discussion (more than I have now) and try to provide answers to these questions, but then I'll just become another voice to add to the ad-hoc cacophony we now have where the one who produces something first wins, even if it ends up not meeting the full long-term goals. This is the result of the FAQ0 and FAQ1 philosophy, which does not always give the results one hopes for. To get resolution on tough issues it is oftentimes necessary for the leadership to take charge and to firmly guide discussion to logically resolve what must be done. In some ways, it may be that the "leadership" simply doesn't have the time (because it is voluntary) to formalize the process to force a structured approach to fast decision-making and buy-in to the result. Understandable, but sad.
What I fear the most, and this I've expressed to Brewster Kahle (whom I meet again next week about Project Gramophone) and to JD Lasica (who's launching the ourmedia project and I'm assisting with the metadata/cataloging side) is that many people will develop these wonderful repositories of digital content (I'm also working on Project Gramophone/Sound Preserve to transfer and archive millions of old sound recordings), with billions of digital objects, which simply won't and can't "talk" with each other, because everyone is "doing their own thing" PG-style. Wheeee, the late 60's all over again. Let me give a small example to illustrate just a corner of what the world could be like if everything is done properly: Imagine someone creating a video for ourmedia where someone is playing the piano, say "Take the 'A' Train", composed by Billy Strayhorn and which became Duke Ellington's theme song. We would want to be able to allow the viewer to link, if they so choose, with the song lyric repository, with various wikipedia entries, and to Sound Preserve to bring up orchestral recordings of "Take The 'A' Train" by Duke Ellington and others. We'd also like to link to the Project Gutenberg collection for any works, such as Duke Ellington's book "Music is My Mistress" (assuming PG got permission to add it, likely not.) And of course we'd allow the end-user to join special communities built around any particular topic connected with that song -- such as Ellington communities, jazz communities, Strayhorn communities, etc. Doing all of this (and a lot more) confers a few added requirements, especially with regards to metadata information (text has the redeeming grace that it is fairly easy to dig out some information by full text searching -- but not standardized subject matter fields! -- but it is much harder with video and audio, so the metadata and cataloging requirements for video and audio will likely be more stringent and extensive.)
PG's self-enforced isolation, because of its seeming fear of working with the Big Boys (which is somewhat understandable), is working against PG in various ways in seeing the bigger picture of how the text production activities it is catalyzing will mesh with this much bigger, more wonderful world. But if the various repositories don't do it right from the start, including Project Gutenberg, and they end up with millions and billions of digital objects *not done right*, then the interlinkage will be much more difficult and nowhere near as powerful and useful as it could be. It will be essentially impossible to fix after the fact. JD Lasica now recognizes this and is supporting somewhat expanded metadata standards to assure inter-repository linkage, but I don't see the PG "leadership" seeing this, nor am I confident it can because of the FAQ0/1/3 constraints. Note how PG is having difficulty fixing the metadata and catalog info for a *measly* 10,000 or so texts. Imagine having a million of them *not done right* (especially with regards to metadata and catalog information requiring human input -- for some digital objects, if the data is not collected right at the start, it will be impossible to figure it out much later, even with human intervention. So much for the power of our digital future.) (Part of the Big Vision calls for aiding integration using James Linden's very interesting "Open Genesis" concept, currently under development. James is probably not yet ready to discuss this, but it is best described as the "Semantic Web Done Right From the Start." The requirements Open Genesis confers upon digital content repositories are surprisingly quite minimal -- but it is needed to have a standardized framework to improve inter-repository and inter-object linking. Marcello's effort to bring RDF into the mix is laudable and will certainly aid more robust intra- and inter-repository linking.)
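To make the metadata point concrete, here is a rough sketch (mine, not Marcello's actual schema -- the element choices and all the field values are illustrative assumptions) of what a per-text Dublin Core record carrying source-edition information might look like, generated with nothing but the Python standard library:

```python
# A rough sketch of a per-text Dublin Core record with a source field.
# The element choices and sample values are illustrative assumptions,
# not PG's actual catalog schema.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def make_record(title, creator, source_edition):
    # Build a minimal <record> with dc:title, dc:creator, and dc:source
    # (the "pedigree" field argued for above).
    rec = ET.Element("record")
    for tag, value in (("title", title),
                       ("creator", creator),
                       ("source", source_edition)):
        ET.SubElement(rec, f"{{{DC}}}{tag}").text = value
    return ET.tostring(rec, encoding="unicode")

print(make_record("Example Title", "Example Author",
                  "Hypothetical Press, London, 1898 (2nd edition)"))
```

The point is not this particular serialization; it is that a required dc:source field (or equivalent) costs almost nothing to record at production time, and is nearly impossible to reconstruct after the fact.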
I'd love to see PG take the lead to make this happen for the text side of the house, and that's my motivation in pressing a lot of issues here to the point where I may become persona non grata, but it won't happen until PG realizes that it needs to confer more requirements on the texts and metadata it catalyzes and collects from the many volunteers (outside of DP, which is doing things mostly right by my reckoning), as well as to more actively work with other repositories -- to become a part of the bigger world rather than isolating itself as it seems to. It needs one or two full-time people -- this costs some $$$ -- this requires a somewhat higher level of organization and maybe a slightly different governance to even be given this $$$ (or to develop some ongoing revenue stream.) And if it wants to play a major role in the "Million Digitized Texts Project" (should it get successfully launched), it *has* to change its governance and how it interacts with the world at large. Frankly, the FAQ0 and FAQ1 documents are actually quite hostile by implying the world at large is somehow evil and out to get PG. Yes, some parts of the world at large are hostile to PG and wish it gone, but not all of them. The wisdom is to associate with your friends and those who share the same vision, not drive them away by painting everyone with the same "evil" brush. If you don't believe FAQ0 and FAQ1 send this message to those in various outside groups, I suggest the wording of FAQ0/1 be looked at again for what it doesn't say but should say. For example, there's little in there about building close strategic partnerships with other like-minded organizations, and working together on common standards and common goals. Nothing is mentioned there about joining standards and other types of organizations so as to promote PG's interests.
PG has become disturbingly xenophobic in orientation -- it acts as if the rest of the world either does not exist or does exist and is evil, and as if magic will always automatically happen if you simply let everyone do their own thing. Magic does happen often, but magic can also run out. To answer Greg's "I don't take Yes for an answer" (which is, interestingly enough, the phrase William Safire used in today's New York Times to describe Arafat's 1999 refusal of unbelievable concessions by the Israelis), let me say that I am working hard on the vision. I'm coordinating with ourmedia, with Project Gramophone (now called Sound Preserve), and working with another venture dedicated to tying this all together and to launching the "Million Digitized Texts Project." Will we succeed in at least launching MDTP? Maybe. Maybe not. But I am taking Greg's "Yes for an answer" to heart and I am working on it as I envision it -- it's just that it is not restricted to the closed world of PG, so that's why it seems somewhat out of lockstep with what is going on here. But if we do succeed in launching MDTP and the Bigger Vision it will be a part of, and if PG wants to play a *major* role with MDTP -- and I'd certainly welcome PG and its "leader volunteers" to jump onboard for many obvious reasons -- PG will have to change in certain ways simply to work as a major player with the MDTP project. If PG decides it would rather not change its governance and focus by raising its text and metadata standards (which really are not that demanding), then that's totally understandable -- PG could still play a role, but it would essentially be peripheral, and the parade may end up marching by it. ***** On another point, if I expressed wording reflecting hostility to those who have contributed texts to the PG collection over the years, this was not my intent, and I apologize for it. 
I've typed in whole books by hand, and then laboriously proofed them, marked them up, and converted them into ebooks, so I am familiar firsthand with this labor of love. Some of the books being talked about here -- the very difficult 17th/18th century texts -- are a remarkable achievement to digitize (and to mark up as well). It amazes me the commitment many people here have to digitizing texts. My comments were directed at the leadership for not following what I believe are slightly more stringent policies with regard to metadata and text formatting requirements (some of which are understandable given where things were in the early 1990s). I'm a firm believer in the principle of "the buck stops here". That is, if there are problems, it is the responsibility of the PG leadership due to their prior decisions and established system. It may be unfair at times, since it is impossible to accurately predict the future and to develop the right approach to meet that future (e.g., Michael Hart's early allergy to including source information in texts appeared to be a protection mechanism against copyright infringement claims). But nevertheless it is up to the leadership to take responsibility, adjust accordingly, and to pro-actively "fix it". Maybe some of the problems are best solved by the ad-hoc, hands-off approach as given in FAQ0/1/3, but I don't believe all problems with the PG Collection will be solved by this approach, especially when looking at the useful linkage of the PG collection with other content repositories as outlined above, which requires an integrated approach and working cooperatively with other groups. ***** On a point related to what I wrote earlier, I'm troubled by this view that PG's collection should be focused toward a particular use niche, rather than designed to be useful for just about every use. 
As I've analyzed things, the added requirements to make PG digital texts useful not only for general reading, but for scholarship and research (plus linking to other repositories), are so few that ignoring them is downright puzzling. What is needed? Well, require that the source info be included in the metadata -- that's the major one. The next one is to work hard to acquire and preserve page scans. There are likely a few other requirements which are even less burdensome. The vast majority of the effort to produce digital texts from paper copy is to scan (or type in) the book and then proofread it. The rest of the added work to make the texts more useful is minuscule by comparison in time and effort. This reminds me of a Minnesota-Norwegian joke about the Norwegian who tried to swim across a lake -- when he got 95% of the way to the other side, he decided he couldn't make it, and swam back. It's ludicrous not to make that extra 5% effort, and elevate the PG collection to a significantly higher plane of usefulness, quality, and digital integrity (talked about next). This is especially tragic given the hundreds of thousands of hours already devoted to the PG collection, when that extra 5% (if that) would have made a significant improvement. ***** And about digital integrity, I stick to my position that anything which PG requires to increase the digital integrity of the text itself relative to the original source is a Good Thing (tm). Certainly deviations from the source must be allowed, such as correcting some obvious typesetting errors. (As an aside, has PG established a uniform policy for what types of edits/corrections in the digital text are allowed? Or is this again one of those FAQ0/1 "let's not interfere with anyone" type of things?) But what I mean by digital integrity has to do with the faithfulness, or more importantly, the perception of faithfulness, of the *meaning* of the text to the original source. 
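[Editorial note: the source-info requirement described above is small enough to check mechanically. The sketch below is purely illustrative -- the field names and the `check_source_metadata` helper are hypothetical, and do not correspond to any actual PG or DP metadata schema.]

```python
# Hypothetical source-edition fields a catalog record should carry
# before a text is accepted into the collection.
REQUIRED_SOURCE_FIELDS = ("title", "author", "publisher", "pub_place", "pub_year")

def check_source_metadata(record):
    """Return the list of missing or empty source-edition fields."""
    return [f for f in REQUIRED_SOURCE_FIELDS if not record.get(f)]

record = {
    "title": "The Kalevala",
    "author": "Elias Lonnrot (comp.)",
    "publisher": "",          # stripped during whitewashing
    "pub_place": "",
    "pub_year": "1888",
}
missing = check_source_metadata(record)
# missing == ["publisher", "pub_place"] -- the record would be flagged
```

Such a check costs nothing at upload time, which is the "extra 5%" argument in miniature.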
It's a legitimate question to ask whether those involved in producing digital texts took more liberties with the text than they should have. This is not a trivial issue when we look at history, where censorship is the norm. Certainly, as Greg pointed out, the source texts themselves may have been grossly edited contrary to the author's original intent (if it was not the first edition, for example), but we must not add to this problem in any way (instead, let's also do the first edition!) In addition, I believe one intent of PG is to assist with the effort to assure the digital texts will survive into the distant future, to hopefully survive wars, revolutions, totalitarianism, digital "book burnings", etc. As the centuries roll by, the issue of digital integrity becomes more and more important for the integrity of the information being passed on to future generations. That is why I believe it is necessary for PG to establish policies for new texts, and to begin working on upgrading some of the existing texts at the appropriate time, to standardize the digital integrity requirements as much as possible, and more importantly to acquire and preserve the original page scans whenever possible. Having the original page scans available side-by-side with the digital texts also benefits everyone (and the Big Vision) by resolving any difficulties in the presentation of the digital texts (we all know how weird some texts are), and by fighting against claims of copyright infringement. Contrary to Michael Hart's early policy of hiding the pedigree of digital texts, having the page scans available, so long as our copyright clearance procedure is sufficient, actually strengthens PG against claims of copyright infringement. ***** As a final note, I do agree with several who responded today about my call for redoing the older PG texts, saying we should wait until DP moves to the next-generation XML-based system before redoing these texts. I definitely agree as I think about it. 
What I think could be done, however, is to prepare for this eventuality by 1) flagging those texts we'd like to redo someday, 2) searching for higher-quality source books which will give us *unencumbered* page scans, and then 3) filing those page scans away in the archive for later conversion to digital text at the appropriate time. There's nothing wrong with decoupling the scanning stage from the proofreading stage. No doubt my answers will not satisfy everyone, and may not satisfy anyone. But after my spanking, I needed to reply, and in one case apologize. Jon Noring From traverso at dm.unipi.it Fri Nov 12 03:41:04 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Nov 12 03:41:30 2004 Subject: [gutvol-d] My spanking , and my reply In-Reply-To: <109886230796.20041112022349@noring.name> (message from Jon Noring on Fri, 12 Nov 2004 02:23:49 -0700) References: <20041112022413.GB8242@pglaf.org> <109886230796.20041112022349@noring.name> Message-ID: <200411121141.iACBf4iN022809@posso.dm.unipi.it> I have kept all PG mail since I subscribed in September 2001. It needs to be sorted into the different lists, might contain some extraneous items, and might miss something. If somebody wants them to reconstruct the archives, I'll be glad to contribute them. I'm not making them immediately available, since I would first have to check that nothing private is contained there, as my filtering is not always accurate. I dislike YahooGroups; I by far prefer a pglaf-based mailman. 
Carlo From jmk at his.com Fri Nov 12 04:04:25 2004 From: jmk at his.com (Janet Kegg) Date: Fri Nov 12 04:04:46 2004 Subject: [gutvol-d] PG audience In-Reply-To: <200411120537.iAC5bYHk016230@posso.dm.unipi.it> References: <20041112022413.GB8242@pglaf.org> <200411120537.iAC5bYHk016230@posso.dm.unipi.it> Message-ID: On Fri, 12 Nov 2004 06:37:34 +0100, you wrote: > >One of the problems is that, until recently, the whitewashers removed >the information on the origin of the book, like date of the edition, >publisher, etc; and the change in policy has not been sufficiently >advertised, so some people (even at DP) remove the information to >conform to the perceived PG policy. Until recently? I've been regularly including publisher, place of publication, and date in DP books I've uploaded to PG. Except in a few cases earlier this year, all but the date have been deleted by the WW. This has been mildly bugging me for a while--since I do see other new PG eBooks with publisher information included. And as long as I'm delurking, I'll mention that my DP projects include project comments with quoted biographical info on the author (from Web sources, and usually other Web links). Would it be somehow useful if I included the URL to the DP project page in the comments section of the upload form? -- Janet Kegg From marcello at perathoner.de Fri Nov 12 05:03:31 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 12 05:03:34 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <20041112022413.GB8242@pglaf.org> Message-ID: <4194B4A3.3050305@perathoner.de> Karen Lofstrom wrote: > At DP, we're processing things that no one but a scholar will ever read. > Ever. I'm proofreading one of Canon Sells' books about Islam. No one who > is interested in current, up-to-date information is going to read this > book. It's antiquated. The Koran makes the Top 20 of our downloads and is much older. 
> However, some scholar working on a book re > "history of Western perceptions of Islam" might be thrilled to get access > to an old out-of-print work. If he/she feels the work is reliable, that > is. The problem lieth not within PG. It lieth within Academia. Academia has to adapt its methods and processes to the new world, where information resources are ephemeral. If you cite a dead-tree edition of something, you are quite confident that the cited text stays put. It won't change its wording or glide from the cited page into the next, etc. If you cite an electronic resource, you have no such confidence. How do you make sure that the text at the URL you cite will not be edited or removed? You cannot. How do you make sure the medium you cite will still be readable in some years? In a hundred years, reading a CDROM may be harder than it was to read the Rosetta Stone. > If you don't want to cater to scholars, you're throwing away much of DP's > work. It's not our problem. Any amount of catering will not do away with Academia's perceived "limitations" of electronic media. The best value for Academia (and the least work for us) would be just to include the page scans. Any transcription you make will fall short of the requirements of some scholar. I think we should use our time producing more books for a general audience rather than producing Academia-certified editions of them. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Nov 12 05:10:35 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 12 05:10:39 2004 Subject: [gutvol-d] goodbye In-Reply-To: <1d1.2be96420.2ec5c341@aol.com> References: <1d1.2be96420.2ec5c341@aol.com> Message-ID: <4194B64B.3010301@perathoner.de> Bowerbird@aol.com wrote: > in the meantime, i'll start up a blog, telling the world all the things that > i think you're doing wrong. That will help. But don't let your departure be delayed by poor little me. 
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Nov 12 05:20:25 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 12 05:20:29 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <32924.134.117.137.186.1100246688.squirrel@134.117.137.186> References: <200411111727.iABHRtHL008551@posso.dm.unipi.it> <32924.134.117.137.186.1100246688.squirrel@134.117.137.186> Message-ID: <4194B899.7000907@perathoner.de> bill jenness wrote: > I think the way to go is to have a pg wiki linked to the catalog page > where the users could input reviews, literary commentary, author > biographical details, etc. I think you underestimate the maintenance work that goes into a wiki. Please go over to Wikipedia and read the Talk pages for some controversial topic, e.g. Israel vs. the Islamic World. Or read the vote pages where competing groups try to get the other group's pages removed by vote. I sure don't want to spend my day inside the wiki admin page for "The Koran", "The Communist Manifesto", or other works with high controversial potential. > This would allow DP and other producers to > concentrate on producing and not get bogged down with researching > extraneous useful facts. Do you know for a fact that they are bogged down? > I am certain there are some open source wikis > available that could be adapted. Perhaps the documentation side could be > set up as a separate foundation. Go ahead. Get your wiki started. If you reach critical mass we'll implement links from the bibrec pages to your wiki. 
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Nov 12 05:23:42 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 12 05:23:45 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <20041112022413.GB8242@pglaf.org> <200411120537.iAC5bYHk016230@posso.dm.unipi.it> Message-ID: <4194B95E.1060607@perathoner.de> Janet Kegg wrote: > Would it be somehow useful if I > include the url to the DP project page in the comments section of the > upload form? It would be useful to include the dp project number in some form. We have a discussion ongoing with Joshua on how to achieve this. -- Marcello Perathoner webmaster@gutenberg.org From bkeir at pgdp.net Fri Nov 12 05:57:58 2004 From: bkeir at pgdp.net (bkeir@pgdp.net) Date: Fri Nov 12 05:58:02 2004 Subject: [gutvol-d] Scholarly acceptance Message-ID: <36207.203.12.144.232.1100267878.squirrel@203.12.144.232> The many discussions I've had with academics about PG and DP point to their unshakable distrust, sight unseen, of the quality of work done by "unqualified" volunteer/amateurs. "You mean you let ANYONE do your proofreading??!?!?" is both a question I was asked, and a fair summary of their attitude of incredulity. The open-source, distributed but computer-linked volunteer paradigm is still too new in the world for its strengths, and the quality of its productions, to be trusted by the average academic. Give it a few decades, and population replacement MAY change this. 
From hart at pglaf.org Fri Nov 12 06:10:51 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Nov 12 06:10:54 2004 Subject: [gutvol-d] Scholarly acceptance In-Reply-To: <36207.203.12.144.232.1100267878.squirrel@203.12.144.232> References: <36207.203.12.144.232.1100267878.squirrel@203.12.144.232> Message-ID: On Sat, 13 Nov 2004 bkeir@pgdp.net wrote: > The many discussions I've had with academics about PG and DP point to > their unshakable distrust, sight unseen, of the quality of work done by > "unqualified" volunteer/amateurs. > > "You mean you let ANYONE do your proofreading??!?!?" is both a question I > was asked, and a fair summary of their attitude of incredulity. > > The open-source, distributed but computer-linked volunteer paradigm is > still too new in the world for its strengths, and the quality of its > productions, to be trusted by the average academic. Give it a few decades, > and population replacement MAY change this. On the other hand, scholars and librarians around the world have also said just the opposite, remarking VERY positively about our collections of Robert Louis Stevenson, Charles Dickens, and many others. The truth is that there will always be those who can't abide anything "not invented here." This ranges from messages we have received insisting that ONLY the sender's favorite edition should be used, and that all others should be denied a place in ANY eBook library. On the other hand, there is always the Darwinian approach: those who do not use eBooks simply won't be able to keep up with those who do. This might be one of the best reasons for NOT giving them each eBook as an exact copy of a particular paper edition. I've also heard that many of those who complain actually use our eBooks in secret, and ONLY want the provenance so they can steal them without giving credit where credit is due. 
Apparently they feel they can't actually take them publicly, because they don't want to give credit to Project Gutenberg, but if they know which paper edition we used, they can bypass giving us any credit. Somehow this reminds me of Napoleon, in "Animal Farm". . . . Michael From hart at pglaf.org Fri Nov 12 06:23:29 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Nov 12 06:23:31 2004 Subject: [gutvol-d] PG audience In-Reply-To: <4194B4A3.3050305@perathoner.de> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: On Fri, 12 Nov 2004, Marcello Perathoner wrote: > Karen Lofstrom wrote: > >> At DP, we're processing things that no one but a scholar will ever read. >> Ever. I'm proofreading one of Canon Sells' books about Islam. No one who >> is interested in current, up-to-date information is going to read this >> book. It's antiquated. > > The Koran makes the Top 20 of our downloads and is much older. > >> However, some scholar working on a book re >> "history of Western perceptions of Islam" might be thrilled to get access >> to an old out-of-print work. If he/she feels the work is reliable, that >> is. > > The problem lieth not within PG. It lieth within Academia. > > Academia has to adapt its methods and processes to the new world where > information resources are ephemeral. Actually, Project Gutenberg eBooks have proven much less ephemeral than paper books published in the same period, as all of the Project Gutenberg eBooks have been available continuously from their first day of release, while most paper books from over 5 years ago are no longer in print. > If you cite a dead tree edition of something you are quite confident that the > cited text stays put. It wont change its wording or glide from the cited page > into the next etc. But only if you find the exact same paper edition. > If you cite an electronic resource you have no such confidence. 
How do you > make sure that the text at the url you cite will not be edited or removed? > You cannot. Actually, it's pretty easy to find all the original Project Gutenberg eBooks, as well as the newer versions, because so many places keep them, usually in the thousands for any of our eBooks that have been out for even a week. > How do you make sure the medium you cite will still be readable > in some years? In a hundred years reading a CDROM may be harder than it was > to read the rosetta stone. There are SO many copies of each Project Gutenberg eBook out there that the question of a particular medium becomes irrelevant. . .when you download a copy of Huck Finn, you never know at your end whether it is stored on a CDROM, DVD, RAID, Terabrick, or even a floppy. Most of you don't realize that less than 20 years ago our eBooks were available from my BBS, and that the entire BBS ran on hi-density floppy drives. The fact that the eBooks are independent of the medium, and of hardware or software requirements, in "Unlimited Distribution" is what makes them last longer than anything else on the entire Internet. Where else can you find files that were originally posted 33 years ago? Michael From hart at pglaf.org Fri Nov 12 06:31:23 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Nov 12 06:31:25 2004 Subject: [gutvol-d] PG audience In-Reply-To: <4194B4A3.3050305@perathoner.de> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: On Fri, 12 Nov 2004, Marcello Perathoner wrote: >> Karen Lofstrom wrote: > The problem lieth not within PG. It lieth within Academia. I must agree. Academia is perhaps the worst when it comes to the "not invented here" syndrome. . .and it pays the price by lagging behind. > >> If you don't want to cater to scholars, you're throwing away much of DP's >> work. > > Its not our problem. Any amount of catering will not do away with Academias > perceived "limitations" of electronic media. 
That is, until they take over the eBooks, and claim them as their own. >> If you don't want to cater to scholars, >> you're throwing away much of DP's work. If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist. The real value of the work lies in making it available to the masses, not to the scholars. If we can increase literacy by even 10%, we make more difference than if we cater to the scholars. > The best value for Academia (and the least work for us) would be just to > include the page scans. Any transcription you make will fall short of the > requirements of some scholar. I think we should use our time for producing > more books for a general audience instead than producing Academia-certified > editions of them. Hear Hear! Michael From tb at baechler.net Fri Nov 12 07:11:46 2004 From: tb at baechler.net (Tony Baechler) Date: Fri Nov 12 07:10:06 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <4194B4A3.3050305@perathoner.de> <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> At 06:23 AM 11/12/2004 -0800, you wrote: >Actually, it's pretty easy to find all the original Project Gutenberg eBooks, >as well as the newer versions, because so many places keep them, usually in >the thousands for any of our eBooks that have been out for even a week. Hello. Actually, I've had a hard time finding any of the very early editions of PG files. There are some old files in the etext90 directory, but not edition 10 of the first several ebooks. I would be interested to find the very first edition of when10.txt or whatever it was called as MH posted it. Even the old GUTINDEX.* files have been removed, with the earliest being GUTINDEX.96 when it used to be GUTINDEX.90. 
From nwolcott2 at kreative.net Fri Nov 12 07:08:31 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 12 07:26:02 2004 Subject: [gutvol-d] Gone with the wind is "Gone with the wind" Message-ID: <006d01c4c8cb$e4d38440$069595ce@net> Sayonara. Apparently all versions of GWTW have disappeared from the net. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041112/77ed0a7f/attachment.html From nwolcott2 at kreative.net Fri Nov 12 07:22:08 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 12 07:26:05 2004 Subject: [gutvol-d] Perfection Message-ID: <006e01c4c8cb$e5c621a0$069595ce@net> Instead of worrying about perfection, we would be better advised to fix the many texts which are or have become unreadable. It is also uncomfortable, when there are several translations of a work with the same title and an anonymous translator, to have the publisher routinely or randomly removed. Also there are many DOS texts with accents that are hence unreadable. Any code page should be acceptable? Maybe, but. . . Also, although there are explicit directions for submitting a text, there is apparently no explicit provision for correcting or updating one, even one I contributed. Also, at random apparently, a little preamble I have added to help the reader identify the text or its possible shortcomings is removed. Although many texts have no unique provenance, as MH has advised, that is no reason for removing any hint of provenance when one is supplied by a contributor. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest keeping the inkpots full. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041112/d52a5c13/attachment.html From mbuch at mcsp.com Fri Nov 12 07:48:18 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Fri Nov 12 07:46:27 2004 Subject: [gutvol-d] Perfection In-Reply-To: <006e01c4c8cb$e5c621a0$069595ce@net> Message-ID: And herein lies some of the problem. I'm a college professor, and I recently earned my PhD. I would have had a hard time getting a text past my professors without being able to document who published it. I would have a hard time making a citation to a document with no pages. I would be very annoyed with a student who just pointed to something on the net that had no provenance whatsoever -- even many pieces of ephemera have provenance. I don't think this is a matter of fuddy-duddy professors who just don't understand how wonderful e-books are; I think the very concept of e-books as it now stands, while excellent for casual readers or people who simply want to educate themselves, is deeply flawed. When I am citing a text, I cannot refer to a vague document. I need to know EXACTLY when the original was published, who published it, and where, since there are variant texts out there. Even a single word change that might have occurred in the copying process could change the meaning of a vital sentence. PG is wonderful -- but as a student and a teacher, I don't think that most cybertexts provide the citability that is so important for academics. If PG was the only source in the world for vital texts, that would be one thing -- but it isn't. I love PG, and I send students to it all the time -- but only for the purpose of reading. I would not send a student to a PG text in order to make a citation. I have no way of knowing where many of the texts came from, whether the edition copied was a variant on the original, what page the information appeared on in the original copy, or anything else. In the social sciences and liberal arts, these things are very important. 
It is the soul of how we check for plagiarism, understand the history of a work, and make specific references. PG is great for when I want to read a Tom Swift book or understand the human genome -- but it doesn't help me if I need to explain the migration in the ideas of Franz Boas over time and through editions of his works, or examine the changes between editions of Dust Tracks on a Road. -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Norm Wolcott Sent: Friday, November 12, 2004 10:22 AM To: Project Gutenberg Volunteer Discussion Cc: Norm Wolcott Subject: [gutvol-d] Perfection Instead of worrying about perfection, we would be better advised to fix the many texts which are or have become unreadable. It is also uncomfortable, when there are several translations of a work with the same title and an anonymous translator to havve the publisher routinely or randomly removed. Also there are many DOS texts with accents that are hence unreadable. Any code page should be acceptable? maybe but. . . Also although there are explicit directions for submitting a text, correcting one or updataing one, even one I contributed, has apparently no explicit provision. Also, at random apparently, a little preamble I have added to help the reader identify the text or its possible shortcomings is removed. Although many texts shave no unique provenance as MH has advised, but that is no reason for removing any hint of preovenance when one is supplied by a contributor. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest keeping the inkpots full. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041112/3467f280/attachment.html From mbuch at mcsp.com Fri Nov 12 08:12:39 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Fri Nov 12 08:10:47 2004 Subject: [gutvol-d] PG audience In-Reply-To: Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Hart Sent: Friday, November 12, 2004 9:31 AM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] PG audience On Fri, 12 Nov 2004, Marcello Perathoner wrote: >> Karen Lofstrom wrote: > The problem lieth not within PG. It lieth within Academia. I must agree. Academia is perhaps the worst when it comes to the "not invented here" syndrome. . .and it pays the price by lagging behind. Sometimes it's not a matter of lagging behind. Academia has different needs and goals than the casual reader. I'm an academic, and I will use PG with undergrads -- but tell them to go to paper books for citations. Why? Because provenance is important in citation. My students tend to think everything on the net is 'true' -- they don't understand that books on the net may or may not reflect scholarly knowledge or acceptance. And often the divisions are too large for useful citation -- the page is not only a piece of paper. It's a unit of citation. Page 193 in the 3rd edition of a particular book by a particular publisher is page 193 in every copy, and contains a finite number of words. Chapter 23 may have a finite number of words, but how do I find the sentence I want to cite? Plus, the edition used on PG might not be the standard -- it might be a variant. Variant problems are crucial when trying to read poetry and literature for scholarly purposes. Chefs aren't 'lagging behind' just because most of them still chop food by hand instead of using Cuisinarts. They can control the texture and shape of what they cook much better by using an old-fashioned blade. 
On the other hand, electric mixers are much more efficient for making cakes and can do a better job than a person beating eggs and butter by hand -- which is why pastry chefs use machines most of the time. > >> If you don't want to cater to scholars, you're throwing away much of DP's >> work. > > Its not our problem. Any amount of catering will not do away with Academias > perceived "limitations" of electronic media. That is, until they take over the eBooks, and claim them as their own. We probably won't, unless we can find ways of making exact facsimile scans of books with page numbers, citations, illustrations, and so on. Are musicians silly because they choose to play instruments instead of having machines do all the work? No. Machines, no matter how good they are, don't have the same warmth that physical instruments have. Even if one day they do, I doubt all the instruments in the world will be thrown away. Why do you care whether academics cite PG? You seem to think they should come to you -- did you ever think we have this thing called a 'page' that acts as a standard unit of knowledge, and that when we cite something, we need that page to stay reasonably stable? And it does, even with the vagaries of publishing. PG is great, but most of the books you publish aren't the sorts of things that would be useful to a grad student anyway -- or even an undergrad, most of the time. For people who want a book on the go, who are looking for an out-of-print book for nostalgia's sake, for people who need to change print size for readability, PG is perfect. But it's not very useful for citations, any more than TV science programs are. I do think that dedicated proofers can do a great deal, and should be applauded. They can have exactitude. But that's not the problem. The problem is provenance. If you wanted academics to accept you, you would have to provide that, and maybe have experts on particular books vet them. 
>> If you don't want to cater to scholars,
>> you're throwing away much of DP's work.

If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist.

I agree, and I'm a scholar. Stop worrying about what we think. PG has shown me books I couldn't enjoy otherwise. Scholars don't read scholarly books all the time, and they have places to go for that.

The real value of the work lies in making it available to the masses, not to the scholars. If we can increase literacy by even 10%, we make more difference than if we cater to the scholars.

> The best value for Academia (and the least work for us) would be just to
> include the page scans. Any transcription you make will fall short of the
> requirements of some scholar. I think we should use our time for producing
> more books for a general audience instead of producing Academia-certified
> editions of them.

Hear, hear!

I agree- but I would love to see page scans. I don't think that most casual readers (and by that I even include 'serious' readers who do not use written material for citation) understand why pagination is so important to scholars. That's fine. But please stop assuming that we're all Luddites just because PG is pretty much useless to us academically. Hey- professional basketball players sometimes play one-on-one for fun; that doesn't mean they have to take such play seriously for it to have value.
Michael _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From joshua at hutchinson.net Fri Nov 12 08:28:33 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 08:28:38 2004 Subject: [gutvol-d] a few questions that i don't know the answer to Message-ID: <20041112162833.3C91E9E793@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Bowerbird@aol.com > > > the next questions involve long-standing and often-repeated > requests that i have made for other changes to p.g. policy. > having received no satisfaction, in spite of the reasonableness > of these requests, i will pursue a strategy of lobbying for them > to a wider audience, but i make them again here for the record. > > 1. could you _please_ strive for consistency in your e-books? We do. We don't always succeed, but we do strive for it. > > 2. could you please ensure the policy on styled text is upheld? > The whitewashers most definitely do. As far as the style goes (beyond _ and *, we don't specify much). > 3. could you please start including graphic-file-names in your > plain-text versions, so my viewer-app knows what to display where? > Nope. Plain-text means ... plain text, nothing more. > 4. could you please start including page-break information in your > plain-text versions, so my viewer-app can use original page-breaks > for those end-users that might desire that capability? > Once again, nope. Text doesn't provide a way to have this information in the file without it being jarringly placed in the middle of the flow of the text. Other formats can provide this, but plain text cannot. > 5. could you please start including line-break information in your > plain-text versions, for the same reason? > Nope (see answer for the above). This would be even more invasive to the reading experience, because you'd have a character or markup of some kind all over the place. 
And plain text has to be able to be fully used in any old text editor/reader. We can't assume the reader program will hide the line break information.

> my documentation on zen markup language (z.m.l.) will demonstrate
> how you can incorporate these requests into your plain-text files...

Sigh. And once they are marked up, they are no longer plain text files. They are z.m.l. files. If you want to convert the entire catalog to z.m.l., feel free. No one will use them, but feel free.

Josh

From jon at noring.name Fri Nov 12 08:32:44 2004
From: jon at noring.name (Jon Noring)
Date: Fri Nov 12 08:32:54 2004
Subject: [gutvol-d] PG audience
In-Reply-To: <4194B4A3.3050305@perathoner.de>
References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de>
Message-ID: <111911965953.20041112093244@noring.name>

Marcello wrote:
> Karen Lofstrom wrote:

> If you cite a dead tree edition of something you are quite confident
> that the cited text stays put. It won't change its wording or glide from
> the cited page into the next etc.
>
> If you cite an electronic resource you have no such confidence. How do
> you make sure that the text at the url you cite will not be edited or
> removed? You cannot. How do you make sure the medium you cite will still
> be readable in some years? In a hundred years reading a CDROM may be
> harder than it was to read the Rosetta Stone.

Actually, this issue can be dealt with using hash functions. Once a digital document is finished and archived, simply calculate a hash value for it (or the set of files the work comprises.) Use a published, open-standards hashing algorithm -- there are many out there to choose from. It's also possible to use digital signatures in some manner, but I'll let the experts in this area discuss this possibility.

Textual integrity is definitely an issue, and it goes beyond just keeping academics happy -- it is germane to the perceived integrity of the entire collection of texts by society-at-large.
By keeping the page scans along with the digital texts, we are, in effect, telling the users of the digital texts that we fully stand by the textual integrity of the collection, that we did not pull any fast ones, and that it can be trusted. We are putting our reputation on the line.

With using digital hashes and digital signatures, and redundant/mirrored text repositories, we go a long way towards assuring the collection maintains its integrity. As others have noted, some dictator or totalitarian regime in the future may break into one of the repositories and start tweaking texts. So long as the whole world does not revert to totalitarianism (where then we have much bigger problems than the integrity of texts), then with a properly designed repository it will always be possible to restore the original digital texts from a clean, untouched digital repository. Hopefully individuals will also keep digital texts lying around, but again here we also need to keep in mind individuals can also tweak the texts, thus the use of hashing/digital signatures is still needed.

>> If you don't want to cater to scholars, you're throwing away much of DP's
>> work.

> It's not our problem. Any amount of catering will not do away with
> Academia's perceived "limitations" of electronic media.

I don't have such a pessimistic view of academia. Yes, academics are strange birds. But as the old generation dies, and a new generation arises, familiar with accessing digital information, they will embrace digital media with a fervor. PG can certainly make its texts "academia friendly", or at least reasonably so. The incremental effort (delta-t) to do the few more things to make PG texts more academia-friendly is pretty small compared to the overall time it takes to scan/type/OCR/proof a text. And many of these added things have other small benefits outside of academia itself, benefits for other user groups of PG texts.
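The hashing scheme Jon describes above can be sketched in a few lines. This is a minimal illustration, not anything PG actually runs; the function name and file layout are invented. It computes one SHA-256 digest (a published, open-standards algorithm) over the set of files a work comprises, in a fixed order, so the digest can be published once and later recomputed to detect any tampering:

```python
import hashlib

def digest_of_work(paths):
    """Compute a single SHA-256 digest over a set of files, in sorted
    order, so an archived work can later be verified byte-for-byte."""
    h = hashlib.sha256()
    for path in sorted(paths):
        with open(path, "rb") as f:
            # Read in chunks so large scan files don't need to fit in memory.
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
    return h.hexdigest()
```

Recomputing the digest on a mirror and comparing it to the published value is then enough to show the text "stays put" in Marcello's sense, even though the medium is electronic.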
> The best value for Academia (and the least work for us) would be just to
> include the page scans. Any transcription you make will fall short of
> the requirements of some scholar. I think we should use our time for
> producing more books for a general audience instead of producing
> Academia-certified editions of them.

It behooves PG to at least reasonably reach out to the requirements of "academia" (which is not as monolithic as implied) in markup and metadata, and include the original page scans for every work. That's all that can be done and should be done.

Making the page scans available has purposes beyond just keeping academics happy. For example, someone may wish to issue a retypeset print edition of some work using the XML-based PG texts. Having the original page scans there to verify document structure and layout oddities will be useful to those doing final proofing of the output typography.

And as noted above, having the original page scans available to future generations is a further protection of the textual integrity of the digital text. It also has the side-benefit of being a digital preservation of the original source, and this alone is a very powerful argument to keep the page scans as an honored and integral part of the PG collection -- it will greatly add value and purpose to the PG collection. Disk space and bandwidth are no longer an issue (well, no longer the major, show-stopper issue they were a decade ago.)

It mystifies me why the original page scans are treated by some here as some sort of waste product, meant to be flushed down the toilet when done, or that we don't need to preserve them, or need to have access to them (I'm still surprised to hear that the scans for some of the DP texts are not available to the public because of licensing issues.)
Jon Noring

From joshua at hutchinson.net Fri Nov 12 09:00:17 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Nov 12 09:00:22 2004
Subject: [gutvol-d] PG audience
Message-ID: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com>

----- Original Message -----
From: Marcello Perathoner
>
> Janet Kegg wrote:
>
> > Would it be somehow useful if I
> > include the url to the DP project page in the comments section of the
> > upload form?
>
> It would be useful to include the dp project number in some form. We
> have a discussion ongoing with Joshua on how to achieve this.

Yep, pretty easy to implement in the teiHeader. And then it can be used by Marcello's script to link back to the DP project comments fairly easily. At this point, I figure it will be in there in the final specs.

Josh

From joshua at hutchinson.net Fri Nov 12 09:06:02 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Nov 12 09:06:07 2004
Subject: [gutvol-d] Scholarly acceptance
Message-ID: <20041112170602.983BB109901@ws6-4.us4.outblaze.com>

----- Original Message -----
From: Michael Hart
>
> On the other hand, there is always the Darwinian approach:
>
> Those who do not use eBooks simply won't be able to keep up with
> those who do.
>
> This might be one of the best reasons for NOT giving them each
> eBook as an exact copy of a particular paper edition.

Ok, I'm not following the mental jump from point A to point B. How does people being savvy with eBooks lead to not putting good bibliographic information in place in the book? Dead tree editions will usually have information on how the text was obtained. Why not ours?
Josh From joshua at hutchinson.net Fri Nov 12 09:20:49 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 09:20:55 2004 Subject: [gutvol-d] PG audience Message-ID: <20041112172049.3FA5EEDC4E@ws6-1.us4.outblaze.com> ----- Original Message ----- From: Jon Noring > > It mystifies me why the original page scans are treated by some here > as some sort of waste product, meant to be flushed down the toilet > when done, or that we don't need to preserve them, or need to have > access to them (I'm still surprised to hear that the scans for some of > the DP texts are not available to the public because of licensing > issues.) > Just to clarify a little... DP produces a large quantity of work based on scans produced by other organizations. At different points in our history, we couldn't have kept up with the proofers any other way. With some organizations, we are able to automate the "harvesting" of images with utilities. Some, though, are willing to work with us, sending us the files or providing an easy access path. In return, we agree (if they ask) not to post the images for use other than as proofing sources. The credit lines always have a reference back to the original site for the images. Basically, if we wanted to be rude, we could post the images any way we wanted to. The copyright laws certainly seem to be in our favor if we did. But these sites are going out of their way to help us, so we return the favor by routing people that want to see the images (the fruits of their labor) back to them. Josh From shalesller at writeme.com Fri Nov 12 09:49:44 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Nov 12 09:49:56 2004 Subject: [gutvol-d] Perfection Message-ID: <20041112174944.77DE74BE64@ws1-1.us4.outblaze.com> "Norm Wolcott" writes: > Also there are many DOS texts with accents that are hence > unreadable. Any code page should be acceptable? maybe but. . . Can you point to one? 
I was under the impression that all the texts in old DOS codepages were updated to use Latin-1.

-- ___________________________________________________________
Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm

From marcello at perathoner.de Fri Nov 12 10:31:51 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov 12 10:31:59 2004
Subject: [gutvol-d] PG audience
In-Reply-To:
References:
Message-ID: <41950197.2020707@perathoner.de>

Her Serene Highness wrote:
> Why do you care whether academics cite PG? You seem to think they should
> come to you- did you ever think we have this thing called a 'page' that acts
> as a standard unit of knowledge, and that when we cite something, we need
> that page to stay reasonably stable?

Did it ever occur to you that the "page" as "standard unit of knowledge" is a purely arbitrary thing? The standard unit of knowledge depends on the information technology of the epoch. It first was the "cave wall", then became the "clay tablet", then it became the "scroll", then it became the "page" and today it is the "internet resource". I can google any cited phrase on the net in a few keystrokes' time. OTOH to verify a quotation, it may take months until I get my hands on a physical copy of some random obscure book.

--
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org Fri Nov 12 10:46:48 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Nov 12 10:46:49 2004
Subject: [gutvol-d] Perfection
In-Reply-To:
References: <006e01c4c8cb$e5c621a0$069595ce@net>
Message-ID: <20041112184648.GC3160@pglaf.org>

On Fri, Nov 12, 2004 at 10:48:18AM -0500, Her Serene Highness wrote:
> And herein lies some of the problem. I'm a college professor, and I
> recently earned my PhD. I would have had a hard time getting a text past
> my professors without being able to document who published it. I would have
> a hard time making a citation to a document with no pages.
> I would be very
> annoyed with a student who just pointed to something on the net that had no
> provenance whatsoever- even many pieces of ephemera have provenance. I
> don't think this is a matter of fuddy-duddy professors who just don't
> understand how wonderful e-books are; I think the very concept of e-books
> as it now stands, while excellent for casual readers or people who simply
> want to educate themselves, is deeply flawed. When I am citing a text, I
> cannot refer to a vague document. I need to know EXACTLY when the original
> was published, who published it, and where, since there are variant texts
> out there. Even a single word change that might have occurred in the
> copying process could change the meaning of a vital sentence. PG is
> wonderful- but as a student and a teacher, I don't think that most
> cybertexts provide the citability that is so important for academics. If PG
> was the only source in the world for vital texts, that would be one thing-
> but it isn't.
>...

My Ph.D. in Information Transfer is from 1993. I've taught Internet stuff and a whole lot of other things since 1988. I went to college in 1983, and never left, holding faculty positions since 1991 - in short, I'm very much a professional academic. Here are some of my experiences related to electronic texts:

- I *have* entirely electronic articles cited in my academic vita (http://petascale.org/vita.html). Nobody (none of my deans, etc.) has even raised an eyebrow. Today, like always, peer review and the reputation of the publication are what matters, not whether it was printed.

- I have refused paper submissions of any assignments from my students for years (http://petascale.org/paperless.html), including master's theses and doctoral dissertations. Again, this is just not a problem. At the end of the degree process, we (the committee) sign a piece of paper and the student submits copies of the printed document to the library.
Then, a PDF or similar goes to various archives and Web pages, and is available for widespread free access.

- I was recently appointed Editor of the standards document series in the Global Grid Forum (http://www.ggf.org), which publishes an all-electronic document series modeled after the RFC series published by the IETF (which is much older, and is essentially the standards that define the Internet).

- Every citation format (APA, MLA, Chicago, etc.) specifies how to cite documents which are not printed. For the most part, they distinguish between ephemeral stuff like email messages and more permanent stuff like online journal articles. This is still difficult, and many people cite inappropriate items as though they were published documents rather than things like personal communication, changeable Web pages, etc. But it's certainly done, and it's done in journal articles (print & electronic), standards documents, books, newspaper articles, etc. Here's one of many good pages describing electronic citation: http://owl.english.purdue.edu/handouts/research/r_docelectric.html

In short, I'm happy to say that my experience is completely different than yours. Moreover, unlike you, I seem to have specific documents, citations and processes to back up my impressions, while you haven't provided any.

Certainly it's the case that some academic fields rely more on the exact words of a particular printed item. Hermeneutics is an example, and some others of the historical, classic & humanities disciplines. But to dismiss "academics" as being unable to deal with online content (as the subject/object of research, as support for research, or as the published outcome of research) is certainly an overstatement, and inconsistent with the experiences of me and my academic peers.
-- Greg From gbnewby at pglaf.org Fri Nov 12 10:54:00 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Fri Nov 12 10:54:02 2004 Subject: [gutvol-d] PG audience In-Reply-To: <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> References: <4194B4A3.3050305@perathoner.de> <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> Message-ID: <20041112185400.GD3160@pglaf.org> On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote: > At 06:23 AM 11/12/2004 -0800, you wrote: > > >Actually, it's pretty easy to find all the original Project Gutenberg > >eBooks, > >as well as the newer versions, because so many places keep them, usually in > >the thousands for any of our eBooks that have been out for even a week. > > Hello. Actually, I've had a hard time finding any of the very early > editions of PG files. There are some old files in the etext90 directory, > but not edition 10 of the first several ebooks. I would be interested to > find the very first edition of when10.txt or whatever it was called as MH > posted it. Even the old GUTINDEX.* files have been removed, with the > earliest being GUTINDEX.96 when it used to be GUTINDEX.90. Michael might have some of the older files. There are a few sources, like old Walnut Creek CDs, that might also be able to help. These days, we essentially never delete anything (not strictly true, but close enough... and we run a no-delete mirror for when mistakes happen). But in the past, Michael would remove older files. This was largely due to space constraints on the hosting servers. As for the GUTINDEX* files, we don't keep older files around, since they are essentially always updated weekly. I can see the reason for interest in looking back through older files, though - maybe we'll start doing this in a new subdirectory. Note that the GUTINDEX files have been through many iterations. Michael used to maintain them, then I did, and now George Davis does. 
The filenames have changed, and so has the format. For the most part, this has been simply to accommodate the changing nature of the publications, enhanced metadata (like contents listings), and other pragmatics.

Unrelated story: I needed to print GUTINDEX.ALL the other day (as part of an affidavit for another legal case I'm helping with, where we once again show there are "significant non-infringing uses" for online content). It's about 550 pages. Whew! I hope that's the only time in this decade anyone needs to print it.

--
Greg

From sly at victoria.tc.ca Fri Nov 12 10:53:40 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Fri Nov 12 10:57:48 2004
Subject: [gutvol-d] Perfection
In-Reply-To: <006e01c4c8cb$e5c621a0$069595ce@net>
References: <006e01c4c8cb$e5c621a0$069595ce@net>
Message-ID:

On Fri, 12 Nov 2004, Norm Wolcott wrote:
> randomly removed. Also there are many DOS texts with accents that are
> hence unreadable. Any code page should be acceptable? maybe but. . .

We have a couple people who are fixing up and reposting older files (which is often more involved than simply changing character encoding). A little while ago, I heard over 400 etexts had been reposted, so it's more than that by now. Are you volunteering to help?

Andrew

From mbuch at mcsp.com Fri Nov 12 13:58:52 2004
From: mbuch at mcsp.com (Her Serene Highness)
Date: Fri Nov 12 13:57:07 2004
Subject: [gutvol-d] Perfection
In-Reply-To: <20041112184648.GC3160@pglaf.org>
Message-ID:

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Greg Newby
Sent: Friday, November 12, 2004 1:47 PM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Perfection

On Fri, Nov 12, 2004 at 10:48:18AM -0500, Her Serene Highness wrote:
> And herein lies some of the problem. I'm a college professor, and I
> recently earned my PhD. I would have had a hard time getting a text past
> my professors without being able to document who published it.
I would have > a hard time making a citation to a document with no pages. I would be very > annoyed with a student who just pointed to something on the net that had no > provenance whatsoever- even many pieces of ephemera have provenance. I > don't think this is a matter of fuddy-duddy professors who just don't > understand how wonderful e-books are; I think the very concept of e-books > as it now stands, while excellent for casual readers or people who simply > want to educate themselves, is deeply flawed. When I am citing a text, I > cannot refer to a vague document. I need to know EXACTLY when the original > was published, who published it, and where, since there are variant texts > out there. Even a single word change that might have occurred in the > copying process could change the meaning of a vital sentence. PG is > wonderful- but as a student and a teacher, I don't think that most > cybertexts provide the citability that is so important for academics. If PG > was the only source in the world for vital texts, that would be one thing- > but it isn't. >... My Ph.D. in Information Transfer is from 1993. I've taught Internet stuff and a whole lot of other things since 1988. I went to college in 1983, and never left, holding faculty positions since 1991 - in short, I'm very much a professional academic. Here are some of my experiences related to electronic texts: - I *have* entirely electronic articles cited in my academic vita (http://petascale.org/vita.html). Nobody (none of my deans, etc.) has even raised an eyebrow. Today, like always, peer review and the reputation of the publication are what matters, not whether it was printed. Agreed. I have no reason to doubt you. However- you did say that your work is in Information Transfer, right? 
Do you think there might be a teensy bit of difference between a reference by an Information Transfer academic that is from an electronic journal and was published for other academics in that and related fields, and a citation of, say, Emily Dickinson's poetry without information as to when the book it was taken from was published- considering that it is now known that many earlier copies of Dickinson used incorrect punctuation because previous editors messed around with them?

I'd have no problem accepting or using a citation of the US Census online- I've done it. I've used citations of NYS divorce and sexual offense law from online sources- no problem. All of those are frequently updated. But a citation of an out of print book in anthropology, English literature, the hard sciences, et al., which might very well not be correct in its information- that will be problematic.

I would be very happy to see Boas online. Eventually I hope to track down an out of copyright version of his writings and scan it for PG. I'd like to do the same with Zora Neale Hurston, Ruth Benedict, and quite a few other people. However- and this is the big 'however'- while these texts would be useful for casual and serious non-academic readers, and even for many academic readers as a point of reference, their usefulness would be seriously impaired without info as to who originally published the books and when. Boas' works vary according to edition- therefore, knowing which edition you are reading can matter if you are doing research on his theories. If I were doing online research in a general fashion on the history of anthropology, it wouldn't matter. If I were writing a scholarly work, it would. It would also matter if there was no pagination. Again- I'm not talking about materials produced in the past twenty years. I'm talking about historical materials. They are not entirely electronic.

Another example- I'm tutoring a 15 year old about the incidents that led up to WW2.
We go online and find the Treaty of Versailles. He can cite it- not only is it a well-known document (making it easy to check for errors and lacunae), but each section of the treaty is numbered. It's easy for him to refer to Article 15 in a paper, and easy for a teacher to find the section in an online document. I would encourage him to use it in class, and to do an internet citation- no problem. But if he was to try to cite Winston Churchill's autobiography from an online site (not that it's online) or Mein Kampf (which probably is), he'd run up against a problem. In chapter 5 there might be a very quotable sentence- but what my student doesn't know is that this sentence was changed in later editions. And there's no page number- does he tell his teacher to read the entire chapter to find a sentence that won't be there in a later edition?

The last time I looked at PG (a few weeks ago) I found it very easy to read books if I wanted to read the whole text. If I wanted to find chapters or pages I had hard luck- I had to scan through whole documents.

You don't have to believe me. Just find this quote. It's from The Koran. "And thou takest vengeance on us only because we have believed on the signs of our Lord when they came to us. Lord! pour out constancy upon us, and cause us to die Muslims." It's in Sura VII. I have no doubt that you'll find it- but it will take you quite a while to do so with no page numbers and no way to go to each section separately. As a teacher, I don't have time to read half The Koran (that's a hint, by the way) to find this one quote on PG. I can however find websites that will make the search much easier for me, and will provide some info on the translation. After all, I have no idea who JM Rodwell was, or whether his translation of The Koran is the definitive English version, or why his translation was chosen- other than that his book was out of copyright. From my point of view, that's a red flag itself.
If this translation is so superb, why isn't it still being used- or is it?

to the library. Then, a PDF or similar goes to various archives and Web pages, and is available for widespread free access.

- I was recently appointed Editor of the standards document series in the Global Grid Forum (http://www.ggf.org), which publishes an all-electronic document series modeled after the RFC series published by the IETF (which is much older, and is essentially the standards that define the Internet).

- Every citation format (APA, MLA, Chicago, etc.) specifies how to cite documents which are not printed. For the most part, they distinguish between ephemeral stuff like email messages and more permanent stuff like online journal articles. This is still difficult, and many people cite inappropriate items as though they were published documents rather than things like personal communication, changeable Web pages, etc. But it's certainly done, and it's done in journal articles (print & electronic), standards documents, books, newspaper articles, etc. Here's one of many good pages describing electronic citation: http://owl.english.purdue.edu/handouts/research/r_docelectric.html

I'm aware of that. As I stated above, I've used electronic citations, even when professors raised eyebrows. But you are not dealing with my particular statements, which have nothing to do with the citation of contemporary documents and ephemera, or with copies of documents that make searches for particular passages much easier for readers and writers. I was very specific in my criticism- and since you have a degree in Information Transfer and have taught Library Science, it ought to be of concern to you, too. But being an expert in Information Transfer is not the same thing as doing research using out of print documents. Your business is making them readable and accessible, which is important.
From where I stand that is important too, but less important than being able to consistently find passages, and checking to see the differences according to editions. Nietzsche's work, for instance, was butchered by his sister. There are conflicting copies of his work floating around. When his works were copied for Project Gutenberg, did someone go for an out of copyright copy that is definitive, or one that his sister chopped up? Did that matter, or was it just more important to get a copy up?

Cattle ranchers, butchers, and chefs all deal with meat. That doesn't make a chef an expert on cattle feed or a butcher an expert on how to best prepare beef in orange sauce. We may both be involved in academia, but our concerns regarding information technology might be very different- that doesn't mean that one or both of us are idiots, or that I'm a Luddite, or that you're a geek with no appreciation for what's inside the books you put up.

support for research, or as the published outcome of research) is certainly an overstatement, and inconsistent with the experiences of me and my academic peers.

--
Greg

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From joshua at hutchinson.net Fri Nov 12 14:14:26 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Nov 12 14:14:36 2004
Subject: [gutvol-d] Perfection
Message-ID: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com>

----- Original Message -----
From: "Her Serene Highness"
>
> You don't have to believe me. Just find this quote. It's from The Koran.
> "And thou takest vengeance on us only because we have believed on the signs
> of our Lord when they came to us. Lord! pour out constancy upon us, and
> cause us to die Muslims." It's in Sura VII. I have no doubt that you'll
> find it- but it will take you quite a while to do so with no page numbers
> and no way to go to each section separately.
Not trying to be a smart-### here, but I tried your example.

Time to open PG's website and search for Koran... ~25 seconds.

Clicked on the Koran link, it downloaded quickly (thanks to a T1 here at work! ;) ~ 5 seconds.

Control-F, paste in the first four words from your quote ("And thou takest vengeance), hit return ... first hit was the right one. 5 seconds at most.

Total time from reading your paragraph to reading the passage from the Koran... 35 seconds.

This is why electronic citation is so much BETTER. If someone points me to the file they cited from, a quick search will turn it up in seconds, as opposed to finding the book, flipping to the page and skimming down through the text to find the quoted material.

And just so you know I did find the material in the Koran...

I will surely cut off your hands and feet on opposite sides; then will I have you all crucified." They said, "Verily, to our Lord do we return; And thou takest vengeance on us only because we have believed on the signs of our Lord when they came to us. Lord! pour out constancy upon us, and cause us to die Muslims." Then said the chiefs of Pharaoh's people-"Wilt thou let Moses and his people go to spread disorders in our land, and desert thee and thy gods?" He said, "We will cause their male children to be slain and preserve their females alive: and verily we shall be masters over them."

****

Ah, the oneness of religion! Christianity is built upon the foundation of Judaism ... Islam references both ... Yet religious reasons are given for so much fighting.

Josh

From jlinden at projectgutenberg.ca Fri Nov 12 12:18:46 2004
From: jlinden at projectgutenberg.ca (James Linden)
Date: Fri Nov 12 14:18:45 2004
Subject: [gutvol-d] Perfection
In-Reply-To:
Message-ID:

A simple implementation of the id="" HTML attribute would solve the issues regarding quoting a particular sentence or paragraph...
for example: http://kodekrash.com/project/btw_ufs.html#p191 -- will put you right at a paragraph talking about learning multiplication before cube roots (in Booker T Washington's autobiography).

If we had decent master versions of the texts, such features would be child's play... I _will_not_ go into the "master versions" rant again tho.

-- James

From kris at transitory.org Fri Nov 12 15:24:14 2004
From: kris at transitory.org (kris foster)
Date: Fri Nov 12 15:24:25 2004
Subject: [gutvol-d] Perfection
In-Reply-To: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com>
References: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com>
Message-ID: <20041112182216.Y99646@krweb.net>

> This is why electronic citation is so much BETTER. If someone points me
> to the file they cited from, a quick search will turn it up in seconds
> as opposed to finding the book, flipping to the page and skimming down
> through the text to find the quoted material.

this is a dangerous reliance on a transitory medium. electronic citation is merely more convenient.

--kris

From shalesller at writeme.com Fri Nov 12 15:45:27 2004
From: shalesller at writeme.com (D. Starner)
Date: Fri Nov 12 15:45:40 2004
Subject: [gutvol-d] Perfection
Message-ID: <20041112234527.EF05E4BE64@ws1-1.us4.outblaze.com>

Let me note, I had no way of telling Greg's comments apart from yours except for context. Perhaps you relied on some HTML thing; please don't do so. I'm not going to argue the wisdom of HTML email, but HTML email that does not degrade nicely to plain text is going to look awful to many of the receivers.

"Her Serene Highness" writes:
> But a
> citation of an out of print book in anthropology, English literature, the
> hard sciences, et al, which might very well not be correct in its
> information- that will be problematic.

But this has nothing to do with etexts; this has to do with older books.

> > I would be very happy to see Boas online.
Eventually I hope to track down
> an out of copyright version of his writings and scan it for PG.

It'll be a long time, unless you move to Canada. The last of his works are in copyright for another 7 years in the EU and 33 years in the US. The Bureau of American Ethnology volumes are being worked on up to 1930 (since it's a US government publication) and I believe that includes some work by Boas.

> In chapter 5
> there might be a very quotable sentence- but what my student doesn't know is
> that this sentence was changed in later editions. And there's no page
> number- does he tell his teacher to read the entire chapter to find a
> sentence that won't be there in a later edition?

What is he supposed to do, give a page reference to one of a dozen editions that might be very hard for the teacher to find? With etexts, you know that your recipient has access to the same edition you have. And as someone else pointed out, if you quote the sentence, the context can be found in seconds.

> After all, I
> have no idea who JM Rodwell was, or whether his translation of The Koran is
> the definitive English version, or why his translation was chosen- other
> than that his book was out of copyright. From my point of view, that's a red
> flag itself. If this translation is so superb, why isn't it still being
> used- or is it?

And how do I know that if I pull it off the library shelves? My college library has a half dozen different translations of the Koran; how am I to know which are in use? As for the reason it's not being used, I would suggest that the fact that academics like to retranslate everything every decade might be an explanation. My class used a modern translation of the Iliad, but that doesn't mean that in several hundred years of English translation of the work that's now public domain, there's not one competent, even superb translation.

> Nietzsche's work for instance, was butchered by his sister. There are
> conflicting copies of his work floating around.
When his works were copied > for Project Gutenberg, did someone go for an out of copyright copy that is > definitive, or one that his sister chopped up? Did that matter, or was it > just more important to get a copy up? I doubt that the people who scanned it were aware of the differences. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From jmdyck at ibiblio.org Fri Nov 12 15:54:30 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Fri Nov 12 15:54:51 2004 Subject: [gutvol-d] PG audience References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: <41954D36.7B9503D9@ibiblio.org> Michael Hart wrote: > > If we cater to scholars, we are only expanding the "digital divide," > so to speak. Our goal is to provide a large viable library to all, > not just to the scholars, who represent less than 1% of the people, > and are often very elitist. I don't think anyone is advocating providing the PG library "just to the scholars", so that's a strawman. Instead, some people simply want to make PG texts more useful to scholars than they currently are, and I think we can do that without making them less useful or less available to non-scholars. -Michael From jmdyck at ibiblio.org Fri Nov 12 15:59:48 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Fri Nov 12 16:00:08 2004 Subject: [gutvol-d] increasing literacy References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: <41954E74.4EFB64DE@ibiblio.org> Michael Hart wrote: > > If we can increase literacy by even 10%, > we make more difference than if we cater > to the scholars. We could make even more difference by doing both! Setting that aside, do we have any data (or even anecdotal evidence) re the effect of Project Gutenberg on literacy levels? 
-Michael From jon at noring.name Fri Nov 12 16:22:13 2004 From: jon at noring.name (Jon Noring) Date: Fri Nov 12 16:22:31 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41954D36.7B9503D9@ibiblio.org> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <41954D36.7B9503D9@ibiblio.org> Message-ID: <184940134937.20041112172213@noring.name> Michael Dyck wrote: > Michael Hart wrote: >> If we cater to scholars, we are only expanding the "digital divide," >> so to speak. Our goal is to provide a large viable library to all, >> not just to the scholars, who represent less than 1% of the people, >> and are often very elitist. > I don't think anyone is advocating providing the PG library "just to the > scholars", so that's a strawman. > > Instead, some people simply want to make PG texts more useful to > scholars than they currently are, and I think we can do that without > making them less useful or less available to non-scholars. Agreed. It is possible to come up with a "happy medium" set of baseline requirements which will make the PG texts useful for many purposes. Those who wish to make particular texts even more useful than the baseline for a particular user group simply add more stuff. XML makes it quite easy to extend the features -- just add markup to the content and to the metadata fields. A possibly useful exercise is to categorize the various uses and user groups, and then determine what are the most important features each user group especially desires/needs. Without thinking about it for more than 30 seconds, here's a partial list of different user groups. No doubt this list can be expanded and much better described/subcategorized. But it's a start to further discussion if enough here deem it of interest. 
1) Personal interest readers
2) Scholars and researchers
3) Students (K-12 and post-secondary)
4) Professional and vocational

Jon Noring

From mbuch at mcsp.com Fri Nov 12 16:37:30 2004
From: mbuch at mcsp.com (Her Serene Highness)
Date: Fri Nov 12 16:35:49 2004
Subject: [gutvol-d] Perfection
In-Reply-To: <20041112234527.EF05E4BE64@ws1-1.us4.outblaze.com>
Message-ID:

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of D. Starner
Sent: Friday, November 12, 2004 6:45 PM
To: Project Gutenberg Volunteer Discussion
Subject: RE: [gutvol-d] Perfection

Let me note, I had no way of telling Greg's comments apart from yours except for context. Perhaps you relied on some HTML thing; please don't do so. I'm not going to argue the wisdom of HTML email, but HTML email that does not degrade nicely to plain text is going to look awful to many of the receivers.

**Michele here. I'll clarify it for you. I didn't use HTML- I thought those arrow thingies would show up, and they didn't.**

"Her Serene Highness" writes:
> But a
> citation of an out of print book in anthropology, English literature, the
> hard sciences, et al, which might very well not be correct in its
> information- that will be problematic.

But this has nothing to do with etexts; this has to do with older books.

** In some cases it does have to do with older books. But we aren't dealing with new books. We're dealing with old ones. We're also dealing with the problem of not having the master texts.**

> > I would be very happy to see Boas online. Eventually I hope to track down
> an out of copyright version of his writings and scan it for PG.

It'll be a long time, unless you move to Canada. The last of his works are in copyright for another 7 years in the EU and 33 years in the US. The Bureau of American Ethnology volumes are being worked on up to 1930 (since it's a US government publication) and I believe that includes some work by Boas.
**I'm young enough that I'm willing to wait, and for all I know I may end up in Canada. But that's not the point. I would like to see Boas available to everyone. And to some extent he is- in paper. An ebook of his work isn't 'better' as someone said- it's different.**

> In chapter 5
> there might be a very quotable sentence- but what my student doesn't know is
> that this sentence was changed in later editions. And there's no page
> number- does he tell his teacher to read the entire chapter to find a
> sentence that won't be there in a later edition?

What is he supposed to do, give a page reference to one of a dozen editions that might be very hard for the teacher to find? With etexts, you know that your recipient has access to the same edition you have. And as someone else pointed out, if you quote the sentence, the context can be found in seconds.

**Why not? It's done all the time. Students and scholars have cited rare books that were impossible to find before- I remember citing a rare book that contained the concordat between the Vatican and Germany for a grad class years ago, and information on the Black Star line of Marcus Garvey while still in high school. Why did my professors accept my citations? Because they could be tracked down. It wasn't impossible to find the originals- just difficult. The former one was located in Bobst Library at NYU and the latter was in the NY Public Library's Schomberg Collection. I can find both of them more easily now, because both libraries have their catalogues online. That means I can find the cites and then go look at the actual books. Since there is no physical book with PG that an outsider can hold, it would be nice to have a master scan of the text. PG isn't meant to be a master text- it's a repository for copies. But copies come from somewhere.

'The context can be found in seconds'. Uh huh. The context of what? The context of a no longer accepted version of an original text? The context of a book that is out of date?
I looked at the front end of the Koran. From the Translator's note, he (I assume it was a he) made the translation sometime in the 19th century- or the early 20th. I can tell, because he used the word 'Mohammedan', and because I now know PG uses books out of copyright, and because the language and other signs pointed to it being from the 19th century. But other than as a work of literature, I'd have problems using it- like if I were comparing 19th century versions of Arabic texts, because I'm not even sure it was written in the 19th century.**

> After all, I
> have no idea who JM Rodwell was, or whether his translation of The Koran is
> the definitive English version, or why his translation was chosen- other
> than that his book was out of copyright. From my point of view, that's a red
> flag itself. If this translation is so superb, why isn't it still being
> used- or is it?

And how do I know that if I pull it off the library shelves? My college library has a half dozen different translations of the Koran; how am I to know which are in use?

**How? Easy. You look at other books about Koranic translations and see if they refer to this one- and guess what? You can't do that online. Which means you have to go to a library anyway. Online isn't BETTER. It's different.

By the way- in a library, I can tell if a book is a reprint. If it was reprinted, chances are someone thought it was good enough to put out for sale again. I can tell that without even picking up any other books on the shelf- it's called 'looking at the publishing date and the edition number'. It's an old trick, but you knew that already. Most school libraries don't keep first editions of definitive books on shelves- they're too valuable. A first edition with value- one that is considered important- would be kept in back.
I can tell things like that from a card catalogue- PG is a library without one, in the sense of having the kinds of basic info that card catalogues (even electronic ones) have.**

As for the reason it's not being used, I would suggest that the fact that academics like to retranslate everything every decade might be an explanation. My class used a modern translation of the Iliad, but that doesn't mean that in several hundred years of English translation of the work that's now public domain, there's not one competent, even superb translation.

> Nietzsche's work for instance, was butchered by his sister. There are
> conflicting copies of his work floating around. When his works were copied
> for Project Gutenberg, did someone go for an out of copyright copy that is
> definitive, or one that his sister chopped up? Did that matter, or was it
> just more important to get a copy up?

I doubt that the people who scanned it were aware of the differences.

**That's my point. They did a great job copying it. People all around the world can read it and learn from it. That gives it worth. But is it worth a whole lot to someone doing work on Nietzsche, and on how his ideas were changed by his editors? You hear that sound? It's the sound of someone starting their car to go to a physical library. Anyone who is really interested in his philosophy- or even a college student trying to do a decent paper- has no way of knowing which edition this is or where it came from. A few added words on the front end (original publisher, original publishing date, number of pages, edition number) and that problem would be gone. No academic worth his or her salt would be able to seriously dispute it. If the original index were included (if there was one- authors often do that themselves, too) and the bibliography too- well then you've got yourself a book. I could print it out and share it with my friends- after all, most people don't read whole books online.
Control F is only useful if I'm in front of a machine. If I want to read a Tom Swift book to my kids at a chapter a night, I'm not going to do it from a laptop or park little Johnny's bed next to my desk. Maybe other people here take their T1 connections to the beach with them or on the subway, but I don't. When I want to cite something to my students, I print out something at home and take it with me, maybe with highlights all over the paper. That's the funny thing about books- even casual readers like writing in the margins.

I happen to love PG- but it will be in ideal form when it has hyperlinks to other books and to the notes I type up, when I can print it out and have it paginated, when I can tell if I'm reading a facsimile of a first edition. I know that's a lot, but a girl can dream. And some sites are doing that kind of thing with individual books already- but their scope isn't as large as PG's. PG's scope is what makes it valuable, but I wouldn't use it for scholarly work.

One person made the comment that PG shouldn't try to anticipate what scholars want- it should let scholars discover it and let them say what they need. I just did, and most of what I'm hearing is that I have to learn to adapt to PG, when there are perfectly good college libraries out there. There is no reason for scholars to embrace a site that doesn't even meet up with basic MLA guidelines for books. After all, that's the business you are in- not original websites, but books.
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From mbuch at mcsp.com Fri Nov 12 16:47:34 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Fri Nov 12 16:45:53 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41954D36.7B9503D9@ibiblio.org> Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Dyck Sent: Friday, November 12, 2004 6:55 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] PG audience Michael Hart wrote: > > If we cater to scholars, we are only expanding the "digital divide," > so to speak. Our goal is to provide a large viable library to all, > not just to the scholars, who represent less than 1% of the people, > and are often very elitist. I don't think anyone is advocating providing the PG library "just to the scholars", so that's a strawman. Instead, some people simply want to make PG texts more useful to scholars than they currently are, and I think we can do that without making them less useful or less available to non-scholars. -Michael **I have all kinds of books on my shelf- first edition anthro texts, humor books, cook books. Each one of them has a publisher and info on the publishing date. If PG is a publishing house for out-of-copyright books, fine. But it's supposedly a book repository. If it's a repository of books that were actually published in the real world, why are the original paginations, illustrations and figures, maps, indexes and bibliographies, and publication dates such a problem? 
If I want to be taken seriously as an engineer but I use my own terminology for basic engineering terms or just refuse to use them at all, why should I get shirty if engineers with college degrees don't take me seriously?

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From mbuch at mcsp.com Fri Nov 12 16:49:04 2004
From: mbuch at mcsp.com (Her Serene Highness)
Date: Fri Nov 12 16:47:20 2004
Subject: [gutvol-d] increasing literacy
In-Reply-To: <41954E74.4EFB64DE@ibiblio.org>
Message-ID:

Illiterates rarely use computers for reading. PG would be useful after a person became literate, i.e., able to read. Even the children's books on PG are a bit too advanced for a person who is non-literate. Having taught reading, I can say it would not be the first place I would turn- it's too text-heavy, for one thing.

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Dyck
Sent: Friday, November 12, 2004 7:00 PM
To: Project Gutenberg Volunteer Discussion
Subject: [gutvol-d] increasing literacy

Michael Hart wrote:
>
> If we can increase literacy by even 10%,
> we make more difference than if we cater
> to the scholars.

We could make even more difference by doing both! Setting that aside, do we have any data (or even anecdotal evidence) re the effect of Project Gutenberg on literacy levels?

-Michael

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From jeroen at bohol.ph Fri Nov 12 17:30:05 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Fri Nov 12 17:30:01 2004
Subject: [gutvol-d] Gone with the wind i s "Gone with the wind"
In-Reply-To: <006d01c4c8cb$e4d38440$069595ce@net>
References: <006d01c4c8cb$e4d38440$069595ce@net>
Message-ID: <4195639D.5060600@bohol.ph>

Norm Wolcott wrote:
> Sayonara.
Apparently all versions of GWTW have disappeared from the net.
>
> nwolcott2@post.harvard.edu Friar
> Wolcott, Gutenberg Abbey, Sherwood Forrest

Try the wayback machine, www.archive.org

Jeroen.

From jon at noring.name Fri Nov 12 17:51:02 2004
From: jon at noring.name (Jon Noring)
Date: Fri Nov 12 17:51:22 2004
Subject: [gutvol-d] Perfection
In-Reply-To:
References:
Message-ID: <189945464046.20041112185102@noring.name>

Michele "Her Serene Highness" wrote:

> [snip of excellent comments]
>
> **Why not? It's done all the time. Students and scholars have cited rare
> books that were impossible to find before- I remember citing a rare book that
> contained the concordat between the Vatican and Germany for a grad class
> years ago, and information on the Black Star line of Marcus Garvey while
> still in high school. Why did my professors accept my citations? Because
> they could be tracked down. It wasn't impossible to find the originals-
> just difficult. The former one was located in Bobst Library at NYU and the
> latter was in the NY Public Library's Schomberg Collection. I can find both
> of them more easily now, because both libraries have their catalogues
> online. That means I can find the cites and then go look at the actual
> books. Since there is no physical book with PG that an outsider can hold,
> it would be nice to have a master scan of the text. PG isn't meant to be a
> master text- it's a repository for copies. But copies come from somewhere.

The above comment suggests two basic requirements PG should embrace for all texts:

1) The original source (or sources for composite works) is fully identified and described in the metadata using accepted library cataloging standards, and that these fields are searchable.

2) The original page scans also exist in the database, linked to and from the digital text version (easy to do in XML -- TEI has markup for this purpose.)
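[To make requirement 2 concrete: TEI records where each page of the source began with an empty <pb/> (page-break) milestone, whose facs attribute can carry a pointer to the scan. A minimal sketch, with a made-up scan-file naming scheme -- as noted earlier in the thread, PG had not yet settled on one:]

```python
# Sketch of TEI page-break milestones linking transcription to scans.
# The facs filenames below are hypothetical, not a PG convention.
import xml.etree.ElementTree as ET

TEI_FRAGMENT = """
<div>
  <pb n="190" facs="7000-p0190.png"/>
  <p>...text of page 190...</p>
  <pb n="191" facs="7000-p0191.png"/>
  <p>...text of page 191...</p>
</div>
"""

def page_to_scan(xml_text):
    """Map printed page number -> scan image file, from <pb/> milestones."""
    root = ET.fromstring(xml_text)
    return {pb.get("n"): pb.get("facs") for pb in root.iter("pb")}

print(page_to_scan(TEI_FRAGMENT))
```

[Because <pb/> is an empty milestone rather than a container, it can be dropped into an existing transcription without disturbing the paragraph markup, which is what makes page-by-page linking cheap to retrofit.]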
> I happen to love PG- but it will be in ideal form when it has hyperlinks to
> other books and to the notes I type up, when I can print it out and have it
> paginated, when I can tell if I'm reading a facsimile of a first edition. I
> know that's a lot, but a girl can dream. And some sites are doing that kind
> of thing with individual books already- but their scope isn't as large as
> PG's. PG's scope is what makes it valuable, but I wouldn't use it for
> scholarly work.

The ability to annotate, reference and interlink texts within a digital text repository is a very powerful feature. The fundamental architecture of the "PG Library System" should include this as a future possibility. To me, this is even more exciting than some of the other things being considered, such as language translation.

The requirements associated with these features strongly point to formatting all PG master texts in XML. W3C's XPointer can be used to address both spots and ranges within an XML document using several schemes (both W3C defined and custom schemes within the XPointer Framework.) The most common and most robust/persistent scheme is the well-known fragment identifier. But there's also a scheme to point to a particular element (tag) in a document which does not have an 'id', as well as to point to a spot within content (this scheme is still in Draft form -- it is not a W3C Recommendation.)

So long as the XML document remains unchanged (and for the fragment identifier scheme where the 'id's are kept unchanged even if changes are made to the document), the XPointer addresses will still work. (The term used here is "persistence".)

One problem area, which gets into Identifiers, is how to address the XML document itself -- can it be addressed "standalone", or must it be addressed only when it resides within a repository (such as the PG Library)?
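[The persistence difference between those two addressing schemes can be shown with a toy example. The document and ids below are invented, and real XPointer processing is more involved than this hand-rolled lookup -- this is just the idea:]

```python
# Two XPointer-style addressing schemes, resolved by hand against a toy
# document: the shorthand fragment identifier (matches an xml:id) and an
# element()-style child-index path. Ids and content are hypothetical.
import xml.etree.ElementTree as ET

DOC = """
<text>
  <body>
    <div>
      <p xml:id="p191">First paragraph.</p>
      <p>Second paragraph, no id.</p>
    </div>
  </body>
</text>
"""

# ElementTree stores xml:id under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def by_fragment_id(root, frag):
    """Shorthand pointer: #p191 -> the element whose xml:id matches."""
    for el in root.iter():
        if el.get(XML_ID) == frag:
            return el
    return None

def by_element_scheme(root, steps):
    """element() scheme: a child-index path like element(/1/1/2),
    counting child elements from 1 at each step."""
    el = root
    for i in steps:
        el = list(el)[i - 1]
    return el

root = ET.fromstring(DOC)
print(by_fragment_id(root, "p191").text)        # survives edits that keep ids
print(by_element_scheme(root, (1, 1, 2)).text)  # breaks if structure changes
```

[The comments mark Jon's persistence point: the id-based pointer keeps working as long as ids are preserved, while the positional path silently points somewhere else the moment an element is inserted or removed.]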
If the XML document can be addressed standalone, apart from the repository, then obviously it must internally contain an identifier, the same one used to identify it within the repository and which forms part of the URI reference. It was an interesting exercise last year when the Open eBook Forum's Publication Structure Working Group spent three months studying how to reference and interlink OEBPS Publications, and how to address particular spots and ranges within particular XML documents within a Publication (OEBPS allows multiple documents to comprise one Publication.) Of course, complicating things, which may be less of an issue for PG, is that we wanted the linkability to persist even when the OEBPS Publication is converted to something else, provided the converted format can contain the relevant internal pointers. In this study, Identifiers became a Significant Issue (tm). PG will need to come up with a viable identifier system and specialized URI syntax for using XLink. For many of you, the above is probably all Greek. But if one wants to enable annotation, referencing, and text interlinking within the PG Library system, then this will put constraints and requirements that need to be considered. One workable solution is where all the texts are in XML, and one uses these cool technologies called XPointer and XLink to enable these features. Fortunately, it appears the "powers who are" have decided upon moving someday to XML for the PG Master Texts. > One person made the comment that PG shouldn't try to anticipate what > scholars want- it should let scholars discover it and let them say what they > need. I just did, and most of what I'm hearing is that I have to learn to > adapt to PG, when there are perfectly good college libraries out there. > There is no reason for scholars to embrace a site that doesn't even meet up > with basic MLA guidelines for books. After all, that's the business you are > in- not original websites, but books. 
Michele's point is that before PG makes any substantive decisions, it needs to decide upon which user groups it would like its texts to target (the more the better in my opinion), and then ask the experts in those groups to submit requirements. This should be done *before*, not after, matters have been decided and the next-gen (or next-version) PG system is ready to be built. As I've said before, I believe it possible to come up with a set of basic requirements for all PG texts which will reasonably meet the needs for most, if not all, groups we identify (maybe by the "80-20" rule, at the minimum.) By designing the system to be extensible for particular special needs, then it will be able to fill in where the basic requirements don't. A summary rehash: If one considers that PG texts are not to be solely standalone (which is the traditional view), but rather are components of a dynamic and powerful repository (where the whole is greater than the sum of the parts), then this creates specific requirements which simultaneously impacts upon the areas of format, metadata/identifiers, database structure, user interface design, to name a few. A holistic approach is definitely necessary to assure that whatever is decided for one area will not cause problems in another area. Thinking holistically, factoring in the long-term vision of what we want the PG Library to do and to be fifty years from now (and I don't believe this is being discussed enough), is important. 
Jon Noring From jon at noring.name Fri Nov 12 18:08:03 2004 From: jon at noring.name (Jon Noring) Date: Fri Nov 12 18:08:22 2004 Subject: [gutvol-d] Oops, forgot accessibility (was PG audience) In-Reply-To: <184940134937.20041112172213@noring.name> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <41954D36.7B9503D9@ibiblio.org> <184940134937.20041112172213@noring.name> Message-ID: <187946484406.20041112190803@noring.name> I wrote: > Without thinking about it for more than 30 seconds, here's a partial > list of different user groups. No doubt this list can be expanded and > much better described/subcategorized. But it's a start to further > discussion if enough here deem it of interest. > > 1) Personal interest readers > 2) Scholars and researchers > 3) Students (K-12 and post-secondary) > 4) Professional and vocational Geez, I forgot one of the most important user groups of all: 5) Readers with special needs (blind, dyslexic, etc.) Note that there's a strong movement to require that K-12 and public post-secondary educational materials be highly accessible, to be offered in accessible formats. In the U.S., for textual materials this will very likely be mandated as the XML-based NIMAS specification (which in turn is derived from the DAISY Digital Talking Book specification.) If we want PG texts to be legally used in the classroom setting, which I think is an *opportunity*, not a *burden*, then we definitely need to assess how the "master" XML Schema settled upon (probably through DP) will be compatible with NIMAS by XSLT or other conversion method. It should be pretty easy to conform most if not all PG Master texts to the NIMAS requirements, since from what I understand the PG Master text Schema will likely be a subset of TEI. 
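[As a rough illustration of the structural transform such an XSLT conversion would perform, here is a toy Python equivalent: walk nested TEI-like <div> elements and emit the flat list of navigation points (title plus depth) that a DAISY-style table of contents needs. The element names are illustrative, not a fixed schema:]

```python
# Toy stand-in for an XSLT pass that derives DAISY-style navigation
# points from structural markup. Element names (<div>, <head>) follow
# TEI convention but the document itself is invented.
import xml.etree.ElementTree as ET

DOC = """
<body>
  <div><head>Chapter I</head>
    <div><head>Section 1</head></div>
    <div><head>Section 2</head></div>
  </div>
  <div><head>Chapter II</head></div>
</body>
"""

def nav_points(el, depth=0):
    """Return (level, title) pairs for every headed <div>, in order."""
    points = []
    for child in el:
        if child.tag == "div":
            head = child.find("head")
            if head is not None:
                points.append((depth + 1, head.text))
            points.extend(nav_points(child, depth + 1))
    return points

for level, title in nav_points(ET.fromstring(DOC)):
    print("  " * (level - 1) + title)
```

[This is exactly the kind of "verbal menu" structure a blind reader navigates by, which is why consistent structural markup in the master format matters more for accessibility than any visual styling.]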
I strongly suggest that before any XML-based vocabulary be decided upon as the "master" PG format, that we consult with the technical folk at DAISY, RFB&D, CAST, etc., to assure we aren't overlooking something or doing something which would make accessibility more difficult. As a heads up -- they love good navigational aids in the markup and in external metadata (imagine being blind -- having multiple verbal menus to access the texts in different ways is important!) We might even be able to solicit the help of the accessibility community to add navigational markup to selected PG texts. Jon Noring From jtinsley at pobox.com Fri Nov 12 18:30:28 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Fri Nov 12 18:30:42 2004 Subject: Marking bold & italics in .txt (was Re: [gutvol-d] a few questions that i don't know the answer to) Message-ID: <20041113023028.GA8211@panix.com> On Thu, 11 Nov 2004 23:33:29 -0800, Greg Newby wrote: >On Thu, Nov 11, 2004 at 11:00:52PM -0500, Bowerbird@aol.com wrote: >> second, for greg. people over at distributed proofreaders >> have reported that the f.a.q. here at project gutenberg >> do not state that styled text (specifically, italics and bold) >> be marked with underbars and asterisks in the text files. >> the understanding i have from you is that this has become >> the official policy of project gutenberg. if that's not the case, >> would you please inform people here? and if it _is_ the case, >> when you next update the f.a.q., could you include this policy? >> thank you. > >Jim maintains the FAQ, and DP has their own style guides that >sometimes vary for different texts. So, I'm not really the right guy >to ask. I don't think there was agreement on how to handle bold >& italics, but I do think everyone I heard from agreed it should be >indicated somehow in plain text. > >So, I don't think there is an official policy on handling >bold & italics in plain text files. 
But if DP has an official >policy I'm unaware of, then it should probably be reflected >in the FAQ as a recommendation. > >Sorry I don't know the current state on this, but perhaps >Jim or some of the DP project managers can contribute the >latest thinking. Italics is well covered at http://gutenberg.net/faq/V-94 http://gutenberg.net/faq/V-95 About three years or so ago, 'most everyone settled on _underscores_ for italics, with a few holdouts for /slants/. CAPITALS, of course, are still represented in a lot of older texts, but I haven't seen anyone using them in a new text for quite some time. Compared to italics, bold as a method of emphasizing text, as opposed to bold as an incidental property of a heading, is relatively rare. Where bold does need to be rendered in plain text, the current most common usage (from DP) is *bold text*. There are times when it is appropriate to signify bold, but I have seen some texts coming from DP where it has been used unnecessarily -- mostly to indicate a sub-heading or chapter title in the book. In such a case, where a chapter title is clearly a chapter title and on a line by itself, there really is no need to mark it in the plain text version as having been bold face in the original. I think this practice comes from people pre-marking the text for later conversion to HTML, rather than any intent to clutter the plain text. jim From servalan at ar.com.au Fri Nov 12 11:56:49 2004 From: servalan at ar.com.au (Pauline) Date: Fri Nov 12 18:32:43 2004 Subject: [gutvol-d] PG audience In-Reply-To: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> References: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> Message-ID: <41951581.5060303@ar.com.au> Joshua Hutchinson wrote: > ----- Original Message ----- > From: Marcello Perathoner >It would be useful to include the dp project number in some form. We >>have a discussion ongoing with Joshua on how to achieve this. >> > > > Yep, pretty easy to implement in the teiHeader. 
And then it can be used by Marcello's script to link back to the DP project comments fairly easily. At this point, I figure it will be in there in the final specs. Please note there is no 1 to 1 correspondence between the DP projectID & an etext number. Multi-volume works/works which are split at DP for ease of processing appear with a single etext number in PG, while having multiple DP projectIDs. DP does record the PG etext number for each projectID once the work has been posted. DP requires a user be logged in to be able to view the Project Comments pages at present. Hence, I do not see the wisdom of linking to the internal DP projectID from the PG database. It would be great to capture the bio. info in some of the DP Project Comments pages in PG; for some of the projects I post-process, the DP Project Manager has been adding information to existing wikipedia entries. e.g. http://en.wikipedia.org/wiki/Kermit_Roosevelt Cheers, P -- Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." From joshua at hutchinson.net Fri Nov 12 19:06:56 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 19:06:56 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41951581.5060303@ar.com.au> References: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> <41951581.5060303@ar.com.au> Message-ID: <41957A50.2050809@hutchinson.net> Pauline wrote: > Joshua Hutchinson wrote: > >> ----- Original Message ----- >> From: Marcello Perathoner > >>> It would be useful to include the dp project number in some form. We >>> have a discussion ongoing with Joshua on how to achieve this. >>> >> >> >> Yep, pretty easy to implement in the teiHeader. And then it can be >> used by Marcello's script to link back to the DP project comments >> fairly easily. At this point, I figure it will be in there in the >> final specs. > > > Please note there is no 1 to 1 correspondence between the DP projectID > & an etext number.
> > Multi-volume works/works which are split at DP for ease of processing, > appear with a single etext number in PG, while having multiple DP > projectIDs. > Not exactly. The only things that don't have a one to one equivalent are the beginner books. Something that spans multiple volumes is usually (always?) posted as multiple etexts. There may be an omnibus posting, though, too. In those cases, we'd probably link to the projectID used for the first part going through DP. > DP does record the PG etext number for each projectID once the work > has been posted. > > DP requires a user be logged in to be able to view the Project > Comments pages at present. > > i.e. I do not see the wisdom of linking to the internal DP projectID > from the PG database. > Whether the link back to DP would be useful or not I leave to others to decide. However, when we get to the point ... someday ... of providing the original page scans (for those that we can), I wouldn't be surprised to see the projectID used as the value to tie back to them. So, let's put the projectID in there for now. It definitely doesn't hurt anything. > It would be great to capture the bio. info in some of the DP Project > Comments pages in PG; for some of the projects I post-process, the DP > Project Manager has been adding information to existing wikipedia > entries. e.g. > http://en.wikipedia.org/wiki/Kermit_Roosevelt > > Cheers, > P From j.hagerson at comcast.net Fri Nov 12 19:19:13 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Fri Nov 12 19:19:45 2004 Subject: [gutvol-d] Linking back to DP to get page scans [Was: PG audience] Message-ID: <007f01c4c92f$900159b0$6401a8c0@enterprise> Joshua Hutchinson wrote: >Whether the link back to DP would be useful or not I leave to others to >decide. However, when we get to the point ... someday ... of providing >the original page scans (for those that we can), I wouldn't be surprised >to see the projectID used as the value to tie back to them.
So, let's >put the projectID in there for now. It definitely doesn't hurt anything. While it won't hurt anything, a DP ID might not have any significance for linking to page scans. DP periodically archives finished projects and takes them off line. If we would need to bring scans back on line, it would make more sense to me to restore them to a directory named with the PG eBook number and not the hexadecimal alphabet soup ID that DP uses. From joshua at hutchinson.net Fri Nov 12 19:22:55 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 19:23:02 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41954D36.7B9503D9@ibiblio.org> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <41954D36.7B9503D9@ibiblio.org> Message-ID: <41957E0F.2020906@hutchinson.net> Michael Dyck wrote: >Instead, some people simply want to make PG texts more useful to >scholars than they currently are, and I think we can do that without >making them less useful or less available to non-scholars. > > > AMEN! From joshua at hutchinson.net Fri Nov 12 19:34:52 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 19:34:50 2004 Subject: Marking bold & italics in .txt (was Re: [gutvol-d] a few questions that i don't know the answer to) In-Reply-To: <20041113023028.GA8211@panix.com> References: <20041113023028.GA8211@panix.com> Message-ID: <419580DC.2070705@hutchinson.net> Jim Tinsley wrote: > >Where bold does need to be rendered in plain text, the current >most common usage (from DP) is *bold text*. There are times when >it is appropriate to signify bold, but I have seen some texts >coming from DP where it has been used unnecessarily -- mostly >to indicate a sub-heading or chapter title in the book. In >such a case, where a chapter title is clearly a chapter title >and on a line by itself, there really is no need to mark it in >the plain text version as having been bold face in the original. 
>I think this practice comes from people pre-marking the text for >later conversion to HTML, rather than any intent to clutter the >plain text. > > > Actually, it is probably there from the OCR pre-processing and was never removed through all the rounds of proofing and post-processing... why I feel this is important enough of a distinction that I needed to make a post about ... I have no idea. I'm going to bed. Josh From joshua at hutchinson.net Fri Nov 12 19:37:24 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 19:37:27 2004 Subject: [gutvol-d] Linking back to DP to get page scans [Was: PG audience] In-Reply-To: <007f01c4c92f$900159b0$6401a8c0@enterprise> References: <007f01c4c92f$900159b0$6401a8c0@enterprise> Message-ID: <41958174.40500@hutchinson.net> John Hagerson wrote: >Joshua Hutchinson wrote: > > >>Whether the link back to DP would be useful or not I leave to others to >>decide. However, when we get to the point ... someday ... of providing >>the original page scans (for those that we can), I wouldn't be surprised >>to see the projectID used as the value to tie back to them. So, let's >>put the projectID in there for now. It definitely doesn't hurt anything. >> >> > >While it won't hurt anything, a DP ID might not have any significance for >linking to page scans. DP periodically archives finished projects and takes >them off line. If we would need to bring scans back on line, it would make >more sense to me to restore them to a directory named with the PG eBook >number and not the hexadecimal alphabet soup ID that DP uses. > > > True ... but this way we know which text was which projectID. Otherwise, how do you know that the archive pngs in projectID747364873 should go into PG etext 10576? Josh (Yeah, I made up those numbers ... 
they ain't valid) From jtinsley at pobox.com Fri Nov 12 20:16:21 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Fri Nov 12 20:16:37 2004 Subject: [gutvol-d] Linking back to DP to get page scans [Was: PG audience] In-Reply-To: <007f01c4c92f$900159b0$6401a8c0@enterprise> References: <007f01c4c92f$900159b0$6401a8c0@enterprise> Message-ID: <20041113041621.GB8211@panix.com> On Fri, Nov 12, 2004 at 09:19:13PM -0600, John Hagerson wrote: >Joshua Hutchinson wrote: >>Whether the link back to DP would be useful or not I leave to others to >>decide. However, when we get to the point ... someday ... of providing >>the original page scans (for those that we can), I wouldn't be surprised >>to see the projectID used as the value to tie back to them. So, let's >>put the projectID in there for now. It definitely doesn't hurt anything. > >While it won't hurt anything, a DP ID might not have any significance for >linking to page scans. DP periodically archives finished projects and takes >them off line. If we would need to bring scans back on line, it would make >more sense to me to restore them to a directory named with the PG eBook >number and not the hexadecimal alphabet soup ID that DP uses. I have posted, I think, three books with their page scans already, which I could do because I was working directly with the producer. We have a provisional protocol for this, which is to put them into 12345/page-images/12345-page-images.zip. You can see http://gutenberg.net/faq/S-21 for more detail. This protocol may, and probably will, evolve as we get more cases. Speaking for myself, I'd like to see more cases (though not an inundation! :-) fairly soon. So if you're producing a book now, whether from DP or elsewhere, and you are free to submit the images, let me know. The page images, where posted, should definitely be stored in the PG archive rather than linked-to; otherwise they won't get mirrored. 
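For what it's worth, the provisional protocol above is mechanical enough to sketch in a few lines of Python. The digit-splitting rule is my guess from the /1/0/8/0/... example earlier in this thread, and the function name is made up, so treat this as an illustration of the convention, not a PG tool:

```python
def page_images_path(etext_no: int) -> str:
    """Build the provisional page-scan location Jim describes
    (12345/page-images/12345-page-images.zip), prefixed with my
    reading of the new digit-per-directory server layout
    (/1/0/8/0/...): every digit of the etext number except the
    last becomes a directory. Assumes a multi-digit number."""
    digits = str(etext_no)
    tree = "/".join(digits[:-1])
    return f"{tree}/{digits}/page-images/{digits}-page-images.zip"

print(page_images_path(12345))
# -> 1/2/3/4/12345/page-images/12345-page-images.zip
```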
jim From shalesller at writeme.com Fri Nov 12 21:03:58 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Nov 12 21:04:15 2004 Subject: [gutvol-d] Perfection Message-ID: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> "Her Serene Highness" writes: > David Starner writes: > > What is he supposed to do, give a page reference to one of a dozen editions > > that might be very hard for the teacher to find? With etexts, you know > > that your recipient has access to the same edition you have. And as someone > > else pointed out, if you quote the sentence, the context can be found in > > seconds. > > **Why not? It's done all the time. Students and scholars have cited rare > books that are impossible to find before- I remember citing a rare book that > contained the concordat between the Vatican and Germany for a grad class > years ago, and information on the Black Star line of Marcus Garvey while > still in high school. Why did my professors accept my citations? Because > they could be tracked down. One of the methods of mathematical proof is proof by uncheckable citation. "This lemma is proved in the January 1822 volume of the Bohemian Mathematical Journal, pages 12-43." If the volume is in some library half-way across the country, nobody is going to take the time to check a cite in some student's paper. If the teacher is never going to check the cite, what's the point? And if he's going to find the one copy in the nation and order it via ILL, what's so hard about searching through an online document? > But other than as a work of literature, i'd have problems > using it- like if I were comparing 19th century versions of Arabic texts, > because I'm not even sure it was written in the 19th century.** Anyone born in 1980 or later would know quite quickly, just like I do. It was translated in 1861 and reprinted in 1971 as part of the Everyman's Library, and has been frequently reprinted.
It has a second edition, in 1871; assuming the Everyman's Library's text was taken from the second edition, you can quickly check to see whether the PG edition is the first edition or the second edition. Google is your friend. So are the LoC catalogs, but watch out because they frequently have authors split under two headings, one of them marked as being from the old catalog. > **How? Easy. You look at other books about Koranic translations and see if > they refer to this one- and guess what? You can't do that online. Which > means you have to go to a library anyway. Online isn't BETTER. It's > different. Or I could do a search online and find out that Rodwell's translation is considered inferior by some because he wasn't a Muslim, but is probably one of the better public-domain ones. I also find "All the prominent translations of the Quran have each been the product of a single individual, so there is no translation which truly reflects the collective and opposing thoughts of a range of scholars. Such a large-scale collaborative effort would most likely be required to establish any one translation as most authoritative. Since this has not yet happened, there is no translation of the Qur'an as widely accepted (for example) as the New Revised Standard Version of the Bible. "As a result, individual English-speaking Muslims tend to have their own personal favourites. Indeed, those who read more than one translation often develop a fondness for different aspects of each. For example, the renowned scholar Annemarie Schimmel, author of dozens of books on Islam and formerly professor of Islam at Harvard University, favoured the translation of Arthur John Arberry for beauty of expression, and that of Marmaduke Pickthall for literal rendering of Arabic phrases." which are from , and which conveniently have links to the authors so I can find their credentials. > By the way- in a library, I can tell if a book is a reprint.
But what you can't tell is if it was reprinted, if all you have is the original. A quick search through the LoC's online catalogs should give you a pretty reasonable guess as to whether it was reprinted or not. > I could print it > out and share it with my friends- after all, most people don't read whole > books online. Control F is only useful if I'm in front of a machine. If I > want to read a Tom Swift book to my kids at a chapter a night, I'm not going > to do it from a laptop or park little Johnny's bed next to my desk. No, but you aren't doing scholarly work with Tom Swift. And again, your generation doesn't read whole books online, but mine does. > when I can print it out and have it > paginated, My printer _always_ paginates documents. If you're dealing with an old dot-matrix, you have to paginate it manually, but the paper is usually prescored for separation. (-: But seriously, I can't imagine why you'd want that. The original pages were designed for the original machine, and the change in fonts and typesetting, which is unavoidable, will change where the page breaks would naturally fall even if you strive to keep everything the same as the original. It would be much better to put page numbers in the margins and let the physical breaks fall where they may. Accept that page numbers have become free of the physical form of the book. I think that we should retain source information, but you don't understand and accept the power of the tools at your fingertips. Much of the context about a book can be resolved in a google search or a search of the appropriate library catalog online. Stop and think before you hit that print button; ink is expensive, you know. A lot of things don't need printing out. Try emailing things to people, and letting them print it out if they want to. Ebooks don't need to dance to be useful. Even well-stocked libraries don't have many of the books we do, and even if you have to send for the hardcopy, having the book at hand is useful.
It vastly simplifies searching and especially concordance building. Online books are better in many ways, not just different. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From brad at chenla.org Fri Nov 12 21:46:26 2004 From: brad at chenla.org (Brad Collins) Date: Fri Nov 12 21:48:26 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41950197.2020707@perathoner.de> (Marcello Perathoner's message of "Fri, 12 Nov 2004 19:31:51 +0100") References: <41950197.2020707@perathoner.de> Message-ID: This is a real polarizing issue, with many academics believing that they are the anointed guardians of literature and recorded knowledge. They feel threatened by groups like PG and DP which have by-passed their institutional traditions. Many academics today feel threatened by etexts in the same way that the clergy felt threatened by the printing press. I asked for a copy of the TEI source for Bradford's History of Plymouth Plantation last month from some academic group. They asked me to submit a formal request which would explain what I would use the text for! There are any number of academic etext repositories which block people from accessing public domain material because of `copyright issues'. Worse is how many university presses are making IP land grabs worthy of the RIAA and MPAA. There are a number of books which are now only available in astonishingly expensive editions. The OED is an example of this. Oxford has pumped a huge amount of money into the dictionary, but the dictionary has also been built with an enormous amount of volunteer help. There are no libraries anywhere near where I live in Bangkok with a copy of the OED which I can use. Since I don't have a credit card, I can't get access to the online edition even if I had the money to pay for it.
The academic priesthood feels that their power base and institutional purpose for existence is threatened, so they are circling the wagons and giving the world good reason to threaten them. On the other hand, academics _are_ often the only people preserving a lot of man's older and mostly forgotten knowledge and placing it in context so that it can be understood today. Academics feel horrified when they hear people say, I don't care about all that stuff, just give me books. This is the same horror that geeks experience when they hear people say with pride that they can't program their VCR and never will. Being proud of being ignorant is something that I have never understood and never will, but I think that what Joe Sixpack is saying is that geeks and scholars should do their job and shouldn't bother him with the details. He's only concerned with the resulting text or software, not the process of how it was created. In a sense he's right. That's our job, and we shouldn't try to force the end-user to understand the larger or technical issues involved in doing our job. The great unwashed masses have no idea how much work is involved in doing our jobs and sometimes believe that we're making things far more difficult and complex than they really are. As Neal Stephenson said, most people want a mediated experience like you get from Disney. They don't want to see or deal with the enormous complexity behind it all. I believe that we should think more like special effects artists, who believe the best effects, and the ones that they are most proud of, are the ones that no one realises are effects in the first place. Many academic editions are so burdened with analysis and annotation that they get in the way of the text itself. Electronic editions can hide the glorified and sanctified academic Cliff Notes but make them easily accessible if you need or want them. Personally I like it both ways.
Sometimes I want to work at a text and really study it, and all the scholarly apparatus is a godsend. But other times I just want to read a story, and leave the stuff I don't understand for another time. The great promise of the computer age has been to provide tools which allow the average person with no experience or skills to do the work that required highly skilled workers using specialized professional equipment. Desktop publishing in the 80's is a great example. As soon as laser printers and colour monitors became cheap enough, everyone thought that a secretary who could barely use Wordstar could do the work of a team of professional graphic artists and typesetters. Visual Basic was touted as being a language that could be mastered by the average person and produce applications of the same quality as apps written in C by experienced programmers. Right. Apple is now pushing the dream that anyone with an Apple and a good video camera can be the next Stanley Kubrick with less than US$20K in hardware and software. The barrier of entry and access to the tools for the next Stanley Kubrick is now much lower, but that doesn't mean your Aunt Cindy is going to be making the next Full Metal Jacket in the corner of her family room on her iMac. People like Bowerbird (who I suspect is still here, despite giving his formal swan song) want to reduce the complexity behind the scenes to something as simple as what the end-user sees. The thing is that at first glance it really doesn't look like it's too difficult. And the plethora of cheap, professional quality tools available through chain stores makes it seem, at first, not to be too difficult. This has had the negative side-effect of giving Joe Sixpack the illusion that all of this stuff is a lot easier than it is, and giving the impression that professionals who have spent decades studying and honing their craft are just full of crap and making things more difficult than they have to be.
I suspect that over the next decade, institutions will be re-cast and professionals will re-establish themselves so that their education, experience and skills will be respected. But for those of us in the trenches during the transition it won't be easy and it won't be pretty. b/ -- Brad Collins , Bangkok, Thailand From shalesller at writeme.com Fri Nov 12 21:59:47 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Nov 12 22:00:04 2004 Subject: [gutvol-d] PG audience Message-ID: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> Brad Collins writes: > The OED is an example of this. Oxford has pumped a huge amount of > money into the dictionary, but the dictionary has also been built > with an enormous amount of volunteer help. There are no libraries > anywhere near where I live in Bangkok with a copy of the OED which I > can use. Since I don't have a credit card, I can't get access to the > online edition even if I had the money to pay for it. And I understand that despite how much it costs, it has never turned a profit in the history of its existence. Oxford keeps people working on it because of its importance, not as a profit-making venture. From traverso at dm.unipi.it Fri Nov 12 22:16:38 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Nov 12 22:16:59 2004 Subject: [gutvol-d] Perfection In-Reply-To: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> (shalesller@writeme.com) References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> Message-ID: <200411130616.iAD6GcSm004979@posso.dm.unipi.it> There is a reason to preserve page numbers in ebooks. While correct academic quotations of an electronic version can perfectly well be made without page numbers, the same is not true of retrieving information quoted somewhere else (in an old paper edition).
So for example if an existing book contains a sentence like "This topic is discussed in book aaaaa in pages xxx-yyy" (the exact edition is quoted in a reference), how do you easily find the exact range of pages, without page numbers? The same happens when a book has an index: the index item is often not found literally in the text, and a page number is a handy way to find the reference. Of course, the index can be improved (in an *ML edition) with cross-links, but transforming an index into a cross-linked version is a lot of work, and has to be done by an expert, while reading a page to find a reference is much less work, and can be done by a (relatively) inexpert reader. Some just answer: then do an HTML or a TEI edition. This I don't want: I cannot, and I do not want to learn; I prefer working with text, and doing more texts. And I prefer using text instead of *ML. Moreover, if I keep page numbers, conversion to *ML with page numbers will be much easier than having to retrieve the numbers from the images. Some say: page numbers are ugly in txt. They are the same people who want to have an *ML version, so why do they bother? Please take the txt version with numbers, do your *ML and leave the txt alone. Of course, having page numbers in Tom Swift might be too much. But at least if a book has an index, I believe that page numbers might be useful, even in txt, and we should recommend keeping the information. Carlo From sly at victoria.tc.ca Fri Nov 12 23:22:59 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Fri Nov 12 23:23:18 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41951581.5060303@ar.com.au> References: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> <41951581.5060303@ar.com.au> Message-ID: On Sat, 13 Nov 2004, Pauline wrote: > It would be great to capture the bio. info in some of the DP Project > Comments pages in PG; for some of the projects I post-process, the DP > Project Manager has been adding information to existing wikipedia > entries. e.g.
> http://en.wikipedia.org/wiki/Kermit_Roosevelt I've just added that link to the author record for Kermit Roosevelt in the PG online catalog, as well as his birth and death dates. If you have any other authors represented in PG who have articles about them on wikipedia, let me know... Andrew From brad at chenla.org Fri Nov 12 23:37:53 2004 From: brad at chenla.org (Brad Collins) Date: Fri Nov 12 23:39:59 2004 Subject: [gutvol-d] PG audience In-Reply-To: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> (D. Starner's message of "Fri, 12 Nov 2004 21:59:47 -0800") References: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> Message-ID: "D. Starner" writes: > Brad Collins writes: > >> The OED is an example of this. Oxford has pumped a huge amount of >> money into the dictionary, but the dictionary has also been built >> with an enormous amount of volunteer help. There are no libraries >> anywhere near where I live in Bangkok with a copy of the OED which I >> can use. Since I don't have a credit card, I can't get access to the >> online edition even if I had the money to pay for it. > > And I understand that despite how much it costs, it has never turned a > profit in the history of its existence. Oxford keeps people working on > it because of its importance, not as a profit-making venture. > Good point -- The bills have to be paid by _someone_. But does that factor in profits from other dictionaries like the COD (Concise Oxford Dictionary)? The OED is the baseline for all of the Oxford dictionaries, just as Merriam-Webster's unabridged Third International is for the rest of theirs. The COD or the MW Collegiate would not be what they are without their monster unprofitable cousins. I read somewhere that the COD has been one of the top-selling books in the UK every year for quite some time (that could be wrong though). And it might well be that even with this other revenue the whole venture might still be short of a profit.
But if they are working on it because of its importance and not for profit, then why make it so expensive? They _want_ to make a profit from it and they are trying. Fair enough. If the OED is only available in institutions which can afford it, it will eventually be replaced by another, just as Britannica is losing ground to Wikipedia. Wikipedia still has a ways to go (perhaps not in quantity but in quality) but the writing is on the wall. More than any other type of intellectual work, every dictionary and encyclopedia is built on the backs of those that come before it. And so it goes. b/ -- Brad Collins , Bangkok, Thailand From tb at baechler.net Sat Nov 13 00:07:16 2004 From: tb at baechler.net (Tony Baechler) Date: Sat Nov 13 00:05:50 2004 Subject: [gutvol-d] PG audience In-Reply-To: <20041112185400.GD3160@pglaf.org> References: <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> <4194B4A3.3050305@perathoner.de> <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> Message-ID: <5.2.0.9.0.20041113000326.02004ae0@snoopy2.trkhosting.com> At 10:54 AM 11/12/2004 -0800, you wrote: >On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote: > > At 06:23 AM 11/12/2004 -0800, you wrote: > > > > >Actually, it's pretty easy to find all the original Project Gutenberg > > >eBooks, > > >as well as the newer versions, because so many places keep them, > usually in > > >the thousands for any of our eBooks that have been out for even a week. > > > > Hello. Actually, I've had a hard time finding any of the very early > > editions of PG files. There are some old files in the etext90 directory, > > but not edition 10 of the first several ebooks. I would be interested to > > find the very first edition of when10.txt or whatever it was called as MH > > posted it. Even the old GUTINDEX.* files have been removed, with the > > earliest being GUTINDEX.96 when it used to be GUTINDEX.90.
> >Michael might have some of the older files. There are a few >sources, like old Walnut Creek CDs, that might also be able >to help. I do not have every old Walnut Creek CD ever published, but I do have one and it does not have any of the older files either. I first started using PG in 1995 and even then the very early files from 1971-89 were not generally available. The oldest file, at least the one with the oldest PG header that I am aware of, is plboss10.zip. I'm not sure if edition 10 is still available but I have it. From david at newmannotes.com Sat Nov 13 01:30:44 2004 From: david at newmannotes.com (David Newman) Date: Sat Nov 13 01:30:44 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: <20041113050417.9F29A8C914@pglaf.org> References: <20041113050417.9F29A8C914@pglaf.org> Message-ID: As a credentialed conflict avoider, I've been loath to stick my head into this fray. Indeed, this battle about meeting the needs of academia appears to be waged at times with an ideological fervor to rival that of the recent US election. It seems to me that the fervency with which people approach this issue has made it difficult in some cases for the arguments to follow a path towards resolution. It is perhaps also complicated by the wide assortment of changes being proposed to remedy the perceived problems. Some arguments for change suggest that PG should direct its energies towards making its library suitable for scholars by including more information in the files, particularly pagination and provenance, presumably packaged with XML. I have no problem with including such information. However, I don't think it should be required of all texts, nor do I believe that it really solves the scholarship issue. Including page scans _would_, to the degree that a solution is possible, and requires approximately 0% extra work for most of our valiant volunteers. And, PG has made it clear that this is acceptable, and has already done so for some projects.
I feel that Marcello gave the most persuasive and concise summary of the situation, and I didn't notice any overt disagreement. Marcello Perathoner wrote: >The best value for Academia (and the least work for us) would be just to >include the page scans. Any transcription you make will fall short of >the requirements of some scholar. I think we should use our time for >producing more books for a general audience instead of producing >Academia-certified editions of them. HSH's comments justify such an approach. Her Serene Highness wrote: >I need to know EXACTLY when the original >was published, who published it, and where, since there are variant texts >out there. Even a single word change that might have occurred in the >copying process could change the meaning of a vital sentence. Of course, there is a simple, if unsatisfactory, answer to all these questions for PG texts: they were published by PG, on the PG website, and each file states when it was published. Each work we publish is the "PG variant" of that text. As an academic, I find it dishonest and unhelpful for a scholar to cite a physical volume when the volume they consulted is an electronic edition. It is virtually impossible to guarantee that "even a single word change" was not introduced in the transcription process. Even with DP's careful processes, I would not wager that most of our books enter PG completely error free (or correction free, for that matter.) Page scans allow for an additional layer of safety for any scholar concerned about the adherence to a given print edition, though a certain level of trust in the provider is still required. Thus, while I hope that PG's holdings are as accurate as possible, it would also be my hope that scholars using PG would cite PG. Evidently this is not always the case.
Michael Hart wrote: >I've also heard that many of those who complain, actually use our >eBooks in secret, and ONLY want the provenance so they can steal >them without giving credit where credit is due. This suggests to me two things. 1) We can include page scans and information about provenance, _when available_, with the files so that academics can feel confident in the reliability of those PG holdings. Not so that the original sources can be dishonestly cited, but to provide the necessary data for certain scholars to confidently cite PG's edition. We can point to this in our documentation to enhance our scholarly credibility. 2) We can prominently suggest an appropriate style of citation of works in PG's holdings. (I've seen this done with other digital collections.) Perhaps if the citation style also takes into account the original source, some otherwise reluctant scholars would be appeased. Is this something we can all agree on? -- David Newman www.davidnewman.info From traverso at dm.unipi.it Sat Nov 13 02:55:32 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Sat Nov 13 02:55:56 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: (message from David Newman on Sat, 13 Nov 2004 01:30:44 -0800) References: <20041113050417.9F29A8C914@pglaf.org> Message-ID: <200411131055.iADAtWUG012033@posso.dm.unipi.it> David's solution is perfectly OK for me. It is sufficient that PG does not discourage keeping the extra information (it did until recently). The volunteers will do the rest. An important improvement would be to be able to go easily from the text to the corresponding page scan. Just having the two separately is fine, but having them linked is better; going from image to txt is easy (search), but the converse is often hard. There are of course different solutions. All require preserving the page information in some form; including page numbers in the source is just one method.
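The "page numbers in the source" method needs very little machinery. A minimal sketch follows, assuming a hypothetical `[pg N]` marker convention and an invented scan-file naming scheme (PG had settled on neither at the time): once markers survive in the plain text, mapping any position in the text back to its page scan is a simple lookup.

```python
import bisect
import re

# "[pg N]" is an assumed marker convention, not a PG standard.
PAGE_MARKER = re.compile(r"\[pg (\d+)\]")

def index_pages(text):
    """Return sorted (offset, page_number) pairs, one per marker."""
    return [(m.start(), int(m.group(1))) for m in PAGE_MARKER.finditer(text)]

def page_for_offset(pages, offset):
    """Find the page whose marker most recently precedes `offset`."""
    starts = [start for start, _ in pages]
    i = bisect.bisect_right(starts, offset) - 1
    return pages[i][1] if i >= 0 else None

def scan_filename(page, etext_dir="1/0/8/0"):
    """Map a page number to a scan file (naming scheme is invented)."""
    return "%s/page-%04d.png" % (etext_dir, page)

sample = "[pg 1]It was a dark night.[pg 2]The rain fell.[pg 3]Dawn came."
pages = index_pages(sample)
print(page_for_offset(pages, sample.find("rain")))  # page containing "rain"
print(scan_filename(page_for_offset(pages, sample.find("rain"))))
```

Going the other way (image to text) stays easy, as Carlo notes: read the scan and search the text; the marker index is only needed for the hard direction.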
Another remark, on page scans obtained from other sources: one of these sources, the one that I mostly use, and that has originated hundreds and probably thousands of PG books, is the French national library, http://gallica.bnf.fr. I have received (by email) a rather broad permission to use everything on the site to produce ebooks for DP and PG, and related sites (I have used the permission for LiberLiber and DP-EU). It might be possible to renegotiate the permission, but that might result in a restriction of the terms. But I believe that the original permission could cover giving the user the possibility of checking an individual page for comparison, though not of mirroring their files once the transcription is completed; those files can very well be obtained from the origin. The French national library is not expected to die or to become unavailable; and even in that case we have the image files. Carlo From marcello at perathoner.de Sat Nov 13 05:41:56 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 05:41:58 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41951581.5060303@ar.com.au> References: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> <41951581.5060303@ar.com.au> Message-ID: <41960F24.7010900@perathoner.de> Pauline wrote: > Please note there is no 1 to 1 correspondence between the DP projectID & > an etext number. Then put *all* DP project IDs into the resulting files. Or put the same project ID into multiple etexts. > i.e. I do not see the wisdom of linking to the internal DP projectID > from the PG database. DP could offer access to their database through remote procedure calls or whatever so I could pull the data into the catalog.
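The relationship Pauline and Marcello are describing is many-to-many: several DP project IDs can feed one etext, and one project ID can feed several etexts. A minimal sketch of such a catalog follows; the class name and the project IDs are invented for illustration, and DP's real identifiers and database layout are not shown here.

```python
from collections import defaultdict

class ProjectCatalog:
    """Toy catalog linking etext numbers and DP project IDs, many-to-many."""

    def __init__(self):
        self.by_etext = defaultdict(set)    # etext number -> DP project IDs
        self.by_project = defaultdict(set)  # DP project ID -> etext numbers

    def link(self, etext, project_id):
        # Record the pair in both directions so lookups work either way.
        self.by_etext[etext].add(project_id)
        self.by_project[project_id].add(etext)

catalog = ProjectCatalog()
catalog.link(7000, "projectID0001")  # invented IDs
catalog.link(7000, "projectID0002")  # two DP projects fed one etext
catalog.link(7001, "projectID0002")  # one DP project fed two etexts

print(sorted(catalog.by_etext[7000]))
print(sorted(catalog.by_project["projectID0002"]))
```

Whether the data is pulled over RPC or exported in bulk, storing both directions of the mapping is what makes "put *all* project IDs into the resulting files" workable.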
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Sat Nov 13 06:01:59 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 06:02:03 2004 Subject: [gutvol-d] Perfection In-Reply-To: <20041112182216.Y99646@krweb.net> References: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com> <20041112182216.Y99646@krweb.net> Message-ID: <419613D7.4080907@perathoner.de> kris foster wrote: > this is a dangerous reliance on a transitory medium. electronic > citation is merely more convenient. What makes medium permanence a value per se? Academia has developed its traditions around a medium (papyrus, paper) that is permanent. Not the other way around. If the medium they had used was impermanent, the methods and traditions of Academia would be different today. Medium permanence can be a big disadvantage too. The scholars in the middle ages relied blindly on Aristotle. Scientific method in the middle ages amounted to finding out what Aristotle said about some subject, and that was that. Doing one's own research was not deemed a scientific method. Of course, Aristotle said that "wood swims and metal sinks" and that "heavier items fall faster than lighter ones". -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Sat Nov 13 06:42:13 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 06:42:19 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> Message-ID: <41961D45.1080901@perathoner.de> Brad Collins wrote: > Wikipedia still has a ways to go (perhaps not in quantity but in > quality) but the writing is on the wall. Renowned German computer magazine c't (issue 2004/21 pg. 132ff) tested the following German encyclopaedias: - MS Encarta 2005 Professional (DVD) - Brockhaus 2005 Premium (DVD) - Wikipedia (internet) Wikipedia got the best score (3.6) in the "contents" category.
(Brockhaus: 3.3, Encarta 3.1) The "contents" test consisted in having domain specialists review 66 articles in 22 different subjects. This is only the German version of Wikipedia with 136,000 articles at the time of testing. English Wikipedia has now approx 400,000 articles. -- Marcello Perathoner webmaster@gutenberg.org From jon at noring.name Sat Nov 13 08:22:18 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 08:22:46 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: References: <20041113050417.9F29A8C914@pglaf.org> Message-ID: <161997739609.20041113092218@noring.name> Michael Hart supposedly wrote: > I've also heard that many of those who complain, actually use our > eBooks in secret, and ONLY want the provenance so they can steal > them without giving credit where credit is due. Michael, Michael, Michael -- you're grasping at straws trying to justify not providing the provenance of any PG works. Let's analyze what you wrote. Here's a portion of the Boilerplate, the "TOU" (Terms of Use) from a 2004 text. (from ftp://sailor.gutenberg.org/pub/gutenberg/etext04/ge71v10.txt ) ********************************************************************** DISTRIBUTION UNDER "PROJECT GUTENBERG-tm" You may distribute copies of this eBook electronically, or by disk, book or any other medium if you either delete this "Small Print!" and all other references to Project Gutenberg, or: [Snip of various restrictions when the small print is kept, including paying a 20% royalty if profits are made from the work.] 
********************************************************************** Parsing the Distribution Notice, we see two and only two scenarios (due to the "either/or" construct): 1) Anyone may distribute copies of this eBook electronically, or by *any other medium* (implying conversion), without *any* stated restrictions whatsoever so long as the "small print" and any references to the name "Project Gutenberg" are removed (that is, no one will know for sure the work came from the PG Library.) 2) Follow the restrictions if the "small print" and/or "Project Gutenberg" is mentioned as the source. Michael, PG *is allowing* people to use PG's texts "in secret" (whatever that means), and is welcoming them to be used that way! You are essentially saying they should not use them this way, in contradiction to PG's own small print. There are several strong arguments for including the provenance of PG texts in metadata, which have been discussed ad nauseam here by quite a few sharp people. For example, the leaders and several volunteers of DP have stated it is a good thing to do (and I believe some of them have said it should be a basic requirement.) What I suggest PG does is to take a poll, using an independent forum, of the various volunteers and users of PG texts, and *ask them* the following question: Should PG include the full details of the source document(s) used to produce every PG text? Suppose 2/3 of the respondents said 'yes'. Would PG honor the wishes of the volunteers and users by then requiring the source information be included in the text's metadata? Note that over the years the PG volunteers have put in hundreds of thousands of hours (and maybe over a million hours) of time to help build the PG Library collection. They have a valid claim of "moral co-ownership" of what has been produced, and should collectively have their say in the future development of the collection.
(One reason why I believe PGLAF should become a member-"owned" organization -- currently its Articles of Incorporation state it has no members.) Michael, I'd like to hear from you *all* the reasons you have for why the provenance of PG texts should not be included in the metadata. How does providing the provenance harm the goals of PG? I have yet to hear a cogent and logical argument. Why not write a specific FAQ on this topic? Jon Noring From hart at pglaf.org Sat Nov 13 08:30:11 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 08:30:13 2004 Subject: [gutvol-d] Perfection In-Reply-To: <419613D7.4080907@perathoner.de> References: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com> <20041112182216.Y99646@krweb.net> <419613D7.4080907@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Marcello Perathoner wrote: > kris foster wrote: > >> this is a dangerous reliance on a transitory medium. electronic citation >> is merely more convenient. > > > What makes medium permanence a value per se? > > Academia has developed its traditions around a medium (papyrus, paper) that > is permanent. Not the other way around. If the medium they had used was > impermanent the methods and traditions of Academia would be different today. If you visit any library archive, you might be surprised at the preservation problems they are having on an ever-increasing scale. A decade or so ago, the Library of Congress just completely gave up on trying to keep much of their newspaper collection, and decided to microfilm what they could and to sell off the rest of those archives before they completely fell apart. I bought several volumes from about a century ago, of the New York Herald, just so I could have an additional perspective on the era. Of course, much of the most interesting parts wouldn't be referenced, as they are advertising. . .such as the first New York apartments that included cooking facilities.
Michael From jon at noring.name Sat Nov 13 08:39:51 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 08:40:02 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: <161997739609.20041113092218@noring.name> References: <20041113050417.9F29A8C914@pglaf.org> <161997739609.20041113092218@noring.name> Message-ID: <32998793031.20041113093951@noring.name> I asked the following "poll" question: > Should PG include the full details of the source document(s) used > to produce every PG text? Some other recent messages seem to imply that when a new text submitted to PG includes full source metadata, that info will now be kept in, where formerly it was stripped out. Is this true? If this is true, this is definitely good news. Since DP appears to include this data, the vast majority of new and future PG texts should now include this info. When the early PG texts are redone by DP at some future time, the world will now have the data available. (If I misread something, however, someone correct me.) But we are still faced with the issue of whether PG should require provenance metadata when the work is transcribed from a "paper/ink" original. Another question is whether it should attempt, wherever possible, to reinsert the data into existing PG texts. (E.g., contact the submitters and ask them if they kept that data. DP has submitted thousands of texts -- they no doubt have the source information.)
Jon Noring From hart at pglaf.org Sat Nov 13 08:55:04 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 08:55:05 2004 Subject: [gutvol-d] PG audience In-Reply-To: <5.2.0.9.0.20041113000326.02004ae0@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> <4194B4A3.3050305@perathoner.de> <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> <5.2.0.9.0.20041113000326.02004ae0@snoopy2.trkhosting.com> Message-ID: On Sat, 13 Nov 2004, Tony Baechler wrote: > At 10:54 AM 11/12/2004 -0800, you wrote: >> On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote: >> > At 06:23 AM 11/12/2004 -0800, you wrote: >> > >> > >Actually, it's pretty easy to find all the original Project Gutenberg >> > >eBooks, >> > >as well as the newer versions, because so many places keep them, >> usually in >> > >the thousands for any of our eBooks that have been out for even a week. >> > >> > Hello. Actually, I've had a hard time finding any of the very early >> > editions of PG files. There are some old files in the etext90 >> directory, >> > but not edition 10 of the first several ebooks. I would be interested >> to >> > find the very first edition of when10.txt or whatever it was called as >> MH >> > posted it. Even the old GUTINDEX.* files have been removed, with the >> > earliest being GUTINDEX.96 when it used to be GUTINDEX.90. >> >> Michael might have some of the older files. There are a few >> sources, like old Walnut Creek CDs, that might also be able to help. I could look through my old CD and floppy eBook collections if this is truly important, but you should be advised that the originals of all the earliest eBooks were ALL IN CAPS, and with limited punctuation, since they were typed in on TeleType 33 machines.
It would be fun to see if anyone could change them back to the originals, and if the blogosphere that caught Dan Rather could possibly check all the punctuation marks to prove that such a document COULD have been typed on a TeleType 33. Of course, I still have mine here in the basement, and might be able to fake it better than anyone could disprove. However, the whole idea of finding the original files doesn't mean a lot to me. . .but I think the first file was just named "when". . .without any number or any extension. [However, that could have been changed by the system administrators when they moved it to 9-track tape. . .which was done by file location, as I recall, rather than by file name. i.e., give me the file that starts at 1240 feet on tape number 1642. . . . That was the kind of instruction we received back in 1971 when someone wanted the Declaration of Independence.] > I do not have every old Walnut Creek CD ever published, but I do have one and > it does not have any of the older files either. I first started using PG in > 1995 and even then the very early files from 1971-89 were not generally > available. The oldest file, at least as far as PG headers go, that > I am aware of is plboss10.zip. I'm not sure if edition 10 is > still available but I have it. I probably still have copies of the first one. . .I think it was an odd green color. . .but, again, it's only of sentimental value as a collector's item, as far as I am concerned. I wonder if they will appear 100 years from now on "Antiques Roadshow"?
;-) From hart at pglaf.org Sat Nov 13 09:01:22 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:01:24 2004 Subject: !@!Re: [gutvol-d] PG audience In-Reply-To: References: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> Message-ID: Speaking of the OED, Greg Newby and I were just discussing it a week or two ago, and he is willing to do the first few pages if anyone has access to an edition that correctly states the date of the first volume as 1888. . .just photocopy the title page/verso and the first couple pages and send to him, to get the ball rolling. Not ALL of the original OED is in the public domain in the U.S., by the way. . .only those volumes published before 1923: NEW DICTIONARY OF THE ENGLISH LANGUAGE BASED ON HISTORICAL PRINCIPLES also known as "The Oxford English Dictionary" Copyright dates for the first edition are:

  v.1    : 1888
  v.2    : 1893
  v.3    : 1897
  v.4    : 1901
  v.5    : 1901
  v.6    : 1908
  v.7    : 1909
  v.8    : 1914
  v.9i   : 1919
  v.9ii  : 1919  <<< Last Volume We Can Do In The US!
  v.10i  : 1926
  v.10ii : 1928
  Supplement : 1933

******* The completion of each individual portion was as follows: [I am not sure if the copyright dates concur exactly] [This is something for the scholars and lawyers to fight about]

  AB    : 1888
  C     : 1893
  D     : 1897
  E     : 1893
  F     : 1897
  G     : 1900
  H     : 1899
  IJK   : 1901
  M     : 1908
  N     : 1907
  O     : 1904
  P     : 1909
  Q     : 1902
  R-RE  : 1905
  RE-RY : 1910
  S-SH  : 1914
  SI-SQ : 1915
  ST    : 1919
  SU-SZ : 1919
  T     : 1915
  U     : 1926
  V     : 1920
  W-WE  : 1923
  WH-WO : 1927
  WO-WY : 1927
  XYZ   : 1921

Michael On Sat, 13 Nov 2004, Brad Collins wrote: > "D. Starner" writes: > >> Brad Collins writes: >> >>> The OED is an example of this. Oxford has pumped a huge amount of >>> money into the dictionary, but the dictionary has also been built >>> with an enormous amount of volunteer help. There are no libraries >>> anywhere near where I live in Bangkok with a copy of the OED which I >>> can use. Since I don't have a credit card, I can't get access to the >>> online edition even if I had the money to pay for it.
>> >> And I understand that despite how much it costs, it has never turned a >> profit in the history of its existence. Oxford keeps people working on >> it because of its importance, not as a profit making venture. >> > > Good point -- > > The bills have to be paid by _someone_. But does that factor in > profits from other dictionaries like the COD (Concise Oxford > Dictionary)? The OED is the baseline for all of the Oxford > Dictionaries, just as Merriam Webster does with their unabridged third > international and the rest. > > COD or the MW Collegiate would not be what they are without their > monster unprofitable cousins. > > I read somewhere that the COD has been one of the top selling books in > UK every year for quite some time (that could be wrong though). And > it might well be that even with this other revenue the whole venture > might still be short of a profit. > > But if they are working on it because of its importance and not for > profit then why make it so expensive? They _want_ to make a profit > from it and they are trying. Fair enough. > > If the OED is only available in institutions which can afford it, it will > eventually be replaced by another, just as Britannica is losing > ground to Wikipedia. > > Wikipedia still has a ways to go (perhaps not in quantity but in > quality) but the writing is on the wall. More than any other type of > intellectual work, every dictionary and encyclopedia is built on the > backs of those that come before it. > > And so it goes.
> > b/ > > > -- > Brad Collins , Bangkok, Thailand > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jon at noring.name Sat Nov 13 09:03:36 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 09:04:19 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: <171000217734.20041113100336@noring.name> Brad wrote: > I asked for a copy of the TEI source for Bradford's History of > Plymouth Plantation last month from some academic group. They asked > me to submit a formal request which would explain what I would use > the text for! Interesting. I happen to have a copy of the 1898 printing of `Bradford's History "Of Plimoth Plantation."' My wife's maternal ancestry goes back to colonial Massachusetts, and I think one of her ancestors is mentioned in the book (Degory Priest). If this book has not yet been scanned by anyone affiliated with PG/DP, I'll gladly offer our copy for scanning so long as whatever is used to scan it will not damage the binding (probably can't use a flat bed scanner), and that the scans *will* be made available online for free, even before the work is converted to XML. The book, including index, has over 550 pages, so it is pretty massive. A fascinating work, btw, and one I hope will be scanned and converted to TEI by PG/DP. Jon Noring From hart at pglaf.org Sat Nov 13 09:08:28 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:08:30 2004 Subject: [gutvol-d] Perfection In-Reply-To: <200411130616.iAD6GcSm004979@posso.dm.unipi.it> References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> <200411130616.iAD6GcSm004979@posso.dm.unipi.it> Message-ID: Question: How much harder is it to make an eBook set up to answer all these scholarly and reference questions, than just to read? 
Michael From marcello at perathoner.de Sat Nov 13 09:25:19 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 09:25:27 2004 Subject: [gutvol-d] Perfection In-Reply-To: References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> <200411130616.iAD6GcSm004979@posso.dm.unipi.it> Message-ID: <4196437F.9080905@perathoner.de> Michael Hart wrote: > How much harder is it to make an eBook set up to answer all > these scholarly and reference questions, than just to read? Providing source information and page numbers is easy. So is providing the page scans. Of course: page scans != ebook. Marking up a book to satisfy most scholarly requirements is more work than I would care for, short of being paid to do it. -- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Sat Nov 13 09:27:39 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:27:41 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Brad Collins wrote: > > This is a real polarizing issue, with many academics believing that > they are the anointed guardians of literature and recorded > knowledge. They feel threatened by groups like PG and DP which have > by-passed their institutional traditions. Many academics today feel > threatened by etexts in the same way that the clergy felt threatened > by the printing press. And this is one of the reasons they won't accept eBooks, even when I bring them to them personally, free of charge. [Oh, they take them, but they won't allow them in libraries.] They still want to be "A Big Fish In A Small Pond." They don't realize that the walls of academia have been penetrated by the virtual world. . .for them to try to stop eBooks is like James Watson's efforts to stop Craig Venter from mapping DNA, or even his efforts to stop the model building Crick 50 years ago.
mh From hart at pglaf.org Sat Nov 13 09:28:32 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:28:34 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Brad Collins wrote: > I asked for a copy of the TEI source for Bradford's History of > Plymouth Plantation last month from some academic group. They asked > me to submit a formal request which would explain what I would use > the text for! Try getting one from the Oxford Text Archive, hee hee! Presuming they are still in operation, and still use the same user agreement. . . . mh From jon at noring.name Sat Nov 13 09:38:13 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 09:38:25 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: <821002295031.20041113103813@noring.name> Brad wrote: > This is a real polarizing issue, with many academics believing that > they are the anointed guardians of literature and recorded > knowledge. They feel threatened by groups like PG and DP which have > by-passed their institutional traditions. Many academics today feel > threatened by etexts in the same way that the clergy felt threatened > by the printing press. > > I asked for a copy of the TEI source for Bradford's History of > Plymouth Plantation last month from some academic group. They asked > me to submit a formal request which would explain what I would use > the text for! > > [snip of excellent comments] I totally agree that academia (in a general sense, there are notable individual exceptions) is overly protective (to a neurotic degree) of their collections of Public Domain materials and digital derivatives thereof, and should not be. This does not mean, then, that PG and other like-minded digital text repositories should therefore choose not to build their text libraries to have a *reasonable* level of quality for academics and scholars.
Rather, what better way to stick it to them than to compete with them on their own turf! Doing this will also raise the consciousness among many, including our politicians, of the value of free and open documents. It might even lead politicians in progressive states to pass laws requiring their state-run colleges and universities to scan their holdings of public domain works and place them online for free and unencumbered use. After all, many of the "academics" are being paid by taxpayer money, as are many of the archives/repositories they run, thus they are ultimately beholden to the public which pays them, and which is the moral owner of the Public Domain. I'm glad that Michael, this morning, made a call to digitize the OED. Despite my heavy criticisms regarding how PG is run, and what its basic requirements should be, I'm fully in support of its Prime Directive in that (in my words): "All public domain texts, both scans and cleaned-up etexts, should be made, and must be made, freely available in digital form to the world without restriction or encumbrance." It pains me when I see publicly-funded academic digital repositories not allowing free and unrestricted access to any work whose source is from the Public Domain. Even if it cost someone $$$ to scan and mark up the work, the results should be open to the Public. After all, it is the Public who owns the Public Domain, thus it has the moral right to demand how any digital derivatives of the Public Domain should be used. Jon Noring (p.s., I wonder if some States have an "open documents" law on their books that could be applied to their universities and colleges, and which could be used to force them to open up their digital scans and digital derivatives of public domain works in their collections? I may bring this up with Brewster when I meet with him next week. Thoughts?)
From jon at noring.name Sat Nov 13 09:47:26 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 09:47:39 2004 Subject: [gutvol-d] Perfection In-Reply-To: <4196437F.9080905@perathoner.de> References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> <200411130616.iAD6GcSm004979@posso.dm.unipi.it> <4196437F.9080905@perathoner.de> Message-ID: <1611002848015.20041113104726@noring.name> Marcello wrote: > Michael Hart wrote: >> How much harder is it to make an eBook set up to answer all >> these scholarly and reference questions, than just to read? > Providing source information and page numbers is easy. So is providing > the page scans. Of course: page scans != ebook. > > Marking up a book to satisfy most scholarly requirements is more work > than I would care for, short of being paid to do it. 1) There are *reasonable* basic requirements, which are not onerous at all, that can be made to make the PG corpus of texts much more useful to academia and scholars. Here are a few that come to mind: a) Provide full catalog info for the source of the digital text. b) Provide the complete set of page scans. (I'm still of the opinion this should be a requirement, with the allowance that scans need not be provided under several defined circumstances.) c) In markup in the Master copy, add markers (plus maybe XLinks) to page breaks found in the source. 2) Any 21st century digital repository of texts should allow users to annotate, reference, and interlink the texts. This can be done without altering the texts themselves. Thus, the digital repository will do things that no traditional academic library of atomic-based artifacts can do. In this way, scholars themselves will improve the texts to meet their needs -- we need not do everything for them if we give them the tools to do it themselves.
Jon Noring From mbuch at mcsp.com Sat Nov 13 09:52:06 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Sat Nov 13 09:50:19 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of David Newman Sent: Saturday, November 13, 2004 4:31 AM To: gutvol-d@lists.pglaf.org Subject: [gutvol-d] Scholarly use of PG As a credentialed conflict avoider, I've been loath to stick my head into this fray. Indeed, this battle about meeting the needs of academia appears to be waged at times with an ideological fervor to rival that of the recent US election. It seems to me that the fervency with which people approach this issue has made it difficult in some cases for the arguments to follow a path towards resolution. It is perhaps also complicated by the wide assortment of changes being proposed to remedy the perceived problems. Some arguments for change suggest that PG should direct its energies towards making its library suitable for scholars by including more information in the files, particularly pagination and provenance, presumably packaged with XML. I have no problem with including such information. However, I don't think it should be required of all texts, nor do I believe that it really solves the scholarship issue. Including page scans _would_, to the degree that a solution is possible, and requires approximately 0% extra work for most of our valiant volunteers. And, PG has made it clear that this is acceptable, and has already done so for some projects. I feel that Marcello gave the most persuasive and concise summary of the situation, and I didn't notice any overt disagreement. Marcello Perathoner wrote: >The best value for Academia (and the least work for us) would be just to >include the page scans. Any transcription you make will fall short of >the requirements of some scholar.
I think we should use our time for >producing more books for a general audience rather than producing >Academia-certified editions of them. HSH's comments justify such an approach. Her Serene Highness wrote: >I need to know EXACTLY when the original >was published, who published it, and where, since there are variant texts >out there. Even a single word change that might have occurred in the >copying process could change the meaning of a vital sentence. Of course, there is a simple, if unsatisfactory, answer to all these questions for PG texts: they were published by PG, on the PG website, and each file states when it was published. Each work we publish is the "PG variant" of that text. As an academic, I find it dishonest and unhelpful for a scholar to cite a physical volume when the volume they consulted is an electronic edition. It is virtually impossible to guarantee that "even a single word change" was not introduced in the transcription process. Even with DP's careful processes, I would not wager that most of our books enter PG completely error free (or correction free, for that matter). ** I would find it dishonest also. I think it is very important for people to give correct citations. However- and this is a big however- PG is not 'publishing' books. It's copying them. There is no PG publishing house that is making decisions on whether something is worth publishing or not. PG acts as a repository- a library. Paper publishers cannot guarantee that each word on the written page is exactly as written by the author. However, with books that are well known or historically important, scholars can often compare published texts with author's notes in order to see the variants. Many of the books on PG are obscure. We are given the name of a book and an author, but there is no book to be looked at. If these texts are important- and I would argue that many obscure texts are, if only for historical reasons- it is important to have copies of the scans. 
In some cases, PG may be the only place where someone can find particular texts. Textual clues do not live only in words. A book comes alive in typeface, and in word placement on a page. James Joyce didn't just write words to be read- he placed them on pages in ways that told the reader how to interpret them. Taking a book out of context- the context of the page- when that book was written prior to the computer revolution is like ignoring how many paintings were paired with their frames by the painters themselves. Saving a book while divorcing it from its index, illustrations, typefont, and so on is not 'saving' it. It's a decontextualization. A perfect example would be movie remakes. There are many different versions of 'A Christmas Carol', several of them in modern dress. Many of them use pretty much the same exact script. Does that make them the same? Why do people prefer even an old, scratched-up and faded copy with Alistair Sim to a nice shiny new version, even if the new film is a shot-for-shot remake? A film is more than actors spouting lines. Film is every aspect that goes into it, even beyond what Dickens thought up. There are times when we want nothing more than the words of Dickens, and there are times when we want the thrill of seeing characters come to life before us in front of our physical eyes. A book may be perfectly good reading material- but an ebook printed in Courier (which is very hard to read), perhaps missing its original illustrations, without an index that shows the manner in which the author's or editor's mind worked- is no longer the original book. As a scholar I like working from original materials. An original material may be on a computer screen- that's fine by me. An original material might be enhanced by being online- many versions of The Bible are, for instance, and I received great joy recently while reading what was essentially a book that gave a key to Silverlock- it worked better online than it ever could have on paper. 
But PG is not publishing or storing original texts. It's working with old ones. I recall the cry that vinyl was going the way of the dinosaur- yet it has not. In fact, the MP3 player is the new vinyl- for the first time in years, there are cost effective '45s', courtesy of Napster and other companies. I can hear snippets of a song before buying, just as my mother once did in record shops. However, Napster technology is not better in the long run than a record- CDs and computer memory degrade at an alarming rate. Books aren't dead either, and people who think books are about finding passages in less than 25 seconds are missing the point of why people read- in the same way that people who drink coffee to get revved up often don't understand why tea drinkers make elaborate ceremonies around a caffeinated beverage. People read because they want a total experience- computers don't feel like paper. They don't smell. The text is usually flat and more difficult to read. Some of this will change over time- but not all of it, thank the Lord. I want books to be available to the public in ways that they have never been before, and so I support PG. But it doesn't have the credibility of a real library or publishing house, because it doesn't publish (copying things and leaving out some of the vitals doesn't constitute publishing in most people's minds, or at least not in a good way, no matter what info techies might want to think) and it doesn't store (libraries don't cut the covers and publishing info off their books to make more room on the shelves, they include books of criticism, and they have technologies for cross referencing- they also have people called librarians who can help people refine their interests and find books that might be of use to them. So do bookstores. Even Barnes and Noble, to some extent). I think some people here want to store books. That's nice, as far as it goes. 
Whether they understand how people use books or why- well, I seriously doubt that some people here have thought about that. It's like MS Word, which ignores that people write more complex things than business letters. Its vocabulary and understanding of grammar are seriously stunted, and it's hellish for anyone who wants to edit anything longer than two pages. Does it process words efficiently? Yes. But it's a fucking bad word processor and has none of the grace of WordPerfect. That most people are fine with it shows how few people actually write or edit for the joy of doing so, which is fine- but its incompatibility with WP and vice versa makes life tough on those who do.*** Page scans allow for an additional layer of safety for any scholar concerned about the adherence to a given print edition, though a certain level of trust in the provider is still required. Thus, while I hope that PG's holdings are as accurate as possible, it would also be my hope that scholars using PG would cite PG. Evidently this is not always the case. Michael Hart wrote: >I've also heard that many of those who complain, actually use our >eBooks in secret, and ONLY want the provenance so they can steal >them without giving credit where credit is due. This suggests to me two things. 1) We can include page scans and information about provenance, _when available_, with the files so that academics can feel confident in the reliability of those PG holdings. Not so that the original sources can be dishonestly cited, but to provide the necessary data for certain scholars to confidently cite PG's edition. We can point to this in our documentation to enhance our scholarly credibility. 2) We can prominently suggest an appropriate style of citation of works in PG's holdings. (I've seen this done with other digital collections.) Perhaps if the citation style also takes into account the original source, some otherwise reluctant scholars would be appeased. Is this something we can all agree on? 
-- David Newman www.davidnewman.info _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From hart at pglaf.org Sat Nov 13 09:50:34 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:50:36 2004 Subject: [gutvol-d] increasing literacy In-Reply-To: References: Message-ID: On Fri, 12 Nov 2004, Her Serene Highness wrote: > Illiterates rarely use computers for reading. PG would be useful after a > person became literate, i.e., able to read. Even the children's books on PG > are a bit too advanced for a person who is non-literate. Having taught > reading, it would not be the first place I would turn- it's too text-heavy, > for one thing. Given that most computers today can read eBooks out loud, it's a perfect way to learn how to read, except that you might end up talking like Stephen Hawking. . . . However, given the number of people who learned English and other languages over the short-wave radio, I think there is a real future for this. Obviously it's not "The Young Lady's Illustrated Primer" as in "The Diamond Age," by Neal Stephenson, but it's a start. I have received a number of emails from people who were starting to learn English who found our eBooks very useful. Michael From hart at pglaf.org Sat Nov 13 09:53:22 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:53:23 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: Message-ID: On Fri, 12 Nov 2004, Her Serene Highness wrote: > > > -----Original Message----- > From: gutvol-d-bounces@lists.pglaf.org > [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Dyck > Sent: Friday, November 12, 2004 6:55 PM > To: Project Gutenberg Volunteer Discussion > Subject: Re: [gutvol-d] PG audience > > > Michael Hart wrote: >> >> If we cater to scholars, we are only expanding the "digital divide," >> so to speak. 
Our goal is to provide a large viable library to all, >> not just to the scholars, who represent less than 1% of the people, >> and are often very elitist. > > I don't think anyone is advocating providing the PG library "just to the > scholars", so that's a straw man. I'm worried that making the eBooks acceptable to scholars may take more effort than simply creating them did, and that then the scholars, libraries, etc., may still opt not to use them or to encourage others to use them. I'm working up a feasibility study on this now; let me know if you have a library/librarian/scholar who is willing to try out a few dozen eBooks with these additional features. Michael S. Hart From hart at pglaf.org Sat Nov 13 09:57:02 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:57:04 2004 Subject: [gutvol-d] increasing literacy In-Reply-To: <41954E74.4EFB64DE@ibiblio.org> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <41954E74.4EFB64DE@ibiblio.org> Message-ID: On Fri, 12 Nov 2004, Michael Dyck wrote: > Michael Hart wrote: >> >> If we can increase literacy by even 10%, >> we make more difference than if we cater >> to the scholars. > > We could make even more difference by doing both! > > Setting that aside, do we have any data (or even anecdotal evidence) > re the effect of Project Gutenberg on literacy levels? Lots of schools and home schoolers have sent me messages asking and thanking us for the PG eBooks. . .enough to realize that it is no longer just a dream for these to be used in schooling. As for libraries, I get fewer of these messages from them, but still find that things are getting started there. mh From mbuch at mcsp.com Sat Nov 13 10:11:08 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Sat Nov 13 10:09:23 2004 Subject: [gutvol-d] Perfection In-Reply-To: <1611002848015.20041113104726@noring.name> Message-ID: This is what I want, too. I want cyber texts to be MORE useful, not less. 
When libraries went to electronic catalogues, Info geeks cheered- they made libraries efficient. They should have been shot. What they did was throw out the original cards, which had been marked up by librarians and scholars, and which provided clues as to which books were worth reading. The people who cheered did not love books- they loved information. Knowledge and information are very different- knowledge takes time. When people thumb through things, they discover new things- hypertext links can help them do this. Several of you here are academics. Academics who give and process info are not the same as researchers- you don't have the same needs. Research takes time and requires facts on a level that number and word-crunching don't. And Michael- I think you are brilliant in many ways, but you don't even want to provide the amount of information required of a junior high school student writing a social studies paper, let alone a scholar- and I think that's a shame. I shudder to think what you believe scholars do, and why, if you love books so much, you have so high an antipathy for them. Getting books on the web is more than a numbers game. It's about preserving something of value. What I'm seeing here among some people is a mentality akin to the early archaeologists, who completely destroyed sites in their rush to get trophies for their museums. They were bad scientists and little more than barbarians. Destroying books in order to reach the new numerical goal is not a good thing- it's very, very bad. Michele (yeah, I have a name) -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Jon Noring Sent: Saturday, November 13, 2004 12:47 PM To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] Perfection Marcello wrote: > Michael Hart wrote: >> How much harder is it to make an eBook set up to answer all >> these scholarly and reference questions, than just to read? 
Jon Noring _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From mbuch at mcsp.com Sat Nov 13 10:11:11 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Sat Nov 13 10:09:26 2004 Subject: [gutvol-d] increasing literacy In-Reply-To: Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Hart Sent: Saturday, November 13, 2004 12:57 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] increasing literacy On Fri, 12 Nov 2004, Michael Dyck wrote: > Michael Hart wrote: >> >> If we can increase literacy by even 10%, >> we make more difference than if we cater >> to the scholars. > > We could make even more difference by doing both! > > Setting that aside, do we have any data (or even anecdotal evidence) > re the effect of Project Gutenberg on literacy levels? Lots of schools and home schoolers have sent me messages asking and thanking us for the PG eBooks. . .enough to realize that it is no longer just a dream for these to be used in schooling. ** Still, that's not the same as increasing literacy. That's facilitating literacy. Did they say they became smarter or better read? The books are good for schooling- but they could be a hell of a lot better. Having taught every level of school except for elementary (and I've tutored in that), I still say it wouldn't take much to add info that would push PG forward into classroom acceptability.** As for libraries, I get fewer of these messages from them, but still find that things are getting started there. **How do you find that? Are librarians saying that, or are you? Are they using other textual sites? Why? Do they suggest improvements? 
What are they?** mh _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From marcello at perathoner.de Sat Nov 13 10:39:12 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 10:39:21 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: <419654D0.3080204@perathoner.de> Michael Hart wrote: > They don't realize that the walls of academia have been penetrated > by the virtual world. . .for them to try to stop eBooks is like > James Watson's efforts to stop Craig Venter from mapping DNA, > or even his efforts to stop the model building Crick 50 years ago. Well, well, capitalism *has* to be good for something. So let's praise capitalism for kicking the clerics in the *** and freeing information from the imprisonment in monasteries ... before we start kicking capitalism in the *** for making information a proprietary article. -- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Sat Nov 13 10:41:15 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 10:41:16 2004 Subject: !@!@!RE: [gutvol-d] Perfection In-Reply-To: References: Message-ID: On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] wrote: > This is what I want, too. I want cyber texts to be MORE useful, not less. > > When libraries went to electronic catalogues, Info geeks cheered- they made > libraries efficient. They should have been shot. What they did was throw > out the original cards, which had been marked up by librarians and scholars, > and which provided clues as to which books were worth reading. The people > who cheered did not love books- they loved information. Knowledge and > information are very different- knowledge takes time. When people thumb through > things, they discover new things- hypertext links can help them do this. 
I must admit that I, too, was surprised that ye olde carde catalogues were tossed out like babies with the bathwater. > Several of you here are academics. Academics who give and process info are > not the same as researchers- you don't have the same needs. Research takes > time and requires facts on a level that number and word-crunching don't. More on research below. > And Michael- I think you are brilliant in many ways, but you don't even want > to provide the amount of information required of a junior high school > student writing a social studies paper, let alone a scholar- and I think > that's a shame. I shudder to think what you believe scholars do, and why, > if you love books so much, you have so high an antipathy for them. It's not that I don't believe in this kind of information, it's that I didn't want to provide a different Project Gutenberg eBook for each and every single paper edition out there, and then have to keep canonical errors [sic] in them for all time. I wanted to create a "critical edition" that combined corrections and items from various editions, and we have always supplied the necessary information for citing our eBooks on request, which has apparently never caused any problem either for student or teacher. > Getting books on the web is more than a numbers game. It's about preserving > something of value. What I'm seeing here among some people is a mentality akin > to the early archaeologists, who completely destroyed sites in their rush to > get trophies for their museums. They were bad scientists and little more > than barbarians. Destroying books in order to reach the new numerical goal is > not a good thing- it's very, very bad. Being a pioneer is different than being a researcher, unless you are Indiana Jones, that is, but even he, if you will recall, had his most important work[s] taken away from him repeatedly by both those above and below him on the Darwinian ladder. 
Me, I'm a pioneer, not a researcher, and I fully warned everyone year after year that I am NOT a cataloguer, and that once we passed 10,000 books this would become a very obvious problem. However, libraries carry all sorts of materials that don't come with cataloging information, such as records, CDs, DVDs, pamphlets, paintings, etc. Doubly, however, I am doing some feasibility studies on providing MARC records, and could use some help. Michael Hart From gbnewby at pglaf.org Sat Nov 13 11:09:01 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Nov 13 11:09:04 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: References: <20041113050417.9F29A8C914@pglaf.org> Message-ID: <20041113190901.GA5711@pglaf.org> On Sat, Nov 13, 2004 at 01:30:44AM -0800, David Newman wrote: > Marcello Perathoner wrote: > >The best value for Academia (and the least work for us) would be just to > >include the page scans. Any transcription you make will fall short of > >the requirements of some scholar. I think we should use our time for > >producing more books for a general audience rather than producing > >Academia-certified editions of them. It occurred to me that some people might think that page scans are forbidden or not welcome. While it's true that we don't have many (any?) eBooks with full page scans, we *are* willing & able & ready to take them. Jim Tinsley did a 'howto' on the page scan naming convention (that is the hard part - so people know what they're called and where to find them). The post-10K directory structure, created over a year ago, includes the notion of a subdir for scans. DP has been invited to submit scans along with their texts. Maybe this word has not gone out sufficiently. Like with the XML markup discussion, the question is not "if" but "how." The first folks to submit scans with their submitted eBooks will need to do some extra work to help figure out the best way to do it. The posting team will need to keep track of the large files involved. 
If someone has the scans for a completed eBook, now would be a good time to work on getting them online. My estimate from early 2004 was that this would have grown the PG collection by an extra terabyte or so if we did it all through 2004. We haven't, and so this growth hasn't happened. But other than needing to deal with the extra space (which is trivial for a small number of eBooks, but could be challenging for our mirrors and main distribution servers when done en masse), there's no impediment I know of to moving forward. -- Greg From ciesiels at bigpond.net.au Sat Nov 13 11:14:42 2004 From: ciesiels at bigpond.net.au (Michael Ciesielski) Date: Sat Nov 13 11:15:52 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: <20041113190901.GA5711@pglaf.org> References: <20041113050417.9F29A8C914@pglaf.org> <20041113190901.GA5711@pglaf.org> Message-ID: <41965D22.3050405@bigpond.net.au> Greg Newby wrote: >DP has been invited to submit scans along with their texts. > http://gutenberg.net/faq/S-21 says: "Page images submitted to Distributed Proofreaders are automatically saved, and, while not publicly available today, will probably become so in the future." I took this to mean that there is no point in submitting page scans of DP projects to PG. Is that right? Mike From gbnewby at pglaf.org Sat Nov 13 11:26:17 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Nov 13 11:26:18 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: <41965D22.3050405@bigpond.net.au> References: <20041113050417.9F29A8C914@pglaf.org> <20041113190901.GA5711@pglaf.org> <41965D22.3050405@bigpond.net.au> Message-ID: <20041113192617.GB6386@pglaf.org> On Sun, Nov 14, 2004 at 06:14:42AM +1100, Michael Ciesielski wrote: > Greg Newby wrote: > > >DP has been invited to submit scans along with their texts. 
> > > http://gutenberg.net/faq/S-21 says: > > "Page images submitted to Distributed Proofreaders are automatically > saved, and, while not publicly available today, will probably become so > in the future." > > I took this to mean that there is no point in submitting page scans of > DP projects to PG. Is that right? > > Mike Jim Tinsley had sent a note in this thread about this too, that I hadn't yet seen when I wrote my reply. No, it's not right. Yes, we are ready to accept page scans as part of completed eBooks from DP or other sources. Jim said he's already done this for 3 eBooks. I hope Jim can find the little 'howto' he wrote about the file names & formats, otherwise I can dive into my email archive to seek it out. Getting the process tuned will take a few tries, and require some patience from everyone involved, but the intent for quite some time (over a year at least) has been to move forward with scans. -- Greg From jon at noring.name Sat Nov 13 12:06:22 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 12:06:40 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: <20041113190901.GA5711@pglaf.org> References: <20041113050417.9F29A8C914@pglaf.org> <20041113190901.GA5711@pglaf.org> Message-ID: <221011183281.20041113130622@noring.name> Greg wrote: > It occurred to me that some people might think that page > scans are forbidden or not welcome. While it's true that > we don't have many (any?) eBooks with full page scans, > we *are* willing & able & ready to take them. This is excellent news! Yes, I think people were uncertain about how welcome page scans were by PG. (Whether PG should require page scans be submitted along with texts, with certain exceptions given, is a different issue.) Obviously, if the page scans existed for all the 10,000+ PG texts, the collection of scans would occupy a lot of space, but surprisingly not as much as one might think, at least by today's hardware standards. 
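The back-of-envelope storage arithmetic discussed in this thread can be reproduced with a short sketch. All figures here are the thread's own assumptions (roughly 300 scanned pages per book, roughly 60 KB per losslessly compressed page image), not measurements:

```python
# Rough storage estimate for archived page scans.
# Assumed averages, taken from the discussion, not measured:
KB_PER_PAGE = 60        # size of one compressed page scan, in kilobytes
PAGES_PER_BOOK = 300    # source pages per book

def scan_storage_gb(num_books, kb_per_page=KB_PER_PAGE, pages=PAGES_PER_BOOK):
    """Estimated storage in gigabytes for num_books' worth of page scans."""
    total_kb = num_books * pages * kb_per_page
    return total_kb / 1_000_000  # using decimal units: 1 GB = 1,000,000 KB

print(scan_storage_gb(15_000))     # 270.0 GB -- "a little under 300 gigabytes"
print(scan_storage_gb(1_000_000))  # 18000.0 GB, i.e. 18 TB -- "approximately 20 terabytes"
```

At 18 TB, a million books' worth of scans would indeed be about one fifth of a 100 TB "rack" (ten racks to the petabyte), matching the estimate given below.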
Assuming we have 15,000 texts, each of which has an average of 300 source pages (which may be a high estimate -- anyone?), and each page scan occupies about 60k (using an efficient lossless compression scheme -- this may also be a high estimate -- anyone?), this works out to a little under 300 gigabytes (about 270 GB). (My son recently bought two 200G hard drives for $100 each. There are 300G drives available, and it seems like year after year hard disk capacities continue to increase, while $/gig continues to drop.) I know Brewster Kahle at the Internet Archive will also be happy to receive file copies of these page scans and tuck them away into his archive (which is redundantly mirrored) for preservation and open online access. Of course, with one million scanned books, we are now talking about significant space, approximately 20 terabytes (using the assumptions above). But this is 1/5 of Brewster's "rack" (where 10 racks make a petabyte) and again I know he'll be thrilled to store these away for safekeeping and open access. (PG should also store these scans itself and find others throughout the world willing to store them on hard disk, tape, etc., to assure redundant storage and preservation.) It would not surprise me to see in a few years high quality, durable, random access, compact, and very cheap storage in the ten to twenty terabyte range per unit -- enough to hold the original page scans for one million books. We then can start thinking about one billion books. So storage and access should NOT be an issue with regard to acquiring the original page scans for the PG Library. Jon Noring From jmdyck at ibiblio.org Sat Nov 13 12:42:56 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Sat Nov 13 12:43:21 2004 Subject: !@!@!RE: [gutvol-d] Perfection References: Message-ID: <419671D0.863F7CE1@ibiblio.org> Michael Hart wrote: > > On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] 
wrote: "Her Serene Highness" is Michele, but given her email address, I doubt her last name is Dyck. Mine is, though. Michael Hart: > > ... I didn't want to provide a different Project Gutenberg eBook > for each and every single paper edition out there, and then have > to keep canonical errors [sic] in them for all time. You say "didn't". Do you still feel this way? > I wanted to create a "critical edition" that combined corrections > and items from various editions, I'm curious: How many such amalgams has PG produced? What was the latest? > and we have always supplied the necessary information for citing > our eBooks on request, But that's not apparent to someone reading a PG eBook, I think. E.g., the PG boilerplate doesn't have a sentence like: To find out what printed edition(s) this eBook was created from, send a request to someone@pglaf.org. -Michael From nwolcott2 at kreative.net Sat Nov 13 13:21:14 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Nov 13 13:21:38 2004 Subject: [gutvol-d] Perfection References: <20041112234527.EF05E4BE64@ws1-1.us4.outblaze.com> Message-ID: <00c201c4c9c6$bc5f8260$b79495ce@net> I'm working on the 1616 translation of Suetonius by Philemon Holland, still regarded by some as the best translation. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "D. Starner" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 12, 2004 6:45 PM Subject: RE: [gutvol-d] Perfection > Let me note, I had no way of telling Greg's comments apart > from yours except for context. Perhaps you relied on some > HTML thing; please don't do so. I'm not going to argue the > wisdom of HTML email, but HTML email that does not degrade > nicely to plain text is going to look awful to many of the > receivers. 
> > "Her Serene Highness" writes: > > > But a > > citation of an out of print book in anthropology, English literature, the > > hard sciences, et al, which might very well not be correct in its > > information- that will be problematic. > > But this has nothing to do with etexts; this has to do with older books. > > > > > I would be very happy to see Boas online. Eventually I hope to track down > > an out of copyright version of his writings and scan it for PG. > > It'll be a long time, unless you move to Canada. The last of his > works are under copyright for another 7 years in the EU and 33 years in > the US. The Bureau of American Ethnology volumes are being worked on > up to 1930 (since it's a US government publication) and I believe that > includes some work by Boas. > > > In chapter 5 > > there might be a very quotable sentence- but what my student doesn't know is > > that this sentence was changed in later editions. And there's no page > > number- does he tell his teacher to read the entire chapter to find a > > sentence that won't be there in a later edition? > > What is he supposed to do, give a page reference to one of a dozen editions > that might be very hard for the teacher to find? With etexts, you know > that your recipient has access to the same edition you have. And as someone > else pointed out, if you quote the sentence, the context can be found in > seconds. > > > After all, I > > have no idea who JM Rodwell was, or whether his translation of The Koran is > > the definitive English version, or why his translation was chosen- other > > than that his book was out of copyright. From my point of view, that's a red > > flag itself. If this translation is so superb, why isn't it still being > > used- or is it? > > And how do I know that if I pull it off the library shelves? My college library > has a half dozen different translations of the Koran; how am I to know which > are in use? 
> > As for the reason it's not being used, I would suggest that the fact that > academics like to retranslate everything every decade might be an explanation. > My class used a modern translation of the Iliad, but that doesn't mean that > in several hundred years of English translation of the work that's now public > domain, there's not one competent, even superb translation. > > > Nietzsche's work, for instance, was butchered by his sister. There are > > conflicting copies of his work floating around. When his works were copied > > for Project Gutenberg, did someone go for an out of copyright copy that is > > definitive, or one that his sister chopped up? Did that matter, or was it > > just more important to get a copy up? > > I doubt that the people who scanned it were aware of the differences. > -- > ___________________________________________________________ > Sign-up for Ads Free at Mail.com > http://promo.mail.com/adsfreejump.htm > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Gutenberg9443 at aol.com Sat Nov 13 13:24:19 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 13 13:24:30 2004 Subject: !@!@!RE: [gutvol-d] Perfection Message-ID: <14.384d0968.2ec7d583@aol.com> In a message dated 11/13/2004 1:43:34 PM Mountain Standard Time, jmdyck@ibiblio.org writes: I wanted to create a "critical edition" that combined corrections > and items from various editions, I'm curious: How many such amalgams has PG produced? What was the latest? I don't know how many others, but my version of SWISS FAMILY ROBINSON is such an amalgam. Anne -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/3d02184a/attachment.html From Gutenberg9443 at aol.com Sat Nov 13 13:38:19 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 13 13:38:30 2004 Subject: [gutvol-d] Perfection Message-ID: In a message dated 11/13/2004 2:21:47 PM Mountain Standard Time, nwolcott2@kreative.net writes: I'm working on the 1616 translation of Suetonius by Philemon Holland, still regarded by some as the best translation. Alexander Thomson did the translation already posted. Obviously this is one of the cases in which the name of the translator is essential. It does bug me when an etext doesn't give me the original pub date, although I am able to look most of them up. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/01e810df/attachment.html From shalesller at writeme.com Sat Nov 13 14:36:54 2004 From: shalesller at writeme.com (D. Starner) Date: Sat Nov 13 14:37:05 2004 Subject: [gutvol-d] Scholarly use of PG Message-ID: <20041113223654.137CA4BE64@ws1-1.us4.outblaze.com> "Her Serene Highness" writes: > However- and this is a big however- PG is not > 'publishing' books. It's copying them. There is no PG publishing house that > is making decisions on whether something is worth publishing or not. Yes, there is. The people of PG as a whole decide whether something is worth publishing or not. There are many books that the people of PG have decided they aren't worth publishing for right now, because they are too hard to scan, too hard to process, too expensive or too hard to get a copy of, etc. > Textual clues do not live only in words. A book comes alive in typeface, > and in word placement on a page. I've seen a grand total of two modern books that used the long-s and weren't facsimile copies, and one of those only used the long-s to replicate the original title pages, not the original text. 
Several of the facsimile editions have been upfront about the fact that they do what they do for cost reasons, not because it's better. > Saving > a book while divorcing it from its index, illustrations, typefont, and so on > is not 'saving' it. It's a decontextualization. If you want an original copy, go find it. But every new publication decontextualizes the books. Somehow Beowulf readers are willing to deal with editions that look nothing like the original, that lack its illustrations, its typeface. I've handled books printed in Germany in the mid-18th century, and it's an experience in some ways. But that doesn't mean that I insist that reprints be printed in old-style German fonts on rag paper. > A book may be perfectly good reading material- but an ebook printed > in Courier (which is very hard to read), Then don't print it in Courier. That choice is left to you. > perhaps missing its original > illustrations, without an index that shows the manner in which the author's > or editor's mind worked- is no longer the original book. No, it's not. We don't have matter copiers to replicate the original book. > I recall the cry that vinyl was going the way of the dinosaur- yet it has > not. In fact, the MP3 player is the new vinyl- Vinyl has gone the way of the dinosaur; I think it's down to less than 0.5% of new material sold, and that in a few limited genres. The MP3 may be the "new vinyl", but it's not vinyl. > However, Napster technology is not better in the long run > than a record- CDs and computer memory degrade at an alarming rate. Records are degraded the instant they're pressed, are impossible to copy, and degrade while playing. Napster made backups on a million computers in a few days. You can manually make backups easily, and take them in the car or while jogging. You seem like you're looking for reasons to attack new technology.
It has its faults, but I think the complete superseding of records by CDs is good evidence that CDs are overall better than records. > Books > aren't dead either, and people who think books are about finding passages in > less than 25 seconds are missing the point of why people read That was a challenge you made. > People read because they want a total experience - computers don't feel like > paper. They don't smell. The text is usually flat and more difficult to > read. Some of this will change over time- but not all of it, thank the Lord. That's absurd. People read for a million reasons; there is no one point of why people read. Some readings are for entertainment, some readings are because there's nothing else to do on a long trip, some are of whole books for detailed information, some are of one page for a little piece of information. Yes, some people prefer to read on paper for those things that will never be replicated on computer, but we aren't going into libraries and taking books off the shelves. > (copying things and > leaving out some of the vitals doesn't constitute publishing in most > people's minds, or at least not in a good way, no matter what info techies > might want to think) My library has dozens of little brown books in a series called "Handy Literal Translations", where they took older translations, dumped most of the plays or speeches, and published them in a handy portable form. Or how about the Augustan Reprint Society, which frequently reprinted only the introduction, or select essays from various volumes? > libraries don't cut the covers and > publishing info off their books to make more room on the shelves, Right, they put them on microfilm and throw the newspapers away. My mother has a signed Mark Twain that the library was getting rid of, and someone on DP bought boxes of books at the Sydney University bookfest at $5 apiece. Don't tell me that libraries don't get rid of books.
As for the covers, in many academic libraries, a number of books have had their covers removed and replaced with a library binding. I rarely see dust jackets on library books, especially not in university libraries. Decontextualization galore. > they > include books of criticism, How many books of criticism do you usually find in a library of 10,000 to 15,000 books? It's not like we don't include books of criticism, it's just that we don't have many yet. > Whether they understand how people use books or why- well, I > seriously doubt that some people here have thought about that. Right, all these people who love books so much that they would spend their volunteer time working on scanning them and proofing them don't know how people use books or why. They just love books from a distance; they don't actually use them. Furthermore, I don't think you understand how people use ebooks or why. You spend a lot of time in criticism, but a lot of it is just wrong. You told us we couldn't find a quote in a large body of text, you tell us that typeface is important when no printed book cares, you complain that an ebook in Courier is hard to read, which is a bit like saying that it's hard to read this book because it's upside down. > its incompatibility with WP and vice versa makes > life tough on those who do. Have you ever tried learning Word? Your gripes sound like someone who learned WordPerfect and never bothered to learn Word. May I ask again that you conform to the standard email quoting conventions and trim irrelevant text that you aren't replying to? There are rules which have developed over time for the ease of communication via email, which may at some points be arbitrary, but everyone adhering to the standard facilitates communication. Your ignoring of these rules makes me feel that it's less of a communication and more your demanding that the computer world bend entirely to conform to your little world.
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From vze3rknp at verizon.net Sat Nov 13 15:41:05 2004 From: vze3rknp at verizon.net (Juliet Sutherland) Date: Sat Nov 13 15:41:16 2004 Subject: [gutvol-d] PG audience In-Reply-To: <171000217734.20041113100336@noring.name> References: <41950197.2020707@perathoner.de> <171000217734.20041113100336@noring.name> Message-ID: <41969B91.6060104@verizon.net> I scanned that book and put it through DP around the 4th of July. It is still waiting for someone to decide to post-process it. JulietS Jon Noring wrote: >Brad wrote: > > > >>I asked for a copy of the TEI source for Bradford's History of >>Plymouth Plantation last month from some academic group. They asked >>me to submit a formal request which would explain what I would use >>the text for! >> >> > >Interesting. > >I happen to have a copy of the 1898 printing of `Bradford's History >"Of Plimoth Plantation."' My wife's maternal ancestry goes back to >colonial Massachusetts, and I think one of her ancestors is mentioned >in the book (Degory Priest). > >If this book has not yet been scanned by anyone affiliated with PG/DP, >I'll gladly offer our copy for scanning so long as whatever is used to >scan it will not damage the binding (probably can't use a flat bed >scanner), and that the scans *will* be made available online for free, >even before the work is converted to XML. > >The book, including index, has over 550 pages, so it is pretty massive. > >A fascinating work, btw, and one I hope will be scanned and converted >to TEI by PG/DP. 
> >Jon Noring > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > From jon at noring.name Sat Nov 13 15:48:35 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 15:49:23 2004 Subject: !@!@!RE: [gutvol-d] Perfection In-Reply-To: References: Message-ID: <1131024517078.20041113164835@noring.name> Michael Hart wrote: > Michele wrote: >> And Michael- I think you are brilliant in many ways, but you don't even want >> to provide the amount of information required of a junior high school >> student writing a social studies paper, let alone a scholar- and I think >> that's a shame. I shudder to think what you believe scholars do, and why, >> if you love books so much, you have so high an antipathy for them. > It's not that I don't believe in this kind of information, it's that I > didn't want to provide a different Project Gutenberg eBook for each and > every single paper edition out there, and then have to keep canonical > errors [sic] in them for all time. > > I wanted to create a "critical edition" that combined corrections > and items from various editions, and we have always supplied the > necessary information for citing our eBooks on request, which has > apparently never caused any problem either for student or teacher. Now I think this is getting us to the core of the various issues being discussed of late. In the early days of PG, when disk space was ultra-expensive (and removable storage was of limited capacity), when volunteers were few, and when the Internet did not yet exist (and when it came into being for the ordinary Joe in the late 1980's with very slow modem access), the idea of PG focusing on producing a "critical edition" of important public domain works for casual reading made a whole lot of sense. However, I believe things have changed so much that this focus needs to be reevaluated.
Let's look at the situation today, and tomorrow: (o) Disk space is getting so cheap and of such high capacity that we can now consider it economical for text repositories to hold the high-density original page scan images for *one million books*. When the texts are in high-quality XML, we can hold *billions* of textual works, with no problem. In a decade, we can begin talking about *trillions* of textual works (big and small). There's no longer an issue of which published edition to pick to "represent" a particular Work -- we can have them all online. (o) More and more people have high-speed access to the Internet, allowing fast downloading of books, as well as enabling the technologies to mobilize large numbers of avid volunteers to produce high-quality texts (eventually in XML markup) using Internet-enabled systems such as Distributed Proofreaders. And tomorrow? Here's what I see: (o) We will see Distributed Proofreaders greatly improve in both quality of production (high quality XML output) as well as much greater capacity. It will also be "clonable" by other groups dealing with specific types of publications. I believe we'll see over 1000 major books PER DAY being completed by DP and its various "clones" throughout the world, not to mention innumerable texts of other types. That's a thousand book-length works PER DAY worldwide. Thus, the need for "critical" editions based on technical limitations is no longer an issue. Many works were only issued once anyway, so the etext version *is* the critical edition, but some works were issued in various editions over time -- all of them can now be scanned and placed side-by-side online. Let the end-user decide which one to access, based on their own investigation or by the recommendations of others (advanced systems can be set up to aid in selection -- PG itself can recommend which version the reader should consider first.) 
It is thus important to preserve the full source information, since end-users will need to know that information, to know what they are getting. If an earlier, more faithful version of the Work is not in the PG system (how would they know unless the versions of the Work already in the system have complete source information?), they can suggest which edition to convert through DP. Ultimately, I hope that PG will cover almost all first and early editions of important works. Another aspect of this issue is submissions of works to PG which are based on original Public Domain works, but which have been substantially modified by the submitter acting as editor, in essence creating a new edition of the Work. For example, my publishing company's version of Sir Richard F. Burton's "Kama Sutra of Vatsyayana", first published in the 1880's, has been significantly edited and modified -- but not expunged in any way -- no content has been removed, but content has been moved around to aid with logical organization, plus I've added several annotations to clarify things which Burton inexplicably did not. The publisher intro to this book makes clear what changes were made to the text. For submissions such as this, PG should certainly accept such altered and composite works, but it is important that the metadata state clearly that this is an "altered" work from the source, or something to that effect, as well as stating what public domain source(s) were used to create the work. (Ideally, PG would have these source works in the PG Library, with the original page scans and the faithful etext versions alongside, so the user of the altered/composite etext will be able to determine, if they want, the alterations which were made to create it.) In summary, I believe PG is making a big mistake going down the road of being a "gatekeeper" or "original publisher" of some sort.
It should concentrate on what it does best: locate/acquire, copyright clear, and place online Public Domain (and Creative Commons) texts in high-quality form. Let others do the vetting and recommendations for what should be read. Let PG make it ALL available for free to everyone, everywhere and at all times. Jon Noring From jon at noring.name Sat Nov 13 15:55:33 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 15:56:08 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41969B91.6060104@verizon.net> References: <41950197.2020707@perathoner.de> <171000217734.20041113100336@noring.name> <41969B91.6060104@verizon.net> Message-ID: <1241024934156.20041113165533@noring.name> Juliet wrote: > Jon wrote: >> I happen to have a copy of the 1898 printing of `Bradford's History >> "Of Plimoth Plantation."' My wife's maternal ancestry goes back to >> colonial Massachusetts, and I think one of her ancestors is mentioned >> in the book (Degory Priest). > I scanned that book and put it through DP around the 4th of July. It is > still waiting for someone to decide to post-process it. Great to hear! Thanks for replying. Jon From nwolcott2 at kreative.net Sat Nov 13 14:45:18 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Nov 13 16:03:18 2004 Subject: [gutvol-d] Gone with the wind i s "Gone with the wind" References: <006d01c4c8cb$e4d38440$069595ce@net> <4195639D.5060600@bohol.ph> Message-ID: <00da01c4c9dd$50e11460$b79495ce@net> When an item is removed from a collection, it is apparently removed from all of the archive as well. Hence no GWTW on the archive. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Jeroen Hellingman" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 12, 2004 8:30 PM Subject: Re: [gutvol-d] Gone with the wind i s "Gone with the wind" > Norm Wolcott wrote: > > > Sayonara. Apparently all versions of GWTW have disappeared from the net. 
> > > > nwolcott2@post.harvard.edu Friar > > Wolcott, Gutenberg Abbey, Sherwood Forrest > > > Try the wayback machine, www.archive.org > > Jeroen. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott2 at kreative.net Sat Nov 13 16:02:51 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Nov 13 16:03:21 2004 Subject: [gutvol-d] Gone with the wind i s "Gone with the wind" References: <006d01c4c8cb$e4d38440$069595ce@net> <4195639D.5060600@bohol.ph> Message-ID: <00db01c4c9dd$51c46f80$b79495ce@net> The volume seems to have been deleted in August 2003. There are some fragments remaining, 1.67/2.24 MB. I believe that when removed from a site they are also removed from the archives. Also many zip files have been removed. Could be that the earlier defective versions somehow did not get removed. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Jeroen Hellingman" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 12, 2004 8:30 PM Subject: Re: [gutvol-d] Gone with the wind i s "Gone with the wind" > Norm Wolcott wrote: > > > Sayonara. Apparently all versions of GWTW have disappeared from the net. > > > > nwolcott2@post.harvard.edu Friar > > Wolcott, Gutenberg Abbey, Sherwood Forrest > > > Try the wayback machine, www.archive.org > > Jeroen. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From j.hagerson at comcast.net Sat Nov 13 16:16:31 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Sat Nov 13 16:16:52 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] Message-ID: <009301c4c9df$349c3940$6401a8c0@enterprise> #define SOMEWHAT_PEEVED 1 I'm a student (or a researcher). I'm writing a paper. I'm planning to cite an e-book.
Why don't I just DOWNLOAD the eBook to my computer and append it to my paper? Then, it doesn't make any difference what happens to the eBook left in the ether. TEACHER, HERE IS THE COPY *I* USED. I'm just a lowly volunteer slogging my way through DP producing books. If Jon Noring wants to go off and create "Distributed Definitive Editions," I don't see anything stopping him. I, personally, am not at all interested in that effort, but I'm just a lowly peasant (an *anybody* doing proofreading). What am I missing? #define SOMEWHAT_PEEVED 0 John From Gutenberg9443 at aol.com Sat Nov 13 17:22:12 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 13 17:22:29 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to take to bed Message-ID: <102.53edde11.2ec80d44@aol.com> In a message dated 11/13/2004 3:37:14 PM Mountain Standard Time, shalesller@writeme.com writes: >>You >>told us we couldn't find a quote in a large body of text, >>you tell us >>that typeface is important when no printed book cares, >>you complain >>that an ebook in Courier is hard to read, which is a bit >>like saying >>that it's hard to read this book because it's upside >>down. Right on. I loathe Courier, so I don't read my ebooks in Courier; I read them on the computer in Times New Roman, and I read them on my ebook reader in Arial, because that's the default. I could change it if I wanted to. As for books to read in bed . . . well, good people, I have wonderful news for you. A version of the Rocket has been revived. FictionWise now has one available for $99 and is actively working on newer versions. 
Go here for more information: _eBookwise_ (http://www.ebookwise.com/) As for me, why would I (or anybody else) reading in bed want to read a paper book, which has only one right-side-up and doesn't care what position I'm in and how uncomfortable I am trying to read it in that position, when I have a lovely little device that will change its orientation so that if I want to hold it in my left hand or in my right hand or longwise with either side up, it agreeably makes its buttons available to whatever hand I want to use, and it will let me decide which of four positions is "up" right now? Oh yes, and it's backlit, so I don't have to keep an overhead light on. Believe me, you read this kind of book in bed for one hour and you will NEVER want to go back to tree books for reading in bed. (Back to the quotation I started with, I CAN read my book upside-down, because upside-down becomes right-side-up at the click of a button.) As yet, eBookWise does not sell a program for turning your own material into .rb format, but you go to the site below and download the second program, the one that supports USB or serial port, and you can turn your Gutenberg books into .rb books. Then use your eBookWise librarian to import the .rb books and put them into your eBookWise reader. _Rocket eBook Site_ (http://www.rocket-ebook.com/Readers/Software/) In fact, I will make a rash promise that I probably will live to regret: when you buy your eBookWise ebook reader, let me know what five PG books you want the most and I'll convert them myself, though they're probably already at Blackmask in .rb format. You can also get .rb books at _Phoenix-Library - A worldwide multiformat ebook library._ (http://www.phoenix-library.org/) Among its other offerings, it has an excellent selection of translating dictionaries from and to several different languages.
These are not searchable; instead, when you start to read a book that you know is likely to have words in different languages, you also load the different language, and then whenever you come to a word you don't know you tell your reader what language it is and to look it up, and most of the time the word will be present. I keep French-English and Spanish-English on my reader at all times, and add other dictionaries (which I keep in my ebook library) when I need them. So--you want a good look at the future of ebooks? Brothers and sisters, it's here. Of course technology will improve. It is the job of technology to improve. But every time it does, there will be a sufficient span of time for PG and its descendants to change the filing system into one that can remain readable. Someone used the example of Beowulf. Uh, yeah. That's a good one. I can't possibly read it in its original language, but I can read it in my original language whenever I want it. Some people find Chaucer unreadable. I don't, but I'm glad it's available in modern English for people who can read it only that way. After all, you can't understand Shakespeare unless you read him in the original Klingon. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/d4cc0412/attachment.html From Gutenberg9443 at aol.com Sat Nov 13 17:24:31 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 13 17:24:51 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] Message-ID: In a message dated 11/13/2004 5:16:55 PM Mountain Standard Time, j.hagerson@comcast.net writes: Why don't I just DOWNLOAD the eBook to my computer and append it to my paper? Then, it doesn't make any difference what happens to the eBook left in the ether. TEACHER, HERE IS THE COPY *I* USED. 
Jon, as one who has graded many college papers, I beg and plead and implore that you provide the teacher the complete URL, not the ebook itself. However, this is a great idea. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/2feb653c/attachment-0001.html From jtinsley at pobox.com Sat Nov 13 18:35:23 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Sat Nov 13 18:35:38 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] In-Reply-To: <009301c4c9df$349c3940$6401a8c0@enterprise> References: <009301c4c9df$349c3940$6401a8c0@enterprise> Message-ID: <20041114023523.GA9326@panix.com> On Sat, Nov 13, 2004 at 06:16:31PM -0600, John Hagerson wrote: >#define SOMEWHAT_PEEVED 1 >I'm a student (or a researcher). I'm writing a paper. I'm planning to cite >an e-book. > >Why don't I just DOWNLOAD the eBook to my computer and append it to my >paper? Then, it doesn't make any difference what happens to the eBook left >in the ether. TEACHER, HERE IS THE COPY *I* USED. > >I'm just a lowly volunteer slogging my way through DP producing books. If >Jon Noring wants to go off and create "Distributed Definitive Editions," I >don't see anything stopping him. I, personally, am not at all interested in >that effort, but I'm just a lowly peasant (an *anybody* doing proofreading). > >What am I missing? >#define SOMEWHAT_PEEVED 0 > > I'm afraid, John, that you are missing a great deal, as am I. We are both rather lowly, and obviously can't grasp the Big Picture that is so clear to those unencumbered by experience of making etexts, and who have lots of time to contemplate the Big Picture since they aren't spending their time actually doing useful work for PG. 
Until we evolve enough to understand their mighty cogitations, or at least enough to be considered suitable for a post as VP of Marketing, I think we should ignore them, and get back to our humble tasks, while our betters decide what we should be doing. jim From jtinsley at pobox.com Sat Nov 13 19:00:09 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Sat Nov 13 19:00:27 2004 Subject: Marking bold & italics in .txt (was Re: [gutvol-d] a few questions that i don't know the answer to) Message-ID: <20041114030009.GA21613@panix.com> On Fri, 12 Nov 2004 22:34:52 -0500, Joshua Hutchinson wrote: >Jim Tinsley wrote: > >> >>Where bold does need to be rendered in plain text, the current >>most common usage (from DP) is *bold text*. There are times when >>it is appropriate to signify bold, but I have seen some texts >>coming from DP where it has been used unnecessarily -- mostly >>to indicate a sub-heading or chapter title in the book. In >>such a case, where a chapter title is clearly a chapter title >>and on a line by itself, there really is no need to mark it in >>the plain text version as having been bold face in the original. >>I think this practice comes from people pre-marking the text for >>later conversion to HTML, rather than any intent to clutter the >>plain text. >> >> >> >Actually, it is probably there from the OCR pre-processing and was never >removed through all the rounds of proofing and post-processing... why I >feel this is important enough of a distinction that I needed to make a >post about ... I have no idea. Well, at least you cleared up a mystery for me! Thanks! And since I have now been asked twice within the last couple of weeks about bold, I guess that's Frequently enough to go into the FAQ. It's now on the list for the next update. jim From jon at noring.name Sat Nov 13 19:35:06 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 19:35:26 2004 Subject: [gutvol-d] Page Scans versus Real eText? 
Message-ID: <211038107671.20041113203506@noring.name> [I posted the following to The eBook Community. But clearly most of the real experts in digitizing texts are found right here on gutvol-d, so I'm reposting the message here. I'm especially curious about the economics associated with commercially doing what Michael Hart and PG have done for years and years.] There are essentially two ways to digitize and place online textual works which exist only on paper. This applies, for example, to older public domain books. 1) Digitally scan the publication, and place the resultant page scan images online as the final product. Optionally, these page scans can be OCR'd to produce a raw (uncorrected) searchable index to search for page scans that may be of interest to the user. Additionally, the scanned images can be "packed" into a PDF document for online distribution and viewing, and for offline printing. 2) The publication is converted into real digital text, using either OCR or keying in by hand to produce the raw text. Then, significant human effort is expended to proofread and correct the digital text for any transcribing/OCR and other errors. The resultant cleaned-up text can either be kept in plain text form (traditional Project Gutenberg text), or marked up into XML documents using some markup vocabulary. (Optionally, the original page scans can be kept alongside the cleaned-up/marked-up text, thereby accruing whatever advantages the first method gives.) Clearly, digital text is superior in many respects to page scan images. The biggest downside is the need to do the laborious human proofing. Online proofing systems such as Distributed Proofreaders have made proofing much easier to do, mobilizing many willing volunteer proofers and providing a convenient Internet interface.
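[A minimal sketch of the raw searchable index described under option (1). This is a hypothetical illustration, not an actual PG or Million Books tool; `build_page_index` and `find_pages` are invented names, and a real system would feed them uncorrected OCR output from the page scans rather than ready-made strings.]

```python
def build_page_index(pages):
    """Build a word -> page-numbers index from raw, uncorrected OCR text.

    `pages` is an iterable of (page_number, raw_ocr_text) pairs.  Since the
    text is never proofread, the index only needs to be good enough to
    point a reader at page scan images of possible interest.
    """
    index = {}
    for page_no, raw_text in pages:
        for word in raw_text.lower().split():
            index.setdefault(word, set()).add(page_no)
    return index


def find_pages(index, word):
    """Return the sorted page numbers whose raw OCR text contains `word`."""
    return sorted(index.get(word.lower(), ()))
```

[A hit leads the reader to the page image itself, so OCR errors cost only recall, never fidelity -- which is exactly the trade-off option (1) makes by skipping human proofing.]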
However, in discussions with various people on this topic I've not been able to explain, in a cogent and compelling way, all the reasons why the additional effort should be expended to produce high-quality digital text. Some of these people believe that putting the scanned page images online is more than sufficient. So, this is an "Ask TeBC" request for better arguments to use. I hope, too, that it catalyzes interesting discussion on the various aspects associated with the general issue of getting our printed heritage online, which is obviously an ebook-related topic. And this not only applies to books, but to periodicals, newspapers, and many other types of historical documents. ***** Another related question: If I have a typical printed book (say a 300 page fictional work), and I hire a commercial company to convert it into a clean, high-quality digital text with XML markup (e.g., using TEI-Lite), how much would it cost? In the U.S.? Overseas (such as in India)? Anyone know? ***** Jon Noring From mbuch at mcsp.com Sat Nov 13 20:33:31 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Sat Nov 13 20:31:57 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to taketo bed In-Reply-To: <102.53edde11.2ec80d44@aol.com> Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Gutenberg9443@aol.com Sent: Saturday, November 13, 2004 8:22 PM To: gutvol-d@lists.pglaf.org Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to taketo bed In a message dated 11/13/2004 3:37:14 PM Mountain Standard Time, shalesller@writeme.com writes: >>You >>told us we couldn't find a quote in a large body of text, >>you tell us >>that typeface is important when no printed book cares, >>you complain >>that an ebook in Courier is hard to read, which is a bit >>like saying >>that it's hard to read this book because it's upside >>down. Right on. 
I loathe Courier, so I don't read my ebooks in Courier; I read them on the computer in Times New Roman, and I read them on my ebook reader in Arial, because that's the default. I could change it if I wanted to. As for books to read in bed . . . well, good people, I have wonderful news for you. A version of the Rocket has been revived. FictionWise now has one available for $99 and is actively working on newer versions. Go here for more information: >> Again- the people on this list are computer savvy. My mother isn't. My students are limited. None of them have ebook readers. They are not going to spend $99 to read a book, especially one that isn't new. And lots of people read paper books in bed- and would prefer one over an ebook that will cost them $99 to read, and has a limited number of options. I can go to my local library and get any number of books for free, and curl up in bed with them. I can order even more through interlibrary loan, and get them in a matter of days. I can get new ones and old ones, of my own choice, in a variety of editions- not just what someone has chosen to put on the net. I can get books with illustrations in color or black and white. A book like Alice in Wonderland can be read in versions that contain illos by Rackham, Tenniel, or more modern illustrators- and the books are usually large enough for me to share with a child. For instance, I love children's books- I was in a store today and read through several by Chris Van Allsburg. I'm sorry I didn't get to the store earlier- he was signing books. I would have bought one for him to sign. I don't think he signs books on machines. Why do I like paper? I don't need batteries or electricity. All I need is sunlight or a candle. I can pack a book in plastic and come back 20 years later to find it in working order- the technology won't have changed. In 100 years I'll be able to read it too, in many cases. I can't read books I once downloaded to 5" floppies.
Soon, I won't be able to read books downloaded to 3" floppies. I can't load books to some earlier versions of ebooks. I can however read paper books without having to change the typeface myself, and I can carry them with me anywhere. I don't read twenty books at once. I read one. And I can glance ahead, go back, look at the page next to the one I'm reading, put the book down next to another book and compare the information in the two without having to spend $198, and a whole lot of other things. In NYC where I live, we are in love with technology. We have one of the oldest subway systems. WiFi is very popular among the upper middle classes. We pretty much all carry cellular phones and use them constantly. We live for our iPods and mp3 players. We are wired to the max- and in the subway, on the street and in cafes, we read paper books and paper magazines and paper tabloids. And we don't even have to change the type or pay a class-separating $99 for the right to read. When my local Barnes and Noble was selling ebooks here a few years ago, people in this high tech city, in the shadow of what was then Silicon Alley- very few sold. I look forward to the OQO and some of Sony's new products, but pay $99 to have the right to read old Tom Swift books with no pictures and not even the enticing smell that old books have? When the machine you talk about can't carry the texts I actually need on a regular basis, because I'm an academic? When pretty much all the fun books I love are only in paper form, and have pictures and other temptations to boot?? Good Lord, woman, what on earth do you read? Are you honestly saying that every book you will ever want to read is on a computer? That every book you love is usually out of print and copyright, or is on the level of John Grisham? Are you truly saying that you think the best version of Treasure Island is on a machine, and not between the pages of a book with color illustrations by NC Wyeth?
Are you saying that you never look at art books, cookbooks, science books? That you only read popular literature and authors who have been dead for about a century? That the latest information about Africa or Asia was written in 1910? I have a couple of first editions by anthropologists. None of the ones I have are online. They smell musty. I know that somewhere along the line, another anthropologist loved those books like I love them. Not just the words- the books. They have marginalia. The fact that they are marked up makes me love them all the more. One day I will die and someone else will love my books and see the comments I made. They will know what I read- not only the book they will hold, but other books I mentioned in the marginalia. I will be putting a message in a bottle that will turn up in the future. I have other books that are used copies from academic bookstores- the margins told me what Professor So-and-so thought was important to his students. The notes helped me get through grad school. I made my own notes, sold the books, and passed them on. I cannot pass on a $99 machine. The individual books help the people who need them. A machine can only be held by one person and the data can be lost. I have a cookbook that belonged to my mother-in-law, now senile, that she gave to my husband, now dead. It has notes from all three of us, and stains from our cooking. I sometimes take it to bed with me, to check recipes the day before a holiday meal. You must think I'm mad to love a physical book that I will pass down to some relative of mine, who will know, from which pages are stained the most, what the best recipes are. There are thumb-prints all over it, and it smells vaguely of milk- it holds my favorite quiche recipe. I have another book that is made up of xeroxes and has illustrations on how exactly to prepare certain medieval recipes. The illustrations are important to me. I can hold that and take it to bed, too.
When NY lost the Twin Towers a few years ago, I thought of what I would take with me if we were ever bombed and I could get out. My computer was not on the list. I thought of things I could use without a battery or any outside power. In an emergency, I could use my cookbook and read Alice in Wonderland, so I would take them along with a pot and some matches. So--you want a good look at the future of ebooks? Brothers and sisters, it's here. Of course technology will improve. It is the job of technology to improve. But every time it does, there will be a sufficient span of time for PG and its descendants to change the filing system into one that can remain readable. << Someone used the example of Beowulf. Uh, yeah. That's a good one. I can't possibly read it in its original language, but I can read it in my original language whenever I want it. Some people find Chaucer unreadable. I don't, but I'm glad it's available in modern English for people who can read it only that way. >>It's available in modern English in book form. A good modern translation is by Seamus Heaney. The original text is reprinted, for those of us who want to check accuracy. << After all, you can't understand Shakespeare unless you read him in the original Klingon. >>You can- if you're educated. Plenty of people understand Shakespeare. Even high school students. People in Italy can read Shakespeare. Tiny children can also- they could at the beginning of this century. My badly-educated, at-risk high school students were able to understand Shakespeare. If you don't, that says more about you than it does about early modern English. And some of us can even parse Beowulf- with a two-language version (which is how it's usually printed) the average person can read an amazing amount of it in the original, or at least grasp it. Maybe if you stopped reading Star Trek novels as literature, you'd realize you read Shakespeare's language pretty much every day.
His turns of phrase are used all the time, and can be understood by people of all economic levels who have the desire to read and learn- even people who cannot afford $99 ebooks to read Stephen King novels (not that Stephen King is bad, but there's more to reading than that).<< Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/e56fa2a7/attachment-0001.html From sly at victoria.tc.ca Sun Nov 14 00:14:11 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Nov 14 00:14:30 2004 Subject: !@!@!RE: [gutvol-d] Perfection In-Reply-To: <419671D0.863F7CE1@ibiblio.org> References: <419671D0.863F7CE1@ibiblio.org> Message-ID: On Sat, 13 Nov 2004, Michael Dyck wrote: > > I'm curious: How many such amalgams has PG produced? > What was the latest? > There is no effort to count them, so I doubt you could get a reliable number. One I remember clearly doing was "Roughing it in the Bush" by Susanna Moodie. I used as my basis a text online at another site, which, curiously enough, had a very scholarly citation of exactly what it used as its source, although it still had a large number of evident transcription errors. (Also, it said that it was based on the 1852 first edition. Although, learning about the publishing history of this text, I found that there were varying forms of the first edition, as corrections were being made to the plates _during_ the printing process.) Also, on the topic of amalgams, it may help to realize that sometimes corrections are made to a PG text using a different edition than was originally used.
Andrew From lofstrom at lava.net Sun Nov 14 00:56:23 2004 From: lofstrom at lava.net (Karen Lofstrom) Date: Sun Nov 14 00:56:43 2004 Subject: [gutvol-d] I'm sorry, I don't get it In-Reply-To: <20041114023523.GA9326@panix.com> References: <009301c4c9df$349c3940$6401a8c0@enterprise> <20041114023523.GA9326@panix.com> Message-ID: On Sat, 13 Nov 2004, Jim Tinsley wrote: > I'm afraid, John, that you are missing a great deal, as am I. We are > both rather lowly, and obviously can't grasp the Big Picture that is > so clear to those unencumbered by experience of making etexts, and who > have lots of time to contemplate the Big Picture since they aren't > spending their time actually doing useful work for PG. I'm one of the people who wants scholar-friendly editions and I've proofread my 14K pages over the last year and a half. Not to mention a little scanning and post-processing. -- Karen Lofstrom (Zora on DP) From jeroen at bohol.ph Sun Nov 14 01:59:09 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Sun Nov 14 01:58:52 2004 Subject: [gutvol-d] Gone with the wind is "Gone with the wind" In-Reply-To: <00db01c4c9dd$51c46f80$b79495ce@net> References: <006d01c4c8cb$e4d38440$069595ce@net> <4195639D.5060600@bohol.ph> <00db01c4c9dd$51c46f80$b79495ce@net> Message-ID: <41972C6D.5010209@bohol.ph> Norm Wolcott wrote: >The volume seems to have been deleted in August 2003. There are some >fragments remaining, 1.67/2.24 MB. I believe that when removed from a site >they are also removed from the archives. Also many zip files have been >removed. Could be that the earlier defective versions somehow did not get >removed. > > I got a version from there, don't try the zip files, but the txt files (1.0 version) are still there. I managed to get one complete... Now looking into having a Philippines based website set-up. Problem is, most Philippine hosting providers rent their servers in the U.S., very few have servers actually located in the Philippines.
Also, a single book is not worth having a separate server at about $12 per month, and we need to discuss copyright policies with the hosting provider beforehand. Of course I have a lot of Philippine related works to be added to the collection, but to date they all happily can reside on the PG US server. (That might start once we want to tackle more recent Philippine PD works) In Holland, Bits of Freedom recently placed a long PD copy of one of Multatuli's works on-line, and then, using a yahoo (!) mail account, a fake lawyer under the name "Droogstoppel" (which should ring a bell with everybody who has read his most famous book) sent a serious-looking cease-and-desist letter to a large number of providers. Most complied immediately, even without notifying the owner, or verifying the copyright status (which was in fact clearly explained with the copy itself). Only one gave the correct answer that the work was public domain, and the request could not be honored. Jeroen. From j.hagerson at comcast.net Sun Nov 14 05:12:09 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Sun Nov 14 05:12:30 2004 Subject: [gutvol-d] What do scholars want? Message-ID: <00a301c4ca4b$8e657bf0$6401a8c0@enterprise> From the perspective of a peasant. It appears that the most important thing that scholars want is immutability. A dead tree copy of a book can't be changed, so they can go on endlessly about which dead tree copy is "better" than any other dead tree copy (I know where all of the errors are, and you don't, so there!). Even though, eventually, dead tree copies wear out, are burned up in fires, are carelessly discarded, or sold off to make space, etc., etc., they don't change. Therefore, an electronic copy is unacceptable because: 1. Maybe it is not the exact representation of a dead tree copy. This is entirely unacceptable because "my" dead tree copy is better than all of the others. 2. Its URL might change and then I couldn't find it. 3.
Worse yet, the URL doesn't change but the text does. (See point 1.) It appears that we need to modify the PG web site to include checksum and CRC data on each of our files to provide a mechanism of verifying that they have not been nefariously modified after download, so "my" electronic copy can be judged the same as "your" electronic copy. I fall back to my earlier point: What would be better when you're submitting research than to include a copy of every item of source material? This is not done with dead trees because we do not have a mechanism to instantly create an exact duplicate of a given piece of material for free in the dead tree world. Such a mechanism does exist in the electronic world. When academia wakes up to this fact, maybe their negativity toward electronic copies will lessen somewhat. From tb at baechler.net Sun Nov 14 07:13:53 2004 From: tb at baechler.net (Tony Baechler) Date: Sun Nov 14 07:12:10 2004 Subject: [gutvol-d] What do scholars want? In-Reply-To: <00a301c4ca4b$8e657bf0$6401a8c0@enterprise> Message-ID: <5.2.0.9.0.20041114071022.01fe1c50@snoopy2.trkhosting.com> At 07:12 AM 11/14/2004 -0600, you wrote: >It appears that we need to modify the PG web site to include checksum and >CRC data on each of our files to provide a mechanism of verifying that they >have not been nefariously modified after download, so "my" electronic copy >can be judged the same as "your" electronic copy. Yes, but even CRC, hash or md5 values can be forged. All someone would need to do is somehow compromise the PG server. That has happened with a main Debian and gnu server already. How would we make sure that the hashes are real? One solution is gpg signatures, but then someone needs to download and install gpg, a tool to verify the hash, plus the actual text file. The average user won't know how to do this and wouldn't even if they could. 
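[The checksum idea being debated here can be sketched in a few lines. A hedged illustration only: Python is assumed, the demo file is a stand-in for a downloaded eBook, and the published-hash comparison at the end is hypothetical, not an actual PG interface.]

```python
import hashlib

def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large eBooks need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Stand-in for a downloaded text file (contents are illustrative only).
with open("demo_etext.txt", "wb") as f:
    f.write(b"This eBook is for the use of anyone anywhere.\n")

local_hash = file_sha256("demo_etext.txt")
# A reader would compare local_hash against a value published alongside
# the file (hypothetical), e.g.:  ok = (local_hash == published_hash)
```

As the discussion notes, this only relocates the trust problem: the published hash itself must come from a server the reader trusts.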
Not to mention that the hash and signature process would have to be done every time one byte is changed in the original, such as for correcting errors. From hart at pglaf.org Sun Nov 14 07:37:52 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 07:37:55 2004 Subject: !@!Re: [gutvol-d] What do scholars want? In-Reply-To: <5.2.0.9.0.20041114071022.01fe1c50@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041114071022.01fe1c50@snoopy2.trkhosting.com> Message-ID: On Sun, 14 Nov 2004, Tony Baechler wrote: > At 07:12 AM 11/14/2004 -0600, you wrote: >> It appears that we need to modify the PG web site to include checksum and >> CRC data on each of our files to provide a mechanism of verifying that >> they >> have not been nefariously modified after download, so "my" electronic copy >> can be judged the same as "your" electronic copy. > > Yes, but even CRC, hash or md5 values can be forged. All someone would need > to do is somehow compromise the PG server. That has happened with a main > Debian and gnu server already. How would we make sure that the hashes are > real? One solution is gpg signatures, but then someone needs to download and > install gpg, a tool to verify the hash, plus the actual text file. The > average user won't know how to do this and wouldn't even if they could. Not > to mention that the hash and signature process would have to be done every > time one byte is changed in the original, such as for correcting errors. Nothing more is needed for this than "compare." This has been discussed widely over the years, and the simple and easy solution, for those who really want to test the files, is simply to get a few copies of the eBook in question from some different sources and test them with any of the various "file compare" programs that come with virtually all operating systems. Thus, even if just one ";" were changed to a ":" it would show up immediately, something that a careful proofreader might still miss. 
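[The "compare" approach described above can be sketched with Python's standard difflib standing in for the OS-level file-compare utilities mentioned; the two copies below are hypothetical strings, not real PG files.]

```python
import difflib

def compare_copies(text_a, text_b):
    """Return unified-diff lines between two copies of the same eBook."""
    return list(difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(),
        fromfile="copy_from_source_a", tofile="copy_from_source_b",
        lineterm=""))

# Two copies of "the same" text, differing in one punctuation mark.
copy_a = "To be, or not to be; that is the question."
copy_b = "To be, or not to be: that is the question."

diffs = compare_copies(copy_a, copy_b)
# Any output at all means the copies differ; identical copies yield nothing.
```

Even the single ";" changed to ":" surfaces as a -/+ line pair, which is exactly the property being relied on here.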
This totally avoids the possibility, raised above, of forged CRCs or hashes, and eliminates the need for any extra work on eBook preparation. Anyone can run the tests themselves, without a reliance on outside authorities to tell them if one eBook edition is any different than another, and exactly how different it is. Simple, fast and effective, the way the entire eBook process should be. Michael S. Hart From marcello at perathoner.de Sun Nov 14 07:42:01 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Nov 14 07:42:06 2004 Subject: [gutvol-d] What do scholars want? In-Reply-To: <00a301c4ca4b$8e657bf0$6401a8c0@enterprise> References: <00a301c4ca4b$8e657bf0$6401a8c0@enterprise> Message-ID: <41977CC9.7040406@perathoner.de> John Hagerson wrote: > It appears that we need to modify the PG web site to include checksum and > CRC data on each of our files to provide a mechanism of verifying that they > have not been nefariously modified after download, so "my" electronic copy > can be judged the same as "your" electronic copy. We already have hashes for all our files. That's the way KaZaa and other P2P networks work. We keep more hashes for every file than you may want to know about: md5, sha1, kazaa, ed2k and tigertree. If you go to the bibrec page and hover over the P2P link you can see them. (or copy the link into an editor) But I still don't understand what good it will do you if you know the hash of the "original" file? -- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Sun Nov 14 09:39:46 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 09:39:48 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to take to bed In-Reply-To: References: Message-ID: Where did this idea that the $99 for the Fictionwise Rocketbook is only for a license to read out of date books come from? Can someone tell us the real deal about this? Thanks!
mh From hart at pglaf.org Sun Nov 14 09:54:44 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 09:54:45 2004 Subject: [gutvol-d] Re: [ebook-community] Ask The eBook Community: Page Scans versus Real eText? In-Reply-To: <4196D9ED.4BC8D6F7@hidden-knowledge.com> References: <931037570875.20041113202609@noring.name> <4196D9ED.4BC8D6F7@hidden-knowledge.com> Message-ID: On Sat, 13 Nov 2004, Michael Ward wrote: > > Jon, two major reasons to convert from page scans: > > 1. File size, text vs. jpeg. > 2. Access to the -content- [semantics, meanings, words] > a. Search > b. Indexing > c. Quoting > d. Footnoting > e. Links > > > Michael Ward > Hidden Knowledge Let's not forget the following: 3. Changing the -content- a. Correcting typos and other errors b. Adding footnotes, comments, prefaces, intros, epilogues from other public domain sources; or your own. c. Machine Translation d. Inserting missing lines, paragraphs, etc. From hart at pglaf.org Sun Nov 14 10:02:39 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 10:02:41 2004 Subject: [gutvol-d] Page Scans versus Real eText? In-Reply-To: <211038107671.20041113203506@noring.name> References: <211038107671.20041113203506@noring.name> Message-ID: Replying to both lists. . .mh On Sat, 13 Nov 2004, Jon Noring wrote: > [I posted the following to The eBook Community. But clearly most of > the real experts in digitizing texts are found right here on gutvol-d, > so I'm reposting the message here. > > I'm especially curious about the economics associated with commercially > doing what Michael Hart and PG have done for years and years.] My guess is that by the time there is a serious commercial eBook industry, say to the point where the next David Letterman and Jay Leno are joking about eBooks then, the way they were about the Web 5-10 years ago, that most of the public domain books will already have been scanned, OCRed, placed online, and will be going through translations into the various languages of the world.
If not most, then certainly most of the ones that were easy to find and of general interest. But still, at the rate things have been going over the past 15 years, another 15 years should put us only a few years from this goal, well within sight, before the commercial eBook industry is developed enough to be part of cultural awareness. Michael S. Hart Project Gutenberg From hart at pglaf.org Sun Nov 14 10:22:33 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 10:22:35 2004 Subject: [gutvol-d] Re: [ebook-community] Ask The eBook Community: Page Scans versus Real eText? In-Reply-To: References: <931037570875.20041113202609@noring.name> <4196D9ED.4BC8D6F7@hidden-knowledge.com> Message-ID: oh. . .I forgot perhaps the most important, e. Putting the books in your own favorite font, character size, margination. From hart at pglaf.org Sun Nov 14 10:26:03 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 10:26:06 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] In-Reply-To: <009301c4c9df$349c3940$6401a8c0@enterprise> References: <009301c4c9df$349c3940$6401a8c0@enterprise> Message-ID: On Sat, 13 Nov 2004, John Hagerson wrote: > #define SOMEWHAT_PEEVED 1 > I'm a student (or a researcher). I'm writing a paper. I'm planning to cite > an e-book. > > Why don't I just DOWNLOAD the eBook to my computer and append it to my > paper? Then, it doesn't make any difference what happens to the eBook left > in the ether. TEACHER, HERE IS THE COPY *I* USED. Personally, I like that idea, but for those who don't WANT a footnote the size of an entire book, perhaps putting in a separate file, or disk, would be better. . .though _I_ would be tempted just to put it on my OWN little web page, with my OWN URL: then I don't have to worry about some other problems, such as someone changing the filename, dirname, URL, or deleting the entire book, directory, or site. . .etc. 
Michael From holden.mcgroin at dsl.pipex.com Sun Nov 14 10:52:19 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Sun Nov 14 10:52:30 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] In-Reply-To: <009301c4c9df$349c3940$6401a8c0@enterprise> References: <009301c4c9df$349c3940$6401a8c0@enterprise> Message-ID: <4197A963.3050204@dsl.pipex.com> John Hagerson wrote: > #define SOMEWHAT_PEEVED 1 > I'm a student (or a researcher). I'm writing a paper. I'm planning to cite > an e-book. > > Why don't I just DOWNLOAD the eBook to my computer and append it to my > paper? Then, it doesn't make any difference what happens to the eBook left > in the ether. TEACHER, HERE IS THE COPY *I* USED. I'm just curious here but which journal would be willing to publish the full text of all references cited? Certainly none of those I've had papers in. Cheers, Holden From Gutenberg9443 at aol.com Sun Nov 14 10:58:28 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Nov 14 10:58:37 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] Message-ID: <1e.386f7476.2ec904d4@aol.com> In a message dated 11/14/2004 11:52:40 AM Mountain Standard Time, holden.mcgroin@dsl.pipex.com writes: >>I'm just curious here but which journal would be willing >>to publish the >>full text of all references cited? Certainly none of those >>I've had >>papers in. None. That's why Michael's suggestion--store your references on a personal Website and put a URL in your bib--is better than this suggestion. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041114/5fc22fe2/attachment.html From hart at pglaf.org Sun Nov 14 11:07:31 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:07:33 2004 Subject: [gutvol-d] PG Is Not A Xerox Machine, Yet [was Perfection] In-Reply-To: <1131024517078.20041113164835@noring.name> References: <1131024517078.20041113164835@noring.name> Message-ID: re: Jon Noring reply to "Re: !@!@!RE: [gutvol-d] Perfection: Jon brings up several points that are between the past and the future, and obviously he has some differing points of view as to when each of these events might be placed on the calendar. The obvious point right now is whether Project Gutenberg should be doing several possible editions of each eBook, or should be comparing several different editions and creating our own edition that we hope will eventually be better than any of the previous paper editions. Jon says we should be doing separate editions, due to advances in disk space, download speed, and the time when Distributed Proofing will be doing 1,000 eBooks per day. * If we presume this is going at a rate of about 10 per day [we are at just about 11 per day in reality] and that this rate should be doubling at Moore's Law rates, then we would have this scenario:

Bks/Day   Years   Date
10          0     2004
20-40       3     2007
80-160      6     2010
320-640     9     2013
1K+        10     2014+

I agree that when all of these have been integrated into the world of 75% to 90% of even our own portion of the Internet for several years [enough time to do our first eBooks of most of these books] then it will certainly be time to start including variant editions, as we have already done with some of the great works such as those of Shakespeare, Dante, the Bible, etc. In fact, my own estimate of the time we will have 1,000,000 eBooks certainly lies within the realm of Jon's suggested 1,000 per day.
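[The doubling arithmetic above can be reproduced directly. A sketch only: the 10-per-day starting rate comes from the message, and the 18-month doubling period is an assumption chosen to match the larger figures in the table.]

```python
def projected_rate(start_rate, years, doubling_period=1.5):
    """Books/day after `years`, doubling every `doubling_period` years."""
    return start_rate * 2 ** (years / doubling_period)

# Starting from about 10 books/day in 2004:
schedule = {2004 + y: projected_rate(10, y) for y in (0, 3, 6, 9)}
# 2004: 10, 2007: 40, 2010: 160, 2013: 640 -- crossing 1,000/day
# roughly ten years out, i.e. around 2014.
```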
By that time, we will probably be finding it harder and harder to track down all the editions we have yet to do, and it will be a matter of very good timing to start in on creating all variants of the editions Jon wants us to have. Hopefully by this time, OCR will be so accurate that the dream of simply using it as one would use a xerox machine, will be closer to reality. In the interim, perhaps we can simply make available various eBook editions that do and don't include any corrections of typos, missing words, lines, paragraphs, etc. This, along with preservation of the original scans, should allow for a timely revision of any and all eBooks we produce. With the aid of various "diff" and "compare" programs, editors can even proofread the same eBook into the various composite or non-composite editions Jon suggests we should have. Anyone who wishes to volunteer to assist Jon in his efforts should let us know, and we will work up a listserver and other support for this effort. Michael S. Hart P.S. The day should eventually come when such efforts are no longer required at the human level, and Jon can simply scan and OCR each separate edition with a sufficient level of accuracy that it could either stand immediately on its own, or do so with only a small amount of human intervention. . .less effort than it may take to work from a previous scan of a different paper variant. From jtinsley at pobox.com Sun Nov 14 11:40:32 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Nov 14 11:40:41 2004 Subject: [gutvol-d] I'm sorry, I don't get it In-Reply-To: References: <009301c4c9df$349c3940$6401a8c0@enterprise> <20041114023523.GA9326@panix.com> Message-ID: <20041114194032.GA17627@panix.com> On Sat, 13 Nov 2004 22:56:23 -1000 (HST), Karen Lofstrom wrote: > >On Sat, 13 Nov 2004, Jim Tinsley wrote: > >> I'm afraid, John, that you are missing a great deal, as am I.
We are >> both rather lowly, and obviously can't grasp the Big Picture that is >> so clear to those unencumbered by experience of making etexts, and who >> have lots of time to contemplate the Big Picture since they aren't >> spending their time actually doing useful work for PG. > >I'm one of the people who wants scholar-friendly editions and I've >proofread my 14K pages over the last year and half. Not to mention a >little scanning and post-processing. Yes, you have, and by virtue of that you're due a better answer. But the answer is one you've heard before, so I'm not sure how much good repeating it is going to do. The answer is not PG-specific; it's common wisdom: "If you want something done, do it yourself", or, extrapolating from advice to writers: "Show, don't tell." There's also "The journey of a thousand miles starts with a single footstep", and "Better to light one candle than curse the darkness". Take your pick. Nothing in PG ever happened because someone said that other people should do things. When Michael typed in that first text, it wasn't because some friend bemoaned the lack of e-texts and said that Someone Should Do Something About It. When Charles Franks created DP, it wasn't because anyone nagged him. We're in the middle of trying to hammer out a good XML solution, so that issue is a bit hot, but it didn't get as far as it has, nor will it get resolved, because anyone insisted others should work for their agenda; it got as far as it has because a few people who believed in it as a way forward actually rolled their sleeves up and did the work. And they are the ones who are going to make it happen. I could give many examples of smaller ways in which PG changed, and all of them had some person or persons behind them, who actually did the work, because they thought the work should be done. Often, people think they've got a good idea. Sometimes they work toward it. 
Of those who work toward it, some give up, some discover that it wasn't a good idea after all, and some show that it was a good idea, and find others to join them in the work. Very often, somebody identifies a need, but identifying that need doesn't cause progress all by itself. Charles thought, back in 2000, that page images would become desirable in the future. This wasn't revolutionary -- people had been talking about it in PG for some time -- but the difference was that he _did it_. And so PG will have, someday, when we work out the mechanisms, all or most of the page scans that went through DP. I thought, a year or so ago, that page scans were about to become practical, and we built it into the new filesystem structure, and I worked with a couple of producers to get samples posted. Now we've got some posted, and there are going to be more. If you like that idea, work towards it! If you don't, ignore it. I'm personally not convinced that _I_ should spend the hours of my life pursuing your agenda. If you want to convince me that I should, show (don't tell) me what such a "scholar-friendly" program and its output would look like; because you surely are not going to convince me any other way*. [Footnote *: And that by itself might not be enough either; I'd want to believe that academics really would _use_ it. And BTW, I really wish it were possible to ask Dorothy Parker to define the difference between an academic and a scholar. I bet it'd be a good one! :-) ] If I wanted to do something towards a "scholar-friendly" PG, I'd first draw up a list of specifications for a candidate text, and then compare them against Charlz's writings about an OLS, against "competitor" sites like Bartleby and Perseus (and the OTA, if I felt in need of a laugh!) and against any statements I could find made by academics, and revise my list accordingly. 
I would then seek out and collar one or more academics, and ask them what would persuade them to use etexts in their work -- say, as the prescribed edition of a class text. I might well have made what I felt was a good example of what they would want to show them, so they could criticize by comparison -- blind laundry-lists of requirements are often unhelpful. I would incorporate what I thought best from that experience into a prototype "scholarly e-text". I would show this to anyone who would look, and several people whose eyes I'd have to tape open for the experience, in PG or outside, and ask for comments and for other people to join me. Around that time, I would start to get an idea whether I was on the right track or not. Given that we have 14,000 texts available, and that starting from any one of them massively cuts the work needed to produce some form of whatever you think of as a "scholarly e-text", it is obscenely frustrating to hear people exhort _me_ to work toward _their_ goals, when all the material is there for them to do it themselves. But you, at least, have helped towards getting those 14,000 books posted, and I therefore credit you with some standing, and grasp of what is involved. If you want to work toward your goal, you'll have to convince me by showing, but you won't have to tape my eyes open. :-) jim From hart at pglaf.org Sun Nov 14 11:40:45 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:40:47 2004 Subject: !@!@!RE: [gutvol-d] Perfection In-Reply-To: <419671D0.863F7CE1@ibiblio.org> References: <419671D0.863F7CE1@ibiblio.org> Message-ID: On Sat, 13 Nov 2004, Michael Dyck wrote: > Michael Hart wrote: >> >> On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] wrote: > > "Her Serene Highness" is Michele, but given her email address, > I doubt her last name is Dyck. Mine is, though. OK, then I'm still a little in the dark, as we have one other "Her Serene Highness" who has contributed as well. . . . > > Michael Hart: >> >> ...
I didn't want to provide a different Project Gutenberg eBook >> for each and every single paper edition out there, and then have >> to keep canonical errors [sic] in them for all time. > > You say "didn't". Do you still feel this way? Eventually, when OCR is about as good as xeroxing, then it shouldn't be much effort to scan multiple editions. See previous note w/ xerox in header. >> I wanted to create a "critical edition" that combined corrections >> and items from various editions, > > I'm curious: How many such amalgams has PG produced? > What was the latest? Couldn't tell you, but every time a new proofer sends in errors, it's more likely some were researched from a different edition. >> and we have always supplied the necessary information for citing >> our eBooks on request, > > But that's not apparent to someone reading a PG eBook, I think. > E.g., the PG boilerplate doesn't have a sentence like: > To find out what printed edition(s) this eBook was > created from, send a request to someone@pglaf.org. Usually they just send an email asking how to cite, and I send: Bibliographic information comes from any full record displayed by either the Project Gutenberg Search Engine (http://promo.net/cgi-promo/pg/t9.cgi) or the Project Gutenberg Catalog Browser (http://promo.net/cgi-promo/pg/cat.cgi). For an example, if you use Canterbury Tales from our collection, you'll get the following card information:

AUTHOR: Chaucer, Geoffrey, circa 1340-1400
AKA:
ADD. AUTHOR: Purves, D. Laing, Editor
--
TITLE: Canterbury Tales, and Other Poems
SUBJECT:
LOC CLASS: PR
--
NOTES:
LANGUAGE: English
-
DOWNLOAD: cbtls10.txt - 1.62 MB
          cbtls10.zip - 641 KB

Chaucer, Geoffrey, circa 1340-1400. - 2000. - Canterbury Tales, and Other Poems
- Urbana, Illinois (USA): Project Gutenberg. Etext #2383.
- First Release: Nov 2000 - ID:2862

Where the last three lines should be your bibliographic information. Hope this helps, So nice to hear from you!! Michael S. Hart Project Gutenberg "*Ask Dr.
Internet*" Executive Coordinator "*Internet User ~#100*" From hart at pglaf.org Sun Nov 14 11:43:08 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:43:09 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: <221011183281.20041113130622@noring.name> References: <20041113050417.9F29A8C914@pglaf.org> <20041113190901.GA5711@pglaf.org> <221011183281.20041113130622@noring.name> Message-ID: I thought we had a plan to save all page scans nearly a year ago. Greg told me he thought that Charles Franks had them, but both are on the road/vacation right now, so I'm not sure how to check. We'll see. Michael From hart at pglaf.org Sun Nov 14 11:45:42 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:45:44 2004 Subject: [gutvol-d] PG audience In-Reply-To: <419654D0.3080204@perathoner.de> References: <41950197.2020707@perathoner.de> <419654D0.3080204@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Marcello Perathoner wrote: > Michael Hart wrote: > >> They don't realize that the walls of academia have been penetrated >> by the virtual world. . .for them to try to stop eBooks is like >> James Watson's efforts to stop Craig Venter from mapping DNA, >> or even his efforts to stop the model building Crick 50 years ago. > > Well, well, capitalism *has* to be good for something. > > So let's praise capitalism for kicking the clerics in the *** and freeing > information from the imprisonment in monasteries ... before we start kicking > capitalism in the *** for making information a proprietary article. I'm not sure ANY of the above was done via capitalism. . . . Certainly not Watson, Crick, Venter. . .or PG eBooks. . . .
;-) From hart at pglaf.org Sun Nov 14 11:49:35 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:49:37 2004 Subject: [gutvol-d] Perfection In-Reply-To: <4196437F.9080905@perathoner.de> References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> <200411130616.iAD6GcSm004979@posso.dm.unipi.it> <4196437F.9080905@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Marcello Perathoner wrote: > Michael Hart wrote: > >> How much harder is it to make an eBook set up to answer all >> these scholarly and reference questions, than just to read? > > Providing source information and page numbers is easy. So is providing > the page scans. Of course: page scans != ebook. > > Marking up a book to satisfy most scholarly requirements is more work than I > would care for, short of being paid to do it. A big bug/feature for me is page numbers, bold, italic, underscore, etc.; I would prefer an eBook without them. . .they are just too distracting; I just want to read the CONTENT not the FORM. I have heard people mention that creating both kinds of eBooks should be easy from one session, but I'm not sure if anyone is DOING it. BTW, bold, italic, etc., also mess up a lot of search/quotes. Michael From shalesller at writeme.com Sun Nov 14 12:56:27 2004 From: shalesller at writeme.com (D. Starner) Date: Sun Nov 14 12:56:38 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to take to bed Message-ID: <20041114205627.DB4B84BE64@ws1-1.us4.outblaze.com> "Her Serene Highness" writes: > Are you honestly saying that every book you will ever want > to read is on a computer?? Why are you so violent about this? Why can't you understand that no one here is planning on torching the libraries, that ebooks and paper books aren't exclusive? > You must think I'm mad to love a physical book that > I will pass down to some relative of mine, Why do you assume that? > My computer was not on the list. And neither were most of your books.
Your books are, of course, naturally inferior. Under even moderate environmental conditions, they will fade away in a hundred years, a few hundred years at the best. Even in libraries they yellow and fade. They don't have the right smell, they don't feel right in the hand. That's why they will never supersede stone tablets. >It's available in modern English in book form. It's available in modern English in ebook form, too. The original text is also available in ebook form, both from Project Gutenberg. >> You can't understand Shakespeare until you read him in the original Klingon. >You can- if you're educated. Bah! The poorly translated English versions are but mere shadows of the originals in the Warrior's Tongue! Your human-biased education merely blinds you to that fact! > Maybe if you stopped reading Star Trek novels as literature, you'd > realize you read Shakespeare's language pretty much every day. Perhaps if you started reading Star Trek novels, you would realize that reading doesn't have to be serious, and that our dreams aren't circumscribed by the concepts of the Elizabethans and Victorians, that there is a wonderful future ahead of us, but it may require letting go of our death-grip on the things of the past. We may have a home on the moon or Mars sometime in the near future, but if we do, the library will be composed of ebooks, not paper books. In a world where every pound costs hundreds or thousands of dollars to move, ebooks are a godsend. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From j.hagerson at comcast.net Sun Nov 14 14:22:40 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Sun Nov 14 14:23:10 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.]
In-Reply-To: <1e.386f7476.2ec904d4@aol.com> Message-ID: <00ab01c4ca98$7bffadd0$6401a8c0@enterprise> The journals of the future, in the unlimited storage world that has been postulated, will be delighted to publish articles with the complete text of every cited source. -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Gutenberg9443@aol.com Sent: Sunday, November 14, 2004 12:58 PM To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] In a message dated 11/14/2004 11:52:40 AM Mountain Standard Time, holden.mcgroin@dsl.pipex.com writes: >>I'm just curious here but which journal would be willing >>to publish the >>full text of all references cited? Certainly none of those >>I've had >>papers in. None. That's why Michael's suggestion--store your references on a personal Website and put a URL in your bib--is better than this suggestion. Anne From kris at transitory.org Sun Nov 14 21:24:53 2004 From: kris at transitory.org (kris foster) Date: Sun Nov 14 21:25:09 2004 Subject: [gutvol-d] Perfection In-Reply-To: <419613D7.4080907@perathoner.de> References: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com> <20041112182216.Y99646@krweb.net> <419613D7.4080907@perathoner.de> Message-ID: <20041115000421.W99646@krweb.net> > What makes medium permanence a value per se ? (I agree with the remainder of your reply) My argument is that the transience of electronic media runs deeper than that of books (hopefully it's safe to leave stone tablets out of this). Beyond bit rot or pages fading, ebooks can be altered more easily than books, both intentionally and unintentionally. The sources of the texts -- PG mirrors and publishers -- may perish, yet only a publisher's book remains. And to be a little silly, the internet has shown it can survive for several decades; books have been proven for hundreds of years.
It was demonstrated in the message I replied to how quickly and easily ebooks can be used to find quotations. An electronic citation then becomes little more than a convenience and an advertisement, which will likely have a shorter life span than the paper itself. Are people ready to put their academic necks on the line? To be constructive, how are the PG mirrors monitored to ensure consistency today? --kris > Academia has developed its traditions around a medium (papyrus, paper) that > is permanent. Not the other way around. If the medium they had used was > impermanent the methods and traditions of Academia would be different today. > > > Medium permanence can be a big disadvantage too. The scholars in the middle > ages relied blindly on Aristotle. Scientific method in the middle ages > amounted to finding out what Aristotle said about some subject, and that was > that. Doing one's own research was not deemed a scientific method. > > Of course, Aristotle said that "wood swims and metal sinks" and that "heavier > items fall faster than lighter ones". > > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From joshua at hutchinson.net Mon Nov 15 05:25:49 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Nov 15 05:25:52 2004 Subject: [gutvol-d] Perfection Message-ID: <20041115132549.C606D9E751@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Michael Hart > > Question: > > How much harder is it to make an eBook set up to answer all > these scholarly and reference questions, than just to read? > > Michael > As far as the ones produced at DP... negligible. A few seconds to a few minutes' time to include the information. (We basically already have it, just needs some formatting applied).
Josh From joshua at hutchinson.net Mon Nov 15 06:01:28 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Nov 15 06:01:30 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to take to bed Message-ID: <20041115140128.4F9F14F491@ws6-5.us4.outblaze.com> ----- Original Message ----- From: "Her Serene Highness" > > After all, you can't understand Shakespeare unless you read him in the > original Klingon. > > >>You can- if you're educated. Plenty of people understand Shakespeare. > Even high school students. People in Italy can read Shakespeare. Tiny > children can also- they could at the beginning of this century. My > badly-educated, at-risk high school students were able to understand > Shakespeare. If you don't, that says more about you than it does about > early modern English. And some of us can even parse Beowulf- with a > two-language version (which is how it's usually printed) the average person > can read an amazing amount of it in the original, or at least grasp it. > Maybe if you stopped reading Star Trek novels as literature, you'd realize > you read Shakespeare's language pretty much every day. His turns of phrase > are used all the time, and can be understood by people of all economic levels > who have the desire to read and learn- even people who cannot afford $99 > ebooks to read Stephen King novels (not that Stephen King is bad, but > there's more to reading than that).<< > First of all, Anne's comment was an off-the-cuff, tongue-in-cheek ... JOKE. Second, I wouldn't make fun of Star Trek fans. They tend to be better educated and have read more things like Shakespeare than the average Joe (I don't have the link to the source of the information, but I remember reading it somewhere). Third, your comments are really making you sound like an academic elitist. And I don't think you are or mean to be.
Josh From j.hagerson at comcast.net Mon Nov 15 06:26:05 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Mon Nov 15 06:26:22 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? Message-ID: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> I am using wget to download books from www.gutenberg.org. The process is stuck on etext04 in what appears to be a futile effort to download index.html. The file must have been there last night, because I didn't have this problem. Could the appropriate person please look into this? Thank you very much. From gbnewby at pglaf.org Mon Nov 15 07:35:03 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Nov 15 07:35:05 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> References: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> Message-ID: <20041115153503.GA13757@pglaf.org> On Mon, Nov 15, 2004 at 08:26:05AM -0600, John Hagerson wrote: > I am using wget to download books from www.gutenberg.org. The process is > stuck on etext04 in what appears to be a futile effort to download > index.html. > > The file must have been there last night, because I didn't have this > problem. > > Could the appropriate person please look into this? There's no index.html currently. -- gbn From hart at pglaf.org Mon Nov 15 08:52:43 2004 From: hart at pglaf.org (Michael Hart) Date: Mon Nov 15 08:52:46 2004 Subject: [gutvol-d] Perfection In-Reply-To: <20041115132549.C606D9E751@ws6-2.us4.outblaze.com> References: <20041115132549.C606D9E751@ws6-2.us4.outblaze.com> Message-ID: On Mon, 15 Nov 2004, Joshua Hutchinson wrote: > > ----- Original Message ----- > From: Michael Hart >> >> Question: >> >> How much harder is it to make an eBook set up to answer all >> these scholarly and reference questions, than just to read? >> >> Michael >> > > As far as the ones produced at DP... negligible. A few seconds to a few > minutes' time to include the information.
(We basically already have it, just > needs some formatting applied). > > Josh > Then let's run a few dozen of these up the flagpole, and see what happens. . . . Michael From marcello at perathoner.de Mon Nov 15 09:13:06 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Nov 15 09:13:17 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> References: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> Message-ID: <4198E3A2.6080504@perathoner.de> John Hagerson wrote: > I am using wget to download books from www.gutenberg.org. The process is > stuck on etext04 in what appears to be a futile effort to download > index.html. The indexes are auto-generated on the fly by Apache. If the load on the fileservers is too high, the connection times out before a full directory listing can be retrieved. You should not harvest at peak hours anyway. -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Mon Nov 15 09:45:08 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Nov 15 09:45:10 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <4198E3A2.6080504@perathoner.de> References: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> <4198E3A2.6080504@perathoner.de> Message-ID: <20041115174508.GB17511@pglaf.org> On Mon, Nov 15, 2004 at 06:13:06PM +0100, Marcello Perathoner wrote: > John Hagerson wrote: > > >I am using wget to download books from www.gutenberg.org. The process is > >stuck on etext04 in what appears to be a futile effort to download > >index.html. > > The indexes are auto-generated on the fly by Apache. > > If the load on the fileservers is too high the connection times out > before a full directory listing can be retrieved. > > You should not harvest at peak hours anyway. One more thing (or two): - you can't get the big directories via FTP. Use HTTP. (The FTP servers stop after 2K items). - Don't use HTTP, use rsync.
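[As a sketch of the kind of filtered rsync pull Greg describes: the commands below run against a local scratch tree, because the real PG server and rsync module name come from the mirroring HOWTO, not from this example. "src/" stands in for that remote source.]

```shell
# Build a tiny scratch tree to mirror from.
mkdir -p src/etext04 dst
touch src/etext04/12345.zip src/etext04/12345.txt src/etext04/12345-h.htm
# rsync filter rules: keep directories and .zip files, drop everything else.
# The same --include/--exclude chain works against a remote SERVER::MODULE.
rsync -a --include='*/' --include='*.zip' --exclude='*' src/ dst/
ls dst/etext04/   # only 12345.zip is copied
```

The rule order matters: rsync applies the first matching pattern, so `--include='*/'` lets it descend into directories, `--include='*.zip'` keeps the archives, and the final `--exclude='*'` drops everything unmatched.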
See the mirroring HOWTO at gutenberg.org/howto for more info (yes, you can use rsync to just get particular directories, filename extensions, etc.). But if things are still weird, send something we can replicate and we'll help fix it! -- gbn From Gutenberg9443 at aol.com Mon Nov 15 09:58:38 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 15 09:58:48 2004 Subject: [gutvol-d] future of ebooks + books to take to bed Message-ID: In a message dated 11/15/2004 7:02:02 AM Mountain Standard Time, joshua@hutchinson.net writes: Second, I wouldn't make fun of Star Trek fans. They tend to be higher educated and have read more things like Shakespeare than the average Joe (I don't have the link to source of the information, but I remember reading it somewhere). I'm going to ramble about for a while here, but I am getting back to the topic eventually. So please put up with me. My experience as a writing teacher, middle school through university, has been that the more fantasy the student reads, the better the student's vocabulary is. Fantasy is the only popular genre, so far as I know, that revels in vocabulary. Also, a person who plays Dungeons and Dragons is likely to know more about comparative mythology than anybody else out of graduate school. A good player of D&D reads extensively. I don't play Dungeons and Dragons, but my son does. He's schizophrenic, and his mind flies off in all directions. When he was a child, if I told him to clean his room, he would sit down on the floor and cry, because he couldn't break "clean your room" into its component tasks. But if I told him, "Put your books away, put your clothes away, put your toys away, make your bed, and vacuum your floor," he could do all that. When he first got into D&D, he spent a lot of time at the kitchen table drawing dungeons on graph paper. I kept him supplied with graph paper. 
Later, as I happened to be driving from Fort Worth to Dallas, he spent the entire trip cross-examining me on comparative mythology. I got most of the questions right. At that time he was in middle school and I had an MA. Twenty-five years of D&D later, most of them as an advanced Dungeon-master, he is director of parking lot security at a major stock-car race. By the time he got through examining the overall situation, deploying his personnel optimally, and keeping an eye on all his personnel and everything that happened in his jurisdiction, the race track director said that he (my son) had done the best job of policing the parking lot that he (race track director) had ever seen, and my son was instantly signed to bring his crew back the next year. Playing a much-maligned game, and reading much-maligned "junk" genre fiction, taught him sequence, analysis, and synthesis. I don't call any books except pornography junk. Even if a kid is only reading Sweet Valley High, at least the kid is READING. My opinion is that there is one main reason why many kids nowadays don't have the respect for the written word that kids several generations ago had: they don't have time to read. Our youngest daughter, about halfway through seventh grade, began begging to be homeschooled. My husband and I vetoed it, until the end of the year. At that time I gave her a few formal and informal tests and was absolutely appalled. She had learned nothing, despite making decent grades. We immediately granted her request, and we had to back her up to third-grade math and have her work forward. One day the weather was thoroughly icky, and she was in her room. She came to me and said, "Mom, a funny thing just happened." When I asked what it was, she said, "Well, I thought I had read just a few pages, but then I found that I was at the end of the book, and then I looked at the clock and I had been reading an hour." I said, "Congratulations, my child. You have learned to read." 
Of course she indignantly pointed out that she had been reading since the first grade. I said, "No, you haven't been reading. You've been sounding out words, and that was taking so much of your mental energy that you didn't have time to concentrate on what the words meant." How many kids, today, have an hour--or half an hour--or even fifteen minutes--of uninterrupted reading time? This problem can't be solved by the schools; the answer has to come in the homes. The one-eyed monster in the living room has an off switch; it even has an electrical cord that can be unplugged. Once I got so sick of my children arguing about it that I put the television in the attic for three months. They still argued, but now it was over which one of them got to play the piano first. Their misbehaviors got more interesting; I remember once telling Liz that she absolutely could not read the Bible any more until she had finished washing the dishes, and then thinking how happy other parents would be to have the problems I had. Those three months broke the addiction, and they watched TV after that only rarely and for something in particular that they were following--not sitcoms and soap operas. Computers are dandy, but a kid who is addicted to the computer must be required to spend at least half an hour a day reading a book of his or her choice ON THE COMPUTER. This way, the child learns that reading and computers aren't irreconcilable. As a volunteer online tutor, I have many students asking me where they can find such-and-such a book online. If it is public domain, I look it up--preferably on PG--and give the student a link to it. But often the student is asking for a book that is still in copyright, and I have to explain that one has to go to a REAL library for that book. So this is what I mean when I say that when these kids are adults, about ten years from now, they are going to demand computerized books and they are going to get computerized books.
Somebody last week mentioned Luddites; I am aware that many people my age (61) are Luddites about computers, but my state--Utah--has the highest percentage of "wired" households of any state in the Union. I do not think that a person who refuses to think about reading computerized books is a Luddite, but I do think that person is not well informed. I think that if that person would try out a Rocket for a week, preferably a week which included several days in bed for flu or recovery from surgery or something like that, that person would never go back to paper books for anything that was available electronically. But notice that I said "think." I could be wrong. I live in a rather small house--definitely too small to be running three businesses from. But given technology that exists RIGHT NOW, everything in the Library of Congress would fit into my house. Everything in the Salt Lake City Library and the Salt Lake County Library and the University of Utah libraries would fit into one bookcase in my office. There are books that, for very good reason, I own in both silicon and dead tree formats. But when my grandchildren are the age I am now, they will think that having all those dead tree books around is a stupid, space-wasting, fire hazard. So what I'm getting at is this: We can't possibly guess the future of ebooks. It's bigger than any of us think it is. Even the best science fiction writers never guessed how we would use computers, and we're still on the edge of that, also. It is really absurd to worry about errors made ten years, even five years, ago. Resolve not to make those mistakes again, but go forward, not back. Don't try to figure out who made the mistakes. That doesn't matter. Go forward, not back. The first telegraph message, on 24 May, 1844, said, "What hath God wrought?" Now, 160 years later, we gripe if television from Mars or Ganymede is a little fuzzy. I don't want to offend the atheists on this ML, but--God is still wrighting. 
We're part of that process. I won't live to see PG's 60th birthday, but some of you will. What, by then, will God have wrought, using our hands to do the work? But we'll never get it done if we spend all our time squabbling about what somebody should have done five or ten or fifteen years ago. Go forward, not back. My husband has instructed me not to get into any more flame wars because they upset me too much. So I'm going back into watching status, where I spend most of my time anyway. But, good people--and you are good people, all of you, because you wouldn't be pouring your heart and mind and time into this work if you weren't--stop looking behind you. The action is in front of you. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041115/0d953c36/attachment-0001.html From stephen.thomas at adelaide.edu.au Mon Nov 15 16:10:10 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Mon Nov 15 16:10:24 2004 Subject: [gutvol-d] future of ebooks + books to take to bed In-Reply-To: References: Message-ID: <41994562.9020801@adelaide.edu.au> Gutenberg9443@aol.com wrote: > ... Go forward, not back. Well said, Anne, and thank you for your many salient points. One thing jumped out at me: > .... But often the > student is asking for a book that is still in copyright, and I have to > explain that one has to go to a REAL library for that book. I believe that PG now has the right to be called a library -- even a REAL library. It fulfills the major criteria for a library: it has a large collection of books, and it has a catalog (online) through which patrons may locate items in the collection. True, there are deficiencies, but you'll find similar in any library -- even the Library of Congress is somewhat less than perfect. 
;-) My own library, at the University of Adelaide, still has tens of thousands of brief catalogue records (out of 1.5M) -- we're cleaning them up as funds permit, but it's a twenty-year project. We're all just doing the best we can with the resources available, which is all anyone can ask. So perhaps we should start referring to "The Project Gutenberg Library" instead of simply "Project Gutenberg"? Steve -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M ----------------------------------------------------------- This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient. From j.hagerson at comcast.net Mon Nov 15 17:21:55 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Mon Nov 15 17:22:18 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <4198E3A2.6080504@perathoner.de> Message-ID: <00ca01c4cb7a$ac6d7640$6401a8c0@enterprise> Well, not knowing what to do, I went to the Robots Readme on the Gutenberg.org web site and copied the wget command listed under the heading "Getting All EBook Files." I started this process on Sunday evening, at the end of a cable modem. Little did I realize that more than 24 hours later, the process would still be running. In a private message, I was told to use rsync. OK.
If rsync is the preferred method, then why is wget presented as the example? It appears that I'm storing a bunch of index.html files that are redundant if I use rsync. I guess I can clean them up at my leisure. However, again the web page says "keep the html files" to make re-roboting faster. Well, I'll be a mirror site for all of the ZIP and HTML files, anyway. Please post suggestions here or pm me. Thank you. -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Marcello Perathoner Sent: Monday, November 15, 2004 11:13 AM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] [etext04|etext05]/index.html missing? John Hagerson wrote: > I am using wget to download books from www.gutenberg.org. The process is > stuck on etext04 in what appears to be a futile effort to download > index.html. The indexes are auto-generated on the fly by Apache. If the load on the fileservers is too high the connection times out before a full directory listing can be retrieved. You should not harvest at peak hours anyway. -- Marcello Perathoner webmaster@gutenberg.org _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From Gutenberg9443 at aol.com Mon Nov 15 22:00:09 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 15 22:00:28 2004 Subject: [gutvol-d] future of ebooks + books to take to bed Message-ID: <1a3.2bc1f095.2ecaf169@aol.com> In a message dated 11/15/2004 5:10:38 PM Mountain Standard Time, stephen.thomas@adelaide.edu.au writes: So perhaps we should start referring to "The Project Gutenberg Library" instead of simply "Project Gutenberg"? I usually describe it as the world's free public library. Of course, I meant that I have to send the kids to a dead tree library. You're quite right. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041116/6bb19ec6/attachment.html From j.hagerson at comcast.net Wed Nov 17 11:06:24 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Wed Nov 17 11:06:42 2004 Subject: [gutvol-d] PG content on Guntella? Please test. Message-ID: <000701c4ccd8$8bc4a8a0$6401a8c0@enterprise> I believe I'm set up to provide ZIP and HTML files of PG content published prior to 13-NOV-2004 through the Gnutella network. If anyone would care to test this hypothesis and tell me if you can successfully access a file, I would like to know. Thank you very much. John Hagerson (my IP address is 24.14.124.xxx to know if the file came from me) From joshua at hutchinson.net Wed Nov 17 11:19:20 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 17 11:19:34 2004 Subject: [gutvol-d] PG content on Guntella? Please test. Message-ID: <20041117191920.237059E899@ws6-2.us4.outblaze.com> There was some effort a while back to support P2P. *searching* Here is the PG webpage: http://www.gutenberg.org/howto/p2p-howto I never played with it to tell you if it worked at all (I'm at work now and can't test it), but it looks like we've been seeding the p2p networks already. Josh ----- Original Message ----- From: "John Hagerson" To: "'Project Gutenberg Volunteer Discussion'" Subject: [gutvol-d] PG content on Guntella? Please test. Date: Wed, 17 Nov 2004 13:06:24 -0600 > > I believe I'm set up to provide ZIP and HTML files of PG content published > prior to 13-NOV-2004 through the Gnutella network. > > If anyone would care to test this hypothesis and tell me if you can > successfully access a file, I would like to know. > > Thank you very much. 
> > John Hagerson (my IP address is 24.14.124.xxx to know if the file came from > me) > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From stephen.thomas at adelaide.edu.au Wed Nov 17 19:15:19 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Nov 17 19:15:40 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: References: Message-ID: <419C13C7.5070904@adelaide.edu.au> Brad Collins wrote: > Steve/ > > I tried to send this to the list but it bounced but you will likely > be about the only person interested so here it is... Surely that's not true! I'm sure many on this list are just thrilled by discussions about MARC. ;-) > > Steve Thomas writes: > > >>Feedback welcomed. > > > First of all, your script taught me a lot about both MARC and > Perl. VERY GOOD WORK! Thanks. > > I have been working the last few days on porting your script to elisp > and I noticed the following problem with Audio Book records: > > RDF > > > &pg; > The Seven Poor Travellers > Dickens, Charles (1812-1870) > en > Audio Book, computer-generated > 2006-01-01 > Copyrighted work. See license inside work. > > > MARC > > LDR 00560cam 22001573a 4500 > 005 20041111153800.0 > 008 060101s2006||||xxu|||||s|||||000 f eng d > 100 1 |aDickens, Charles,|d1812-1870 > 245 14|aThe Seven Poor Travellers |h[electronic resource] /|cby Charles Dickens > 260 |bProject Gutenberg Literary Archive Foundation,|c2006 > 500 |aProject Gutenberg > 506 |aFreely available. > 516 |aElectronic text > 830 0|aProject Gutenberg|v9737 > 856 40|uhttp://www.gutenberg.org/etext/9737 > 856 42|uhttp://www.gutenberg.org/license|zLicense First, I see you are using a prior version of the script/output. 
The latest version now produces this: LDR 00626cam a22002053a 4500 000 9737 003 PGUSA 005 20041115162032.0 008 060101s2006||||xxu|||||s|||||000 | eng d 040 |aPGUSA|beng 042 |adc 100 1 |aDickens, Charles,|d1812-1870 245 14|aThe Seven Poor Travellers |h[electronic resource] /|cby Charles Dickens 260 |bProject Gutenberg,|c2006 500 |aProject Gutenberg 506 |aFreely available. 516 |acomputer-generated Audio Book 830 0|aProject Gutenberg|v9737 856 40|uhttp://www.gutenberg.org/etext/9737 856 42|uhttp://www.gutenberg.org/license|3Rights I can see immediately that I need to add something for the copyrighted works. Probably an addition to the 506 note. > > I am far from being fluent in MARC, but from what I've seen I would > tend to say that the value for the 245 h subfield should be `sound > recording' and I am still not sure about the 516 field for electronic > file types. You'll see that the 516 now reflects what's in the PG catalog. The 245 h subfield value used is a generic term for the medium of the item, and this is commonly used for any kind of electronic resource. The term 'sound recording' is used for things like LP (and I guess CD) records. The major intent is to distinguish this item from other media, e.g. paper. > > Does MARC have a list of defined enumerated values for these > subfields? > > I have a few other questions: > > I'm also still not clear on why a 500 field is needed. The 500 field is a > general note, so why would a note with a value of `Project Gutenberg' > be helpful? Not sure about this one. But using the 830 field (Series statement) requires either a 490 or 500 (general note). So I'm just following the MARC spec here. Some things I just don't ask about. ;-) > > Second, I would suggest making the 260 field a bit more ISBD-ish. > > 260 |a-Urbana: |bProject Gutenberg, |c2006. > > or at least: > > 260 |aUrbana, |bProject Gutenberg, |c2006. Yes. You'll see I'm now using just 'Project Gutenberg' for the publisher name -- after a comment from Greg.
The a subfield can be used for place of publication, but ... I'm not sure what that is. Is it still Urbana (I thought PG had long since moved from there)? Is it the business address of PGLAF? Is it the home town of ibiblio? In the end, it seemed easiest to omit that. > > Which leads us to the question of how should the publisher name be > formatted? > > In a sense, each PG -- Aussie, Germany, EU, Canada etc. -- would have a > different city they were published in, located in the country they > were from. Even if (as they are in this case) they are separate legal > entities, the city should be enough to identify PG USA. There should > be a publisher authority record this points to. How should this be > handled? The catalog only includes items from "PGUSA". If other countries wanted to use the script to build MARC for their collections, then we can easily modify the script to change the publisher name. Thanks for the feedback! Steve
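As a rough illustration only, the record layout Steve's script prints can be sketched in a few lines of Python. This is NOT the actual Perl script, and it emits the human-readable listing shown above, not real ISO 2709 MARC; the function name and parameters are hypothetical.

```python
# Hypothetical sketch (not Steve's Perl script): assemble display-form
# MARC lines like those quoted above from a few PG catalog values.

def pg_marc_display(etext_no, title, author_sort, dates, author, year):
    """Return display lines for one PG etext record (illustrative only)."""
    return [
        f"100 1 |a{author_sort},|d{dates}",
        # 245 indicators '14': title added entry, 4 nonfiling chars ("The ");
        # a real generator would compute the nonfiling count per title.
        f"245 14|a{title} |h[electronic resource] /|cby {author}",
        f"260 |bProject Gutenberg,|c{year}",
        "500 |aProject Gutenberg",
        "506 |aFreely available.",
        f"830 0|aProject Gutenberg|v{etext_no}",
        f"856 40|uhttp://www.gutenberg.org/etext/{etext_no}",
        "856 42|uhttp://www.gutenberg.org/license|3Rights",
    ]

for line in pg_marc_display(9737, "The Seven Poor Travellers",
                            "Dickens, Charles", "1812-1870",
                            "Charles Dickens", 2006):
    print(line)
```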
From gbnewby at pglaf.org Thu Nov 18 04:57:58 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Nov 18 04:58:00 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: <419C13C7.5070904@adelaide.edu.au> References: <419C13C7.5070904@adelaide.edu.au> Message-ID: <20041118125758.GA5939@pglaf.org> On Thu, Nov 18, 2004 at 01:45:19PM +1030, Steve Thomas wrote: >... > > > >Second, I would suggest making the 260 field a bit more ISBD-ish. > > > >260 |a-Urbana: |bProject Gutenberg, |c2006. > > > >or at least: > > > >260 |aUrbana, |bProject Gutenberg, |c2006. > > Yes. You'll see I'm now using just 'Project Gutenberg' for the > publisher name -- after coment from Greg. The a subfield can be > used for place of publication, but ... I'm not sure what that > is. Is it still Urbana (I thought PG had long since moved from > there)? Is it the business address of PGLAF? Is it the home town > of ibiblio? In the end, it seemed easiest to omit that. I always used Urbana because it's the historical home, and of course PG still has a presence there (i.e., Michael). Legally speaking, the PGLAF organizational home is wherever I live (funny, I know), unless the PGLAF board decides otherwise. But I don't like using this as a publication location, since I might move. Chapel Hill would be reasonable, since that's where iBiblio is, but PG has no "real" organization there. Salt Lake City is where the business office is, but overall I still prefer Urbana as the "publication location" for PG. There is no 100% accurate place to list. -- Greg From holden.mcgroin at dsl.pipex.com Thu Nov 18 13:43:06 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Thu Nov 18 13:43:19 2004 Subject: [gutvol-d] LoC Public Domain Newspaper Archive Message-ID: <419D176A.2010709@dsl.pipex.com> Hi all! CNN is reporting that the U.S. 
Library of Congress is trying to digitize and make available over the web 30 million pages from public domain newspapers :-) http://www.cnn.com/2004/TECH/internet/11/17/oldnewspapers.ap/index.html Cheers, Holden From brad at chenla.org Thu Nov 18 17:58:55 2004 From: brad at chenla.org (Brad Collins) Date: Thu Nov 18 18:00:57 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: <20041118125758.GA5939@pglaf.org> (Greg Newby's message of "Thu, 18 Nov 2004 04:57:58 -0800") References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: Greg Newby writes: >> Yes. You'll see I'm now using just 'Project Gutenberg' for the >> publisher name -- after coment from Greg. The a subfield can be >> used for place of publication, but ... I'm not sure what that >> is. Is it still Urbana (I thought PG had long since moved from >> there)? Is it the business address of PGLAF? Is it the home town >> of ibiblio? In the end, it seemed easiest to omit that. > > I always used Urbana because it's the historical home, and of course > PG still has a presence there (i.e., Michael). > [snip] > There is no 100% accurate place to list. Since the place of publication is important for determining copyright restrictions in some cases, I think it would be better to include a place of publication. This has bothered me for some time. I've always wondered how to handle virtual organizations which don't really have a place of publication in the conventional sense like PG or the Apache Group. So I did a little digging in the ISBD specs and found the following: ,----[ ISBD(ER) 4.1.13 ] | 4.1.13 When a place of publication, production or distribution does | not appear anywhere in the item, the name of the known city or town | is supplied in square brackets. If the city or town is uncertain, or | unknown, the name of the probable city or town followed by a | question mark is supplied in square brackets. e.g. 
| | - [Paris] | - [Prague?] `---- ,----[ ISBD(ER) 4.1.14 ] | 4.1.14 When the name of a city or town cannot be given, the name of | the state, province or country is given, according to the same | stipulations as are applicable to the names of cities or towns. | e.g. | | - Canada | Editorial comment: Known as place of publication; | appears in prescribed source. `---- Since PG doesn't explicitly state that the place of publication is in the States in etexts (is that right?), this would suggest something like: - [USA]: Project Gutenberg, 2004. or (I prefer) - [Urbana]: Project Gutenberg, 2004. in BMF this might look like: published : ‐ $pl[[USA]]: $pb[Project Gutenberg], $dt[2004] or more verbose BMF (bxids only for example): published : ‐ $pl[$d:bxid://geo:IKE8-5510 $l:[USA]]: $pb[$d:bxid://aut:JIQ6-7286 $l:Project Gutenberg], $dt[$v:2004-10-12 $l:2004] BMF subfields used: (For complete list of subfields see: http://192.168.0.103/cgi-bin/bmf.cgi/Reference/SubfieldQuickRef.html) pl place name d defined-by l label pb publisher name dt inclusive dates v value -- in dt it should be an ISO 8601 formatted date b/ -- Brad Collins , Bangkok, Thailand From brad at chenla.org Thu Nov 18 18:08:19 2004 From: brad at chenla.org (Brad Collins) Date: Thu Nov 18 18:10:21 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: <20041118125758.GA5939@pglaf.org> (Greg Newby's message of "Thu, 18 Nov 2004 04:57:58 -0800") References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: My last post in reply to Greg included a chunk of rather raw notes I took on the subject yesterday. I might as well send along the rest of the notes which are all exploring issues with the nitty-gritty details of manifestation entity records for PG texts. You can ignore the BMF stuff. Take this all as food for thought rather than specific suggestions for PG.
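The ISBD(ER) supplied-place rules quoted above (4.1.13/4.1.14) are mechanical enough to sketch in code. This minimal Python illustration is mine, not part of BMF or any cataloging library; the function and argument names are hypothetical.

```python
# Hypothetical sketch of ISBD(ER) 4.1.13/4.1.14: when a place of
# publication does not appear in the item, supply it in square brackets,
# with a trailing "?" when the place is only probable.

def supplied_place(place, in_item=False, probable=False):
    if in_item:
        return place            # transcribed as it appears in the item
    if probable:
        return f"[{place}?]"    # e.g. [Prague?]
    return f"[{place}]"         # e.g. [Paris], or a country like [USA]

def publication_area(place, publisher, year, **kwargs):
    """Render a publication statement in the style shown above."""
    return f"{supplied_place(place, **kwargs)}: {publisher}, {year}."

print(publication_area("Urbana", "Project Gutenberg", 2004))
print(publication_area("Prague", "Example Press", 1898, probable=True))
```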
** Series ,----[ ISBD(ER) 6.6.1 ] | 6.6.1 The numbering of the item within a series or sub-series is | given in the terms in which it appears in the item. Standard | abbreviations may be used. Arabic numerals are substituted for other | numerals or spelled-out numbers. e.g. | | - (Multimedia learning series ; vol. 2) | - (Visit Canada series ; vol. C) | - (Computer simulation games ; module 5) | - (BTS research report ; 2) `---- Steve's script gives us: 830 0|aProject Gutenberg|v9737 But the ISBD suggests something like this: (Project Gutenberg etext ; no. 8654) 830 0|a(Project Gutenberg etext ; |vno. 9737 or BMF: series : ($a[Project Gutenberg etext] ; no. $vol[9737]) ** Material Designation ,----[ ISBD(ER) Appendix C ] | **General material designation:** | Electronic resource | | **Resource designations with "electronic" in the designations:** | Electronic data | Electronic font data | Electronic image data | Electronic numeric data | Electronic census data | Electronic survey data | Electronic representational data | Electronic map data | Electronic sound data | Electronic text data | Electronic bibliographic database(s) | Electronic document(s) (e.g. letters, articles) Electronic journal(s) | Electronic newsletter(s) `---- For PG, this would then suggest changing the more general Electronic Resource to the more specific: Electronic document Electronic sound data The reason I am suggesting this is that all of the examples I have seen using `Electronic resource' are for things like interactive CD-ROMs, and dynamic Web sites. These are not specifically electronic texts, documents or sound recordings. The distinction is small and certainly the general `Electronic Resource' works, but I wanted to find out if there were more specific enumerated values for material designation.... ** Mode of Access Since we haven't gotten around to working on Instance/Item entities yet, this is a bit premature. Access fields are not used in Manifestation entities.
But reading through the ISBD and MARC specs got me thinking about the issue. I must say that I don't like the ISBD(ER) mode of access field. Mode of access: Internet via World Wide Web. URL: http://muse.jhu.edu/journals/callaloo/. This is needlessly verbose and redundant. Another example in the spec is a bit better. Mode of access: Internet. URL: http://mitpress.mit.edu/CityofBits/. But it's not much better. ,----[ ISBD(ER) 7.5.2 Notes relating to mode of access] | | Mode of access shall be recorded in a note for all remote access | electronic resources. | | Mode of access is given as the second note following the System | requirements note (see 7.5.1), if given, and is preceded by "Mode of | access" (or its equivalent in another language and/or script). In | the absence of a system requirements note, mode of access is given | as the first note. e.g. | | - Mode of access: Lexis system. Requires subscription to | Mead Data Central, Inc. | - Mode of access: World Wide Web. URL: http://www.un.org | - Mode of access: Internet via ftp://ftp.nevada.edu | - Mode of access: Gopher://gopher.peabody.yale.edu | - Mode of access: Computer university network | - Mode of access: Mikenet `---- On the whole, MARC and ISBD are a bit clumsy when it comes to networked resources--the records are basically electronic catalog cards. Numbers 2, 3 and 4 are all network addresses, which have a URL pointing to the resource. I can understand putting a label indicating the type of network protocol but the examples are all screwed up mixing descriptive labels for the protocol with the type of network. Better would be something like the following: - access: Lexis [dialup network]: Note: Requires subscription to Mead Data Central, Inc.
- access: Project Gutenberg (WWW site): URL: http://projectgutenberg.org - access: Project Gutenberg (FTP mirror): URL: ftp://ftp.ibiblio.org - access: Internet (FTP site): URL: ftp://ftp.nevada.edu - access: Internet (Gopher site): URL: gopher://gopher.peabody.yale.edu - access: UCLA (university intranet): URL: http://libary.ucla.edu:2080 Note: Requires university network account. - access: Mikenet (private local area network) in BMF access: - $a[$typ:dialup $l:Lexis (dialup network)]: Note: $not[Requires subscription to Mead Data Central, Inc.] - $a[$typ:www $l:Project Gutenberg (WWW site)]: URL: $url[http://projectgutenberg.org] - $a[$typ:ftp $l:Project Gutenberg (FTP mirror)]: URL: $url[ftp://ftp.ibiblio.org] - $a[$typ:ftp $l:Internet (FTP site)]: URL: $url[ftp://ftp.nevada.edu] - $a[$typ:gopher $l:Internet (Gopher site)]: URL: $url[gopher://gopher.peabody.yale.edu] - $a[$typ:intranet $l:UCLA (university intranet)]: URL: $url[http://libary.ucla.edu:2080] Note: Requires university network account. - $a[$typ:lan $l:Mikenet] ($not[private local area network]) Now what about MARC? Steve's script produces: 856 40|uhttp://www.gutenberg.org/etext/9737 856 42|uhttp://www.gutenberg.org/license|3Rights and the spec sez... ,----[ MARC: 856 Electronic Location and Access ] | Field 856 contains the information needed to locate and access an | electronic resource. The field may be used in a bibliographic record | for a resource when that resource or a subset of it is available | electronically. In addition, it may be used to locate and access an | electronic version of a non-electronic resource described in the | bibliographic record or a related electronic resource. `---- This breaks down to: *** Indicators First: 4 HTTP Second: 0 Resource 2 Related Resource *** Subfields $u URI (do they make a distinction between URI and URL?) $3 Materials specified.
-- Brad Collins , Bangkok, Thailand From sly at victoria.tc.ca Thu Nov 18 21:53:50 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Nov 18 21:54:13 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: I question if the use of a series number as suggested below is an ideal approach. I believe it is intended for a smaller number of items which are intentionally published as a series. I'd suggest that the closest thing to PG etext numbers in a traditional research library would be accession numbers (as commonly used for microforms) Andrew On Fri, 19 Nov 2004, Brad Collins wrote: > > ** Series > > ,----[ ISBD(ER) 6.6.1 ] > | 6.6.1 The numbering of the item within a series or sub-series is > | given in the terms in which it appears in the item. Standard > | abbreviations may be used. Arabic numerals are substituted for other > | numerals or spelled-out numbers. e.g. > | > | - (Multimedia learning series ; vol. 2) > | - (Visit Canada series ; vol. C) > | - (Computer simulation games ; module 5) > | - (BTS research report ; 2) > `---- > > Steve's script gives us: > > 830 0|aProject Gutenberg|v9737 > > But the ISBD suggests something like this: > > (Project Gutenberg etext ; no. 8654) > > 830 0|a(Project Gutenberg etext ; |vno. 9737 > > or BMF: > > series : ($a[Project Gutenberg etext] ; no. $vol[9737]) > From brad at chenla.org Thu Nov 18 23:49:23 2004 From: brad at chenla.org (Brad Collins) Date: Thu Nov 18 23:51:42 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: (Andrew Sly's message of "Thu, 18 Nov 2004 21:53:50 -0800 (PST)") References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: Andrew Sly writes: > I question if the use of a series number as suggested below is an > ideal approach. > > I believe it is intended for a smaller number of items which are > intentionally published as a series.
> > I'd suggest that the closest thing to PG etext numbers in a traditional > research library would be accession numbers (as commonly used for > microforms) I used Series because The Early English Text Society publications are cataloged as a series and this was the closest thing I have found to the PG etext numbers. This is from the LOC: Series: Early English Text Society (Series). Original series ; 10, [etc.] 830 _0 |a Early English Text Society (Series). |p Original series ; |v 10, [etc.] I understand that the PG etext numbers are not a conscious planned series but I still think it works.... b/ -- Brad Collins , Bangkok, Thailand From vze3rknp at verizon.net Fri Nov 19 06:02:50 2004 From: vze3rknp at verizon.net (Juliet Sutherland) Date: Fri Nov 19 06:02:40 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: <419DFD0A.7020504@verizon.net> This is far from my area of expertise, but I do know that we are putting books into PG that come from several types of what I think of as "series". One kind is a group of books by one author (e.g. The Bobbsey Twins Series) and the other kind is a group of books, each by different authors, that are intended to go together (e.g. the English Men of Letters biographies). I'd think that we would want to have a way to represent each of these in the PG catalog. JulietS Brad Collins wrote: >Andrew Sly writes: > > > >>I question if the use of a series number as suggested below is an >>ideal approach. >> >>I believe it is intended for a smaller number of items which are >>intentionally published as a series. >> >>I'd suggest that the closest thing to PG etext numbers in a traditional >>research library would be accession numbers (as commonly used for >>microforms) >> >> > >I used Series because The Early English Text Society publications are >cataloged as a series and this was the closest thing I have found to >the PG etext numbers.
> >This is from the LOC: > >Series: Early English Text Society (Series). Original series ; 10, [etc.] > >830 _0 |a Early English Text Society (Series). |p Original series ; |v 10, [etc.] > >I understand that the PG etext numbers are not a conscious planned >series but I still think it works.... > >b/ > > > From brad at chenla.org Fri Nov 19 06:43:42 2004 From: brad at chenla.org (Brad Collins) Date: Fri Nov 19 06:45:56 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: <419DFD0A.7020504@verizon.net> (Juliet Sutherland's message of "Fri, 19 Nov 2004 09:02:50 -0500") References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> <419DFD0A.7020504@verizon.net> Message-ID: Juliet Sutherland writes: > This is far from my area of expertise, but I do know that we are > putting books into PG that come from several types of what I think of > as "series". One kind is a group of books by one author (e.g. The > Bobbsey Twins Series) and the other kind is a group of books, each by > different authors, that are intended to go together (e.g. the English > Men of Letters biographies). I'd think that we would want to have a > way to represent each of these in the PG catalog. > And you are correct -- and this is why MARC has a number of different ways of dealing with the issue (and I am not the person to explain them) but as far as I can see they are not mutually exclusive. Fields can be repeated (MARC 830 is repeatable) and there is no reason why there aren't series within series. Is there a better way to do this? Was the LOC example I used wrong? b/ -- Brad Collins , Bangkok, Thailand From sly at victoria.tc.ca Fri Nov 19 09:16:16 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Fri Nov 19 09:16:22 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> <419DFD0A.7020504@verizon.net> Message-ID: Hi Brad.
I looked for an example of an accession number for microfiche, as used in a marc record, and found the following example: (The accession number is 05000, found in fields 490 and 830, pretty much as you had suggested.) 000 00858nam 2200181 a 450 001 571327 008 810528c19801898enka b 00011 eng 0 020 __ |a 0665050003 (Positive copy) 035 __ |a (CaOOCIHM)81603284X 035 __ |9 ACN8054TS 040 __ |a CaOOCIHM |b eng 100 10 |a Allen, Grant, |d 1848-1899 245 13 |a An African millionaire |h [microform] : |b episodes in the life of the illustrious Colonel Clay / |c by Grant Allen. 260 0_ |a London : |b G. Richards, |c 1898. 300 __ |a 4 microfiches (183 fr.) : |b ill. 490 1_ |a CIHM/ICMH Microfiche series = CIHM/ICMH collection de microfiches ; |v no. 05000 533 __ |a Filmed from a copy of the original publication held by the Izaak Walton Killam Mmemorial Library, Dalhousie University. |b Ottawa : |c Canadian Institute for Historical Microreproductions, |d 1980. 830 _0 |a CIHM/ICMH Microfiche series ; |v no. 05000 Thanks, Andrew From shalesller at writeme.com Fri Nov 19 10:05:40 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Nov 19 10:05:50 2004 Subject: [gutvol-d] Re: PG catalog - MARC Message-ID: <20041119180540.5D2BC4BDAA@ws1-1.us4.outblaze.com> "Brad Collins" writes: > I used Series because The Early English Text Society publications are > cataloged as a series and this was the closest thing I have found to > the PG etext numbers. > > This is from the LOC: > > Series: Early English Text Society (Series). Original series ; 10, [etc.] > > 830 _0 |a Early English Text Society (Series). |p Original series ; |v 10, [etc.] So what happens when we do the EETS books? There's one in the PPVing queue at DP, and more being proofed and waiting for PPers. 
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From joshua at hutchinson.net Fri Nov 19 10:18:10 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 19 10:18:17 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) Message-ID: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> Attached at the bottom is a rough draft of a teiHeader spec. I basically wrote it as an example teiHeader with comments scattered all over to explain things. Since I'm not a cataloging expert (heck, I don't even qualify as a catalog neophyte), I'm relying heavily on what is in Marcello's documentation and the original TEI documentation. I basically just picked out stuff that looked important and relevant based on stuff I've done before. I'm sure I've missed some things. Please take a look through this and point out items that aren't covered and you think should be. Also, if anything is unclear, let me know. I'll try to explain it better (and update the explanation in the spec). Remember, the goal is to be able to grab information from the teiHeader in each etext to generate cataloging information (this ties in nicely with the ongoing MARC discussions around here). Josh PS The next document will be the actual markup spec rough draft. This one will probably be delayed until after the US Thanksgiving holidays (I don't think well with 5 pounds of turkey digesting in my tummy!) -------------- next part -------------- Project Gutenberg TEI Header Specification by Joshua Hutchinson This document will provide a "dummy" teiHeader, marking each line as either MANDATORY, RECOMMENDED or OPTIONAL. For fuller descriptions of each part of the teiHeader, please see the original TEILite documentation for the Electronic Title Page () located here (http://www.tei-c.org/Lite/teiu5_en.html#U5-header).
<-- MANDATORY SECTION --> <-- MANDATORY SECTION --> The Title of the EText <-- MANDATORY SECTION --> FirstName LastName <-- MANDATORY SECTION --> Illustrator, Editor, etc. FirstName LastName <-- MANDATORY SECTION (if it exists for this text) --> <-- Multiple entries are allowed. For instance, co-authors for a text would result in multiple entries, one for each author. --> <-- OPTIONAL --> First edition <-- OPTIONAL --> <-- This is information specifically about the PG edition. For instance, in the past, a major update of a text would often result in a number increment of a text's file name. This information could be tracked here. I'm not certain that this type of information is captured anymore. --> <-- MANDATORY SECTION --> Project Gutenberg <-- MANDATORY SECTION --> November, 2004 <-- MANDATORY SECTION --> 12345 <-- MANDATORY SECTION --> <-- This information pertains to the posting of the etext in PG archives. The field won't ever change. (Should it be something else, like PGLAF?) The field is the month and year it was posted. The field is the file number assigned to the text by the whitewashers before posting to the archive. (NOTE: PG could also list itself as a distributor instead of a publisher.) Possible future addition here: There has been sporadic talk in the past of getting ISBN numbers for PG posted works. This would go here as another field. 
--> <-- MANDATORY SECTION --> A short description of the etext (i.e., The first folio of Shakespeare, prepared by Charlton Hinman) <-- OPTIONAL --> <-- MANDATORY SECTION --> <-- MANDATORY SECTION --> The Title of the Source Text <-- MANDATORY SECTION --> FirstName LastName <-- MANDATORY SECTION --> Illustrator, Editor, etc.FirstName LastName <-- MANDATORY SECTION (if it exists for this text) --> <-- MANDATORY SECTION --> Original Source Publisher <-- MANDATORY SECTION --> January, 1922 <-- MANDATORY SECTION --> Place of Publication <-- MANDATORY SECTION --> <-- The observant will notice that the fields from the beginning of are duplicated in the section. The information will not necessarily be identical, though. The information refers to our etext, while the refers back to the original source document. --> <-- OPTIONAL --> The etext was produced by the Distributed Proofreaders at http://www.pgdp.net. <-- OPTIONAL --> <-- This section is optional. In fact, for the example given, it is largely redundant because DP will be given credit in the section detailed below. --> <-- MANDATORY SECTION --> <-- MANDATORY SECTION --> English (United States) Written out language title <-- RECOMMENDED SECTION --> KEYWORD <-- This section is fairly straight forward, just listing the languages used in the etext and some search keywords. --> <-- MANDATORY SECTION --> November 2004 Scans provided by Cornell University Joshua Hutchinson Juliet Sutherland Distributed Proofreaders Etext created November 2005 Jim Tinsley Fixed missing chapter headers and minor typos <-- This section will be added to each time something is done to the text, so we have a running record of changes. The order should be in the order the changes were done, the original creation first and each additional change listed in order below. Because some volunteers wish to remain anonymous, it is perfectly acceptable to simply list ANONYMOUS in a name line. 
--> From marcello at perathoner.de Fri Nov 19 10:46:03 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 19 10:46:12 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> Message-ID: <419E3F6B.2050902@perathoner.de> Joshua Hutchinson wrote: > The Title of the EText <-- MANDATORY SECTION --> We should provide a non-standard attribute of "nonfiling". This is the number of chars to remove from the start of title before sorting it. <title nonfiling="4">The Tempest</title> <title nonfiling="2">A Midsummer Nights Dream</title> This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like: "Tempest, The" and still sorts right. > <-- OPTIONAL --> > First edition <-- OPTIONAL --> > I think the edition number is not maintained any more. I don't see any of them in the new file system. > <-- MANDATORY SECTION --> > Project Gutenberg <-- MANDATORY SECTION --> > November, 2004 <-- MANDATORY SECTION --> > 12345 <-- MANDATORY SECTION --> > The date should also mention the day. We are not using the date for filing any more. Is this the date of first publication or updated with each new edition? > <-- RECOMMENDED SECTION --> > > > KEYWORD > > > This needs some more thought as the keywords should come out of some authority list. In that case the authority must be specified.
> > November 2004 > > Scans provided by Cornell University > Joshua Hutchinson > Juliet Sutherland > Distributed Proofreaders > > Etext created > Better separate scanning and proofing: 2003 Cornell University Scanned the source November 2004 Joshua Hutchinson Juliet Sutherland Distributed Proofreaders Etext created -- Marcello Perathoner webmaster@gutenberg.org From scott_bulkmail at productarchitect.com Fri Nov 19 11:30:33 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Fri Nov 19 11:31:22 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: <419E3F6B.2050902@perathoner.de> References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> <419E3F6B.2050902@perathoner.de> Message-ID: >> The Title of the EText <-- MANDATORY SECTION --> > >We should provide an non-standard attribute of "nonfiling". This is the number of chars to remove from the start of title before sorting it. > > The Tempest > > A Midsummer Nights Dream > >This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like: "Tempest, The" and still sorts right. I think that should be done (or not) by the cataloging software rather than hardcoded into each file. It's an easy thing to miss, i.e. to be done inconsistently. And, since it's not part of non-PG TEI, there's no other software in the outside world that looks for it. (I may have made this point before, but if so I can't find it in my archives.) -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From krausyaoj at ameritech.net Fri Nov 19 11:48:52 2004 From: krausyaoj at ameritech.net (Jeffrey Kraus-yao) Date: Fri Nov 19 11:48:08 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: Message-ID: <001401c4ce70$cbe4b940$0402a8c0@p3> Another option for the title is to use a file-as attribute. 
The Tempest A Midsummer Nights Dream While this may not be included in the TEI standard, it is part of the OEB standard, http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Scott Lawton Sent: 19 November, 2004 13:31 To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] TEI Header Spec (rough draft) >> The Title of the EText <-- MANDATORY SECTION --> > >We should provide an non-standard attribute of "nonfiling". This is the >number of chars to remove from the start of title before sorting it. > > The Tempest > > A Midsummer Nights Dream > >This is an extension to TEI but very useful for the catalog software. >It avoids unsightly titles like: "Tempest, The" and still sorts right. I think that should be done (or not) by the cataloging software rather than hardcoded into each file. It's an easy thing to miss, i.e. to be done inconsistently. And, since it's not part of non-PG TEI, there's no other software in the outside world that looks for it. (I may have made this point before, but if so I can't find it in my archives.) -- Cheers, Scott S. 
Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From traverso at dm.unipi.it Fri Nov 19 14:12:11 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Nov 19 14:12:25 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: (message from Scott Lawton on Fri, 19 Nov 2004 14:30:33 -0500) References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> <419E3F6B.2050902@perathoner.de> Message-ID: <200411192212.iAJMCBdV012178@posso.dm.unipi.it> >>>>> "Scott" == Scott Lawton writes: >>> The Title of the EText <-- MANDATORY SECTION >>> --> >> We should provide an non-standard attribute of >> "nonfiling". This is the number of chars to remove from the >> start of title before sorting it. >> >> The Tempest >> >> A Midsummer Nights Dream >> >> This is an extension to TEI but very useful for the catalog >> software. It avoids unsightly titles like: "Tempest, The" and >> still sorts right. Scott> I think that should be done (or not) by the cataloging Scott> software rather than hardcoded into each file. How can the software guess what is filing and what not? "As Farpas" and "As you like it", "As" is filing or not? Here the language might decide, but I think that it is possible in the same language for the same word to be filing or non-filing (surely it is in Italian if you disregard accents). However, relying on character count is very fragile, especially in a context in which whitespace is considered irrelevant. I have often seen braces used in sorting software: {The} Tempest, {A} Midsummer Nights Dream: characters in braces and whitespace are discarded for the purpose of sorting, braces are discarded for the purpose of printing. Of course it is possible to achieve the same result, much more verbosely, with angled brackets.... 
<nonfiling>The</nonfiling> Tempest (a side remark: a non-filing part is not always separated by space: {L'}Inferno) Carlo From brad at chenla.org Fri Nov 19 19:05:51 2004 From: brad at chenla.org (Brad Collins) Date: Fri Nov 19 19:07:56 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: <20041119180540.5D2BC4BDAA@ws1-1.us4.outblaze.com> (D. Starner's message of "Fri, 19 Nov 2004 10:05:40 -0800") References: <20041119180540.5D2BC4BDAA@ws1-1.us4.outblaze.com> Message-ID: "D. Starner" writes: > So what happens when we do the EETS books? There's one in the PPVing > queue at DP, and more being proofed and waiting for PPers. Since the PG edition is distinct from the EETS edition this won't be a problem. Any reference to the EETS series number would be in a note indicating the source used for the PG edition. It might look something like this (please excuse my shaky ISBD) mainTitle : Vices and virtues; being a soul's confession of its sins with reason's description of the virtues. A middle-English dialogue of about 1200 A.D. [electronic document] / Edited by F. Holthausen. responsibility : Holthausen, Ferdinand, 1860-1956. ed. published : - [Urbana]: Project Gutenberg, 2006. series : Project Gutenberg ; etext no. 55031 source : Text based on: Vices and Virtues / ed. by Dr. F Holthausen. EETS Original Series ; no. 89, 159 - London: Kegan Paul, Trench, Trübner, 1888. or it could be included in a 500 note field and an 830 labeled as a Variant Series as was done in a record for a reprint of the book in the LOC catalog. notes : v. 1 first published 1888, v. 2 first published 1921. variantSeries : Early English Text Society. Publications. Original series ; no. 89, 159 Actually I like this better. b/ who just now realised it's Saturday morning... 
-- Brad Collins , Bangkok, Thailand From joshua at hutchinson.net Fri Nov 19 20:15:03 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 19 20:15:04 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: <419E3F6B.2050902@perathoner.de> References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> <419E3F6B.2050902@perathoner.de> Message-ID: <419EC4C7.7080307@hutchinson.net> Marcello Perathoner wrote: > Joshua Hutchinson wrote: > >> The Title of the EText <-- MANDATORY SECTION --> > > > We should provide an non-standard attribute of "nonfiling". This is > the number of chars to remove from the start of title before sorting it. > > The Tempest > > A Midsummer Nights Dream > > This is an extension to TEI but very useful for the catalog software. > It avoids unsightly titles like: "Tempest, The" and still sorts right. > > I see the need for this... But I think I like Jeffrey's method a little better. (From another post) The Tempest A Midsummer Nights Dream While this may not be included in the TEI standard, it is part of the OEB standard, http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm >> <-- MANDATORY SECTION --> >> Project Gutenberg <-- MANDATORY SECTION >> --> >> November, 2004 <-- MANDATORY >> SECTION --> >> 12345 <-- MANDATORY SECTION --> >> > > > The date should also mention the day. We are not using the date for > filing any more. > Fair enough. Will change it. > Is this the date of first publication or updated with each new edition ? > > First publication. Subsequent updates will be documented at the end. >> <-- RECOMMENDED SECTION --> >> >> >> KEYWORD >> >> >> > > > This needs some more thought as the keywords should come out of some > authority list. In that case the authority must be specified. > > This is where the catalog folks need to step in. 
:) >> >> November 2004 >> >> Scans provided by Cornell University >> Joshua Hutchinson >> Juliet Sutherland >> Distributed Proofreaders >> >> Etext created >> > > > Better separate scanning and proofing: > > > 2003 > > Cornell University > > Scanned the source > > > November 2004 > > Joshua Hutchinson > Juliet Sutherland > Distributed Proofreaders > > Etext created > > > > Ok, we can separate that information out... Will update. Josh From stephen.thomas at adelaide.edu.au Fri Nov 19 21:29:06 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Fri Nov 19 21:29:28 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: <419ED622.5060904@adelaide.edu.au> Brad Collins wrote: > Since the place of publication is important for determining copyright > restrictions in some cases, I think it would be better to include a > place of publication. > > This has bothered me for some time. I've always wondered how to > handle virtual organizations which don't really have a place of > publication in the conventional sense like PG or the Apache Group. I think the ISBD recommends using "s.l." where the place is unknown or indeterminate. (Initials for "sine loco". See http://www.ifla.org/VII/s13/pubs/isbd3.htm#18 section 4.1.15) However, this does not help the copyright question. MARC does provide the 506 field ("Restrictions on Access note") for copyright notices etc. Right now, I'm just putting "Freely available" in here (or the copyright statement for copyrighted works). But we could use a more detailed statement here. E.g. we should as a minimum say "Freely available in the USA. May be subject to copyright in other locations." We could also place the license url in this field (subfield u) rather than the 856. Regarding the series statement -- I'm not wedded to the use of 830 for "Project Gutenberg". It just seemed an appropriate way to include the PG number. 
One typical use of 830 in library catalogues is to be able to index works by series name. So this would allow (in this case) for a search on series name "Project Gutenberg" to list all the works in the collection. However, with currently almost 14,000 titles, maybe this isn't a worthwhile goal. "Project Gutenberg" should also be available as keywords in any library catalog search, if one needed to limit a search to just PG works. We could always expand the 500 General Note to include more detail about PG, including the item number. (500 can be whatever we want, and you can have as many 500 notes as you need.) Also, the item number is present in field 001 -- although that probably won't be visible to the general user of a library catalog, so including it in the 500 note is useful (and again makes the number usable in a keyword search). So if you want to reserve the 830 for particular series within PG (e.g. EET) then that's fine with me. Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From marcello at perathoner.de Sat Nov 20 01:46:10 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 20 01:46:33 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: <419EC4C7.7080307@hutchinson.net> References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> <419E3F6B.2050902@perathoner.de> <419EC4C7.7080307@hutchinson.net> Message-ID: <419F1262.5080408@perathoner.de> Joshua Hutchinson wrote: > I see the need for this... But I think I like Jeffrey's method a little > better. 
(From another post) > > The Tempest > A Midsummer Nights > Dream > > While this may not be included in the TEI standard, it is part of the > OEB standard, > http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm My method is part of the MARC standard and is already implemented in the catalog database. -- Marcello Perathoner webmaster@gutenberg.org From j.hagerson at comcast.net Sat Nov 20 09:09:42 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Sat Nov 20 09:10:57 2004 Subject: [gutvol-d] Problems running W3 validator from XP Message-ID: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> My efforts to use the W3 HTML validator fail every time because the MIME type is text. In the past, I was able to validate files. Other than chucking my operating system, do you have any suggestions as to how I can address this problem? Thank you very much. From Gutenberg9443 at aol.com Sat Nov 20 09:32:23 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 20 09:32:35 2004 Subject: [gutvol-d] documenting etexts Message-ID: <13e.6b4b973.2ed0d9a7@aol.com> Since this topic came up, I have given it a lot of thought. I think I have the answer. We do not have to know the specific page number if we're quoting the Bible or Shakespeare. That can be carried over to other texts as well. (I hope my underlining shows up in all email.) Bib entry: Richardson, Samuel. Pamela, or Virtue Rewarded. orig. pub. 1740-1741. n.p.: Project Gutenberg, n.d. footnote or endnote: Richardson. Pamela. Section IV, Letter VII, par. 4. Would not this serve most purposes? Please discuss this WITHOUT FLAMING. The world has flames enough without them showing up here. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041120/9e0d6dc0/attachment.html From jmdyck at ibiblio.org Sat Nov 20 10:29:00 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Sat Nov 20 10:30:01 2004 Subject: [gutvol-d] documenting etexts References: <13e.6b4b973.2ed0d9a7@aol.com> Message-ID: <419F8CEC.8B69BB26@ibiblio.org> Gutenberg9443@aol.com wrote: > > We do not have to know the specific page > number if we're quoting the Bible or Shakespeare. > That can be carried over to other texts as well. (I > hope my underlining shows up in all email.) > > Bib entry: > > Richardson, Samuel. Pamela, or Virtue Rewarded. > orig. pub. 1740-1741. n.p.: Project > Gutenberg, n.d. > > footnote or endnote: > > Richardson. Pamela. Section IV, Letter VII, par. 4. > > Would not this serve most purposes? Theoretically, perhaps, but I think it has some practical shortcomings. 1) It assumes that the person making the reference and the people looking up the reference all agree on how to count paragraphs. Usually it's straightforward, but if the source has display quotes, poetry (with stanzas), epigraphs, footnotes, etc, people will probably make different assumptions about how to count them. 2) Some cases would require you to count a lot of paragraphs. Consider a chapter in a novel, with lots of conversational dialogue. The number of paragraphs could easily get into the hundreds. A reference like "Chapter 5, par. 157" might be rather discouraging. (The Bible, and some editions of Shakespeare, avoid these problems by putting the numbering system explicitly in the text.) -Michael From joshua at hutchinson.net Sat Nov 20 13:08:11 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat Nov 20 13:08:07 2004 Subject: [gutvol-d] Problems running W3 validator from XP In-Reply-To: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> References: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> Message-ID: <419FB23B.5000906@hutchinson.net> Hmm... 
Are you pointing it to a file on a server or just browsing for the local .HTML file and uploading it? If it is a server, then there is something set wrong on your server. If it is a local file, there should BE a MIME type. Josh John Hagerson wrote: >My efforts to use the W3 HTML validator fail every time because the MIME >type is text. In the past, I was able to validate files. > >Other than chucking my operating system, do you have any suggestions as to >how I can address this problem? > >Thank you very much. > > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > From joshua at hutchinson.net Sat Nov 20 13:09:31 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat Nov 20 13:09:21 2004 Subject: [gutvol-d] Problems running W3 validator from XP In-Reply-To: <419FB23B.5000906@hutchinson.net> References: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> <419FB23B.5000906@hutchinson.net> Message-ID: <419FB28B.5030108@hutchinson.net> Oops, that last sentence should read, "If it is a local file, there should NOT be a MIME type." Joshua Hutchinson wrote: > Hmm... Are you pointing it to a file on a server or just browsing for > the local .HTML file and uploading it? If it is a server, then there > is something set wrong on your server. If it is a local file, there > should BE a MIME type. > > Josh > > John Hagerson wrote: > >> My efforts to use the W3 HTML validator fail every time because the MIME >> type is text. In the past, I was able to validate files. >> >> Other than chucking my operating system, do you have any suggestions >> as to >> how I can address this problem? >> >> Thank you very much. 
>> >> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d@lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d >> >> >> > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From servalan at ar.com.au Sat Nov 20 13:18:10 2004 From: servalan at ar.com.au (Pauline) Date: Sat Nov 20 13:45:52 2004 Subject: [gutvol-d] Problems running W3 validator from XP In-Reply-To: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> References: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> Message-ID: <419FB492.8010809@ar.com.au> John Hagerson wrote: > My efforts to use the W3 HTML validator fail every time because the MIME > type is text. In the past, I was able to validate files. > > Other than chucking my operating system, do you have any suggestions as to > how I can address this problem? This is a known problem with XP Service Pack 2 & IE. Details here: http://www.webmasterworld.com/forum21/8867.htm & more places if you google: http://www.google.com.au/search?q=validate+w3c+xp+sp2+IE&btnG=Search&hl=en & discussed at DP here: http://www.pgdp.net/phpBB2/viewtopic.php?p=88608&highlight=validate+xp#88608 As a permanent fix - use a real browser :) Download one here: http://www.mozilla.org/ I'm a recent convert from Mozilla for browsing & email to Firefox for browsing & Thunderbird for email. I hope this helps, P -- Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." From stephen.thomas at adelaide.edu.au Sat Nov 20 16:00:33 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Sat Nov 20 16:00:49 2004 Subject: [gutvol-d] documenting etexts In-Reply-To: <13e.6b4b973.2ed0d9a7@aol.com> References: <13e.6b4b973.2ed0d9a7@aol.com> Message-ID: <419FDAA1.10405@adelaide.edu.au> As Michael Dyck has pointed out, there are problems with citing paragraph numbers. 
With an HTML version, it would be quite possible to add an anchor to the start of every paragraph, so that a citation might simply provide a URL to the exact paragraph. E.g. Richardson. _Pamela_. Section IV, Letter VII, http://www.gutenberg.org/dirs/etext04/pam1w10.htm#p456 (which unfortunately doesn't exist -- but follows Anne's example.) (One would expect that anyone citing a PG work would provide the link to the exact version that they'd used.) With a plain text version, it's simply not possible to give an exact citation, for the reasons that Michael mentioned. However, citation is about citing sources, as best one can, so that Richardson. _Pamela_. Section IV, Letter VII, http://www.gutenberg.org/dirs/etext04/pam1w10.txt would be perfectly acceptable as a citation -- leaving of course the matter of *finding* the exact point in the text to the reader. Given that the reader can use the URL to obtain the text, and then use Find to search for the phrase in question, with less trouble than locating a phrase on a particular printed page, this seems to me to be a perfectly adequate form of citation. (Especially as any reader will be able to easily obtain the PG text, which can't be said for many print citations -- if you can't lay your hands on the print edition, it doesn't matter how closely the thing is cited!) So my advice is -- don't sweat it, cite as best you can and consider the advantages over the disadvantages. Steve Gutenberg9443@aol.com wrote: > Since this topic came up, I have given it a > lot of thought. I think I have the answer. > > We do not have to know the specific page > number if we're quoting the Bible or Shakespeare. > That can be carried over to other texts as well. (I > hope my underlining shows up in all email.) > > Bib entry: > > Richardson, Samuel. _Pamela, or Virtue Rewarded_. > orig. pub. 1740-1741. n.p.: Project > Gutenberg, n.d. > > footnote or endnote: > > Richardson. _Pamela_. Section IV, Letter VII, par. 4. 
> > Would not this serve most purposes? > > Please discuss this WITHOUT FLAMING. > The world has flames enough without them > showing up here. > > Anne > > > ------------------------------------------------------------------------ > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From traverso at dm.unipi.it Sat Nov 20 11:54:41 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Sun Nov 21 01:07:12 2004 Subject: [gutvol-d] documenting etexts In-Reply-To: <13e.6b4b973.2ed0d9a7@aol.com> (Gutenberg9443@aol.com) References: <13e.6b4b973.2ed0d9a7@aol.com> Message-ID: <200411201954.iAKJsfI2004973@posso.dm.unipi.it> >>>>> "Anne" == Gutenberg9443 writes: Anne> Since this topic came up, I have given it a lot of Anne> thought. I think I have the answer. Anne> We do not have to know the specific page number if we're Anne> quoting the Bible or Shakespeare. That can be carried over Anne> to other texts as well. (I hope my underlining shows up in Anne> all email.) Anne> Bib entry: Anne> Richardson, Samuel. Pamela, or Virtue Rewarded. Anne> orig. pub. 1740-1741. n.p.: Project Gutenberg, n.d. Anne> footnote or endnote: Anne> Richardson. Pamela. Section IV, Letter VII, par. 4. Anne> Would not this serve most purposes? Anne> Please discuss this WITHOUT FLAMING. The world has flames Anne> enough without them showing up here. I have two objections, that I would like to know how you would solve: - existing books quote other books through pages. If you want to find in a book a discussion that is quoted by page, how are you going to find it, if you don't have page numbers? 
Of course, if you have an exact quotation you can search for it, but assume that you have just a description, or maybe a translation. - assume that you want to quote a book that you have only in a paper edition; to quote it in your style, you need to manually count the paragraphs, both when quoting and when checking a quotation; wouldn't the standard way of quoting pages of a reference edition (usually, the only edition) be better? Of course, you said: Anne> Would not this serve most purposes? Yes, most maybe, but not all. And most is not enough. You said also Anne> to other texts as well. (I hope my underlining shows up in Anne> all email.) No, it doesn't. Here too you are assuming that other people use the same tools that you use. A good tool is one that adapts itself to an unknown situation, and does not make assumptions. Discarding page numbers in reference works makes assumptions about other people's working methods; the result is a less flexible tool. Carlo From gbnewby at pglaf.org Sun Nov 21 14:06:37 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Nov 21 14:06:39 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: <20041121220637.GB24601@pglaf.org> On Fri, Nov 19, 2004 at 08:58:55AM +0700, Brad Collins wrote: > Greg Newby writes: > >> Yes. You'll see I'm now using just 'Project Gutenberg' for the > >> publisher name -- after comment from Greg. The a subfield can be > >> used for place of publication, but ... I'm not sure what that > >> is. Is it still Urbana (I thought PG had long since moved from > >> there)? Is it the business address of PGLAF? Is it the home town > >> of ibiblio? In the end, it seemed easiest to omit that. > > > > I always used Urbana because it's the historical home, and of course > > PG still has a presence there (i.e., Michael). > > > [snip] > > There is no 100% accurate place to list. 
> > > Since the place of publication is important for determining copyright > restrictions in some cases, I think it would be better to include a > place of publication. I definitely agree. I left the below for context, but wanted to mention my favorite is: [Urbana, Illinois]: Project Gutenberg, 2004. Note, I added the state, since there are many Urbanas. Urbana is as accurate as we are likely to get. -- Greg > This has bothered me for some time. I've always wondered how to > handle virtual organizations which don't really have a place of > publication in the conventional sense like PG or the Apache Group. > > So I did a little digging in the ISBD specs and found the following: > > ,----[ ISBD(ER) 4.1.13 ] > | 4.1.13 When a place of publication, production or distribution does > | not appear anywhere in the item, the name of the known city or town > | is supplied in square brackets. If the city or town is uncertain, or > | unknown, the name of the probable city or town followed by a > | question mark is supplied in square brackets. e.g. > | > | - [Paris] > | - [Prague?] > `---- > > ,----[ ISBD(ER) 4.1.14 ] > | 4.1.14 When the name of a city or town cannot be given, the name of > | the state, province or country is given, according to the same > | stipulations as are applicable to the names of cities or towns. > | e.g. > | > | - Canada > | Editorial comment: Known as place of publication; > | appears in prescribed source. > `---- > > Since PG doesn't explicitly state that the place of publication is in > the States in etexts, (is that right?) this would suggest something > like: > > - [USA]: Project Gutenberg, 2004. > > or (I prefer) > > - [Urbana]: Project Gutenberg, 2004. 
> > in BMF this might look like: > > published : ‐ $pl[[USA]]: $pb[Project Gutenberg], $dt[2004] > > or more verbose BMF (bxids only for example): > > published : ‐ $pl[$d:bxid://geo:IKE8-5510 $l:[USA]]: > $pb[$d:bxid://aut:JIQ6-7286 $l:Project Gutenberg], > $dt[$v:2004-10-12 $l:2004] > > BMF subfields used: > (For complete list of subfields see: > http://192.168.0.103/cgi-bin/bmf.cgi/Reference/SubfieldQuickRef.html) > > pl place name > d defined-by > l label > pb publisher name > dt inclusive dates > v value-- in dt it should be a iso8601 formated date > > > b/ > > > -- > Brad Collins , Bangkok, Thailand > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From gbnewby at pglaf.org Sun Nov 21 14:18:32 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Nov 21 14:18:34 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <00ca01c4cb7a$ac6d7640$6401a8c0@enterprise> References: <4198E3A2.6080504@perathoner.de> <00ca01c4cb7a$ac6d7640$6401a8c0@enterprise> Message-ID: <20041121221832.GF24601@pglaf.org> On Mon, Nov 15, 2004 at 07:21:55PM -0600, John Hagerson wrote: > Well, not knowing what to do, I went to the Robots Readme on the > Gutenberg.org web site and copied the wget command listed under the heading > "Getting All EBook Files." I started this process on Sunday evening, at the > end of a cable modem. Little did I realize that more than 24 hours later, > the process would still be running. > > In a private message, I was told to use rsync. OK. If rsync is the preferred > method, then why is wget presented as the example? > > It appears that I'm storing a bunch of index.html files that are redundant > if I use rsync. I guess I can clean them up at my leisure. However, again > the web page says "keep the html files" to make re-roboting faster. > > Well, I'll be a mirror site for all of the ZIP and HTML files, anyway. 
> > Please post suggestions here or pm me. Thank you. John, please see the mirroring HOWTO at http://gutenberg.org/howto Mirroring the entire site is different from harvesting a few directories or sets of files. The "index.html" is created by the remote server, to simply list the files in a directory - you are right that it's transient/temporary/imaginary. Note that a 256Kbit DSL modem will take about 6 days to download the entire PG collection (it's 140GB). We do not recommend DSL or cable modems for setting up mirrors, and generally don't list them in our mirror list. -- Greg > -----Original Message----- > From: gutvol-d-bounces@lists.pglaf.org > [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Marcello Perathoner > Sent: Monday, November 15, 2004 11:13 AM > To: Project Gutenberg Volunteer Discussion > Subject: Re: [gutvol-d] [etext04|etext05]/index.html missing? > > John Hagerson wrote: > > > I am using wget to download books from www.gutenberg.org. The process is > > stuck on etext04 in what appears to be a futile effort to download > > index.html. > > The indexes are auto-generated on the fly by Apache. > > If the load on the fileservers is too high the connection times out > before a full directory listing can be retrieved. > > You should not harvest at peak hours anyway. 
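[For readers following the wget-vs-rsync question above: the rsync approach amounts to a single command along these lines. This is a sketch only -- the host and module name below are placeholders, not values confirmed in this thread; the real ones are in the mirroring HOWTO Greg points to.]

```shell
# Sketch: mirror a PG-style rsync module into a local directory.
# "rsync.example.org::gutenberg" is a placeholder host::module pair;
# consult http://gutenberg.org/howto for the actual rsync server.
rsync -av --delete \
      --exclude 'index.html' \
      rsync.example.org::gutenberg /srv/pg-mirror/
# -a          preserve timestamps/permissions (archive mode)
# -v          show files as they transfer
# --delete    remove local files that have vanished upstream
# --exclude   skip the auto-generated directory listings that
#             wget kept fetching (and that time out under load)
```

Unlike the wget robot, rsync only transfers files that changed since the last run, which is why it is the preferred method for keeping a mirror current.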
> > > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Gutenberg9443 at aol.com Sun Nov 21 17:02:04 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Nov 21 17:02:23 2004 Subject: [gutvol-d] documenting etexts Message-ID: <19e.2bb96cf4.2ed2948c@aol.com> In a message dated 11/21/2004 2:07:20 AM Mountain Standard Time, traverso@dm.unipi.it writes: >>Here too you are assuming that other people use the >>same tools that you use. A good tool is one that adapts >>itself to an >>unknown situation, and does not make assumptions. >>Discarding page >>numbers in reference works makes assumptions on >>other people's working >methods; the result is a less flexible tool. I didn't assume. I asked. Some ISPs carry formatting over and some don't. I have no idea what ISP someone I don't know is using or what that ISP might do with formatting. A good tool is one that can be used for at least 50 things besides the one it is designed for. You can use two bricks to kill a fly, if you can figure out how to make the fly stay on the bottom brick long enough for you to clap the top brick on top of it. You can use bricks and boards to make a bookcase. You can make a street out of bricks. Go away and think of 47 other uses for bricks. Then talk to me about tools. I am not recommending discarding page numbers in reference books. I am suggesting that the majority of books already posted do not necessitate going back and redoing to insert the page numbers. 
It's like those old Tom Swift books I was recently accused of reading in preference to anything else: if I want to do a learned paper or book on the Stratemeyer syndicate--I think but am not sure, and it is not necessary for anybody to inform me, that it included Tom Swift; I know it included Nancy Drew and the Hardy Boys--I will have to go somewhere that I can use the tree book versions, and even then I'll have to be careful, because I know that Nancy Drew and the Hardy Boys were rewritten umpteen times, often with no more than the title saved from edition to edition and no indication in the front matter as to what version this one was. But do enough people want to write learned papers on Tom Swift, or Tarzan, or Elsie Dinsmore, or The Wizard of Oz, for it to be reasonable for me to demand that all the Tom Swift, Tarzan, Elsie Dinsmore, and Wizard of Oz books be pulled down until somebody has time to rescan them and keep all the page numbers this time? I don't think so. By the way, since so many people seem to know better than I do what I'm reading at present, I'll save them the trouble of guessing. I have finally laid my hands on a copy of Isabella Beeton's 1865 book on household management--University of Adelaide has posted it--and I'm reading it because I think that it is appropriate Sabbath Day reading, and yes I know different religions have different "Sabbath Days" but I'm referring to my own religion's. (I specify this because I was once head of a very small--three person--department which happened to include a Muslim, a Christian, and a fellow who wasn't interested in religion. So I set schedules up so that I was always off Sunday, which I wanted, and Saki was always off Saturday, which he wanted, and Pat was always off in the middle of the week, which he wanted. So my boss's boss turned it all around so that none of us had the days off we wanted, because I could not get it through his head that we were all happy with the schedule I had arranged.) 
I stopped reading the book long enough to send a message to my brothers inquiring how to slice a garfish and how many axes would be necessary. Mrs. Beeton gives instructions for how to cook the garfish but she begins by saying that it is necessary to begin by slicing the garfish. Last time (okay, the only time) I ever saw a garfish, one of my brothers tried to behead it and broke an axe. By the way, Mrs. Beeton does not number pages. She numbers recipes. So her table of contents and her index get a reader to the right place no matter what form the text is in. Three cheers for Mrs. Beeton! The previous book was Pamela; the next ones will be A. Merritt's The Moon Pool and The Metal Monster. I'm perfectly furious that one of the A. Merritt books I've been seeking has turned up on FictionWise and I have to PAY for it and I don't have the money. Shall I report on what books I read after The Metal Monster? Actually I was kind of thinking about calling a whole lot of state capitals and explaining that we need to redo our registrations and asking how I need to go about doing it, but if it's necessary for me to give book reports I can do that instead. Also I'm sort of busy reading ancient Egyptian medical books in preparation for a novel I'm writing that includes Luke the Physician, but I couldn't get them online because the English versions are still in copyright and I can't read hieroglyphics, which doesn't matter because they aren't on line in hieroglyphics either, so I had to get them through ILL. What have I ever done to you to make you want to bite my head off every time I post? I can't help being autistic. I was born autistic. You can help being a walking, talking, grouch box. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041121/b5a17e8f/attachment.html From Gutenberg9443 at aol.com Mon Nov 22 06:38:11 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 22 06:42:42 2004 Subject: [gutvol-d] documenting etexts (without flaming) Message-ID: <9.37cba20c.2ed353d3@aol.com> In a message dated 11/21/2004 2:07:20 AM Mountain Standard Time, traverso@dm.unipi.it writes: >>- assume that you want to quote a book that you have >>only in a paper >> edition; to quote it in your style, you need to manually >>count the >> paragraphs, both when quoting and when checking a quotation; >> wouldn't the standard way of quoting pages of a >>reference edition >> (usually, the only edition) be better? >>Of course, you said: >> Anne> Would not this serve most purposes? >>Yes, most maybe, but not all. And most is not enough. >>You said also >> Anne> to other texts as well. (I hope my underlining shows up in >> Anne> all email.) >>No, it doesn't. Here too you are assuming that other >>people use the >>same tools that you use. A good tool is one that adapts itself to an >>unknown situation, and does not make assumptions. >>Discarding page >>numbers in reference works makes assumptions on >>other people's working >>methods; the result is a less flexible tool. There's another problem here that YOU are missing, and it is this: PG does not have control of all etexts. Whether the total number of free etexts online is 40,000, as I estimate, or 100,000, as Michael estimates, the fact remains that PG does not have all etexts, or a majority of etexts, or even a plurality of etexts. I keep track of every etext site I hear of, and I check all of them out. Only those few that post page scans, and there are very few of them, make original page numbers available. 
If I am looking for a book and I can find only page scans of it, I won't download it unless I desperately want it and can't find it anywhere else, because I don't like to fiddle around with putting the pages together to read. Some years ago I wanted a specific edition of the Qur'an, and had to download it sura by sura. It took me a lot more hours to put it together than I wanted to expend on that task. So a documentation method that works only for page scans and/or full texts that include page numbers is unusable for more texts than it is usable for. Also, I don't want to say that all, or even most, reference books come in only one edition. My Oxford Guide to American Literature is fifth edition, and I'm almost certain there's now a sixth edition available. My Granger's Index to Poetry is eighth edition and I think it is two editions old; I know it is at least one. My Larousse English/German dictionary is dated 2000 and MIGHT be current, except for the fact that it uses the "new" German spelling, and I think I read online that the "old" spelling is back in use. Most astronomy, physics, biology, geography, and geology texts are out of date by the time they roll off the press, and by the time they make their way online they are so hideously out of date that anyone relying on them would be in trouble. Of the solutions proposed, the one I like best is the suggestion that the person doing the paper could include with it the URL of, or a link to, the specific reference book used, and to make sure it doesn't change, that person should put the source on his or her own Website and link to it there. But even THAT won't work for purchased ebooks. I think we'll probably flounder around for another ten to twenty years before a workable permanent solution is devised. But all the flaming and/or condescension in the world isn't going to help a bit. I apologize for my flaming yesterday. I try not to blow up but sometimes I do it anyway. 
Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041122/983092bc/attachment-0001.html From joshua at hutchinson.net Mon Nov 22 11:05:49 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Nov 22 11:05:56 2004 Subject: [gutvol-d] documenting etexts Message-ID: <20041122190549.B40EC109939@ws6-4.us4.outblaze.com> ----- Original Message ----- From: Gutenberg9443@aol.com > > In a message dated 11/21/2004 2:07:20 AM Mountain Standard Time, > traverso@dm.unipi.it writes: > > >>Here too you are assuming that other people use the > >>same tools that you use. A good tool is one that adapts >>itself to an > >>unknown situation, and does not make assumptions. >>Discarding page > >>numbers in reference works makes assumptions on >>other people's working > >methods; the result is a less flexible tool. > > > > I didn't assume. I asked. Some ISPs carry formatting over and some don't. I > have no idea what ISP someone I don't know is using or what that ISP might do > with formatting. Actually, the ISP has nothing to do with it... What shows up is dependent on what program you are using to read the e-mail. Just a quick FYI. > > A good tool is one that can be used for at least 50 things besides the one > it is designed for. You can use two bricks to kill a fly, if you can figure out > how to make the fly stay on the bottom brick long enough for you to clap the > top brick on top of it. You can use bricks and boards to make a bookcase. > You can make a street out of bricks. Go away and think of 47 other uses for > bricks. Then talk to me about tools. > > I am not recommending discarding page numbers in reference books. I am > suggesting that the majority of books already posted do not necessitate going back > and redoing to insert the page numbers. > Well, you can kill that same fly by running over it with a semi truck ... 
but that doesn't make either one a GOOD tool for the job. Carlos' point (which was worded nicely despite your reaction to it) is that you have to create a system that works despite not knowing the exact environment it will be used in. This is why so many people had problems with bowerbird's ZML viewer. It required everyone to be using a specific reader program that you simply cannot guarantee will be in use. > It's like those old Tom Swift books I was recently accused of reading Unless I missed a message somewhere ... someone was using the Tom Swifts as an example of a type of book for a particular point. It was not a listing of what you read or don't read. > > But do enough people want to write learned papers on Tom Swift, or Tarzan, > or Elsie Dinsmore, or The Wizard of Oz, for it to be reasonable for me to > demand that all the Tom Swift, Tarzan, Elsie Dinsmore, and Wizard of Oz books to > be pulled down until somebody has time to rescan them and keep all the page > numbers this time? > No one has ever said that (unless, again, I missed a message). Many people have said that they will need to be redone at some future point to put that information back in. (Jon Noring is the biggest proponent of this.) > By the way, since so many people seem to know better than I do what I'm > reading at present, I'll save them the trouble of guessing. This type of wording is what starts flame wars. And it is coming from your side. Please calm down a little here. No one has tried to start a flame war, but I can see people getting defensive in reply to your recent messages and it will lead to some things being said that probably shouldn't be. > > What have I ever done to you to make you want to bite my head off every time > I post? I can't help being autistic. I was born autistic. You can help being > a walking, talking, grouch box. > > Anne > All I can tell you, Anne, is that Carlos did NOT bite your head off. Rather, he explained the fallacies he saw in your argument. 
Carlos is actually one of the more even tempered folks around here. He won't hold back on pointing out things he disagrees with, but I've never seen him be a "walking, talking, grouch box." Josh From Gutenberg9443 at aol.com Mon Nov 22 12:21:10 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 22 12:21:26 2004 Subject: [gutvol-d] documenting etexts Message-ID: <45.1bdb2f1d.2ed3a436@aol.com> In a message dated 11/22/2004 12:06:08 PM Mountain Standard Time, joshua@hutchinson.net writes: Rather, he explained the fallacies he saw in your argument. I have no objections to having fallacies pointed out; however, I had made no assumptions. I had made a SUGGESTION and ASKED FOR COMMENT. I had thought about the situation for some time before I was ready to put forth the suggestion. Therefore, condescendingly telling me I had made incorrect assumptions was maddening. I shall now explain why I almost never make assumptions. When I first became a crime scene technician, my boss would never allow me to say a substance was blood. I had to say "a red fluid which appeared to be blood." Even if somebody is lying on the floor with a shotgun blast through his chest, he is lying in a pool of "a red fluid which appears to be blood." I couldn't understand why I had to do this, until the day that my boss and I were trailing an injured murderer down an alley by the places he had stopped to bleed. The last blood spatter was in the middle of a blind alley with no doors opening onto it and a wall too high for an injured person to climb. This made no sense at all to us. There was nowhere for him to go from there. Nevertheless, a sample was taken from each splotch. When the lab report came back, we learned that the last spatter was brake fluid. We had lost him on the street, at the end of the alley, where he apparently got into a car. I don't KNOW that he got into a car. He might have gotten into a truck or onto a motorcycle or bicycle. 
He might have gotten into a flying saucer. It APPEARED that he had gotten into a car. I cannot ASSUME what he did. I wasn't there. I didn't see it. Therefore I rarely make assumptions. I asked whether my suggestion would work. I have no problem at all with being told that it would not work. It was the condescending attitude, in this case and in the "old Tom Swift books" post, that was like a red flag to a bull. And you have apparently missed some posts, because this is the third time in less than a month that Carlo has dropped on me like a ton of lead, assuming I have assumptions that I do not have. The first two times I laboriously explained what I was saying and why and how I had not meant what he assumed I meant. This time he got me on a day when I was ill and already crabby, and I bit back. I wished that I had not done so two seconds after I sent it, but I couldn't unsend it. As to ISPs and programs, you explained without acting as if I had an IQ of minus thirty. Thank you. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041122/6b52e387/attachment.html From joshua at hutchinson.net Mon Nov 22 13:03:17 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Nov 22 13:03:27 2004 Subject: [gutvol-d] documenting etexts Message-ID: <20041122210317.6FEEE4F52D@ws6-5.us4.outblaze.com> Something just occurred to me here ... Did you realize that Carlos (while very articulate in English) is not a native English speaker? Maybe that explains why you are reading more into what Carlos is writing than I am getting from it. Also, it is not out of line to interpret "it should underline" as an assumption of sorts. 
Josh ----- Original Message ----- From: Gutenberg9443@aol.com To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] documenting etexts Date: Mon, 22 Nov 2004 15:21:10 EST > > > In a message dated 11/22/2004 12:06:08 PM Mountain Standard Time, > joshua@hutchinson.net writes: > > Rather, he explained the fallacies he saw in your argument. > > > I have no objections to having fallacies pointed out; > however, I had made no assumptions. I had > made a SUGGESTION and ASKED FOR COMMENT. > I had thought about the situation for some time before > I was ready to put forth the suggestion. Therefore, condescendingly telling > me I had made > incorrect assumptions was maddening. I shall now > explain why I almost never make assumptions. > > When I first became a crime scene technician, > my boss would never allow me to say a substance > was blood. I had to say "a red fluid which > appeared to be blood." Even if somebody is > lying on the floor with a shotgun blast through > his chest, he is lying in a pool of "a red > fluid which appears to be blood." I couldn't > understand why I had to do this, until the day > that my boss and I were trailing an injured murderer > down an alley by the places he had stopped > to bleed. The last blood spatter was in the > middle of a blind alley with no doors opening onto > it and a wall too high for an injured person to > climb. This made no sense at all to us. There > was nowhere for him to go from there. Nevertheless, > a sample was taken from each splotch. When > the lab report came back, we learned that the last > spatter was brake fluid. We had lost him on the street, > at the end of the alley, where he apparently got into a car. > > I don't KNOW that he got into a car. He might > have gotten into a truck or onto a motorcycle > or bicycle. He might have gotten into a flying > saucer. It APPEARED that he had gotten into > a car. I cannot ASSUME what he did. I wasn't > there. I didn't see it. > > Therefore I rarely make assumptions. 
> > I asked whether my suggestion would work. I > have no problem at all with being told that it would > not work. It was the condescending attitude, in this > case and in the "old Tom Swift books," post, that > was like a red flag to a bull. And you have apparently > missed some posts, because this is the > third time in less than a month that Carlo > has dropped on me like a ton of lead, > assuming I have assumptions that I do > not have. The first two times I > laboriously explained what I was saying > and why and how I had not meant what > he assumed I meant. This time he got > me on a day when I was ill and already > crabby, and I bit back. I wished that I > had not done so two seconds after I > sent it, but I couldn't unsend it. > > As to ISPs and programs, you explained > without acting as if I had an IQ of minus > thirty. Thank you. > > Anne > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From traverso at dm.unipi.it Mon Nov 22 13:39:49 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Mon Nov 22 13:40:20 2004 Subject: [gutvol-d] documenting etexts In-Reply-To: <45.1bdb2f1d.2ed3a436@aol.com> (Gutenberg9443@aol.com) References: <45.1bdb2f1d.2ed3a436@aol.com> Message-ID: <200411222139.iAMLdn9W024656@posso.dm.unipi.it> Dear Anne, I apologize if you have felt any animosity directed towards you. None was meant. As you know, I am not a native speaker of English. Maybe it is just a misunderstanding. You say: > And you have apparently > missed some posts, because this is the > third time in less than a month that Carlo > has dropped on me like a ton of lead, > assuming I have assumptions that I do > not have. The first two times I > laboriously explained what I was saying > and why and how I had not meant what > he assumed I meant. 
Sorry, I must have missed them too; I don't remember having answered posts of yours recently, and I have not found one in the last year of gutvol-d. Of course I often disagree with you, our points of view are different. Another coincidence might have worked against my post: I had answered much earlier, but the mail was delayed by the mailer, and arrived when the post was already answered (you can check in the header: I answered on Saturday, and it arrived late on Sunday). So my post seemed to be insisting on a point that was already discussed enough. I would not have sent it on Sunday. Carlo From Gutenberg9443 at aol.com Mon Nov 22 15:27:36 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 22 15:27:54 2004 Subject: [gutvol-d] documenting etexts Message-ID: <144.39631f99.2ed3cfe8@aol.com> In a message dated 11/22/2004 2:03:48 PM Mountain Standard Time, joshua@hutchinson.net writes: Something just occured to me here ... Did you realize that Carlos (while very articulate in English) is not a native english speaker? Yes, I know that. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041122/eaff83fa/attachment.html From jmdyck at ibiblio.org Wed Nov 24 11:58:59 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Wed Nov 24 11:59:24 2004 Subject: [gutvol-d] "We're still keeping up with Moore's Law!" Message-ID: <41A4E803.39FDF18A@ibiblio.org> In today's gweekly Pt 1, Michael Hart wrote: > > We're still keeping up with Moore's Law! > Moore's Law 18 month percentage = 115% > Moore's Law 12 month percentage = 67% I don't understand how these percentages were calculated. 18 months ago (May 24, 2003), the "TOTAL COUNT" (incl PG Australia) was about 8044 (interpolating between 8021 on May 21 and 8075 on May 28), so doubling every 18 months would land us at 2 * 8044 = 16088 today. Today's actual count is 14,484, or 90% of the predicted amount. 
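The 18-month figure is easy to double-check mechanically; here is a minimal sketch in Python, using only the interpolated counts quoted above (the 12-month check below works the same way, with a growth factor of 2^(12/18)):

```python
# Sanity-check the 18-month Moore's-Law projection.
# Both counts are the interpolated totals quoted in this post.
count_then = 8044    # total count on May 24, 2003 (interpolated)
count_now = 14484    # total count on Nov 24, 2004

predicted = 2 * count_then           # "doubling every 18 months"
pct_of_prediction = count_now / predicted * 100

print(predicted)                 # 16088
print(round(pct_of_prediction))  # 90
```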
12 months ago (Nov 24, 2003), the total count was about 10,517 (interpolating between 10,396 on Nov 19 and 10,565 on Nov 26). Doubling every 18 months means multiplying by 2^(12/18) = 1.5874 every 12 months, which would predict a count of 16695 today. Our actual count of 14,484 is 87% of that. So where do 115% and 67% come from? -Michael From stephen.thomas at adelaide.edu.au Wed Nov 24 17:52:53 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Nov 24 17:53:14 2004 Subject: [gutvol-d] Linking to page images Message-ID: <41A53AF5.9020208@adelaide.edu.au> Regarding the question of whether and how we should provide links to page scans, here's one idea which may have potential: Etext number 10072, "English Housewifery Exemplified", by Elizabeth Moxon, was produced from scans from Biblioteca de la Universitat de Barcelona (it says so at the top of the text). It happens that Biblioteca de la Universitat de Barcelona makes their scans available on the net, for free (apparently -- my Spanish is non-existent; might even be Catalan?). So I've added a note for this text in the catalog, which reads: " Produced from page images available from Biblioteca de la Universitat de Barcelona, at http://www.bib.ub.es/grewe/showbook.pl?gw58 " thus providing a link to the actual page scans for anyone interested. Is this a model to be followed? Of course, this raises the question of who is to add this information to the catalogue -- the link was not provided in the etext; I had to go look for it. Also, the link is not clickable. Ideally, you'd want the link to be active, but there's no place for this in the present catalog design. 
Steve -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M ----------------------------------------------------------- This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient. From hart at pglaf.org Thu Nov 25 08:24:53 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Nov 25 08:24:55 2004 Subject: [gutvol-d] "We're still keeping up with Moore's Law!" In-Reply-To: <41A4E803.39FDF18A@ibiblio.org> References: <41A4E803.39FDF18A@ibiblio.org> Message-ID: On Wed, 24 Nov 2004, Michael Dyck wrote: > In today's gweekly Pt 1, Michael Hart wrote: >> >> We're still keeping up with Moore's Law! >> Moore's Law 18 month percentage = 115% >> Moore's Law 12 month percentage = 67% > > I don't understand how these percentages were calculated. > > 18 months ago (May 24, 2003), the "TOTAL COUNT" (incl PG Australia) was > about 8044 (interpolating between 8021 on May 21 and 8075 on May 28), > so doubling every 18 months would land us at 2 * 8044 = 16088 today. > Today's actual count is 14,484, or 90% of the predicted amount. > > 12 months ago (Nov 24, 2003), the total count was about 10,517 > (interpolating between 10,396 on Nov 19 and 10,565 on Nov 26). > Doubling every 18 months means multiplying by 2^(12/18) = 1.5874 > every 12 months, which would predict a count of 16695 today. 
> Our actual count of 14,484 is 87% of that. > > So where do 115% and 67% come from? These come from one of Brett's programs. . .I've asked him to check on them a few times, as I agree with you that the figures don't look right. I forwarded him this to encourage some rechecking. ;-) Happy Thanksgiving! Michael From nihil_obstat at mindspring.com Fri Nov 26 10:09:53 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Fri Nov 26 10:10:03 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? Message-ID: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> A technical question: Exactly what characters make up 7-bit ascii? I presume it is 128 (2 to the 7th power). So logically any character I can generate by typing Alt+0000 thro' Alt+0127 (in MS Windows) is kosher in a 7-bit ASCII text. Specifically I want to know if I can use "|" (the character made by hitting Shift+backslash on a standard US keyboard, or Alt+0124). Generally, are the following (Alt+0000 thro' Alt+0127) always okay? ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  --------------------------- Dennis McCarthy nihil_obstat@mindspring.com From jtinsley at pobox.com Fri Nov 26 10:25:34 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Fri Nov 26 10:25:41 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? In-Reply-To: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: <20041126182534.GB8717@panix.com> On Fri, Nov 26, 2004 at 01:09:53PM -0500, Dennis McCarthy wrote: > >A technical question: > >Exactly what characters make up 7-bit ascii? I presume it is 128 (2 to the 7th power). So logically any character I can generate by typing Alt+0000 thro' Alt+0127 (in MS Windows) is kosher in a 7-bit ASCII text. 
> >Specifically I want to know if I can us "|" (the character made by hitting Shift+backslash on a standard US keyboard, or Alt+0124). > >Generally, are the following (Alt+0000 thro' Alt+0127) always okay? > ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  > Depends what you mean by "okay". Anything in the range 32 (space) through 126 (tilde) is definitely OK, and common. Your specific character, 124, is commonly used by people who want to create a box-like layout. Below 32, chars 10 and 13 (LF and CR) are definitely necessary at the end of every line, but character 9 (Tab) is specifically discouraged, because of the undefined effect it may have on different viewing or editing programs. Other characters below space (32) . . . well, I imagine someone could come up with a useful reason to use one or more of them, in some special situation, but I can't think of one right now. Ditto 127, whose only reason for existence is to delete another character. jim From hyphen at hyphenologist.co.uk Fri Nov 26 11:28:14 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Nov 26 11:28:34 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? In-Reply-To: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: On Fri, 26 Nov 2004 13:09:53 -0500 (GMT-05:00), Dennis McCarthy wrote: | | A technical question: | | Exactly what characters make up 7-bit ascii? I presume it is 128 (2 to the 7th power). So logically any character I can generate by typing Alt+0000 thro' Alt+0127 (in MS Windows) is kosher in a 7-bit ASCII text. | | Specifically I want to know if I can us "|" (the character made by hitting Shift+backslash on a standard US keyboard, or Alt+0124). | | Generally, are the following (Alt+0000 thro' Alt+0127) always okay? 
| ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  Not everyone used Windoze there are some 500 8 bit character sets in use, but many are obsolete/obsolescent. See http://www.asciitable.com/ Decimal 0 to 31 are control characters and unusable in text. They may be many other things besides control characters. Decimal 32 is space. Decimal 33 to 126 are usable in 7 bit ASCII text as listed in the URL Decimal 127 is unusable. Decimal 128 and above may be absolutely anything, to use these one must state which of the 500 character sets you are using. One persons standard sends the next person insane :-( -- Dave F From stephen.thomas at adelaide.edu.au Fri Nov 26 18:51:01 2004 From: stephen.thomas at adelaide.edu.au (Stephen Thomas) Date: Fri Nov 26 18:52:35 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? In-Reply-To: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: <1101523861.41a7eb95a9b5e@pandani.services.adelaide.edu.au> If you have a standard US keyboard, then any of the keys you can hit (excluding the function keys), with or without the shift key, are ASCII. (ASCII also includes "control" characters, 0000 thru 0031, which you won't see, and won't want to enter anyway.) The "|" (vertical bar or pipe) is certainly legit ASCII. Steve Quoting Dennis McCarthy : > > A technical question: > > Exactly what characters make up 7-bit ascii? I presume it is > 128 (2 to the 7th power). So logically any character I can > generate by typing Alt+0000 thro' Alt+0127 (in MS Windows) is > kosher in a 7-bit ASCII text. > > Specifically I want to know if I can us "|" (the character > made by hitting Shift+backslash on a standard US keyboard, or > Alt+0124). 
> > Generally, are the following (Alt+0000 thro' Alt+0127) always > okay? > ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < > = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ > \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z > { | } ~  > > > > --------------------------- > Dennis McCarthy > nihil_obstat@mindspring.com > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M From sly at victoria.tc.ca Sat Nov 27 00:34:59 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat Nov 27 00:35:22 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? In-Reply-To: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: On Fri, 26 Nov 2004, Dennis McCarthy wrote: > Exactly what characters make up 7-bit ascii? > > Generally, are the following (Alt+0000 thro' Alt+0127) always okay? One thing I might add to this discussion is a matter of semantics. 
The character _`_ does belong to ASCII (96 decimal, 60 hex). However, it is a spacing grave accent mark, not an opening single quote, though you may sometimes see it used that way in text files.

A while ago, I had the address of a web page which explained in detail why ASCII 96 should not be used as an opening single quote, but I can't find it now.

Andrew

From ke at gnu.franken.de Sun Nov 28 06:17:26 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Sun Nov 28 07:39:35 2004
Subject: [gutvol-d] Re: 7-bit ASCII, how many characters?
In-Reply-To: (Andrew Sly's message of "Sat, 27 Nov 2004 00:34:59 -0800 (PST)")
References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net>
Message-ID:

Andrew Sly writes:

> A while ago, I had the address of a web page which explained
> in detail why ASCII-96 should not be used as an opening single
> quote, but I can't find it now.

Search for "Markus" and "Kuhn" and "quote":

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

--
http://www.gnu.franken.de/ke/ | ,__o
                              | _-\_<,
                              | (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

From hart at pglaf.org Sun Nov 28 07:43:49 2004
From: hart at pglaf.org (Michael Hart)
Date: Sun Nov 28 07:43:51 2004
Subject: [gutvol-d] 7-bit ASCII, how many characters?
In-Reply-To:
References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net>
Message-ID:

On Sat, 27 Nov 2004, Andrew Sly wrote:

> On Fri, 26 Nov 2004, Dennis McCarthy wrote:
>
>> Exactly what characters make up 7-bit ascii?
>>
>> Generally, are the following (Alt+0000 thro' Alt+0127) always okay?
>
> One thing I might add to this discussion is a matter of semantics.
> The character _`_ does belong to ascii (96-decimal, 60-hex)
> However it is a spacing grave accent mark, and not an opening
> single quote, which you may sometimes see it used for in
> text files.
>
> A while ago, I had the address of a web page which explained
> in detail why ASCII-96 should not be used as an opening single
> quote, but I can't find it now.

Since French doesn't really USE the ` with the spacing, other than in cases where we usually would use _`_ or " ' " etc., it is really somewhat of a moot point.

In addition, if this really had been intended to be a French accent grave, why is the "_" between it and the "^", which could be the French accent circonflexe. . . not to mention the lack of an accent aigu, etc. . . .

87 57 W
88 58 X
89 59 Y
90 5A Z
91 5B [
92 5C \
93 5D ]
94 5E ^ <<<
95 5F _
96 60 ` <<<
97 61 a
98 62 b
99 63 c

Michael

From shalesller at writeme.com Sun Nov 28 17:04:10 2004
From: shalesller at writeme.com (D. Starner)
Date: Sun Nov 28 17:04:22 2004
Subject: [gutvol-d] 7-bit ASCII, how many characters?
Message-ID: <20041129010410.E14A14BDAA@ws1-1.us4.outblaze.com>

"Michael Hart" writes:

> On Sat, 27 Nov 2004, Andrew Sly wrote:
> > A while ago, I had the address of a web page which explained
> > in detail why ASCII-96 should not be used as an opening single
> > quote, but I can't find it now.

It's ; basically, the grave and the Latin-1/Unicode acute accent (U+00B4) should be balanced, and are in most modern fonts.

> In addition, if this really had been intended to be a French
> accent grave, why is the "_" between it and the "^", which
> could be the French accent circonflexe. . . not to mention the
> lack of an accent aigu, etc. . . .

There's no particular reason for the sorting, but the ' originally leaned right, which not only made `quotes' look right, but made it possible for it to be used as an acute accent; both were designed to be backspaced over the character. If you used the " as diaeresis and , as cedilla, you had German and all the Romance languages handled. The use of backspace in this manner disappeared after ASCII was standardized, making the ASCII collection a little unusual.
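[Editorial aside: the distinction Andrew and David are making is visible directly in Unicode's character names: ASCII 96 is formally a grave accent, while the typographic single quotes live outside ASCII entirely. A small Python illustration; the function name and the crude one-pass replacement are mine, not from the thread:]

```python
import unicodedata

# ASCII 96 is a spacing grave accent, not an opening quote:
print(unicodedata.name("\u0060"))  # GRAVE ACCENT
# The real typographic single quotes are outside 7-bit ASCII:
print(unicodedata.name("\u2018"))  # LEFT SINGLE QUOTATION MARK
print(unicodedata.name("\u2019"))  # RIGHT SINGLE QUOTATION MARK

def upgrade_grave_quotes(text: str) -> str:
    """Rewrite the old `like this' typewriter convention to real
    curly quotes. Crude: every ` and ' is rewritten, so apostrophes
    get curled too (they map to U+2019 anyway)."""
    return text.replace("`", "\u2018").replace("'", "\u2019")

print(upgrade_grave_quotes("`quotes'"))  # ‘quotes’
```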
--
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm

From jon at noring.name Mon Nov 29 18:32:58 2004
From: jon at noring.name (Jon Noring)
Date: Mon Nov 29 18:33:41 2004
Subject: [gutvol-d] On quote-like marks...
Message-ID: <180519400796.20041129193258@noring.name>

Regarding the recent discussion about ASCII and the single/double quote marks (and what to use), I have my two cents to add (and those here who are much more expert at character sets and Unicode than I am will undoubtedly be able to add to this.)

The situation regarding single and double quote-like marks is even more complicated than it has been presented so far. It has an impact on the future expanded use of PG texts as envisioned by Michael Hart and others, such as text-to-speech and language conversion. So I believe it needs to be dealt with in a more standardized fashion (that is, don't simply use the straight keyboard ' and " for everything under the sun.)

Quote-like marks are used for multiple purposes in texts -- especially single quote-like marks. And then there are the "curly" types of marks used in typographical presentation.

Here's a (probably) partial list of their multiple uses:

1) For marking up quotations (other conventions are also used)
2) Word contractions (e.g., "we're" for "we are")
3) Possessives ("the Emperor's crown")
4) Non-breaking character modifiers (see below)
5) Minutes and seconds of time and arc (50d3'25")
6) Feet and inches unit indicator (She is 5'7" tall)
7) Other mathematical symbol and unit measurement uses

Item (4) is particularly interesting since I'm working on cleaning up Burton's "1001 Arabian Nights Tales", and in it there are many Arabic names where, when Burton converted to Latin script, single quote-like marks were inserted to indicate a type of non-breaking character modifier for pronunciation purposes. For example: Ja'afar.
This semantically differs from the apostrophes used for contractions/possessives -- or at least is semantically different enough (imho) that it warrants differentiation in character encoding/entities.

In the XML markup of the Arabian Nights, I've chosen to use the following Unicode character conventions to keep everything straight. It's not what I necessarily propose PG/DP do, but it indicates one possible approach. Since at present I do not enclose quotations in <q> elements (for example), I keep in the quotation marks (double and single) to identify quotations. In the Arabian Nights I find some odd quotation passages, a couple of which start in the middle of one paragraph and end in the middle of another paragraph later within a story, so adding <q>...</q> would result in non-well-formed XML (I could use the "mile marker" approach as defined in TEI, but for the Arabian Nights have chosen not to.)

1) For quotations using double quote marks, I use the Unicode
   left-double quote mark for the beginning, and the right-double
   quote mark for the ending: “ and ”, respectively.

   (The "curly" quotes -- for those who don't like curly quotes for
   reading, it is trivial to convert them to straight keyboard quote
   marks, but going the other way is more difficult to do.)

2) For quotations using single quote marks, I use the Unicode
   left-single quote mark for the beginning, and the right-single
   quote mark for the ending: ‘ and ’, respectively.

3) For the non-breaking character modifier as described above, e.g.
   for "Ja'afar", I use the Unicode character specific for this
   purpose: ʼ

4) For word contractions and possessives I use the ordinary lower-
   ASCII single straight quote mark: ' (For later presentational
   purposes this character can always be converted to the right-single
   "curly" quote mark.)
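[Editorial aside: Jon's point that curly-to-straight conversion is trivial while the reverse is not can be sketched as a one-way translation table. The table contents and function name are mine; the character choices follow the conventions he describes:]

```python
# One-way table: Unicode quote-like marks down to the closest 7-bit
# ASCII character. Going the other way (straight back to curly) would
# need context, which is why the master file keeps the distinctions.
TO_ASCII = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe (the Ja'afar case)
})

def downgrade_quotes(text: str) -> str:
    """Produce a plain-ASCII rendering of the quote-like marks."""
    return text.translate(TO_ASCII)

assert downgrade_quotes("\u201cJa\u02bcafar\u2019s lamp\u201d") == "\"Ja'afar's lamp\""
```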
For use of ' and " for minutes and seconds of arc, and feet/inches, there are special Unicode code points for these (I don't see this usage in the Arabian Nights footnotes, but maybe I'll encounter it somewhere, not having finished the 5000+ footnotes.)

If one is working with plain text files (not XML), the above Unicode characters can be encoded at the bit level using UTF-8 or UTF-16 encoding.

Jon Noring

From jeroen at bohol.ph Tue Nov 30 14:05:43 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Tue Nov 30 14:04:42 2004
Subject: [gutvol-d] On quote-like marks...
In-Reply-To: <180519400796.20041129193258@noring.name>
References: <180519400796.20041129193258@noring.name>
Message-ID: <41ACEEB7.8080403@bohol.ph>

When I prepare TEI versions of my texts, I normally use the following:

“ ‘ ” ’ for quotation marks (in English; LOTE has even more variants)
' for the apostrophe, including those used in the possessive, as they are the same.
′ ″ for minutes and seconds (and even ‴ for triple primes)

For works using Arabic, I also use &ayn;, etc., to represent those Arabic letters, if they are thus represented in Roman script. I map these entities to Unicode for HTML versions, and to the nearest ASCII equivalents in plain vanilla. I avoid <q>...</q>, for the reasons you mention.

If you need help with validation / transforms, etc., drop me a note...

Jeroen.
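[Editorial aside: Jeroen's "entity to Unicode for HTML, nearest ASCII for plain vanilla" step can be sketched as a two-column table. The entity names follow his message, but the specific Unicode and ASCII choices here, particularly for &ayn; and &tprime;, are my assumptions, not his actual toolchain:]

```python
# Entity -> (Unicode form for HTML output, nearest ASCII for plain
# vanilla text). A sketch of the mapping step only.
ENTITY_FORMS = {
    "ldquo":  ("\u201c", '"'),    # left double quote
    "rdquo":  ("\u201d", '"'),    # right double quote
    "lsquo":  ("\u2018", "'"),    # left single quote
    "rsquo":  ("\u2019", "'"),    # right single quote
    "prime":  ("\u2032", "'"),    # minutes
    "Prime":  ("\u2033", '"'),    # seconds
    "tprime": ("\u2034", "'''"),  # triple prime
    "ayn":    ("\u02bf", "'"),    # romanized Arabic 'ayn (assumed)
}

def expand_entities(text: str, ascii_only: bool = False) -> str:
    """Replace &name; references with one of the two target forms."""
    column = 1 if ascii_only else 0
    for name, forms in ENTITY_FORMS.items():
        text = text.replace("&%s;" % name, forms[column])
    return text

assert expand_entities("&ldquo;Ja&ayn;far&rdquo;", ascii_only=True) == '"Ja\'far"'
```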
From stuart at ww1aviationlinks.cjb.net Tue Nov 30 17:29:08 2004
From: stuart at ww1aviationlinks.cjb.net (stuart)
Date: Tue Nov 30 17:29:22 2004
Subject: [gutvol-d] request for input (first timer here)
Message-ID: <20041130172908.237d8b64.stuart@ww1aviationlinks.cjb.net>

I am starting on a project to convert Jane's All The World's Aircraft 1919 to an ebook; any suggestions are welcome. If interested, take a look at http://ww1aviationlinks.cjb.net/janes/index.html