From bill at truthdb.org  Tue Feb  1 01:01:19 2005
From: bill at truthdb.org (bill jenness)
Date: Tue Feb  1 01:01:51 2005
Subject: [gutvol-d] Top 100 EBooks this week (and other stories)
Message-ID: <1350.134.117.141.69.1107248479.squirrel@134.117.141.69>

The picture of Dorian Gray (http://www.gutenberg.org/etext/174) is 47th
and The picture of Dorian Gray (http://www.gutenberg.org/etext/4078) is
99th.
I guess these are different editions but it seems a little odd nonetheless.

The bibrec of the more recent edition lacks two subject entries the earlier
one lists.

The other story I want to mention has to do with 2 books that I had
cleared that I am not able to proceed with at this time.

gbn0307140503: Unknown, The Reason Why--Natural History.  bill jenness
<cr502@freenet.carleton.ca>.  1860c.  7/24/2003.  ok.

and

gbn0307140507: warren colburn, Arithmetic upon the inductive method of
instruction....  bill jenness <cr502@freenet.carleton.ca>.  1856p1826c. 
7/24/2003.
ok.


These books are no longer in my possession and my scanner is toast; I am
limping along with a p166 until I can afford to pick up some new equipment.

I have "The Reason Why" partially scanned but that won't do anyone much
good as the bulb got progressively more discolored as I went along.

If there is someone in Ottawa (Canada) who could scan them in, I could
probably get my hands back on them but they do not belong to me and I
would need them returned.


From maitriv at yahoo.com  Tue Feb  1 06:32:51 2005
From: maitriv at yahoo.com (maitri venkat-ramani)
Date: Tue Feb  1 06:32:56 2005
Subject: [gutvol-d] Arabic eTexts
In-Reply-To: <8d.1f9e780f.2f2fa784@aol.com>
Message-ID: <20050201143251.56694.qmail@web52302.mail.yahoo.com>

Some of these books may be in the public domain and worth looking into.
 Anyone particularly interested in developing an Arabic language
partnership with the project mentioned below?

Maitri

SOFTWARE FOR SCANNING ARABIC DOCUMENTS

Noting that "the whole Internet is skewed toward people who speak
English," computer scientist Venu Govindaraju of the University of 
Buffalo says his research group is developing software to scan Arabic
printed and handwritten documents. Without optical character
recognition software developed for a particular language, Govindaraju
fears that "all the classic texts in that language will disappear into
oblivion." The project's Arabic software will take into account the
fact that characters may take different forms depending on where within
a word they appear, and that Arabic vowels are pronounced but often not
written. (AP 27 Jan 2005)

<http://apnews.excite.com/article/20050127/D87SE8E80.html>
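
The point about characters taking different forms depending on position can be seen in Unicode itself, which keeps legacy "presentation form" code points for each positional shape. The sketch below is only an illustration using the letter BEH; it says nothing about how the Buffalo software works, and the choice of BEH is an arbitrary example.

```python
import unicodedata

# The abstract letter BEH, as normally stored in text.
BEH = "\u0628"

# Legacy presentation-form code points for the four positional shapes of BEH
# (isolated, initial, medial, final). Chosen purely as an illustration.
PRESENTATION_FORMS = ["\uFE8F", "\uFE91", "\uFE92", "\uFE90"]


def describe(forms):
    """Return (code point, Unicode name, folds-back-to-BEH) for each form."""
    rows = []
    for ch in forms:
        # NFKC normalization maps a presentation form back to the base letter.
        folds_back = unicodedata.normalize("NFKC", ch) == BEH
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch), folds_back))
    return rows


if __name__ == "__main__":
    for code_point, name, folds_back in describe(PRESENTATION_FORMS):
        print(code_point, name, folds_back)
```

An OCR program has to do the reverse of this normalization: recognize four different printed shapes and store one letter.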


From nwolcott at dsdial.net  Tue Feb  1 10:03:43 2005
From: nwolcott at dsdial.net (N Wolcott)
Date: Tue Feb  1 10:04:23 2005
Subject: [gutvol-d] Arabic eTexts
References: <20050201143251.56694.qmail@web52302.mail.yahoo.com>
Message-ID: <001e01c50888$64ec2820$2b9495ce@gw98>

A scientist at the Univ. of Washington developed an "Arabic printed text"
after digitizing handwritten scripts by expert Arabic calligraphers. This
was done because of the poor quality of the Arabic used in modern printed
Arabic books. I remember a sample from Diocles' "On Burning Mirrors" which
he put on the internet. Unfortunately his characters were kept private, I
believe, although he also had developed a program which would write the
script correctly, accounting for accents, position in word, etc. He
developed outline fonts which could be the basis for something new if they
are available. He did give me a deck of cards for the numerals, which has
faded into punch card history.
----- Original Message -----
From: "maitri venkat-ramani" <maitriv@yahoo.com>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Tuesday, February 01, 2005 9:32 AM
Subject: [gutvol-d] Arabic eTexts


> Some of these books may be in the public domain and worth looking into.
>  Anyone particularly interested in developing an Arabic language
> partnership with the project mentioned below?
>
> Maitri
>
> SOFTWARE FOR SCANNING ARABIC DOCUMENTS
>
> Noting that "the whole Internet is skewed toward people who speak
> English," computer scientist Venu Govindaraju of the University of
> Buffalo says his research group is developing software to scan Arabic
> printed and handwritten documents. Without optical character
> recognition software developed for a particular language, Govindaraju
> fears that "all the classic texts in that language will disappear into
> oblivion." The project's Arabic software will take into account the
> fact that characters may take different forms depending on where within
> a word they appear, and that Arabic vowels are pronounced but often not
> written. (AP 27 Jan 2005)
>
> <http://apnews.excite.com/article/20050127/D87SE8E80.html>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From bubblegirl at optusnet.com.au  Tue Feb  1 14:27:18 2005
From: bubblegirl at optusnet.com.au (Season BubbleGirl)
Date: Tue Feb  1 14:27:22 2005
Subject: [gutvol-d] Question.
Message-ID: <200502012227.j11MR2ke026835@mail28.syd.optusnet.com.au>


Hi,

I'm compiling a clean jokes book to be included in free book archives. I was just wondering if jokes are copyrighted. Not comedian-specific jokes - jokes such as, Why did the chicken cross the road?

I found all jokes on different webpages. Because the ebook is free, aren't I doing the same as the webpages?


Season BubbleGirl:
Writer, poet, Pocket PC enthusiast 
bubblegirl@bubblegirl.net
www.bubblegirl.net

Did you know ROM of PC POWERPLAY moved? He's now an AUSSIE PLAYING UP at www.bubblegirl.net/playingup.php


-----Original Message-----
    From: "Gutenberg9443@aol.com"<Gutenberg9443@aol.com>
    Sent: 02/01/2005 1:53:48 AM
    To: "gutvol-d@lists.pglaf.org"<gutvol-d@lists.pglaf.org>
    Subject: Re: [gutvol-d] date-sensitive info about ebook purchase
       
    In a message dated 1/30/2005 4:46:21 PM Mountain Standard Time,  
    gbnewby@pglaf.org writes:
    
    Evidently, the mainstream publishers are not putting their
    mainstream works onto the Fictionwise site - maybe they're
    elsewhere.  My strong suspicion is that many of the works on
    the Fictionwise site are those that are owned by authors,
    not publishers.  So, right now, this device doesn't replace
    bn.com or whatever for my reading of contemporary works.
    
    
    There is a lot of new stuff at FictionWise. It's in RB format and 
    can be dumped straight into the ebook. Probably most of
    the stuff on the sites is older and the author has gotten
    copyright revision, but more and more publishers are
    getting the idea and putting new works up. For example,
    THE DA VINCI CODE went up on FictionWise about the
    same time it was released in hardback. Its success in 
    eformat has certainly caught the eyes of other mainstream
    publishers.
     
    It's a good beginning, but it IS a beginning.
     
    Anne
    

From shalesller at writeme.com  Tue Feb  1 15:35:01 2005
From: shalesller at writeme.com (D. Starner)
Date: Tue Feb  1 15:35:18 2005
Subject: [gutvol-d] Arabic eTexts
Message-ID: <20050201233501.A9C654BDAB@ws1-1.us4.outblaze.com>

"maitri venkat-ramani" writes:

> Some of these books may be in the public domain and worth looking into. 
> Anyone particularly interested in developing an Arabic language 
> partnership with the project mentioned below? 

It sounds like they're writing software, not transcribing books. I'm
not sure why this is news that everyone's carrying; 
<http://www.hf.uib.no/smi/ksv/arabocr.html> is a ten-year old review
of Arabic OCRs. <http://www.translation.net/sakhr_automatic_reader.html>
is a commercially available Arabic OCR program, even if it's a touch
expensive. <http://www.translation.net/ocr.html> is a nice list of OCR
programs if you're looking for something beyond what ABBYY supports.

From maitriv at yahoo.com  Tue Feb  1 20:52:14 2005
From: maitriv at yahoo.com (maitri venkat-ramani)
Date: Tue Feb  1 20:52:33 2005
Subject: [gutvol-d] Arabic eTexts
In-Reply-To: <20050201233501.A9C654BDAB@ws1-1.us4.outblaze.com>
Message-ID: <20050202045215.27164.qmail@web52310.mail.yahoo.com>


From the article I read, I got the impression that the lead researcher
is passionate about certain texts which will be lost if his reader is
not developed.  I'll email him and find out if he has any particular
eBook intentions and forward him some of our questions.

Maitri

--- "D. Starner" <shalesller@writeme.com> wrote:

> "maitri venkat-ramani" writes:
> 
> > Some of these books may be in the public domain and worth looking
> into. 
> > Anyone particularly interested in developing an Arabic language 
> > partnership with the project mentioned below? 
> 
> It sounds like they're writing software, not transcribing books. I'm
> not sure why this is news that everyone's carrying; 
> <http://www.hf.uib.no/smi/ksv/arabocr.html> is a ten-year old review
> of Arabic OCRs.
> <http://www.translation.net/sakhr_automatic_reader.html>
> is a commercially available Arabic OCR program, even if it's a touch
> expensive. <http://www.translation.net/ocr.html> is a nice list of
> OCR
> programs if you're looking for something beyond what ABBYY supports.
> -- 
 

From j.hagerson at comcast.net  Wed Feb  2 17:04:12 2005
From: j.hagerson at comcast.net (John Hagerson)
Date: Wed Feb  2 17:04:37 2005
Subject: [gutvol-d] GREG NEWBY: Please check your e-mail!
Message-ID: <004301c5098c$48aa7810$6401a8c0@sarek>

Sorry for the broadcast, but other methods to reach Greg have been
unsuccessful.

Greg: Please look for messages from Aaron Cannon and John Hagerson. Thank
you.


From gbnewby at pglaf.org  Wed Feb  2 19:53:27 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Feb  2 19:53:29 2005
Subject: [gutvol-d] GREG NEWBY: Please check your e-mail!
In-Reply-To: <004301c5098c$48aa7810$6401a8c0@sarek>
References: <004301c5098c$48aa7810$6401a8c0@sarek>
Message-ID: <20050203035327.GB7603@pglaf.org>

On Wed, Feb 02, 2005 at 07:04:12PM -0600, John Hagerson wrote:
> Sorry for the broadcast, but other methods to reach Greg have been
> unsuccessful.
> 
> Greg: Please look for messages from Aaron Cannon and John Hagerson. Thank
> you.

Ok: soon.

I've been a little busy.  Life, job, flu, that sort of thing.
  -- Greg
From cannona at fireantproductions.com  Wed Feb  2 20:10:52 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Wed Feb  2 20:12:38 2005
Subject: [gutvol-d] GREG NEWBY: Please check your e-mail!
In-Reply-To: <20050203035327.GB7603@pglaf.org>
References: <004301c5098c$48aa7810$6401a8c0@sarek>
	<20050203035327.GB7603@pglaf.org>
Message-ID: <6.1.2.0.0.20050202220817.01c48840@mail.fireantproductions.com>

Sorry to bother.  It just appeared that messages just weren't getting 
through.  You being occupied with other matters changes everything, and is 
completely understandable.

Sorry again and take your time.

Sincerely
Aaron Cannon


At 09:53 PM 2/2/2005, you wrote:
>On Wed, Feb 02, 2005 at 07:04:12PM -0600, John Hagerson wrote:
> > Sorry for the broadcast, but other methods to reach Greg have been
> > unsuccessful.
> >
> > Greg: Please look for messages from Aaron Cannon and John Hagerson. Thank
> > you.
>
>Ok: soon.
>
>I've been a little busy.  Life, job, flu, that sort of thing.
>   -- Greg


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) 


From j.hagerson at comcast.net  Wed Feb  2 20:12:56 2005
From: j.hagerson at comcast.net (John Hagerson)
Date: Wed Feb  2 20:13:18 2005
Subject: [gutvol-d] A question raised by Part 2 of this week's weekly
	newsletter...
Message-ID: <004f01c509a6$a3e4a920$6401a8c0@sarek>

Quoth the newsletter:
>And yes I said yes today is the 83rd anniversary of the first
>publication of Ulysses.

I'm probably missing something. Was Foghorn Leghorn involved with the
publication of Ulysses?


From gbnewby at pglaf.org  Wed Feb  2 23:16:20 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Feb  2 23:16:22 2005
Subject: [gutvol-d] GREG NEWBY: Please check your e-mail!
In-Reply-To: <6.1.2.0.0.20050202220817.01c48840@mail.fireantproductions.com>
References: <004301c5098c$48aa7810$6401a8c0@sarek>
	<20050203035327.GB7603@pglaf.org>
	<6.1.2.0.0.20050202220817.01c48840@mail.fireantproductions.com>
Message-ID: <20050203071620.GC11085@pglaf.org>

On Wed, Feb 02, 2005 at 10:10:52PM -0600, Aaron Cannon wrote:
> Sorry to bother.  It just appeared that messages just weren't getting 
> through.  You being occupied with other matters changes everything, and is 
> completely understandable.
> 
> Sorry again and take your time.

De nada - I'm sorry for not responding sooner.  It's always fine
to re-send an email after a few days, since sometimes things get
lost, deleted or filtered by mistake.
  -- Greg


> At 09:53 PM 2/2/2005, you wrote:
> >On Wed, Feb 02, 2005 at 07:04:12PM -0600, John Hagerson wrote:
> >> Sorry for the broadcast, but other methods to reach Greg have been
> >> unsuccessful.
> >>
> >> Greg: Please look for messages from Aaron Cannon and John Hagerson. Thank
> >> you.
> >
> >Ok: soon.
> >
> >I've been a little busy.  Life, job, flu, that sort of thing.
> >  -- Greg
> 
> 
> 
> --
> E-mail: cannona@fireantproductions.com
> Skype: cannona
> MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail 
> address.) 
> 
From gbnewby at pglaf.org  Wed Feb  2 23:23:48 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Feb  2 23:23:50 2005
Subject: [gutvol-d] Fwd: Proposed CD Navigation Files now available for
	review
In-Reply-To: <6.1.2.0.0.20050128112358.01c90da0@mail.fireantproductions.com>
References: <6.1.2.0.0.20050128112358.01c90da0@mail.fireantproductions.com>
Message-ID: <20050203072348.GC11403@pglaf.org>


On Fri, Jan 28, 2005 at 11:24:03AM -0600, Aaron Cannon wrote:
> 
> >From: "John Hagerson" <j.hagerson@comcast.net>
> >To: "'Aaron Cannon'" <cannona@fireantproductions.com>
> >Subject: Proposed CD Navigation Files now available for review
> >Date: Fri, 28 Jan 2005 08:04:16 -0600
> >X-Mailer: Microsoft Outlook, Build 10.0.6626
> >
> >Four navigation files built for a new Project Gutenberg CD-ROM which
> >contains primarily non-English electronic books are now available for 
> >review
> >at http://www.aaronandgabby.com/pgcd/ The files allow one to browse the CD
> >by Author, Language and Author, Language and Title, or Title.
> >

This is great stuff!  Once it's ready (or ready enough),
I'd like to go ahead and make an .iso image to add to our
collection.
  -- Greg

> >The files were developed from the Project Gutenberg production prior to 
> >book
> >14700. The Distributed Proofreaders have been especially prolific in
> >non-English books recently, so it seems that a number of books of recent
> >production will be omitted regardless of where we draw the line.
> >
> >I believe I have included every non-English book produced prior to 14700
> >with the exception of three books (7216, 7337, and 12407) where the title
> >and author were both in Unicode characters that most fonts do not support.
> >Each of the omitted works is in Chinese. If someone could help me obtain
> >more information on these works, there is ample space to include them.
> >
> >Please respond to the list or directly to mailto:j.hagerson@comcast.net 
> >with
> >your comments regarding the files.
> >
> >Thank you.
> >
> >Aaron: Before you forward this to the list, please make sure that the http
> >download works. My attempts to view the directory were met with a 403 
> >error.
> >Thank you.
> 
> 
> 
> --
> E-mail: cannona@fireantproductions.com
> Skype: cannona
> MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail 
> address.)  
> 
> 
From jlinden at projectgutenberg.ca  Wed Feb  2 23:41:09 2005
From: jlinden at projectgutenberg.ca (James Linden)
Date: Wed Feb  2 23:44:29 2005
Subject: [gutvol-d] Fwd: Proposed CD Navigation Files now available for
	review
In-Reply-To: <20050203072348.GC11403@pglaf.org>
References: <6.1.2.0.0.20050128112358.01c90da0@mail.fireantproductions.com>
	<20050203072348.GC11403@pglaf.org>
Message-ID: <4201D595.2050604@projectgutenberg.ca>

>>>Four navigation files built for a new Project Gutenberg CD-ROM which
>>>contains primarily non-English electronic books are now available for 
>>>review
>>>at http://www.aaronandgabby.com/pgcd/ The files allow one to browse the CD
>>>by Author, Language and Author, Language and Title, or Title.
>>>
> This is great stuff!  Once it's ready (or ready enough),
> I'd like to go ahead and make an .iso image to add to our
> collection.
>   -- Greg

  Why aren't we generating navigation pages from the catalog DB, instead 
of making HUGE single files? We should be providing indexes by Language, 
Subject Matter, Alphabetical Author, Alphabetical Title, etc -- all in a 
nicely paged manner.

  Other than that, I like the format of each record -- easily readable. 
Nice work Aaron!

-- James
From gbnewby at pglaf.org  Wed Feb  2 23:48:18 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Feb  2 23:48:19 2005
Subject: [gutvol-d] Fwd: Proposed CD Navigation Files now available for
	review
In-Reply-To: <4201D595.2050604@projectgutenberg.ca>
References: <6.1.2.0.0.20050128112358.01c90da0@mail.fireantproductions.com>
	<20050203072348.GC11403@pglaf.org>
	<4201D595.2050604@projectgutenberg.ca>
Message-ID: <20050203074818.GA11955@pglaf.org>

On Thu, Feb 03, 2005 at 02:41:09AM -0500, James Linden wrote:
> >>>Four navigation files built for a new Project Gutenberg CD-ROM which
> >>>contains primarily non-English electronic books are now available for 
> >>>review
> >>>at http://www.aaronandgabby.com/pgcd/ The files allow one to browse the 
> >>>CD
> >>>by Author, Language and Author, Language and Title, or Title.
> >>>
> >This is great stuff!  Once it's ready (or ready enough),
> >I'd like to go ahead and make an .iso image to add to our
> >collection.
> >  -- Greg
> 
>  Why aren't we generating navigation pages from the catalog DB, instead 
> of making HUGE single files? We should be providing indexes by Language, 
> Subject Matter, Alphabetical Author, Alphabetical Title, etc -- all in a 
> nicely paged manner.

People using a CD or DVD directly won't necessarily have access
to the catalog DB, so some sort of built-in file-based navigation
seems necessary.

Providing a CD-based program + database for Win or Mac or Lin or
whatever would be fine, too (in addition to file-based), but we don't
have one.

>  Other than that, I like the format of each record -- easily readable. 
> Nice work Aaron!

Related: I'm finally making moves (thanks to the XML/RDF file)
on generating ISO files on the fly, based on a list of eBook #s.
Stay tuned...
  -- Greg
From jlinden at projectgutenberg.ca  Wed Feb  2 23:52:51 2005
From: jlinden at projectgutenberg.ca (James Linden)
Date: Wed Feb  2 23:56:10 2005
Subject: [gutvol-d] Fwd: Proposed CD Navigation Files now available for
	review
In-Reply-To: <20050203074818.GA11955@pglaf.org>
References: <6.1.2.0.0.20050128112358.01c90da0@mail.fireantproductions.com>	<20050203072348.GC11403@pglaf.org>	<4201D595.2050604@projectgutenberg.ca>
	<20050203074818.GA11955@pglaf.org>
Message-ID: <4201D853.504@projectgutenberg.ca>

Greg Newby wrote:
> On Thu, Feb 03, 2005 at 02:41:09AM -0500, James Linden wrote:
> 
>>>>>Four navigation files built for a new Project Gutenberg CD-ROM which
>>>>>contains primarily non-English electronic books are now available for 
>>>>>review
>>>>>at http://www.aaronandgabby.com/pgcd/ The files allow one to browse the 
>>>>>CD
>>>>>by Author, Language and Author, Language and Title, or Title.
>>>>>
>>>
>>>This is great stuff!  Once it's ready (or ready enough),
>>>I'd like to go ahead and make an .iso image to add to our
>>>collection.
>>> -- Greg
>>
>> Why aren't we generating navigation pages from the catalog DB, instead 
>>of making HUGE single files? We should be providing indexes by Language, 
>>Subject Matter, Alphabetical Author, Alphabetical Title, etc -- all in a 
>>nicely paged manner.
> 
> 
> People using a CD or DVD directly won't necessarily have access
> to the catalog DB, so some sort of built-in file-based navigation
> seems necessary.
> 
> Providing a CD-based program + database for Win or Mac or Lin or
> whatever would be fine, too (in addition to file-based), but we don't
> have one.

  The idea of _generating_ the navigation files is that we can burn 
static files on the CD, but generate them for each CD image version 
using paging, various sort options, etc. This does not require users to 
have access to the DB, only a simple script that creates the HTML files.
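
As a rough illustration of the "simple script" idea, here is a minimal sketch that turns a list of catalog records into paged static HTML files with previous/next links. Everything in it is an assumption made for the example: the record fields (`author`, `title`, `language`, `number`), the file naming scheme, and the page size. It does not reflect the actual catalog DB schema or the existing navigation files.

```python
from html import escape
from pathlib import Path


def paginate(items, page_size):
    """Split a list into consecutive chunks of at most page_size items."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def build_pages(records, sort_key, prefix, page_size=50):
    """Return {filename: html} for one sort order, split into pages.

    records  -- list of dicts with hypothetical keys author/title/language/number
    sort_key -- which field to sort by, e.g. "author" or "title"
    prefix   -- file name prefix, e.g. "by-author"
    """
    ordered = sorted(records, key=lambda r: str(r[sort_key]).casefold())
    chunks = paginate(ordered, page_size)
    pages = {}
    for n, chunk in enumerate(chunks, start=1):
        rows = "\n".join(
            f"<li>{escape(r['author'])}, <i>{escape(r['title'])}</i> "
            f"({escape(r['language'])}), eBook #{r['number']}</li>"
            for r in chunk
        )
        nav = []
        if n > 1:
            nav.append(f'<a href="{prefix}-{n - 1}.html">previous</a>')
        if n < len(chunks):
            nav.append(f'<a href="{prefix}-{n + 1}.html">next</a>')
        pages[f"{prefix}-{n}.html"] = (
            f"<html><body>\n<h1>Browse by {escape(sort_key)}, "
            f"page {n} of {len(chunks)}</h1>\n"
            f"<ul>\n{rows}\n</ul>\n<p>{' | '.join(nav)}</p>\n</body></html>\n"
        )
    return pages


def write_pages(pages, out_dir):
    """Write the generated pages as static files, ready to burn onto a CD."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, html in pages.items():
        (out / filename).write_text(html, encoding="utf-8")
```

Run once per sort order (author, title, and so on) when a CD image is built; the resulting files are plain HTML and need no database on the user's machine.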

-- James
From cannona at fireantproductions.com  Thu Feb  3 00:24:17 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Feb  3 00:25:40 2005
Subject: [gutvol-d] Fwd: Proposed CD Navigation Files now available
	for review
In-Reply-To: <4201D595.2050604@projectgutenberg.ca>
References: <6.1.2.0.0.20050128112358.01c90da0@mail.fireantproductions.com>
	<20050203072348.GC11403@pglaf.org>
	<4201D595.2050604@projectgutenberg.ca>
Message-ID: <6.1.2.0.0.20050203022013.01af7838@mail.fireantproductions.com>

At 01:41 AM 2/3/2005, you wrote:
<snip>
>  Other than that, I like the format of each record -- easily readable. 
> Nice work Aaron!


Thanks.  I only wish I could take credit. :)  Actually the majority of the 
work on the CD came from John Hagerson.  I've just been doing some very 
light assisting.  Nevertheless, your feedback is appreciated by both of us.


Sincerely
Aaron Cannon


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail 
address.)  


From cannona at fireantproductions.com  Thu Feb  3 00:28:21 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Feb  3 00:29:46 2005
Subject: [gutvol-d] Fwd: Proposed CD Navigation Files now available
	for review
In-Reply-To: <20050203072348.GC11403@pglaf.org>
References: <6.1.2.0.0.20050128112358.01c90da0@mail.fireantproductions.com>
	<20050203072348.GC11403@pglaf.org>
Message-ID: <6.1.2.0.0.20050203022635.01c81178@mail.fireantproductions.com>

At 01:23 AM 2/3/2005, you wrote:

>This is great stuff!  Once it's ready (or ready enough),
>I'd like to go ahead and make an .iso image to add to our
>collection.
>   -- Greg

Indeed.  We'll be sure to build it under linux, so as to avoid any problems 
with capitalization of file names.
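
One way to catch such problems before burning is to compare link targets against the real file names. The sketch below assumes the issue is links whose capitalization differs from the file on disk (which works on a case-insensitive system but breaks on a case-sensitive one); the message above does not spell out the exact problem, and the function name and inputs are hypothetical.

```python
def case_mismatches(link_targets, actual_files):
    """Find links that match a real file only when case is ignored.

    link_targets -- relative paths as they appear in the HTML links
    actual_files -- relative paths as they exist on disk
    Returns a list of (link, [files it probably meant]).
    """
    actual = set(actual_files)
    by_folded = {}
    for name in actual_files:
        by_folded.setdefault(name.casefold(), []).append(name)

    problems = []
    for target in link_targets:
        if target in actual:
            continue  # exact match, fine on any file system
        candidates = by_folded.get(target.casefold())
        if candidates:
            problems.append((target, sorted(candidates)))
    return problems
```

Links with no match at all, even ignoring case, are deliberately not reported here; those are plain broken links and a separate check.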


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) 


From ke at gnu.franken.de  Wed Feb  2 20:48:24 2005
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Thu Feb  3 07:39:09 2005
Subject: [gutvol-d] Re: Error Correction Data Needed
In-Reply-To: <Pine.LNX.4.60.0501280958420.1895@pglaf.org> (Michael Hart's
	message of "Fri, 28 Jan 2005 10:06:10 -0800 (PST)")
References: <Pine.LNX.4.60.0501280958420.1895@pglaf.org>
Message-ID: <shy8e6gsfb.fsf@tux.gnu.franken.de>

Michael Hart <hart@pglaf.org> writes:

> However, my most recent research, in conjunction with the head
> of error correction at a major publisher, leads me to think 1/3
> of errors might be found per pass, instead of the previous 1/2.

What about a proper case study?  I'd say you had better give up on
talking about numbers ;)

If you are interested in catching errors, print it out and read a paper
copy.  Or even better, if you are interested in 1:1 accuracy between the
original and the copy, have one person read the text aloud, including all
diacritical marks, while a second person follows along in the copy.

-- 
http://www.gnu.franken.de/ke/                           |      ,__o
                                                        |    _-\_<,
                                                        |   (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4  AA7F C90A 35C3 E9D0 5D1C
From ag737 at freenet.carleton.ca  Thu Feb  3 08:17:23 2005
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Thu Feb  3 08:17:32 2005
Subject: [gutvol-d] Re Error Correction Data Needed
Message-ID: <4114d14113c9.4113c94114d1@ncf.ca>

I'm inclined to think that the 1/3 figure, AT MOST, may be closer to 
the truth.

I've been working on a massive (300,000 word) publication for a number 
of years now. (It's FINALLY in pre-press, hooray!) My workflow was:

handkey text (except for 16 pages I OCRd as a test.)

Proofread 1

Attestation* 1

Proofread 2

Attestation* 2

Skimread (very superficial, but often you find stupid errors that way. 
Like the one on PAGE 1!!!)

Software spellcheck

Proofread 3

Readback** 1

Readback 2

And IIRC, Readback 3


* Attestation: Comparing my typescript to the original, word-by-word, 
phrase-by-phrase.

** Readback: After the HUGE error rates I was still getting after each 
previous pass, I bought voice synthesis software, and had the work read 
back to me, while I followed along in the original.

I've kept stats somewhere on the error catch-rate at each stage; I'll 
dig them up later. The caveat, of course, is that the only way for me 
to get "fresh eyes" on the project was to put it aside for a few weeks 
or months; I can't afford to hire someone else. The error rate on the 
last pass was so small that, even if I had only caught 30% of the 
remaining errors, the few that are statistically expectable are no 
longer worth it on the law of diminishing returns curve.


----- Original Message ----- 
From  Michael Hart <hart@pglaf.org> 
Date  Fri, 28 Jan 2005 10:06:10 -0800 (PST) 
Subject  [gutvol-d] Error Correction Data Needed 


[Please excuse cross-posting.]

However, my most recent research, in conjunction with the head
of error correction at a major publisher, leads me to think 1/3
of errors might be found per pass, instead of the previous 1/2.

If any of you have any suggestions as to what these figures are,
please let me know.
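
To make the difference between the 1/2 and 1/3 figures concrete, here is a minimal sketch of what a fixed per-pass catch rate would imply. It assumes every pass catches the same fraction of whatever errors remain, independently of earlier passes; that is a simplifying assumption for illustration, not something the figures above establish, and the starting count of 1000 is arbitrary.

```python
import math


def remaining_errors(initial_errors, catch_rate, passes):
    """Expected errors left after `passes` rounds at a constant catch rate."""
    return initial_errors * (1.0 - catch_rate) ** passes


def passes_needed(initial_errors, catch_rate, target):
    """Smallest number of passes that brings the expected count to <= target."""
    if initial_errors <= target:
        return 0
    return math.ceil(
        math.log(target / initial_errors) / math.log(1.0 - catch_rate)
    )


if __name__ == "__main__":
    for rate in (1 / 2, 1 / 3):
        n = passes_needed(1000, rate, 1)
        print(f"catch rate {rate:.2f}: {n} passes to get 1000 errors down to 1")
```

Under this model, dropping from 1/2 to 1/3 per pass nearly doubles the number of passes needed for the same residual error count.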


From sly at victoria.tc.ca  Thu Feb  3 09:21:17 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Feb  3 09:21:42 2005
Subject: [gutvol-d] Re Error Correction Data Needed
In-Reply-To: <4114d14113c9.4113c94114d1@ncf.ca>
References: <4114d14113c9.4113c94114d1@ncf.ca>
Message-ID: <Pine.GSO.4.58.0502030910070.2426@vtn1.victoria.tc.ca>


On Thu, 3 Feb 2005, Wallace J.McLean wrote:

> Skimread (very superficial, but often you find stupid errors that way.
> Like the one on PAGE 1!!!)
>

That reminds me of a book I have called "Indian Myths and Legends"
which, in the introduction, details how the whole text was
carefully translated from the German, and double checked many
times, over the course of 30 years. And of course, I see an
obvious error on page 1. :)

Andrew
From gbuchana at rogers.com  Sat Feb  5 08:15:54 2005
From: gbuchana at rogers.com (Gardner Buchanan)
Date: Sat Feb  5 08:16:10 2005
Subject: [gutvol-d] Fwd: Proposed CD Navigation Files now available f
In-Reply-To: <6.1.2.0.0.20050203022635.01c81178@mail.fireantproductions.com>
Message-ID: <XFMail.050205111554.gbuchana@rogers.com>

Hi Aaron,

On 08:28:21 Aaron Cannon wrote:
> 
> Indeed.  We'll be sure to build it under linux, so as to avoid any problems 
> with capitalization of file names.
> 

I noticed that in the BrowsebyLanguageandTitle page there is
some funny business at the end, with the Welsh section appearing
more than once.

In general, my comment is that the pages are too large.  I think
a page with just language, linked to a page with the titles for
that language would, for example, be more manageable.
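
A minimal sketch of that layout: one small index page listing the languages, each linking to its own page of titles. Grouping by a dictionary key also means each language can appear only once in the index. The input format (language, title pairs) and the file names are assumptions for the example, not the format of the existing files.

```python
from collections import defaultdict
from html import escape


def build_language_pages(records):
    """Return {filename: html}: an index of languages plus one page per language.

    records -- iterable of (language, title) pairs (hypothetical input format)
    """
    by_language = defaultdict(list)
    for language, title in records:
        # Strip stray whitespace so "Welsh" and "Welsh " land in one group.
        by_language[language.strip()].append(title)

    pages = {}
    links = []
    for i, language in enumerate(sorted(by_language), start=1):
        filename = f"language-{i}.html"
        titles = sorted(by_language[language], key=str.casefold)
        items = "\n".join(f"<li>{escape(t)}</li>" for t in titles)
        pages[filename] = (
            f"<html><body><h1>{escape(language)}</h1>\n"
            f"<ul>\n{items}\n</ul></body></html>\n"
        )
        links.append(
            f'<li><a href="{filename}">{escape(language)}</a> ({len(titles)})</li>'
        )

    pages["index.html"] = (
        "<html><body><h1>Browse by Language</h1>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )
    return pages
```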

============================================================
Gardner Buchanan                       <gbuchana@rogers.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.
From ag737 at freenet.carleton.ca  Sat Feb  5 10:16:34 2005
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Sat Feb  5 10:16:41 2005
Subject: [gutvol-d] Error rate statistics
Message-ID: <442d4e44a4fa.44a4fa442d4e@ncf.ca>

As I previously discussed, these are the figures from a project I've 
been working on for several years. It's a massive, three-volume job, 
for print publication. After each round, described below, I would 
reprint the text, verify the correction of the previous round's errors, 
and then do another round. I also did a batch-verify of ALL previous 
rounds' corrections, finding one or two that I had missed along the way.


Round   Type    Errors
1/1a    p/a     944
2       p       415
3       a       454
4&5     p/a     154
        sc      35-40
6       rb      170
7       sr      0
8       rb      64
"9"     rb      0


Explanation: Round = round number, Type = type of reading: p/roofreading, 
a/ttestation, s/pell c/heck, s/kim r/ead, r/ead b/ack using voice synthesis. I keyed 
most of the text, apart from a small sample (about 15-20pp) which I 
OCRd near the end of the text-entry phase. This made it imperative that 
I not only proofread the typescript in the conventional sense, but also 
attest it, compare it back to the original, para by para, line by line, 
word by word.

There were many errors that I introduced to the text that would pass 
spellcheck or proofread; they weren't "errors", but they weren't 
faithful to the text, either. They had to be exterminated.

Some - many, actually - of the errors were native to the original. As I 
keyed the text, I retained them, but corrected them afterwards. Thus, 
the error stats are somewhat inflated, in that a good number of them, 
probably 10 or 12 percent, weren't my fault.

Rounds 1 and 1a, my first attestation and proof, I did on the same 
copy, so I couldn't do separate stats.

After rounds 4-5, I did a spellcheck, which returned about 35-40 
spelling errors which my eyes hadn't caught. This was a bit of a shock 
to my own esteem of my proofing skills, so I went out and got some 
speech synthesis software to do readbacks. I'd clip a few hundred words 
at a time, and follow in the original, highlighting discrepancies as I 
went along.

7 was a skimread of the whole thing.

8 was a second full readback. I know I did a third full readback, but 
didn't seem to keep stats on it.

"9" was a partial readback. At 64 errors in round 8, that works out to 
about one discrepancy every 15 pages or 4500 words. I did a bunch of 
batches of 15 pages and 4500 words, and also did a complete readback of 
several of the most error-prone sections of the book. Even with the 
long breaks I took in between rounds, round "9", with no moments of 
sheer "d'uh" to break up the monotony, was where the law of diminishing 
returns kicked in. I re-did perhaps 15% of the entire text without 
finding any further errors.

At that point, I estimated the number of remaining typos or text 
discrepancies in the entire book to be somewhere between 6 and 20, and 
I'll be damned if I'm going to spend another three months of evenings 
hunting the buggers down.

(At the same time, in my second readback pass, I at times would go 100 
pages without finding ANY errors, then hit three or four on the same 
page.)

The total number of native typos, my typos, and my transcription 
errors, worked out to about 2 per 300-word page. Not great, but not 
bad. It was, probably, actually higher, but in my early eyeball rounds, 
if I came across an error that I thought I had repeatedly made, I would 
do a global search, attest, and replace on it when I did the 
corrections at the end of that round.

However, I only caught under 50% by eye on my first round, and fewer 
than 90% by eye, overall, on subsequent rounds. About 12%, I would not 
have caught at all, but for speech synthesis and spellcheck.


From miranda_vandeheijning at blueyonder.co.uk  Sun Feb  6 02:36:59 2005
From: miranda_vandeheijning at blueyonder.co.uk (Miranda van de Heijning)
Date: Sun Feb  6 02:37:26 2005
Subject: [gutvol-d] Surge in users?
In-Reply-To: <20050202045215.27164.qmail@web52310.mail.yahoo.com>
References: <20050202045215.27164.qmail@web52310.mail.yahoo.com>
Message-ID: <4205F34B.2010606@blueyonder.co.uk>


Just wondering, I was looking through the PG Top 100 and realised the 
figures for all the books are a lot higher than usual. Do we have a 
surge in visitors this week?

Secondly, mainly because I can't get enough of stats, would it be 
possible to have a 'total number of books downloaded' somewhere, so we 
can compare week on week how we are doing?

Miranda


From marcello at perathoner.de  Sun Feb  6 10:25:22 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Feb  6 11:33:11 2005
Subject: [gutvol-d] Surge in users?
In-Reply-To: <4205F34B.2010606@blueyonder.co.uk>
References: <20050202045215.27164.qmail@web52310.mail.yahoo.com>
	<4205F34B.2010606@blueyonder.co.uk>
Message-ID: <42066112.1020206@perathoner.de>

Miranda van de Heijning wrote:

> Just wondering, I was looking through the PG Top 100 and realised the 
> figures for all the books are a lot higher than usual. Do we have a 
> surge in visitors this week?

Due to problems at ibiblio's file servers we didn't get the log files for 
some days and so the script couldn't count the downloads.

If you want to see the global numbers go to:

   http://www.gutenberg.org/internal/stats/2005/02/
   user: internal
   pass: books

and look at month-files.html

To see an independent stat about gutenberg.org's popularity go to:

 
http://www.alexa.com/data/details/traffic_details?&range=3m&size=large&compare_sites=gutenberg.net,promo.net&y=t&url=gutenberg.org


> Secondly, mainly because I can't get enough of stats, would it be 
> possible to have a 'total number of books downloaded' somewhere, so we 
> can compare week on week how we are doing?

I could add that figure quite easily on the top 100 page but it will be 
misleading. We just count the downloads from ibiblio's servers. We don't 
know how many books get downloaded from our mirrors.

And, at the rate we are going, we are far below the numbers Michael 
likes to put in his newsletter (billions, trillions, gazillions) and 
most likely that'll start another war about how to count downloaded ebooks.


-- 
Marcello Perathoner
webmaster@gutenberg.org


From hyphen at hyphenologist.co.uk  Sun Feb  6 12:53:02 2005
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sun Feb  6 12:53:29 2005
Subject: [gutvol-d] Surge in users?
In-Reply-To: <4205F34B.2010606@blueyonder.co.uk>
References: <20050202045215.27164.qmail@web52310.mail.yahoo.com>
	<4205F34B.2010606@blueyonder.co.uk>
Message-ID: <0m0d01tu4u2243uknmr00t3q4j7mhqebci@4ax.com>

On Sun, 06 Feb 2005 10:36:59 +0000,  Miranda van de Heijning
<miranda_vandeheijning@blueyonder.co.uk> wrote:

| 
| Just wondering, I was looking through the PG Top 100 and realised the 
| figures for all the books are a lot higher than usual. Do we have a 
| surge in visitors this week?
| 
| Secondly, mainly because I can't get enough of stats, would it be 
| possible to have a 'total number of books downloaded' somewhere, so we 
| can compare week on week how we are doing?

Beware the numbers of books downloaded will vary drastically on a weekly
basis, because of Christmas and other public holidays, University and
school holidays etc.  

IMO monthly running averages would give a better idea of what is happening.
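
A minimal sketch of what such a running average could look like, in Python. The weekly totals are assumed to come from the download logs; a four-week window stands in for "monthly", and the function name is hypothetical.

```python
from collections import deque


def running_average(values, window=4):
    """Yield the mean of the most recent `window` values.

    Nothing is yielded until a full window has been seen, so the
    output is shorter than the input by window - 1 items.
    """
    recent = deque(maxlen=window)  # old values drop off automatically
    for value in values:
        recent.append(value)
        if len(recent) == window:
            yield sum(recent) / window
```

Fed one total per week, a holiday spike in a single week moves the average by only a quarter of its size, which is the smoothing effect wanted here.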


-- 
Dave F

From krooger at debian.org  Sun Feb  6 14:16:43 2005
From: krooger at debian.org (Jonathan Walther)
Date: Sun Feb  6 14:16:58 2005
Subject: [gutvol-d] Can project use legally encumbered scans?
In-Reply-To: <00a001c5048d$5371e480$f69495ce@gw98>
References: <20050126210155.GA8093@reactor-core.org>
	<00a001c5048d$5371e480$f69495ce@gw98>
Message-ID: <20050206221643.GA22130@reactor-core.org>

On Thu, Jan 27, 2005 at 11:20:55AM -0500, N Wolcott wrote:
>If you have a valuable collection, if the scans are high quality
>tiff's or tiff's and jpegs you might enquire about space on ibiblio
>where they can be accessed as a collection. Many PG tiff's are just
>high enough quality to "get the job done", you might want yours to be
>separated from the dross.

I know of a situation.  Let's say that it's hypothetical.  Someone got
access to some extremely old and rare books, and photographed them.  The
photos were scanned and distributed on CDROM by a company.  The owners
of the photos say the scans constitute stolen property, and after years
of legal action, stopped the company from distributing the scans.  The
books in question are up to 500 years old and unlikely to ever come back
into print.

What is PG's position?  The books themselves are clearly not in
copyright; the few remaining copies are heirlooms tucked away in a few
select private libraries.  PG would not be distributing the scans
themselves.  If PG could get access to the scans, would it be ethical to
use them?

Please let me know the official answer.

Jonathan


-- 
          It's not true unless it makes you laugh,                           
     but you don't understand it until it makes you weep.

Eukleia: Jonathan Walther
Address: 12706 99 Ave, Surrey, BC V3V2P8 (Canada)
Contact: 604-684-1319 (daytime)
Contact: 604-582-9308 (morning and evening)
Puritan: Purity of faith, Purity of doctrine. Sola Scriptura!
From gbnewby at pglaf.org  Sun Feb  6 14:31:44 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Feb  6 14:31:44 2005
Subject: [gutvol-d] Can project use legally encumbered scans?
In-Reply-To: <20050206221643.GA22130@reactor-core.org>
References: <20050126210155.GA8093@reactor-core.org>
	<00a001c5048d$5371e480$f69495ce@gw98>
	<20050206221643.GA22130@reactor-core.org>
Message-ID: <20050206223144.GA30756@pglaf.org>

On Sun, Feb 06, 2005 at 02:16:43PM -0800, Jonathan Walther wrote:
> On Thu, Jan 27, 2005 at 11:20:55AM -0500, N Wolcott wrote:
> >If you have a valuable collection, if the scans are high quality
> >tiff's or tiff's and jpegs you might enquire about space on ibiblio
> >where they can be accessed as a collection. Many PG tiff's are just
> >high enough quality to "get the job done", you might want yours to be
> >separated from the dross.
> 
> I know of a situation.  Let's say that it's hypothetical.  Someone got
> access to some extremely old and rare books, and photographed them.  The
> photos were scanned and distributed on CDROM by a company.  The owners
> of the photos say the scans constitute stolen property, and after years
> of legal action, stopped the company from distributing the scans.  The
> books in question are up to 500 years old and unlikely to ever come back
> into print.
> 
> What is PG's position?  The books themselves are clearly not in
> copyright; the few remaining copies are heirlooms tucked away in a few
> select private libraries.  PG would not be distributing the scans
> themselves.  If PG could get access to the scans, would it be ethical to
> use them?

(Are you talking about scans of photos, from CDs?  Were there any
other value-added processes involved in creating the scans/photos?
Are these entire books, or some sort of collection of items, which
might have a compilation copyright?)

> Please let me know the official answer.

This is an official answer, but doesn't quite meet your needs.

The short answer is that it's hard to deal with hypotheticals,
since there are a few issues that could mitigate.  The main
one is if there's a relevant court case that was decided that
could impact our decision.  The other is if the books could count
as unpublished manuscripts, which get a separate copyright
period of modern-day protection, regardless of when they
were published (http://gutenberg.org/howto/copyright-howto).

But our basic answer is that IF the source is verifiably
public domain in the US, using our clearance procedures,
then scans or pictures of the source, as well as OCR,
proofreading, markup, and completed eBooks, are also public
domain.

This is a position that has been vetted by several 
lawyers who help PG, but has not yet been tested in court
as far as we know.  The closest counter-example I can
think of is the dead sea scrolls, which (IIRC) did end
up with some sort of copyright protection despite their age.

In other words, there *might* be a risk.  When we get such
requests, we sometimes need to look at the risk of getting sued,
as well as our own procedures.  We're definitely willing
to take risks, but in a thoughtful manner.

Feel free to send me further details, or just upload
the request via http://copy.pglaf.org, along with details.
  -- Greg

Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org

From kouhia at nic.funet.fi  Mon Feb  7 06:21:16 2005
From: kouhia at nic.funet.fi (Juhana Sadeharju)
Date: Mon Feb  7 06:21:26 2005
Subject: [gutvol-d] Re: Arabic eTexts
Message-ID: <S18357AbVBGOVQ/20050207142116Z+7941@nic.funet.fi>


Is that Arabic OCR software open source and free?

Having no Arabic OCR software has not prevented us from
digitizing Arabic texts earlier. If only buying $$$$ software
gets you motivated to digitize Arabic texts, then it is fine
by me. 

However, I feel the Arabic texts should be digitized first as
image files, especially if the text is written by hand. This
approach will be cheaper and faster as well.

Please don't make the mistake of not archiving and making
available the images if you choose the OCR approach. I'm
pleased to archive any Arabic digitizations as image files for
now and for future use. Only image files can preserve the text
as close to the original as possible.

Juhana
-- 
  http://music.columbia.edu/mailman/listinfo/linux-graphics-dev
  for developers of open source graphics software
From hart at pglaf.org  Mon Feb  7 07:52:05 2005
From: hart at pglaf.org (Michael Hart)
Date: Mon Feb  7 07:52:06 2005
Subject: [gutvol-d] Can project use legally encumbered scans?
In-Reply-To: <20050206223144.GA30756@pglaf.org>
References: <20050126210155.GA8093@reactor-core.org>
	<00a001c5048d$5371e480$f69495ce@gw98>
	<20050206221643.GA22130@reactor-core.org>
	<20050206223144.GA30756@pglaf.org>
Message-ID: <Pine.LNX.4.60.0502070744420.16344@pglaf.org>


Photographs, even of public domain materials, can be copyrighted,
though I doubt a similar photograph would infringe.

However, this has not been established for photocopies, scans, etc.

As for the WORDS on the pages, in the photographs, etc., those are
still in the public domain, and you could legally type/scan them in to
create an eBook, probably even if the license says you cannot.

This would be similar to the case of someone owning a painting in the
public domain, and you take a picture of it.  You could either copyright
the picture or put it in the public domain.

Some people claim all rights to reproduction of certain public domain
materials, such as museums, but I don't know if that can be enforced
outside of certain contracts with the museums.  Perhaps just walking
in to the museums is regarded in some places like ye olde "shrinkwrap"
licenses that are no longer legally enforceable.


I am not a lawyer. . .this is NOT a legal opinion or legal advice.

IANAL = I am not a lawyer.

mh


On Sun, 6 Feb 2005, Greg Newby wrote:

> On Sun, Feb 06, 2005 at 02:16:43PM -0800, Jonathan Walther wrote:
>> On Thu, Jan 27, 2005 at 11:20:55AM -0500, N Wolcott wrote:
>>> If you have a valuable collection, if the scans are high quality
>>> tiff's or tiff's and jpegs you might enquire about space on ibiblio
>>> where they can be accessed as a collection. Many PG tiff's are just
>>> high enough quality to "get the job done", you might want yours to be
>>> separated from the dross.
>>
>> I know of a situation.  Let's say that it's hypothetical.  Someone got
>> access to some extremely old and rare books, and photographed them.  The
>> photos were scanned and distributed on CDROM by a company.  The owners
>> of the photos say the scans constitute stolen property, and after years
>> of legal action, stopped the company from distributing the scans.  The
>> books in question are up to 500 years old and unlikely to ever come back
>> into print.
>>
>> What is PG's position?  The books themselves are clearly not in
>> copyright; the few remaining copies are heirlooms tucked away in a few
>> select private libraries.  PG would not be distributing the scans
>> themselves.  If PG could get access to the scans, would it be ethical to
>> use them?
>
> (Are you talking about scans of photos, from CDs?  Were there any
> other value-added processes involved in creating the scans/photos?
> Are these entire books, or some sort of collection of items, which
> might have a compilation copyright?)
>
>> Please let me know the official answer.
>
> This is an official answer, but doesn't quite meet your needs.
>
> The short answer is that it's hard to deal with hypotheticals,
> since there are a few issues that could mitigate.  The main
> one is if there's a relevant court case that was decided that
> could impact our decision.  The other is if the books could count
> as unpublished manuscripts, which get a separate copyright
> period of modern-day protection, regardless of when they
> were published (http://gutenberg.org/howto/copyright-howto).
>
> But our basic answer is that IF the source is verifiably
> public domain in the US, using our clearance procedures,
> then scans or pictures of the source, as well as OCR,
> proofreading, markup, and completed eBooks, are also public
> domain.
>
> This is a position that has been vetted by several
> lawyers who help PG, but has not yet been tested in court
> as far as we know.  The closest counter-example I can
> think of is the dead sea scrolls, which (IIRC) did end
> up with some sort of copyright protection despite their age.
>
> In other words, there *might* be a risk.  When we get such
> requests, we sometimes need to look at the risk of getting sued,
> as well as our own procedures.  We're definitely willing
> to take risks, but in a thoughtful manner.
>
> Feel free to send me further details, or just upload
> the request via http://copy.pglaf.org, along with details.
>  -- Greg
>
> Dr. Gregory B. Newby
> Chief Executive and Director
> Project Gutenberg Literary Archive Foundation http://gutenberg.net
> A 501(c)(3) not-for-profit organization with EIN 64-6221541
> gbnewby@pglaf.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From shimmin at uiuc.edu  Mon Feb  7 08:16:00 2005
From: shimmin at uiuc.edu (Robert Shimmin)
Date: Mon Feb  7 08:16:05 2005
Subject: [gutvol-d] Can project use legally encumbered scans?
In-Reply-To: <Pine.LNX.4.60.0502070744420.16344@pglaf.org>
References: <20050126210155.GA8093@reactor-core.org>	<00a001c5048d$5371e480$f69495ce@gw98>	<20050206221643.GA22130@reactor-core.org>	<20050206223144.GA30756@pglaf.org>
	<Pine.LNX.4.60.0502070744420.16344@pglaf.org>
Message-ID: <42079440.4080106@uiuc.edu>

The closest U.S. case law I know of is Bridgeman Art Library Ltd. v. 
Corel Corporation (1999). There, a U.S. District Court ruled that 
photographic reproductions of two-dimensional works of art, where the 
goal is to make as accurate a reproduction of the work as possible, were 
not 'original works,' and therefore not copyrightable.

By no means does this apply to all photographs of artwork, but only 
those where the artistic capacity of the photographer in choosing angle, 
composing the subject matter, selecting lighting, etc., has been 
subjugated to the overarching goal of reproducing the artwork as 
accurately as possible.

-- RS
From hart at pglaf.org  Mon Feb  7 08:38:17 2005
From: hart at pglaf.org (Michael Hart)
Date: Mon Feb  7 08:38:18 2005
Subject: [gutvol-d] Surge in users?
In-Reply-To: <0m0d01tu4u2243uknmr00t3q4j7mhqebci@4ax.com>
References: <20050202045215.27164.qmail@web52310.mail.yahoo.com>
	<4205F34B.2010606@blueyonder.co.uk>
	<0m0d01tu4u2243uknmr00t3q4j7mhqebci@4ax.com>
Message-ID: <Pine.LNX.4.60.0502070836520.16344@pglaf.org>


This could have been from some press we got in the UK.

And don't forget that every once in a while some big
outfit like Yahoo or Google just grabs everything, which
is the likely cause if we get lots more hits spread over
all the eBooks in general. . . .

mh


On Sun, 6 Feb 2005, Dave Fawthrop wrote:

> On Sun, 06 Feb 2005 10:36:59 +0000,  Miranda van de Heijning
> <miranda_vandeheijning@blueyonder.co.uk> wrote:
>
> |
> | Just wondering, I was looking through the PG Top 100 and realised the
> | figures for all the books are a lot higher than usual. Do we have a
> | surge in visitors this week?
> |
> | Secondly, mainly because I can't get enough of stats, would it be
> | possible to have a 'total number of books downloaded' somewhere, so we
> | can compare week on week how we are doing?
>
> Beware the numbers of books downloaded will vary drastically on a weekly
> basis, because of Christmas and other public holidays, University and
> school holidays etc.
>
> IMO monthly running averages would give a better idea of what is happening.
>
>
>
> --
> Dave F
>
From cannona at fireantproductions.com  Mon Feb  7 10:32:46 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Mon Feb  7 11:09:19 2005
Subject: [gutvol-d] Surge in users?
In-Reply-To: <Pine.LNX.4.60.0502070836520.16344@pglaf.org>
References: <20050202045215.27164.qmail@web52310.mail.yahoo.com>
	<4205F34B.2010606@blueyonder.co.uk>
	<0m0d01tu4u2243uknmr00t3q4j7mhqebci@4ax.com>
	<Pine.LNX.4.60.0502070836520.16344@pglaf.org>
Message-ID: <6.1.2.0.0.20050207123100.01beaea0@mail.fireantproductions.com>

At 10:38 AM 2/7/2005, you wrote:

>This could have been from some press we got in the UK.

I believe this first explanation to be more likely, as the requests for 
DVDs have gone through the roof, and 90% of them were from the 
UK.  Fortunately, things have slowed down a lot over the last few days.

Aaron


>And don't forget that every once in a while some big
>outfit like Yahoo or Google just grabs everything, which
>is the likely cause if we get lots more hits spread over
>all the eBooks in general. . . .
>
>mh
>
>
>On Sun, 6 Feb 2005, Dave Fawthrop wrote:
>
>>On Sun, 06 Feb 2005 10:36:59 +0000,  Miranda van de Heijning
>><miranda_vandeheijning@blueyonder.co.uk> wrote:
>>
>>|
>>| Just wondering, I was looking through the PG Top 100 and realised the
>>| figures for all the books are a lot higher than usual. Do we have a
>>| surge in visitors this week?
>>|
>>| Secondly, mainly because I can't get enough of stats, would it be
>>| possible to have a 'total number of books downloaded' somewhere, so we
>>| can compare week on week how we are doing?
>>
>>Beware the numbers of books downloaded will vary drastically on a weekly
>>basis, because of Christmas and other public holidays, University and
>>school holidays etc.
>>
>>IMO monthly running averages would give a better idea of what is happening.
>>
>>
>>
>>--
>>Dave F
>>


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) 


From servalan at ar.com.au  Tue Feb  8 16:47:16 2005
From: servalan at ar.com.au (Pauline)
Date: Tue Feb  8 16:48:12 2005
Subject: [gutvol-d] Issues with links from posted notices failing in Firefox
	1.0
Message-ID: <42095D94.1020509@ar.com.au>

Hi All,

I don't think I am the only one with this problem, but I have yet to see 
this being discussed here...

In Firefox 1.0 links to recently posted etexts from the posted mailing 
list such as:
http://www.gutenberg.net/1/4/9/8/14980

fail. I see an error:

Files Lookup

I see no such file here! (1/4/9/8/14980)

The links work perfectly well in IE6.

The links start to work in Firefox a few days after a project is posted, 
but it's very frustrating to not be able to send links to others without 
knowing whether they will fail or not.

There is a discussion at DP on this issue here:
http://www.pgdp.net/phpBB2/viewtopic.php?p=109131#109131

Some users say if they change the skin back to the default, links work 
OK again. No such luck for me.

Thanks in advance,
P
--
Distributed Proofreaders: http://www.pgdp.net
"Preserving history one page at a time."
From kouhia at nic.funet.fi  Wed Feb  9 08:54:17 2005
From: kouhia at nic.funet.fi (Juhana Sadeharju)
Date: Wed Feb  9 08:54:28 2005
Subject: [gutvol-d] Re: Can project use legally encumbered scans?
Message-ID: <S6922AbVBIQyR/20050209165418Z+8995@nic.funet.fi>

>From: Jonathan Walther <krooger@debian.org>
>
>I know of a situation.  Let's say that it's hypothetical.  Someone got
>access to some extremely old and rare books, and photographed them.  The
>photos were scanned and distributed on CDROM by a company.  The owners
>of the photos say the scans constitute stolen property, and after years
>of legal action, stopped the company from distributing the scans.

We can and should process the scans privately.
Please make them (if the scans were not entirely hypothetical)
available privately, only to people who are willing to help.

Remember, Hershey fonts were in the public domain but the file format
was not. The solution was to convert the data to another file format.
How could that info be exploited here?

I would like to have a copy of every scan. If I develop a solution,
the scans may not be available when the solution is ready. That would
only waste our time and give false hopes.

Best regards,
Juhana
-- 
  http://music.columbia.edu/mailman/listinfo/linux-graphics-dev
  for developers of open source graphics software
From maitriv at yahoo.com  Wed Feb  9 09:41:21 2005
From: maitriv at yahoo.com (maitri venkat-ramani)
Date: Wed Feb  9 09:41:28 2005
Subject: [gutvol-d] Re: Arabic eTexts
In-Reply-To: <S18357AbVBGOVQ/20050207142116Z+7941@nic.funet.fi>
Message-ID: <20050209174122.75408.qmail@web52306.mail.yahoo.com>


--- Juhana Sadeharju <kouhia@nic.funet.fi> wrote:

> Is that Arabic OCR software open source and free?

I don't know, I merely pointed you all to the research that is going on
in this area.  My hope was that one/several of our volunteers who are
interested in and have previous experience in PG Arabic etexts would
get in touch with the project and find out what they are doing.

> Having no Arabic OCR software has not prevented us from
> digitizing Arabic texts earlier. If only buying $$$$ software
> gets you motivated to digitize Arabic texts, then it is fine
> by me. 

It's not the digitization method, but the books that come out of the
process that are my concern.  Who cares how he does it - if that guy
does all the scanning and work, what harm is there in asking if he will
share his collection with PG?  If not, fine.
 
Cheers,
Maitri


From marcello at perathoner.de  Wed Feb  9 10:24:03 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Feb  9 13:47:59 2005
Subject: [gutvol-d] Issues with links from posted notices failing in
	Firefox 1.0
In-Reply-To: <42095D94.1020509@ar.com.au>
References: <42095D94.1020509@ar.com.au>
Message-ID: <420A5543.7060709@perathoner.de>

Pauline wrote:

> In Firefox 1.0 links to recently posted etexts from the posted mailing 
> list such as:
> http://www.gutenberg.net/1/4/9/8/14980
> 
> fail. I see an error:
> 
> Files Lookup
> 
> I see no such file here! (1/4/9/8/14980)
> 
> The links work perfectly well in IE6.

I'm using Firefox 1.0 (Linux) for development. I don't have any such 
problems.

Are you using some web filtering proxy?

Did you try

   http://www.gutenberg.net/dirs/1/4/9/8/14980/

with a trailing slash? (which, by the way, is the correct url for a 
directory)

Can you install the HTTP Live Headers Plugin and send me a dump of the 
request your browser is generating?


> There is a discussion at DP on this issue here:
> http://www.pgdp.net/phpBB2/viewtopic.php?p=109131#109131

That's fine, because my DP account somehow got removed.


> Some users say if they change the skin back to the default, links work 
> OK again. No such luck for me.

Skins have nothing to do with that.


-- 
Marcello Perathoner
webmaster@gutenberg.org


From marcello at perathoner.de  Wed Feb  9 14:16:19 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Feb  9 14:16:23 2005
Subject: [gutvol-d] Issues with links from posted notices failing in
	Firefox 1.0
In-Reply-To: <42095D94.1020509@ar.com.au>
References: <42095D94.1020509@ar.com.au>
Message-ID: <420A8BB3.6090201@perathoner.de>

Pauline wrote:

> In Firefox 1.0 links to recently posted etexts from the posted mailing 
> list such as:
> http://www.gutenberg.net/1/4/9/8/14980
> 
> fail.  I see an error:
> 
> Files Lookup
> 
> I see no such file here! (1/4/9/8/14980)

Also, I noticed that the posting note gets sent *before* the files are 
posted. In that case the error message you get is quite appropriate.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From servalan at ar.com.au  Wed Feb  9 14:34:48 2005
From: servalan at ar.com.au (Pauline)
Date: Wed Feb  9 14:35:25 2005
Subject: [gutvol-d] Issues with links from posted notices failing
	in	Firefox 1.0
In-Reply-To: <420A5543.7060709@perathoner.de>
References: <42095D94.1020509@ar.com.au> <420A5543.7060709@perathoner.de>
Message-ID: <420A9008.6060608@ar.com.au>

Hiya Marcello,

Marcello Perathoner wrote:

> I'm using Firefox 1.0 (Linux) for development. I don't have any such
>  problems.
> 
> Are you using some web filtering proxy?

Nope.

> Did you try
> 
> http://www.gutenberg.net/dirs/1/4/9/8/14980/
> 
> with a trailing slash? (which, by the way, is the correct url for a 
> directory)

I'm using the urls which appear in the PG posted notices e.g.

Friedrich v. Schiller's Biographie, by H. Doering
  14997
   [Language: German]
   [Link: http://www.gutenberg.net/1/4/9/9/14997 ]
   [Files: 14997-8.txt]

> Can you install the HTTP Live Headers Plugin and send me a dump of
> the request your browser is generating?

I can, but whatever the problem was I cannot reproduce it today. Looks
like it has been fixed & interestingly enough, when I click on a link
without a trailing /, the redirected URL has one tacked on the end. e.g.
http://www.gutenberg.net/1/4/9/9/14994
becomes:
http://www.gutenberg.org/dirs/1/4/9/9/14994/
in the browser address bar.
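
The path layout in these links can be sketched in a few lines of Python. This is only inferred from the example URLs in this thread (every digit of the eBook number except the last becomes a directory level, followed by the full number); the function name and the handling of single-digit numbers are assumptions, not the server's actual code.

```python
def ebook_dir_url(number, base="http://www.gutenberg.org/dirs"):
    """Build the directory URL for an eBook number, with trailing slash.

    Layout inferred from the links quoted in this thread,
    e.g. 14994 -> /dirs/1/4/9/9/14994/
    """
    digits = str(number)
    # Every digit but the last is one directory level.
    # Assumption: single-digit numbers fall under a "0" directory.
    parts = list(digits[:-1]) or ["0"]
    return f"{base}/{'/'.join(parts)}/{digits}/"


print(ebook_dir_url(14994))
```

Building the URL with the trailing slash up front skips the extra redirect described above.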

>> There is a discussion at DP on this issue here: 
>> http://www.pgdp.net/phpBB2/viewtopic.php?p=109131#109131
> 
> 
> That's fine, because my DP account somehow got removed.

I see an account for you (I'm a DP site admin). If you have hassles
logging in, email dphelp@pgdp.net.

I'll post to that thread & see if this issue is now resolved for the
other users.

>> Some users say if they change the skin back to the default, links
>> work OK again. No such luck for me.
> 
> 
> Skins have nothing to do with that.

One of those weird coincidence things then. :)

Thanks,
P
From servalan at ar.com.au  Wed Feb  9 14:37:38 2005
From: servalan at ar.com.au (Pauline)
Date: Wed Feb  9 14:38:13 2005
Subject: [gutvol-d] Issues with links from posted notices failing
	in	Firefox 1.0
In-Reply-To: <420A8BB3.6090201@perathoner.de>
References: <42095D94.1020509@ar.com.au> <420A8BB3.6090201@perathoner.de>
Message-ID: <420A90B2.5040305@ar.com.au>

Marcello Perathoner wrote:
> Pauline wrote:
> 
>> In Firefox 1.0 links to recently posted etexts from the posted 
>> mailing list such as:
>> http://www.gutenberg.net/1/4/9/8/14980
>>
>> fail.  I see an error:
>>
>> Files Lookup
>>
>> I see no such file here! (1/4/9/8/14980)
> 
> 
> Also, I noticed that the posting note gets sent *before* the files are 
> posted. In that case the error message you get is quite appropriate.

I realise that. But the links were working fine in IE, just not in 
Firefox. The error message for a pre-emptive posting note is different & 
fails in all browsers.

Whatever the problem was, it's fixed today for me.

Thanks,
P

From fvandrog at scripps.edu  Wed Feb  9 17:58:44 2005
From: fvandrog at scripps.edu (Frank van Drogen)
Date: Wed Feb  9 17:58:48 2005
Subject: [gutvol-d] What about 15000??
In-Reply-To: <420A90B2.5040305@ar.com.au>
References: <42095D94.1020509@ar.com.au> <420A8BB3.6090201@perathoner.de>
	<420A90B2.5040305@ar.com.au>
Message-ID: <6.2.0.8.0.20050209175751.029e5da0@mail.scripps.edu>

eBook 14999 was posted today, as was eBook 15001 :)

What about the one in between??

Frank

From servalan at ar.com.au  Wed Feb  9 18:21:11 2005
From: servalan at ar.com.au (Pauline)
Date: Wed Feb  9 18:21:52 2005
Subject: [gutvol-d] Issues with links from posted notices
	failing	in	Firefox 1.0
In-Reply-To: <420A90B2.5040305@ar.com.au>
References: <42095D94.1020509@ar.com.au> <420A8BB3.6090201@perathoner.de>
	<420A90B2.5040305@ar.com.au>
Message-ID: <420AC517.4040306@ar.com.au>

Pauline wrote:
> Marcello Perathoner wrote:

>> Also, I noticed that the posting note gets sent *before* the files 
>> are posted. In that case the error message you get is quite 
>> appropriate.
> 
> 
> I realise that. But the links were working fine in IE, just not in 
> Firefox. The error message for a pre-emptive posting note is 
> different & fails in all browsers.
> 
> Whatever the problem was, it's fixed today for me.

I spoke too soon. The links work ok from a link in email, they fail when
opened as a link from another web page. e.g. from within a DP Forum post.
They also seem to work ok if you cut & paste the URL directly into a browser
window as I am doing for IE.

Also - I see recursive redirect behaviour in Firefox for links like:
http://www.gutenberg.org/dirs/1/4/9/0/14908/14908-h/14908-h.htm

in IE I am presented with the HTML version of the text, in Firefox I get
sent to:
http://www.gutenberg.org/etext/14908

& clicking on the HTML link from the catalogue page in Firefox just
winds up back at the catalogue page. There is an
explanation of this behaviour on the DP Forums here:
http://www.pgdp.net/phpBB2/viewtopic.php?p=109273#109273

I'm happy to help debug off-list if needed, I'll go & install that
Firefox extension. Contact me if you want my help, or you can
track the discussion on the DP Forums.

Needless to say navigating PG as a Firefox user is v. frustrating at the
moment.

Thanks,
P

From phil at thalasson.com  Thu Feb 10 17:28:28 2005
From: phil at thalasson.com (Philip Baker)
Date: Thu Feb 10 17:30:27 2005
Subject: [gutvol-d] Issues with links from posted
	notices	failing	in	Firefox 1.0
In-Reply-To: <420AC517.4040306@ar.com.au>
Message-ID: <hlMw7UA8oADCFwsg@thalasson.com>

Pauline <servalan@ar.com.au> writes
>
>I spoke too soon. The links work OK when opened from an email, but they fail
>when opened as a link from another web page, e.g. from within a DP Forum post.
>They also seem to work OK if you cut & paste the URL directly into a browser
>window, as I am doing for IE.
>
>Also, I see recursive redirect behaviour in Firefox for links like:
>http://www.gutenberg.org/dirs/1/4/9/0/14908/14908-h/14908-h.htm
>
>In IE I am presented with the HTML version of the text; in Firefox I get
>sent to:
>http://www.gutenberg.org/etext/14908
>
>Clicking on the HTML link on the catalogue page in Firefox just winds up
>back at the catalogue page. There is an explanation of this behaviour on
>the DP Forums here:
>http://www.pgdp.net/phpBB2/viewtopic.php?p=109273#109273
>
>I'm happy to help debug off-list if needed; I'll go & install that
>Firefox extension. Contact me if you want my help, or you can
>track the discussion on the DP Forums.
>
>Needless to say navigating PG as a Firefox user is v. frustrating at the
>moment.
>

Check what kind of HTTP_REFERER value, if any, your Firefox is
configured to send to a web server. I believe Firefox uses the term
'network.http.sendRefererHeader' for this in its configuration options.
-- 
Philip Baker
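
To make the Referer suggestion concrete, here is a minimal Python sketch of
the kind of server-side check that would produce some of the symptoms
described in this thread. It is an assumption, not the actual gutenberg.org
configuration: the function name would_redirect, the ALLOWED_REFERER_HOSTS
set and the rule itself are all hypothetical. It only models "works from
email or a pasted URL, fails from another site's page"; it does not try to
explain the loop seen from the catalogue page itself. As far as I know,
network.http.sendRefererHeader takes 0 (never send), 1 (send for clicked
links only) or 2 (send for links and embedded images, the default), so a
value of 0 would correspond to the "no Referer" case below.

```python
from urllib.parse import urlsplit

# Hypothetical: hosts whose pages are allowed to link straight to the files.
ALLOWED_REFERER_HOSTS = {"www.gutenberg.org", "gutenberg.org"}


def would_redirect(referer):
    """Return True if this assumed anti-hotlinking rule would bounce the
    request to the catalogue page instead of serving the file."""
    if not referer:
        # No Referer header: link opened from a mail client, URL pasted
        # into the address bar, or the browser told not to send one.
        return False
    host = urlsplit(referer).hostname or ""
    return host not in ALLOWED_REFERER_HOSTS


if __name__ == "__main__":
    cases = [
        None,
        "http://www.gutenberg.org/etext/14908",
        "http://www.pgdp.net/phpBB2/viewtopic.php?p=109273",
    ]
    for referer in cases:
        outcome = "redirect" if would_redirect(referer) else "serve file"
        print(referer, "->", outcome)
```

If the real rule is anything like this, comparing the same URL with and
without a Referer header would be a quick way to confirm or rule it out.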
From sly at victoria.tc.ca  Tue Feb 15 21:47:53 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Feb 15 21:48:14 2005
Subject: [gutvol-d] Tamil eBooks site
Message-ID: <Pine.GSO.4.58.0502152145140.24499@vtn1.victoria.tc.ca>


Doing a little bit of browsing of ebooks in other languages,
I found a project working on Tamil texts that appears to
be using the DP software to process their texts.

See:
http://www.tamil.net/projectmadurai/dppm.html

It's nice to see more efforts out there digitizing old
literature, but as usual, they are not as free about
having their texts redistributed as PG is.


Andrew
From shalesller at writeme.com  Tue Feb 15 23:41:50 2005
From: shalesller at writeme.com (D. Starner)
Date: Tue Feb 15 23:42:11 2005
Subject: Canceling Clearances (Was: [gutvol-d] Top 100 EBooks this week
	(and other stories))
Message-ID: <20050216074150.6430F4BDAA@ws1-1.us4.outblaze.com>

> The other story I want to mention has to do with 2 books that I had
> cleared that I am not able to proceed with at this time.
>
> gbn0307140503: Unknown, The Reason Why--Natural History.  bill jenness
> <cr502@freenet.carleton.ca>.  1860c.  7/24/2003.  ok.
>
> and
>
> gbn0307140507: warren colburn, Arithmetic upon the inductive method of
> instruction....  bill jenness <cr502@freenet.carleton.ca>.  1856p1826c.
> 7/24/2003.
> ok.
>
> These books are no longer in my possession and my scanner is toast, I am
> limping along with a p166 until I can afford to pick up some new equipment.

There ought to be a standard way of canceling clearances, at least for
the new system. I've got clearances that turned out to be done already,
and ones that should be done, but the copy I have isn't in a condition
I can get usable scans out of, which sometimes turns up only when I start
scanning.

From nwolcott at dsdial.net  Wed Feb 16 06:03:02 2005
From: nwolcott at dsdial.net (N Wolcott)
Date: Wed Feb 16 06:24:59 2005
Subject: Canceling Clearances (Was: [gutvol-d] Top 100 EBooks this
	week(and other stories))
References: <20050216074150.6430F4BDAA@ws1-1.us4.outblaze.com>
Message-ID: <007c01c51433$3397c840$a99495ce@gw98>

I often submit clearances for books I think I might get around to if only so
that PG knows they are available PD and others can make use of the
clearance. Not a dog in the manger thing.
----- Original Message -----
From: "D. Starner" <shalesller@writeme.com>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Wednesday, February 16, 2005 2:41 AM
Subject: Canceling Clearances (Was: [gutvol-d] Top 100 EBooks this week(and
other stories))


> > The other story I want to mention has to do with 2 books that I had
> > cleared that I am not able to proceed with at this time.
> >
> > gbn0307140503: Unknown, The Reason Why--Natural History. bill jenness
> > <cr502@freenet.carleton.ca>. 1860c. 7/24/2003. ok.
> >
> > and
> >
> > gbn0307140507: warren colburn, Arithmetic upon the inductive method of
> > instruction.... bill jenness <cr502@freenet.carleton.ca>. 1856p1826c.
> > 7/24/2003.
> > ok.
> >
> > These books are no longer in my possession and my scanner is toast, I am
> > limping along with a p166 until I can afford to pick up some new
> > equipment.
>
> There ought to be a standard way of canceling clearances, at least for
> the new system. I've got clearances that turned out to be done already,
> and ones that should be done, but the copy I have isn't in a condition
> I can get usable scans out of, which sometimes turns up only when I start
> scanning.

From joshua at hutchinson.net  Wed Feb 16 06:29:05 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Feb 16 06:29:09 2005
Subject: Canceling Clearances (Was: [gutvol-d] Top 100 EBooks
	thisweek(and other stor
Message-ID: <20050216142905.35F174F4CB@ws6-5.us4.outblaze.com>

That's probably not a good idea.  The only "list" we have of things people are supposedly working on is the clearance list.  I know a lot of people will skip a book if they see someone else has cleared it recently.  You should probably only clear something if you plan on working on it... and probably only if you plan on working on it soon.  DP churns through a lot of books and if you clear something but don't work on it for a year, you're blocking that content from DP.

Josh

----- Original Message -----
From: "N Wolcott" <nwolcott@dsdial.net>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: Canceling Clearances (Was: [gutvol-d] Top 100 EBooks thisweek(and other stories))
Date: Wed, 16 Feb 2005 09:03:02 -0500

> 
> I often submit clearances for books I think I might get around to if only so
> that PG knows they are available PD and others can make use of the
> clearance. Not a dog in the manger thing.

From shalesller at writeme.com  Wed Feb 16 07:47:10 2005
From: shalesller at writeme.com (D. Starner)
Date: Wed Feb 16 07:47:31 2005
Subject: Canceling Clearances (Was: [gutvol-d] Top 100 EBooks
	thisweek(and other stories))
Message-ID: <20050216154710.8511B4BE64@ws1-1.us4.outblaze.com>

"N Wolcott" writes:

> I often submit clearances for books I think I might get around to if only so 
> that PG knows they are available PD and others can make use of the 
> clearance. Not a dog in the manger thing. 

Except in the rare case of renewal notices, if someone cares if they are PD,
it's trivial to check. I can't make use of the clearance, as I've to clear
my own copy of the book, but your clearance will tell me that I shouldn't
bother, since other people are working on it. 

From hart at pglaf.org  Wed Feb 16 12:52:30 2005
From: hart at pglaf.org (Michael Hart)
Date: Wed Feb 16 12:52:31 2005
Subject: [gutvol-d] Tamil eBooks site
In-Reply-To: <Pine.GSO.4.58.0502152145140.24499@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0502152145140.24499@vtn1.victoria.tc.ca>
Message-ID: <Pine.LNX.4.60.0502161248430.4044@pglaf.org>


On Tue, 15 Feb 2005, Andrew Sly wrote:

>
> Doing a little bit of browsing of ebooks in other languages,
> I found a project working on Tamil texts that appears to
> be using the DP software to process their texts.
>
> See:
> http://www.tamil.net/projectmadurai/dppm.html
>
> It's nice to see more efforts out there digitizing old
> literature, but as usual, they are not as free about
> having their texts redistributed as PG is.
>
> Andrew

We've never required that anyone using PG services,
even copyright research, give their results back to PG.

We are here to encourage the creation and distribution
of eBooks.

We don't have to create and distribute all the eBooks
we have some involvement with.

"There is no end to the great things we can accomplish
if we don't worry about who gets the credit."  - Anon.

Life is an open book test,
without any time limits.
So let's provide more books.

The continuing standard of living of humankind
is how we measure the value of our work.


Michael

From ag737 at freenet.carleton.ca  Wed Feb 16 13:47:12 2005
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Wed Feb 16 13:47:23 2005
Subject: [gutvol-d] Re: Canceling Clearances
Message-ID: <571a7156e479.56e479571a71@ncf.ca>

It would be a perfectly good idea, if the clearance system and IP list 
were dynamic and allowed for contact between volunteers.

I'm shocked that the system is still the way it is. I'm half surprised 
it's not on the back of envelopes and napkins.


> That's probably not a good idea.  The only "list" we have of things
> people are supposedly working on is the clearance list.  I know a lot
> of people will skip a book if they see someone else has cleared it
> recently.  You should probably only clear something if you plan on
> working on it... and probably only if you plan on working on it soon.
> DP churns through a lot of books and if you clear something but don't
> work on it for a year, you're blocking that content from DP.



From joshua at hutchinson.net  Wed Feb 16 14:06:49 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Feb 16 14:06:58 2005
Subject: [gutvol-d] Re: Canceling Clearances
Message-ID: <20050216220649.E51FC1099CE@ws6-4.us4.outblaze.com>

Well, if pigs flew ... fried pork wings would be a perfectly good idea, too.

The point is, we don't have a dynamic system.  We have what we have (until someone makes something better).  That means, currently, creating book clearances you don't intend to use shortly is a "bad idea."

I'm not trying to be rude; I'm just trying to point out a behavior that doesn't work well in the currently implemented clearance system.

Josh


----- Original Message -----
From: "Wallace J.McLean" <ag737@freenet.carleton.ca>
> 
> It would be a perfectly good idea, if the clearance system and IP list
> was dynamic and allowed for contact between volunteers.
> 
> I'm shocked that the system is still the way it is. I'm half surprised
> it's not on the back of envelopes and napkins.

From alex at awstudios.net  Thu Feb 17 07:52:09 2005
From: alex at awstudios.net (Alex Wilson)
Date: Thu Feb 17 08:39:26 2005
Subject: [gutvol-d] David Wyllie Email address
Message-ID: <BE3A27D9.3C61D%alex@awstudios.net>

About a month ago Greg Newby offered to get me in touch with David
Wyllie--who provided the English translation of Kafka's Metamorphosis for
PG--and I haven't heard from him since. I'm thinking Greg's emails or mine
are ending up in a junk mail folder, so I'm wondering if anyone here knows
how I can get in touch with Mr. Wyllie.

Thanks.

Alex.

http://www.telltaleweekly.org - Funding a Free Audiobook Library


From hart at pglaf.org  Thu Feb 17 10:01:53 2005
From: hart at pglaf.org (Michael Hart)
Date: Thu Feb 17 10:01:54 2005
Subject: [gutvol-d] David Wyllie Email address
In-Reply-To: <BE3A27D9.3C61D%alex@awstudios.net>
References: <BE3A27D9.3C61D%alex@awstudios.net>
Message-ID: <Pine.LNX.4.60.0502171001190.28262@pglaf.org>


I sent the <dandelion> address,
unless someone has a better one.

Michael


On Thu, 17 Feb 2005, Alex Wilson wrote:

> About a month ago Greg Newby offered to get me in touch with David
> Wyllie--who provided the English translation of Kafka's Metamorphosis for
> PG--and I haven't heard from him since. I'm thinking Greg's emails or mine
> are ending up in a junk mail folder, so I'm wondering if anyone here knows
> how I can get in touch with Mr. Wyllie.
>
> Thanks.
>
> Alex.
>
> http://www.telltaleweekly.org - Funding a Free Audiobook Library
>
From miranda_vandeheijning at blueyonder.co.uk  Sat Feb 19 02:23:03 2005
From: miranda_vandeheijning at blueyonder.co.uk (Miranda van de Heijning)
Date: Sat Feb 19 02:23:33 2005
Subject: [gutvol-d] 500th French book
In-Reply-To: <Pine.LNX.4.60.0502171001190.28262@pglaf.org>
References: <BE3A27D9.3C61D%alex@awstudios.net>
	<Pine.LNX.4.60.0502171001190.28262@pglaf.org>
Message-ID: <42171387.5020807@blueyonder.co.uk>


Hi guys,

There are 485 French books in PG at the moment, so we will be reaching 
500 pretty soon. Has any thought been given yet about what could be the 
500th book? If no decision has been made, there are quite a few George 
Sand titles coming up from DP and they may be suitable, considering that we 
are working on providing her complete works.

Secondly, are there any statistics on which are the most popular French 
books? I know that Le Kama Soutra is quite a crowdpleaser, but what 
about the rest?

Miranda



From gbnewby at pglaf.org  Sat Feb 19 21:49:56 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Feb 19 21:49:58 2005
Subject: [gutvol-d] 500th French book
In-Reply-To: <42171387.5020807@blueyonder.co.uk>
References: <BE3A27D9.3C61D%alex@awstudios.net>
	<Pine.LNX.4.60.0502171001190.28262@pglaf.org>
	<42171387.5020807@blueyonder.co.uk>
Message-ID: <20050220054956.GB30309@pglaf.org>

On Sat, Feb 19, 2005 at 10:23:03AM +0000, Miranda van de Heijning wrote:
> 
> Hi guys,
> 
> There are 485 French books in PG at the moment, so we will be reaching 
> 500 pretty soon. Has any thought been given yet about what could be the 
> 500th book? If no decision has been made, there are quite a few George 
> Sand titles coming up from DP and they may be suitable, considering that we 
> are working on providing her complete works.

I don't think anyone has suggested one yet.  Sand sounds
like a good choice.  We also have a nice array of Jules Verne
and Victor Hugo, and I've noticed some Shakespeare translations.

> Secondly, are there any statistics on which are the most popular French 
> books? I know that Le Kama Soutra is quite a crowdpleaser, but what 
> about the rest?

There's a "top 100" list at http://gutenberg.org/catalog
There is also a non-public analysis of the download
statistics.  Both of these are for ibiblio only, so while they're
useful they don't represent other download sources (notably,
our many mirrors).

You'd need to look through the download list "by hand" to spot the
French titles.  Email me if you want the URL & username+password,
and I'll dig it up.
  -- Greg


From miranda_vandeheijning at blueyonder.co.uk  Sun Feb 20 03:11:33 2005
From: miranda_vandeheijning at blueyonder.co.uk (Miranda van de Heijning)
Date: Sun Feb 20 03:11:59 2005
Subject: [gutvol-d] 500th French book
In-Reply-To: <20050220054956.GB30309@pglaf.org>
References: <BE3A27D9.3C61D%alex@awstudios.net>
	<Pine.LNX.4.60.0502171001190.28262@pglaf.org>
	<42171387.5020807@blueyonder.co.uk>
	<20050220054956.GB30309@pglaf.org>
Message-ID: <42187065.4060107@blueyonder.co.uk>

Hi all,

I have just looked through the download info which Marcello very kindly 
compiled for me and I would like to suggest we post as the 500th book 
part 1 of 'Sodome et Gomorrhe'.

It is part of Proust's classic A la recherche du temps perdu and the 
only remaining volume which we can actually post to PG. This is because 
the other parts of the series were published after his death, between 
1923 and 1927. We already have Sodome et Gomorrhe 2.

Sodome et Gomorrhe 1 is close to finishing proofing at Distributed 
Proofreaders (162 pages to go in round 2) so I expect it will be 
available for post-processing/posting soon.

Or are there any other suggestions?

Miranda


Greg Newby wrote:

>On Sat, Feb 19, 2005 at 10:23:03AM +0000, Miranda van de Heijning wrote:
>
>>Hi guys,
>>
>>There are 485 French books in PG at the moment, so we will be reaching 
>>500 pretty soon. Has any thought been given yet about what could be the 
>>500th book? If no decision has been made, there are quite a few George 
>>Sand titles coming up from DP and they may be suitable, considering that we 
>>are working on providing her complete works.
>
>I don't think anyone has suggested one yet.  Sand sounds
>like a good choice.  We also have a nice array of Jules Verne
>and Victor Hugo, and I've noticed some Shakespeare translations.
>
>>Secondly, are there any statistics on which are the most popular French 
>>books? I know that Le Kama Soutra is quite a crowdpleaser, but what 
>>about the rest?
>
>There's a "top 100" list at http://gutenberg.org/catalog
>There is also a non-public analysis of the download
>statistics.  Both of these are for ibiblio only, so while they're
>useful they don't represent other download sources (notably,
>our many mirrors).
>
>You'd need to look through the download list "by hand" to spot the
>French titles.  Email me if you want the URL & username+password,
>and I'll dig it up.
>  -- Greg

From bill at truthdb.org  Sun Feb 20 22:20:34 2005
From: bill at truthdb.org (bill jenness)
Date: Sun Feb 20 22:20:54 2005
Subject: [gutvol-d] pgdvd access index file error
Message-ID: <1272.134.117.137.41.1108966834.squirrel@134.117.137.41>

I have just downloaded the dvd from ibiblio and there seems to be a
problem with the index.htm in the access directory. Here is an excerpt to
illustrate the problem:

<A HREF="gtnletC.htm#cooperja">James Fenimore Cooper</A>

The file referred to on the DVD is actually gtnletc.htm, and this error is
propagated throughout the index. The gtnanon.htm file is correctly linked,
but the links for the others are not. This may not be a problem on a
Windows machine, but it is on a case-sensitive filesystem. The download
file was 10802.iso; the file date is Nov 22, 2003. It should be fairly
trivial to fix.

Two ways to correct this are to change the links in access/index.htm or to
rename the files in that directory to agree. Either way it would mean
opening the ISO for editing to make the repair, then correcting the md5sum
to match.

Is this something that has already been looked at?
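
For anyone who wants to see how far the problem spreads, here is a rough
Python sketch that lists links whose target differs from a real file name
only by case. It is an illustration under assumptions, not a tested tool for
the DVD: the function name case_mismatched_links is made up, it assumes every
link is written as a double-quoted HREF like the excerpt above, and the
"Anonymous" link text in the sample is invented for the example.

```python
import re

# Matches HREF="file.htm" or HREF="file.htm#anchor"; group 1 is the file part.
HREF_RE = re.compile(r'href="([^"#]+)(?:#[^"]*)?"', re.IGNORECASE)


def case_mismatched_links(html, filenames):
    """Return sorted (link_target, actual_name) pairs that differ only by case."""
    by_lower = {name.lower(): name for name in filenames}
    mismatches = set()
    for target in HREF_RE.findall(html):
        actual = by_lower.get(target.lower())
        # Targets with no matching file at all (e.g. external URLs) are skipped.
        if actual is not None and actual != target:
            mismatches.add((target, actual))
    return sorted(mismatches)


if __name__ == "__main__":
    sample = (
        '<A HREF="gtnletC.htm#cooperja">James Fenimore Cooper</A>\n'
        '<A HREF="gtnanon.htm">Anonymous</A>\n'
    )
    print(case_mismatched_links(sample, ["gtnletc.htm", "gtnanon.htm"]))
```

On a copy of the access directory, the file list could come from
os.listdir() and the html from reading index.htm.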
From jon_niehof at yahoo.com  Sun Feb 20 23:12:18 2005
From: jon_niehof at yahoo.com (Jon Niehof)
Date: Sun Feb 20 23:12:36 2005
Subject: [gutvol-d] pgdvd access index file error
In-Reply-To: <1272.134.117.137.41.1108966834.squirrel@134.117.137.41>
Message-ID: <20050221071218.218.qmail@web80905.mail.scd.yahoo.com>

> I have just downloaded the dvd from ibiblio and there seems to
> be a problem with the index.htm in the access directory. Here
> is an excerpt to illustrate the problem:
> 
> <A HREF="gtnletC.htm#cooperja">James Fenimore Cooper</A>
> 
> The file referred to on the dvd is actually gtnletc.htm and
> this error is propagated throughout the index. The gtnanon.htm
> file is correctly linked but the links for the others are not.
> This may not be a problem on a windows machine but it is on a
> case sensitive filesystem.

You don't say on what sort of system you had mounted the DVD. Is
it possible the DVD has Joliet or Rock Ridge extensions but they
are not being read by your system? If there are no such
extensions (and indeed they would seem to be against PG
philosophy of least common denominator), I would expect the OS
should treat the ISO9660 filesystem as case-insensitive; it's
often translated anyhow (e.g. filenames usually show up as
lowercase on my Linux box). Of course, I haven't validated this
behaviour as either required or actually implemented ;) but if
the filenames are to be corrected due to case sensitivity, making
them all caps would, I believe, be more accurate.


From gbnewby at pglaf.org  Mon Feb 21 00:03:50 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Feb 21 00:03:52 2005
Subject: [gutvol-d] pgdvd access index file error
In-Reply-To: <1272.134.117.137.41.1108966834.squirrel@134.117.137.41>
References: <1272.134.117.137.41.1108966834.squirrel@134.117.137.41>
Message-ID: <20050221080350.GA5557@pglaf.org>

On Mon, Feb 21, 2005 at 01:20:34AM -0500, bill jenness wrote:
> I have just downloaded the dvd from ibiblio and there seems to be a
> problem with the index.htm in the access directory. Here is an excerpt to
> illustrate the problem:
> 
> <A HREF="gtnletC.htm#cooperja">James Fenimore Cooper</A>
> 
> The file referred to on the dvd is actually gtnletc.htm and this error is
> propagated throughout the index. The gtnanon.htm file is correctly linked
> but the links for the others are not. This may not be a problem on a
> windows machine but it is on a case sensitive filesystem. The download
> file was 10802.iso, the file date is Nov 22, 2003. It should be fairly
> trivial to fix.
> 
> Two ways to correct this are change the hotlinks in access/index.htm or
> change the filenames in that directory to agree. Either way it would mean
> opening the iso for editing to make the repair then correcting the md5sum
> to match.
> 
> Is this something that has already been looked at?

This is a known problem.  It's on the list, with a few other things,
to fix in the next iteration of the ISO image.  I keep thinking I'm going
to roll out a brand new ISO, but it hasn't happened yet.  Meanwhile, for
most people, just editing the index.htm to all lower-case is a good
"quick hack" solution - assuming you've copied all the files to your
hard drive.  Something like this:
	cp index.htm /tmp/oldindex.htm
	tr 'A-Z' 'a-z' < /tmp/oldindex.htm > index.htm

  -- Greg
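
The tr approach lower-cases the whole file, including the author names
that readers see. As a gentler variant, here is a minimal Python sketch that
lower-cases only the file-name part of each link and leaves anchors and
visible text alone. It is an assumption-laden sketch rather than a tested
fix for the ISO: the function name lowercase_link_targets is made up, and it
assumes every link is a double-quoted HREF like the one in the excerpt. It
would also lower-case any external URLs in the index, so check for those
first.

```python
import re

# Group 1 is the literal HREF=" prefix, group 2 the file name up to any #anchor.
HREF_RE = re.compile(r'(href=")([^"#]+)', re.IGNORECASE)


def lowercase_link_targets(html):
    """Lower-case the file-name part of every HREF, leaving the rest untouched."""
    return HREF_RE.sub(lambda m: m.group(1) + m.group(2).lower(), html)


if __name__ == "__main__":
    line = '<A HREF="gtnletC.htm#cooperja">James Fenimore Cooper</A>'
    print(lowercase_link_targets(line))
```

To apply it, read the copied index.htm into a string, pass it through the
function, and write the result back, keeping the original as a backup as in
the cp step above.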
From hart at pglaf.org  Mon Feb 21 11:06:02 2005
From: hart at pglaf.org (Michael Hart)
Date: Mon Feb 21 11:06:05 2005
Subject: [gutvol-d] 500th French book
In-Reply-To: <42187065.4060107@blueyonder.co.uk>
References: <BE3A27D9.3C61D%alex@awstudios.net>
	<Pine.LNX.4.60.0502171001190.28262@pglaf.org>
	<42171387.5020807@blueyonder.co.uk> <20050220054956.GB30309@pglaf.org>
	<42187065.4060107@blueyonder.co.uk>
Message-ID: <Pine.LNX.4.60.0502211104400.15772@pglaf.org>


Don't forget, all of Proust can be posted at Project Gutenberg sites in
countries with "life +50" and "life +70" copyright terms, since he died
so long ago (1922).

Michael


On Sun, 20 Feb 2005, Miranda van de Heijning wrote:

> Hi all,
>
> I have just looked through the download info which Marcello very kindly 
> compiled for me and I would like to suggest we post as the 500th book part 1 
> of 'Sodome et Gomorrhe'.
>
> It is part of Proust's classic A la recherche du temps perdu and the only 
> remaining volume which we can actually post to PG. This is because the other 
> parts of the series were published after his death, between 1923 and 1927. We 
> already have Sodome et Gomorrhe 2.
>
> Sodome et Gomorrhe 1 is close to finishing proofing at Distributed 
> Proofreaders (162 pages to go in round 2) so I expect it will be available 
> for post-processing/posting soon.
>
> Or are there any other suggestions?
>
> Miranda
From sly at victoria.tc.ca  Mon Feb 21 21:53:23 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Feb 21 21:53:41 2005
Subject: [gutvol-d] Understatment of the day
Message-ID: <Pine.GSO.4.58.0502212152001.29044@vtn1.victoria.tc.ca>


Here's a little mention of PG from usenet...

Newsgroups: alt.games.video.nintendo.gameboy.advance
Date: 2005-02-19 12:22:54 PST

Files for books?  Text files are available for books all over.
Project Gutenberg has hundreds of public domain text files.


From miranda_vandeheijning at blueyonder.co.uk  Tue Feb 22 01:50:56 2005
From: miranda_vandeheijning at blueyonder.co.uk (Miranda van de Heijning)
Date: Tue Feb 22 01:51:23 2005
Subject: [gutvol-d] 500th French book
In-Reply-To: <Pine.LNX.4.60.0502211104400.15772@pglaf.org>
References: <BE3A27D9.3C61D%alex@awstudios.net>
	<Pine.LNX.4.60.0502171001190.28262@pglaf.org>
	<42171387.5020807@blueyonder.co.uk>
	<20050220054956.GB30309@pglaf.org>
	<42187065.4060107@blueyonder.co.uk>
	<Pine.LNX.4.60.0502211104400.15772@pglaf.org>
Message-ID: <421B0080.8060402@blueyonder.co.uk>

My intention is to continue A la recherche du temps perdu on DP-EU and 
hopefully, one of the other PG sites will be able to publish them.

After that, we just need to wait for US copyright to move along a few 
years and then PG-US will have the full lot as well. :-)

Miranda


Michael Hart wrote:

>
> Don't forget, all of Proust can be posted at Project Gutenberg sites
> with "life +50" and +70 copyrights, since he died so long ago.
>
> Michael
>
>
> On Sun, 20 Feb 2005, Miranda van de Heijning wrote:
>
>> Hi all,
>>
>> I have just looked through the download info which Marcello very 
>> kindly compiled for me and I would like to suggest we post as the 
>> 500th book part 1 of 'Sodome et Gomorrhe'.
>>
>> It is part of Proust's classic A la recherche du temps perdu and the 
>> only remaining volume which we can actually post to PG. This is 
>> because the other parts of the series were published after his death, 
>> between 1923 and 1927. We already have Sodome et Gomorrhe 2.
>>
>> Sodome et Gomorrhe 1 is close to finishing proofing at Distributed 
>> Proofreaders (162 pages to go in round 2) so I expect it will be 
>> available for post-processing/posting soon.
>>
>> Or are there any other suggestions?
>>
>> Miranda
>>
>>
>>
>> Greg Newby wrote:
>>
>>> On Sat, Feb 19, 2005 at 10:23:03AM +0000, Miranda van de Heijning 
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> There are 485 French books in PG at the moment, so we will be 
>>>> reaching 500 pretty soon. Has any thought been given yet about what 
>>>> could be the 500th book? If no decision has been made, there are 
>>>> quite a few George Sand's coming up from DP and they may be 
>>>> suitable, considering that we are working on providing her complete 
>>>> works.
>>>>
>>>
>>> I don't think anyone has suggested one yet.  Sand sounds
>>> like a good choice.  We also have a nice array of Jules Verne
>>> and Victor Hugo, and I've noticed some Shakespeare translations.
>>>
>>>
>>>> Secondly, are there any statistics on which are the most popular 
>>>> French books? I know that Le Kama Soutra is quite a crowdpleaser, 
>>>> but what about the rest?
>>>>
>>>
>>> There's a "top 100" list at http://gutenberg.org/catalog
>>> There is also a non-public analysis of the download
>>> statistics.  Both of these are for ibiblio only, so while they're
>>> useful they don't represent other download sources (notably,
>>> our many mirrors).
>>>
>>> You'd need to look through the download list "by hand" to spot the
>>> French titles.  Email me if you want the URL & username+password,
>>> and I'll dig it up.
>>>  -- Greg
>>>
>>>
>>>
>>>> Michael Hart wrote:
>>>>
>>>>
>>>>> I sent the <dandelion> address,
>>>>> unless someone has a better one.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>> On Thu, 17 Feb 2005, Alex Wilson wrote:
>>>>>
>>>>>
>>>>>> About a month ago Greg Newby offered to get me in touch with David
>>>>>> Wyllie--who provided the English translation of Kafka's 
>>>>>> Metamorphosis for
>>>>>> PG--and I haven't heard from him since. I'm thinking Greg's 
>>>>>> emails or mine
>>>>>> are ending up in a junk mail folder, so I'm wondering if anyone 
>>>>>> here knows
>>>>>> how I can get in touch with Mr. Wyllie.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Alex.
>>>>>>
>>>>>> http://www.telltaleweekly.org - Funding a Free Audiobook Library
>>>>>>

From marcello at perathoner.de  Tue Feb 22 13:36:01 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Feb 22 16:13:07 2005
Subject: [gutvol-d] Gutenbergprosjektet (PG mentioned in Norwegian web site)
Message-ID: <421BA5C1.2070605@perathoner.de>


   http://www.dinside.no/php/art.php?id=117187


-- 
Marcello Perathoner
webmaster@gutenberg.org


From nwolcott at dsdial.net  Wed Feb 23 07:49:16 2005
From: nwolcott at dsdial.net (N Wolcott)
Date: Wed Feb 23 07:50:17 2005
Subject: [gutvol-d] Pepys' birthday
Message-ID: <005901c519bf$420ca4e0$ac9495ce@gw98>

It being his birthday, maybe this is appropriate. Pepys gave his memoirs to Cambridge University, but the full text was not published until 1970. That being the case, would not the text (minus editorial comment and added footnotes) now be in the public domain, as it is now more than 75 years since the author's death? 


N Wolcott  nwolcott2@post.harvard.edu
From hyphen at hyphenologist.co.uk  Wed Feb 23 09:07:23 2005
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Wed Feb 23 09:07:47 2005
Subject: [gutvol-d] Pepys' birthday
In-Reply-To: <005901c519bf$420ca4e0$ac9495ce@gw98>
References: <005901c519bf$420ca4e0$ac9495ce@gw98>
Message-ID: <akdp11tcmheplqo5u9q7gu94sj8ocakv78@4ax.com>

On Wed, 23 Feb 2005 10:49:16 -0500,  "N Wolcott" <nwolcott@dsdial.net>
wrote:

| Being his birthday maybe this is appropriate. Pepys gave his 
| memoirs to Cambridge University, 

Probably not the University; more likely a college.
Do you have any idea which college?

| but the full text was not 
| published  until 1970. That being the case would not the text 
| (minus editorial comment and added footnotes) be now public 
| domain as it is now more than 75 years since the author's death? 

Probably, but how would we obtain a scan to work from?

See also www.pepysdiary.com, which appears to have everything online.


-- 
Dave F

From gbnewby at pglaf.org  Wed Feb 23 09:16:46 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Feb 23 09:16:47 2005
Subject: [gutvol-d] Pepys' birthday
In-Reply-To: <005901c519bf$420ca4e0$ac9495ce@gw98>
References: <005901c519bf$420ca4e0$ac9495ce@gw98>
Message-ID: <20050223171646.GA15596@pglaf.org>

On Wed, Feb 23, 2005 at 10:49:16AM -0500, N Wolcott wrote:
> Being his birthday maybe this is appropriate. Pepys gave his memoirs to Cambridge University, but the full text was not published until 1970. That being the case would not the text (minus editorial comment and added footnotes) be now public domain as it is now more than 75 years since the author's death? 
> 

Are we missing something?

Quotes and Images From The Diary of Samuel Pepys, by Samuel Pepys         7554
Jul 2003 Quotations From Diary of Samuel Pepys, by Widger  [dwqspxxx.xxx] 4202
Diary of Samuel Pepys, Complete, by Samuel Pepys                          4200
Diary of Samuel Pepys, 1669 N.S. Complete, by Samuel Pepys                4199
Diary of Samuel Pepys, Apr/May 1669, by Samuel Pepys                      4198
Diary of Samuel Pepys, Feb/Mar 1668/69, by Samuel Pepys                   4197
Diary of Samuel Pepys, January 1668/69, by Samuel Pepys                   4196
Diary of Samuel Pepys, 1668 N.S. Complete, by Samuel Pepys                4195
Diary of Samuel Pepys, December 1668, by Samuel Pepys                     4194
Diary of Samuel Pepys, November 1668, by Samuel Pepys                     4193
Diary of Samuel Pepys, September/October 1668, by Samuel Pepys            4192
Diary of Samuel Pepys, August 1668, by Samuel Pepys                       4191
Diary of Samuel Pepys, June/July 1668, by Samuel Pepys                    4190
Diary of Samuel Pepys, May 1668, by Samuel Pepys                          4189
Diary of Samuel Pepys, April 1668, by Samuel Pepys                        4188
Diary of Samuel Pepys, March 1667/68, by Samuel Pepys                     4187
Diary of Samuel Pepys, February 1667/68, by Samuel Pepys                  4186
Diary of Samuel Pepys, January 1667/68, by Samuel Pepys                   4185
Diary of Samuel Pepys, 1667 N.S. Complete, by Samuel Pepys                4184
Diary of Samuel Pepys, December 1667, by Samuel Pepys                     4183
Diary of Samuel Pepys, November 1667, by Samuel Pepys                     4182
Diary of Samuel Pepys, October 1667, by Samuel Pepys                      4181
Diary of Samuel Pepys, September 1667, by Samuel Pepys                    4180
Diary of Samuel Pepys, August 1667, by Samuel Pepys                       4179
Diary of Samuel Pepys, July 1667, by Samuel Pepys                         4178
Diary of Samuel Pepys, June 1667, by Samuel Pepys                         4177
Diary of Samuel Pepys, May 1667, by Samuel Pepys                          4176
Diary of Samuel Pepys, April 1666/67, by Samuel Pepys                     4175
Diary of Samuel Pepys, March 1666/67, by Samuel Pepys                     4174
Diary of Samuel Pepys, February 1666/67, by Samuel Pepys                  4173
Diary of Samuel Pepys, January 1666/67, by Samuel Pepys                   4172
Diary of Samuel Pepys, 1666 N.S. Complete, by Samuel Pepys                4171
Diary of Samuel Pepys, December 1666, by Samuel Pepys                     4170
Diary of Samuel Pepys, November 1666, by Samuel Pepys                     4169
Diary of Samuel Pepys, October 1666, by Samuel Pepys                      4168
Diary of Samuel Pepys, August/September 1666, by Samuel Pepys             4167
Diary of Samuel Pepys, July 1666, by Samuel Pepys                         4166
Diary of Samuel Pepys, May/June 1666, by Samuel Pepys                     4165
Diary of Samuel Pepys, March/April 1665/66, by Samuel Pepys               4164
Diary of Samuel Pepys, January/February 1665/66, by Samuel Pepys          4163
Diary of Samuel Pepys, 1665 N.S. Complete, by Samuel Pepys                4162
Diary of Samuel Pepys, November/December 1665, by Samuel Pepys            4161
Diary of Samuel Pepys, October 1665, by Samuel Pepys                      4160
Diary of Samuel Pepys, September 1665, by Samuel Pepys                    4159
Diary of Samuel Pepys, August 1665, by Samuel Pepys                       4158
Diary of Samuel Pepys, July 1665, by Samuel Pepys                         4157
Diary of Samuel Pepys, May/June 1665, by Samuel Pepys                     4156
Diary of Samuel Pepys, March/April 1664/65, by Samuel Pepys               4155
Diary of Samuel Pepys, January/February 1664/65, by Samuel Pepys          4154
Diary of Samuel Pepys, 1664 N.S. Complete, by Samuel Pepys                4153
Diary of Samuel Pepys, December 1664, by Samuel Pepys                     4152
Diary of Samuel Pepys, October/November 1664, by Samuel Pepys             4151
Diary of Samuel Pepys, August/September 1664, by Samuel Pepys             4150
Diary of Samuel Pepys, June/July 1664, by Samuel Pepys                    4149
Diary of Samuel Pepys, April/May 1664, by Samuel Pepys                    4148
Diary of Samuel Pepys, March 1663/64, by Samuel Pepys                     4147
Diary of Samuel Pepys, January/February 1663/64, by Samuel Pepys          4146
Diary of Samuel Pepys, 1663 N.S. Complete, by Samuel Pepys                4145
Diary of Samuel Pepys, November/December 1663, by Samuel Pepys            4144
Diary of Samuel Pepys, September/October 1663, by Samuel Pepys            4143
Diary of Samuel Pepys, July/August 1663, by Samuel Pepys                  4142
Diary of Samuel Pepys, May/June 1663, by Samuel Pepys                     4141
Diary of Samuel Pepys, March/April 1662/63, by Samuel Pepys               4140
Diary of Samuel Pepys, January/February 1662/63, by Samuel Pepys          4139
Diary of Samuel Pepys, 1662 N.S. Complete, by Samuel Pepys                4138
Diary of Samuel Pepys, November/December 1662, by Samuel Pepys            4137
Diary of Samuel Pepys, September/October 1662, by Samuel Pepys            4136
Diary of Samuel Pepys, July/August 1662, by Samuel Pepys                  4135
Diary of Samuel Pepys, May/June 1662, by Samuel Pepys                     4134
Diary of Samuel Pepys, March/April 1661/62, by Samuel Pepys               4133
Diary of Samuel Pepys, January/February 1661/62, by Samuel Pepys          4132
Diary of Samuel Pepys, 1661 N.S. Complete, by Samuel Pepys                4131
Diary of Samuel Pepys, November/December 1661, by Samuel Pepys            4130
Diary of Samuel Pepys, September/October 1661, by Samuel Pepys            4129
Diary of Samuel Pepys, June/July/August 1661, by Samuel Pepys             4128
Diary of Samuel Pepys, April/May 1661, by Samuel Pepys                    4127
Diary of Samuel Pepys, January/February/March 1660/61, by Samuel Pepys    4126
Diary of Samuel Pepys, 1660 N.S. Complete, by Samuel Pepys                4125
Diary of Samuel Pepys, October/November/December 1660, by Samuel Pepys    4124
Diary of Samuel Pepys, August/September 1660, by Samuel Pepys             4123
Diary of Samuel Pepys, June/July 1660, by Samuel Pepys                    4122
Diary of Samuel Pepys, May 1660, by Samuel Pepys                          4121
Diary of Samuel Pepys, March/April 1659/1660, by Samuel Pepys             4120
Diary of Samuel Pepys, February 1659/1660, by Samuel Pepys                4119
Diary of Samuel Pepys, January 1659/1660, by Samuel Pepys                 4118
Diary of Samuel Pepys, Unabridged, Preface and Life, by Samuel Pepys      4117
Jul 2002 The Diary of Samuel Pepys, Lord Braybrooke/Editor [pepysxxx.xxx] 3331
From shimmin at uiuc.edu  Wed Feb 23 09:33:18 2005
From: shimmin at uiuc.edu (Robert Shimmin)
Date: Wed Feb 23 09:33:24 2005
Subject: [gutvol-d] Pepys' birthday
In-Reply-To: <005901c519bf$420ca4e0$ac9495ce@gw98>
References: <005901c519bf$420ca4e0$ac9495ce@gw98>
Message-ID: <421CBE5E.7060801@uiuc.edu>

N Wolcott wrote:

> Being his birthday maybe this is appropriate. Pepys gave his memoirs to 
> Cambridge University, but the full text was not published until 1970. 
> That being the case would not the text (minus editorial comment and 
> added footnotes) be now public domain as it is now more than 75 years 
> since the author's death?

In the US, a work first published in 1970 has a 95-year term, and won't 
hit the public domain until 2066.

In the UK, posthumous works are no different than other works today, but 
that has only been the case since 1988.  Before 1988, posthumous works 
got a 50-year copyright (2021).  This may have been extended to 70 years 
since then (2041).

Canada was also offering a 50-year copyright to first publications of 
posthumous works at the time, and I know they haven't extended their term.
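
The years quoted above follow from simple arithmetic. Here is a minimal
sketch in Python, assuming each term is counted from the 1970 publication
and runs through the end of the calendar year in which it expires, so the
work enters the public domain on 1 January of the following year. The
function name is made up for illustration; the term lengths are the ones
mentioned above.

```python
def public_domain_year(start_year: int, term_years: int) -> int:
    """First calendar year in which the work is in the public domain.

    Assumes the term is counted from start_year and runs through the end
    of the calendar year in which it expires.
    """
    return start_year + term_years + 1


# Terms counted from the 1970 first publication, as discussed above.
us_95 = public_domain_year(1970, 95)  # US, 95-year term
uk_50 = public_domain_year(1970, 50)  # UK, pre-1988 posthumous term
uk_70 = public_domain_year(1970, 70)  # UK, if extended to 70 years

print(us_95, uk_50, uk_70)  # prints: 2066 2021 2041
```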

-- RS
From Gutenberg9443 at aol.com  Wed Feb 23 12:42:58 2005
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Feb 23 12:43:11 2005
Subject: [gutvol-d] Pepys' birthday
Message-ID: <1e.3fc5749d.2f4e44d2@aol.com>

 
In a message dated 2/23/2005 8:50:44 AM Mountain Standard Time,  
nwolcott@dsdial.net writes:

Pepys gave his memoirs to Cambridge University, but the full text was not 
published until 1970. That being the case would not the text (minus editorial 
comment and added footnotes) be now public domain as it is now more than 75 
years since the author's death? 


Oh horrors! I forgot Pepys's birthday! Oh well, there's still time to bake a 
cake. (We celebrate the birthdays of Shakespeare, Robert Burns, and Rudyard 
Kipling already.)
 
As to the diaries, they're already posted. I gave them on CD to a very dear 
neighbor for Christmas two or three years ago. She never got around to reading 
them, mainly because despite an IQ somewhat stratospheric, she never did 
figure out her computer. But she was grateful that I had thought of giving them 
to her.
 
Anne
From Gutenberg9443 at aol.com  Wed Feb 23 12:44:51 2005
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Feb 23 12:45:11 2005
Subject: [gutvol-d] Pepys' birthday
Message-ID: <1e2.36266c1c.2f4e4543@aol.com>

 
In a message dated 2/23/2005 10:33:50 AM Mountain Standard Time,  
shimmin@uiuc.edu writes:

In the US, a work first published in 1970 has a 95-year term, and won't 
hit the public domain until 2066.

In the UK, posthumous works are no different than other works today, but 
that has only been the case since 1988. Before 1988, posthumous works 
got a 50-year copyright (2021). This may have been extended to 70 years 
since then (2041).


So who is going to complain? There is a new edition as of about 24 years 
ago, which includes all Pepys's XXX comments that are omitted from the earlier 
edition.
 
Anne
From krooger at debian.org  Wed Feb 23 13:35:39 2005
From: krooger at debian.org (Jonathan Walther)
Date: Wed Feb 23 13:35:55 2005
Subject: [gutvol-d] Pepys' birthday
In-Reply-To: <1e2.36266c1c.2f4e4543@aol.com>
References: <1e2.36266c1c.2f4e4543@aol.com>
Message-ID: <20050223213539.GB14264@reactor-core.org>

On Wed, Feb 23, 2005 at 03:44:51PM -0500, Gutenberg9443@aol.com wrote:
>   So who is going to complain? There is a new edition as of about 24
>   years ago, which includes all Pepys's XXX comments that are omitted
>   from the earlier edition.

I seem to recall that Pepys diaries were written in a special shorthand.
The current editions may claim copyright on their "transcriptions" of
the shorthand.

Anyone game for scanning in the original shorthand, and transcribing it?

Jonathan

-- 
          It's not true unless it makes you laugh,                           
     but you don't understand it until it makes you weep.

Eukleia: Jonathan Walther
Address: 12706 99 Ave, Surrey, BC V3V2P8 (Canada)
Contact: 604-684-1319 (daytime)
Contact: 604-582-9308 (morning and evening)
Puritan: Purity of faith, Purity of doctrine. Sola Scriptura!

Patriarchy, Polygamy, Slavery === Fatherhood, Husbandry, Mastery
Matriarchy, Monogamy, Prisons === Wickedness, Stupidity, Buggery
From marcello at perathoner.de  Wed Feb 23 11:31:50 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Feb 23 14:44:16 2005
Subject: [gutvol-d] Filesystem changes to the web site
Message-ID: <421CDA26.7060507@perathoner.de>

At my request ibiblio is moving our site to the new file server.


Our new Apache documentroot will be:

   /public/vhost/g/gutenberg/html/

our old documentroot was:

   /public/html/gutenberg/


The steps are:

1. (done)

The new directory has been created and the old directory has been copied 
to the new directory.

The new directory is accessible through the development web server at:

   http://www-dev.gutenberg.org

2. (in progress)

I will update the files in the new directory and test them.

3.

ibiblio will switch the production server to the new directory.

4.

ibiblio will delete the old directory.


How does this affect you?

If you are editing files in /public/html/gutenberg/ you should copy your 
edits to the corresponding files in /public/vhost/g/gutenberg/html/. At 
the very least, keep a list of the files you edited so you can copy 
them over before we switch the production servers.
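
For anyone keeping such a list, the copy step could be scripted. What
follows is only a sketch in Python, not part of the ibiblio setup: it
assumes your edited files are listed as paths relative to the
documentroot, and the function name is hypothetical. The two directory
paths are the ones given above.

```python
import shutil
from pathlib import Path

OLD_ROOT = Path("/public/html/gutenberg")
NEW_ROOT = Path("/public/vhost/g/gutenberg/html")


def copy_edits(edited, old_root=OLD_ROOT, new_root=NEW_ROOT):
    """Copy each edited file (path relative to the documentroot) from the
    old tree to the same relative location in the new tree.

    Returns the list of destination paths that were written.
    """
    copied = []
    for rel in edited:
        src = Path(old_root) / rel
        dst = Path(new_root) / rel
        if not src.is_file():
            print(f"skipping {rel}: not found under {old_root}")
            continue
        # Create missing subdirectories in the new tree before copying.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(dst)
    return copied
```

You would call it with your own list, for example
copy_edits(["index.html", "howto/faq.html"]), where both file names are
placeholders. Files missing from the old tree are reported and skipped
rather than treated as errors.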


-- 
Marcello Perathoner
webmaster@gutenberg.org


From gbuchana at rogers.com  Wed Feb 23 17:42:39 2005
From: gbuchana at rogers.com (Gardner Buchanan)
Date: Wed Feb 23 17:43:01 2005
Subject: [gutvol-d] Ibn Batuta (Was Re: Fwd: Project Googleberg)
In-Reply-To: <41CB953A.50402@dsl.pipex.com>
Message-ID: <XFMail.050223204239.gbuchana@rogers.com>

Hi all,

So after all the enthusiastic chatter about this in December, I'm
a little surprised two months later to find myself the first mover,
but here I am.  I splashed out $12 for a facsimile edition of the
1829 Lee translation from Amazon, and yesterday I got hold of it.
I've submitted it for clearance and will do the scans as time
permits.  I intend to push the scans into a DP project rather than
trying to handle it myself.  There will be difficulties, however:
this is a scholarly translation, full of footnotes and pronunciation
notation and stuffed with Arabic passages.  Have a look at a
sample here:

http://unixcomputer.net/new-photo/cd/p12.gif

Anyone got ideas or suggestions for handling this sort of material?

See you,

On 04:04:10 Holden McGroin wrote:
> Gutenberg9443@aol.com wrote:
>> By the way, does ANYBODY know where we can get a public domain copy of 
>> Ibn Batuta? I've had no luck finding one online. I even asked the king 
>> of Saudi Arabia for a copy, but His Majesty didn't answer. The few 
>> snippets I've seen are fascinating. He left his home to go on a haj, and 
>> then kept going, spending 29 years travelling and writing fascinating 
>> notes of where he went, namely everywhere you could get to without going 
>> to Arctica, Antarctica, or the Americas.
> 
> I have to agree with Anne. Every time I hear about Ibn Batuta's amazing 
> travels, I feel the urge to read his writings. Is there any chance we 
> could get them online as part of Gutenberg's collection?

============================================================
Gardner Buchanan                       <gbuchana@rogers.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.
From la_joconde_orange at yahoo.com  Wed Feb 23 19:49:03 2005
From: la_joconde_orange at yahoo.com (Melissa)
Date: Wed Feb 23 19:49:19 2005
Subject: [gutvol-d] Ibn Batuta (Was Re: Fwd: Project Googleberg)
In-Reply-To: <XFMail.050223204239.gbuchana@rogers.com>
Message-ID: <20050224034904.99107.qmail@web20224.mail.yahoo.com>

Footnotes should not be a problem for DP; we handle them all the time. Arabic is another question. An HTML edition could certainly be made, with the help of someone who knows Arabic to do those transcriptions, plus a transcriber's note saying that an Arabic font must be installed to view the Arabic text. An ASCII edition would be plain text, of course, and therefore incomplete, but it could carry a transcriber's note too, pointing the reader to the HTML edition for the complete text.

Many at DP are not scared by the thought of a scholarly work or of making a faithful rendition of one. Some even relish the challenge. With the collaboration of a speaker of Arabic, whether someone at DP or elsewhere, such a project could be reliably done at DP. 
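
The plain-text fallback described above could be sketched roughly like
this in Python. It is an illustration only, not an existing DP tool: the
function name, the "[Arabic]" placeholder and the wording of the note are
all assumptions, and it only looks at the basic Arabic Unicode block
(U+0600 to U+06FF).

```python
import re

# One or more Arabic-script words, allowing whitespace between them.
ARABIC_RUN = re.compile(r"[\u0600-\u06FF]+(?:\s+[\u0600-\u06FF]+)*")

NOTE = ("[Transcriber's note: Arabic passages are omitted from this "
        "plain-text edition; see the HTML edition for the complete text.]")


def plain_text_fallback(text: str, placeholder: str = "[Arabic]") -> str:
    """Replace each run of Arabic script with a placeholder and append a
    transcriber's note if anything was replaced."""
    replaced, count = ARABIC_RUN.subn(placeholder, text)
    if count:
        replaced += "\n\n" + NOTE
    return replaced
```

Text with no Arabic in it comes back unchanged, so the same function can
be run over every page of the book.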
 
On DP, la_joconde
 
On PG, Melissa Er-Raqabi (you may search my name at pgdp.net. Some of my recent uploads for Black History Month are non-fiction works with transcriber's notes. Higher project numbers are obviously more recent.)
 
--Melissa
 
 
Gardner Buchanan <gbuchana@rogers.com> wrote:
Hi all,

So after all the enthusiastic chatter about this in December, I'm
a little surprised two months later to find myself the first mover,
but here I am. I splashed out $12 for a facsimile edition of the
1829 Lee translation from Amazon, and yesterday I got hold of it.
I've submitted it for clearance and will do the scans as time
permits. I intend to push the scans into a DP project versus trying
to handle it myself. There will be difficulty however: this is a
scholarly translation and is full of footnotes, pronunciation
notation and is stuffed with arabic passages. Have a look at a
sample here:

http://unixcomputer.net/new-photo/cd/p12.gif

Anyone got ideas or suggestions for handling this sort of material?

See you,

On 04:04:10 Holden McGroin wrote:
> Gutenberg9443@aol.com wrote:
>> By the way, does ANYBODY know where we can get a public domain copy of 
>> Ibn Batuta? I've had no luck finding one online. I even asked the king 
>> of Saudi Arabia for a copy, but His Majesty didn't answer. The few 
>> snippets I've seen are fascinating. He left his home to go on a haj, and 
>> then kept going, spending 29 years travelling and writing fascinating 
>> notes of where he went, namely everywhere you could get to without going 
>> to Arctica, Antarctica, or the Americas.
> 
> I have to agree with Anne. Every time I hear about Ibn Batuta's amazing 
> travels, I feel the urge to read his writings. Is there any chance we 
> could get them online as part of Gutenberg's collection?

============================================================
Gardner Buchanan 
Ottawa, ON FreeBSD: Where you want to go. Today.
From lofstrom at lava.net  Wed Feb 23 20:49:37 2005
From: lofstrom at lava.net (Karen Lofstrom)
Date: Wed Feb 23 20:49:53 2005
Subject: [gutvol-d] Ibn Batuta (Was Re: Fwd: Project Googleberg)
In-Reply-To: <XFMail.050223204239.gbuchana@rogers.com>
References: <XFMail.050223204239.gbuchana@rogers.com>
Message-ID: <Pine.BSI.4.61.0502231848090.6709@malasada.lava.net>


On Wed, 23 Feb 2005, Gardner Buchanan wrote:

> I splashed out $12 for a facsimile edition of the
> 1829 Lee translation from Amazon, and yesterday I got hold of it.
> I've submitted it for clearance and will do the scans as time
> permits.  I intend to push the scans into a DP project versus trying
> to handle it myself.  There will be difficulty however: this is a
> scholarly translation and is full of footnotes, pronunciation
> notation and is stuffed with arabic passages.

Do put it through DP-EU (the European Distributed Proofreaders). They are 
using Unicode and can handle Arabic text. In fact, they were or are doing 
some Urdu texts in Arabic script.

-- 
Karen Lofstrom
Zora on DP

From Gutenberg9443 at aol.com  Fri Feb 25 07:04:23 2005
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Feb 25 07:05:16 2005
Subject: [gutvol-d] Pepys' birthday
Message-ID: <7b.3fb076d7.2f509877@aol.com>

 
In a message dated 2/23/2005 2:36:14 PM Mountain Standard Time,  
krooger@debian.org writes:

I seem to recall that Pepys diaries were written in a special shorthand.
The current editions may claim copyright on their "transcriptions" of
the shorthand.

Anyone game for scanning in the original shorthand, and transcribing it?


I don't know how you could get a copy of the diaries OR the shorthand. I'm 
sure the recent edition would claim copyright, but I still don't think the 
older edition is likely to be a problem. It's been up for a while and so far as I 
know, nobody's complained. Considering how long it took to transliterate 
them the two times they were transliterated, I expect most of us have more to do 
with the next fifteen years of our lives.
 
I think we should stay with what we have. If anybody needs something more 
thorough, that person will probably need to go to the closest major university 
library.
 
Anne
From ron at zytrax.com  Fri Feb 25 08:38:18 2005
From: ron at zytrax.com (Ron Aitchison)
Date: Fri Feb 25 08:39:55 2005
Subject: [gutvol-d] Enlightened self-interst
Message-ID: <421F547A.1080007@zytrax.com>

Having discovered Jane Austen regrettably late in life, I have 
downloaded a couple of novels and, since I find the raw text format 
unpleasant to read, I have reformatted them for my own use.
It seems to me that, since I have the ability to produce PDFs and OpenOffice 
formats and even - heaven forfend - MS doc format should they be wanted, 
it would be churlish not to offer them.
If you can point me at a standard for PDF (page width, font size, etc.) 
and let me know what formats you do want, I would be happy to 
undertake the small additional work for the two novels I have currently 
downloaded.
I cannot supply DocBook at this time but hope to have that available 
shortly.
Regards

-- 
Ron Aitchison

From sly at victoria.tc.ca  Fri Feb 25 09:36:54 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Fri Feb 25 09:37:44 2005
Subject: [gutvol-d] Enlightened self-interst
In-Reply-To: <421F547A.1080007@zytrax.com>
References: <421F547A.1080007@zytrax.com>
Message-ID: <Pine.GSO.4.58.0502250926030.1858@vtn1.victoria.tc.ca>


One possible problem is that PDF files are not easily editable.

All of our older texts are being gradually worked through,
corrected, supplied with a new PG header (which puts all the
legal "small print" at the end of the file instead of the
beginning) and reposted into the current directory structure.
When this process is done it will make some of the back-end
organization much easier to deal with.

However, if during this process we come across a non-editable file
(PDF, Lit, whatever), we cannot update it, and it is generally moved
into an "old" directory, where it is still available if someone
goes looking for it, but otherwise is not shown in the catalog.

Andrew

On Fri, 25 Feb 2005, Ron Aitchison wrote:

> Having discovered Jane Austen regrettably late in life I have
> down-loaded a couple of novels and since I find the raw text format
> unpleasant to read I have reformatted for my own use.
> It seems to me since I have the ability to produce PDFs and OpenOffice
> formats and even - heaven forfend - MS doc format should they be wanted,
> it would be churlish not to make such an offer.
> If you can point me at a standard for PDF, page width, font size etc,
> etc., and let me know what formats you do want I would be happy to
> undertake the small additional work for the two novels I have currently
> downloaded.
> I cannot supply DocBook at this time but hope to have that available
> shortly.
> Regards
>
>
From ron at zytrax.com  Fri Feb 25 14:24:49 2005
From: ron at zytrax.com (Ron Aitchison)
Date: Fri Feb 25 14:26:33 2005
Subject: [gutvol-d] Enlightened Self Interest
Message-ID: <421FA5B1.2080806@zytrax.com>

I understand the issue of editing. My proposal would be to supply an 
editable file in OpenOffice or MS doc format. (BTW, if you are not using 
the Open Source OpenOffice suite I recommend you check it out: it is at 
least as feature-rich as MS Word, plus it offers one-button PDF creation, 
output as doc, text or native XML format, and a great price = $0! 
http://www.openoffice.org )
I propose to take nothing away; you will have edit control over the file. 
This also raises another question: which base document formats have you 
standardized on for editability and portability, e.g. OASIS? Maybe 
that is a topic for another list.
Finally, I note you have PDF formats available for some other books.

Andrew Sly wrote:

>One possible problem is that PDF files are not easily editable.
>
>All of our older texts are being gradually worked through,
>corrected, supplied with a new PG header (which puts all the
>legal "small print" at the end of the file instead of the
>beginning) and reposted into the current directory structure.
>When this process is done it will make some of the back-end
>organization much easier to deal with.
>
>However, if during this process, we come across a non-editable file
>(PDF, Lit, whatever), we cannot update it, and it's generally moved
>into an "old" directory, where it is still available if someone
>goes looking for it, but otherwise is not shown in the catalog.
>
>Andrew
>
>> Having discovered Jane Austen regrettably late in life I have
>> down-loaded a couple of novels and since I find the raw text format
>> unpleasant to read I have reformatted for my own use.
>> It seems to me since I have the ability to produce PDFs and OpenOffice
>> formats and even - heaven forfend - MS doc format should they be wanted,
>> it would be churlish not to make such an offer.
>> If you can point me at a standard for PDF, page width, font size etc,
>> etc., and let me know what formats you do want I would be happy to
>> undertake the small additional work for the two novels I have currently
>> downloaded.
>> I cannot supply DocBook at this time but hope to have that available
>> shortly.
>> Regards
>>
>>
>  
>

-- 
Ron Aitchison 


From cannona at fireantproductions.com  Fri Feb 25 15:16:13 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri Feb 25 15:20:06 2005
Subject: [gutvol-d] Filesystem changes to the web site
In-Reply-To: <421CDA26.7060507@perathoner.de>
References: <421CDA26.7060507@perathoner.de>
Message-ID: <6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>

Any chance we could get a specific timeline on when these changes would be 
taking place?  I just want to be sure we don't miss any CD/DVD requests.

Thanks.

Sincerely
Aaron Cannon


At 01:31 PM 2/23/2005, you wrote:
>At my request ibiblio is moving our site to the new file server.
>
>
>Our new Apache documentroot will be:
>
>   /public/vhost/g/gutenberg/html/
>
>our old documentroot was:
>
>   /public/html/gutenberg/
>
>
>The steps are:
>
>1. (done)
>
>The new directory has been created and the old directory has been copied 
>to the new directory.
>
>The new directory is accessible thru the development web server at:
>
>   http://www-dev.gutenberg.org
>
>2. (in progress)
>
>I will update the files in the new directory and test them.
>
>3.
>
>ibiblio will switch the production server to the new directory.
>
>4.
>
>ibiblio will delete the old directory.
>
>
>How does this affect you ?
>
>If you are editing files in /public/html/gutenberg/ you should copy your 
>edits to the corresponding files in /public/vhost/g/gutenberg/html/. At 
>least you should keep a list of which files you did edit so you can copy 
>them over before we switch the production servers.
>
>
>
>--
>Marcello Perathoner
>webmaster@gutenberg.org
>
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) 


From bruce at zuhause.org  Fri Feb 25 16:11:02 2005
From: bruce at zuhause.org (Bruce Albrecht)
Date: Fri Feb 25 16:11:56 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <421FA5B1.2080806@zytrax.com>
References: <421FA5B1.2080806@zytrax.com>
Message-ID: <16927.48790.533173.228950@celery.zuhause.org>

I think the long term view, at least from the Distributed
Proofreader's supply chain, is to provide a TEI-Lite document for each
text, and from it programmatically create HTML, plain text, PDF, etc
on the fly.  I'm not sure when this will happen, but I expect that
some of the precursor activities at DP will take place this year.  I
don't know if DP will try to replace all previous versions of texts
with TEI-Lite documents, but my guess is that once a system is in
place, there will be volunteers that will go back and rework the
texts, just as we have volunteers today providing revised editions of
earlier texts with HTML and text versions that follow the current
formatting guidelines.

As always, volunteer in the ways you see fit, but I suspect many here
(at least us DPers) would argue that working on new texts hitherto
unavailable to PG is probably a better use of your time than providing
multiple reformatted versions of existing works.

Bruce

http://www.pgdp.net/vision/     For Charlz' vision
http://www.tei-c.org/Lite/      For information on TEI-Lite
http://www.pgdp.net             For volunteering at Distributed Proofreaders

Ron Aitchison writes:
 > Understand the issue of editing. My proposal would be to supply an 
 > editable file in OpenOffice or MS doc format (BTW if you are not using 
 > the Open Source OpenOffice suite I recommend you check it out - the 
 > features are great, at least as feature rich as MS word, plus - one 
 > button PDF creation, output as doc, text or native XML format and a 
 > great price = $0! http://www.openoffice.org ).
 > I propose to take nothing away; you will have edit control over the file.
 > This also opens up another question over what base document formats you
 > have standardized for editability and portability, e.g. OASIS etc. Maybe
 > that is a topic for another list.
 > Finally I note you have PDF formats available for some other books.
From jon_niehof at yahoo.com  Fri Feb 25 17:20:08 2005
From: jon_niehof at yahoo.com (Jon Niehof)
Date: Fri Feb 25 17:21:04 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <16927.48790.533173.228950@celery.zuhause.org>
Message-ID: <20050226012008.95779.qmail@web41601.mail.yahoo.com>

> As always, volunteer in the ways you see fit, but I suspect
> many here (at least us DPers) would argue that working on new
> texts hitherto unavailable to PG is probably a better use of
> your time than providing multiple reformatted versions of
> existing works.

I would agree; it seems to me that converting into a format that
cannot be programmatically converted into other formats
(including other "master" formats like DP-TEI, whenever that
gets specified), is rather a waste of one's time.

Anything that isn't a value-add (like converting straight text
to Word or PDF without adding, say, bookmark information) also
strikes me as not too useful. I could blast all of PG into
Weasel format without a lot of trouble, for example, but I don't
see a benefit as anybody who could make use of it could easily
do the conversion as well.

(pie-in-the-sky: being able to on-the-fly convert TEI to format
of user's choice on download would be nearly Grail-like.)


From jtinsley at pobox.com  Fri Feb 25 18:04:52 2005
From: jtinsley at pobox.com (Jim Tinsley)
Date: Fri Feb 25 18:05:53 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <20050226012008.95779.qmail@web41601.mail.yahoo.com>
References: <16927.48790.533173.228950@celery.zuhause.org>
	<20050226012008.95779.qmail@web41601.mail.yahoo.com>
Message-ID: <20050226020452.GA24272@panix.com>

On Fri, Feb 25, 2005 at 05:20:08PM -0800, Jon Niehof wrote:
>> As always, volunteer in the ways you see fit, but I suspect
>> many here (at least us DPers) would argue that working on new
>> texts hitherto unavailable to PG is probably a better use of
>> your time than providing multiple reformatted versions of
>> existing works.
>
>I would agree; it seems to me that converting into a format that
>cannot be programmatically converted into other formats
>(including other "master" formats like DP-TEI, whenever that
>gets specified), is rather a waste of one's time.
>
>Anything that isn't a value-add (like converting straight text
>to Word or PDF without adding, say, bookmark information) also
>strikes me as not too useful. I could blast all of PG into
>Weasel format without a lot of trouble, for example, but I don't
>see a benefit as anybody who could make use of it could easily
>do the conversion as well.
>

Well put. What we call "blind format conversions" -- conversions
from one format to another, based on your own preferences,
without any value-added input such as, say, illustrations from
an eligible edition -- are not things that we really want to
post, without some special reason. We have done it in the past,
and it hasn't worked well. 

Sites like Blackmask http://blackmask.com do a better job of 
managing such content than we do, and in fact David Moynihan 
of Blackmask has offered us all of his converted files if
we want them. We discussed it a few years ago, and decided
against.

>(pie-in-the-sky: being able to on-the-fly convert TEI to format
>of user's choice on download would be nearly Grail-like.)

You don't need TEI just for conversion. Today, HTML is the Universal
Format for converting _from_. It may not be so always, and
HTML has limits; it ain't great on mathematical texts, for
instance, but given HTML, you can very easily get to any of
the common reader formats in one step.
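
To make the "one step" concrete, here is a minimal sketch in Python of
going from HTML to plain text with nothing but the standard library. It
is an illustration only, not a tool PG uses: the names TextExtractor and
html_to_text, and the choice of which tags start a new paragraph, are
assumptions made for the example, and real books (tables, footnotes,
sidenotes) would need a good deal more care.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, starting a new paragraph at block-level tags."""

    # Hypothetical choice of tags that begin a new paragraph of output.
    BLOCK_TAGS = {"p", "h1", "h2", "h3", "div", "br", "li"}

    def __init__(self):
        super().__init__()
        self.paragraphs = [[]]

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.paragraphs.append([])

    def handle_data(self, data):
        self.paragraphs[-1].append(data)

    def text(self):
        # Collapse runs of whitespace inside each paragraph, drop empty ones.
        joined = (" ".join("".join(p).split()) for p in self.paragraphs)
        return "\n\n".join(j for j in joined if j)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_text("<h1>Chapter 1</h1><p>It was a <i>dark</i>\n night.</p>"))
```

Inline markup such as <i> is simply dropped here; a fuller converter
would have to decide how to represent emphasis in the target format.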

jim

From prosfilaes at gmail.com  Fri Feb 25 18:57:42 2005
From: prosfilaes at gmail.com (David Starner)
Date: Fri Feb 25 18:58:39 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <421FA5B1.2080806@zytrax.com>
References: <421FA5B1.2080806@zytrax.com>
Message-ID: <6d99d1fd05022518573b49c21a@mail.gmail.com>

On Fri, 25 Feb 2005 17:24:49 -0500, Ron Aitchison <ron@zytrax.com> writes:
> Finally I note you have PDF formats available for some other books.

Primarily from TeX, which makes it easy to generate, and primarily for
mathematical and scientific documents that pretty much have to be done
in TeX.

Jim Tinsley  <jtinsley@pobox.com> writes:
> Today, HTML is the Universal
> Format for converting _from_. It may not be so always, and
> HTML has limits; it ain't great on mathematical texts,

More importantly, HTML can't really do footnotes, and I doubt anything
is doing decent transformations on what we kludge sidenotes into.
From ron at zytrax.com  Fri Feb 25 19:04:01 2005
From: ron at zytrax.com (Ron Aitchison)
Date: Fri Feb 25 19:05:48 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <20050226020452.GA24272@panix.com>
References: <16927.48790.533173.228950@celery.zuhause.org>	<20050226012008.95779.qmail@web41601.mail.yahoo.com>
	<20050226020452.GA24272@panix.com>
Message-ID: <421FE721.2000004@zytrax.com>

Whoa there. Clearly I walked into a minefield and feel in imminent 
danger of having various limbs blasted from my poor undeserving corpus.
Let me state my point of view or why I made the offer and why I think 
perhaps trees and forests may be getting a little confused. Now I'm new 
to this stuff and many of you good folks have labored for years so if I 
lay a few mines of my own - so be it.
1. The primary reason for my offer was simply that since I found the 
simple text version unpleasant to read I thought there may be others and 
that having a choice of formats available may make the output - the 
books - more approachable hence reach a wider audience and all the good 
things that must flow from that. Seems to me this is what PG is about -
outreach.
2. I fully understand the issue of editable text and rampant variations
- a maintenance nightmare. Untenable.
So let me address the issue of maintenance and incidentally why I do not 
think that my offer need cause the end of the world as we know it.
There are two parts to this argument:
1. The basic format that I have converted to is OpenOffice's XML format
from which multiple conversions - PDF and MS doc if you want - are
derived. All essentially driven from a set of DTDs. My brief reading
of TEI is that it too uses an XML base. So we have a trivial level of
commonality as a starting point. By looking at the conversion processes
we could have a WYSIWYG editor off-the-shelf at $0 cost with output
convertible to TEI output by driving it through appropriate XSLTs and
all that good stuff. OpenOffice has a pilot development with DocBook to 
do something similar. It is not making much progress but with the right 
effort it could.
2. The second point relates to the difficulty, or possibility of success,
of conversion. I used 4 styles in the book: Header 1, paragraph, page
header and page footer (the last two could be easily removed but are
tactically useful because of page numbering). For a simple text book I
see no reason to use any more and the cost of replacement of header/footer
with an alternate implementation is trivial in the extreme. Hard
pagination is perhaps a bit more difficult to handle and I'm not sure I 
should have done it but in the absence of any instructions/suggestions 
to the contrary I did. So a set of simple rules in the period before an 
idealized solution is available would significantly reduce difficulties.
Now whether TEI is better than DocBook or a converged OASIS standard is 
not for me to say.
But it does seem to me there is a way forward in the short term by 
making the right intercepts - a combination of technology and rules - 
without building up a redundant and unmanageable nightmare. Or am I wrong?
Finally does anyone want my pathetic conversions of Northanger Abbey and 
Persuasion !! -:) Or is it thanks but no thanks!

Jim Tinsley wrote:

>On Fri, Feb 25, 2005 at 05:20:08PM -0800, Jon Niehof wrote:
>  
>
>>>As always, volunteer in the ways you see fit, but I suspect
>>>many here (at least us DPers) would argue that working on new
>>>texts hitherto unavailable to PG is probably a better use of
>>>your time than providing multiple reformatted versions of
>>>existing works.
>>>      
>>>
>>I would agree; it seems to me that converting into a format that
>>cannot be programmatically converted into other formats
>>(including other "master" formats like DP-TEI, whenever that
>>gets specified), is rather a waste of one's time.
>>
>>Anything that isn't a value-add (like converting straight text
>>to Word or PDF without adding, say, bookmark information) also
>>strikes me as not too useful. I could blast all of PG into
>>Weasel format without a lot of trouble, for example, but I don't
>>see a benefit as anybody who could make use of it could easily
>>do the conversion as well.
>>
>>    
>>
>
>Well put. What we call "blind format conversions" -- conversions
>from one format to another, based on your own preferences,
>without any value-added input such as, say, illustrations from
>an eligible edition -- are not things that we really want to
>post, without some special reason. We have done it in the past,
>and it hasn't worked well. 
>
>Sites like Blackmask http://blackmask.com do a better job of 
>managing such content than we do, and in fact David Moynihan 
>of Blackmask has offered us all of his converted files if
>we want them. We discussed it a few years ago, and decided
>against.
>
>  
>
>>(pie-in-the-sky: being able to on-the-fly convert TEI to format
>>of user's choice on download would be nearly Grail-like.)
>>    
>>
>
>You don't need TEI just for conversion. Today, HTML is the Universal
>Format for converting _from_. It may not be so always, and
>HTML has limits; it ain't great on mathematical texts, for
>instance, but given HTML, you can very easily get to any of
>the common reader formats in one step.
>
>jim
>

-- 
Ron Aitchison               http://www.zytrax.com
ZyTrax                      mailto:r.aitchison@zytrax.com
                            70 rue Notre Dame West
                            Montreal Quebec H2Y 1S6
                            Tel:(514) 285.9088 	

From jon at noring.name  Fri Feb 25 19:26:13 2005
From: jon at noring.name (Jon Noring)
Date: Fri Feb 25 19:27:12 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <421FE721.2000004@zytrax.com>
References: <16927.48790.533173.228950@celery.zuhause.org>
	<20050226012008.95779.qmail@web41601.mail.yahoo.com>
	<20050226020452.GA24272@panix.com> <421FE721.2000004@zytrax.com>
Message-ID: <7815066468.20050225202613@noring.name>

Ron wrote:

> 1. The basic format that I have converted to is OpenOffice's XML format
> from which multiple conversions - PDF and MS doc if you want - are
> derived. All essentially driven from a set of DTDs. My brief reading
> of TEI is that it too uses an XML base. So we have a trivial level of
> commonality as a starting point. By looking at the conversion processes
> we could have a WYSIWYG editor off-the-shelf at $0 cost with output
> convertible to TEI output by driving it through appropriate XSLTs and
> all that good stuff. OpenOffice has a pilot development with DocBook to 
> do something similar. It is not making much progress but with the right 
> effort it could.

For maximum archivability, repurposeability and accessibility, it is
important for the XML markup vocabulary used in the master document to
be wholly structural and semantic. Except where absolutely necessary
(and maybe best solved using SVG and MathML), presentational markup
should be avoided.

TEI is primarily structural/semantic, but there are some presentational
components. The base DP-TEI (I envision three levels of DP-TEI), when
it comes into being, should not specify any presentational markup
components.

I am not familiar with OpenOffice's XML vocabulary, but I would guess
that it, too, is a mix of structural/semantic tags with presentation
tags (I also guess that it is much more presentationally-oriented than
TEI, and doesn't have the structural/semantic richness of TEI.) If
OpenOffice's XML vocabulary is to be used, it should be subsetted (at
least at the base level) to not allow presentational markup.
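
As a rough illustration of why purely structural markup repurposes so
easily, here is a small Python sketch that renders one structurally
marked-up chapter two ways. The sample uses the TEI element names div,
head and p, but it is not DP-TEI (which has not been specified yet), and
the function names render_html and render_text, the sample text, and the
mapping to h2 and upper-case headings are all assumptions made for the
example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample: structure only, no fonts, sizes or page breaks.
SAMPLE = (
    '<text><body><div type="chapter">'
    "<head>Chapter 1</head>"
    "<p>First paragraph.</p>"
    "<p>Second &amp; last.</p>"
    "</div></body></text>"
)


def render_html(tei_xml):
    """Map each structural element to an HTML element."""
    out = []
    for div in ET.fromstring(tei_xml).iter("div"):
        for child in div:
            text = escape("".join(child.itertext()).strip())
            if child.tag == "head":
                out.append(f"<h2>{text}</h2>")
            elif child.tag == "p":
                out.append(f"<p>{text}</p>")
    return "\n".join(out)


def render_text(tei_xml):
    """Render the same structure as plain text with upper-case headings."""
    out = []
    for div in ET.fromstring(tei_xml).iter("div"):
        for child in div:
            text = " ".join("".join(child.itertext()).split())
            if child.tag == "head":
                out.append(text.upper())
            elif child.tag == "p":
                out.append(text)
    return "\n\n".join(out)


print(render_html(SAMPLE))
print(render_text(SAMPLE))
```

Because the source only says "this is a heading" and "this is a
paragraph", each output format is free to decide how a heading should
look; presentational markup in the master would have taken that choice
away.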

I do not recommend DocBook as the primary markup vocabulary for
general books, but certainly it is intriguing to consider it as a
second "blessed" vocabulary for particular types of documents it
is designed for (primarily technical documents.)

Just my $0.02 worth.

Jon Noring

From jtinsley at pobox.com  Fri Feb 25 19:29:24 2005
From: jtinsley at pobox.com (Jim Tinsley)
Date: Fri Feb 25 19:30:22 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <421FE721.2000004@zytrax.com>
References: <16927.48790.533173.228950@celery.zuhause.org>
	<20050226012008.95779.qmail@web41601.mail.yahoo.com>
	<20050226020452.GA24272@panix.com> <421FE721.2000004@zytrax.com>
Message-ID: <20050226032924.GA29574@panix.com>

On Fri, Feb 25, 2005 at 10:04:01PM -0500, Ron Aitchison wrote:
>Whoa there. Clearly I walked into a minefield and feel in imminent 
>danger of having various limbs blasted from my poor undeserving corpus.

Minefield, yes. We really should put a sign up at the gates. :-)
But nobody wants to blast you, I promise.

It's an old, old, subject, and we've tried various things at 
various times over the last 5 years or so -- some tries even
pre-date that. I don't think there's one we don't regret. So
it's not like we're dismissing your idea out of hand; it's one
of those things that we've all thought of, and we'd all like to
do, and we never quite forget it, and it pops up now and again
even among old hands, but it's a net negative.

And there's a lot of people here who have a lot of experience
of the subject. There was probably a time when even I thought 
that posting individual blind format conversions was a good idea,
but it must have been long ago.

>Let me state my point of view or why I made the offer and why I think 
>perhaps trees and forests may be getting a little confused. Now I'm new 
>to this stuff and many of you good folks have labored for years so if I 
>lay a few mines of my own - so be it.
>1. The primary reason for my offer was simply that since I found the 
>simple text version unpleasant to read I thought there may be others and 
>that having a choice of formats available may make the output - the 
>books - more approachable hence reach a wider audience and all the good 
>things that must flow from that. Seems to me this is what PG is about -
>outreach.
>2. I fully understand the issue of editable text and rampant variations
>- a maintenance nightmare. Untenable.
>So let me address the issue of maintenance and incidentally why I do not 
>think that my offer need cause the end of the world as we know it.
>There are two parts to this argument:
>1. The basic format that I have converted to is OpenOffice's XML format
>from which multiple conversions - PDF and MS doc if you want - are
>derived. All essentially driven from a set of DTDs. My brief reading
>of TEI is that it too uses an XML base. So we have a trivial level of
>commonality as a starting point. By looking at the conversion processes
>we could have a WYSIWYG editor off-the-shelf at $0 cost with output
>convertible to TEI output by driving it through appropriate XSLTs and
>all that good stuff. OpenOffice has a pilot development with DocBook to 
>do something similar. It is not making much progress but with the right 
>effort it could.
>2. The second point relates to the difficulty, or possibility of success,
>of conversion. I used 4 styles in the book: Header 1, paragraph, page
>header and page footer (the last two could be easily removed but are
>tactically useful because of page numbering). For a simple text book I
>see no reason to use any more and the cost of replacement of header/footer
>with an alternate implementation is trivial in the extreme. Hard
>pagination is perhaps a bit more difficult to handle and I'm not sure I 
>should have done it but in the absence of any instructions/suggestions 
>to the contrary I did. So a set of simple rules in the period before an 
>idealized solution is available would significantly reduce difficulties.
>Now whether TEI is better than DocBook or a converged OASIS standard is 
>not for me to say.
>But it does seem to me there is a way forward in the short term by 
>making the right intercepts - a combination of technology and rules - 
>without building up a redundant and unmanageable nightmare. Or am I wrong?
>Finally does anyone want my pathetic conversions of Northanger Abbey and 
>Persuasion !! -:) Or is it thanks but no thanks!

Your conversions may well be lovely; their quality isn't at all
an issue here. It's just not something that we do, except under
some compelling special circumstances.

jim

From Bowerbird at aol.com  Fri Feb 25 20:01:24 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Feb 25 20:02:28 2005
Subject: [gutvol-d] Enlightened Self Interest
Message-ID: <a3.6e21e848.2f514e94@aol.com>

jon niehof said:
>   (pie-in-the-sky: 
>   being able to on-the-fly convert TEI 
>   to format of user's choice on download 
>   would be nearly Grail-like.)

well, then _totally_ "grail-like" would be
for users to have a tool that enables them to
convert the sole maintained download format
to any other format they might have a need for.

i created the format -- zen markup language --
and am rapidly finishing programming the tool...

that is all for now.  talk amongst yourselves...

-bowerbird
From donovan at abs.net  Sat Feb 26 04:29:38 2005
From: donovan at abs.net (D Garcia)
Date: Sat Feb 26 04:31:54 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <16927.48790.533173.228950@celery.zuhause.org>
References: <421FA5B1.2080806@zytrax.com>
	<16927.48790.533173.228950@celery.zuhause.org>
Message-ID: <200502260729.38650.donovan@abs.net>

On Friday 25 February 2005 07:11 pm, Bruce Albrecht wrote:
> I don't know if DP will try to replace all previous versions of texts
> with TEI-Lite documents, but my guess is that once a system is in
> place, there will be volunteers that will go back and rework the
> texts, just as we have volunteers today providing revised editions of
> earlier texts with HTML and text versions that follow the current
> formatting guidelines.
>
> As always, volunteer in the ways you see fit, but I suspect many here
> (at least us DPers) would argue that working on new texts hitherto
> unavailable to PG is probably a better use of your time than providing
> multiple reformatted versions of existing works.

I'm one of the volunteers who is going back and providing reworked versions of 
existing older PG texts, and my approximate criteria for selection are: Older 
than (roughly) number 7000, is only in text version at PG, text version has 
many "hard" errors (tbe, arc, arid, etc. as opposed to "soft" problems such 
as formatting), illustrations not present, and most importantly, ones for which I 
have a physical copy of the book from which to make the corrections.

This clearly falls under the "value-added" category of thinking. While I share 
your position that simple reformatting is mostly a waste of time, going back 
and rehabilitating existing works is not, and I hope that people interested 
in working on that aspect are not discouraged.

I think of it much like carpentry; there are some people who are more of a 
framing temperament, those who are interested in finish work, and those who 
like to do restoration or renovation work. All of those skills/mindsets are 
necessary to complete a strong and attractive project.
From bruce at zuhause.org  Sat Feb 26 07:43:23 2005
From: bruce at zuhause.org (Bruce Albrecht)
Date: Sat Feb 26 07:44:30 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <200502260729.38650.donovan@abs.net>
References: <421FA5B1.2080806@zytrax.com>
	<16927.48790.533173.228950@celery.zuhause.org>
	<200502260729.38650.donovan@abs.net>
Message-ID: <16928.39195.282423.849192@celery.zuhause.org>

D Garcia writes:
 > I'm one of the volunteers who is going back and providing reworked versions of 
 > existing older PG texts, and my approximate criteria for selection are: Older 
 > than (roughly) number 7000, is only in text version at PG, text version has 
 > many "hard" errors (tbe, arc, arid, etc. as opposed to "soft" problems such 
 > as formatting), illustrations not present, and most importantly, ones for which I 
 > have a physical copy of the book from which to make the corrections.
 > 
 > This clearly falls under the "value-added" category of thinking. While I share 
 > your position that simple reformatting is mostly a waste of time, going back 
 > and rehabilitating existing works is not, and I hope that people interested 
 > in working on that aspect are not discouraged.

I agree that your type of updates is needed for the older PG titles,
and don't consider it a waste of time.  However, it was my impression
that Ron was offering to provide uncorrected reformatted editions of
the titles in question.
From joshua at hutchinson.net  Sat Feb 26 08:25:13 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Feb 26 08:26:01 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <20050226032924.GA29574@panix.com>
References: <16927.48790.533173.228950@celery.zuhause.org>	<20050226012008.95779.qmail@web41601.mail.yahoo.com>	<20050226020452.GA24272@panix.com>
	<421FE721.2000004@zytrax.com> <20050226032924.GA29574@panix.com>
Message-ID: <4220A2E9.7010300@hutchinson.net>

Ok, I leave the computer for one night and you all go nuts with the 
posts!  :)

hehe

Anyway, as one of the people working on PGTEI, I figure this discussion 
could use an update on where things stand.

Currently, my efforts have concentrated on two fronts.

1 - Converting those texts that come through me from DP into PGTEI 
master format.  I then use the online PGTEI -> HTML conversion routine 
to convert them to HTML for posting to PG.  Most of them are not 
converted to TEXT simply because someone else at DP did the text version 
before I got to them.  In other words, I've been mostly concentrating on 
the PGTEI format itself and the HTML output that results from it.

Here is a recent link to a posted book... from off the top of my head.  
There are many more; I just don't have the list here on this computer.  
(Last count there were 20+ documents that I've put in PGTEI format 
sitting on my computer... most of which have been posted to the PG 
archives in HTML and/or TEXT format.)

http://www.gutenberg.org/dirs/1/4/9/8/14986/14986-h/14986-h.htm
Experimental Researches in Electricity, Volume 1

This is a pretty straightforward text, but it has an automatically 
produced Table of Contents and the generated footnotes, so it gives some 
idea of where we are at.  One of the things I plan on fixing in the 
future is the lack of links from the footnote text BACK to the footnote 
anchor in the main text.
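
For anyone curious what the missing back-links amount to, here is a
minimal Python sketch of paired footnote links. It is not the PGTEI
converter; the function name render_footnotes, the id scheme (fnrefN for
the anchor, fnN for the note) and the input shape are all assumptions
made for the example.

```python
from html import escape


def render_footnotes(paragraphs):
    """Render (text, note_or_None) pairs as HTML with two-way footnote links."""
    body, notes = [], []
    for text, note in paragraphs:
        html = escape(text)
        if note is not None:
            n = len(notes) + 1
            # Anchor in the main text: links forward to the note.
            html += f'<a id="fnref{n}" href="#fn{n}">[{n}]</a>'
            # The note itself: links BACK to the anchor in the main text.
            notes.append(
                f'<p id="fn{n}">[{n}] {escape(note)} '
                f'<a href="#fnref{n}">back</a></p>'
            )
        body.append(f"<p>{html}</p>")
    return "\n".join(body + notes)


print(render_footnotes([("Text.", "A note."), ("More.", None)]))
```

The point is only that each anchor needs its own id so the note can
point back at it; the numbering and link text are a matter of taste.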

2 - Updating/expanding the PGTEI documentation.  I've got more notes 
than I know what to do with and many many pages of additional 
documentation written in a rough draft.

***

The eventual end I am hoping for is a standard encoding that makes 
conversion to other formats easy and quick.  For instance, one of my 
next projects will be to take one of the VERY nasty math texts that DP 
has produced in TeX format and convert it to PGTEI.  TEI uses TeX 
encoding for the math equations themselves, but the rest of the 
formatting is a little more intuitive AND because of the validation 
routines we have available, much easier to develop and fix.  But, since 
I haven't tried the TeX on a massive scale yet within a PGTEI document, 
I don't know what bugs and gotchas I'm going to find.

If there are any questions (or if anyone wants to see some of the PGTEI 
documents I've created, rough drafts of the documentation I've been working 
on, etc), please let me know.

Josh
From Bowerbird at aol.com  Sat Feb 26 10:27:07 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Feb 26 10:28:26 2005
Subject: [gutvol-d] one more thing, for jon noring
Message-ID: <e5.dd570cd.2f52197b@aol.com>

oh yeah, one more thing before i return to my laboratory.

for jon noring.  or 2 things, actually.  no, make that 3.

first, jon, since you've been makin' some big noises about
"my antonia", could you please make available a .zip file
containing all of your image-scans and the o.c.r. output?
i plan on using them in a nice little project of mine, and
downloading the scans one at a time is a pain in the neck.

second, since you regularly assert your insistence that
markup must be "semantic" rather than "presentational",
can you elucidate the structural aspects that typically
should be marked up in books?  that list would include
things like chapter-headings, footnotes, block-quotes;
and what else?  would also be nice if you could say _how_
these things should be marked up, with actual examples,
but since even the .tei experts can't seem to agree on it...

third, over on the bookpeople list, john mark ockerbloom
moderated out my replies to your late-december posts
where you issued some "friendly challenges" to me; but 
let it be known that my replies accepted your challenges.
i'll be creating a space soon where we can discuss them...

-bowerbird
From marcello at perathoner.de  Sat Feb 26 10:57:35 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Feb 26 13:08:56 2005
Subject: [gutvol-d] Enlightened Self Interest
In-Reply-To: <4220A2E9.7010300@hutchinson.net>
References: <16927.48790.533173.228950@celery.zuhause.org>	<20050226012008.95779.qmail@web41601.mail.yahoo.com>	<20050226020452.GA24272@panix.com>	<421FE721.2000004@zytrax.com>
	<20050226032924.GA29574@panix.com>
	<4220A2E9.7010300@hutchinson.net>
Message-ID: <4220C69F.1060301@perathoner.de>

Joshua Hutchinson wrote:

> If there are any questions (or if anyone wants to see some of the PGTEI 
> documents I've created, rough drafts of the documentation I've been working 
> on, etc), please let me know.

I'd like to see the draft documentation.


-- 
Marcello Perathoner
webmaster@gutenberg.org


From marcello at perathoner.de  Sat Feb 26 11:05:13 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Feb 26 13:08:59 2005
Subject: [gutvol-d] Filesystem changes to the web site
In-Reply-To: <6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>
References: <421CDA26.7060507@perathoner.de>
	<6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>
Message-ID: <4220C869.9010406@perathoner.de>

Aaron Cannon wrote:

> Any chance we could get a specific timeline on when these changes would 
> be taking place?  I just want to be sure we don't miss any CD/DVD requests.

The timeline is: as soon as I get it done.

I assume your form processor writes a log file of the requests 
somewhere. At present there are two copies of your program.

You should go to the /public/vhost/g/gutenberg/html/ directory on 
login.ibiblio.org and edit that copy of the form processor so it writes 
the log into the new file hierarchy. The old copy under 
/public/html/gutenberg/ will still write the log to the old location. 
When we switch over, the new form will start writing the new log and 
you'll just have to pick up the old log once manually before we delete 
the old directory.

Test your new form under

   www-dev.gutenberg.org


-- 
Marcello Perathoner
webmaster@gutenberg.org


From jon at noring.name  Sat Feb 26 13:06:31 2005
From: jon at noring.name (Jon Noring)
Date: Sat Feb 26 13:09:07 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <e5.dd570cd.2f52197b@aol.com>
References: <e5.dd570cd.2f52197b@aol.com>
Message-ID: <1817862109.20050226140631@noring.name>

Bowerbird wrote:

> first, jon, since you've been makin' some big noises about
> "my antonia", could you please make available a .zip file
> containing all of your image-scans and the o.c.r. output?
> i plan on using them in a nice little project of mine, and
> downloading the scans one at a time is a pain in the neck.

Good idea.

Unfortunately I do not have OCR output, but I have the page scans.
I'll zip up the 600 dpi 2-color (B&W) scans which have already gone
through a clean-up stage (they will be PNG files, and occupy if
memory serves me right, about 50 megs of space.) These should import
nicely into an OCR program. If you don't have an OCR program, someone
here may offer to do that for you. (Note that the page scans which
are individually linked from the My Antonia online document were
resampled from the 600 dpi 2-color scans to 120 dpi with greyscale
antialiasing to improve legibility at lower resolutions -- the 120
dpi versions probably are not as good to use for OCRing.)

Anyone?


> second, since you regularly assert your insistence that
> markup must be "semantic" rather than "presentational",
> can you elucidate the structural aspects that typically
> should be marked up in books?  that list would include
> things like chapter-headings, footnotes, block-quotes;
> and what else?  would also be nice if you could say _how_
> these things should be marked up, with actual examples,
> but since even the .tei experts can't seem to agree on it...

Also a very good suggestion. Remind me if I don't answer anytime
soon. Got a lot of projects on my plate (and just got done with
a several day project to upgrade the hardware, OS and software on
my main computer.)

Yes, the TEI people also disagree, but that's because the full
vocabulary of TEI is quite extensive. When I talked with Charles last
year on this topic, his vision at the time seemed to be that DP will
settle upon a required base subset, maybe an extended subset that
those who are interested can use but that's not required for basic
support (e.g., including semantic information as to who speaks a
particular quote, which can be marked up but is probably overkill
for basic markup support.)

I should probably make the inquiry over at the DP forums, but those
working with DP who are familiar with DP's consideration of blessing a
TEI subset for its master documents, let me know.


> third, over on the bookpeople list, john mark ockerbloom
> moderated out my replies to your late-december posts
> where you issued some "friendly challenges" to me; but 
> let it be known that my replies accepted your challenges.
> i'll be creating a space soon where we can discuss them...

Thanks. I look forward to it! (Really, I do.)

Jon

From jon at noring.name  Sat Feb 26 14:12:44 2005
From: jon at noring.name (Jon Noring)
Date: Sat Feb 26 14:14:01 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <e5.dd570cd.2f52197b@aol.com>
References: <e5.dd570cd.2f52197b@aol.com>
Message-ID: <9221835046.20050226151244@noring.name>

Bowerbird asked:

> first, jon, since you've been makin' some big noises about
> "my antonia", could you please make available a .zip file
> containing all of your image-scans and the o.c.r. output?

The 600 dpi bitonal page scans of My Antonia (as PNG, archived in ZIP)
are now available via:

   http://www.openreader.org/myantonia

I encourage others to download the ZIP to preserve the page scans. But
be forewarned the ZIP file is 49 megs in size. Using one of the CCITT
bitonal compression algorithms it would be possible to do better with
lossless compression, maybe 50% better than the currently used PNG.
But virtually everyone can view PNG files, while those CCITT
algorithms (usually encapsulated in TIFF) are oftentimes obscure.

Jon

From jon at noring.name  Sat Feb 26 17:03:31 2005
From: jon at noring.name (Jon Noring)
Date: Sat Feb 26 17:04:53 2005
Subject: [gutvol-d] ZML added (was one more thing, for jon noring)
In-Reply-To: <9221835046.20050226151244@noring.name>
References: <e5.dd570cd.2f52197b@aol.com>
	<9221835046.20050226151244@noring.name>
Message-ID: <3032082515.20050226180331@noring.name>

Btw, to the "My Antonia" beta page I've added an entry for
"regularized" plain text, with one format in this category being
Bowerbird's ZML.

I have heard of a couple other systems being touted for regularized
plain text, but none of them are being discussed in Project Gutenberg.

Jon

From joshua at hutchinson.net  Sat Feb 26 18:39:27 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Feb 26 18:40:21 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <e5.dd570cd.2f52197b@aol.com>
References: <e5.dd570cd.2f52197b@aol.com>
Message-ID: <422132DF.4040508@hutchinson.net>

Bowerbird@aol.com wrote:

>second, since you regularly assert your insistence that
>markup must be "semantic" rather than "presentational",
>can you elucidate the structural aspects that typically
>should be marked up in books?  that list would include
>things like chapter-headings, footnotes, block-quotes;
>and what else?  would also be nice if you could say _how_
>these things should be marked up, with actual examples,
>but since even the .tei experts can't seem to agree on it...
>

Hmm, I've yet to find a TEI "expert" that doesn't agree on the 
fundamental markups.

<p></p>  is a paragraph container.

<head></head> for a divisional (chapter, section, part, etc) heading.

<note place="foot"></note> for a footnote.

*  replace "foot" with "margin" or "endnote" as appropriate for other 
note markers.

<quote rend="display"></quote> for a block quote.

<figure url="file_name"></figure> for an inline illustration.
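
The elements listed above can be combined into one small fragment. What
follows is only a minimal sketch, assuming Python's standard
xml.etree.ElementTree module; the wrapping <div>, the sample text and
the file name page_001.png are made up for illustration and do not come
from any PGTEI document.

```python
import xml.etree.ElementTree as ET

# Only the element and attribute names come from the list above;
# the <div> wrapper and all text content are hypothetical.
div = ET.Element("div")

head = ET.SubElement(div, "head")            # divisional heading
head.text = "Chapter I"

p = ET.SubElement(div, "p")                  # paragraph container
p.text = "First paragraph of the chapter."
note = ET.SubElement(p, "note", place="foot")  # footnote inside the paragraph
note.text = "A footnote attached to the paragraph."

quote = ET.SubElement(div, "quote", rend="display")  # block quote
quote.text = "A block quotation."

ET.SubElement(div, "figure", url="page_001.png")     # inline illustration

# Serialize to a string so the markup can be inspected or saved.
fragment = ET.tostring(div, encoding="unicode")
print(fragment)
```

Swapping place="foot" for "margin" or "endnote" changes only the one
attribute value; the structure stays the same.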

***

The problems with TEI don't tend to lie in the markup, but rather in the 
conversion of said markup to a final presentation format.  And usually 
then it is in markup that requires a bit of intelligence on the part of 
the rendering engine ... like complex tables, for instance.

Josh
From Bowerbird at aol.com  Sun Feb 27 01:31:43 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Feb 27 01:33:10 2005
Subject: [gutvol-d] one more thing, for jon noring
Message-ID: <157.4b53b442.2f52ed7f@aol.com>

jon noring said:
>   Unfortunately I do not have OCR output

did you do o.c.r. on it?  if you can retrieve the output, that would be good.
it would allow people to do research on assessing/improving o.c.r. quality,
and assist programmers in developing post-o.c.r. text-cleanup programs.

(but, from later posts, it looks like you grabbed the text from elsewhere.
so what you've done is "blessed" somebody else's work as "trustworthy",
presumably after checking it, and maybe correcting it.  you could also
have done that same thing using project gutenberg's version of the text,
since my comparison of the two files shows them to be very similar,
so much so that i expect they were indeed based on the same version.)


>   I'll zip up the 600 dpi 2-color (B&W) scans 
>   which have already gone through a clean-up stage 
>   (they will be PNG files, and occupy if memory serves me right, 
>   about 50 megs of space

those are too big for my purposes, and for me to download.

but if i could reimburse you for sending them to me on a cd?

or the 120-dpi versions would work just fine for my project,
the same ones that are on the website, just zipped together.


>   Remind me if I don't answer anytime soon.

sure thing.


>   Thanks. I look forward to it! (Really, I do.)

great.

-bowerbird
From cannona at fireantproductions.com  Sun Feb 27 05:30:22 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Feb 27 05:32:45 2005
Subject: [gutvol-d] Filesystem changes to the web site
In-Reply-To: <4220C869.9010406@perathoner.de>
References: <421CDA26.7060507@perathoner.de>
	<6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>
	<4220C869.9010406@perathoner.de>
Message-ID: <6.1.2.0.0.20050227072235.01e2dec0@mail.fireantproductions.com>

At 01:05 PM 2/26/2005, you wrote:

>The timeline is: as soon as I get it done.
>
>I assume your form processor writes a log file of the requests somewhere. 
>At present there are two copies of your program.
>
>You should go to the /public/vhost/g/gutenberg/html/ directory on 
>login.ibiblio.org and edit that copy of the form processor so it writes 
>the log into the new file hierarchy. The old copy under 
>/public/html/gutenberg/ will still write the log to the old location. When 
>we switch over, the new form will start writing the new log and you'll 
>just have to pick up the old log once manually before we delete the old 
>directory.
>
>Test your new form under
>
>   www-dev.gutenberg.org


I'm actually thinking it might be easier to take the system down for a 
couple days during the switch over.  That way, I can copy the old database 
into the new directory without having to wonder which requests went where.

I assume that you will be giving the go-ahead to Ibiblio once you've tested 
everything.  Would it be at all possible to drop me an e-mail a day or so 
before you think you'll be contacting them so I can take things offline?

Thanks!

Sincerely
Aaron Cannon


>--
>Marcello Perathoner
>webmaster@gutenberg.org
>
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) 


From hacker at gnu-designs.com  Sun Feb 27 06:21:24 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sun Feb 27 06:23:30 2005
Subject: [gutvol-d] Filesystem changes to the web site
In-Reply-To: <6.1.2.0.0.20050227072235.01e2dec0@mail.fireantproductions.com>
References: <421CDA26.7060507@perathoner.de>
	<6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>
	<4220C869.9010406@perathoner.de>
	<6.1.2.0.0.20050227072235.01e2dec0@mail.fireantproductions.com>
Message-ID: <Pine.LNX.4.61.0502270920300.23810@aphrodite.gnu-designs.com>


> I'm actually thinking it might be easier to take the system down for 
> a couple days during the switch over.  That way, I can copy the old 
> database into the new directory without having to wonder which 
> requests went where.

 	How do these changes affect those of us who maintain mirrors? 
I've noticed that rsync'ing the main filesystem to keep up to date, 
has duplicated two copies of the tree now. Is this intentional? It's 
also taking twice the amount of space.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From nwolcott at dsdial.net  Sun Feb 27 03:59:00 2005
From: nwolcott at dsdial.net (N Wolcott)
Date: Sun Feb 27 08:43:09 2005
Subject: [gutvol-d] Pepys' birthday
References: <1e2.36266c1c.2f4e4543@aol.com>
	<20050223213539.GB14264@reactor-core.org>
Message-ID: <006801c51ceb$18df3220$399495ce@gw98>

It occurred to me that if there are not too many xxx portions of the diary,
then under the "fair use" doctrine one could write a "scholarly article" on
the topic of "editorial squeamishness" and include the referenced passages
as footnotes and publish it in a scholarly place like the PG archive??
----- Original Message -----
From: "Jonathan Walther" <krooger@debian.org>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Wednesday, February 23, 2005 4:35 PM
Subject: Re: [gutvol-d] Pepys' birthday


> On Wed, Feb 23, 2005 at 03:44:51PM -0500, Gutenberg9443@aol.com wrote:
> >   So who is going to complain? There is a new edition as of about 24
> >   years ago, which includes all Pepys's XXX comments that are omitted
> >   from the earlier edition.
>
> I seem to recall that Pepys' diaries were written in a special shorthand.
> The current editions may claim copyright on their "transcriptions" of
> the shorthand.
>
> Anyone game for scanning in the original shorthand, and transcribing it?
>
> Jonathan
>
> --
>           It's not true unless it makes you laugh,
>      but you don't understand it until it makes you weep.
>
> Eukleia: Jonathan Walther
> Address: 12706 99 Ave, Surrey, BC V3V2P8 (Canada)
> Contact: 604-684-1319 (daytime)
> Contact: 604-582-9308 (morning and evening)
> Puritan: Purity of faith, Purity of doctrine. Sola Scriptura!
>
> Patriarchy, Polygamy, Slavery === Fatherhood, Husbandry, Mastery
> Matriarchy, Monogamy, Prisons === Wickedness, Stupidity, Buggery

From hyphen at hyphenologist.co.uk  Sun Feb 27 10:26:45 2005
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sun Feb 27 10:28:40 2005
Subject: [gutvol-d] Pepys' birthday
In-Reply-To: <006801c51ceb$18df3220$399495ce@gw98>
References: <1e2.36266c1c.2f4e4543@aol.com>
	<20050223213539.GB14264@reactor-core.org>
	<006801c51ceb$18df3220$399495ce@gw98>
Message-ID: <nu3421t20s20gcnte6ai4crf1545n9dvn4@4ax.com>

On Sun, 27 Feb 2005 06:59:00 -0500,  "N Wolcott" <nwolcott@dsdial.net>
wrote:

| It occurred to me that if there are not too many xxx portions of the diary,
| then under the "fair use" doctrine one could write a "scholarly article" on
| the topic of "editorial squeamishness" and include the referenced passages
| as footnotes and publish it in a scholarly place like the PG archive??

I don't understand why there should be any problems with xxx portions.  Or
indeed why you thought it necessary to use xxx.
Nowadays absolutely anything goes, a bit of sex does not cause any problems
whatsoever, in the UK at least.

-- 
Dave F

From jon at noring.name  Sun Feb 27 12:16:17 2005
From: jon at noring.name (Jon Noring)
Date: Sun Feb 27 12:17:55 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <157.4b53b442.2f52ed7f@aol.com>
References: <157.4b53b442.2f52ed7f@aol.com>
Message-ID: <15216999687.20050227131617@noring.name>

Bowerbird wrote:
> jon noring said:

> did you do o.c.r. on it?  if you can retrieve the output, that would
> be good. it would allow people to do research on assessing/improving
> o.c.r. quality, and assist programmers in developing post-o.c.r.
> text-cleanup programs.

No, I did not OCR the scans for producing My Antonia (I did experiment
with scanning though). But since the scans exist, any OCR package will
import them and OCR them. So nothing is "lost". There's no law that
says one must OCR them at the same time they are scanned -- they are
separate processes and can be decoupled with no loss of anything to
anyone at any time.

If you need OCRing done, you can probably post a "plea for help" and
find someone who has the OCR software packages you'd like to try (I
don't have the robust, up-to-date ones -- like Abbyy which I asked a
friend to help out with for my experiments -- I just have the cheapo
freebies.) The scans are available online for download, as you know.


> (but, from later posts, it looks like you grabbed the text from
> elsewhere. so what you've done is "blessed" somebody else's work as
> "trustworthy", presumably after checking it, and maybe correcting
> it.  you could also have done that same thing using project
> gutenberg's version of the text, since my comparison of the two
> files shows them to be very similar, so much so that i expect they
> were indeed based on the same version.)

I won't go into the gory details, but yes, I took two versions and
then combined/diffed them. I then did a very thorough comparison
page-by-page to the original source page scans, a la DP. We are now in
the process of having several people do the same (a page-by-page
comparison of the XHTML version to the page scans -- I want at least
two people to go over each page) -- anyone reading this, you are
welcome to volunteer and help us -- do a few pages just like for DP!

The error rate in the XHTML version (still beta) is now very low, and
the text can be considered, for all practical purposes, a very accurate
and textually faithful reproduction of the 1918 1st edition. (But then,
maybe I'll be surprised and find a serious error in the text.)

In retrospect, this process should have been done via DP instead. But
there was a deadline to finish the first beta of the cleaned-up text,
so there was no time to have this done at DP. However, I do plan to
post a request to the relevant DP forums for final proofing help, as
well as to seek help from the DP folk on other matters (such as TEI
markup). If DP wishes to go over it in some fashion and incorporate
it into their "archive" as well as submit it to PG, that's fine by me
(I will not directly contribute the text to PG as I've noted on TeBC.)

Regarding the PG version of My Antonia compared to the 1918 1st
edition, there are a *lot* of differences. I regularized both texts
and ran 'diff' between them, and found over 200 differences, mostly
spelling (the PG version uses mostly British spelling but even here it
is strangely inconsistent!), but also oddities in punctuation, wrong
paragraph breaks, some missing accented characters, a couple places
with changed wording, a few misspellings, etc.
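
The "regularize, then diff" step described above can be sketched roughly
as follows. This is only an illustration using Python's standard difflib
module instead of the 'diff' program; the regularization rules
(straighten quotes, collapse whitespace) and the two sample sentences
are assumptions made up for the example, not the rules or text actually
used.

```python
import difflib
import re

def regularize(text):
    """Straighten curly quotes, collapse whitespace, and split into words."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().split(" ")

# Hypothetical sample lines, NOT quotations from either edition.
first_edition = "The color of the  prairie\nwas like wine-stains."
pg_version = "The colour of the prairie was like wine stains."

a = regularize(first_edition)
b = regularize(pg_version)

# Keep only the word runs that differ between the two texts.
matcher = difflib.SequenceMatcher(None, a, b)
differences = [
    (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]

for old, new in differences:
    print(repr(old), "->", repr(new))
```

Because whitespace is regularized first, the stray line break and double
space in the first sample do not show up as differences; only the
spelling and the hyphen do. Each reported pair still has to be checked
against the page scans by eye.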

Of course, whenever I encountered a difference, I went to the
original page scans of the 1918 1st edition to verify what was done
there. In all 200+ cases it was the PG version that departed from the
original text. I surmise the PG version was derived from the British
edition of My Antonia, which is noted to have been mangled in editing
(Willa Cather was supposedly furious over the quality of editing in
the British edition, which went beyond just using British spelling for
words, such as 'colour' instead of 'color'.)

Anyway, when the final proofing is done, I believe the textual error
rate will be very low, near zero (but one cannot say it is perfect.)
So I think it will be useful for OCR accuracy experiments (which I
assume is what you want to perform?)

Of course, there's always the issue of hyphenated compound words,
figuring out if a hyphenated compound word will have a dash in it or
not, but that's another matter. I believe we did pretty well on this,
with help from the UNL information as well as textual analysis.


>> I'll zip up the 600 dpi 2-color (B&W) scans 
>> which have already gone through a clean-up stage 
>> (they will be PNG files, and occupy if memory serves me right, 
>> about 50 megs of space

> those are too big for my purposes, and for me to download.

Oops, sorry. They are pretty large for downloading by modem (but with
DSL/Cable they can be downloaded pretty quick.)


> but if i could reimburse you for sending them to me on a cd?

It's on me. In private email send me your address and I'll burn and
mail you a disk of the 600 dpi and 120 dpi scans. I do have the
original 600 dpi 24-bit color scans (which is overkill -- next time
I'll do the raw scans for B&W pages at greyscale), but in PNG they
occupy over 5 gigs of disk space! (Don't have a DVD burner yet
otherwise I'd send those, too.)


> or the 120-dpi versions would work just fine for my project,
> the same ones that are on the website, just zipped together.

Unfortunately, since the 120-dpi scans are antialiased greyscale
(while the 600 dpi are bitonal), the two sets are surprisingly close
in size.

I updated the My Antonia index page to include a download of all the
120-dpi scans in a ZIP file, which is still over 30 megs in size:

   http://www.openreader.org/myantonia/

Bowerbird, I'll be happy to put up a ZML regularized text version of
My Antonia. If I put up plain text, I want the plain text to follow
some regularization rules, and ZML is the only game in town actively
working with etexts (as far as I know at least -- I do recall two
other text regularization schemas, but don't know if the authors are
doing anything with them.)


Jon

From marcello at perathoner.de  Sun Feb 27 11:25:14 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Feb 27 12:19:41 2005
Subject: [gutvol-d] Filesystem changes to the web site
In-Reply-To: <Pine.LNX.4.61.0502270920300.23810@aphrodite.gnu-designs.com>
References: <421CDA26.7060507@perathoner.de>	<6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>	<4220C869.9010406@perathoner.de>	<6.1.2.0.0.20050227072235.01e2dec0@mail.fireantproductions.com>
	<Pine.LNX.4.61.0502270920300.23810@aphrodite.gnu-designs.com>
Message-ID: <42221E9A.9050202@perathoner.de>

David A. Desrosiers wrote:

>     How do these changes affect those of us who maintain mirrors? I've 
> noticed that rsync'ing the main filesystem to keep up to date, has 
> duplicated two copies of the tree now. Is this intentional? It's also 
> taking twice the amount of space.

Nothing is going to change for the file archive. Just the web site files 
will be moved to a different file server.

You are not supposed to keep mirrors of the web site. We will implement 
a net of squids to take load off the main site.


-- 
Marcello Perathoner
webmaster@gutenberg.org


From hacker at gnu-designs.com  Sun Feb 27 12:23:20 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sun Feb 27 12:26:34 2005
Subject: [gutvol-d] Filesystem changes to the web site
In-Reply-To: <42221E9A.9050202@perathoner.de>
References: <421CDA26.7060507@perathoner.de>
	<6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>
	<4220C869.9010406@perathoner.de>
	<6.1.2.0.0.20050227072235.01e2dec0@mail.fireantproductions.com>
	<Pine.LNX.4.61.0502270920300.23810@aphrodite.gnu-designs.com>
	<42221E9A.9050202@perathoner.de>
Message-ID: <Pine.LNX.4.61.0502271522300.3549@angst.gnu-designs.com>


> You are not supposed to keep mirrors of the web site. We will 
> implement a net of squids to take load off the main site.

	I'm mirroring the archive, not the website. Something changed 
recently, and all of the directories have been moved to a completely 
new layout, duplicating the tree in a secondary location inside the 
same parent root. It's doubled the amount of space the archive 
consumes, which is why I was concerned.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From marcello at perathoner.de  Sun Feb 27 12:57:35 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Feb 27 12:58:47 2005
Subject: [gutvol-d] Filesystem changes to the web site
In-Reply-To: <Pine.LNX.4.61.0502271522300.3549@angst.gnu-designs.com>
References: <421CDA26.7060507@perathoner.de>	<6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>	<4220C869.9010406@perathoner.de>	<6.1.2.0.0.20050227072235.01e2dec0@mail.fireantproductions.com>	<Pine.LNX.4.61.0502270920300.23810@aphrodite.gnu-designs.com>	<42221E9A.9050202@perathoner.de>
	<Pine.LNX.4.61.0502271522300.3549@angst.gnu-designs.com>
Message-ID: <4222343F.4050404@perathoner.de>

David A. Desrosiers wrote:

> 	I'm mirroring the archive, not the website. Something changed 
> recently, and all of the directories have been moved to a completely 
> new layout, duplicating the tree in a secondary location inside the 
> same parent root. It's doubled the amount of space the archive 
> consumes, which is why I was concerned.

I cannot understand that. The file archive was moved a while ago to the 
new fileserver but mounted on the same directory.

What commandline are you using to rsync the archive?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From nwolcott at dsdial.net  Sun Feb 27 13:03:09 2005
From: nwolcott at dsdial.net (N Wolcott)
Date: Sun Feb 27 13:27:32 2005
Subject: [gutvol-d] Good site
Message-ID: <003401c51d12$d1807240$bc9495ce@gw98>

http://copac.ac.uk/ is a marvellous site I just discovered, at the
University of Manchester, probably already known to you experts. On it
you can search all the combined British, Scottish, and Irish library
catalogues at one time, with results which can be sorted etc.,
downloaded in various formats, and saved. A real boon if you are
looking for pre-1900 books. The expanded catalogue entries often have
information as to pseudonyms, publishers, dates, etc.
 
N Wolcott  nwolcott2@post.harvard.edu
From Bowerbird at aol.com  Sun Feb 27 13:35:39 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Feb 27 13:37:16 2005
Subject: [gutvol-d] one more thing, for jon noring
Message-ID: <1ec.35c2cb43.2f53972b@aol.com>

jon said:
>   But there was a deadline to finish the first beta of the 
>   cleaned-up text, so there was no time to have this done at DP.

this text would fly through d.p. in a matter of hours...


>   and found over 200 differences

but my comparison shows that most of those are minor,
to the point of total insignificance to the average reader.
when the focus is narrowed to meaningful differences,
the number is less than 20.  it is good to correct them
-- and the less-significant ones too -- very good, but 
this is hardly a good example of an error-ridden e-text.


>   Unfortunately, since the 120-dpi scans are 
>   antialiased greyscale (while the 600 dpi are bitonal), 
>   the two sets are surprisingly close in size.
>   I updated the My Antonia index page to include 
>   a download of all the 120-dpi scans in a ZIP file, 
>   which is still over 30 megs in size:

the .pngs on the website would seem to be much smaller.
roughly 400 of those, at about 20k each, would be 8 megs.
is my arithmetic wrong?  or am i missing something?

-bowerbird
From jon at noring.name  Sun Feb 27 15:57:13 2005
From: jon at noring.name (Jon Noring)
Date: Sun Feb 27 15:58:54 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <1ec.35c2cb43.2f53972b@aol.com>
References: <1ec.35c2cb43.2f53972b@aol.com>
Message-ID: <19430256015.20050227165713@noring.name>

Bowerbird wrote:
> jon said:

>> But there was a deadline to finish the first beta of the cleaned-up
>> text, so there was no time to have this done at DP.

> this text would fly through d.p. in a matter of hours...

Well, yes, once it's been put in the queue. Anyone here from DP care to
comment on typical times for a book to be proofed in the DP system? (I
would have been happy to contribute to the post-processing markup
stage.)

(Btw, I finished the XHTML before I even finished the scanning, so it
would have been delayed in the DP system anyway. But yet, I would have
preferred the job be done in the DP system. At this stage it probably
won't fit into their work flow.)


>> Unfortunately, since the 120-dpi scans are 
>> antialiased greyscale (while the 600 dpi are bitonal), 
>> the two sets are surprisingly close in size.
>> I updated the My Antonia index page to include 
>> a download of all the 120-dpi scans in a ZIP file, 
>> which is still over 30 megs in size:

> the .pngs on the website would seem to be much smaller.
> roughly 400 of those, at about 20k each, would be 8 megs.
> is my arithmetic wrong?  or am i missing something?

Most of the PNGs are in the 70-80k range (I just rechecked at the
online site to make sure something weird didn't happen.) So the 30+
MBytes for the 400+ scans at 120 dpi greyscale/antialiased is about
right.
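
The two estimates can be checked with a few lines of arithmetic. This is
a rough sketch only: it assumes the PNGs compress very little further
inside the ZIP, takes 75k as the midpoint of the 70-80k range, and uses
a round 400 pages (the real set has slightly more).

```python
def zip_estimate_mb(pages, kb_per_page):
    """Rough archive size in megabytes for `pages` files of `kb_per_page` each."""
    return pages * kb_per_page / 1000

small_guess = zip_estimate_mb(400, 20)   # the 20k-per-page guess
actual_range = zip_estimate_mb(400, 75)  # midpoint of the 70-80k range

print(small_guess, "MB versus", actual_range, "MB")
```

With a few more than 400 pages, the second figure lands a little above
30 megs, which matches the size of the ZIP.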

Let me know if you want me to snail-mail the scans on CD-ROM.

Jon

From nwolcott at dsdial.net  Sun Feb 27 13:57:33 2005
From: nwolcott at dsdial.net (N Wolcott)
Date: Sun Feb 27 16:27:09 2005
Subject: [gutvol-d] Pepys' birthday
References: <1e2.36266c1c.2f4e4543@aol.com><20050223213539.GB14264@reactor-core.org><006801c51ceb$18df3220$399495ce@gw98>
	<nu3421t20s20gcnte6ai4crf1545n9dvn4@4ax.com>
Message-ID: <00cf01c51d2b$e7516e80$bc9495ce@gw98>

The problem with the xxx portions is that they are only available in the
copyrighted version of the diaries circa 1970. Since the Mynors-Bright
versions indicate where the cuts were made, one could easily marry the
additions with the original. Hence the need to use the U.S. "fair use"
doctrine if it still exists.
----- Original Message -----
From: "Dave Fawthrop" <hyphen@hyphenologist.co.uk>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Sunday, February 27, 2005 1:26 PM
Subject: Re: [gutvol-d] Pepys' birthday


> On Sun, 27 Feb 2005 06:59:00 -0500,  "N Wolcott" <nwolcott@dsdial.net>
> wrote:
>
> | It occurred to me that if there are not too many xxx portions of the
> | diary, then under the "fair use" doctrine one could write a
> | "scholarly article" on the topic of "editorial squeamishness" and
> | include the referenced passages as footnotes and publish it in a
> | scholarly place like the PG archive??
>
> I don't understand why there should be any problems with xxx portions.  Or
> indeed why you thought it necessary to use xxx.
> Nowadays absolutely anything goes, a bit of sex does not cause any
> problems whatsoever, in the UK at least.
>
> --
> Dave F

From Bowerbird at aol.com  Sun Feb 27 16:35:49 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Feb 27 16:37:33 2005
Subject: [gutvol-d] one more thing, for jon noring
Message-ID: <85.22504fbb.2f53c165@aol.com>

jon said:
>   Most of the PNGs are in the 70-80k range

yes, that was my mistake, sorry.
i was misled by the .djvu version,
where most of the pages are <10k.
(i've _got_ to see how to use that!)

of course, at 30 megs per book --
gosh, i remember when 30 megs
was a good-sized _hard-drive_!       :+)
-- we see why project gutenberg
has never put all its scans online.

distributed proofreaders keeps
_saying_ that they are going to,
but as of yet, they haven't done it.

d.p. still seems to be constrained
by disk-space, even with generous
help from ibiblio and internet archive.

-bowerbird
From jon at noring.name  Sun Feb 27 17:00:29 2005
From: jon at noring.name (Jon Noring)
Date: Sun Feb 27 17:02:12 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <85.22504fbb.2f53c165@aol.com>
References: <85.22504fbb.2f53c165@aol.com>
Message-ID: <9734051828.20050227180029@noring.name>

Bowerbird wrote:
> jon said:

>> Most of the PNGs are in the 70-80k range

> yes, that was my mistake, sorry.
> i was misled by the .djvu version,
> where most of the pages are <10k.
> (i've _got_ to see how to use that!)

DjVu is cool, but the "openness" and long-term viability of the format
is still open to question. The Internet Archive uses it, so they must
feel it is open enough to use. There's a dearth of free tools using
DjVu, but from what I understand there's no impediment to open source
DjVu compile tools.


> distributed proofreaders keeps
> _saying_ that they are going to,
> but as of yet, they haven't done it.

Yes, if this is the case, it is mysterious since IA will gladly host
them once the etext version is out the door. I do know that some scan
sets are encumbered: they are "loaned" to DP under some sort of
arrangement, but cannot be made public. This is somewhat troubling,
but hopefully the scans will be made available elsewhere at a future
time, such as through IA's scanning activities. One thing I do know is
that DP does keep full source metadata for each text they produce,
even if that data is not turned over to PG.


> d.p. still seems to be constrained
> by disk-space, even with generous
> help from ibiblio and internet archive.

I know that for production purposes they want to use their own servers
-- IA is not reliable enough. IA's focus is on archiving and storing,
so 24-7 with full-throttle availability is a lower priority to IA,
while DP *must* have 24-7 availability and sufficient speed to not
keep volunteers waiting. Thus, disk space is an issue for the DP
production process, especially in that DP is still a shoestring
operation. Anyway, this is my interpretation of what Juliet told me a
few months ago. Maybe someone from DP will reply to this...

Jon

From servalan at ar.com.au  Sun Feb 27 18:47:50 2005
From: servalan at ar.com.au (Pauline)
Date: Sun Feb 27 18:50:44 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <9734051828.20050227180029@noring.name>
References: <85.22504fbb.2f53c165@aol.com>
	<9734051828.20050227180029@noring.name>
Message-ID: <42228656.5000105@ar.com.au>

Jon Noring wrote:

> I know that for production purposes they want to use their own servers
> -- IA is not reliable enough. IA's focus is on archiving and storing,
> so 24-7 with full-throttle availability is a lower priority to IA,
> while DP *must* have 24-7 availability and sufficient speed to not
> keep volunteers waiting. Thus, disk space is an issue for the DP
> production process, especially in that DP is still a shoestring
> operation. Anyway, this is my interpretation of what Juliet told me a
> few months ago. Maybe someone from DP will reply to this...

The short version - very busy tying shoestrings :) :
DP does fairly well for 24/7 uptime now. Since migrating to our own 
server last year, we've had minor network glitches due to routing 
hassles at the ISP & a few scheduled outages due to upgrades to the DP 
code.

All our projects in various stages of production are kept on the DP 
production server. DP has had a production imbalance, proofing more 
books than are post-processed & subsequently posted to PG. Projects are 
archived off the production server only after they have been posted to 
PG. Hence over time, we have wound up with ever-decreasing amounts of 
free disk space. The lack of disk space is not really the issue; the 
imbalance is.

We are doing our best to address the imbalance by further distributing 
workload & post-processing more of our in-progress projects. In the 
interim, disk space is tight, but we are managing for the moment. I have 
posted a few times about this issue to the DP Forums.

Want to help? - sign up to smooth-read texts before they get posted & 
help our volunteer post-processors (PPers) & post-processing verifiers 
(PPVers) post more books to PG. If you can read an ebook, you can help. 
More info here:
http://www.pgdp.net/phpBB2/viewtopic.php?t=13677

An accessible archive of posted projects & images is in the works.

Thanks,
P - one of the DP Site Admins - (pourlean @ DP)
From prosfilaes at gmail.com  Sun Feb 27 20:06:35 2005
From: prosfilaes at gmail.com (David Starner)
Date: Sun Feb 27 20:08:15 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <9734051828.20050227180029@noring.name>
References: <85.22504fbb.2f53c165@aol.com>
	<9734051828.20050227180029@noring.name>
Message-ID: <6d99d1fd05022720067eebfde7@mail.gmail.com>

On Sun, 27 Feb 2005 18:00:29 -0700, Jon Noring <jon@noring.name> wrote:
> There's a dearth of free tools using
> DjVu, but from what I understand there's no impediment to open source
> DjVu compile tools.

What do you mean a dearth of free tools? The djvulibre set seems to be
a pretty complete set of tools.
From jon at noring.name  Sun Feb 27 20:28:51 2005
From: jon at noring.name (Jon Noring)
Date: Sun Feb 27 20:30:53 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <6d99d1fd05022720067eebfde7@mail.gmail.com>
References: <85.22504fbb.2f53c165@aol.com>
	<9734051828.20050227180029@noring.name>
	<6d99d1fd05022720067eebfde7@mail.gmail.com>
Message-ID: <10946554359.20050227212851@noring.name>

David Starner wrote:
> Jon Noring <jon@noring.name> wrote:

>> There's a dearth of free tools using
>> DjVu, but from what I understand there's no impediment to open source
>> DjVu compile tools.

> What do you mean a dearth of free tools? The djvulibre set seems to be
> a pretty complete set of tools.

I stand corrected.

I'll try the viewer plugin for Opera/Firefox. It looks interesting.

Hopefully a Windows-based encoder with GUI front end will eventually
be developed. From this perspective, there does appear to be a dearth
of free tools for DjVu encoding.

Jon

From jon at noring.name  Sun Feb 27 20:37:18 2005
From: jon at noring.name (Jon Noring)
Date: Sun Feb 27 20:39:16 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <10946554359.20050227212851@noring.name>
References: <85.22504fbb.2f53c165@aol.com>
	<9734051828.20050227180029@noring.name>
	<6d99d1fd05022720067eebfde7@mail.gmail.com>
	<10946554359.20050227212851@noring.name>
Message-ID: <19547060671.20050227213718@noring.name>

>Jon Noring wrote:
> David Starner wrote:
>> Jon Noring <jon@noring.name> wrote:

>>> There's a dearth of free tools using
>>> DjVu, but from what I understand there's no impediment to open source
>>> DjVu compile tools.

>> What do you mean a dearth of free tools? The djvulibre set seems to be
>> a pretty complete set of tools.

> I'll try the viewer plugin for Opera/Firefox. It looks interesting.

Oops, there's not yet a djvulibre browser viewer plugin for Windows
(I use both Opera 7 and FireFox, so I got excited that I could view
DjVu files using these browsers in Windows. But nada -- stuck with
IE6 and LizardTech's plugin.)

So for those who do most of their text and graphics processing on
Windows, we're still stuck with the payware encoders from LizardTech.

This may be one reason why DjVu has not taken off -- the djvulibre
developers seem to have little interest at this time in encoders and
viewers for Windows-based systems. Not exactly a great marketing
decision.

Jon

From scott_bulkmail at productarchitect.com  Sun Feb 27 20:37:56 2005
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sun Feb 27 20:42:52 2005
Subject: [gutvol-d] plain text formats [was: one more thing, for jon noring]
In-Reply-To: <15216999687.20050227131617@noring.name>
References: <157.4b53b442.2f52ed7f@aol.com>
	<15216999687.20050227131617@noring.name>
Message-ID: <p06110429be4841633c23@[192.168.0.52]>

>If I put up plain text, I want the plain text to follow
>some regularization rules, and ZML is the only game in town actively
>working with etexts (as far as I know at least -- I do recall two
>other text regularization schemas, but don't know if the authors are
>doing anything with them.)

There are several schemes in active use, including:

wiki markup: http://en.wikipedia.org/wiki/Wiki_markup

STX: http://www.zope.org/Members/jim/StructuredTextWiki/FrontPage (Structured Text) by Jim Fulton, e.g. for Zope and ZWiki

reStructuredText: http://docutils.sourceforge.net/rst.html for Python's DocUtils

Markdown: http://daringfireball.net/projects/markdown/ by John Gruber of Daring Fireball

(I'm not sure if any are being used for the same type of etexts as PG, but it seems likely that the overall level and diversity of activity and tools are more important.  e.g. I think all of the above include source code, typically using a friendly "attribution" license.)

My own approach (from 1995) plus links to several others is here:
	No-Tags Markup: http://prefab.com/ssl/notagsmarkup.html
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
From jon at noring.name  Sun Feb 27 21:09:00 2005
From: jon at noring.name (Jon Noring)
Date: Sun Feb 27 21:10:45 2005
Subject: [gutvol-d] plain text formats [was: one more thing,
	for jon noring]
In-Reply-To: <p06110429be4841633c23@[192.168.0.52]>
References: <157.4b53b442.2f52ed7f@aol.com>
	<15216999687.20050227131617@noring.name>
	<p06110429be4841633c23@[192.168.0.52]>
Message-ID: <12848962687.20050227220900@noring.name>

Scott Lawton wrote:

>> If I put up plain text, I want the plain text to follow
>> some regularization rules, and ZML is the only game in town actively
>> working with etexts (as far as I know at least -- I do recall two
>> other text regularization schemas, but don't know if the authors are
>> doing anything with them.)

> There are several schemes in active use, including:
>
> wiki markup: http://en.wikipedia.org/wiki/Wiki_markup
>
> STX: http://www.zope.org/Members/jim/StructuredTextWiki/FrontPage
> (Structured Text) by Jim Fulton, e.g. for Zope and ZWiki
>
> reStructuredText: http://docutils.sourceforge.net/rst.html for
> Python's DocUtils
>
> Markdown: http://daringfireball.net/projects/markdown/ by John
> Gruber of Daring Fireball

Thanks! Markdown is especially interesting since it produces
regularized plain text which looks and reads the most like PG plain
text, other than Bowerbird's ZML.

It's also interesting that there's another ZML in use, so Bowerbird
may need to change the acronym he is using for his regularized
plain text schema, such as to ZenML:

   http://rx4rdf.liminalzone.org/RhizML


> (I'm not sure if any are being used for the same type of etexts as
> PG, but it seems likely that the overall level and diversity of
> activity and tools are more important.  e.g. I think all of the
> above include source code, typically using a friendly "attribution"
> license.)
>
> My own approach (from 1995) plus links to several others is here:
>         No-Tags Markup: http://prefab.com/ssl/notagsmarkup.html

Very useful information. Thanks.

Jon

From Bowerbird at aol.com  Sun Feb 27 23:19:33 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Feb 27 23:21:23 2005
Subject: [gutvol-d] plain text formats [was: one more thing,
	for jon noring]
Message-ID: <b8.6d2b4e17.2f542005@aol.com>

jon said:
>   It's also interesting that there's another ZML in use, 
>   so Bowerbird may need to change the acronym he is using 
>   for his regularized plain text schema, such as to ZenML:

fat chance!
somebody better warn the new interloper...        ;+)

interesting page, scott.
(except i got a flock of 404s.)

one of these days, no-markup markup
is finally gonna hit its critical mass...

my efforts are aimed at offline e-books,
for which i am creating the viewer-app,
so my "zen markup language" files aren't
intended to be transformed into (x)html
(who needs the hassle?), but read directly.

the real noise will be about my _viewer_.
the sizzle on the steak is that you feed it
plain old text, and still get immense power.

eventually i'll write a z.m.l. browser plug-in,
but for now the more-important priority is to
wean people off the browser for reading e-books.

so i do not see myself as competing in any way
with any of the other no-markup systems today.
(nor do i see any of them as competition to me.)
i also intend to be much simpler than any of them!

-bowerbird
From gbnewby at pglaf.org  Mon Feb 28 00:13:16 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Feb 28 00:13:17 2005
Subject: [gutvol-d] Filesystem changes to the web site
In-Reply-To: <4222343F.4050404@perathoner.de>
References: <421CDA26.7060507@perathoner.de>
	<6.1.2.0.0.20050225171445.01be73b0@mail.fireantproductions.com>
	<4220C869.9010406@perathoner.de>
	<6.1.2.0.0.20050227072235.01e2dec0@mail.fireantproductions.com>
	<Pine.LNX.4.61.0502270920300.23810@aphrodite.gnu-designs.com>
	<42221E9A.9050202@perathoner.de>
	<Pine.LNX.4.61.0502271522300.3549@angst.gnu-designs.com>
	<4222343F.4050404@perathoner.de>
Message-ID: <20050228081316.GB27826@pglaf.org>

On Sun, Feb 27, 2005 at 09:57:35PM +0100, Marcello Perathoner wrote:
> David A. Desrosiers wrote:
> 
> >	I'm mirroring the archive, not the website. Something changed 
> >recently, and all of the directories have been moved to a completely 
> >new layout, duplicating the tree in a secondary location inside the 
> >same parent root. It's doubled the amount of space the archive 
> >consumes, which is why I was concerned.
> 
> I cannot understand that. The file archive was moved a while ago to the 
> new fileserver but mounted on the same directory.
> 
> What commandline are you using to rsync the archive?

I'm just confirming that my mirrors don't seem to show
any duplication (total size is ~143.6GB).  For sample
command lines and mirroring methods, see:
	http://gutenberg.org/howto/mirror-howto

  -- Greg
From widger at cecomet.net  Wed Feb 23 08:24:33 2005
From: widger at cecomet.net (David Widger)
Date: Mon Feb 28 00:19:54 2005
Subject: [gutvol-d] Pepys' birthday
In-Reply-To: <005901c519bf$420ca4e0$ac9495ce@gw98>
References: <005901c519bf$420ca4e0$ac9495ce@gw98>
Message-ID: <6.0.1.1.2.20050223112214.027c8c48@mail.adelphia.net>

At 10:49 AM 2/23/2005, N Wolcott wrote:
>Being his birthday maybe this is appropriate. Pepys gave his memoirs to 
>Cambridge University, but the full text was not published until 1970. That 
>being the case would not the text (minus editorial comment and added 
>footnotes) be now public domain as it is now more than 75 years since the 
>author's death?
>
>
>N Wolcott  <mailto:nwolcott2@post.harvard.edu>nwolcott2@post.harvard.edu
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d


Here is the PG Pepys.

David


Samuel Pepys       Unabridged Diary

Entire Gutenberg Edition of The Diary of Samuel Pepys (6.6 MB)
               <http://www.gutenberg.org/dirs/4/2/0/4200/4200.txt>
Quotes & Images
               <http://www.gutenberg.org/dirs/7/5/5/7554/7554-h/7554-h.htm>
182389e.jpg
               <http://www.gutenberg.net.au/widger/portrait/pepys.jpg>

1660           <http://www.gutenberg.org/dirs/4/1/2/4125/4125.txt>
  Intro        <http://www.gutenberg.org/dirs/4/1/1/4117/4117.txt>
  Jan          <http://www.gutenberg.org/dirs/4/1/1/4118/4118.txt>
  Feb          <http://www.gutenberg.org/dirs/4/1/1/4119/4119.txt>
  Mar/Apr      <http://www.gutenberg.org/dirs/4/1/2/4120/4120.txt>
  May          <http://www.gutenberg.org/dirs/4/1/2/4121/4121.txt>
  Jun/Jul      <http://www.gutenberg.org/dirs/4/1/2/4122/4122.txt>
  Aug/Sep      <http://www.gutenberg.org/dirs/4/1/2/4123/4123.txt>
  Oct/Nov/Dec  <http://www.gutenberg.org/dirs/4/1/2/4124/4124.txt>

1661           <http://www.gutenberg.org/dirs/4/1/3/4131/4131.txt>
  Jan/Feb/Mar  <http://www.gutenberg.org/dirs/4/1/2/4126/4126.txt>
  Apr/May/Jun  <http://www.gutenberg.org/dirs/4/1/2/4127/4127.txt>
  Jul/Aug      <http://www.gutenberg.org/dirs/4/1/2/4128/4128.txt>
  Sep/Oct      <http://www.gutenberg.org/dirs/4/1/2/4129/4129.txt>
  Nov/Dec      <http://www.gutenberg.org/dirs/4/1/3/4130/4130.txt>

1662           <http://www.gutenberg.org/dirs/4/1/3/4138/4138.txt>
  Jan/Feb      <http://www.gutenberg.org/dirs/4/1/3/4132/4132.txt>
  Mar/Apr      <http://www.gutenberg.org/dirs/4/1/3/4133/4133.txt>
  May/Jun      <http://www.gutenberg.org/dirs/4/1/3/4134/4134.txt>
  Jul/Aug      <http://www.gutenberg.org/dirs/4/1/3/4135/4135.txt>
  Sep/Oct      <http://www.gutenberg.org/dirs/4/1/3/4136/4136.txt>
  Nov/Dec      <http://www.gutenberg.org/dirs/4/1/3/4137/4137.txt>

1663           <http://www.gutenberg.org/dirs/4/1/4/4145/4145.txt>
  Jan/Feb      <http://www.gutenberg.org/dirs/4/1/3/4139/4139.txt>
  Mar/Apr      <http://www.gutenberg.org/dirs/4/1/4/4140/4140.txt>
  May/Jun      <http://www.gutenberg.org/dirs/4/1/4/4141/4141.txt>
  Jul/Aug      <http://www.gutenberg.org/dirs/4/1/4/4142/4142.txt>
  Sep/Oct      <http://www.gutenberg.org/dirs/4/1/4/4143/4143.txt>
  Nov/Dec      <http://www.gutenberg.org/dirs/4/1/4/4144/4144.txt>

1664           <http://www.gutenberg.org/dirs/4/1/5/4153/4153.txt>
  Jan/Feb      <http://www.gutenberg.org/dirs/4/1/4/4146/4146.txt>
  Mar          <http://www.gutenberg.org/dirs/4/1/4/4147/4147.txt>
  Apr/May      <http://www.gutenberg.org/dirs/4/1/4/4148/4148.txt>
  Jun/Jul      <http://www.gutenberg.org/dirs/4/1/4/4149/4149.txt>
  Aug/Sep      <http://www.gutenberg.org/dirs/4/1/5/4150/4150.txt>
  Oct/Nov      <http://www.gutenberg.org/dirs/4/1/5/4151/4151.txt>
  Dec          <http://www.gutenberg.org/dirs/4/1/5/4152/4152.txt>

1665           <http://www.gutenberg.org/dirs/4/1/6/4162/4162.txt>
  Jan/Feb      <http://www.gutenberg.org/dirs/4/1/5/4154/4154.txt>
  Mar/Apr      <http://www.gutenberg.org/dirs/4/1/5/4155/4155.txt>
  May/Jun      <http://www.gutenberg.org/dirs/4/1/5/4156/4156.txt>
  Jul          <http://www.gutenberg.org/dirs/4/1/5/4157/4157.txt>
  Aug          <http://www.gutenberg.org/dirs/4/1/5/4158/4158.txt>
  Sep          <http://www.gutenberg.org/dirs/4/1/5/4159/4159.txt>
  Oct          <http://www.gutenberg.org/dirs/4/1/6/4160/4160.txt>
  Nov/Dec      <http://www.gutenberg.org/dirs/4/1/6/4161/4161.txt>

1666           <http://www.gutenberg.org/dirs/4/1/7/4171/4171.txt>
  Jan/Feb      <http://www.gutenberg.org/dirs/4/1/6/4163/4163.txt>
  Mar/Apr      <http://www.gutenberg.org/dirs/4/1/6/4164/4164.txt>
  May/Jun      <http://www.gutenberg.org/dirs/4/1/6/4165/4165.txt>
  Jul          <http://www.gutenberg.org/dirs/4/1/6/4166/4166.txt>
  Aug/Sep      <http://www.gutenberg.org/dirs/4/1/6/4167/4167.txt>
  Oct          <http://www.gutenberg.org/dirs/4/1/6/4168/4168.txt>
  Nov          <http://www.gutenberg.org/dirs/4/1/6/4169/4169.txt>
  Dec          <http://www.gutenberg.org/dirs/4/1/7/4170/4170.txt>

1667           <http://www.gutenberg.org/dirs/4/1/8/4184/4184.txt>
  Jan          <http://www.gutenberg.org/dirs/4/1/7/4172/4172.txt>
  Feb          <http://www.gutenberg.org/dirs/4/1/7/4173/4173.txt>
  Mar          <http://www.gutenberg.org/dirs/4/1/7/4174/4174.txt>
  Apr          <http://www.gutenberg.org/dirs/4/1/7/4175/4175.txt>
  May          <http://www.gutenberg.org/dirs/4/1/7/4176/4176.txt>
  Jun          <http://www.gutenberg.org/dirs/4/1/7/4177/4177.txt>
  Jul          <http://www.gutenberg.org/dirs/4/1/7/4178/4178.txt>
  Aug          <http://www.gutenberg.org/dirs/4/1/7/4179/4179.txt>
  Sep          <http://www.gutenberg.org/dirs/4/1/8/4180/4180.txt>
  Oct          <http://www.gutenberg.org/dirs/4/1/8/4181/4181.txt>
  Nov          <http://www.gutenberg.org/dirs/4/1/8/4182/4182.txt>
  Dec          <http://www.gutenberg.org/dirs/4/1/8/4183/4183.txt>

1668           <http://www.gutenberg.org/dirs/4/1/9/4195/4195.txt>
  Jan          <http://www.gutenberg.org/dirs/4/1/8/4185/4185.txt>
  Feb          <http://www.gutenberg.org/dirs/4/1/8/4186/4186.txt>
  Mar          <http://www.gutenberg.org/dirs/4/1/8/4187/4187.txt>
  Apr          <http://www.gutenberg.org/dirs/4/1/8/4188/4188.txt>
  May          <http://www.gutenberg.org/dirs/4/1/8/4189/4189.txt>
  Jun/Jul      <http://www.gutenberg.org/dirs/4/1/9/4190/4190.txt>
  Aug          <http://www.gutenberg.org/dirs/4/1/9/4191/4191.txt>
  Sep/Oct      <http://www.gutenberg.org/dirs/4/1/9/4192/4192.txt>
  Nov          <http://www.gutenberg.org/dirs/4/1/9/4193/4193.txt>
  Dec          <http://www.gutenberg.org/dirs/4/1/9/4194/4194.txt>

1669           <http://www.gutenberg.org/dirs/4/1/9/4199/4199.txt>
  Jan          <http://www.gutenberg.org/dirs/4/1/9/4196/4196.txt>
  Feb/Mar      <http://www.gutenberg.org/dirs/4/1/9/4197/4197.txt>
  Apr/May      <http://www.gutenberg.org/dirs/4/1/9/4198/4198.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 182389e.jpg
Type: image/jpeg
Size: 61296 bytes
Desc: not available
Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050223/cc0042b9/182389e-0001.jpg
From webmaster at gutenberg.org  Sun Feb 27 12:32:06 2005
From: webmaster at gutenberg.org (Marcello Perathoner)
Date: Mon Feb 28 00:19:56 2005
Subject: [gutvol-d] [Fwd: Carl Ludwig Schleich/Inka Weide]
Message-ID: <42222E46.6040802@gutenberg.org>

Any DPers here that can get this msg to Inka?


-------- Original Message --------
Subject: Carl Ludwig Schleich/Inka Weide
Date: Sun, 27 Feb 2005 17:28:23 +0100
From: jp-com <jp-com@online.de>
To: webmaster@gutenberg.org

Hello,
please send this mail to Inka Weide.

Hello Inka,
I was delighted to read the essays by Carl Ludwig Schleich on Gutenberg.
More about my great-great-uncle can be found at
www.carl-ludwig-schleich.de

Regards,
Jürgen Pohl
Eichenhain 13
D--31311  Uetze


-- 
Marcello Perathoner
webmaster@gutenberg.org

From inka at 21torr.com  Mon Feb 28 03:11:50 2005
From: inka at 21torr.com (inka@21torr.com)
Date: Mon Feb 28 03:10:41 2005
Subject: [gutvol-d] [Fwd: Carl Ludwig Schleich/Inka Weide]
In-Reply-To: <42222E46.6040802@gutenberg.org>
References: <42222E46.6040802@gutenberg.org>
Message-ID: <Pine.LNX.4.58.0502281206130.7499@inka.intranet.21torr.com>

On Sun, 27 Feb 2005, Marcello Perathoner wrote:

> Any DPers here that can get this msg to Inka?
> 

Hm, yes, I think I may be able to reach me :)

Thanks - the first 'reader feedback' for a book I worked on.


Inka
From shimmin at uiuc.edu  Mon Feb 28 06:08:36 2005
From: shimmin at uiuc.edu (Robert Shimmin)
Date: Mon Feb 28 06:08:40 2005
Subject: [gutvol-d] one more thing, for jon noring
In-Reply-To: <19430256015.20050227165713@noring.name>
References: <1ec.35c2cb43.2f53972b@aol.com>
	<19430256015.20050227165713@noring.name>
Message-ID: <422325E4.1060907@uiuc.edu>

Jon Noring wrote:

> Well, yes, once it's been put in the queue. Anyone here from DP care to
> comment on typical times for a book to be proofed in the DP system? (I
> would have been happy to contribute to the post-processing markup
> stage.)

This depends greatly on which queue it gets put in.  English-language 
novels tend to go quickly.  If they qualify for the "easy" queue, they 
spend little time queuing, and often complete proofreading within a few 
days.  Complex texts on "dry" subjects might spend a few weeks queueing, 
and then require several weeks to go through the proofreading rounds.

-- RS
From Bowerbird at aol.com  Mon Feb 28 07:38:41 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Feb 28 07:38:51 2005
Subject: [gutvol-d] one more thing, for jon noring
Message-ID: <2b.6ddabc74.2f549501@aol.com>

robert said:
>   English-language novels tend to go quickly.  
>   If they qualify for the "easy" queue, 
>   they spend little time queuing, and 
>   often complete proofreading within a few days. 

when i said "fly through d.p. in a few hours"
i meant literally, not figuratively.

(well, i guess the "fly" part was figurative,     :+)
but the "few hours" part was quite literal,
especially since first-time producers get to
go to the head of the queue.  go ahead, time it.)

-bowerbird

p.s.  a text that is already this clean would
most definitely be put into the "easy" queue.
moreover, re-doing existing e-texts might be
a very good test of the formatting rounds that
are now being contemplated for the d.p. future.
(of course, those are a complete waste of time,
in my humble opinion, but they _are_ the plan...)

p.p.s.  robert, didn't you make a d.p. forum post
that also listed a bunch of plain-text formats
recently?  or was that dazb?  or someone else?
From jon at noring.name  Mon Feb 28 12:33:30 2005
From: jon at noring.name (Jon Noring)
Date: Mon Feb 28 12:34:00 2005
Subject: [gutvol-d] Interesting message on TeBC from NetWorker (about fixing
	errors, Frankenstein, etc.)
Message-ID: <15820479578.20050228133330@noring.name>

[I'm forwarding the following message by "NetWorker" posted to The
eBook Community. Any followup to the specific points NetWorker
raises may be better posted over there, especially if you'd like
NetWorker to see your comments ("ebook-community" at YahooGroups).
NetWorker is very thorough...   Jon]


NetWorker wrote [a few days prior]:

>  Project Gutenberg e-texts may yet have a role to play in the
>  production of high-quality e-books. Rising to the bait, I have placed
>  a hold on my local library's one(!) copy of Frankenstein, which I
>  will scan when it arrives. I will try to create a highly structured
>  e-text (certainly not as fine as what Jon did with "My Antonia"). I
>  will then try to find a way to preprocess both the OCR'ed text and
>  the Project Gutenberg e-text in such a way that the two files can be
>  meaningfully "diffed." Hopefully, I can come up with a method that
>  will allow existing PG e-texts to be an automated "proofread" of next
>  generation Public Domain e-books.

Boy, _that_ was an interesting experience!

The project goals:

As an e-book consumer, I want an e-book that contains _lots_ of 
metadata; the more the better. I want the metadata to be patterned, so 
that I can use automated tools to manage a collection; sorting by 
author, genre, publication date, publisher, contributors such as editors 
and illustrators, etc. I also want the actual text to be marked-up in 
such a way that 1) I can view the text with all the presentational 
richness traditionally associated with a paper book, if I choose to do 
so,  2) I can convert unambiguously from one markup language to another, 
and 3) I can do a structural analysis of the book using 
automated tools. I also want a mechanism to know if apparent errors in 
the text are due to transcription errors or the author's intent -- this 
can be accomplished by including source information in the metadata, or 
providing access to page scans; both would be preferable.

Project Gutenberg e-texts satisfy none of these wishes; to create an 
e-book which _does_ satisfy them pretty much requires starting from 
scratch. Scanning technologies are quite advanced these days, but OCR is 
still not 100% accurate, and automated spell checking can only go so 
far. Clearly the most time-consuming -- and most error-prone -- part of 
producing a reasonably accurate e-book is proof-reading by a human 
being. My goal was to discover if Project Gutenberg e-texts, which are 
presumably fairly accurate as to the _words_, if nothing else, could be 
used as yet another automated preprocessing step to reduce typographical 
errors to a minimum before the actual proof-reading begins.

The process:

To test my theory, I decided to use the novel _Frankenstein_, by Mary 
Wollstonecraft Shelley. _Frankenstein_ is clearly in the public domain, 
is known to have at least two versions, and has been the subject of a 
fair amount of discussion on this list in the recent past.

I obtained a copy of Frankenstein from the public library; it was 
published in the "Barnes & Noble Classics" series in 2000. I was fairly 
pleased with the edition, as it was printed in a rather old-seeming 
type-face which gave the appearance that it was in fact a photo 
reproduction of a much older text; it seemed likely that it had not gone 
through much in the way of re-editing to modern conventions.

I scanned and OCR'ed the book using ABBYY FineReader. I then did a 
spell-check of the book from within FineReader so I could compare 
"misspelled" words to the actual scanned image. I then saved the text as 
an HTML file.

In the past I have written a couple of programs to help in the creation 
of e-books. TidyeBook is based on the HTML Tidy code base. It fixes some 
of the inaccurate HTML produced by ABBYY, strips headers and footers but 
leaves page numbers intact, if invisible, and merges broken paragraphs 
when it can do so without question. html2txt, based on an earlier C++ 
version of HTML Tidy, takes an HTML document and reduces it to simple 
text similar to that used by Project Gutenberg.

Next I ran "frankenstein.html" through TidyeBook to clean up the HTML. I 
then hand-edited the HTML to fix paragraph breaks not fixed in the 
automated process, or which should not have been broken. I also fixed 
those instances where hyphenated words spanned a page break (very easy 
to do given the output of TidyeBook). I then generated an Impoverished 
Text Format version of the HTML text using html2txt.

My strategy was to use the Gnu "diff" program to detect differences 
between the simplified version of my work product, and the Project 
Gutenberg version. Because "diff" is line-oriented I needed to normalize 
the two texts so there was a greater likelihood that lines would be 
correctly matched. I did this by writing yet another program. (This could 
probably have been done more efficiently with a Perl or AWK script, but I 
am not very familiar with scripting languages and am a highly 
proficient C/C++ programmer, so it was easiest for me to use the tools at 
my disposal.) The new program would reduce each file to lines of no more 
than 60 characters (the shorter the line the easier for a human to find 
the difference detected). Additionally, the program would start a new 
line whenever it encountered what is conventionally accepted as 
sentence-ending punctuation (!.?) or two newline characters in a row, 
which would signal the beginning of a new paragraph. All whitespace was 
reduced to a space character, including multiple whitespace characters.
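
[A minimal sketch of the normalization just described, written in Python 
rather than the C/C++ of the original program, whose source is not part 
of this message. The names normalize and wrap are hypothetical, and the 
choice to break a sentence only where the punctuation is followed by 
whitespace is an assumption, not something the description confirms.]

```python
import re

MAX_WIDTH = 60  # maximum line length stated in the message


def wrap(sentence, width):
    """Greedy word wrap; a single over-long word stays on its own line."""
    out, current = [], ""
    for word in sentence.split():
        if current and len(current) + 1 + len(word) > width:
            out.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        out.append(current)
    return out


def normalize(text, width=MAX_WIDTH):
    """Return a list of short lines suitable for a line-oriented diff."""
    lines = []
    # Two or more newlines in a row mark a paragraph boundary.
    for paragraph in re.split(r"\n\s*\n", text):
        # Collapse every run of whitespace to a single space.
        flat = " ".join(paragraph.split())
        if not flat:
            continue
        # Start a new line after sentence-ending punctuation (! . ?).
        for sentence in re.split(r"(?<=[.!?])\s+", flat):
            lines.extend(wrap(sentence, width))
    return lines
```

Running both texts through the same function should give two line lists 
whose breaks fall in the same places wherever the words agree, which is 
what a line-oriented comparison needs. Abbreviations such as "Mr." will 
also trigger a break, but they do so identically in both texts.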

I used the new program to normalize the text produced by html2txt and 
that of frank14.txt from Project Gutenberg. I then compared the two 
resultant files using gnu diff and Microsoft's WinDiff.
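
[A rough stand-in for the comparison step, using Python's standard 
difflib module instead of GNU diff or WinDiff. The function name 
diff_lines and the sample lines are illustrative only; the samples are 
built from phrases quoted elsewhere in this message.]

```python
import difflib


def diff_lines(scan_lines, pg_lines):
    """Return unified-diff lines between two normalized texts."""
    return list(difflib.unified_diff(
        scan_lines, pg_lines,
        fromfile="scan", tofile="pg",
        lineterm="",  # inputs carry no trailing newlines
        n=0))         # no context lines: show only the differences


if __name__ == "__main__":
    scan = ["But I must finish.", "an European"]
    pg = ["But must finish.", "a European"]
    for line in diff_lines(scan, pg):
        print(line)
```

Lines prefixed with "-" come from the new scan and lines prefixed with 
"+" from the Project Gutenberg text; identical inputs produce no output.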

The results:

I was quite surprised to find literally thousands of differences between 
the two texts. Most of the differences were changes in punctuation and 
capitalization. Many em-dashes were converted to semicolons or omitted 
altogether, and many semicolons were converted to commas. Some words 
capitalized in my scan (eg. Paradise) were converted to lower case 
(paradise). Some phrases were "fixed" ("our uncle Thomas's book" became 
"our Uncle Thomas' book"; "an European" became "a European"). Some words 
were Americanized ("tranquillise" became "tranquillize") yet other words 
were not ("favourite" remained "favourite").

In an attempt to discover the source of these differences, I visited a 
number of not-so-local libraries, and checked out a number of different 
printings of _Frankenstein_. Two of the most interesting are Leonard 
Wolf's _The Annotated Frankenstein_, Clarkson N. Potter, 1977, which 
claims that "In order to ensure the authenticity of the text, we 
arranged with the Library of Congress in Washington, D.C., to microfilm 
a copy of the first edition. That text has been reproduced in this 
volume by the photo-offset process," and the Penguin Classics edition 
which includes an appendix identifying the differences between the 1818 
and 1831 editions (while significant, they are neither as pervasive nor 
as substantive as has been earlier suggested).

Neither of these editions contained the punctuation or spelling changes 
of the Project Gutenberg edition. One of the books I checked out, rather 
serendipitously as it turns out, was the Bantam Classic edition, which 
was first published in 1981. Of all the editions I consulted, only the 
Bantam edition contains virtually all of the changes I noted in the 
Project Gutenberg e-text. The PG edition is apparently based on Mary 
Shelley's revised 1831 edition (although it has lost both the "Author's 
Introduction to the Standard Novels Edition (1831)" by Mary Shelley, and 
the "Preface" to the 1818 edition by Percy Shelley). I thus believe that 
the PG edition is based on the Bantam Classic edition of 1981.

<sidebar>
Interestingly, copyright law provides protection to changed versions of 
public domain texts if those changes are of a nature that they are more 
than mechanical and provide some modicum of creativity. Clearly, the 
punctuation changes are not merely mechanical, and in some cases 
actually change subtly the nuances of the text. Ironically, of all the 
textual bases that Project Gutenberg could have used for its e-text of 
_Frankenstein_, it chose the one which is apparently still protected by 
copyright!
</sidebar>

I modified my text normalization program to discard all punctuation 
except hyphens and underscores (and of course excluding the 
sentence-ending punctuation mentioned earlier). This reduced the noise 
to signal ratio enough that the differences started to become 
meaningful, although it still resulted in at least 500 differences. It 
allowed me to discover a handful of OCR errors that had been missed by 
the earlier automated methods, and I have so far also found a handful of 
errors in the PG text ("But must finish." should be "But I must 
finish.",  "every sight ... seem still to" should be "every sight ... 
seems still to", "destroy radiant innocence" should be "destroy such 
radiant innocence", etc.)
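
[A sketch of the punctuation filter described above, again in Python and 
not the original program. It keeps hyphens, underscores and the 
sentence-ending marks, and drops every other ASCII punctuation character. 
Dropping apostrophes and curly quotes as well is an assumption; the 
message does not say how they were handled. The name strip_punctuation 
is hypothetical.]

```python
import string

# Characters the message says were retained.
KEEP = set("-_.!?")

# Every other ASCII punctuation mark, plus curly quotes (an assumption).
DROP = "".join(ch for ch in string.punctuation if ch not in KEEP)
TABLE = str.maketrans("", "", DROP + "\u2018\u2019\u201c\u201d")


def strip_punctuation(line):
    """Delete unwanted punctuation, then collapse leftover whitespace."""
    return " ".join(line.translate(TABLE).split())
```

With this filter, a line that differs only by a comma versus a semicolon 
reduces to the same string in both texts, so it no longer shows up in the 
comparison.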

Conclusions:

Of course, the goal of this exercise was not to establish the provenance 
of the Project Gutenberg e-text of _Frankenstein_, nor to discover if 
there are any errors in the PG e-text, but to determine if there was an 
automated method of reducing errors in newly scanned e-books for which a 
Project Gutenberg e-text already exists. I'm afraid the jury is still 
out on this question. If the texts are as different as the PG edition of 
_Frankenstein_ and virtually all other editions, the process of sorting 
through the chaff to find the grain of wheat may not be worthwhile; I 
believe that the OCR errors discovered so far were blatant enough that 
they would have been easily discovered in the first proof-reading, and I 
believe that human proof-reading will always be required no matter how 
good our automated tools become.

Some time ago I produced an HTML e-book version of Mark Twain's 
_Pudd'nhead Wilson_. I believe I will put that e-book through that same 
process. I will then attempt a new scan of some other, perhaps more 
obscure, PD work that already has a PG version. Having at least three 
data points, I will report again later.

p.s. -- If someone from Project Gutenberg wants my diff file to update 
the PG e-text, I will be happy to e-mail it to you; it is approximately 
95k in size.

From Bowerbird at aol.com  Mon Feb 28 13:30:00 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Feb 28 13:30:20 2005
Subject: [gutvol-d] Interesting message on TeBC from NetWorker (about
	fixing errors, Frankenstein, etc.)
Message-ID: <1d7.37a74a84.2f54e758@aol.com>

jon said networker said:
>   Project Gutenberg e-texts satisfy none of these wishes

well, i guess networker will have to start his own project, eh?
give him my best wishes!         :+)


>   Conclusions
>   Of course, the goal of this exercise was not to establish the 
>   provenance of the Project Gutenberg e-text of _Frankenstein_, 

maybe not.  but having done so, it is _refreshing_ to know that
-- when that's factored in -- only "a handful" of errors surface.

so once again, in spite of some very big noises, it ends up that
this fails to stand as a good example of an error-ridden e-text.


>   nor to discover if there are any errors in the PG e-text, but 
>   to determine if there was an automated method of reducing errors 
>   in newly scanned e-books for which a  Project Gutenberg e-text 
>   already exists. I'm afraid the jury is still out on this question.

as for this "conclusion", the jury may still be out in _his_ mind,
but in mine, the answer is very clear, and i've said it before here:
if you do the scanning properly, manipulate those scans correctly,
use abbyy in the best way, and subject its results to the right tools,
you will reduce the errors in your text to a relatively small number.

(the number we've been kickin' around is 1 error for every 10 pages,
and at that point, proofreading by the public becomes very viable.)

if you then have the rare luxury of evaluating your output against
an existing version of the book -- like a project gutenberg e-text --
with the right tool (which networker obviously does not yet have),
the comparison between the two, alongside the page-images, should
make the process of coming to an error-free version simply a breeze.

since this is _exactly_ what will need to be done _increasingly_,
as the page-images from the internet archive and (we hope) google
-- plus the work done by individual people scanning everywhere --
emerge into cyberspace, that's where my tool-development efforts
are now being focused.  i suggest networker start reading my blog;
it should start being updated on a daily basis starting next week...

-bowerbird
From Bowerbird at aol.com  Mon Feb 28 13:46:49 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Feb 28 13:47:01 2005
Subject: [gutvol-d] ok, this is my last post for the time being, really
Message-ID: <129.57a41ac4.2f54eb49@aol.com>

jon said:
>   Bowerbird, I'll be happy to put up 
>   a ZML regularized text version of My Antonia.

if you prepare your browser-based versions carefully,
copying text from the browser-window will give z.m.l., 
so there's no real reason to create a separate version...

(of course, if you don't do the correct preparation...)

for me, "round-tripping" is one of the biggest priorities.
and by that, i mean that when a z.m.l. file is presented,
the end-user should be able to copy text out _as_ z.m.l.,
so that -- with just a few global search-and-replaces --
when reloaded into a z.m.l.-viewer, it will look the same.

(even if its text-styling is stripped away, as can happen
when it's saved as a plain .txt file, it should be restored.
automatically.)

i have attained this in the z.m.l. viewer-program already.
(this was pretty simple, as i control all the operations.)

i've also attained it in my .pdf version, where i'm able to
work around the limitations of acrobat's copy operation
-- if you've ever copied text from a .pdf, you realize how
awful it mangles the formatting -- by controlling what
my viewer-program writes to the .pdf in the first place.

(to answer the first question of a knowledgeable person,
i write a dummy-line as a separator between paragraphs,
so a global replace restores the blank line between them.)
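
a minimal sketch of that separator trick, in python.  it is not the 
viewer-program's actual code: the message does not say what the 
dummy-line contains, so the SEPARATOR value and both function names 
are assumptions made for illustration.

```python
# Hypothetical marker; the real dummy-line text is not given in the message.
SEPARATOR = "~~~"


def write_with_separators(paragraphs):
    """Join paragraphs with a dummy line between each pair, no blank lines."""
    return ("\n" + SEPARATOR + "\n").join(paragraphs)


def restore_blank_lines(copied):
    """One global replace turns each dummy line back into a blank line."""
    return copied.replace("\n" + SEPARATOR + "\n", "\n\n")


if __name__ == "__main__":
    written = write_with_separators(["first paragraph", "second paragraph"])
    print(restore_blank_lines(written))
```

the marker has to be a string that never occurs as a line of its own 
in the book, or the replace will split a paragraph that was not split 
in the original.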

when i get around to making my zml-to-html converter,
i will try to make sure that the .html that's created will
copy out of the browser-window correctly too.  however,
browsers do some funky crap in their copy operations, so
it might not be possible to preserve _everything_, at least
until the browser-programmers tighten up their act there.
when i do that work, i'll share any tips people need to know
in order to prepare an .html version to copy out good .zml...

but in many cases, even now, a copy out of a browser-window
can produce text that is .zml, or can be easily converted to it.
for instance, jon, your website that gives your listserve rules
creates a nice .zml file.  consistent formatting yields good .zml.

***

jon said:
>   Yes, if this is the case, it is mysterious 
>   since IA will gladly host them 
>   once the etext version is out the door.

sometimes i wonder if the internet archive is 
quite as accommodating as you always seem to
make them out to be.  i don't know otherwise,
but it seems that if they were, then d.p. would've
put their scans up long ago.  (unlike their site,
page-scans wouldn't need quick response-time.)


pourlean said:
>   An accessible archive of posted 
>   projects & images is in the works.

i look forward to the day it comes online!

(if there is any particular stumbling block,
do please let me know, as maybe i can help.)

***

jon said:
>   It's on me.  In private email send me your address and 
>   I'll burn and mail you a disk of the 600 dpi and 120 dpi scans

hey, thanks for the gift, jon, i appreciate it!  but i can't use
30 megs worth of scans in my little project; it's just a demo.

so i wrote a quick program to grab a few dozen from the site.
and now that i've done that, i can grab 'em all, if i ever need;
since i was just looking for a way to do it in one fell swoop,
i shoulda just done that straightaway, instead of bugging you.
but maybe someone else will now be able to make good use of
the zipped package of the scans that you added to your site...

-bowerbird

p.s.  now if you'll all excuse me, i really need to go back to work...  :+)
From jon at noring.name  Mon Feb 28 14:55:13 2005
From: jon at noring.name (Jon Noring)
Date: Mon Feb 28 14:55:25 2005
Subject: [gutvol-d] Interesting message on TeBC from NetWorker (about
	fixing errors, Frankenstein, etc.)
In-Reply-To: <1d7.37a74a84.2f54e758@aol.com>
References: <1d7.37a74a84.2f54e758@aol.com>
Message-ID: <11628983187.20050228155513@noring.name>

Bowerbird wrote:
> jon said networker said:

> maybe not.  but having done so, it is _refreshing_ to know that
> -- when that's factored in -- only "a handful" of errors surface.

But the bigger issue is not constrained to errors (differences) with
respect to the source text used, as you continue to focus on. The
issues deal with the larger areas of trust, verifiability, proper
digital preservation of the Public Domain, and using acceptable
sources (with proper documentation), not just blindly grabbing
anything off the shelf, as appears to have happened with the PG version
of Frankenstein, which now exposes PG to legal liability.

The lack of proper processes, procedures and guidelines to build the
non-DP portion of the PG library (which comprises about half the
collection and is heavily skewed towards the more classic works), is
leading to serious questions about the integrity and trustworthiness
of the whole PG library (I've discussed this at length the last
couple weeks on The eBook Community.) It can certainly be fixed, but
the fix will require:

1) redoing most of the non-DP works using DP,

2) Proper selection of sources so they are acceptable, both legally and
   from those knowledgeable as to the better sources to use, and

3) Proper documentation as to source, including making available all
   the original page scans (and not just the title page, which proves
   nothing.)


(Btw, NetWorker presented evidence in his message to indicate PG's
version of Frankenstein was taken from a copyrighted edition, that
itself had a significant number of emendations from the original,
which in essence act like a "fingerprint" as to the pedigree. This is
NOT good. It casts PG's archive in a negative light, and may even lead
to a legal demand by Bantam for PG to remove the current version of
Frankenstein. It also calls into question the provenance of a large
number of other pre-DP texts where there's no source metadata given
and no page scans to prove proper provenance. NetWorker himself is a
former attorney, and he has thoroughly researched copyright law the
last couple years as it relates to ebooks, so Michael and Greg should
seriously sit up and take notice of the problem with PG's version of
Frankenstein, and many of its other texts where an acceptable source
cannot be demonstrated. Even if PG is "right" in a legal sense, that
it could use the 1981 Bantam Classics edition as it *might* have done,
does it want to even fight this in court, or to try to explain it
away to the trusting public?)


> so once again, in spite of some very big noises, it ends up that
> this fails to stand as a good example of an error-ridden e-text.

Well, at least you seem to indicate from your interest in very low
error rate OCR that every etext PG includes in its archive should be a
textually faithful reproduction of some known source. That is, if any
emendations are made afterward, they should be properly documented.
Otherwise, leave the text as it is in the print source.

Is this your thinking, or do you believe that textual faithfulness
and proper source identification and verification are not necessary
at all? That is, just let people take any text in the PG library and
then "edit it" as they see fit?


> if you do the scanning properly, manipulate those scans correctly,
> use abbyy in the best way, and subject its results to the right tools,
> you will reduce the errors in your text to a relatively small number.

I don't believe anyone disagrees with you here in general. But
NetWorker was not only interested in OCR errors, but the bigger
issues as mentioned above -- they are all interlinked.


> (the number we've been kickin' around is 1 error for every 10 pages,
> and at that point, proofreading by the public becomes very viable.)

I doubt this error rate (let's say for even half of the public domain
printings out there) is accomplishable without sentient-level AI. But
if proofreading is to be done anyway by the public, as is *now done*
by DP, what difference is there between an OCR error of one every 10
pages, and one every page?

The key is that for the aspect of building *trust* in the final
product, it is a very good idea to involve the volunteer proofreaders
to go over the texts, even if *you don't have to*. Having (and proving
to anyone who asks) at least two independent people who proofed every
page, adds to its trustworthiness. Include source metadata, and access
to the original page scans used as the source, and the highest level
of trust is built (as well as greater immunity to legal challenge.)
That's what makes DP's system so powerful.

But look at PG's edition of Frankenstein:

1) Which original edition it represents is not documented (Mary Shelley
   issued two substantially different editions). I think the reader
   should know which one it is in the PG cataloging information. This
   lack of care about different editions is troubling.

2) The source document is not given at all. I'm not sure if the person
   who did the first etext version is even recorded anywhere (or even
   known.)

   (Btw, this person, should Bantam press the issue, which I hope they
   don't, would probably become a co-defendant. This shows that the
   lack of proper guidelines, processes and verification methods in
   the building of the non-DP portion of PG's collection exposes the
   volunteer donors of texts to potential legal liability! This is
   another demonstration that if a project is to do something, it
   needs to *do it right* from the start, and not just do the
   "ready-fire-aim" approach to everything.)

3) It is unknown what subsequent "edits" were done along the way --
   they are not documented, as far as I know. (How do we know that
   whole paragraphs were not removed or inserted?)

4) It now appears, but is not proven, that the source document was the
   1981 Bantam Classics edition.


This certainly does not give one warm fuzzies as to the trustworthiness
of the non-DP portion of the PG collection.

As a user of PG texts, I consider it important, for moral, legal and
aesthetic reasons, that the texts are:

1) textually faithful reproductions of *known* sources,

2) provable as such (include access to the full page scans, and not just
   the title page), and

3) the sources of which are themselves acceptable to use, both legally
   and from those knowledgeable (both professional and amateur) with
   the Work in question. (For Works which were only published once and
   never republished by anyone, this last point does not apply provided
   the source is itself Public Domain.)


> if you then have the rare luxury of evaluating your output against
> an existing version of the book -- like a project gutenberg e-text --
> with the right tool (which networker obviously does not yet have),
> the comparison between the two, alongside the page-images, should
> make the process of coming to an error-free version simply a breeze.

There will always be hand work necessary to compare two different
etexts of the same Work (note that oftentimes there are multiple
editions of multiple versions: The Work/Expression/Manifestation (WEM)
principle.)

Even the issue of hyphenation of compound words requires a human
being to ascertain what the author intended. Of course, if this is
not important to you, then what can I say?


> since this is _exactly_ what will need to be done _increasingly_,
> as the page-images from the internet archive and (we hope) google
> -- plus the work done by individual people scanning everywhere --
> emerge into cyberspace, that's where my tool-development efforts
> are now being focused.  i suggest networker start reading my blog;
> it should start being updated on a daily basis starting next week...

Tools such as yours will likely work for some types of texts, and
not work for others, where there'll be a need for human beings to
not only proof for errors, but to properly structure the document.

I'm now assessing the digitizing of records of historical and
genealogical significance, and these documents usually have quite
complex table layouts, very poor quality printing (and oftentimes
handwriting). Scans of these records are insufficient for use, so
having human beings read them and transcribe the information into
properly structured etext form is necessary.

I'll post an announcement to TeBC of your blog if you'd like me to
(although I don't know the address of your blog -- had it and then
lost it.)

Jon

From Bowerbird at aol.com  Mon Feb 28 16:53:21 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Feb 28 16:53:42 2005
Subject: [gutvol-d] Interesting message on TeBC from NetWorker (about
	fixing errors, Frankenstein, etc.)
Message-ID: <80.22755cdc.2f551701@aol.com>

jon said:
>   But the bigger issue is not constrained to errors (differences) 
>   with respect to the source text used, as you continue to focus on.

i think it was you who made "errors" the issue,
revolving around the concept of "trustworthiness".

if, once that house of cards falls down, you want to
turn the issue to one of "which source-text to use",
well then i think that michael's "i'm open to all of 'em"
stance covers _that_ quite nicely, thank you very much.

if you don't like the version of my antonia that's in the library now,
add your own!  the same goes for all the versions of "frankenstein".
casting aspersions on the edition that _is_ there isn't constructive.
provide all the meta-data you want on the version that you furnish;
heck, you can even put a pointer in to your project at librarycity.org;
these days i see a lot of e-texts referencing an .rtf version in france.


>   the PG version of Frankenstein, 
>   which now exposes PG to legal liability.

i don't agree.  but if the lawyers to whom "bantam classics" is 
paying good money decide to send a cease-and-desist, let 'em.

going by results obtained by the "gone with the wind" lawyers,
the project gutenberg people will probably fold very quickly;
without any money, you can't play poker against deep pockets.

but hey, i would like to hear the laughter that would resound
when bantam's lawyers argued that the way they can _prove_
that this e-text copied their book is because of the _errors_.
(map-makers can pull that trick.  but book-publishers?  ha!)

who knows, jon, maybe the project gutenberg lawyers will call
_you_ to the stand, to throw your arms in the air and rant about
how those terrible mistakes are ruining the fragile public domain,
and therefore bantam doesn't _deserve_ the protection of the law.
wouldn't that be ironic?       :+)


>   The lack of proper processes, procedures and guidelines

well, i don't agree with that either, jon.
you might not agree with the procedures,
but that doesn't mean there is a "lack" of them.

maybe you don't agree with their choice of source-text
for frankenstein.  but it _was_ good enough for bantam.


>   is leading to serious questions about the integrity 
>   and trustworthiness of the whole PG library

not in my mind.  and not in the minds of most people, i don't think.
not any more so than with any paper-book i might find in a store.
like the "frankenstein" version that was being _sold_ by bantam.


>   1) redoing most of the non-DP works using DP,

let's find out how many d.p. people want me to go over _their_ work
with a fine-tooth comb.  go ahead, speak up, i'd _love_ the challenge.


>   Well, at least you seem to indicate from 
>   your interest in very low error rate OCR 
>   that every etext PG includes in its archive 
>   should be a textually faithful reproduction 
>   of some known source. 

not necessarily.  if someone wants to play editor and
combine editions, i don't have any problem with that.
in some sense, that's what the public domain is about.
i don't see it in black/white terms as something frozen.

if you _are_ going to represent something as faithful,
i think it should _be_ faithful.  but  even then, that is
_to_the_best_of_your_ability_.  as long as you do that,
and give your end-users a means of "checking your work", 
including a solid mechanism for improving it to perfection,
then i think you've done your job.  so yes, i agree with you,
that scans should absolutely be furnished to the end-users,
for works that purport to replicate that edition, certainly...

however, i understand why they haven't been, up to this point,
and so do you -- disk-space just hasn't been affordable enough.
even now, if it were not for the largess of ibiblio and brewster,
we couldn't even be entertaining the thought of posting the scans.


>   I doubt this error rate (let's say for even half of the public domain
>   printings out there) is accomplishable without sentient-level AI. 

i'm trying to get back off this listserve.  i don't like contributing
to the discourse in a place where my voice has been muffled before.

so let me set up a place where you and i can fight... i mean, discuss...

but this doubt of yours is rather easy to dispel, and quickly.

you did a pretty good job of scanning that copy of "my antonia".
and it looks like you processed (e.g., straightened) the scans well.
so now we need to put them through o.c.r., using abbyy finereader;
please have that done as follows:  save results out to an .rtf file,
one for each page; retaining line-breaks and paragraph indentation.
do this for 20-50 pages, and zip the output up and e-mail it to me.
i will reply to you with feedback on if the o.c.r. was done correctly.
then i'll run it through programs that will soon be made available,
at no cost, and we'll see what kind of an error-rate we end up with.

or, if you prefer, follow this same procedure with some other book.

then, if you still want to discuss this matter, we'll do it elsewhere.


>   But if proofreading is to be done anyway by the public, 
>   as is *now done* by DP, what difference is there between 
>   an OCR error of one every 10 pages, and one every page?

when i talk about "the public", i mean _end-users_ who
are reading the book for the purpose of reading the book, 
and _not_ specifically to be "proofreading" it per se.

for that type of reader, one error on every page is too many,
but one error on every tenth page is not.  especially since
-- if we give them an easy means of checking for errors and
reporting them, and then reward readers for finding them -- 
errors won't persist for very long, and the e-text will instead
progress very quickly on its merry way to a state of perfection.

in a practical sense, this means that before you turn an e-text
loose for download in an all-in-one file, you make it available
_page-by-page_ on the web.  anyone who might want to read it
has to do so in that form.  right alongside the text for each page
is the image, so the person can easily check any possible errors.
you let 'em know you are asking for their help to find mistakes.
if they find one, they fill out a form right on the page, and their
input is recorded -- wiki-style -- immediately.  later readers
can either confirm the error, or question it, or make comments.
first person to find each error gets a credit in the final e-text.
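
a minimal sketch of the record-keeping behind such a form, in python.  
no such system exists yet, so every name here (ErrorReport, ReportLog, 
and their fields) is hypothetical, and storage, the web form, and 
questioning a report are all left out.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorReport:
    page: int
    found: str       # text as it stands in the e-text
    suggested: str   # what the reader sees in the page-image
    reporter: str    # first person to report it; gets the credit
    confirmations: list = field(default_factory=list)
    comments: list = field(default_factory=list)


class ReportLog:
    """Records reports immediately; later readers confirm or comment."""

    def __init__(self):
        self._reports = {}  # (page, found) -> ErrorReport

    def report(self, page, found, suggested, reader):
        key = (page, found)
        existing = self._reports.get(key)
        if existing is None:
            # First report of this error: the reader becomes the reporter.
            existing = ErrorReport(page, found, suggested, reader)
            self._reports[key] = existing
        elif reader != existing.reporter and reader not in existing.confirmations:
            # A repeat report from someone else counts as a confirmation.
            existing.confirmations.append(reader)
        return existing

    def comment(self, page, found, reader, text):
        self._reports[(page, found)].comments.append((reader, text))

    def credits(self):
        """Names to credit in the final e-text, one per first finder."""
        return sorted({r.reporter for r in self._reports.values()})
```

keying each report on the page and the erroneous text is what lets a 
second reader's identical report be treated as a confirmation instead 
of a new error.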

you also give people a viewer-program that allows them to
download the appropriate page-image if they suspect an error
-- displaying it right there in the viewer-app next to the text --
and which simplifies the process of reporting it if they find one.
(by, for instance, filling out an e-mail they can send with a click.)


>   The key is that for the aspect of building *trust* in 
>   the final product, it is a very good idea to involve 
>   the volunteer proofreaders to go over the texts, 
>   even if *you don't have to*.

what i just described does a good job of doing that.

this is the system of "continuous proofreading"
i outlined on this listserve a very long time ago.
you recently mistakenly credited it to james linden.

my offer to develop this system was largely snubbed.
for _that_, the project gutenberg "people in charge"
rightly deserve to be criticized.  for the tiny stuff
that you have been complaining about, they do not...


>   Having (and proving to anyone who asks) at least 
>   two independent people who proofed every page, 
>   adds to its trustworthiness. 

not nearly as well as putting text and image side-by-side, and 
allowing any number of "volunteer proofreaders" to examine 'em.

you might be surprised by the number of errors that "slip by"
the proofreaders through two rounds of eyeballing over at d.p.
(indeed, many even slip by the "third round" of post-processing
and whitewashing, and sit there big and ugly in the final e-text.)

even if a dozen people look at a page, an error might _still_ be there.

but with eternal transparency, there is always hope it will be fixed.

anyway, jon, i hope you take up the friendly challenge i issued here.
and if any d.p. people want to call me on the challenge i made to them,
you just let me know.

in the meantime, i'll let you get in the last word on this thread, 
jon, because i _really_ need to be going.  use it wisely...        ;+)

-bowerbird