From traverso at dm.unipi.it  Wed Dec  1 02:25:59 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Wed Dec  1 02:26:23 2004
Subject: [gutvol-d] request for input (first timer here)
In-Reply-To: <20041130172908.237d8b64.stuart@ww1aviationlinks.cjb.net>
	(message from stuart on Tue, 30 Nov 2004 17:29:08 -0800)
References: <20041130172908.237d8b64.stuart@ww1aviationlinks.cjb.net>
Message-ID: <200412011025.iB1APx35025750@posso.dm.unipi.it>

>>>>> "stuart" == stuart  <stuart@ww1aviationlinks.cjb.net> writes:

    stuart> I am starting on a project to convert Jane's All The
    stuart> World's Aircraft 1919 to an ebook, any suggestions
    stuart> welcome. If interested, take a look at
    stuart> http://ww1aviationlinks.cjb.net/janes/index.html

The 1913 edition has been proofread by Distributed Proofreaders and
is currently in post-processing. You might contact the post-processor
to exchange suggestions.

Carlo

From flis at detk.com  Wed Dec  1 11:35:12 2004
From: flis at detk.com (William Flis)
Date: Wed Dec  1 11:28:47 2004
Subject: [gutvol-d] Unicode versions?
Message-ID: <LBELIICCBHDEONNDCACJAEIGCHAA.flis@detk.com>

I also prepared ("postprocessed" at Distributed Proofreaders) a Unicode
(UTF-8) version of this book, but it doesn't seem to have been posted
at PG. The UTF-8 elements were all pronunciation symbols, which one might
expect to be important in such a book. I'm currently working on another
volume in this series, with similar symbols. I've also been working on a
book on Native American sign language, which contains a good number of
special symbols used in transcribing the NA spoken languages, also requiring
UTF-8.

So my first question is, did my Unicode version just get lost somewhere? (I
usually upload directly to PG but submitted this one the long way around
through DP's "Post-Processing Verification" system so someone else would
take a look at it, since it was my first attempt at Unicode.)

Second, if not, are Unicode versions welcome?

Bill Flis

> Society for Pure English, Tract 2, on English Homophones, Robert
> Bridges 14227
>    [Link: http://www.gutenberg.net/1/4/2/2/14227 ]
>    [Files: 14227.txt; 14227-8.txt; 14227-h.htm]

From dwidger at adelphia.net  Wed Dec  1 11:41:11 2004
From: dwidger at adelphia.net (David Widger)
Date: Wed Dec  1 11:41:17 2004
Subject: [gutvol-d] Unicode versions?
References: <LBELIICCBHDEONNDCACJAEIGCHAA.flis@detk.com>
Message-ID: <009401c4d7dd$b64e1010$6901a8c0@novocon.net>


Hi Bill,

Unicode is very welcome.  Here is the note I sent to Frank this morning; I should
have sent you a copy as well.

Hi Frank,

I have been toying with this file for several days.  The original problem was that you
provided two HTML files, one Latin-1 and one Unicode.  We can only post one
HTML file in the directory for the eBook.  There is one way around this (and the
one I was thinking of trying): make a main HTML file with links to
the two HTML files you provided.  However, this went down the drain when I found
that the UTF-8 HTML file has an invalid CSS statement (see the attached W3C CSS
validation report).  So I elected to post only the valid Latin-1 HTML file and the
text file.

If you object to my approach, kindly provide a file such as I suggested above, with
links to both HTML files, and make sure that all the HTML files pass all three
W3C checks.

Thanks,

David

PS.  I sent Frank a copy of the CSS validator report but no longer have
it.  It said something about content not being allowed in the prolog, which I did not understand.

DW
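The report no longer exists, but "Content is not allowed in prolog" is the wording of an error from Java XML parsers. That error appears when something comes before the document's first markup. A frequent cause is a UTF-8 byte-order mark (BOM) at the very start of a UTF-8 file. Whether a BOM was the problem here is an assumption, not something the report confirms. The following minimal Python sketch checks for a leading BOM and strips it. The function names are hypothetical:

```python
import codecs
import sys


def strip_utf8_bom(data: bytes) -> tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether one was found)."""
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        return data[len(codecs.BOM_UTF8):], True
    return data, False


def clean_file(path: str) -> bool:
    """Rewrite `path` in place without a leading BOM; return True if changed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned, had_bom = strip_utf8_bom(data)
    if had_bom:
        with open(path, "wb") as f:
            f.write(cleaned)
    return had_bom


if __name__ == "__main__":
    # Usage: python strip_bom.py file1.html file2.html ...
    for name in sys.argv[1:]:
        print(name, "BOM removed" if clean_file(name) else "no BOM")
```

After cleaning, re-run the validator. If the error persists, the stray content before the prolog is something other than a BOM.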




From marcello at perathoner.de  Thu Dec  2 06:02:13 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Dec  2 06:02:19 2004
Subject: [gutvol-d] Anybody want to test of this gutenberg browser ?
Message-ID: <41AF2065.9030900@perathoner.de>

-------- Original Message --------
From: Clif Flynt <clif@cflynt.com>
Message-Id: <200412012104.iB1L4rm27796@clif.cflynt.com>
Subject: Re: Project Gutenberg Browser
To: marcello@perathoner.de (Marcello Perathoner)
Date: Wed, 1 Dec 2004 16:04:53 -0500 (EST)


Hi,
   My apology for being slow in replying to your mail.  Being a
spare-time project, the TkGutenbrowser doesn't get as much attention as
it deserves, and the move from "works-for-me" to
"suitable-for-general-use" is always slower than I'd like.

The browser software is currently at
http://www.mod3.net:~clif

   This is very much a work-in-progress, but the software does what I
consider the minimal set of tasks now.  It will read and display text,
save text as PDB for C-Spot-Run on a PDA, and download non-text
documents (images, sound files) to a disk file.  I've not added support
for bookmarks yet and the help is rudimentary.

   I believe that Project Gutenberg has changed the data in the catalog.rdf
file since I started the project, and it appears that most (all?) of
the information is now in catalog.rdf, instead of being split between
catalog.rdf and GUTINDEX.ALL.  My browser is only using catalog.rdf
now, and not downloading the other files.

...

	Clif

.... Clif Flynt ... http://www.cflynt.com ... clif@cflynt.com ...
..Tcl/Tk: A Developer's Guide (2nd edition) - Morgan Kaufmann ..


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Gutenberg9443 at aol.com  Thu Dec  2 07:52:00 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Dec  2 07:52:18 2004
Subject: [gutvol-d] interim report on eBookWise reader, info about Rocket
Message-ID: <1a7.2c565d81.2ee09420@aol.com>

Hi all,

One quick note on the Rocket: if your power cord has gone out, call Fox
International at 1-800-321-6993 for a replacement. If you're in the US it will
cost about $33 including shipping. I don't know whether you can even get it
outside the US. Don't try to order it online so you'll have a printout of
the order; they'll just tell you to call the phone number.

Now to the eBookWise reader:

Judging from the info I have so far, its footprint is slightly larger than
that of the Rocket, but it weighs less: about a pound, whereas the Rocket weighs
22 oz. It's still slightly smaller than a trade paperback, though, and weighs
considerably less. Some of its controls seem counterintuitive to me, but that
may be just because I've used the Rocket for so many years.

There are thousands (about 7,500 to be precise) of commercial books for
those who can afford to buy them; they're at eBookWise.com, which is a
subsidiary of FictionWise.com and is presently converting all, or almost
all, of its content into an eBookWise format. Some of them are new books, often
bestsellers, and others are classics that are not yet out of copyright. If I
had it to spend, I could spend a thousand dollars there in two shakes of a
puppy-dog's tail. Alas, I don't have a thousand dollars.

Although loading your own content is currently rather clunky, involving an
upload to a server and then a download through a telephone line, it is doable.
You can keep your own bookshelf on the server and download by telephone, so you
don't have to take your computer on the road with you to change the books in
your reader. Their software engineers are working on a direct USB download
program, the kind the Rocket has, but it isn't ready yet. This close to
Christmas, they decided to make the device available now and fix the
software later.

This is a quotation from eBookWise's propaganda sheet:

"In addition, the eBookwise-1150 can display your own personal content in
the following file formats: plain text (.txt), rich text format (.rtf),
Microsoft Word documents (.doc), HTML (.htm or .html), and Rocket eBook Editions
(.rb)."

So just about any free online book is readable on this reader, except the
far-less-than-1% that are available only in PDF. So PG, PG Oz, Blackmask,
http://www.sacred-texts.com, Phoenix-Library.com and umpteen dozen other sites
are now your oyster, in a reader you can fit into any briefcase or backpack and
most large-size handbags. (Phoenix-Library has good language-to-language
dictionaries in a surprisingly large number of languages.)

It will hold only about 20 texts at a time, which was also normal for the
Rocket unless you bought the extra large storage device when you ordered your
Rocket. I did, and my Rocket holds about 50 texts. However, the eBookWise has a
slot for a SmartMedia card; I checked, and you can get SmartMedia cards holding
anything from 4 MB up to half a gig. (The half-gig card costs about $250.)

It does not do much with illustrations, as it is grayscale only. However, it
appears that at least some small illustrations can be put into it.

If you want one right now, go to eBookWise and create an account. Put $110
into your account. You can immediately order your machine, which will use all
but 10 cents of the $110, but it will then give you $20 in book credits.
Wait until your device arrives to spend them, but you can go through, select
books, and put them into your cart now. Of course, you can deposit as much
money as you want into your account and put books into your shopping
cart, to purchase when your device arrives.

This is not the best possible ebook reader, but it is the best presently
available for anybody who is not content to read ebooks only on a computer or a
PDA.

Anne
From joshua at hutchinson.net  Thu Dec  2 08:10:12 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec  2 08:10:17 2004
Subject: [gutvol-d] interim report on eBookWise reader, info about Rocket
Message-ID: <20041202161012.185664F4C6@ws6-5.us4.outblaze.com>

Good review... Just one quick quibble.  The SmartMedia card is only supported up to 128MB and it lists for $30.11 at Amazon (which means you can probably find it cheaper other places, but that's a good ballpark figure).

Josh


From Gutenberg9443 at aol.com  Thu Dec  2 08:31:59 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Dec  2 08:32:29 2004
Subject: [gutvol-d] interim report on eBookWise reader, info about Rocket
Message-ID: <199.3375a734.2ee09d7f@aol.com>

 
In a message dated 12/2/2004 9:10:38 AM Mountain Standard Time,
joshua@hutchinson.net writes:

> The SmartMedia card is only supported up to 128MB and it lists for $30.11 at
> Amazon (which means you can probably find it cheaper other places, but
> that's a good ballpark figure).


Thank you. I wasn't sure about that. It's $35.18 at CompUSA, plus shipping,
so the Amazon price is probably about the best. Well, 128 MB will hold about
256 average texts, so that plus the built-in storage for another 20 books
should be enough for most people on the average airline trip or emergency-room
wait! The only time I read more than one complete book in either of those
situations was the first time I flew after 9/11. I had to get to the airport two
hours early, and then the plane was two hours late. I spent most of the time
sitting on the floor with my Rocket plugged in so I wouldn't drain the battery.

Anne
From blondeel at clipper.ens.fr  Thu Dec  2 20:28:39 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Thu Dec  2 20:28:56 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203042839.GA3074@clipper.ens.fr>

Hello,

I hacked together some scripts that do the following:

RTF -> XML

RTF: from Word, using a (very) simple stylesheet: just
  paragraphs, 3 title levels, footnotes, and italics
  Meta-information is in the properties of the document.
  My script can extract images too, if wanted.

XML: using a personal and simple DTD (embedded), probably easy to port
  to any more complete DTD, such as TEI

This is the hard part, and I am never quite sure it will not break
if the Word file is weird.

From that, I then did other (proof-of-concept) scripts to produce:

XML -> PG TXT
XML -> (LaTeX) -> PDF, DVI, PS (with hyperlinks)
XML -> valid HTML 4.01 (probably useless)
XML -> XHTML 1.0 Strict with some CSS (embedded)

The programming is very defensive, so when all the transforms finish
without errors I am fairly confident the output is right.
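Sebastien's DTD is not shown, so the element names below are hypothetical: `title` with a `level` attribute, `p`, `i`, and `note`. Under that assumption, the following minimal Python sketch covers the XML -> PG TXT step. It follows the same defensive approach: any element it does not recognise raises an error rather than being silently dropped. Italics become `_underscores_`. Footnotes are numbered and emitted after their paragraph. Both are common PG plain-text conventions, not rules taken from these scripts.

```python
import textwrap
import xml.etree.ElementTree as ET

TITLE_LEVELS = {"1", "2", "3"}  # "3 title levels"


def inline_text(elem, notes):
    """Flatten mixed content: <i> -> _..._, <note> -> [N] (text kept in notes)."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "i":
            parts.append("_" + inline_text(child, notes) + "_")
        elif child.tag == "note":
            notes.append(inline_text(child, notes))
            parts.append(f"[{len(notes)}]")
        else:
            # Defensive: refuse to guess what unknown markup means.
            raise ValueError(f"unexpected inline element <{child.tag}>")
        parts.append(child.tail or "")
    # Collapse runs of whitespace left over from the XML source.
    return " ".join("".join(parts).split())


def to_pg_txt(xml_source, width=72):
    root = ET.fromstring(xml_source)
    out = []
    notes = []  # shared, so footnote numbers run through the whole book
    for block in root:
        first_note = len(notes)
        if block.tag == "title":
            level = block.get("level", "1")
            if level not in TITLE_LEVELS:
                raise ValueError(f"unexpected title level {level!r}")
            text = inline_text(block, notes)
            out.append(text.upper() if level == "1" else text)
        elif block.tag == "p":
            out.append(textwrap.fill(inline_text(block, notes), width))
        else:
            raise ValueError(f"unexpected block element <{block.tag}>")
        # Emit this block's footnotes right after it.
        for n in range(first_note, len(notes)):
            out.append(textwrap.fill(f"[Footnote {n + 1}: {notes[n]}]", width))
    return "\n\n".join(out) + "\n"
```

For example, `to_pg_txt('<book><p>Il fait <i>beau</i>.</p></book>')` returns `'Il fait _beau_.\n'`.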

You can find examples of those formats at
http://www.eleves.ens.fr/home/blondeel/ebooksgratuits/
(most of the books there don't have the meta-info properly set up,
 so don't worry too much about that).

My scripts also clean up small typographical mistakes (they are specialized
in French rules but can of course be taught anything). They will help
the ebooksgratuits team give PG nicer formats (until now their Word
macros could only produce PG TXT, which is not very sexy for the end
user to read).
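Sebastien's actual rules are not shown. As an illustration only, the following minimal Python sketch applies one well-known French rule: a no-break space before `: ; ! ?` and inside « » guillemets. It uses U+00A0 everywhere for simplicity, although many typographers prefer the narrow U+202F before `; ! ?`. It also skips any colon not followed by whitespace, which leaves URLs and times such as 10:30 unchanged. The function name is hypothetical.

```python
import re

NBSP = "\u00a0"  # no-break space; some house styles use U+202F before ; ! ?


def fix_french_spacing(text: str) -> str:
    # Before : ; ! ? (or a run such as "?!") that is followed by whitespace
    # or the end of the text, replace any existing spacing with one NBSP.
    text = re.sub(r"(?<=[^\s:;!?])[ \u00a0]*([:;!?]+)(?=\s|$)",
                  NBSP + r"\1", text)
    # Inside French quotation marks: « mot » with no-break spaces.
    text = re.sub(r"«[ \u00a0]*", "«" + NBSP, text)
    text = re.sub(r"[ \u00a0]*»", NBSP + "»", text)
    return text
```

Running the function twice gives the same result as running it once, so it is safe to apply to text that has already been partly corrected.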

Regards,
From joshua at hutchinson.net  Fri Dec  3 05:46:10 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec  3 05:46:13 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com>

I'm curious to see if your script can handle tables.  That is our current biggest bugaboo when it comes to transforming to PG TXT format.

Josh



From blondeel at clipper.ens.fr  Fri Dec  3 07:16:02 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Fri Dec  3 07:16:08 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com>
References: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com>
Message-ID: <20041203151602.GA10478@clipper.ens.fr>

On Fri, Dec 03, 2004 at 08:46:10AM -0500, Joshua Hutchinson wrote:
> I'm curious to see if your script can handle tables.  That is our
> current biggest bugaboo when it comes to transforming to PG TXT
> format.

My DTD doesn't cover them (yet?). It focuses mainly on the French
books of the ebooksgratuits site. I guess it could easily be merged
into a more complete DTD (TEI, DocBook, whatever).

I already did Perl (not XSLT!) translations of XML tables (Docbook, for
example) to other formats (HTML: easy; LaTeX: harder...; TXT: w3m -dump
of the HTML version is usually good enough) for other projects.

I hear there are now Perl modules that can deal with XML and XSLT, so it
should be even easier to handle. The XSLT style of programming is not
for me...

How complex are your tables, and what do you need to do with them? Do you
have an example (input, desired output, and constraints on the
transformation such as API or language)?
From joshua at hutchinson.net  Fri Dec  3 07:52:25 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec  3 07:52:29 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>

The hard part is getting the table info within PG text 80 column width.

A typical table might be 4 columns wide and 5 rows tall.

Here is a fairly simple one from a Baseball Guide text I'm working on...

 Club.       Won. Lost. P.C.
Chicago       42   14   .788
Hartford      47   21   .691
St. Louis     45   19   .703
Boston        39   31   .557
Louisville    30   36   .455
Mutual        21   35   .375
Athletic      14   45   .237
Cincinnati     9   56   .135

Here is one a little more complex... It has more text columns.

                     THE RECORD OF 1875.
 Club.         Won. Lost. P.C.        Club.            Won. Lost. P.C.
Boston ........ 71    8   .809       St. Louis Reds .... 4   14   .222 
Athletic ...... 55   28   .756       Washington ........ 4   22   .156
Hartford ...... 54   28   .639       New Haven ......... 7   39   .152 
St. Louis* .... 29   39   .574       Centennial......... 2   13   .133
Philadelphia .. 37   31   .544       Western ........... 1   12   .077
Chicago ....... 30   37   .448       Atlantic .......... 2   42   .065
Mutual ........ 29   38   .426

FYI, this table becomes this in TEI markup (NOTE: I made the second Club column just continue under the first for simplicity's sake):

<table rows="16" cols="4">
<row>
  <cell cols="4" role="label">THE RECORD OF 1875.</cell>
</row>
<row>
  <cell role="label">Club.</cell><cell role="label">Won.</cell><cell role="label">Lost.</cell><cell role="label">P.C.</cell>
</row>
<row>
  <cell role="data">Boston</cell><cell role="data">71</cell><cell role="data">8</cell><cell role="data">.809</cell>
</row>
<row>
  <cell role="data">Athletic</cell><cell role="data">55</cell><cell role="data">28</cell><cell role="data">.756</cell>
</row>
<row>
  <cell role="data">Hartford</cell><cell role="data">54</cell><cell role="data">28</cell><cell role="data">.639</cell>
</row>
<row>
  <cell role="data">St. Louis</cell><cell role="data"><sic corr="39">29</sic></cell><cell role="data"><sic corr="29">39</sic></cell><cell role="data">.574</cell>
</row>
<row>
  <cell role="data">Philadelphia</cell><cell role="data">37</cell><cell role="data">31</cell><cell role="data">.544</cell>
</row>
<row>
  <cell role="data">Chicago</cell><cell role="data">30</cell><cell role="data">37</cell><cell role="data">.448</cell>
</row>
<row>
  <cell role="data">Mutual</cell><cell role="data">29</cell><cell role="data">38</cell><cell role="data">.426</cell>
</row>
<row>
  <cell role="data">St. Louis Reds</cell><cell role="data">4</cell><cell role="data">14</cell><cell role="data">.222</cell>
</row>
<row>
  <cell role="data">Washington</cell><cell role="data">4</cell><cell role="data">22</cell><cell role="data">.156</cell>
</row>
<row>
  <cell role="data">New Haven</cell><cell role="data">7</cell><cell role="data">39</cell><cell role="data">.152</cell>
</row>
<row>
  <cell role="data">Centennial</cell><cell role="data">2</cell><cell role="data">13</cell><cell role="data">.133</cell>
</row>
<row>
  <cell role="data">Western</cell><cell role="data">1</cell><cell role="data">12</cell><cell role="data">.077</cell>
</row>
<row>
  <cell role="data">Atlantic</cell><cell role="data">2</cell><cell role="data">42</cell><cell role="data">.065</cell>
</row>
</table>


From blondeel at clipper.ens.fr  Fri Dec  3 08:53:49 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Fri Dec  3 08:53:55 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
References: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
Message-ID: <20041203165349.GA193@clipper.ens.fr>

On Fri, Dec 03, 2004 at 10:52:25AM -0500, Joshua Hutchinson wrote:
> The hard part is getting the table info within PG text 80 column width.

It is not always possible of course.

> FYI, this table becomes this in TEI markup (NOTE: I made the second

That looks simple enough.

> Club column just continue under the first for simplicities sake):

Change it to HTML:

-=-=-=
<table border="0">
<tr>
  <td colspan="4" align="center">THE RECORD OF 1875.</td>
[...]
-=-=-=

then replace:
  row  -> tr
  cell -> td

then "w3m -dump table.html" gives:

$ w3m -dump table.html
      THE RECORD OF 1875.      
Club.           Won. Lost. P.C.
Boston          71   8     .809
Athletic        55   28    .756
Hartford        54   28    .639
St. Louis       29   39    .574
Philadelphia    37   31    .544
Chicago         30   37    .448
Mutual          29   38    .426
St. Louis Reds  4    14    .222
Washington      4    22    .156
New Haven       7    39    .152
Centennial      2    13    .133
Western         1    12    .077
Atlantic        2    42    .065

(the star after St. Louis has disappeared).
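The renaming step can be automated. The following minimal Python sketch assumes the simple TEI subset used in Joshua's table: `row`, and `cell` with optional `cols` and `role` attributes, plus inline markup such as `<sic>` inside cells. It turns `row` into `tr`, `cell` into `td`, and `cols` into `colspan`. An optional TEI `<head>` becomes an HTML `<caption>`. The function name is hypothetical. The output can then go through `w3m -dump` as above.

```python
import xml.etree.ElementTree as ET


def tei_table_to_html(tei):
    """Rename a simple TEI <table> to HTML: row -> tr, cell -> td."""
    table = ET.fromstring(tei)
    if table.tag != "table":
        raise ValueError("expected a <table> root element")
    html = ET.Element("table", border="0")
    head = table.find("head")
    if head is not None:
        ET.SubElement(html, "caption").text = " ".join("".join(head.itertext()).split())
    for row in table.findall("row"):
        tr = ET.SubElement(html, "tr")
        for cell in row.findall("cell"):
            td = ET.SubElement(tr, "td")
            if cell.get("cols"):
                td.set("colspan", cell.get("cols"))
            # itertext() unwraps inline markup such as <sic corr="39">29</sic>,
            # keeping the printed reading, as in the w3m dump above.
            td.text = " ".join("".join(cell.itertext()).split())
    return ET.tostring(html, encoding="unicode")
```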

If you need it embedded in a program I can try to code the algorithm,
depending on the programming language you want (Perl should be easy).

Then you can detect that cells containing only numbers should be
right-aligned, etc.
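The following minimal Python sketch produces a similar dump without w3m. It assumes the table has already been reduced to rows of strings. The function name and the "looks numeric" pattern are assumptions of this sketch, not part of any existing tool. Numeric cells such as `71` or `.809` are right-aligned, and everything else is left-aligned:

```python
import re

# Assumed heuristic: integers and decimals such as "71" or ".809".
NUMERIC = re.compile(r"[+-]?(?:\d+|\d*\.\d+)")


def render_table(rows, gap=2):
    """Lay out rows (lists of strings) as fixed-width plain text."""
    ncols = max(len(r) for r in rows)
    rows = [r + [""] * (ncols - len(r)) for r in rows]  # pad ragged rows
    widths = [max(len(r[c]) for r in rows) for c in range(ncols)]
    lines = []
    for r in rows:
        cells = [cell.rjust(w) if NUMERIC.fullmatch(cell) else cell.ljust(w)
                 for cell, w in zip(r, widths)]
        lines.append((" " * gap).join(cells).rstrip())
    return "\n".join(lines)


print(render_table([["Club.", "Won.", "Lost.", "P.C."],
                    ["Boston", "71", "8", ".809"],
                    ["Cincinnati", "9", "56", ".135"]]))
```

Each column is as wide as its widest cell. This works for narrow tables like this one, but nothing here keeps a wide table within 80 columns.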

It should also be easy to translate this to LaTeX for PDF/DVI/PS output.
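Along the same lines, here is a minimal Python sketch that emits a LaTeX `tabular`. It escapes LaTeX special characters. It uses `r` for columns whose body cells are all numeric and `l` for the rest. The function names and the alignment heuristic are assumptions of this sketch:

```python
import re

NUMERIC = re.compile(r"[+-]?(?:\d+|\d*\.\d+)")
SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
            "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}",
            "^": r"\textasciicircum{}", "\\": r"\textbackslash{}"}


def latex_escape(text):
    """Escape the characters LaTeX treats specially."""
    return "".join(SPECIALS.get(ch, ch) for ch in text)


def to_tabular(rows):
    """Return a LaTeX tabular for rows[0] (header) and rows[1:] (body)."""
    ncols = max(len(r) for r in rows)
    body = rows[1:]
    # Right-align a column if every non-empty body cell in it is numeric.
    spec = "".join(
        "r" if all(NUMERIC.fullmatch(r[c]) for r in body if c < len(r) and r[c])
        else "l"
        for c in range(ncols))
    lines = [r"\begin{tabular}{" + spec + "}"]
    for i, r in enumerate(rows):
        cells = [latex_escape(x) for x in r] + [""] * (ncols - len(r))
        lines.append(" & ".join(cells) + r" \\")
        if i == 0:
            lines.append(r"\hline")  # rule under the header row
    lines.append(r"\end{tabular}")
    return "\n".join(lines)
```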
From joshua at hutchinson.net  Fri Dec  3 09:07:31 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec  3 09:07:36 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203170731.E8BE5EDEA0@ws6-1.us4.outblaze.com>

The problem I've always run into is where the table tries to grow beyond 80 characters wide.

For instance, say that one row looks like this in the original book.


Data label that       Now we have a column      Now we have a column
is extremely long     of data that is also      of data that is also
and is broken up      very long and broken      very long and broken
accordingly over      up over multiple lines.   up over multiple lines.
multiple lines.

Most automated text converters will put each cell on one line with no line breaks.

A web browser will generate line breaks within cells so that the table ends up looking very similar to the above.  I haven't tried w3m ... will it handle the above scenario?  I've tried dumping to a text file with lynx and saving as text from IE/Mozilla, and they all fail miserably.
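The browser's behaviour here can be approximated directly: wrap each cell to a fixed column width, then print the wrapped lines side by side. The following minimal Python sketch assumes the column widths are chosen by hand. Picking them automatically is the harder part and is not attempted. `textwrap.wrap` also splits words longer than the column, so no line exceeds the sum of the widths plus the gaps. The function name is hypothetical.

```python
import textwrap


def wrap_table(rows, widths, gap=2):
    """Render rows of possibly long cells, wrapping each to its column width."""
    out = []
    for row in rows:
        # Wrap every cell independently into a list of lines.
        wrapped = [textwrap.wrap(cell, w) or [""] for cell, w in zip(row, widths)]
        height = max(len(cell) for cell in wrapped)
        for i in range(height):
            parts = [(cell[i] if i < len(cell) else "").ljust(w)
                     for cell, w in zip(wrapped, widths)]
            out.append((" " * gap).join(parts).rstrip())
    return "\n".join(out)


row = ["Data label that is extremely long and is broken up accordingly "
       "over multiple lines.",
       "Now we have a column of data that is also very long and broken up "
       "over multiple lines.",
       "Now we have a column of data that is also very long and broken up "
       "over multiple lines."]
print(wrap_table([row], widths=[20, 24, 24]))  # 20 + 24 + 24 + 2 gaps = 72 columns
```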

Josh

----- Original Message -----
From: "Sebastien Blondeel" <blondeel@clipper.ens.fr>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] XML version of some books of PG (and other formats)
Date: Fri, 3 Dec 2004 17:53:49 +0100

> 
> On Fri, Dec 03, 2004 at 10:52:25AM -0500, Joshua Hutchinson wrote:
> > The hard part is getting the table info within PG text 80 column width.
> 
> It is not always possible of course.
> 
> > FYI, this table becomes this in TEI markup (NOTE: I made the second
> 
> That looks simple enough.
> 
> > Club column just continue under the first for simplicities sake):
> 
> Change it to HTML:
> 
> -=-=-=
> <table border="0">
> <tr>
>    <td colspan="4" align="center">THE RECORD OF 1875.</td>
> [...]
> -=-=-=
> 
> then replace:
>    row  -> tr
>    cell -> td
> 
> then "w3m -dump table.html" gives:
> 
> $ w3m -dump table.html
>        THE RECORD OF 1875.
> Club.           Won. Lost. P.C.
> Boston          71   8     .809
> Athletic        55   28    .756
> Hartford        54   28    .639
> St. Louis       29   39    .574
> Philadelphia    37   31    .544
> Chicago         30   37    .448
> Mutual          29   38    .426
> St. Louis Reds  4    14    .222
> Washington      4    22    .156
> New Haven       7    39    .152
> Centennial      2    13    .133
> Western         1    12    .077
> Atlantic        2    42    .065
> 
> (the star after St. Louis has disappeared).
> 
> If you need it embedded in a program I can try to code the algorithm,
> depending on the programming language you want (Perl should be easy).
> 
> Then you can detect cells with just numbers in them should be
> right-aligned, etc.
> 
> It should also be easy to translate this to LaTeX for PDF/DVI/PS output.
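The two suggestions above (right-aligning number-only columns and producing LaTeX) combine naturally. The following is a minimal Python sketch under my own assumptions, not Sebastien's Perl: a column counts as numeric when every cell below the header row matches a simple regular expression, and the resulting `l`/`r` letters become the column spec of a LaTeX `tabular`. The helper names are hypothetical, and a spanning title row such as "THE RECORD OF 1875." would need separate handling (for example as a caption).

```python
import re

# A cell counts as numeric if it is an optional sign followed by digits with
# an optional decimal part, or a bare decimal such as ".809".
NUMERIC = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")


def column_alignments(rows, header_rows=1):
    """Return 'r' for columns whose body cells are all numeric, else 'l'."""
    body = rows[header_rows:]
    ncols = max(len(row) for row in rows)
    aligns = []
    for col in range(ncols):
        cells = [row[col].strip() for row in body if col < len(row)]
        numeric = bool(cells) and all(NUMERIC.match(c) for c in cells)
        aligns.append("r" if numeric else "l")
    return aligns


def latex_escape(text):
    """Escape &, %, $, # and _ (other LaTeX specials are left alone here)."""
    for char in "&%$#_":
        text = text.replace(char, "\\" + char)
    return text


def to_latex(rows, header_rows=1):
    """Render rows as a LaTeX tabular, right-aligning numeric columns."""
    spec = "".join(column_alignments(rows, header_rows))
    lines = ["\\begin{tabular}{" + spec + "}"]
    for i, row in enumerate(rows):
        lines.append(" & ".join(latex_escape(c) for c in row) + " \\\\")
        if i == header_rows - 1:
            lines.append("\\hline")  # rule under the header row(s)
    lines.append("\\end{tabular}")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        ["Club.", "Won.", "Lost.", "P.C."],
        ["Boston", "71", "8", ".809"],
        ["Athletic", "55", "28", ".756"],
    ]
    print(to_latex(rows))
```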

From marcello at perathoner.de  Fri Dec  3 09:17:00 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Dec  3 09:17:08 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
References: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
Message-ID: <41B09F8C.8000704@perathoner.de>

Joshua Hutchinson wrote:

> <table rows="16" cols="4">
> <row>
>   <cell cols="4" role="label">THE RECORD OF 1875.</cell>
> </row>

Shouldn't that be

   <table rows="15" cols="4">
   <head>
     THE RECORD OF 1875.
   </head>

?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Fri Dec  3 09:26:17 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec  3 09:26:31 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203172617.5A5439E832@ws6-2.us4.outblaze.com>

I didn't think you could have a <head></head> inside a <table>, so I went back and looked at the TEI Lite documentation.  It doesn't mention it, so I thought I was right.  Then, just to be sure, I checked the full spec.  There, it does list <head></head> as a valid element inside a <table>.

So, you are right.  <head></head> markup would be the more correct route.
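For illustration, the `row -> tr` / `cell -> td` mapping suggested earlier in the thread, extended to Marcello's `<head>`, can be sketched with Python's standard `xml.etree.ElementTree`. This is a hypothetical converter for un-namespaced TEI-style markup like the snippets quoted here, not the PGTEI stylesheets; rendering `<head>` as an HTML `<caption>` and `cols` as `colspan` is my assumption.

```python
import xml.etree.ElementTree as ET


def tei_table_to_html(tei_xml):
    """Convert a TEI-style <table> (head/row/cell) string into an HTML table string."""
    tei = ET.fromstring(tei_xml)
    html = ET.Element("table", border="0")
    head = tei.find("head")
    if head is not None:
        # A TEI table heading maps naturally onto an HTML caption.
        caption = ET.SubElement(html, "caption")
        caption.text = " ".join("".join(head.itertext()).split())
    for row in tei.findall("row"):
        tr = ET.SubElement(html, "tr")
        for cell in row.findall("cell"):
            td = ET.SubElement(tr, "td")
            td.text = " ".join("".join(cell.itertext()).split())
            # TEI's cols="n" makes a cell span n columns, like HTML's colspan.
            if cell.get("cols") is not None:
                td.set("colspan", cell.get("cols"))
    # method="html" writes empty cells as <td></td> rather than <td />.
    return ET.tostring(html, encoding="unicode", method="html")
```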

Josh


From jussi.kukkonen at welho.com  Fri Dec  3 11:00:07 2004
From: jussi.kukkonen at welho.com (Jussi Kukkonen)
Date: Fri Dec  3 11:00:04 2004
Subject: [gutvol-d] possible fix for overwide tables in PGTEI text (was: XML
 version of some books...)
Message-ID: <41B0B7B7.1010605@welho.com>

Joshua Hutchinson wrote:
> I'm curious to see if your script can handle tables.  That is our
> current biggest bugaboo when it comes to transforming to PG TXT
> format.
>

Now that you mention it...

I've been playing with PGTEI, and encountered this problem (too wide
tables) too. If anyone is wondering what we're talking about, please
search for the string "1271-95." in
    http://koti.welho.com/jkukkone/geo/teioutput.txt
Oh, feel free to see the html version also while you're there:
    http://koti.welho.com/jkukkone/geo/teioutput.html
(warning for modem users - some images might still be pretty large).

So, I spent some time with Groff* and Tbl** manuals and I think I found
a fix for this. Currently Tbl input tables look like this (3 rows, 2
columns):
***
1873.	Livingstone discovers Lake Moero.
1874-75.	Lieut. Cameron crosses equatorial Africa.
1875-94.	Élisée Reclus publishes his _Géographie Universelle_.
***

Tbl _can_ be instructed to wrap lines when needed by changing the input
to this:
***
T{
1873.
T}	T{
Livingstone discovers Lake Moero.
T}
T{
1874-75.
T}	T{
Lieut. Cameron crosses equatorial Africa.
T}
T{
1875-94.
T}	T{
Élisée Reclus publishes his _Géographie Universelle_.
T}
***

According to my tests it works surprisingly well. Sometimes Tbl does wrap
too eagerly, but I saw nothing that wasn't acceptable.
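The rewrite described above is mechanical enough to sketch in a few lines of Python (the function name is hypothetical, and this is not the proposed tei2nroff-common.xsl patch). Each cell becomes a `T{ ... T}` text block, with `T{` ending a line and `T}` starting one, and cells stay tab-separated. A complete tbl table would still need `.TS`, a format line, and `.TE` around these data lines.

```python
def tbl_text_blocks(rows):
    """Turn rows of cells into tbl data lines where every cell is a T{ ... T} block."""
    lines = []
    for row in rows:
        # "T{" must end a line and "T}" must begin one, so the tab separating
        # two cells sits between one cell's "T}" and the next cell's "T{".
        lines.append("\t".join("T{\n" + cell + "\nT}" for cell in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(tbl_text_blocks([
        ["1873.", "Livingstone discovers Lake Moero."],
        ["1874-75.", "Lieut. Cameron crosses equatorial Africa."],
    ]))
```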

Marcello (or anyone with some authority on PGTEI), I can try and come up
with a patch for tei2nroff-common.xsl, if that's wished for. Let me know.

- jussi


*, **  For those not familiar with the more obscure Unix tools:
Groff, or GNU Troff, is a document typesetting tool used in PGTEI to
produce the TXT and PDB versions. Tbl is a table formatting preprocessor used by Groff.

-- 
Jussi Kukkonen

From marcello at perathoner.de  Fri Dec  3 11:07:11 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Dec  3 11:07:23 2004
Subject: [gutvol-d] possible fix for overwide tables in PGTEI text (was:
	XML version of some books...)
In-Reply-To: <41B0B7B7.1010605@welho.com>
References: <41B0B7B7.1010605@welho.com>
Message-ID: <41B0B95F.5030805@perathoner.de>

Jussi Kukkonen wrote:

> Tbl _can_ be instructed to wrap lines when needed by changing the input
> to this:
> ***
> T{
> 1873.
> T}    T{
> Livingstone discovers Lake Moero.
> T}
> T{
> 1874-75.
> T}    T{
> Lieut. Cameron crosses equatorial Africa.
> T}
> T{
> 1875-94.
> T}    T{
> Élisée Reclus publishes his _Géographie Universelle_.
> T}
> ***
> 
> According to my tests it works surprisingly well. Sometimes Tbl does wrap
> too eagerly, but I saw nothing that wasn't acceptable.
> 
> Marcello (or anyone with some authority on PGTEI), I can try and come up
> with a patch for tei2nroff-common.xsl, if that's wished for. Let me know.

The forthcoming version 0.3 of the PGTEI converter already does
this (with a little help from the markup: you have to specify the
column width manually).


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Fri Dec  3 11:20:46 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec  3 11:20:54 2004
Subject: [gutvol-d] possible fix for overwide tables in PGTEI text
	(was:XML version of some books...)
Message-ID: <20041203192046.DBD1C10993F@ws6-4.us4.outblaze.com>


----- Original Message -----
From: "Marcello Perathoner" <marcello@perathoner.de>
> 
> The new forthcoming version 0.3 of the PGTEI converter already does this. 
> (with a little help from the markup: you have to manually specify the width of 
> the column.)
> 

I saw that in the preliminary document you sent me, Marcello.  I haven't had a chance to really dig into it yet.  (US Thanksgiving holiday has put me behind)

Dumb, off-the-top-of-my-head question:

Would it be possible for the converter to assume, if no manual width is specified, that it should just divide the table width by the number of columns and apply that value automatically to each column?  That way, if a quick-and-dirty table will work, no further markup is needed.  But if special formatting is needed (perhaps for a really big, complex table), the etext preparer can take the time to do so.
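The rule proposed here boils down to simple integer arithmetic. A Python sketch under stated assumptions: a 72-character text width, tbl's default column gap of 3 ens, and column widths written as `lw(...)` with an explicit `n` (en) unit. The helper names are hypothetical, and the PGTEI converter's actual formula is not given in this thread.

```python
def equal_column_widths(total_width, ncols, gap=3):
    """Split total_width evenly among ncols columns, reserving `gap` between columns."""
    if ncols < 1:
        raise ValueError("need at least one column")
    usable = total_width - gap * (ncols - 1)
    if usable < ncols:
        raise ValueError("table is too narrow for that many columns")
    base, extra = divmod(usable, ncols)
    # Give any leftover characters to the leftmost columns, one each.
    return [base + (1 if i < extra else 0) for i in range(ncols)]


def tbl_format_line(widths):
    """Build a tbl format line of left-aligned, fixed-width columns (widths in ens)."""
    return " ".join(f"lw({w}n)" for w in widths) + "."


if __name__ == "__main__":
    # A two-column chronology table on a 72-column page.
    print(tbl_format_line(equal_column_widths(72, 2)))  # prints: lw(35n) lw(34n).
```

Spreading the remainder over the leftmost columns keeps the total exactly at the page width instead of silently losing a few characters to integer division.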

Josh
From blondeel at clipper.ens.fr  Fri Dec  3 12:12:28 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Fri Dec  3 12:12:44 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203170731.E8BE5EDEA0@ws6-1.us4.outblaze.com>
References: <20041203170731.E8BE5EDEA0@ws6-1.us4.outblaze.com>
Message-ID: <20041203201228.GA5096@clipper.ens.fr>

On Fri, Dec 03, 2004 at 12:07:31PM -0500, Joshua Hutchinson wrote:
> A web browser will generate line breaks within cells so that the table
> will end up looking very similar to the above.  I haven't tried w3m
> ... will it handle the above scenario?  I've tried lynx dumping to a

You should have. Yes it does:

-=-=-=
$ cat /tmp/toto.html 
<table border="0">
<tr>
<td>Data label that is extremely long and is broken up accordingly over
multiple lines.</td>
<td>Now we have a column of data that is also very long and broken up
over multiple lines.</td>
<td>Now we have a column of data that is also very long and broken up
over multiple lines.</td>
</tr>
</table>

$ w3m -cols 72 -dump /tmp/toto.html 
Data label that is      Now we have a column of Now we have a column of
extremely long and is   data that is also very  data that is also very 
broken up accordingly   long and broken up over long and broken up over
over multiple lines.    multiple lines.         multiple lines.    

$ w3m -cols 48 -dump /tmp/toto.html 
Data label that Now we have a   Now we have a  
is extremely    column of data  column of data 
long and is     that is also    that is also   
broken up       very long and   very long and  
accordingly     broken up over  broken up over 
over multiple   multiple lines. multiple lines.
lines.                                         
-=-=-=

(Note: for the 72-column version I don't know why there is an extra
space between columns 1 and 2; it is probably a w3m bug, since it was
already there in the baseball example.) This is easy to detect and fix,
I guess: use ``border=1'' and clean out the frames:

$ w3m -cols 76 -dump /tmp/toto.html 
+-------------------------------------------------------------------------+
|Data label that is     |Now we have a column of |Now we have a column of |
|extremely long and is  |data that is also very  |data that is also very  |
|broken up accordingly  |long and broken up over |long and broken up over |
|over multiple lines.   |multiple lines.         |multiple lines.         |
+-------------------------------------------------------------------------+
                      ^^^                       ^^

You can detect those useless empty columns and remove them (or decide to have 2
or 3 blanks between columns). Doing this without frames is more dangerous, and
the columns are harder to detect:
 
$ w3m -cols 72 -dump /tmp/toto.html 
Data la el that is      Now we have a column of Now we have a column of
extreme y long and is   data that is also very  data that is also very 
brokenx p accordingly   long and broken up over long and broken up over
over mu tiple lines.    multiple lines.         multiple lines.    
      ^^^
       this is not a column break.
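The frame-cleaning step might look roughly like this in Python. It is a sketch that assumes output shaped like the `border=1` dump above: table rows start and end with `|`, horizontal rules start with `+`, and no cell text contains `|`. It drops the rules, strips the padding inside each cell, and re-pads every column to its widest cell with a fixed two-space gap, which also removes the uneven extra blanks.

```python
def strip_frames(dump_lines, gap=2):
    """Remove ASCII frames from a bordered table dump and re-space its columns."""
    rows = []
    for line in dump_lines:
        text = line.strip()
        if not text.startswith("|"):
            continue  # skip "+----+" rules and any text outside the table
        # "|a |b |" splits into ["", "a ", "b ", ""]; drop the empty ends.
        rows.append([cell.strip() for cell in text.split("|")[1:-1]])
    if not rows:
        return []
    ncols = max(len(row) for row in rows)
    # Width of each column = length of its longest stripped cell.
    widths = [max((len(row[i]) for row in rows if i < len(row)), default=0)
              for i in range(ncols)]
    sep = " " * gap
    return [sep.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in rows]


if __name__ == "__main__":
    dump = [
        "+-------------------------------------------------------------------------+",
        "|Data label that is     |Now we have a column of |Now we have a column of |",
        "|extremely long and is  |data that is also very  |data that is also very  |",
        "+-------------------------------------------------------------------------+",
    ]
    print("\n".join(strip_frames(dump)))
```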

> text file and IE/Mozilla dumping to a text, and they all fail
> miserably.

w3m is better than lynx for tables (and many other things: it is able to
display images in console mode and inside xterms!). links is good too.

As for the example given in a later message:

-=-=-=
$ w3m -cols 72 -dump teioutput.html  | head
150.       Ptolemy publishes his geography.                            
230.       The Peutinger Table pictures the Roman roads.               
400-14.    Fa-hien travels through and describes Afghanistan and India.
499.       Hoei-Sin said to have visited the kingdom of Fu-sang, 20,000
           furlongs east of China (identified by some with California).
518-21.    Hoei-Sing and Sung-Yun visit and describe the Pamirs and the
           Punjab.                                                     
540.       Cosmas Indicopleustes visits India, and combats the         
           sphericity of the globe.                                    
629-46.    Hiouen-Tshang travels through Turkestan, Afghanistan, India,

$ perl ~/work/PGDP/ebooksgratuits/Fmt.pl 72 teioutput.html | head
<table border="0">
<tr><td>150.</td><td>Ptolemy publishes his geography.</td></tr>
<tr><td>230.</td><td>The Peutinger Table pictures the Roman
roads.</td></tr>
<tr><td>400-14.</td><td>Fa-hien travels through and describes
Afghanistan and India.</td></tr>
<tr><td>499.</td><td>Hoei-Sin said to have visited the kingdom of
Fu-sang, 20,000 furlongs east of China (identified by some with
California).</td></tr>
<tr><td>518-21.</td><td>Hoei-Sing and Sung-Yun visit and describe the
-=-=-=

Note: sometimes the columns in w3m are, weirdly, unbalanced. I don't have an
example right here, but I guess percentage-width attributes on the columns
could help (if that is possible at all in TEI). I am using an old version of
w3m, too (Debian stable).

With links:

-=-=-=
$ links -dump teioutput.html | head
   150.       Ptolemy publishes his geography.                                
   230.       The Peutinger Table pictures the Roman roads.                   
   400-14.    Fa-hien travels through and describes Afghanistan and India.    
   499.       Hoei-Sin said to have visited the kingdom of Fu-sang, 20,000    
              furlongs east of China (identified by some with California).    
   518-21.    Hoei-Sing and Sung-Yun visit and describe the Pamirs and the    
              Punjab.                                                         
   540.       Cosmas Indicopleustes visits India, and combats the sphericity  
              of the globe.                                                   
   629-46.    Hiouen-Tshang travels through Turkestan, Afghanistan, India,    

$ links -dump toto.html 
   +------------------------------------------------------------------------+
   | Data label that is    | Now we have a column of | Now we have a column |
   | extremely long and is | data that is also very  | of data that is also |
   | broken up accordingly | long and broken up over | very long and broken |
   | over multiple lines.  | multiple lines.         | up over multiple     |
   |                       |                         | lines.               |
   +------------------------------------------------------------------------+

$ links -dump toto.html  # without border
   Data label that is         Now we have a column of Now we have a column of 
   extremely long and is      data that is also very  data that is also very  
   broken up accordingly over long and broken up over long and broken up over 
   multiple lines.            multiple lines.         multiple lines.         
-=-=-=
From joshua at hutchinson.net  Fri Dec  3 12:36:36 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Dec  3 12:36:45 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
Message-ID: <20041203203636.C862E2F9A7@ws6-3.us4.outblaze.com>

Anyone have a link to a Windows executable of w3m (command line is fine; I just don't have a way to compile the source where I am right now)?  This definitely looks interesting.

Josh


From Gutenberg9443 at aol.com  Fri Dec  3 18:49:38 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Dec  3 18:50:01 2004
Subject: [gutvol-d] Update 2 on the eFictionWise
Message-ID: <159.459950f3.2ee27fc2@aol.com>

Hi all--
 
My eBookWise 1150 arrived this morning, and I have been fiddling with it
ever since.

Ten seconds after I got my Rocket out of its box, I was in love with it. If
my first experience with a dedicated ebook reader had been an eBookWise
1150, I would never have even tried to use a dedicated ebook reader again.

It is nicely packaged. It feels good in the hand. The problem of having to
point the down arrow to go up and the up arrow to go down has been remedied.

BUT--

(1) There seem to be only two type sizes available, larger and smaller.
Although the Rocket has the same built-in capacity, it also has a way to
download other font sizes from the RocketLibrary.

(2) With my vision problems, I need black on white. What I'm getting is
grey-beige on white-grey-beige. The controls do not allow me to create black
on white. I will be able to read the books in bed, because they are
backlit, but I'm not going to be able to do it without glasses, as I can
with the Rocket.

(3) I still haven't figured out how to do most things. The biggest problem
is that three of the four on-screen icons don't mean the same as they do on
the Rocket. All kinds of pulldown menus are located in very illogical places.

(4) I think I have figured out how to highlight passages, but not how to
insert bookmarks.

(5) The write-and-draw feature does not work. I have not checked out the
keyboard yet, but it's pretty useless if I can't get back to the notes I
have made. That's why I need bookmarks.

(6) The method of downloading books is a total, utter, and complete
nightmare. When I download a Rocket or Microsoft Reader book from
FictionWise.com, I pay for it and then download it onto my screen. If it's
a Rocket book, it goes directly to the RocketLibrarian, which then asks me
whether I want to put it into the Rocket eBook reader. If I say yes, the job
is done in a few seconds. If I say no, the file sits right there in my
RocketLibrarian on my computer until I need it. If I want to import a
personal file, from PG or any of several other sources, I make sure that I
have saved it as .txt or .htm. Then I go into the RocketLibrarian, press a
button, import the file, name the file, and then do exactly what I do for a
book that is already .rb. Now, read the following eBookWise version of this
task:

I had put two ebooks in my eBookWise cart a couple of days ago. When I went
back today and tried to buy them, the program didn't let me. Instead, it
told me that I already had them.

In order to try out its capabilities, I picked out another ebook in which I
was mildly interested and bought it. I had to jump through hoops to identify
myself and my online "bookshelf." Then, in order to get it onto my eBookWise
reader, I had to use a special heavy telephone cord to get it from the
"bookshelf" to the reading device.

Then I went to load one of my personal books onto the eBookWise. In order to
do this, I had to go to my computer and upload the file into my online
"bookshelf." This required me to jump through a few more hoops. I am allowed
only 10 MB of "personal" stuff. Then I had to go back to the reader and play
telephone games a while longer.

I'll try it out again when the new program allowing USB uploads onto the
eBookWise, and allowing the unlimited library that the Rocket allows,
arrives. In the meantime, I'm using my Rocket.

My own recommendation is that eBookWise take a VERY good look at a Rocket
and then rewrite the abominable software for this device.
 
I'm sorry. I wish I had better news.
 
Anne
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041203/f05f1a89/attachment.html
From ciesiels at bigpond.net.au  Sat Dec  4 00:53:43 2004
From: ciesiels at bigpond.net.au (Michael Ciesielski)
Date: Sat Dec  4 00:54:55 2004
Subject: [gutvol-d] promo.net/pg
Message-ID: <41B17B17.6070805@bigpond.net.au>

promo.net/pg is still the second result for a Google search for "Project 
Gutenberg". Is there any reason why this site still exists other than as
a redirect to gutenberg.org?

At the moment, the top two choices are:

** <http://www.gutenberg.org/>
Welcome to Project Gutenberg - Project Gutenberg

PROJECT GUTENBERG OFFICIAL HOME SITE - INDEX -- Free Books On-Line ...


Not knowing anything about PG, I'd be most inclined to go for the 
second, which is promo.net.

Once opened, the promo.net site provides no indication that it is not 
the real PG website, and even the link to gutenberg.net is phrased such 
that a casual glance would make one think that promo.net/pg was the 
official site.

Mike
From marcello at perathoner.de  Sat Dec  4 02:36:51 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Dec  4 02:37:15 2004
Subject: [gutvol-d] possible fix for overwide tables in PGTEI text	(was:XML
	version of some books...)
In-Reply-To: <20041203192046.DBD1C10993F@ws6-4.us4.outblaze.com>
References: <20041203192046.DBD1C10993F@ws6-4.us4.outblaze.com>
Message-ID: <41B19343.8030201@perathoner.de>

Joshua Hutchinson wrote:

> Would it be possible for the converter to assume that if no manual
> width is specified, that it should just divide the table width by the
> number of columns and apply that value automatically to each column?

That is exactly what it does unless you specify some column width.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From blondeel at clipper.ens.fr  Sat Dec  4 05:30:19 2004
From: blondeel at clipper.ens.fr (Sebastien Blondeel)
Date: Sat Dec  4 05:30:27 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203203636.C862E2F9A7@ws6-3.us4.outblaze.com>
References: <20041203203636.C862E2F9A7@ws6-3.us4.outblaze.com>
Message-ID: <20041204133019.GD27551@clipper.ens.fr>

On Fri, Dec 03, 2004 at 03:36:36PM -0500, Joshua Hutchinson wrote:
> Anyone have a link to w3m in a windows executable (command line is
> fine, I just don't have access to a way to compile the source where
> I'm at right now)?  This definitely looks interesting.

A friend of mine who is competent in Windows matters suggests using the
following:

==== xml2txt.js ====
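var x = new ActiveXObject("Msxml2.FreeThreadedDOMDocument");
x.async = false;  // load() is asynchronous by default; wait for the whole file
x.load(WScript.Arguments(0));
// Select every <p> element anywhere in the document.
var p = x.documentElement.selectNodes("//p");
for (var it = new Enumerator(p) ; !it.atEnd() ; it.moveNext())
{
        // Collapse ALL whitespace runs (note the /g flag), then drop a leading blank.
        var t = it.item().text.replace(/\s+/g, " ");
        if (t.charAt(0) == " ")
                t = t.slice(1);
        var l = t.length;
        var i = 0;
        var j = 0;
        // Emit lines of at most 77 characters, breaking at the last space.
        while (i < l)
        {
                j = i + 77;
                if (j <= l)
                {
                        j = t.lastIndexOf(" ", j);
                        if (j < i)
                        {
                                // A single word longer than 77 chars: break after it.
                                j = t.indexOf(" ", i);
                                if (j == -1)
                                        j = l;
                        };
                }
                else
                {
                        j = l;
                };
                WScript.Echo(t.slice(i, j));
                i = j + 1;
        };
        WScript.Echo("");  // blank line between paragraphs
};
WScript.Quit(0);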
====================

Run it with the following command:

-=-=-=
cscript //nologo xml2txt.js your_file.xml
-=-=-=

This assumes the paragraphs in the XML are marked up as <p></p> (cf. the
selectNodes call).
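For anyone not on Windows, the same approach (collect every `<p>`, collapse its whitespace, wrap at 77 columns, blank line between paragraphs) can be sketched with Python's standard library. This is an illustrative port, not the script above: `ElementTree` stands in for MSXML, `textwrap` replaces the hand-written wrapping loop, and it assumes the `<p>` elements are not in an XML namespace.

```python
import sys
import textwrap
import xml.etree.ElementTree as ET


def xml_paragraphs_to_text(xml_data, width=77):
    """Return the text of every <p> element, each re-wrapped to `width` columns."""
    root = ET.fromstring(xml_data)
    paragraphs = []
    for p in root.iter("p"):
        # itertext() also picks up text inside child elements such as <hi>;
        # split()/join() collapses every run of whitespace to one space.
        text = " ".join("".join(p.itertext()).split())
        # Break only at spaces, like the JScript loop (no hyphen or mid-word breaks).
        lines = textwrap.wrap(text, width, break_long_words=False, break_on_hyphens=False)
        paragraphs.append("\n".join(lines))
    # A blank line separates paragraphs, as in the JScript version.
    return "\n\n".join(paragraphs) + "\n" if paragraphs else ""


if __name__ == "__main__":
    # Read bytes so the parser can honour the file's own encoding declaration.
    with open(sys.argv[1], "rb") as f:
        sys.stdout.write(xml_paragraphs_to_text(f.read()))
```

Saved as, say, `xml2txt.py` (a suggested name), it runs as `python xml2txt.py your_file.xml`.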
From marcello at perathoner.de  Sat Dec  4 11:02:37 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Dec  4 11:02:45 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
In-Reply-To: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com>
References: <20041203134610.93A12EDAFD@ws6-1.us4.outblaze.com>
Message-ID: <41B209CD.8050406@perathoner.de>

Joshua Hutchinson wrote:

> I'm curious to see if your script can handle tables.  That is our
> current biggest bugaboo when it comes to transforming to PG TXT
> format.

HTML, TXT and TEI versions of the 0.3 docs are up at:

   http://www.gutenberg.org/tei/marcello/0.3/doc/


There are two tables in the docs: a small one, and a bigger one whose
column widths have to be specified manually.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From sly at victoria.tc.ca  Sat Dec  4 11:37:49 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Dec  4 11:38:01 2004
Subject: [gutvol-d] promo.net/pg
In-Reply-To: <41B17B17.6070805@bigpond.net.au>
References: <41B17B17.6070805@bigpond.net.au>
Message-ID: <Pine.GSO.4.58.0412041129510.5873@vtn1.victoria.tc.ca>


On Sat, 4 Dec 2004, Michael Ciesielski wrote:

> promo.net/pg is still the second result for a Google search for "Project
> Gutenberg". Is there any reason why this site still exists other than as
> a redirect to gutenberg.org?

My understanding is that there is a problem with who has write permissions
for those particular pages. There was some discussion of this on gutvol-d
a while ago, but it was before this list was moved to pglaf.org so I
don't know if you could find it archived.

I have sent emails to many people who have mentioned the URL promo.net/pg
on web pages or in newsgroups to let them know that the current, most
correct URL is http://www.gutenberg.org/

Andrew
From joshua at hutchinson.net  Sat Dec  4 14:10:47 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Dec  4 14:10:57 2004
Subject: Re: [gutvol-d] possible fix for overwide tables in PGTEI text (was: XML version of some books...)
Message-ID: <20041204221047.51C049E79E@ws6-2.us4.outblaze.com>


----- Original Message -----
From: "Marcello Perathoner" <marcello@perathoner.de>
> 
> Joshua Hutchinson wrote:
> 
> > Would it be possible for the converter to assume that if no manual
> > width is specified, that it should just divide the table width by the
> > number of columns and apply that value automatically to each column?
> 
> That is exactly what it does unless you specify some column width.
> 
> 

Good.  Thanks for clearing that up for me.

Josh
From Gutenberg9443 at aol.com  Sat Dec  4 14:45:28 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Sat Dec  4 14:45:42 2004
Subject: [gutvol-d] Update on eBookWise -more sanguine
Message-ID: <1ee.303f8519.2ee39808@aol.com>

I'm feeling a little better about the eBookWise reader than I did  yesterday.
 
The problem with getting ebooks from the online bookshelf into my reader  
turned out to be a problem with the telephone company serving the bookshelf. I  
checked on that and this afternoon I was able to download with no problem at  
all.
 
The "handwriting" feature suddenly started working. I don't know why it  
wasn't working before.
 
I'm still not happy about the contrast, and obviously, until an eBookWise  
Librarian program allows personal content to be used without limit, I'm not  
going to be happy about the present very limited personal content, especially  
since I like to do a lot of my editing on my Rocket. It will actually be far  
easier on eBookWise once unlimited personal content is allowed, because it's  
much easier for me to write in the changes and then transfer by hand to the  
computer than it is to monkey around with punching each letter on a teeny little  
keyboard with the stylus.
 
eBookWise took care of a rather peculiar problem with my ability to order a  
book. Their technicians are noticeably quick to correct problems that do not  
entail new programs, and their customer service reps also are prompt.
 
So I think that once the USB computer to reader problem is solved, thereby  
solving the problem of extensive use of personal content, the reader will be  
highly user-friendly. Obviously eBookWise would rather I purchase books from  
them than use my own content, and I will definitely be purchasing as many books
as I can afford. As I have been saying for years, I can't imagine anyone who
has given a fair trial to a good ebook reader choosing to go back to tree
books if there's an ebook available.
 
Anne
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041204/cea73bfd/attachment.html
From alex at awstudios.net  Sat Dec  4 15:09:37 2004
From: alex at awstudios.net (Alex Wilson)
Date: Sat Dec  4 15:09:56 2004
Subject: [gutvol-d] NYT upbeat on e-books 
Message-ID: <BDD7ADE1.32C6C%alex@awstudios.net>

Just saw this note on Slashdot, thought it'd be of interest.
http://slashdot.org/article.pl?sid=04/12/04/181228

"Sunday's NYT Book Review will carry an upbeat article on e-books, complete
with mention of the New York Public Library's impressive 3,000-title
efforts...." 

Links and discussion at the above link.

Alex.

http://www.alexwilson.com - Alex Wilson Studios
http://www.telltaleweekly.org - Funding a Free Audiobook Library


From nwolcott2 at kreative.net  Sat Dec  4 14:00:47 2004
From: nwolcott2 at kreative.net (Norm Wolcott)
Date: Sat Dec  4 19:10:09 2004
Subject: [gutvol-d] XML version of some books of PG (and other formats)
References: <20041203155226.053FF4F462@ws6-5.us4.outblaze.com>
Message-ID: <017101c4da77$e350c020$2371fea9@gateway>

As I recall, the 80-column rule didn't use to be a hard and fast rule for
tables. When a table contained too much information, one was supposed to
expand it the minimum amount necessary; at least, that is what I recall MH
saying.
nwolcott2@post.harvard.edu  Friar Wolcott, Gutenberg Abbey, Sherwood Forrest
----- Original Message -----
From: "Joshua Hutchinson" <joshua@hutchinson.net>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Friday, December 03, 2004 10:52 AM
Subject: Re: [gutvol-d] XML version of some books of PG (and other formats)


> The hard part is getting the table info within PG text 80 column width.
>
> A typical table might be 4 columns wide and 5 rows tall.
>
> Here is a fairly simple one from a Baseball Guide text I'm working on...
>
>  Club.       Won. Lost. P.C.
> Chicago       42   14   .788
> Hartford      47   21   .691
> St. Louis     45   19   .703
> Boston        39   31   .557
> Louisville    30   36   .455
> Mutual        21   35   .375
> Athletic      14   45   .237
> Cincinnati     9   56   .135
>
> Here is one a little more complex... It has more text columns.
>
>                      THE RECORD OF 1875.
>  Club.         Won. Lost. P.C.        Club.            Won. Lost. P.C.
> Boston ........ 71    8   .809       St. Louis Reds .... 4   14   .222
> Athletic ...... 55   28   .756       Washington ........ 4   22   .156
> Hartford ...... 54   28   .639       New Haven ......... 7   39   .152
> St. Louis* .... 29   39   .574       Centennial......... 2   13   .133
> Philadelphia .. 37   31   .544       Western ........... 1   12   .077
> Chicago ....... 30   37   .448       Atlantic .......... 2   42   .065
> Mutual ........ 29   38   .426
>
> FYI, this table becomes this in TEI markup (NOTE: I made the second Club
> column just continue under the first for simplicity's sake):
>
> <table rows="15" cols="4">
> <row>
>   <cell cols="4" role="label">THE RECORD OF 1875.</cell>
> </row>
> <row>
>   <cell role="label">Club.</cell><cell role="label">Won.</cell>
>   <cell role="label">Lost.</cell><cell role="label">P.C.</cell>
> </row>
> <row>
>   <cell role="data">Boston</cell><cell role="data">71</cell>
>   <cell role="data">8</cell><cell role="data">.809</cell>
> </row>
> <row>
>   <cell role="data">Athletic</cell><cell role="data">55</cell>
>   <cell role="data">28</cell><cell role="data">.756</cell>
> </row>
> <row>
>   <cell role="data">Hartford</cell><cell role="data">54</cell>
>   <cell role="data">28</cell><cell role="data">.639</cell>
> </row>
> <row>
>   <cell role="data">St. Louis</cell>
>   <cell role="data"><sic corr="39">29</sic></cell>
>   <cell role="data"><sic corr="29">39</sic></cell>
>   <cell role="data">.574</cell>
> </row>
> <row>
>   <cell role="data">Philadelphia</cell><cell role="data">37</cell>
>   <cell role="data">31</cell><cell role="data">.544</cell>
> </row>
> <row>
>   <cell role="data">Chicago</cell><cell role="data">30</cell>
>   <cell role="data">37</cell><cell role="data">.448</cell>
> </row>
> <row>
>   <cell role="data">Mutual</cell><cell role="data">29</cell>
>   <cell role="data">38</cell><cell role="data">.426</cell>
> </row>
> <row>
>   <cell role="data">St. Louis Reds</cell><cell role="data">4</cell>
>   <cell role="data">14</cell><cell role="data">.222</cell>
> </row>
> <row>
>   <cell role="data">Washington</cell><cell role="data">4</cell>
>   <cell role="data">22</cell><cell role="data">.156</cell>
> </row>
> <row>
>   <cell role="data">New Haven</cell><cell role="data">7</cell>
>   <cell role="data">39</cell><cell role="data">.152</cell>
> </row>
> <row>
>   <cell role="data">Centennial</cell><cell role="data">2</cell>
>   <cell role="data">13</cell><cell role="data">.133</cell>
> </row>
> <row>
>   <cell role="data">Western</cell><cell role="data">1</cell>
>   <cell role="data">12</cell><cell role="data">.077</cell>
> </row>
> <row>
>   <cell role="data">Atlantic</cell><cell role="data">2</cell>
>   <cell role="data">42</cell><cell role="data">.065</cell>
> </row>
> </table>
>
> ----- Original Message -----
> From: "Sebastien Blondeel" <blondeel@clipper.ens.fr>
> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
> Subject: Re: [gutvol-d] XML version of some books of PG (and other formats)
> Date: Fri, 3 Dec 2004 16:16:02 +0100
>
> >
> > On Fri, Dec 03, 2004 at 08:46:10AM -0500, Joshua Hutchinson wrote:
> > > I'm curious to see if your script can handle tables.  That is our
> > > current biggest bugaboo when it comes to transforming to PG TXT
> > > format.
> >
> > My DTD doesn't mention them (yet?). It focuses mainly on the French
> > books of the ebooksgratuits site. I guess it can very easily be injected
> > in a more complete DTD (TEI, Docbook, whatever).
> >
> > I already did Perl (not XSLT!) translations of XML tables (Docbook, for
> > example) to other formats (HTML: easy; LaTeX: harder...; TXT: w3m -dump
> > of the HTML version is usually good enough) for other projects.
> >
> > I heard there were now Perl modules able to deal with XML and XSLT so it
> > should be even easier to take care of. XSLT-style of programming is not
> > for me...
> >
> > How complex are your tables and what do you need to do with them? Any
> > example of (input, output desired, and constraints [API, language...] of
> > the transformation)?
>
>
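Sebastien asked for an example of input, desired output and constraints. As one possible answer, here is a minimal sketch in Python (standard library `xml.etree.ElementTree` only) that turns a TEI-style table like Josh's into fixed-width text and refuses to produce lines wider than 80 columns. Assumptions, none of them from the thread: the markup is un-namespaced as in the example above, `<sic corr="...">` should print its corrected value, the first column is left-aligned and the rest right-aligned (suited to these standings), cells with `cols` greater than 1 are centred over the columns they span, and the names `tei_table_to_text`, `cell_text` and `gap` are hypothetical. It is not the PGTEI converter.

```python
import xml.etree.ElementTree as ET


def cell_text(cell):
    """Text of a <cell>, taking the corr= reading of any <sic> inside it."""
    parts = [cell.text or ""]
    for child in cell:
        if child.tag == "sic" and child.get("corr") is not None:
            parts.append(child.get("corr"))
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts).strip()


def tei_table_to_text(xml, max_width=80, gap=2):
    """Lay out a simple TEI <table> as fixed-width text no wider than max_width."""
    table = ET.fromstring(xml)
    rows = [[(cell_text(c), int(c.get("cols", "1"))) for c in r.findall("cell")]
            for r in table.findall("row")]
    ncols = max(sum(span for _, span in row) for row in rows)

    # Column widths come from single-column cells; spanning cells are centred later.
    widths = [0] * ncols
    for row in rows:
        col = 0
        for text, span in row:
            if span == 1:
                widths[col] = max(widths[col], len(text))
            col += span

    lines = []
    for row in rows:
        col, out = 0, []
        for text, span in row:
            w = sum(widths[col:col + span]) + gap * (span - 1)
            if span > 1:
                out.append(text.center(w))
            elif col == 0:
                out.append(text.ljust(w))   # first column: names, left-aligned
            else:
                out.append(text.rjust(w))   # other columns: figures, right-aligned
            col += span
        lines.append((" " * gap).join(out).rstrip())

    too_wide = [line for line in lines if len(line) > max_width]
    if too_wide:
        raise ValueError(f"{len(too_wide)} line(s) wider than {max_width} columns")
    return "\n".join(lines)


SAMPLE = """<table rows="4" cols="4">
<row><cell cols="4" role="label">THE RECORD OF 1875.</cell></row>
<row><cell role="label">Club.</cell><cell role="label">Won.</cell>
     <cell role="label">Lost.</cell><cell role="label">P.C.</cell></row>
<row><cell role="data">Boston</cell><cell role="data">71</cell>
     <cell role="data">8</cell><cell role="data">.809</cell></row>
<row><cell role="data">St. Louis</cell><cell role="data"><sic corr="39">29</sic></cell>
     <cell role="data"><sic corr="29">39</sic></cell><cell role="data">.574</cell></row>
</table>"""

print(tei_table_to_text(SAMPLE))
```

On the four-row sample, the St. Louis line comes out with the corrected figures (39 won, 29 lost). A table that cannot fit raises `ValueError`, which is the point where a post-processor would have to abbreviate, or, as Norm recalls, widen the table the minimum amount necessary.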

From Gutenberg9443 at aol.com  Tue Dec  7 11:53:19 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Tue Dec  7 11:53:34 2004
Subject: [gutvol-d] NYT upbeat on e-books; so am I. 
Message-ID: <e4.5e02ec15.2ee7642f@aol.com>

 
In a message dated 12/4/2004 4:09:58 PM Mountain Standard Time,  
alex@awstudios.net writes:

"Sunday's NYT Book Review will carry an upbeat article on e-books,  complete
with mention of the New York Public Library's impressive  3,000-title
efforts...." 


Good news.
 
But, as has been pointed out often, nobody really wants to sit at the  
computer or take the computer to bed in order to read a book.
 
Nobody is going to provide book-size ebook readers for very long unless  
doing so becomes financially feasible.
 
So if you don't want to read on a computer or on a PDA, I would appreciate  
it if anybody who can afford it would go to either eBookWise.com or  
FictionWise.com and spend at least $20 a month. If you don't have an ebook reader, you
can download Microsoft Reader to your desktop free; however, if you purchase the
1150 you will have a highly acceptable tool you can use for many years.
 
By buying all the remaining 1150s and making them available dirt cheap;
transforming over 7500 ebooks, mostly proprietary, into the right format for the
1150; hiring software engineers to fix perceived problems; and hiring hardware
engineers to make improved readers, FictionWise has stuck its neck so far
out it looks like a giraffe. Now we need to feed that giraffe.
 
As to my complaints about the 1150, I was mistaken on most of them. Some of
the changes from Rocket are definitely an improvement. For example, so far I
have zorched TWO Rocket powercords because the location of the cord port is
such that the cord is often bent at a right angle. The 1150's cord port is at
the top, which obviates that problem. It is possible to insert bookmarks (I
was just plain wrong on that earlier). It is also possible to handwrite your
notes to yourself as you're reading.
 
I have only two remaining objections: First, of course, is the limited  
ability to use personal material; that is being worked on right now, and will be  
fixed as soon as possible by allowing direct USB downloading from your 
computer.  The other, which I hadn't noticed earlier, is that there is no dictionary  
capability. I use that extensively in the Rocket, both English dictionary and
language-to-language dictionaries. I do not know whether there is any
intention of adding that.
 
I won't update again until the USB problem is remedied. But on second
thought, if you have it to spare, spend $50 a month at FictionWise and/or
eBookWise. Unfortunately, $20 is my limit and I don't always have that. But I think
that the combination of free ebooks from us and many other sources, and
commercial ebooks, is going to be a long-range win for all of us.
 
Anne
PS--Yes, you can read at the beach if you keep your 1150 inside a ziplock  
plastic bag, though I wouldn't do it because the possibility of somebody  
stealing it or walking on it is too high. As for underwater . . . If you're
underwater, watch the fishies instead of reading a book. I still wouldn't read it in
the bathtub, but then I never read in the bathtub anyway since I dropped a
rather expensive library book into the water.  AW
From sly at victoria.tc.ca  Wed Dec  8 00:28:04 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Dec  8 00:28:26 2004
Subject: [gutvol-d] PG books used by visually impaired
Message-ID: <Pine.GSO.4.58.0412080021150.3356@vtn1.victoria.tc.ca>

Dear fellow PG volunteers,

Here's a recent newsgroup posting that I came across
showing more of the use that people do get out of
the texts that we are creating.

Andrew


   Newsgroups: alt.disability.blind.social
   Date: 2004-12-07 11:15:06 PST

I often listen to books from the Gutenberg Project on my laptop. Most of
those books are available from the US Library of Congress on audio cassette
tape. So mainly, when I listen to a book on my laptop, it's because I
forgot to order another book and am having a "book emergency". You know,
when you finish one book and have nothing else to read and it's cold and
rainy outside and there's nothing on TV?

But I've listened to a lot of books on my laptop and after a while you
don't notice the voice -- if it's a good book.

From nihil_obstat at mindspring.com  Wed Dec  8 07:16:11 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Wed Dec  8 07:16:16 2004
Subject: [gutvol-d] PG books used by visually impaired
Message-ID: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net>


On a related thread, does anyone know a user friendly way to make mp3s (or other format) with digitized voices out of P.G. e-books?

Specifically, I'm looking for something that uses typical software on Windows or Mac machines.

I have been able to get newer versions of Adobe Acroread to "Read Out Loud" (an option under "View").  This has a couple of problems, though:
1) Most people do not have access to Acrobat to make PDF files.  Acroread is free and lets you read PDF files, but you need Acrobat (somewhat expensive) to make PDFs.
2) This is fine for listening at your computer, but I could not find a way to export to MP3 for listening to later using other equipment.  Looked for an option in RealPlayer, but no dice.

This is specifically for someone who drives a lot, and wants books-on-disk.

Thanks.


-Dennis McCarthy
anno Domini MMIIII, a.d. VI Id. Dec., dies Mercvri
Feast of the Immaculate Conception


From alex at awstudios.net  Wed Dec  8 08:40:29 2004
From: alex at awstudios.net (Alex Wilson)
Date: Wed Dec  8 08:40:36 2004
Subject: [gutvol-d] PG books used by visually impaired
In-Reply-To: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net>
Message-ID: <BDDC98AD.333CA%alex@awstudios.net>

On 12/8/04 10:16 AM, "Dennis McCarthy" <nihil_obstat@mindspring.com> wrote:

> 
> On a related thread, does anyone know a user friendly way to make mp3s (or
> other format) with digitized voices out of P.G. e-books?

On the Mac side, Real Mac Software has a program called "Voice Box" which
does just that. The functionality is actually built in to Mac OS X so all
you really need is a simple AppleScript, but VoiceBox gives you more
options.

Alex.

http://www.alexwilson.com - Alex Wilson Studios
http://www.telltaleweekly.org - Funding a Free Audiobook Library


From kth at srv.net  Wed Dec  8 08:24:41 2004
From: kth at srv.net (Kevin Handy)
Date: Wed Dec  8 09:04:33 2004
Subject: [gutvol-d] PG books used by visually impaired
In-Reply-To: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net>
References: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net>
Message-ID: <41B72AC9.4080105@srv.net>

Dennis McCarthy wrote:

>On a related thread, does anyone know a user friendly way to make mp3s (or other format) with digitized voices out of P.G. e-books?
>
>Specifically looking for using typical software on Windows or Mac format machines.
>
>I have been able to get newer versions of Abode Acroread to "Read Out Loud" (an option under "View").  This has a couple problems, though:
>1) Most people do not have access to Acrobat to make PDF files.  Acroread is free and lets you read PDF files, but you need Acrobat (somewhat expensive) to make PDFs.
>2) This is fine for listening at your computer, but I could not find a way to export to MP3 for listening to later using other equipment.  Looked for an option in RealPlayer, but no dice.
>
>This is specifically for someone who drives alot, and wants books-on-disk.
>
>Thanks.
>  
>
Not sure of the availability for whatever platforms you are using, but
'festival' will generate '.wav' files using the included 'text2wave' program.
They should be easy to convert to mp3s.

It's a monotone reading, but usable. More recent versions (e.g. FC3) seem
to sound smoother (less computerized) than earlier ones (e.g. RH9). Don't
know if this is because of the Linux or the festival versions. It comes
standard with most recent Red Hat and Fedora Core Linux installs, and
probably others.

Price is right (free).

http://www.cstr.ed.ac.uk/projects/festival/manual/
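
Kevin's two-step recipe (festival's `text2wave` to make a WAV, then an MP3 encoder) could be scripted roughly as follows. This is a sketch under assumptions not stated in the thread: Python 3.8 or later (for `shlex.join`), `text2wave` and the LAME encoder (invoked as `lame input.wav output.mp3`) both on the PATH, and hypothetical helper names `tts_commands` and `convert`. LAME is only one possible encoder.

```python
import shlex
import subprocess
from pathlib import Path


def tts_commands(txt_path, encoder="lame"):
    """Return the two command lines: text -> WAV, then WAV -> MP3."""
    txt = Path(txt_path)
    wav = txt.with_suffix(".wav")
    mp3 = txt.with_suffix(".mp3")
    return [
        # festival's text2wave: -o names the output WAV, then the text file.
        ["text2wave", "-o", str(wav), str(txt)],
        # Assumed encoder invocation: lame <input.wav> <output.mp3>
        [encoder, str(wav), str(mp3)],
    ]


def convert(txt_path):
    """Run both steps in order, stopping if either command fails."""
    for cmd in tts_commands(txt_path):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Show what would be run, without running it.
for cmd in tts_commands("book.txt"):
    print(shlex.join(cmd))
```

Calling `convert("book.txt")` would then leave `book.wav` and `book.mp3` beside the text file (`book.txt` is a placeholder name). A whole novel makes a very large WAV, so splitting the text into chapters first may be worth the trouble.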

From M.J.Farmer at bham.ac.uk  Wed Dec  8 08:33:35 2004
From: M.J.Farmer at bham.ac.uk (Malcolm Farmer)
Date: Wed Dec  8 09:09:34 2004
Subject: [gutvol-d] PG books used by visually impaired
In-Reply-To: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net>
References: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net>
Message-ID: <41B72CDF.6050505@bham.ac.uk>

Dennis McCarthy wrote:

>On a related thread, does anyone know a user friendly way to make mp3s (or other format) with digitized voices out of P.G. e-books?
>
>Specifically looking for using typical software on Windows or Mac format machines.
>
>I have been able to get newer versions of Abode Acroread to "Read Out Loud" (an option under "View").  This has a couple problems, though:
>1) Most people do not have access to Acrobat to make PDF files.  Acroread is free and lets you read PDF files, but you need Acrobat (somewhat expensive) to make PDFs.
>  
>
PDFCreator is free, for Windows: http://sourceforge.net/projects/pdfcreator
Don't know how good it is, but if you're starting with PG plain text 
files, it shouldn't have too much trouble.

Or go the whole hog and use OpenOffice: it's free, and its word processor 
has an option to export to PDF.

That solves the first part under Windows.  Getting Real to save the 
output as audio or MP3, I don't know. The only version of Real I've had 
much experience with (for Linux) just doesn't have any provision to 
allow saving output. It sounds as if Windows is the same.  However, 
under Linux, the "mplayer" program can use the Real codec for playing 
Real audio, and will happily give a variety of outputs, including 
dumping raw sound output to disk for burning or re-encoding, but I have 
no idea if there's a free equivalent for Windows.

You might have to resort to feeding one PC's soundcard line out to 
another PC's line in. These are synthesised voices we're talking about, 
so the loss in quality won't matter, if you're encoding to MP3 anyway.
From tb at baechler.net  Wed Dec  8 23:41:30 2004
From: tb at baechler.net (Tony Baechler)
Date: Wed Dec  8 23:39:25 2004
Subject: [gutvol-d] PG books used by visually impaired
In-Reply-To: <41B72CDF.6050505@bham.ac.uk>
References: <7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net>
	<7910234.1102518971837.JavaMail.root@wamui10.slb.atl.earthlink.net>
Message-ID: <5.2.0.9.0.20041208233722.04129bf0@snoopy2.trkhosting.com>

Hi.  Apologies in advance, but I don't have links for most of this 
software.  Also, I can only comment on Windows.

There are many easy ways to convert to mp3.  Probably the easiest is 
something like TextAloud but I don't know how much it is and I don't use 
it.  What I do is use an audio capture program.  In other words, it records 
anything from the sound card output, whether MIDI, RA, etc. to wave or 
mp3.  The problem is that it does this in real time so if the book is 10 
hours long, it takes that long to record.

This probably doesn't help, but newer OCR programs for the blind such as 
Kurzweil 1000 version 6 and up and newer versions of Openbook have this 
built-in.  They will convert text to mp3 relatively quickly.  If you have a 
specific book you want converted, I will do it for you.  Write me off list.

The particular capture program I use is RecAll.  It is shareware but there 
are free alternatives.  You can go here if you want a 30 day demo.

http://www.sagebrush.com/

From nwolcott2 at kreative.net  Sat Dec 11 07:26:18 2004
From: nwolcott2 at kreative.net (Norm Wolcott)
Date: Sat Dec 11 07:26:39 2004
Subject: [gutvol-d] Induce law
Message-ID: <003201c4df95$ca040a00$2371fea9@gateway>

The people who brought you Sonny Bono are now giving you the Induce Law, prohibiting the manufacture or sale of any device which might be "reasonably" assumed to "induce" anyone into violating any law or regulation. A VCR is a good example. 

nwolcott2@post.harvard.edu  Friar Wolcott, Gutenberg Abbey, Sherwood Forrest
From hart at pglaf.org  Sat Dec 11 08:09:34 2004
From: hart at pglaf.org (Michael Hart)
Date: Sat Dec 11 08:09:36 2004
Subject: [gutvol-d] Induce law
In-Reply-To: <003201c4df95$ca040a00$2371fea9@gateway>
References: <003201c4df95$ca040a00$2371fea9@gateway>
Message-ID: <Pine.LNX.4.60.0412110806440.2415@pglaf.org>


On Sat, 11 Dec 2004, Norm Wolcott wrote:

> The people who brought you Sonny Bono now are giving you the Induce Law. 
> Prohibitting the manufacture or sale of any device which might be 
> "reasonably" assumed to "induce" anyone into violating any law or regulation. 
> A VCR is a good example.
>
> nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest


Laws have also been introduced in Congress to include skipping ads on TV
as part of the "violating any law or regulation."

Brought to you by your friendly local Thought Police. . . .

The next series could make it illegal to even TALK about skipping ads.


mh
From Gutenberg9443 at aol.com  Sat Dec 11 08:10:23 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Sat Dec 11 08:10:30 2004
Subject: [gutvol-d] Induce law
Message-ID: <c4.1c81d56b.2eec75ef@aol.com>

 
In a message dated 12/11/2004 8:27:09 AM Mountain Standard Time,  
nwolcott2@kreative.net writes:

>>Prohibitting the manufacture or sale of any device which might be
>>"reasonably" assumed to "induce" anyone into violating any law or
>>regulation. A VCR is a good example.


Passed or proposed?
 
In what legislative body?
 
It seems to me that would hit photocopiers, scanners, tape recorders, large  
and/or external hard drives, and . . . the list could go on quite a while  
longer. This is absurd.
 
Anne
From hyphen at hyphenologist.co.uk  Sat Dec 11 09:09:27 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Dec 11 09:09:58 2004
Subject: [gutvol-d] Induce law
In-Reply-To: <003201c4df95$ca040a00$2371fea9@gateway>
References: <003201c4df95$ca040a00$2371fea9@gateway>
Message-ID: <0camr01g1hq9k9i459g5b1e6rvi9ge2p9s@4ax.com>

On Sat, 11 Dec 2004 10:26:18 -0500,  "Norm Wolcott"
<nwolcott2@kreative.net> wrote:

| The people who brought you Sonny Bono now are giving you the 
| Induce Law.  Prohibitting the manufacture or sale of any 
| device which might be "reasonably" assumed to "induce" 
| anyone into violating any law or regulation. 
| A VCR is a good example. 

Thank ghod I do not live in the USA.


-- 
Dave F

From bill at truthdb.org  Sat Dec 11 16:55:41 2004
From: bill at truthdb.org (bill jenness)
Date: Sat Dec 11 16:55:56 2004
Subject: [gutvol-d] Re: Induce law 
In-Reply-To: <20041211200003.13D518C83B@pglaf.org>
References: <20041211200003.13D518C83B@pglaf.org>
Message-ID: <32790.134.117.137.83.1102812941.squirrel@134.117.137.83>

Wouldn't that also include guns, fast cars and money? The idea is patently
ridiculous.
From webmaster at gutenberg.org  Sun Dec 12 07:57:04 2004
From: webmaster at gutenberg.org (Marcello Perathoner)
Date: Mon Dec 13 20:03:05 2004
Subject: [gutvol-d] [Fwd: Folio files]
Message-ID: <41BC6A50.2040207@gutenberg.org>


-------- Original Message --------
Subject: Folio files
Date: Sat, 11 Dec 2004 22:45:42 -0000
From: Charles Crosby <charlesjcrosby@btinternet.com>
To: <webmaster@gutenberg.org>

I have downloaded a folio version of Gibbon's 'Decline and Fall...'
What program do I need to read it?
Hoping you can be of assistance,
Charles Crosby.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org  Mon Dec 13 20:11:44 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Dec 13 20:11:46 2004
Subject: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41BC6A50.2040207@gutenberg.org>
References: <41BC6A50.2040207@gutenberg.org>
Message-ID: <20041214041144.GC28632@pglaf.org>

On Sun, Dec 12, 2004 at 04:57:04PM +0100, Marcello Perathoner wrote:
> 
> 
> -------- Original Message --------
> Subject: Folio files
> Date: Sat, 11 Dec 2004 22:45:42 -0000
> From: Charles Crosby <charlesjcrosby@btinternet.com>
> To: <webmaster@gutenberg.org>
> 
> I have downloaded a folio version of Gibbon's 'Decline and Fall...'
> What program do I need to read it?
> Hoping you can be of assistance,
> Charles Crosby.

Hi, Charles.  You're probably better off with a different
version of this eBook (visit http://gutenberg.org and
type "gibbon" in an Author search box).

"Folio" is by a company that we haven't heard from
in a while.  They had some proprietary software for
eBooks.  I'm unaware of any current programs that can
view these files properly.

We keep the files as part of the archive because we
don't like to delete things, but as you can see this format
was not much of a success, from today's point of view.
   -- Greg Newby

From hart at pglaf.org  Tue Dec 14 04:19:17 2004
From: hart at pglaf.org (Michael Hart)
Date: Tue Dec 14 04:19:20 2004
Subject: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41BC6A50.2040207@gutenberg.org>
References: <41BC6A50.2040207@gutenberg.org>
Message-ID: <Pine.LNX.4.60.0412140418250.5895@pglaf.org>


On Sun, 12 Dec 2004, Marcello Perathoner wrote:

>
>
> -------- Original Message --------
> Subject: Folio files
> Date: Sat, 11 Dec 2004 22:45:42 -0000
> From: Charles Crosby <charlesjcrosby@btinternet.com>
> To: <webmaster@gutenberg.org>
>
> I have downloaded a folio version of Gibbon's 'Decline and Fall...'
> What program do I need to read it?
> Hoping you can be of assistance,
> Charles Crosby.

The program I remember was called
"Folio View"

There once was a free reader,
but it was discontinued.

Michael
From marcello at perathoner.de  Tue Dec 14 04:45:17 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Dec 14 04:46:03 2004
Subject: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412140418250.5895@pglaf.org>
References: <41BC6A50.2040207@gutenberg.org>
	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>
Message-ID: <41BEE05D.3080109@perathoner.de>

Michael Hart wrote:

> The program I remember was called
> "Folio View"
> 
> There once was a free reader,
> but it was discontinued.

Then we should either get hold of a copy of that reader and offer it for 
download or delete the files. No point in holding files nobody can read.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From nihil_obstat at mindspring.com  Tue Dec 14 08:22:56 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Tue Dec 14 08:23:02 2004
Subject: [gutvol-d] Google On-Line Library
Message-ID: <15491867.1103041377021.JavaMail.root@wamui08.slb.atl.earthlink.net>


FYI,

Here is an article about Google's online library project:
http://www.foxnews.com/story/0,2933,141433,00.html

"The ambitious initiative announced late Monday gives Mountain View, Calif.-based Google the right to index material from the New York Public Library as well as libraries at four universities--Harvard, Stanford, Michigan and Oxford in England."

Not sure what their profit angle is.  Supposedly public domain works will be free to access.  Maybe they get a cut of copyrighted books viewed via this service.


---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From marcello at perathoner.de  Tue Dec 14 09:01:20 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Dec 14 09:01:27 2004
Subject: [gutvol-d] [Fwd: [ibiblio-announce] ibiblio and all sites offline]
Message-ID: <41BF1C60.7070007@perathoner.de>


-------- Original Message --------
Subject: [ibiblio-announce] ibiblio and all sites offline
Date: Tue, 14 Dec 2004 11:47:00 -0500
From: John Reuning <john@metalab.unc.edu>
Reply-To: help@ibiblio.org
To: ibiblio-announce@lists.ibiblio.org

One of the core file servers crashed this morning.  All web and ftp
services will be offline until this system has been restored.

-jrr
_______________________________________________
ibiblio-announce mailing list
ibiblio-announce@lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/ibiblio-announce


-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org  Tue Dec 14 09:55:58 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue Dec 14 09:56:00 2004
Subject: [gutvol-d] Google Partners with Oxford,
	Harvard & Others to Digitize Libraries
Message-ID: <20041214175558.GA15809@pglaf.org>

Here's an extract from
http://searchenginewatch.com/searchday/article.php/3447411

'"In-copyright" books that are in these collections will have basic
bibliographic information available but the full text will not be
accessible.

Smith told us that out-of copyright material will be available in full
text, though printing will be disabled when viewing this content.'


This doesn't sound like competition to PG, to me, and in
fact the second sentence above means they won't even meet
my definition of an eBook.

Not to say that these things aren't worthwhile.  After all,
*we* could generate eBooks from scans etc. made
available for public domain content.  This could be
very helpful.

As to "why," when a few PG'ers met with Google last year,
they stressed that from their point of view, any growth
in online content is good for them.  More stuff "out there"
means there's more for them to find.  So, this is partially
altruistic, but also partially for the public good.

It was interesting to see that UC Berkeley, UIUC and Yale
were not among the libraries chosen (those are the 
2nd- to 4th-largest academic collections in the US, after Harvard).
  -- Greg


From hart at pglaf.org  Tue Dec 14 09:58:11 2004
From: hart at pglaf.org (Michael Hart)
Date: Tue Dec 14 09:58:13 2004
Subject: Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41BEE05D.3080109@perathoner.de>
References: <41BC6A50.2040207@gutenberg.org>
	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>
	<41BEE05D.3080109@perathoner.de>
Message-ID: <Pine.LNX.4.60.0412140956110.14885@pglaf.org>

On Tue, 14 Dec 2004, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> The program I remember was called
>> "Folio View"
>> 
>> There once was a free reader,
>> but it was discontinued.
>
> Then we should either get hold of a copy of that reader and offer it for 
> download or delete the files. No point in holding files nobody can read.

It is VERY important to keep examples of files that once had free readers
that are no longer available. . .if nothing more than as examples of why we
don't put everything into any particular proprietary format.

Michael S. Hart
From hart at pglaf.org  Tue Dec 14 10:11:21 2004
From: hart at pglaf.org (Michael Hart)
Date: Tue Dec 14 10:11:23 2004
Subject: [gutvol-d] Google Partners with Oxford, Harvard & Others to
	Digitize Libraries
In-Reply-To: <20041214175558.GA15809@pglaf.org>
References: <20041214175558.GA15809@pglaf.org>
Message-ID: <Pine.LNX.4.60.0412141007330.14885@pglaf.org>


On Tue, 14 Dec 2004, Greg Newby wrote:

> Here's an extract from
> http://searchenginewatch.com/searchday/article.php/3447411
>
> '"In-copyright" books that are in these collections will have basic
> bibliographic information available but the full text will not be accessible.
>
> Smith told us that out-of copyright material will be available in full
> text, though printing will be disabled when viewing this content.'

I wonder what Smith means by "full text" ???


> This doesn't sound like competition to PG, to me, and in
> fact the second sentence above means they won't even meet
> my definition of an eBook.
>
> Not to say that these things aren't worthwhile.  After all,
> *we* could generate eBooks from scans etc. made
> available for public domain content.  This could be
> very helpful.

I've also heard they intend to start with 40,000 books
only of interest to rare book people and scholars.

The two projections I heard were 7 and 10 years for the project.


> As to "why," when a few PG'ers met with Google last year,
> they stressed that from their point of view, any growth
> in online content is good for them.  More stuff "out there"
> means there's more for them to find.  So, this is partially
> altruistic, but also partially for the public good.

Of course, Google didn't follow up in any way on this meeting,
and in fact didn't reply to my followup inquiries.


> It was interesting to see that UC Berkeley, UIUC and Yale
> were not among the libraries chosen (those are the
> 2nd-4th largest academic collections in the US, after Harvard).

Yale was originally announced, at least by NPR, and they
had to announce a retraction.


>  -- Greg


michael
From marcello at perathoner.de  Tue Dec 14 10:42:39 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Dec 14 10:42:48 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412140956110.14885@pglaf.org>
References: <41BC6A50.2040207@gutenberg.org>	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>	<41BEE05D.3080109@perathoner.de>
	<Pine.LNX.4.60.0412140956110.14885@pglaf.org>
Message-ID: <41BF341F.9060101@perathoner.de>

Michael Hart wrote:

> It is VERY important to keep examples of files that once had free readers
> that are available no longer. . .if nothing more than examples of why we
> don't put everything into any particular proprietary format.

Would there be a better way to keep those "examples" than to keep them 
in the collection buried beneath a ton of other files where they just 
pop up by chance to disgruntle users who inadvertently download them?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Tue Dec 14 10:48:28 2004
From: jon at noring.name (Jon Noring)
Date: Tue Dec 14 10:48:39 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41BF341F.9060101@perathoner.de>
References: <41BC6A50.2040207@gutenberg.org>
	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>
	<41BEE05D.3080109@perathoner.de>
	<Pine.LNX.4.60.0412140956110.14885@pglaf.org>
	<41BF341F.9060101@perathoner.de>
Message-ID: <73101645671.20041214114828@noring.name>

Marcello wrote:
> Michael Hart wrote:

>> It is VERY important to keep examples of files that once had free readers
>> that are available no longer. . .if nothing more than examples of why we
>> don't put everything into any particular proprietary format.

> Would there be a better way to keep those "examples" than to keep them 
> in the collection buried beneath a ton of other files where they just 
> pop up by chance to disgruntle users who inadvertently download them?

It does seem to me that old "texts" in an obsolete proprietary format
should be "retired" to a home of some sort. Keep them, but move them
somewhere else.

Btw, do the Folio version(s) exist in plain text or HTML form?

Jon

From flis at detk.com  Tue Dec 14 11:52:39 2004
From: flis at detk.com (William Flis)
Date: Tue Dec 14 11:46:13 2004
Subject: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041214041144.GC28632@pglaf.org>
Message-ID: <LBELIICCBHDEONNDCACJOEBMCIAA.flis@detk.com>

> > I have downloaded a folio version of Gibbon's 'Decline and Fall...'
> > What program do I need to read it?
> > Hoping you can be of assistance,
> > Charles Crosby.
>
> "Folio" is by a company that we haven't heard from
> in a while.  They had some proprietary software for
> eBooks.  I'm unaware of any current programs that can
> view these files properly.
>
> We keep the files as part of the archive because we
> don't like to delete things, but as you can see this format
> was not much of a success, from today's point of view.

Out of curiosity, I tried Google to find this file (thought maybe I could
bust it open), and it seems that most of the versions of this book out on
the web are identified as "Folio", including those that are available in
plain text and html formats. Maybe he just meant the size of the original
book? (He did write "folio", not "Folio"!)

Bill Flis

From hart at pglaf.org  Tue Dec 14 13:22:13 2004
From: hart at pglaf.org (Michael Hart)
Date: Tue Dec 14 13:22:14 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <73101645671.20041214114828@noring.name>
References: <41BC6A50.2040207@gutenberg.org>
	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>
	<41BEE05D.3080109@perathoner.de>
	<Pine.LNX.4.60.0412140956110.14885@pglaf.org>
	<41BF341F.9060101@perathoner.de>
	<73101645671.20041214114828@noring.name>
Message-ID: <Pine.LNX.4.60.0412141320470.21169@pglaf.org>


On Tue, 14 Dec 2004, Jon Noring wrote:

> Marcello wrote:
>> Michael Hart wrote:
>
>>> It is VERY important to keep examples of files that once had free readers
>>> that are available no longer. . .if nothing more than examples of why we
>>> don't put everything into any particular proprietary format.
>
>> Would there be a better way to keep those "examples" than to keep them
>> in the collection buried beneath a ton of other files where they just
>> pop up by chance to disgruntle users who inadvertently download them?
>
> It does seem to me that old "texts" in an obsolete proprietary format
> should be "retired" to a home of some sort. Keep them, but move them
> somewhere else.

No. . .we want them right where people can see the effect
of what would happen if they relied on proprietary formats.

"Lest we forget."


>
> Btw, do the Folio version(s) exist in plain text or HTML form?

They must somewhere, but I don't have them.

Michael

From hart at pglaf.org  Tue Dec 14 13:25:28 2004
From: hart at pglaf.org (Michael Hart)
Date: Tue Dec 14 13:25:30 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41BF341F.9060101@perathoner.de>
References: <41BC6A50.2040207@gutenberg.org>
	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>
	<41BEE05D.3080109@perathoner.de>
	<Pine.LNX.4.60.0412140956110.14885@pglaf.org>
	<41BF341F.9060101@perathoner.de>
Message-ID: <Pine.LNX.4.60.0412141322470.21169@pglaf.org>


On Tue, 14 Dec 2004, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> It is VERY important to keep examples of files that once had free readers
>> that are available no longer. . .if nothing more than examples of why we
>> don't put everything into any particular proprietary format.
>
> Would there be a better way to keep those "examples" than to keep them in the 
> collection buried beneath a ton of other files where they just pop up by
> chance to disgruntle users who inadvertently download them?

People SHOULD be disgruntled about such things. . . .

We are NOT going to rewrite Project Gutenberg history
to make it appear this didn't happen, nor are we going
to downplay that it happened.

I, personally, met with the President of Folio, before
we embarked on this project, and he assured me that the
free Folio reader would always be available. . .and he
seemed far more friendly than Adobe ever has appeared.

Michael

From gbnewby at pglaf.org  Tue Dec 14 14:25:48 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue Dec 14 14:25:50 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412141320470.21169@pglaf.org>
References: <41BC6A50.2040207@gutenberg.org>
	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>
	<41BEE05D.3080109@perathoner.de>
	<Pine.LNX.4.60.0412140956110.14885@pglaf.org>
	<41BF341F.9060101@perathoner.de>
	<73101645671.20041214114828@noring.name>
	<Pine.LNX.4.60.0412141320470.21169@pglaf.org>
Message-ID: <20041214222548.GA23236@pglaf.org>

> >
> >Btw, do the Folio version(s) exist in plain text or HTML form?

Sure: plain text.  Visit gutenberg.org, search for "gibbon"
in the Author field.
  -- Greg

From sly at victoria.tc.ca  Tue Dec 14 15:56:11 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Dec 14 15:56:26 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41BF341F.9060101@perathoner.de>
References: <41BC6A50.2040207@gutenberg.org>
	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>
	<41BEE05D.3080109@perathoner.de>
	<Pine.LNX.4.60.0412140956110.14885@pglaf.org>
	<41BF341F.9060101@perathoner.de>
Message-ID: <Pine.GSO.4.58.0412141552200.9885@vtn1.victoria.tc.ca>


On Tue, 14 Dec 2004, Marcello Perathoner wrote:

> Michael Hart wrote:
>
> > It is VERY important to keep examples of files that once had free readers
> > that are available no longer. . .if nothing more than examples of why we
> > don't put everything into any particular proprietary format.
>
> Would there be a better way to keep those "examples" than to keep them
> in the collection buried beneath a ton of other files where they just
> pop up by chance to disgruntle users who inadvertently download them?
>

When old files get reposted in the new directory structure,
any formats like this, that cannot be updated, are moved into
an "old" directory.

Is that something like what you were thinking?

Andrew
From gbnewby at pglaf.org  Tue Dec 14 17:40:19 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue Dec 14 17:40:21 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.GSO.4.58.0412141552200.9885@vtn1.victoria.tc.ca>
References: <41BC6A50.2040207@gutenberg.org>
	<Pine.LNX.4.60.0412140418250.5895@pglaf.org>
	<41BEE05D.3080109@perathoner.de>
	<Pine.LNX.4.60.0412140956110.14885@pglaf.org>
	<41BF341F.9060101@perathoner.de>
	<Pine.GSO.4.58.0412141552200.9885@vtn1.victoria.tc.ca>
Message-ID: <20041215014019.GA10991@pglaf.org>

On Tue, Dec 14, 2004 at 03:56:11PM -0800, Andrew Sly wrote:
> 
> 
> On Tue, 14 Dec 2004, Marcello Perathoner wrote:
> 
> > Michael Hart wrote:
> >
> > > It is VERY important to keep examples of files that once had free readers
> > > that are available no longer. . .if nothing more than examples of why we
> > > don't put everything into any particular proprietary format.
> >
> > Would there be a better way to keep those "examples" than to keep them
> > in the collection buried beneath a ton of other files where they just
> > pop up by chance to disgruntle users who inadvertently download them?
> >
> 
> When old files get reposted in the new directory structure,
> any formats like this, that cannot be updated, are moved into
> an "old" directory.
> 
> Is that something like what you were thinking?
> 
> Andrew

While this is our usual method, unfortunately this 
particular title (#900) is its own eBook # in the
Folio format.

May 1997 Decline/Fall Of The Roman Empire, by Gibbon, Folio[dfre310f.xxx]  900

I think this is the only .nfo file we have.

  -- Greg

From marcello at perathoner.de  Wed Dec 15 02:17:00 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Dec 15 02:17:20 2004
Subject: [gutvol-d] [Fwd: Re: [webgroup] ibib's downtime/server restore]
Message-ID: <41C00F1C.40302@perathoner.de>

-------- Original Message --------
Subject: Re: [webgroup] ibib's downtime/server restore
Date: Tue, 14 Dec 2004 19:49:42 -0500 (EST)
From: Paul Jones <pjones@metalab.unc.edu>
CC: webgroup@lists.ibiblio.org


This morning we lost the fileserver for the first time in 2 years of
continuous uptime. It took until nearly 4 o'clock EST to get us back,
but we're back now, and the response time is fine for pages and even for my
MySQL-driven blog.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Dec 15 05:23:57 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Dec 15 05:24:00 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com>


----- Original Message -----
From: "Greg Newby" <gbnewby@pglaf.org>
> 
> On Tue, Dec 14, 2004 at 03:56:11PM -0800, Andrew Sly wrote:
> >
> >
> > On Tue, 14 Dec 2004, Marcello Perathoner wrote:
> >
> > > Michael Hart wrote:
> > >
> > > > It is VERY important to keep examples of files that once had free readers
> > > > that are available no longer. . .if nothing more than examples of why we
> > > > don't put everything into any particular proprietary format.
> > >
> > > Would there be a better way to keep those "examples" than to keep them
> > > in the collection buried beneath a ton of other files where they just
> > > pop up by chance to disgruntle users who inadvertently download them?
> > >
> >
> > When old files get reposted in the new directory structure,
> > any formats like this, that cannot be updated, are moved into
> > an "old" directory.
> >
> > Is that something like what you were thinking?
> >
> > Andrew
> 
> While this is our usual method, unfortunately this
> particular title (#900) is its own eBook # in the
> Folio format.
> 
> May 1997 Decline/Fall Of The Roman Empire, by Gibbon, Folio[dfre310f.xxx]  900
> 
> I think this is the only .nfo file we have.
> 

So any chance we can convert this file to a text file, make that the main entry and move the .nfo file to the OLD subdirectory?

Josh
From hart at pglaf.org  Wed Dec 15 09:37:34 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Dec 15 09:37:36 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com>
References: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0412150935530.28932@pglaf.org>


On Wed, 15 Dec 2004, Joshua Hutchinson wrote:

>
> ----- Original Message -----
> From: "Greg Newby" <gbnewby@pglaf.org>
>>
>> On Tue, Dec 14, 2004 at 03:56:11PM -0800, Andrew Sly wrote:
>>>
>>>
>>> On Tue, 14 Dec 2004, Marcello Perathoner wrote:
>>>
>>>> Michael Hart wrote:
>>>>
>>>>> It is VERY important to keep examples of files that once had free readers
>>>>> that are available no longer. . .if nothing more than examples of why we
>>>>> don't put everything into any particular proprietary format.
>>>>
>>>> Would there be a better way to keep those "examples" than to keep them
>>>> in the collection buried beneath a ton of other files where they just
>>>> pop up by chance to disgruntle users who inadvertently download them?
>>>>
>>>
>>> When old files get reposted in the new directory structure,
>>> any formats like this, that cannot be updated, are moved into
>>> an "old" directory.
>>>
>>> Is that something like what you were thinking?
>>>
>>> Andrew
>>
>> While this is our usual method, unfortunately this
>> particular title (#900) is its own eBook # in the
>> Folio format.
>>
>> May 1997 Decline/Fall Of The Roman Empire, by Gibbon, Folio[dfre310f.xxx]  900
>>
>> I think this is the only .nfo file we have.
>>
>
> So any chance we can convert this file to a text file, make that the main 
> entry and move the .nfo file to the OLD subdirectory?
>
> Josh

Please stop trying to rewrite history. . . .

This should be kept as a straightforward example
of what can and DOES happen with proprietary formats.


michael


PS

You probably don't remember the previous example of WordStar.
From joshua at hutchinson.net  Wed Dec 15 09:49:25 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Dec 15 09:49:30 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041215174925.3624B9E79C@ws6-2.us4.outblaze.com>


----- Original Message -----
From: "Michael Hart" <hart@pglaf.org>

> >
> > So any chance we can convert this file to a text file, make that the main 
> > entry and move the .nfo file to the OLD subdirectory?
> >
> > Josh
> 
> Please stop trying to rewrite history. . . .
> 
> This should be kept as a straightforward example
> of what can and DOES happen with proprietary formats.
> 
> 

Sorry.  Gotta call bullshit on this one.  Keeping the file in the OLD subdirectory maintains the history for those that wish to find it, while allowing better usability for those folks that simply want to read this particular work.

How frustrated do you think people would be if they went to their local library, found a book in the catalog that they wanted, but the only place they are allowed to access the book is in a backroom that is pitch black.  Yeah, they have the book, but it is completely useless to the reader.

So, yeah, we have the book in PG... but it is completely useless to you.  

/nelson-voice-from-The-Simpsons

HA HA

/end-nelson-voice-from-The-Simpsons

Josh
From hart at pglaf.org  Wed Dec 15 09:56:56 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Dec 15 09:56:58 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041215174925.3624B9E79C@ws6-2.us4.outblaze.com>
References: <20041215174925.3624B9E79C@ws6-2.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0412150951200.28932@pglaf.org>


On Wed, 15 Dec 2004, Joshua Hutchinson wrote:

>
> ----- Original Message -----
> From: "Michael Hart" <hart@pglaf.org>
>
>>>
>>> So any chance we can convert this file to a text file, make that the main
>>> entry and move the .nfo file to the OLD subdirectory?
>>>
>>> Josh
>>
>> Please stop trying to rewrite history. . . .
>>
>> This should be kept as a straightforward example
>> of what can and DOES happen with proprietary formats.
>>
>>
>
> Sorry.  Gotta call bullshit on this one.  Keeping the file in the OLD 
> subdirectory maintains the history for those that wish to find it, while 
> allowing better usability for those folks that simply want to read this 
> particular work.

Barnyard epithets aside, this is too important to sweep under the carpet.

There is plenty of usability in other formats, so leave it be. . . .


> How frustrated do you think people would be if they went to their local 
> library, found a book in the catalog that they wanted, but the only place 
> they are allowed to access the book is in a backroom that is pitch black. 
> Yeah, they have the book, but it is completely useless to the reader.

That's the whole point. . .so don't hide it. . .MAKE the point, publicly.


> So, yeah, we have the book in PG... but it is completely useless to you.

No. . .it's available in other formats. . .if you take a look.


>
> /nelson-voice-from-The-Simpsons
>
> HA HA
>
> /end-nelson-voice-from-The-Simpsons
>
> Josh
>

Yes, you are correct, you are making a silly argument.


The President of Folio came to visit us here,
and promised the free Folio reader. . . .

Of course this is ancient history to you,
but some of us remember, and do not want
such an effort wiped out of our history.

It was a LOT of work. . . .

Leave it be. . . .


Michael
From marcello at perathoner.de  Wed Dec 15 09:59:53 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Dec 15 10:00:00 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412150935530.28932@pglaf.org>
References: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com>
	<Pine.LNX.4.60.0412150935530.28932@pglaf.org>
Message-ID: <41C07B99.10100@perathoner.de>

Michael Hart wrote:

> Please stop trying to rewrite history. . . .

Please stop kicking history in the teeth of people who don't care and 
just want to read a book.


> This should be kept as a straightforward example
> of what can and DOES happen with proprietary formats.

It should be kept, yes, but not in the main archive. Please, write a 
"Hall of Shame" page or something and link the files from there.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Dec 15 10:06:03 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Dec 15 10:06:09 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com>

You're missing the point I'm trying to make, Michael.

Keep the history.  No problem.  Just don't make it the DEFAULT that pops up when someone does a search.  Joe User couldn't care less about our history.  He just wants to read the book.  So, give him the book that he CAN read.

Put the "historical mistake" in the OLD subdirectory, where it is still available for those of us that care about such things.

Josh

PS I think the pop culture Simpsons reference flew right by you.  ;)


----- Original Message -----
From: "Michael Hart" <hart@pglaf.org>
To: "The gutvol-d Mailing List" <gutvol-d@lists.pglaf.org>
Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files]
Date: Wed, 15 Dec 2004 09:56:56 -0800 (PST)

> 
> 
> On Wed, 15 Dec 2004, Joshua Hutchinson wrote:
> 
> >
> > ----- Original Message -----
> > From: "Michael Hart" <hart@pglaf.org>
> >
> >>>
> >>> So any chance we can convert this file to a text file, make that the main
> >>> entry and move the .nfo file to the OLD subdirectory?
> >>>
> >>> Josh
> >>
> >> Please stop trying to rewrite history. . . .
> >>
> >> This should be kept as a straightforward example
> >> of what can and DOES happen with proprietary formats.
> >>
> >>
> >
> > Sorry.  Gotta call bullshit on this one.  Keeping the file in the OLD 
> > subdirectory maintains the history for those that wish to find it, while 
> > allowing better usability for those folks that simply want to read this 
> > particular work.
> 
> Barnyard epithets aside, this is too important to sweep under the carpet.
> 
> There is plenty of usability in other formats, so leave it be. . . .
> 
> 
> > How frustrated do you think people would be if they went to their local 
> > library, found a book in the catalog that they wanted, but the only place 
> > they are allowed to access the book is in a backroom that is pitch black. 
> > Yeah, they have the book, but it is completely useless to the reader.
> 
> That's the whole point. . .so don't hide it. . .MAKE the point, publicly.
> 
> 
> > So, yeah, we have the book in PG... but it is completely useless to you.
> 
> No. . .it's available in other formats. . .if you take a look.
> 
> 
> >
> > /nelson-voice-from-The-Simpsons
> >
> > HA HA
> >
> > /end-nelson-voice-from-The-Simpsons
> >
> > Josh
> >
> 
> Yes, you are correct, you are making a silly argument.
> 
> 
> The President of Folio came to visit us here,
> and promised the free Folio reader. . . .
> 
> Of course this is ancient history to you,
> but some of us remember, and do not want
> such an effort wiped out of our history.
> 
> It was a LOT of work. . . .
> 
> Leave it be. . . .
> 
> 
> Michael
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From prishan at bom3.vsnl.net.in  Wed Dec 15 09:53:45 2004
From: prishan at bom3.vsnl.net.in (avinash kothare)
Date: Wed Dec 15 10:06:24 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
References: <Pine.LNX.4.60.0412150935530.28932@pglaf.org>
Message-ID: <41C07A29.000001.55643@AVINASH>

 
-------Original Message-------
 
From: Michael S. Hart; Project Gutenberg Volunteer Discussion
Date: 12/15/04 23:07:36
To: Project Gutenberg Volunteer Discussion
Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files]
 
Please stop trying to rewrite history. . . .
 
This should be kept as a straightforward example
of what can and DOES happen with proprietary formats.
 
 
michael

Duh!

I have downloaded so many reading formats to make reading a pleasure for
the eyes.

At the final count, it all boils down to getting a good version of the text [I
shrivel from saying a 'perfect version'] and running a script that makes
it easier on your eyes.

Aesthetics dictate that all those beautiful images be included.

What else do the 99% of the whole wide world of readers <having been
fortunate enough to have an internet connection> need?

<Pleading forgiveness from the scholarly readers.>

> You probably don't remember the previous example of WordStar.

Beg your pardon Sir. :-)

Avinash.
From hart at pglaf.org  Wed Dec 15 10:12:43 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Dec 15 10:12:45 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com>
References: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0412151010230.28932@pglaf.org>


On Wed, 15 Dec 2004, Joshua Hutchinson wrote:

> You're missing the point I'm trying to make, Michael.
>
> Keep the history.  No problem.  Just don't make it the DEFAULT that pops up 
> when someone does a search.  Joe User couldn't care less about our history. 
> He just wants to read the book.  So, give him the book that he CAN read.

I don't see any messages that say the .nfo file is the default,
out of the dozen or so that I have here.

_I_ certainly haven't said it should be the default, just that it should
not be moved away from the main directories.


> Put the "historical mistake" in the OLD subdirectory, where it is still 
> available for those of us that care about such things.

It is our job to make sure people are aware of this.

They can't care if they are not aware of it.


> Josh

Michael


> PS I think the pop culture Simpsons reference flew right by you.  ;)

Doh!


>
>
> ----- Original Message -----
> From: "Michael Hart" <hart@pglaf.org>
> To: "The gutvol-d Mailing List" <gutvol-d@lists.pglaf.org>
> Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files]
> Date: Wed, 15 Dec 2004 09:56:56 -0800 (PST)
>
>>
>>
>> On Wed, 15 Dec 2004, Joshua Hutchinson wrote:
>>
>>>
>>> ----- Original Message -----
>>> From: "Michael Hart" <hart@pglaf.org>
>>>
>>>>>
>>>>> So any chance we can convert this file to a text file, make that the main
>>>>> entry and move the .nfo file to the OLD subdirectory?
>>>>>
>>>>> Josh
>>>>
>>>> Please stop trying to rewrite history. . . .
>>>>
>>>> This should be kept as a straightforward example
>>>> of what can and DOES happen with proprietary formats.
>>>>
>>>>
>>>
>>> Sorry.  Gotta call bullshit on this one.  Keeping the file in the OLD
>>> subdirectory maintains the history for those that wish to find it, while
>>> allowing better usability for those folks that simply want to read this
>>> particular work.
>>
>> Barnyard epithets aside, this is too important to sweep under the carpet.
>>
>> There is plenty of usability in other formats, so leave it be. . . .
>>
>>
>>> How frustrated do you think people would be if they went to their local
>>> library, found a book in the catalog that they wanted, but the only place
>>> they are allowed to access the book is in a backroom that is pitch black.
>>> Yeah, they have the book, but it is completely useless to the reader.
>>
>> That's the whole point. . .so don't hide it. . .MAKE the point, publicly.
>>
>>
>>> So, yeah, we have the book in PG... but it is completely useless to you.
>>
>> No. . .it's available in other formats. . .if you take a look.
>>
>>
>>>
>>> /nelson-voice-from-The-Simpsons
>>>
>>> HA HA
>>>
>>> /end-nelson-voice-from-The-Simpsons
>>>
>>> Josh
>>>
>>
>> Yes, you are correct, you are making a silly argument.
>>
>>
>> The President of Folio came to visit us here,
>> and promised the free Folio reader. . . .
>>
>> Of course this is ancient history to you,
>> but some of us remember, and do not want
>> such an effort wiped out of our history.
>>
>> It was a LOT of work. . . .
>>
>> Leave it be. . . .
>>
>>
>> Michael
>
From hart at pglaf.org  Wed Dec 15 10:14:01 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Dec 15 10:14:02 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41C07B99.10100@perathoner.de>
References: <20041215132357.EF0F04F441@ws6-5.us4.outblaze.com>
	<Pine.LNX.4.60.0412150935530.28932@pglaf.org>
	<41C07B99.10100@perathoner.de>
Message-ID: <Pine.LNX.4.60.0412151013140.28932@pglaf.org>


On Wed, 15 Dec 2004, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> Please stop trying to rewrite history. . . .
>
> Please stop kicking history in the teeth of people who don't care and just 
> want to read a book.

Not trying to make it the default, if that is now the issue.

>
>
>> This should be kept as a straightforward example
>> of what can and DOES happen with proprietary formats.
>
> It should be kept, yes, but not in the main archive. Please, write a "Hall of 
> Shame" page or something and link the files from there.

Sorry, it should not be relegated to museum status.

>
>
>
> -- 
> Marcello Perathoner
> webmaster@gutenberg.org
>
From jon at noring.name  Wed Dec 15 10:30:42 2004
From: jon at noring.name (Jon Noring)
Date: Wed Dec 15 10:31:27 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com>
References: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com>
Message-ID: <1941350328.20041215113042@noring.name>

Joshua wrote:

> You're missing the point I'm trying to make, Michael.
>
> Keep the history.  No problem.  Just don't make it the DEFAULT
> that pops up when someone does a search.  Joe User couldn't care
> less about our history.  He just wants to read the book.  So,
> give him the book that he CAN read.

Good point. I agree with this.


> Put the "historical mistake" in the OLD subdirectory, where it
> is still available for those of us that care about such things.

Now, to show support for Michael's reasoning, PG definitely needs to
make a strong point about the importance of using easy to repurpose
open standards for formatting etexts. But mixing obsolete proprietary
formats with usable formats actually works against making this point,
as Joshua notes. It also aggravates users who may want to read the
work, but can't (and thus they will develop a negative view towards
PG.)

I say move them to a special directory so they are *easier* to find,
and then create a web site describing why proprietary formats are bad
(especially those which are very difficult to repurpose even when the
format is published.) Provide links at this web site to those works
in the collection using proprietary formats. I guess one could call it
a "PG Hall of Shame" collection. <smile/>

Just a suggestion.

Jon

From scottsch at ncweb.com  Wed Dec 15 11:41:41 2004
From: scottsch at ncweb.com (Scott Schmucker)
Date: Wed Dec 15 11:42:06 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <1941350328.20041215113042@noring.name>
References: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com>
	<1941350328.20041215113042@noring.name>
Message-ID: <41C09375.4070801@ncweb.com>

Jon Noring wrote:

>
>Now, to show support for Michael's reasoning, PG definitely needs to
>make a strong point about the importance of using easy to repurpose
>open standards for formatting etexts. But mixing obsolete proprietary
>formats with usable formats actually works against making this point,
>as Joshua notes. It also aggravates users who may want to read the
>work, but can't (and thus they will develop a negative view towards
>PG.)
>
>I say move them to a special directory so they are *easier* to find,
>and then create a web site describing why proprietary formats are bad
>(especially those which are very difficult to repurpose even when the
>format is published.) Provide links at this web site to those works
>in the collection using proprietary formats. I guess one could call it
>a "PG Hall of Shame" collection. <smile/>
>  
>
I do support this suggestion.  The reasoning behind my support is:

Were I a random reader searching Project Gutenberg for a copy of Edward 
Gibbon's "History of the Decline and Fall of the Roman Empire" I would 
be met by a series of files available for download.  I find several 
textual files for each volume of the history, and one Folio formatted 
document (which does, for the record, appear first, perhaps because 
'Folio' is alphabetically prior to 'Volume').  I choose the first item 
on the list, and perhaps, if it has been somehow moved from the top of 
the list, I select that which does not specify a volume number, 
intending to locate the full set of volumes.  I then see the following 
comment which appears in the notes for this Folio-formatted document:

DO NOT DOWNLOAD !!! see #892 for HTML format, #733 for plain text.
The Folio format is obsolete. You won't be able to display the file.

Thank goodness that this comment is here, but I suggest that this does 
not have the effect that we intend, and that Michael very strongly 
supports.  As a random reader, I do not look at this and say "What a 
tragic result of proprietary e-book formats!"  Rather, the only thought 
that I can imagine is one of confusion.  "What a foolish thing for 
Project Gutenberg to have!" not, "What a foolish thing for anybody to do!"

I do support Jon's suggestion of creating a Project Gutenberg "Hall of 
Shame" of sorts, which provides the argument against proprietary e-book 
formats.  I suggest that the Folio-formatted e-books could be moved into 
this portion of the site.  Of course, as Michael has pointed out, the 
intention is not to hide the documents away.  With that intention in 
mind, it would not be unreasonable to leave the original entry within 
the database, but replace the above note "DO NOT DOWNLOAD, etc" with a 
more detailed reference to the aforementioned "Hall of Shame."  This 
provides Joe Reader with more of a justification for the document's 
presence, and possibly sends him away with a different perspective on 
proprietary e-book formats, which, after all, is the intention.

- Scott Schmucker

From maitriv at yahoo.com  Wed Dec 15 11:46:29 2004
From: maitriv at yahoo.com (maitri venkat-ramani)
Date: Wed Dec 15 11:46:36 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41C07B99.10100@perathoner.de>
Message-ID: <20041215194629.71470.qmail@web52308.mail.yahoo.com>


People, people,

From the standpoint of an archivist, I have to fall on the side of
keeping all file formats available and accessible to the reader.  PG is
as much a historical catalog as it is a library.  How hard is it to 
have the search page list ALL of the files we have in our collection
under that name (pssst ... much like we have now?).  Leave it to the
discretion of the user to download the format they wish to.

If the Folio files aren't listed along with the others, how will our
readers know they are there?  Some solutions:

1.  List all of the files in PG per book, along with a legend that
explains to the user what the formats mean.

2.  Put up an info page that lets users know ALL file formats we carry.
 I'm happy to help put this together if appropriate.

There's nothing wrong in keeping archive formats around.  

As Chief Wiggum says, "I hope this has taught you kids a lesson: kids
never learn."

Maitri


--- Marcello Perathoner <marcello@perathoner.de> wrote:

> Michael Hart wrote:
> 
> > Please stop trying to rewrite history. . . .
> 
> Please stop kicking history in the teeth of people who don't care and
> just want to read a book.
> 
> 
> > This should be kept as a straightforward example
> > of what can and DOES happen with proprietary formats.
> 
> It should be kept, yes, but not in the main archive. Please, write a 
> "Hall of Shame" page or something and link the files from there.
> 
> 
> 
> -- 
> Marcello Perathoner
> webmaster@gutenberg.org


From bill at truthdb.org  Wed Dec 15 12:14:07 2004
From: bill at truthdb.org (bill jenness)
Date: Wed Dec 15 12:14:17 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41C09375.4070801@ncweb.com>
References: <41C09375.4070801@ncweb.com>
Message-ID: <32842.134.117.137.162.1103141647.squirrel@134.117.137.162>

...
> Were I a random reader searching Project Gutenberg for a copy of Edward
> Gibbon's "History of the Decline and Fall of the Roman Empire" I would
> be met by a series of files available for download.  I find several
> textual files for each volume of the history, and one Folio formatted
> document (which does, for the record, appear first, perhaps because
> 'Folio' is alphabetically prior to 'Volume').  I choose the first item
> on the list, and perhaps, if it has been somehow moved from the top of
...
> - Scott Schmucker
>
>
>
If this is the case, perhaps it would be easier to add XTINCT prior to
Folio in the title, thus commenting on proprietary formats and changing
the sort order although dropping the E from extinct might look funny....
From jmdyck at ibiblio.org  Wed Dec 15 12:55:34 2004
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed Dec 15 12:56:00 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
References: <20041215180603.CC4764F4BA@ws6-5.us4.outblaze.com>
	<1941350328.20041215113042@noring.name> <41C09375.4070801@ncweb.com>
Message-ID: <41C0A4C6.C66EEE2A@ibiblio.org>

Scott Schmucker wrote:
> 
> I then see the following
> comment which appears in the notes for this Folio-formatted document:
> 
> DO NOT DOWNLOAD !!! see #892 for HTML format, #733 for plain text.
> The Folio format is obsolete. You won't be able to display the file.

Etexts #892 and #733 are each "Decline & Fall" Volume 3.
Why point to volume 3?

-Michael
From hart at pglaf.org  Wed Dec 15 07:34:11 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Dec 15 18:43:25 2004
Subject: [gutvol-d] re: [BP] Google Partners with Oxford,
 Harvard & Others to Digitize Libraries
In-Reply-To: <20041214234529.2976.qmail@web60701.mail.yahoo.com>
References: <20041214234529.2976.qmail@web60701.mail.yahoo.com>
Message-ID: <Pine.LNX.4.60.0412150729001.28932@pglaf.org>


On Tue, 14 Dec 2004, Tony Kline wrote:

>
>
> Bowerbird@aol.com wrote:
>
> tony said:
>>> That's very good, though image files hardly meet the needs of those users
>>> who want digital text and the ability to download, cut and paste etc
>
>> well, since google _is_ a search engine, they'll obviously o.c.r. the text.
>> and clean up the text, because errors would muck up their search engine.
>
> Did they say OCR or did you deduce that? I got the impression they are
> imaging pages, and maybe adding some identifying keywords for each page.
> That is you'll be able to Google to a title chapter and page maybe, but
> you won't be able to Google within pages. Try OCR'ing some of the stuff
> in the Bodleian...there ain't no such fonts!! Does anyone know what
> they mean by digitizing?

Here's what I have gleaned from 5 TV network news shows and the various
NYT, SF Chron, etc., articles:

There will be one "full text" repository at Google,
but users won't be able to access more than a "snippet"
around any quotation they look up, much as with general
Google searches today, and then, if they want more, they
will have to click on the item and will then arrive at
a second database, this one provided by one of the five
libraries [NYCPL, Harvard, Michigan, Stanford, Oxford]
where they will get a graphical representation of the
non-printable page that contains the quotation.

Why they chose to call it "Google Print" when printing
is outlawed, I have no idea.


Michael


From hart at pglaf.org  Wed Dec 15 07:48:59 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Dec 15 18:43:26 2004
Subject: [gutvol-d] Re: [ebook-community] Google Question for Michael Hart
In-Reply-To: <000001c4e263$6e637cf0$0200a8c0@BABA>
References: <000001c4e263$6e637cf0$0200a8c0@BABA>
Message-ID: <Pine.LNX.4.60.0412150739110.28932@pglaf.org>


On Tue, 14 Dec 2004, Roy Lewis wrote:

>
> What is Michael Hart's take on the latest from Google?  I wonder how
> this will impact Project Gutenberg?  Will this do what you have been
> trying to do but with a LOTS of MONEY behind it?
>
> Roy Lewis
> Garland, TX

Today is the day I have to write and send the Project Gutenberg
Weekly Newsletter, and I have only 2:20 to the deadline, so I hope
you will allow me to come back to answer that in a bit, but if you
have a specific question I hope I can answer right away.

However, I think you will find that these billion dollar giants
don't actually have anything in mind that would be definable as
an eBook as the term has been used. . .i.e. you can't keep
it, you can't print it, you can't cut and paste quotations that
are more than a "snippet," as they call it, you can't make your
own concordance, index, edition, or carry a million dollars of
retail value books with you on a DVD or two.

In addition, I guarantee that Project Gutenberg will be the first
to offer such a "Million Dollar DVD" of eBooks, and will be the
first to present a collection of 50,000 eBooks, and, most likely
will be the first to offer 100,000 eBooks for any kind of service,
but certainly for free download.

As for getting into the millions, I'm going to wait until we're
approaching 100,000 to focus all that tightly on getting into
7 figures of eBooks.

BTW, they said 15 million eBooks. . .and I'm not sure they HAVE
15 million eBooks that they can legally use in the worldwide
service they announced yesterday.

I'd certainly be willing to bet dinner on it!

More later,


Thanks!!!


So Nice To Hear From You!

Happy Holidays!!!


Michael


Give FreeBooks!!!
In 39 Languages!!!

As of December 12, 2004
~14,683 FreeBooks at:
~317 to go to 15,000
http://www.gutenberg.org
http://www.gutenberg.net

We are ~96% of the way
from 10,000 to 15,000.

Now even more PG eBooks
In 104 Languages!!!
http://gutenberg.cc
http://gutenberg.us

Michael S. Hart
<hart@pobox.com>
Project Gutenberg
Executive Coordinator
"*Internet User ~#100*"

If you do not receive
a prompt reply, please
resend, keep resending.

From george at pglaf.org  Wed Dec 15 11:07:08 2004
From: george at pglaf.org (George Davis)
Date: Wed Dec 15 18:43:27 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <Pine.LNX.4.60.0412151104550.4916@pglaf.org>

Apologies for coming in late here, but it should be noted that the following
entry was added to the GUTINDEX in April, 2003:

May 1997 Decline/Fall Of The Roman Empire, by Gibbon, Folio[dfre310f.xxx]  900
   (NOTE:  in proprietary Folio .nfo format; Vol. 3 only.)
   (See also:  #890-895 for HTML format, #731-736 for plain text.)

Also the following notes, especially for #892:

Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V6 htm[dfre6xxh.xxx]  895
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V5 htm[dfre5xxh.xxx]  894
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V4 htm[dfre4xxh.xxx]  893

Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V3 htm[dfre3xxx.xxx]  892
[This vol only also available as plain text in dfre3xx.txt/.zip]
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V2 htm[dfre2xxh.xxx]  891
Apr 1997 Decline/Fall Of The Roman Empire, by Gibbon V1 htm[dfre1xxh.xxx]  890
[Author:  Edward Gibbon]
(Note:  The above 6 files are HTML conversions of ebook #'s 731-736)

Occasionally, one may find other tidbits of useful information inside
GUTINDEX.ALL, especially for some of the more esoteric items.  It is a holding
place for such info until such time as something better comes along.

For example, the above may or may not be useful when updating a bibrec in the
future.

And as I haven't expressed an opinion lately, herewith is mine for #900:

#900 should be moved to /9/0/900/old/, and a 900-readme.txt or dfre310f-
readme.txt should be placed in /9/0/900/ explaining the situation, including
the reasons it has not been discarded; it _is_ a part of PG history, as
documented in the newsletters over the years, and prior to that, in the various
maillists.
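George's proposed location implies a directory tree built from the leading digits of the etext number (#900 lives under /9/0/900/). Below is a minimal sketch of that mapping, inferred only from this one example; the helper name `etext_dir` is hypothetical, and the rule for one-digit numbers is deliberately left out because it can't be deduced from here.

```python
def etext_dir(number: int, old: bool = False) -> str:
    """Return the archive directory for an etext number.

    Assumed rule (inferred from /9/0/900/, not from any PG spec):
    one directory level per digit except the last, then a directory
    named after the full number. One-digit numbers are not handled.
    """
    digits = str(number)
    if len(digits) < 2:
        raise ValueError("one-digit numbers are not covered by this sketch")
    parts = list(digits[:-1]) + [digits]
    path = "/" + "/".join(parts) + "/"
    # The proposed "old" subdirectory sits inside the etext's own directory.
    return path + "old/" if old else path


print(etext_dir(900, old=True))  # /9/0/900/old/
```

Under this assumption the readme George mentions would go in `etext_dir(900)`, and the Folio file itself in `etext_dir(900, old=True)`.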

And besides, it makes for lively discussions every couple of years.

If no one else wants to, I'll write a brief (less than 10K words?) readme when
the time comes to "update" this posting.

FWIW,

[<G>eorge]
From hacker at gnu-designs.com  Wed Dec 15 19:56:57 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Dec 15 19:57:26 2004
Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others
	to Digitize Libraries
In-Reply-To: <Pine.LNX.4.60.0412150729001.28932@pglaf.org>
References: <20041214234529.2976.qmail@web60701.mail.yahoo.com>
	<Pine.LNX.4.60.0412150729001.28932@pglaf.org>
Message-ID: <Pine.LNX.4.58.0412152256320.17014@aphrodite.gnu-designs.com>


> > Did they say OCR or did you deduce that? I got the impression they
> > are imaging pages, and maybe adding some identifying keywords for
> > each page. That is you'll be able to Google to a title chapter and
> > page maybe, but you won't be able to Google within pages. Try
> > OCR'ing some of the stuff in the Bodleian...there ain't no such
> > fonts!! Does anyone know what they mean by digitizing?

	I don't think so. Have you seen catalog.google.com?

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From shalesller at writeme.com  Wed Dec 15 20:18:06 2004
From: shalesller at writeme.com (D. Starner)
Date: Wed Dec 15 20:18:22 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041216041806.956774BDAB@ws1-1.us4.outblaze.com>

> From the standpoint of an archivist, I have to fall on the side of
> keeping all file formats available and accessible to the reader.  PG is
> as much a historical catalog as it is a library.  

Since when? Why? 

If we want to teach people about the death of old
formats, maybe we should have a page about old formats, and how WordStar
and Folio and other formats were da bomb, and how it's hard to find anything
that can read them now. If they come across them in a search, how will they
even know that it's an old format nobody can read? For all I would have 
known before this discussion, you could run out and buy an ebook reader that
takes Folio, or download a program to read them. 

Remember that it's not just proprietary formats that die; I seem to remember 
code to read WordStar files in one of my old programming books, and there's 
a bunch of open source programs where you'd have to go through old CDs to 
find a version of the program that could read your files. 

From j.hagerson at comcast.net  Wed Dec 15 20:23:57 2004
From: j.hagerson at comcast.net (John Hagerson)
Date: Wed Dec 15 20:24:21 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041216041806.956774BDAB@ws1-1.us4.outblaze.com>
Message-ID: <000c01c4e327$12d3da00$6401a8c0@sarek>

What is the purpose of PG? Is it to be a white-haired old man, with a
scraggy beard, carrying a sign on the beach that says "Proprietary e-book
formats may die!" or is it to provide a repository of information that is
useful today and into the future?

I wholeheartedly second the notion of moving obsolete formats into a "hall
of shame."


From jon at noring.name  Wed Dec 15 20:39:35 2004
From: jon at noring.name (Jon Noring)
Date: Wed Dec 15 20:40:01 2004
Subject: [gutvol-d] re: [BP] Google Partners with Oxford,
	Harvard & Others to Digitize Libraries
In-Reply-To: <Pine.LNX.4.60.0412150729001.28932@pglaf.org>
References: <20041214234529.2976.qmail@web60701.mail.yahoo.com>
	<Pine.LNX.4.60.0412150729001.28932@pglaf.org>
Message-ID: <19677883578.20041215213935@noring.name>

Tony Kline wrote:
> Bowerbird@aol.com wrote:

>> well, since google _is_ a search engine, they'll obviously o.c.r. the text.
>> and clean up the text, because errors would muck up their search engine.

> Did they say OCR or did you deduce that? I got the impression they are
> imaging pages, and maybe adding some identifying keywords for each page.
> That is you'll be able to Google to a title chapter and page maybe, but
> you won't be able to Google within pages. Try OCR'ing some of the stuff
> in the Bodleian...there ain't no such fonts!! Does anyone know what
> they mean by digitizing?

My understanding, which may be wrong, is that Google will OCR the
page scans, but do only cursory machine cleanup of the raw unstructured
text that results (which I call "raw digital text" or RDT), and use the
still-error-laden RDT in their search system to pull up the page scans
(or simply to refer to book title and page number.)

[Obviously, RDT will have numerous scanning errors, and those who are
familiar with the output of OCR engines know that RDT is overall
one big ball of wax. Certainly Google can write some advanced program
to try to clean up the more obvious scanning errors in the RDT, but it
will correct only some of them; still, the result is probably good
enough for search purposes. I rather doubt they will do any human
proofing (it is way too expensive, and anyway, it's better to turn the
public domain stuff over to Distributed Proofreaders who will do it
*for free* via enthusiastic volunteer power. Any corporate entity that
does not take advantage of free human labor to further their business
is not serving their stockholders!)]

Interestingly, this is what the University of Michigan (one of the
Google partners I believe) did in their "Making of America" collection,
which has been around for a few years now. See:

   http://www.hti.umich.edu/m/moagrp/

MoA scanned the books, placed the scanned page images online (they
are freely available -- it's a cool collection that, strangely, hardly
anyone has heard of), and built a search engine to search the
resulting RDT from OCR. Then one by one they have been converting the
RDT from selected books to highly-proofed SDT (structured digital text)
using human proofers and TEI (I think) for structuring. So, the scans
came first, and then the cleanup was (and is being) done at a later
time.
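To make the "search the RDT, return the page scan" idea concrete, here is a toy inverted index that maps words from uncorrected OCR text to (title, page) pairs, so a hit yields a page reference rather than clean text. It illustrates the general technique only; it is not Google's or MoA's system, and the sample OCR strings are invented.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of (title, page) pairs containing it.

    `pages` is an iterable of (title, page_number, raw_ocr_text) tuples.
    OCR errors are indexed as-is, so a misread word simply fails to match.
    """
    index = defaultdict(set)
    for title, page, text in pages:
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add((title, page))
    return index


# Invented raw OCR output; "Ernpire" is a typical rn/m misread.
pages = [
    ("Decline and Fall", 12, "The Roman Ernpire in the age of the Antonines"),
    ("Decline and Fall", 13, "the empire was defended by the legions"),
]
index = build_index(pages)
print(sorted(index["empire"]))     # [('Decline and Fall', 13)]
print(sorted(index["antonines"]))  # [('Decline and Fall', 12)]
```

The misread on page 12 shows why uncleaned RDT is "good enough" for search but not for reading: most words still hit, a few silently miss.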

It's entirely possible that Google will give, upon request, the page
scans for any public domain books they've scanned to established
groups like Distributed Proofreaders for conversion into proofed SDT,
so long as Google gets a copy of the resulting high-quality SDT. I
hope they will do this. If not, it will be disappointing -- but at
least we have the Internet Archive who will make all their scanned
books available to the world. They may end up with over one million
books, enough to feed Distributed Proofreaders for quite a while.

Jon Noring

From Bowerbird at aol.com  Wed Dec 15 22:50:08 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Dec 15 22:50:29 2004
Subject: [gutvol-d] excuse me
Message-ID: <1d8.32b83f3f.2ef28a20@aol.com>


jon noring, please stop sending your replies
to my bookpeople posts to other listserves,
including one from which you've banned me.

thank you.

-bowerbird
From hart at pglaf.org  Thu Dec 16 06:51:01 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 06:51:03 2004
Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others
	to Digitize Libraries
In-Reply-To: <19677883578.20041215213935@noring.name>
References: <20041214234529.2976.qmail@web60701.mail.yahoo.com>
	<Pine.LNX.4.60.0412150729001.28932@pglaf.org>
	<19677883578.20041215213935@noring.name>
Message-ID: <Pine.LNX.4.60.0412160650120.1015@pglaf.org>


From what I understand, Google and the five libraries are going to do some
serious DRM on all their sites and these scans and files will NOT go out.

mh


On Wed, 15 Dec 2004, Jon Noring wrote:

> Tony Kline wrote:
>> Bowerbird@aol.com wrote:
>
>>> well, since google _is_ a search engine, they'll obviously o.c.r. the text.
>>> and clean up the text, because errors would muck up their search engine.
>
>> Did they say OCR or did you deduce that? I got the impression they are
>> imaging pages, and maybe adding some identifying keywords for each page.
>> That is you'll be able to Google to a title chapter and page maybe, but
>> you won't be able to Google within pages. Try OCR'ing some of the stuff
>> in the Bodleian...there ain't no such fonts!! Does anyone know what
>> they mean by digitizing?
>
> My understanding, which may be wrong, is that Google will OCR the
> page scans, but do only cursory machine cleanup of the raw unstructured
> text that results (which I call "raw digital text" or RDT), and use the
> still-error-laden RDT in their search system to pull up the page scans
> (or simply to refer to book title and page number.)
>
> [Obviously, RDT will have numerous scanning errors, and those who are
> familiar with the output of OCR engines know that RDT is overall
> one big ball of wax. Certainly Google can write some advanced program
> to try to clean up the more obvious scanning errors in the RDT, but it
> will correct only some of them; still, the result is probably good
> enough for search purposes. I rather doubt they will do any human
> proofing (it is way too expensive, and anyway, it's better to turn the
> public domain stuff over to Distributed Proofreaders who will do it
> *for free* via enthusiastic volunteer power. Any corporate entity that
> does not take advantage of free human labor to further their business
> is not serving their stockholders!)]
>
> Interestingly, this is what the University of Michigan (one of the
> Google partners I believe) did in their "Making of America" collection,
> which has been around for a few years now. See:
>
>   http://www.hti.umich.edu/m/moagrp/
>
> MoA scanned the books, placed the scanned page images online (they
> are freely available -- it's a cool collection that, strangely, hardly
> anyone has heard of), and built a search engine to search the
> resulting RDT from OCR. Then one by one they have been converting the
> RDT from selected books to highly-proofed SDT (structured digital text)
> using human proofers and TEI (I think) for structuring. So, the scans
> came first, and then the cleanup was (and is being) done at a later
> time.
>
> It's entirely possible that Google will give, upon request, the page
> scans for any public domain books they've scanned to established
> groups like Distributed Proofreaders for conversion into proofed SDT,
> so long as Google gets a copy of the resulting high-quality SDT. I
> hope they will do this. If not, it will be disappointing -- but at
> least we have the Internet Archive who will make all their scanned
> books available to the world. They may end up with over one million
> books, enough to feed Distributed Proofreaders for quite a while.
>
> Jon Noring
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From hart at pglaf.org  Thu Dec 16 07:05:54 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 07:05:55 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <000c01c4e327$12d3da00$6401a8c0@sarek>
References: <000c01c4e327$12d3da00$6401a8c0@sarek>
Message-ID: <Pine.LNX.4.60.0412160700060.1015@pglaf.org>


On Wed, 15 Dec 2004, John Hagerson wrote:

> What is the purpose of PG? Is it to be a white-haired old man, with a
> scraggy beard, carrying a sign on the beach that says "Proprietary e-book
> formats may die!" or is it to provide a repository of information that is
> useful today and into the future?

Riiight. . .one book out of an entire library.

Just put the appropriate note in it and move on. . . .

[I think the current note could be improved. . .I don't know who wrote it,
but it wasn't much of an issue then, and shouldn't be now.  Tempest=Teapot]

As long as people are proposing .pdf for all eBooks, which they are,
this is a serious issue; and even if, hopefully, it someday is not,
it is not something that should be forgotten.

Such as when WordStar went to court to claim copyright on ALL documents
stored in WordStar format, and wanted royalties every time you used the
documents you wrote yourself.

Think this is silly?

It was a HUGE case!

Now swept under the carpet.

[not to mention the people who tried to copyright the human genome,
or who DID patent one person's genome. . .Mr. Moore, who was immune
to a form of cancer.]

This is a non-issue. . .no one else is ever going to notice as much
as this week.

From nihil_obstat at mindspring.com  Thu Dec 16 07:09:04 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Thu Dec 16 07:09:09 2004
Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard &
	Others to Digitize Libraries
Message-ID: <28363984.1103209744189.JavaMail.root@wamui02.slb.atl.earthlink.net>


"Non-printable" page?

If you can display it on a screen, it should not be too difficult to capture the image.

Can the "Print Screen" capture method be disabled? (It copies the screen's visual display to the clipboard, at least on MS-Windows; I presume there is something similar for Linux and Mac.)

Or will they try to figure out a way to keep that captured image from being fed to an OCR program (or to render it unreadable by one)?

Time will tell, but my guess is that these page images will one way or another become a source of material for future PG volunteers.

-----Original Message-----
From: Michael Hart <hart@pglaf.org>
Sent: Dec 15, 2004 10:34 AM
To: Book People <spok+bookpeople@cs.cmu.edu>
Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others to Digitize Libraries

> if they want more, they
> will have to click on the item and will then arrive at
> a second database, this one provided by one of the five
> libraries [NYCPL, Harvard, Michigan, Stanford, Oxford]
> where they will get a graphical representation of the
> non-printable page that contains the quotation.


---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From hart at pglaf.org  Thu Dec 16 07:09:51 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 07:09:53 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041216041806.956774BDAB@ws1-1.us4.outblaze.com>
References: <20041216041806.956774BDAB@ws1-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0412160706270.1015@pglaf.org>


On the one hand people complain that eBooks in general will never last,
simply because those big gov't databases were kept in formats no one
can read today. . .on the other hand you don't want this to be mentioned
up front. . . .

None of the people arguing this case were there when we met with the
President of Folio, none of them were part of doing Gibbon's "Roman Empire"
. . .so please just leave it be.

Some day, when you are all gone, perhaps someone else will sweep your
efforts under the carpet. . .and Google will go down as the inventor
of eBooks and the first eBook library.


On Wed, 15 Dec 2004, D. Starner wrote:

>> From the standpoint of an archivist, I have to fall on the side of
>> keeping all file formats available and accessible to the reader.  PG is
>> as much a historical catalog as it is a library.
>
> Since when? Why?
>
> If we want to teach people about the death of old
> formats, maybe we should have a page about old formats, and how WordStar
> and Folio and other formats were da bomb, and how it's hard to find anything
> that can read them now. If they come across them in a search, how will they
> even know that it's an old format nobody can read? For all I would have
> known before this discussion, you could run out and buy an ebook reader that
> takes Folio, or download a program to read them.
>
> Remember that it's not just proprietary formats that die; I seem to remember
> code to read WordStar files in one of my old programming books, and there's
> a bunch of open source programs where you'd have to go through old CDs to
> find a version of the program that could read your files.
>
From joshua at hutchinson.net  Thu Dec 16 07:20:52 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec 16 07:20:55 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>

You know, it's like you're deliberately trying to make me angry!

NO ONE HAS SUGGESTED SWEEPING IT AWAY!

In fact, every person that has suggested a change of some kind has advocated putting the obsolete format document somewhere accessible.  Just not right out in front where an uninformed visitor will see it, click it and get frustrated.  It reflects poorly on PG as a whole and turns off potential users from ever coming back.

Move the bloody thing into the OLD subdirectory.  That's what the OLD subdirectory is for.  Use it as such.

Is the text version we have the exact same document as the folio version, or were they created from separate sources?  If it is the same, we should move the folio into the text's etext number and free up a number.  If they are from separate sources, can anyone somehow generate a text file from the Folio file we have?

Josh

----- Original Message -----
From: "Michael Hart" <hart@pglaf.org>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files]
Date: Thu, 16 Dec 2004 07:09:51 -0800 (PST)

> 
> 
> On the one hand people complain that eBooks in general will never last,
> simply because those big gov't databases were kept in formats no one
> can read today. . .on the other hand you don't want this to be mentioned
> up front. . . .
> 
> None of the people arguing this case were there when we met with the
> President of Folio, none of them were part of doing Gibbon's "Roman Empire"
> . . .so please just leave it be.
> 
> Some day, when you are all gone, perhaps someone else will sweep your
> efforts under the carpet. . .and Google will go down as the inventor
> of eBooks and the first eBook library.
> 
> 
> 
> On Wed, 15 Dec 2004, D. Starner wrote:
> 
> >> From the standpoint of an archivist, I have to fall on the side of
> >> keeping all file formats available and accessible to the reader.  PG is
> >> as much a historical catalog as it is a library.
> >
> > Since when? Why?
> >
> > If we want to teach people about the death of old
> > formats, maybe we should have a page about old formats, and how WordStar
> > and Folio and other formats were da bomb, and how it's hard to find anything
> > that can read them now. If they come across them in a search, how will they
> > even know that it's an old format nobody can read? For all I would have
> > known before this discussion, you could run out and buy an ebook reader that
> > takes Folio, or download a program to read them.
> >
> > Remember that it's not just proprietary formats that die; I seem to remember
> > code to read WordStar files in one of my old programming books, and there's
> > a bunch of open source programs where you'd have to go through old CDs to
> > find a version of the program that could read your files.

From jonathan.gorman at gmail.com  Thu Dec 16 07:36:13 2004
From: jonathan.gorman at gmail.com (Jon Gorman)
Date: Thu Dec 16 07:36:18 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
Message-ID: <4a6dc7604121607365bb13e91@mail.gmail.com>

On Thu, 16 Dec 2004 10:20:52 -0500, Joshua Hutchinson
<joshua@hutchinson.net> wrote:
> You know, it's like you're deliberately trying to make me angry!
> 
> NO ONE HAS SUGGESTED SWEEPING IT AWAY!
> 
> In fact, every person that has suggested a change of some kind has advocated
> putting the obsolete format document somewhere accessible.  Just not right
> out in front where an uninformed visitor will see it, click it and get
> frustrated.  It reflects poorly on PG as a whole and turns off potential
> users from ever coming back.

Given my rather infrequent posting to this list (although long time
lurking from a variety of email addresses) I'm rather hesitant to
throw more fuel on the fire.  But I have to agree with the idea behind
Joshua Hutchinson and Jon Noring's suggestions.  The folio is
confusing when it is the first return result, and people do have a
tendency to hit the first result.

I believe Greenstone (the new software behind the scenes at
gutenberg.org) allows pretty precise sorting of returns on various
conditions.  Would it be possible to always return the text format as
the first return?  This would help highlight the importance of the
text format without having to decide when a format is outdated or
unsupported, needs to be moved to the suggested "old" directory, or a
"stupid, stupid formats" page.

I know my first thought when seeing the folio was to think "That's got
to be an error, who would be crazy enough to publish that as a folio".
But my excuse was that it was a long day ;).

Jon Gorman
From hart at pglaf.org  Thu Dec 16 07:58:26 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 07:58:27 2004
Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others
	to Digitize Libraries
In-Reply-To: <28363984.1103209744189.JavaMail.root@wamui02.slb.atl.earthlink.net>
References: <28363984.1103209744189.JavaMail.root@wamui02.slb.atl.earthlink.net>
Message-ID: <Pine.LNX.4.60.0412160756080.1015@pglaf.org>


On Thu, 16 Dec 2004, Dennis McCarthy wrote:

>
> "Non-printable" page?
>
> If you can display it on a screen, it should not be too difficult to capture 
> the image.
>
> Can the "Print Screen" capture method be disabled? (Copies the screen's 
> visual display to the clipboard, at least on MS-Windows--presume there is 
> something similar for Linux and Mac.)
>
> Or will they try to figure out a way to keep that captured image from being 
> fed to (or rendered unreadable) an OCR program?

As I always predict, with every generation of DRM,
some 14 year old will figure out a way immediately,
before they have even finished their initial tests
of the Google Print project.

Interesting, tho, that they called it Google PRINT
when PRINTING is exactly what you can NOT do. . . .

I wonder if they plan to charge for printing?

mh
From hart at pglaf.org  Thu Dec 16 08:04:49 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 08:04:51 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <4a6dc7604121607365bb13e91@mail.gmail.com>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<4a6dc7604121607365bb13e91@mail.gmail.com>
Message-ID: <Pine.LNX.4.60.0412160802140.1015@pglaf.org>


On Thu, 16 Dec 2004, Jon Gorman wrote:

> On Thu, 16 Dec 2004 10:20:52 -0500, Joshua Hutchinson
> <joshua@hutchinson.net> wrote:
>> You know, it's like you're deliberately trying to make me angry!
>>
>> NO ONE HAS SUGGESTED SWEEPING IT AWAY!
>>
>> In fact, every person that has suggested a change of some kind has advocated
>> putting the obsolete format document somewhere accessible.  Just not right
>> out in front where an uninformed visitor will see it, click it and get
>> frustrated.  It reflects poorly on PG as a whole and turns off potential
>> users from ever coming back.
>
> Given my rather infrequent posting to this list (although long time
> lurking from a variety of email addresses) I'm rather hesitant to
> throw more fuel on the fire.  But I have to agree with the idea behind
> Joshua Hutchinson and Jon Noring's suggestions.  The folio is
> confusing when it is the first return result, and people do have a
> tendency to hit the first result.

No one is suggesting it should be the first result.


> I believe Greenstone (the new software behind the scenes at
> gutenberg.org) allows pretty precise sorting of returns on various
> conditions.  Would it be possible to always return the text format as
> the first return?  This would help highlight the importance of the
> text format without having to decide when a format is outdated or
> unsupported, needs to be moved to the suggested "old" directory, or a
> "stupid, stupid formats" page.

However, this sort of sweeping out of sight is not acceptable.

Try again when those of us who spent all the effort on this Folio
project are dead, eh?


> I know my first thought when seeing the folio was to think "That's got
> to be an error, who would be crazy enough to publish that as a folio".

That's the whole point. . .let us make that point.
From hacker at gnu-designs.com  Thu Dec 16 08:08:13 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Thu Dec 16 08:08:42 2004
Subject: [gutvol-d] re: [BP] Google Partners with Oxford, Harvard & Others
	to Digitize Libraries
In-Reply-To: <Pine.LNX.4.60.0412160756080.1015@pglaf.org>
References: <28363984.1103209744189.JavaMail.root@wamui02.slb.atl.earthlink.net>
	<Pine.LNX.4.60.0412160756080.1015@pglaf.org>
Message-ID: <Pine.LNX.4.58.0412161106400.18429@aphrodite.gnu-designs.com>


> Interesting, tho, that they called it Google PRINT when PRINTING is
> exactly what you can NOT do. . . .

	Noun vs. verb. It's "Print"(ed) media, but you cannot "print"
it on your normal printer for personal or commercial use.

	The same sort of confusion surrounds "DRM", which has
absolutely nothing to do with "Rights" at all.

> I wonder if they plan to charge for printing?

	If they do, this means they have 100% rights to do so, from
the copyright holder(s), assuming the copyright is still in effect.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From hart at pglaf.org  Thu Dec 16 08:10:26 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 08:10:27 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0412160805330.1015@pglaf.org>


On Thu, 16 Dec 2004, Joshua Hutchinson wrote:

> You know, it's like you're deliberately trying to make me angry!

Sweeping it under the carpet is exactly what you are promoting here.


>
> NO ONE HAS SUGGESTED SWEEPING IT AWAY!

Again:
Sweeping it under the carpet is exactly what you are promoting here.


>
> In fact, every person that has suggested a change of some kind has advocated 
> putting the obsolete format document somewhere accessible.  Just not right 
> out in front where an uninformed visitor will see it, click it and get 
> frustrated.  It reflects poorly on PG as a whole and turns off potential 
> users from ever coming back.

If this were the case, lots of people would have complained by now.

You are insiders. . .you have a distinctly different viewpoint.

>
> Move the bloody thing into the OLD subdirectory.  That's what the OLD 
> subdirectory is for.  Use it as such.

Again:
Sweeping it under the carpet is exactly what you are promoting here.


> Is the text version we have the exact same document as the folio version or 
> were they created from separate sources?  If it is the same, we should move 
> the folio into the text's etext number and free up a number.  If they are 
> from separate sources, can anyone somehow generate a text file from the Folio 
> file we have?

This is exactly the reason for having a separate number,
so people will NOT get the .nfo format unless they want it.

BTW, you can still get the Folio reader with the TIME Magazine CDs
which sell for $1.

> Josh
>
> ----- Original Message -----
> From: "Michael Hart" <hart@pglaf.org>
> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
> Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files]
> Date: Thu, 16 Dec 2004 07:09:51 -0800 (PST)
>
>>
>>
>> On the one hand people complain that eBooks in general will never last,
>> simply because those big gov't databases were kept in formats no one
>> can read today. . .on the other hand you don't want this to be mentioned
>> up front. . . .
>>
>> None of the people arguing this case were there when we met with the
>> President of Folio, none of them were part of doing Gibbon's "Roman Empire"
>> . . .so please just leave it be.
>>
>> Some day, when you are all gone, perhaps someone else will sweep your
>> efforts under the carpet. . .and Google will go down as the inventor
>> of eBooks and the first eBook library.
>>
>>
>>
>> On Wed, 15 Dec 2004, D. Starner wrote:
>>
>>>> From the standpoint of an archivist, I have to fall on the side of
>>>> keeping all file formats available and accessible to the reader.  PG is
>>>> as much a historical catalog as it is a library.
>>>
>>> Since when? Why?
>>>
>>> If we want to teach people about the death of old
>>> formats, maybe we should have a page about old formats, and how WordStar
>>> and Folio and other formats were da bomb, and how it's hard to find anything
>>> that can read them now. If they come across them in a search, how will they
>>> even know that it's an old format nobody can read? For all I would have
>>> known before this discussion, you could run out and buy an ebook reader that
>>> takes Folio, or download a program to read them.
>>>
>>> Remember that it's not just proprietary formats that die; I seem to remember
>>> code to read WordStar files in one of my old programming books, and there's
>>> a bunch of open source programs where you'd have to go through old CDs to
>>> find a version of the program that could read your files.
From marcello at perathoner.de  Thu Dec 16 08:21:46 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Dec 16 08:21:55 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <4a6dc7604121607365bb13e91@mail.gmail.com>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<4a6dc7604121607365bb13e91@mail.gmail.com>
Message-ID: <41C1B61A.5010704@perathoner.de>

Jon Gorman wrote:

> I believe Greenstone (the new software behind the scenes at
> gutenberg.org) allows pretty precise sorting of returns on various
> conditions.  

What? Who installed Greenstone without my noticing it?


> Would it be possible to always return the text format as
> the first return?

How can the software know that #900, #733 and #892 are the same book?
(If they are indeed the same, which I cannot establish, lacking a Folio 
viewer.)

The Right Thing to do is to reindex all formats (TXT, HTML, Folio) under 
one etext number. Then the software would sort it in a sensible way.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jonathan.gorman at gmail.com  Thu Dec 16 08:22:10 2004
From: jonathan.gorman at gmail.com (Jon Gorman)
Date: Thu Dec 16 08:22:15 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412160802140.1015@pglaf.org>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<4a6dc7604121607365bb13e91@mail.gmail.com>
	<Pine.LNX.4.60.0412160802140.1015@pglaf.org>
Message-ID: <4a6dc7604121608222f463d12@mail.gmail.com>

On Thu, 16 Dec 2004 08:04:49 -0800 (PST), Michael Hart <hart@pglaf.org> wrote:
> 
> 
> On Thu, 16 Dec 2004, Jon Gorman wrote:
> 
> > On Thu, 16 Dec 2004 10:20:52 -0500, Joshua Hutchinson
> > <joshua@hutchinson.net> wrote:
> >> You know, it's like you're deliberately trying to make me angry!
> >>
> >> NO ONE HAS SUGGESTED SWEEPING IT AWAY!
> >>
> >> In fact, every person that has suggested a change of some kind has advocated
> >> putting the obsolete format document somewhere accessible.  Just not right
> >> out in front where an uninformed visitor will see it, click it and get
> >> frustrated.  It reflects poorly on PG as a whole and turns off potential
> >> users from ever coming back.
> >
> > Given my rather infrequent posting to this list (although long time
> > lurking from a variety of email addresses) I'm rather hesitant to
> > throw more fuel on the fire.  But I have to agree with the idea behind
> > Joshua Hutchinson and Jon Noring's suggestions.  The folio is
> > confusing when it is the first return result, and people do have a
> > tendency to hit the first result.
> 
> No one is suggesting it should be the first result.
> 
> 
> > I believe Greenstone (the new software behind the scenes at
> > gutenberg.org) allows pretty precise sorting of returns on various
> > conditions.  Would it be possible to always return the text format as
> > the first return?  This would help highlight the importance of the
> > text format without having to decide when a format is outdated or
> > unsupported, needs to be moved to the suggested "old" directory, or a
> > "stupid, stupid formats" page.
> 
> However, this sort of sweeping out of sight is not acceptable.

Michael, I think people are trying to understand what you mean by
hiding or sweeping away.  The mere fact the folio appears first is an
unintentional accident of sorting.  You yourself said that no one is
arguing it should be first.  Yet, down here you say changing the order
is not acceptable.  Should we be moving all the obsolete formats to
the front, essentially doing the opposite?  What service does that
provide?  Indeed by having the text format be first and foremost, it
should send a clear signal of the preferred format, and can be linked
to another page explaining why.

> 
> Try again when those of us who spent all the effort on this Folio
> project are dead, eh?
> 

Michael, no one is trying to disparage your efforts.  Indeed, I have
some questions.  Do we have it in writing that there would always be a
free reader?  I've seen some algorithms and code that decode the folio
format.  Would the lack of a free reader for the folio allow these to be
legally available for a person to develop a reader/converter program
for it?

> 
> > I know my first thought when seeing the folio was to think "That's got
> > to be an error, who would be crazy enough to publish that as a folio".
> 
> That's the whole point. . .let us make that point.

Right.....but how often are encodings named after words like folio or
quarto?  Just serves another dose of confusion.


From jonathan.gorman at gmail.com  Thu Dec 16 08:34:23 2004
From: jonathan.gorman at gmail.com (Jon Gorman)
Date: Thu Dec 16 08:34:28 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41C1B61A.5010704@perathoner.de>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<4a6dc7604121607365bb13e91@mail.gmail.com>
	<41C1B61A.5010704@perathoner.de>
Message-ID: <4a6dc7604121608345a650580@mail.gmail.com>

On Thu, 16 Dec 2004 17:21:46 +0100, Marcello Perathoner
<marcello@perathoner.de> wrote:
> Jon Gorman wrote:
> 
> > I believe Greenstone (the new software behind the scenes at
> > gutenberg.org) allows pretty precise sorting of returns on various
> > conditions.
> 
> What? Who installed Greenstone without my noticing it?

Errr, oops.  Dang, sorry about that.  Could have sworn I heard a bit
ago that you guys were putting it in and the new site (gutenberg.org)
sure looks like Greenstone.

Sorry about that.  This is exactly why I usually keep my mouth shut
about these types of things ;).

Jon Gorman
From joshua at hutchinson.net  Thu Dec 16 08:34:53 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec 16 08:35:03 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041216163453.D512F4F507@ws6-5.us4.outblaze.com>

Michael, I know you're not this obtuse, so there must be something we are not understanding about each other's stance.

> 
> > You know, it's like you're deliberately trying to make me angry!
> 
> Sweeping it under the carpet is exactly what you are promoting here.
> 

Maybe it is the word sweeping.  

"Sweeping it away" means deleting to me.

I am most definitely NOT advocating that.  (And no one I've seen has been.)

I am advocating MOVING the file so that it is not the first thing someone sees when they do a search for that text.  The easiest way to do that is to move it into the OLD directory.  It is what it was created for.  It is still there for anyone interested in PG history, but it doesn't confuse the average user who just wants to be able to read the e-book.

> 
> If this were the case, lots of people would have complained by now.
> 
> You are insiders. . .you have a distinctly different viewpoint.
> 

It is well known that when people are searching the web, if they don't understand something, they are FAR more likely to just click away and never return.  For every one person that complains, you'll have hundreds, if not thousands, that just clicked away never to return.

> 
> > Is the text version we have the exact same document as the folio version or 
> > were they created from separate sources?  If it is the same, we should move 
> > the folio into the text's etext number and free up a number.  If they are 
> > from separate sources, can anyone somehow generate a text file from the Folio 
> > file we have?
> 
> This is exactly the reason for having a separate number,
> so people will NOT get the .nfo format unless they want it.
> 
> BTW, you can still get the Folio reader with the TIME Magazine CDs
> which sell for $1.
> 

Moving the file to a new number is probably not a good idea.  I was just kind of thinking out loud there.  But it should be moved to the OLD directory so that Joe User doesn't see it as the FIRST THING IN HIS SEARCH LIST.

Josh
From hart at pglaf.org  Thu Dec 16 08:38:08 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 08:38:09 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <4a6dc7604121608222f463d12@mail.gmail.com>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com> 
	<4a6dc7604121607365bb13e91@mail.gmail.com>
	<Pine.LNX.4.60.0412160802140.1015@pglaf.org>
	<4a6dc7604121608222f463d12@mail.gmail.com>
Message-ID: <Pine.LNX.4.60.0412160832060.1015@pglaf.org>


On Thu, 16 Dec 2004, Jon Gorman wrote:

>>> Joshua Hutchinson and Jon Noring's suggestions.  The folio is
>>> confusing when it is the first return result, and people do have a
>>> tendency to hit the first result.
>>
>> No one is suggesting it should be the first result.
>>
>>
>>> I believe Greenstone (the new software behind the scenes at
>>> gutenberg.org) allows pretty precise sorting of returns on various
>>> conditions.  Would it be possible to always return the text format as
>>> the first return?  This would help highlight the importance of the
>>> text format without having to decide when a format is outdated or
>>> unsupported, needs to be moved to the suggested "old" directory, or a
>>> "stupid, stupid formats" page.
>>
>> However, this sort of sweeping out of sight is not acceptable.
>
> Michael, I think people are trying to understand what you mean by
> hiding or sweeping away.

Putting something where it is not likely to be seen is sweeping
under the carpet. . .period.


> The mere fact the folio appears first is an unintentional accident of sorting.

Then change the sorting technique so it is last. . . .

> You yourself said that no one is arguing it should be first.

I haven't seen anyone say it should be first, no one at all.


> Yet, down here you say changing the order is not acceptable.

No, I don't. . .just moving it to another directory is.


> Should we be moving all the obsolete formats to the front,

What kind of question is that?

> essentially doing the opposite?  What service does that provide?  Indeed by 
> having the text format be first and foremost, it should send a clear signal 
> of the preferred format, and can be linked to another page explaining why.

As above, I am not saying it should come up as first, as default, etc.


>> Try again when those of us who spent all the effort on this Folio
>> project are dead, eh?
>>
>
> Michael, no one is trying to disparage your efforts.  Indeed, I have
> some questions.  Do we have it in writing that there would always be a
> free reader?  I've seen some algorithms and code that decode the folio
> format.  Would the lack of a free reader for the folio allow these to be
> legally available for a person to develop a reader/converter program for it?

Personally, I wouldn't go through the effort, even if Folio has folded
and we can get the rights.  It's JUST an example. . .leave it be, put
in a comment describing this better than the one that is in there now.

From hart at pglaf.org  Thu Dec 16 08:46:49 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 08:46:51 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041216163453.D512F4F507@ws6-5.us4.outblaze.com>
References: <20041216163453.D512F4F507@ws6-5.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0412160839070.1015@pglaf.org>


On Thu, 16 Dec 2004, Joshua Hutchinson wrote:

> Michael, I know you're not this obtuse, so there must be something we are not understanding about each other's stance.
>
>>
>>> You know, it's like you're deliberately trying to make me angry!
>>
>> Sweeping it under the carpet is exactly what you are promoting here.
>>
>
> Maybe it is the word sweeping.
>
> "Sweeping it away" means deleting to me.

Are you intentionally misquoting me and thinking no one will notice?

"Sweeping under the carpet/rug" is what I said.

Putting out of view.


> I am most definitely NOT advocating that.  (And no one I've seen has been.)

It appears the opposite.


> I am advocating MOVING the file so that it is not the first thing someone 
> sees when they do a search for that text.  The easiest way to do that is to 
> move it into the OLD directory.  It is what it was created for.  It is still 
> there for anyone interested in PG history, but it doesn't confuse the average 
> user who just wants to be able to read the e-book.

I'm fine with changes "so that it is not the first thing someone 
sees when they do a search for that text."

I am NOT fine with sweeping it out of the normal directory.

You can do that when there is no one left to remember the issue,
or you can try to help them remember the issue, because it IS
going to come up again.


>> If this were the case, lots of people would have complained by now.
>>
>> You are insiders. . .you have a distinctly different viewpoint.
>>
>
> It is well known that when people are searching the web, if they don't 
> understand something, they are FAR more likely to just click away and never 
> return.  For every one person that complains, you'll have hundreds, if not 
> thousands, that just clicked away never to return.

I get messages all the time about things to improve, this has never
been one of them. . .not even once.

When we get ONE message we consider it.

Even if we only get one per year, it is still considered,
but it is not considered as the kind of major issue you want it to be.

Change the search so it is last.

Change the comments about not downloading it unless you have a Folio View.
Please add a remark that there used to be a free viewer but that Folio
changed its mind, just as any other company might do, such as Adobe
about .pdf files. . .and I will be MORE than happy for you, and for me.

Good enough for now?


Thanks for coming a bit in my direction!


Michael

PS
I hope you will also thank me for moving in your direction.

From hart at pglaf.org  Thu Dec 16 08:48:39 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 08:48:41 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <4a6dc7604121608345a650580@mail.gmail.com>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<4a6dc7604121607365bb13e91@mail.gmail.com>
	<41C1B61A.5010704@perathoner.de>
	<4a6dc7604121608345a650580@mail.gmail.com>
Message-ID: <Pine.LNX.4.60.0412160847110.1015@pglaf.org>


On Thu, 16 Dec 2004, Jon Gorman wrote:

> On Thu, 16 Dec 2004 17:21:46 +0100, Marcello Perathoner
> <marcello@perathoner.de> wrote:
>> Jon Gorman wrote:
>>
>>> I believe Greenstone (the new software behind the scenes at
>>> gutenberg.org) allows pretty precise sorting of returns on various
>>> conditions.
>>
>> What? Who installed Greenstone without my noticing it?
>
> Errr, oops.  Dang, sorry about that.  Could have sworn I heard a bit
> ago that you guys were putting it in and the new site (gutenberg.org)
> sure looks like Greenstone.
>
> Sorry about that.  This is exactly why I usually keep my mouth shut
> about these types of things ;).
>
> Jon Gorman

Probably some confusion about domain names here:

gutenberg.org = gutenberg.net  the old site

pgcc.net = gutenberg.us = gutenberg.cc  the new site

mh
From gbnewby at pglaf.org  Thu Dec 16 08:51:32 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Dec 16 08:51:34 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412160805330.1015@pglaf.org>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<Pine.LNX.4.60.0412160805330.1015@pglaf.org>
Message-ID: <20041216165132.GA6868@pglaf.org>

On Thu, Dec 16, 2004 at 08:10:26AM -0800, Michael Hart wrote:
> ...
> BTW, you can still get the Folio reader with the TIME Magazine CDs
> which sell for $1.

I wasn't aware of this - do you have a copy?  We can
make sure Brewster's site archives it, and maybe even
provide our own archival copy.
  -- Greg
From ciesiels at bigpond.net.au  Thu Dec 16 08:51:09 2004
From: ciesiels at bigpond.net.au (Michael Ciesielski)
Date: Thu Dec 16 08:52:05 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412160847110.1015@pglaf.org>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>	<4a6dc7604121607365bb13e91@mail.gmail.com>	<41C1B61A.5010704@perathoner.de>	<4a6dc7604121608345a650580@mail.gmail.com>
	<Pine.LNX.4.60.0412160847110.1015@pglaf.org>
Message-ID: <41C1BCFD.9050500@bigpond.net.au>

Michael Hart wrote:

> Probably some confusion about domain names here:
>
> gutenberg.org = gutenberg.net  the old site
>
> pgcc.net = gutenberg.us = gutenberg.cc  the new site
>
Uh, excuse me?

When did "PG2"/"PGCC"/"WEL" become the new Project Gutenberg site?

--
Michael Ciesielski

From hart at pglaf.org  Thu Dec 16 08:54:02 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 08:54:04 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041216165132.GA6868@pglaf.org>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<Pine.LNX.4.60.0412160805330.1015@pglaf.org>
	<20041216165132.GA6868@pglaf.org>
Message-ID: <Pine.LNX.4.60.0412160853380.1015@pglaf.org>


On Thu, 16 Dec 2004, Greg Newby wrote:

> On Thu, Dec 16, 2004 at 08:10:26AM -0800, Michael Hart wrote:
>> ...
>> BTW, you can still get the Folio reader with the TIME Magazine CDs
>> which sell for $1.
>
> I wasn't aware of this - do you have a copy?  We can
> make sure Brewster's site archives it, and maybe even
> provide our own archival copy.

I have TIME, but the reader is NOT the free one.

>  -- Greg
From gbnewby at pglaf.org  Thu Dec 16 08:56:35 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Dec 16 08:56:36 2004
Subject: greenstone (Re: !@!Re: [gutvol-d] [Fwd: Folio files])
In-Reply-To: <4a6dc7604121608345a650580@mail.gmail.com>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<4a6dc7604121607365bb13e91@mail.gmail.com>
	<41C1B61A.5010704@perathoner.de>
	<4a6dc7604121608345a650580@mail.gmail.com>
Message-ID: <20041216165635.GB6868@pglaf.org>

On Thu, Dec 16, 2004 at 10:34:23AM -0600, Jon Gorman wrote:
> On Thu, 16 Dec 2004 17:21:46 +0100, Marcello Perathoner
> <marcello@perathoner.de> wrote:
> > Jon Gorman wrote:
> > 
> > > I believe Greenstone (the new software behind the scenes at
> > > gutenberg.org) allows pretty precise sorting of returns on various
> > > conditions.
> > 
> > What? Who installed Greenstone without my noticing it?
> 
> Errr, oops.  Dang, sorry about that.  Could have sworn I heard a bit
> ago that you guys were putting it in and the new site (gutenberg.org)
> sure looks like Greenstone.

iBiblio runs the Greenstone search engine, which we link
to.  It's not bad, but takes a long time to re-index the
site (and doesn't do all the filetypes), and is not updated
too regularly.

We link to it from the gutenberg.org/gutenberg.net pages,
as well as Yahoo & Google (which, similarly, we don't run:
they just index us as part of their service).

> Sorry about that.  This is exactly why I usually keep my mouth shut
> about these types of things ;).

Not at all - it's often not too clear what's "ours" (as
in stuff we run) and what's not ours, without detailed
reading.
  -- Greg

From hart at pglaf.org  Thu Dec 16 08:59:56 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 08:59:58 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41C1BCFD.9050500@bigpond.net.au>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<4a6dc7604121607365bb13e91@mail.gmail.com>
	<41C1B61A.5010704@perathoner.de>
	<4a6dc7604121608345a650580@mail.gmail.com>
	<Pine.LNX.4.60.0412160847110.1015@pglaf.org>
	<41C1BCFD.9050500@bigpond.net.au>
Message-ID: <Pine.LNX.4.60.0412160854490.1015@pglaf.org>


On Fri, 17 Dec 2004, Michael Ciesielski wrote:

> Michael Hart wrote:
>
>> Probably some confusion about domain names here:
>> 
>> gutenberg.org = gutenberg.net  the old site
>> 
>> pgcc.net = gutenberg.us = gutenberg.cc  the new site
>> 
> Uh, excuse me?
>
> When did "PG2"/"PGCC"/"WEL" become the new Project Gutenberg site?

This was announced many times in the Weekly Newsletter,
and discussed in several listserv conversations.

The current site was online for testing at least since Jun 22,
the official date of change from testing to opening was Nov 4.

gutenberg.org replaced gutenberg.net as the preferred domain
name for that site during the same period.
From gbnewby at pglaf.org  Thu Dec 16 09:00:36 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Dec 16 09:00:37 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412160853380.1015@pglaf.org>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<Pine.LNX.4.60.0412160805330.1015@pglaf.org>
	<20041216165132.GA6868@pglaf.org>
	<Pine.LNX.4.60.0412160853380.1015@pglaf.org>
Message-ID: <20041216170036.GA7970@pglaf.org>

On Thu, Dec 16, 2004 at 08:54:02AM -0800, Michael Hart wrote:
> 
> 
> On Thu, 16 Dec 2004, Greg Newby wrote:
> 
> >On Thu, Dec 16, 2004 at 08:10:26AM -0800, Michael Hart wrote:
> >>...
> >>BTW, you can still get the Folio reader with the TIME Magazine CDs
> >>which sell for $1.
> >
> >I wasn't aware of this - do you have a copy?  We can
> >make sure Brewster's site archives it, and maybe even
> >provide our own archival copy.
> 
> I have TIME, but the reader is NOT the free one.

Brewster @ TIA has a project to archive such orphaned
software (copyrighted, but not being sold/owned).  It's
legit, & might be a good place to send this software.
  -- Greg
From marcello at perathoner.de  Thu Dec 16 09:03:02 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Dec 16 09:03:10 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412160805330.1015@pglaf.org>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<Pine.LNX.4.60.0412160805330.1015@pglaf.org>
Message-ID: <41C1BFC6.4020408@perathoner.de>

Michael Hart wrote:


> Sweeping it under the carpet is exactly what you are promoting here.

Actually we are advocating greater visibility of the files in question.


Current situation: if a reader who looks for Gibbon *by chance* happens 
to download the Folio files, she *may* realize that proprietary formats 
are bad.

Disadvantage: more probably she will not realize where the problem is 
because nobody told her, and will just form a bad opinion of PG: "What the 
hell do they keep around files if nobody can read them?"


Proposed change: move the Folio files out of the catalog, write a "Hall 
of Shame" page explaining the problem and link to the Folio files from 
there.

Advantage: people who don't look for Gibbon can see the "Hall of Shame" 
page. People actually realize the problem because it is explained to them.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Dec 16 09:17:13 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Dec 16 09:17:19 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412160854490.1015@pglaf.org>
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>	<4a6dc7604121607365bb13e91@mail.gmail.com>	<41C1B61A.5010704@perathoner.de>	<4a6dc7604121608345a650580@mail.gmail.com>	<Pine.LNX.4.60.0412160847110.1015@pglaf.org>	<41C1BCFD.9050500@bigpond.net.au>
	<Pine.LNX.4.60.0412160854490.1015@pglaf.org>
Message-ID: <41C1C319.6060103@perathoner.de>

Michael Hart wrote:

>> When did "PG2"/"PGCC"/"WEL" become the new Project Gutenberg site?
> 
> This was announced many times in the Weekly Newsletter,
> and discussed in several listserv conversations.

And most everybody took exception to the "new" and "old" connotation.


> The current site was online for testing at least since Jun 22,
> the official date of change from testing to opening was Nov 4.

> gutenberg.org replaced gutenberg.net as the preferred domain
> name for that site during the same period.

The two changes are completely unrelated.

gutenberg.net was abandoned in favor of gutenberg.org because .org is 
the standard TLD for non-profits. Getting rid of multiple domains also 
gave us better search engine ranking.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Dec 16 09:17:17 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Dec 16 09:17:23 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <Pine.LNX.4.60.0412160839070.1015@pglaf.org>
References: <20041216163453.D512F4F507@ws6-5.us4.outblaze.com>
	<Pine.LNX.4.60.0412160839070.1015@pglaf.org>
Message-ID: <41C1C31D.7030500@perathoner.de>

Michael Hart wrote:

> I'm fine with changes "so that it is not the first thing someone sees 
> when they do a search for that text."

I edited the title so it will sort later.


> I am NOT fine with sweeping it out of the normal directory.

How fine are you with removing the files from the catalog database?


> I get messages all the time about things to improve, this has never
> been one of them. . .not even once.

Actually we just got a message. The one that started this discussion.


> When we get ONE message we consider it.

Alright. We got ONE message.


> Change the comments about not downloading it unless you have a Folio viewer.
> Please add a remark that there used to be a free viewer but that Folio
> changed its mind, just as any other company might do, such as Adobe
> about .pdf files. . .and I will be MORE than happy for you, and for me.

Go ahead and write a better comment.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Thu Dec 16 09:39:35 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec 16 09:39:41 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041216173935.5F7944F49F@ws6-5.us4.outblaze.com>


> 
> On Thu, 16 Dec 2004, Joshua Hutchinson wrote:
> 
> > Michael, I know you're not this obtuse, so there must be something we are 
> > not understanding about each other's stance.
> >
> >>
> >>> You know, it's like you're deliberately trying to make me angry!
> >>
> >> Sweeping it under the carpet is exactly what you are promoting here.
> >>
> >
> > Maybe it is the word sweeping.
> >
> > "Sweeping it away" means deleting to me.
> 
> Are you intentionally misquoting me and thinking no one will notice.
> 

Yeah, Michael.  I'm misquoting you.  That's why your original words are exactly two lines above it.  

My quote was a simple mistake of putting the quote in the wrong spot.  It was supposed to read:

"Sweeping it" away means deleting it to me.

> "Sweeping under the carpet/rug" is what I said.
> 
> Putting out of view.
> 
> 
> > I am most definitely NOT advocating that.  (And no one I've seen has been.)
> 
> It appears the opposite.
> 
> 

Yep, it looks like you define sweeping differently than I do.  Moving the file doesn't mean getting rid of it or putting it where no one can see it.  It moves it so that the search bar doesn't bring it up as the default search result.

> 
> You can do that when there is no one left to remember the issue,
> or you can try to help them remember the issue, because it IS
> going to come up again.
> 

If this really was an issue you cared about, you would have put a section up on the web page with links to the examples of why this is a "bad thing."  You just seem to be arguing against change for the sake of arguing.  As we are all fond of saying around here, if this bothers you so much, DO something about it.  Create a page deriding proprietary formats.

> 
> 
> I get messages all the time about things to improve, this has never
> been one of them. . .not even once.
> 
> When we get ONE message we consider it.

What the heck do you call the message that started this whole thread?  A big thank you for having an unreadable file out there?  Come on, we do have a complaint message!

> 
> Even if we only get one per year, it is still considered,
> but it is not considered as the kind of major issue you want it to be.

You're right.  This isn't major.  It should be fixed in about 30 seconds.  But for some reason, you're arguing like mad to keep something that results in a lower level of usability.  It is really boggling my mind.

> 
> Change the search so it is last.

That is exactly what moving the file to the OLD directory (which is a subdirectory of its current location) would do!

> 
> Change the comments about not downloading it unless you have a Folio viewer.
> Please add a remark that there used to be a free viewer but that Folio
> changed its mind, just as any other company might do, such as Adobe
> about .pdf files. . .and I will be MORE than happy for you, and for me.
> 

Why do we need to handle this one differently than we would any other file in the collection?  As (I believe) Andrew pointed out, this would normally be handled by moving the file to the OLD directory.  So we have an established manner of handling these situations.  You just seem to want to fight against it.

If, however, putting a disclaimer in the search field is the best we can get... fine, I'll take it.  At least it is something (if not the "best practice" method I'd like to see).

Josh
From joshua at hutchinson.net  Thu Dec 16 09:43:20 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec 16 09:43:25 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041216174320.88AA54F42B@ws6-5.us4.outblaze.com>

Amen, Marcello.  And thank you for spelling it out much more clearly and calmly than I have been able to.

Josh

----- Original Message -----
From: "Marcello Perathoner" <marcello@perathoner.de>
To: "Michael S. Hart" <hart@pobox.com>, "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files]
Date: Thu, 16 Dec 2004 18:03:02 +0100

> 
> Michael Hart wrote:
> 
> 
> > Sweeping it under the carpet is exactly what you are promoting here.
> 
> Actually we are advocating greater visibility of the files in question.
> 
> 
> Current situation: if a reader who looks for Gibbon *by chance* happens to 
> download the Folio files, she *may* realize that proprietary formats are bad.
> 
> Disadvantage: more probably she will not realize where the problem is because 
> nobody told her, and will just form a bad opinion of PG: "What the hell do they keep 
> around files if nobody can read them?"
> 
> 
> 
> Proposed change: move the Folio files out of the catalog, write a "Hall of 
> Shame" page explaining the problem and link to the Folio files from there.
> 
> Advantage: people who don't look for Gibbon can see the "Hall of Shame" page. 
> People actually realize the problem because it is explained to them.
> 
> 
> 
> 
> -- Marcello Perathoner
> webmaster@gutenberg.org
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From joshua at hutchinson.net  Thu Dec 16 09:49:25 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec 16 09:49:30 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041216174925.E50A09E82B@ws6-2.us4.outblaze.com>

As a side question ---

Is PGCC a part of PG?  Officially?  Does Greg, for instance, have oversight over it?  Is anyone from PG (outside of Michael) involved in this site in any way?

I know this is reopening old wounds around here, but the last paragraph on the PGCC home page makes it sound like this is an official part of PG, which I never understood to be the actual case...

[quote]
Up until now, Project Gutenberg has focused on the creation of the eBooks rather than their distribution and we have spent as much of our time on copyright as on eBook creation and distribution. This is our first attempt focused on distribution rather than creation.
[/quote]

Another paragraph, though, indicates that PGCC is a subset of the World eBook Library Consortia, which I'm almost positive is not related directly to PG.

[quote]
Project Gutenberg Consortia Center, promoting global literacy by multiplying intellectual properties through Internet library lending and increasing access to digital archives and repositories. Project Gutenberg Consortia Center is a branch of The World eBook Library Consortia.
[/quote]

Josh

PS Oh, and I strongly disagree with the characterisation of pgcc.net as the "new PG" site, which, no matter how you MEAN it, will imply that it is the "replacement" for the OLD site (gutenberg.org).

----- Original Message -----
From: "Michael Ciesielski" <ciesiels@bigpond.net.au>
To: "Michael S. Hart" <hart@pobox.com>, "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: !@!Re: [gutvol-d] [Fwd: Folio files]
Date: Fri, 17 Dec 2004 03:51:09 +1100

> 
> Michael Hart wrote:
> 
> > Probably some confusion about domain names here:
> >
> > gutenberg.org = gutenberg.net  the old site
> >
> > pgcc.net = gutenberg.us = gutenberg.cc  the new site
> >
> Uh, excuse me?
> 
> When did "PG2"/"PGCC"/"WEL" become the new Project Gutenberg site?
> 
> --
> Michael Ciesielski
> 

From hart at pglaf.org  Thu Dec 16 10:55:18 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 16 10:55:20 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <20041216174925.E50A09E82B@ws6-2.us4.outblaze.com>
References: <20041216174925.E50A09E82B@ws6-2.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0412161038370.11591@pglaf.org>


On Thu, 16 Dec 2004, Joshua Hutchinson wrote:

> As a side question ---
>
> Is PGCC a part of PG?  Officially?  Does Greg, for instance, have oversight 
> over it?  Is anyone from PG (outside of Michael) involved in this site in any 
> way?

Yes, Greg has spent plenty of time on PGCC, more than plenty.

> I know this is reopening old wounds around here, but the last paragraph on 
> the PGCC home page makes it sound like this is an official part of PG, which 
> I never understood to be the actual case...

Different people want to have different views. . .we haven't pushed it
very hard. . .keeping the reports and numbers separate in the Weekly
Newsletter, etc.


> [quote] Up until now, Project Gutenberg has focused on the creation of the 
> eBooks rather than their distribution and we have spent as much of our time 
> on copyright as on eBook creation and distribution. This is our first attempt 
> focused on distribution rather than creation. [/quote]

Yes, the Mission Statements of Project Gutenberg have always made it clear
that PG is intended to focus on both the creation and distribution of eBooks.

Obviously eBooks must be created before they can be distributed.

Many eBook creators insist their books be left in certain formats that
have not passed muster with PG processing and post-processing standards,
and we pass these on to PGCC, who is willing to post them in original
formats, pagination, files, etc.  In addition, PGCC surfs the web for
any and all possible eBook sites and sends requests to them.


> Another paragraph, though, indicates that PGCC is a subset of the World eBook 
> Library Consortia, which I'm almost positive is not related directly to PG.

> [quote] Project Gutenberg Consortia Center, promoting global literacy by 
> multiplying intellectual properties through Internet library lending and 
> increasing access to digital archives and repositories. Project Gutenberg 
> Consortia Center is a branch of The World eBook Library Consortia. [/quote]

This is probably something that obviously needs correction, if you can send
the exact location I will forward it so it can be corrected immediately.
As far as I know, there should be no reference to World eBook Library.
BTW, this is a different World Library from the one that donated to us the
Shakespeare files from which we made book #100.

> Josh
>
> PS Oh, and I strongly disagree with the characterisation of pgcc.net as the 
> "new PG" site, which, no matter how you MEAN it, will imply that it 
> is the "replacement" for the OLD site (gutenberg.org).

Sorry, that was quoted from someone else who used the term
"new site (gutenberg.org)" in reference to the location of
the Greenstone program, and I obviously should have made
that quotation clearly marked.


From joshua at hutchinson.net  Thu Dec 16 11:24:37 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec 16 11:24:44 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
Message-ID: <20041216192437.399854F432@ws6-5.us4.outblaze.com>


----- Original Message -----
From: "Michael Hart" <hart@pglaf.org>
> 
> On Thu, 16 Dec 2004, Joshua Hutchinson wrote:
> 
> > Another paragraph, though, indicates that PGCC is a subset of the World 
> > eBook Library Consortia, which I'm almost positive is not related directly 
> > to PG.
> 
> > [quote] Project Gutenberg Consortia Center, promoting global literacy by 
> > multiplying intellectual properties through Internet library lending and 
> > increasing access to digital archives and repositories. Project Gutenberg 
> > Consortia Center is a branch of The World eBook Library Consortia. [/quote]
> 
> This is probably something that obviously needs correction, if you can send
> the exact location I will forward it so it can be corrected immediately.
> As far as I know, there should be no reference to World eBook Library.
> BTW, this is a different World Library from the one that donated to us the
> Shakespeare files from which we made book #100.
> 

On the main pgcc.net page in the lower right corner, in the green side bar area.  There is a Project Gutenberg Consortia Center logo with the above text below it.

> > Josh
> >
> > PS Oh, and I strongly disagree with the characterisation of pgcc.net as the 
> > "new PG" site, which, no matter how you MEAN it, will imply that it 
> > is the "replacement" for the OLD site (gutenberg.org).
> 
> Sorry, that was quoted from someone else who used the term
> "new site (gutenberg.org)" in reference to the location of
> the Greenstone program, and I obviously should have made
> that quotation clearly marked.
> 

Fair enough and thanks for the clarification.

Josh
From gbnewby at pglaf.org  Thu Dec 16 12:04:47 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Dec 16 12:04:48 2004
Subject: [gutvol-d] pgcc
In-Reply-To: <20041216192437.399854F432@ws6-5.us4.outblaze.com>
References: <20041216192437.399854F432@ws6-5.us4.outblaze.com>
Message-ID: <20041216200447.GB14891@pglaf.org>

Two clarifications:

>From one message:

>On Thu, 16 Dec 2004, Joshua Hutchinson wrote:
>>As a side question ---
>>
>>Is PGCC a part of PG?  Officially?  Does Greg, for instance, have oversight
>>over it?  Is anyone from PG (outside of Michael) involved in this site in
>>any way?
> 
>Yes, Greg has spent plenty of time on PGCC, more than plenty.

The question was whether I have oversight.  The answer to
that question is basically, "no," but a qualified "no."  It's
definitely true that I've spent plenty of time on PGCC.

As mentioned way back when, there were just a few criteria
on my list for what the PGCC (née PG2) site needed to do in order to 
use the "Project Gutenberg" trademark.  Both Michael and I
(via PGLAF) have oversight for use of the mark.

John Guagliardo (john@guagliardo.cc) is the person behind
PGCC who funded & orchestrated it.

The list was fairly straightforward: make sure correct
small print & attribution is there; don't put our free
eBooks behind for-fee sites; decouple the World eBook Library
(John's for-fee site) from the PGCC.  Also some fundamental
usability stuff, though I kept that to a minimum since I 
don't want to design or maintain someone else's site.  John
complied with all those things.

As Michael mentioned, the newsletter carried requests for
proofreading & feedback for the pgcc site for something over
4 months, leading up to a grand opening on November 4.  I'm
not sure how grand it was, but nevertheless it's there and
available.  We carry periodic updates from PGCC in the weekly
newsletter.

During the same time the PGCC site was being rolled out and
tested, I expended a fair amount of effort (with Michael & John)
to help solve the issues listed above, and also work on some ideas
for moving forward.  As someone quoted elsewhere, the idea is
for PGCC to be a "collection of collections," rather than a 
producer of eBooks.  This is a pretty clear delineation between
what we do (gutenberg.org) and what pgcc does, with no substantial
overlap in activities.

During that same time, Michael and I rolled out some new
documents (mis-named FAQ0, FAQ1 and FAQ3) that better describe
the link between the mission ("to encourage the creation and
distribution of eBooks") and various activities that either
spin-off or augment gutenberg.org (such as pg-eu), or that
work towards the mission in fairly different ways (such as pgcc).
We also ran requests for feedback etc. in the newsletter
for several months, and got several good suggestions. 

>On Thu, Dec 16, 2004 at 02:24:37PM -0500, Joshua Hutchinson wrote:
>> 
>> ----- Original Message -----
>> From: "Michael Hart" <hart@pglaf.org>
>> > 
>> > On Thu, 16 Dec 2004, Joshua Hutchinson wrote:
>> > 
>> > > Another paragraph, though, indicates that PGCC is a subset of the World 
>> > > eBook Library Consortia, which I'm almost positive is not related directly 
>> > > to PG.
>> > 
>> > > [quote] Project Gutenberg Consortia Center, promoting global literacy by 
>> > > multiplying intellectual properties through Internet library lending and 
>> > > increasing access to digital archives and repositories. Project Gutenberg 
>> > > Consortia Center is a branch of The World eBook Library Consortia. [/quote]
>> > 
>> > This is probably something that obviously needs correction, if you can send
>> > the exact location I will forward it so it can be corrected immediately.
>> > As far as I know, there should be no reference to World eBook Library.
>> > BTW, this is a different World Library from the one that donated to us the
>> > Shakespeare files from which we made book #100.
>> > 
>> 
>> On the main pgcc.net page in the lower right corner, in the green side bar area.  There is a Project Gutenberg Consortia Center logo with the above text below it.
>> 
>> > > Josh
>> > >
>> > > PS Oh, and I strongly disagree with the characterisation of pgcc.net as the 
>> > > "new PG" site, which, no matter how you MEAN it, will imply that it 
>> > > is the "replacement" for the OLD site (gutenberg.org).
>> > 
>> > Sorry, that was quoted from someone else who used the term
>> > "new site (gutenberg.org)" in reference to the location of
>> > the Greenstone program, and I obviously should have made
>> > that quotation clearly marked.
>> > 
>> 
>> Fair enough and thanks for the clarification.

Sorry to correct Michael on this, but in fact we specifically
decided that it was fine for PGCC to have some sort of credit/reference
to WEB, since WEB is the sponsor.  My view is that a recognition
of such sponsorship is fine, but that it's inappropriate to
further entangle the sites (for example, having links on pgcc
that go to other pgcc pages and also to WEB pages, interspersed).
My view is that the pgcc site has an appropriate & minimalist
set of links & info about WEB.

As always, feedback (to this list, to John, etc.) is welcome.

And, let me remind you that there is always opportunity
for even more new efforts to support the Gutenberg mission -
see http://gutenberg.net/about - there is plenty of good
work left to do!!!
  -- Greg


From jmdyck at ibiblio.org  Thu Dec 16 13:53:25 2004
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Thu Dec 16 13:55:12 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<Pine.LNX.4.60.0412160805330.1015@pglaf.org>
Message-ID: <41C203D5.86003F86@ibiblio.org>

Michael Hart wrote:
> 
> This is exactly the reason for having a separate number,
> so people will NOT get the .nfo format unless they want it.

The latter is a fine goal, but it seems to me that giving a Folio file a
separate etext number achieves precisely the opposite effect.

If a volume is available in several formats, the easiest way to convey this
fact is in a tabular listing with a "format" column. This is what the PG
online catalog's 'bibrec' pages do. However, (I'm pretty sure) a bibrec
page can only show data associated with one etext number. Conversely, the
pages that show info for multiple etexts (e.g., search results or browse
authors) do *not* convey format information.

Thus, having a different etext number for a Folio version (or for any
particular-format version) actually obscures the format distinction, making
it *more* likely that someone will get the .nfo format when they don't want
it (or plain text when they'd prefer html, or vice versa, etc).

Of course, the decision for Decline & Fall was made back in 1997, before we
had bibrec pages, or even much of an online catalog, I think. Perhaps it
made more sense given the access and indexing methods of the day, though as
far as I can tell, very little use was made of etext numbers in accessing
files. (Instead, one used filenames like etext97/dfre310xx.xxx.) Anyway,
the argument of people not getting unwanted formats would seem to point in
the opposite direction now.

Or, as Marcello put it: "The Right Thing to do is to reindex all formats
(TXT, HTML, Folio) under one etext number. Then the software would sort it
in a sensible way."

-Michael

From traverso at dm.unipi.it  Thu Dec 16 23:40:26 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Thu Dec 16 23:40:48 2004
Subject: !@!Re: [gutvol-d] [Fwd: Folio files]
In-Reply-To: <41C203D5.86003F86@ibiblio.org> (message from Michael Dyck on
	Thu, 16 Dec 2004 13:53:25 -0800)
References: <20041216152052.3CD762F97C@ws6-3.us4.outblaze.com>
	<Pine.LNX.4.60.0412160805330.1015@pglaf.org>
	<41C203D5.86003F86@ibiblio.org>
Message-ID: <200412170740.iBH7eQbH003244@posso.dm.unipi.it>

>>>>> "Michael" == Michael Dyck <jmdyck@ibiblio.org> writes:


    Michael> Or, as Marcello put it: "The Right Thing to do is to
    Michael> reindex all formats (TXT, HTML, Folio) under one etext
    Michael> number. Then the software would sort it in a sensible
    Michael> way."


I agree, but if MH objects to the renumbering, another option is to
have all the formats  in all the numbers.  This can be done with
symbolic links, so that  no duplication of files occurs.   We will
have a duplication of bibrec records, but even choosing the wrong
number you'll get the correct file anyway.
 
And another link can go to the "Hall of shame". 

        
Carlo Traverso
  
  
From marcello at perathoner.de  Fri Dec 17 07:22:08 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Dec 17 07:22:13 2004
Subject: [gutvol-d] Distributed Proofreaders in German 
Message-ID: <41C2F9A0.4090004@perathoner.de>

There's a new DP for German texts only at

   http://www.gaga.net/

producing books for PG-DE

   http://gutenberg.spiegel.de/


-- 
Marcello Perathoner
webmaster@gutenberg.org

From hart at pglaf.org  Fri Dec 17 08:45:25 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Dec 17 08:45:27 2004
Subject: [gutvol-d] [BP] "A call to Arms" (fwd)
Message-ID: <Pine.LNX.4.60.0412170844400.8759@pglaf.org>


Can anyone send me a plain text copy?

http://www.library.unisa.edu.au/about/papers/calltoarms.pdf


Thanks & Best Regards,
Veenu

[Moderator: The URL above is for an 11-page paper titled
  "A Call to Arms: What in the World is Happening to Information?"
  by Karen Williams, reference librarian at the University of South Australia.

  In the abstract, she writes: "We are fighting a battle, and that battle
  is all about the provision of and access to information.  This paper looks
  briefly at how the provision of information has created gaps between those
  who have access, and those who do not .... Only if national information
  policies are moulded with a basis of equal access for all will the
  future be brighter.... The author concludes that much can be done
  by librarians..." - JMO]
-----------------------------------------------------------------------------
This message was sent via the Book People mailing list.
Posting address:              spok+bookpeople@cs.cmu.edu
Admin. & unsubscribe address: spok+bookpeople-request@cs.cmu.edu
Charter:                      http://onlinebooks.library.upenn.edu/bplist/
From flis at detk.com  Fri Dec 17 12:16:12 2004
From: flis at detk.com (William Flis)
Date: Fri Dec 17 12:09:41 2004
Subject: [gutvol-d] PG books used by visually impaired
In-Reply-To: <41B72AC9.4080105@srv.net>
Message-ID: <LBELIICCBHDEONNDCACJGEEICIAA.flis@detk.com>

> It's a monotone reading, but usable. More recent versions (i.e. FC3) seem
> to  sound smoother (less computerized) than earlier ones (i.e. RH9). Don't
> know if this is because of the Linux or the festival versions. Comes
> standard  with most recent RedHat and Fedora Core Linux installs, and
> probably others.

My old Macintosh (OS 7.5-8) had different "voices" to choose from. One
called "Cellos" was definitely not monotone--it sang the words in the melody
of "Hall of the Mountain King". I wrote a little poem that fit the
meter/rhythm and recorded it as my voice-mail answer--I still use it. The
funniest part is that I occasionally receive messages that people have left,
sung in the same melody (those are "keepers"!). Call me up for a demo.

William J. Flis
DE Technologies, Inc.
3620 Horizon Drive
King of Prussia, PA  19406
610-270-9700 x130

From stephen.thomas at adelaide.edu.au  Wed Dec 15 21:08:20 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Fri Dec 17 16:12:23 2004
Subject: [gutvol-d] Re: Google Partners with Oxford,
	Harvard & Others to Digitize Libraries
In-Reply-To: <41BEB3A3.1030805@adelaide.edu.au>
References: <41BEB3A3.1030805@adelaide.edu.au>
Message-ID: <41C11844.9060106@adelaide.edu.au>

There's actually quite a lot about this project -- "Google 
Print" -- to be learned from their own web site:

http://print.google.com/

which may dispel some of the misconceptions I see being bandied 
about. It also raises some more questions. (E.g. it is not at 
all clear whether their "print only" policy will apply to 
everything, or only the copyright books.)

But also, to keep this in perspective, it may be worth 
remembering the recent Google stock float, which may have some 
influence on the timing of this press release and the previous 
(last week's) release about Google Scholar. Clearly, these 
things are going to do no harm to their stock price.

Google Print has a long way to go, and I wish them well.

Steve

-- 

Stephen Thomas,
Senior Systems Analyst,
University of Adelaide Library
UNIVERSITY OF ADELAIDE SA 5005
AUSTRALIA
Phone: +61 8 830 35190  Fax: +61 8 830 34369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

CRICOS Provider Number 00123M

From bill at janssen.org  Thu Dec 16 00:07:12 2004
From: bill at janssen.org (Bill Janssen)
Date: Fri Dec 17 16:12:24 2004
Subject: [ebook-community] Re: [gutvol-d] re: [BP] Google Partners with
	Oxford, Harvard & Others to Digitize Libraries 
In-Reply-To: Your message of "Wed, 15 Dec 2004 20:39:35 PST."
	<19677883578.20041215213935@noring.name> 
Message-ID: <04Dec16.000715pst."58617"@synergy1.parc.xerox.com>

> It's entirely possible that Google will give, upon request, the page
> scans for any public domain books they've scanned to established
> groups like Distributed Proofreaders for conversion into proofed SDT,
> so long as Google gets a copy of the resulting high-quality SDT.

My guess is that part of the deal is that the libraries are going to
get copies of those page scans, and they will probably make them
available in various ways in addition to whatever Google does with
them.

By the way, it's astonishing to me how far OCR has come in the last 10
years.  I think the low cost of storage has made page image storage of
many historical documents feasible, relatively suddenly, and that
means that the problem of OCR'ing handwritten text, odd fonts, early
books, and other similar things has suddenly become a hot research
topic.

Bill
From jlinden at ticluse.com  Fri Dec 17 13:30:44 2004
From: jlinden at ticluse.com (James Linden)
Date: Fri Dec 17 16:12:26 2004
Subject: [gutvol-d] [BP] "A call to Arms" (fwd)
In-Reply-To: <Pine.LNX.4.60.0412170844400.8759@pglaf.org>
References: <Pine.LNX.4.60.0412170844400.8759@pglaf.org>
Message-ID: <41C35004.4030004@ticluse.com>

http://www.kodekrash.com/project/calltoarms.txt

-- James

Michael Hart wrote:

>
> Can anyone send me a plain text copy?
>
> http://www.library.unisa.edu.au/about/papers/calltoarms.pdf
>
>
> Thanks & Best Regards,
> Veenu
>
> [Moderator: The URL above is for an 11-page paper titled
>  "A Call to Arms: What in the World is Happening to Information?"
>  by Karen Williams, reference librarian at the University of South
>  Australia.
>
>  In the abstract, she writes: "We are fighting a battle, and that battle
>  is all about the provision of and access to information.  This paper
>  looks briefly at how the provision of information has created gaps
>  between those who have access, and those who do not .... Only if
>  national information policies are moulded with a basis of equal access
>  for all will the future be brighter.... The author concludes that much
>  can be done by librarians..." - JMO]
> ----------------------------------------------------------------------------- 
>
> This message was sent via the Book People mailing list.
> Posting address:              spok+bookpeople@cs.cmu.edu
> Admin. & unsubscribe address: spok+bookpeople-request@cs.cmu.edu
> Charter:                      http://onlinebooks.library.upenn.edu/bplist/

From hart at pglaf.org  Sat Dec 18 09:25:51 2004
From: hart at pglaf.org (Michael Hart)
Date: Sat Dec 18 09:25:53 2004
Subject: [ebook-community] Re: [gutvol-d] re: [BP] Google Partners with
	Oxford, Harvard & Others to Digitize Libraries 
In-Reply-To: <04Dec16.000715pst."58617"@synergy1.parc.xerox.com>
References: <04Dec16.000715pst."58617"@synergy1.parc.xerox.com>
Message-ID: <Pine.LNX.4.60.0412180924370.11506@pglaf.org>


On Thu, 16 Dec 2004, Bill Janssen wrote:

>> It's entirely possible that Google will give, upon request, the page
>> scans for any public domain books they've scanned to established
>> groups like Distributed Proofreaders for conversion into proofed SDT,
>> so long as Google gets a copy of the resulting high-quality SDT.
>
> My guess is that part of the deal is that the libraries are going to
> get copies of those page scans, and they will probably make them
> available in various ways in addition to whatever Google does with them.

AFAIK each library will keep the scans of their own books,
and z/j/ealously guard them. . . .


> By the way, it's astonishing to me how far OCR has come in the last 10
> years.  I think the low cost of storage has made page image storage of
> many historical documents feasible, relatively suddenly, and that
> means that the problem of OCR'ing handwritten text, odd fonts, early
> books, and other similar things has suddenly become a hot research topic.

I heard they are still having huge troubles with older books. . . .

mh
From hart at pglaf.org  Sat Dec 18 09:28:11 2004
From: hart at pglaf.org (Michael Hart)
Date: Sat Dec 18 09:28:13 2004
Subject: [gutvol-d] Re: Google Partners with Oxford, Harvard & Others to
	Digitize Libraries
In-Reply-To: <41C11844.9060106@adelaide.edu.au>
References: <41BEB3A3.1030805@adelaide.edu.au>
	<41C11844.9060106@adelaide.edu.au>
Message-ID: <Pine.LNX.4.60.0412180926160.11506@pglaf.org>


On Thu, 16 Dec 2004, Steve Thomas wrote:

> There's actually quite a lot about this project -- "Google Print" -- to be 
> learned from their own web site:
>
> http://print.google.com/
>
> which may dispel some of the misconceptions I see being bandied about. It 
> also raises some more questions. (E.g. it is not at all clear whether their 
> "print only" policy will apply to everything, or only the copyright books.)

You can only SEE a few pages from the copyrighted books,
but can NEVER print ANY pages from ANY books, as far as
I can tell.


> But also, to keep this in perspective, it may be worth remembering the recent 
> Google stock float, which may have some influence on the timing of this press 
> release and the previous (last week's) release about Google Scholar. Clearly, 
> these things are going to do no harm to their stock price.

As far as I can tell, the timing was based on being exactly one year from our
meeting with them when we pitched the eLibrary idea at their headquarters,
after which the silence was deafening. . .no replies to emails. . . .


mh
From stephen.thomas at adelaide.edu.au  Sun Dec 19 20:48:30 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Sun Dec 19 20:48:50 2004
Subject: [gutvol-d] Re: Google Partners with Oxford,	Harvard & Others
	to Digitize Libraries
In-Reply-To: <41C11844.9060106@adelaide.edu.au>
References: <41BEB3A3.1030805@adelaide.edu.au>
	<41C11844.9060106@adelaide.edu.au>
Message-ID: <41C6599E.4000008@adelaide.edu.au>

Playing around with this a little more, I found something 
interesting:

First I did a standard Google for "William Morris".

The results page lists one book result at the top, which takes 
you to a page for the Cambridge Uni. press edition of "News from 
nowhere". (Surely they have other editions/titles too?)

This only provides a few pages, but if you use "Search within 
this book" you can get other pages. Specifically, I typed in the 
keyword "the" and seem to have made it list ALL the pages 
(assuming the word "the" would appear on every page).

They have used some Javascript magic to prevent you from saving 
the page images. I dare say someone will figure out a way round 
that. Also they seem to have some kind of counter that stops you 
viewing too many pages at once.

All of this is aimed at their publisher market of course -- they 
want publishers to let them scan their books and make them
searchable, and this is the trade-off.

Still not clear whether they'll treat the PD stuff from libraries
the same way.


Steve

-- 

Stephen Thomas,
Senior Systems Analyst,
University of Adelaide Library
UNIVERSITY OF ADELAIDE SA 5005
AUSTRALIA
Phone: +61 8 830 35190  Fax: +61 8 830 34369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

CRICOS Provider Number 00123M
-----------------------------------------------------------
This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright.  If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.

From stephen.thomas at adelaide.edu.au  Sun Dec 19 21:32:12 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Sun Dec 19 21:32:31 2004
Subject: [gutvol-d] Re: Google Partners with Oxford,	Harvard & Others
	to Digitize Libraries
In-Reply-To: <41C6599E.4000008@adelaide.edu.au>
References: <41BEB3A3.1030805@adelaide.edu.au>
	<41C11844.9060106@adelaide.edu.au>
	<41C6599E.4000008@adelaide.edu.au>
Message-ID: <41C663DC.6080006@adelaide.edu.au>

Strangely, searching for "William Morris News from Nowhere" does 
NOT bring up the book link! So I guess Google still have a few 
wrinkles to iron out.

Steve



-- 

Stephen Thomas,
Senior Systems Analyst,
University of Adelaide Library
UNIVERSITY OF ADELAIDE SA 5005
AUSTRALIA
Phone: +61 8 830 35190  Fax: +61 8 830 34369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

CRICOS Provider Number 00123M

From jonhendry at mac.com  Sun Dec 19 22:48:42 2004
From: jonhendry at mac.com (Jonathan Hendry)
Date: Sun Dec 19 22:49:11 2004
Subject: [gutvol-d] 'CDDB' for Gutenberg texts
Message-ID: <300811F0-5253-11D9-ABD1-000A956D5546@mac.com>

Hi, I'm new here. I hope this isn't out of place.

I'm working on a Mac OS X program for reading Gutenberg e-texts.

It occurs to me that it would be useful if there were something for 
Gutenberg e-texts akin to the CDDB database for MP3s. It would hold 
information about e-texts, keyed to the Gutenberg filename. The sort of 
information stored would be things like long-format titles, author's 
name, information about the Gutenberg file if it's a revision, 
information about the original source text, etc.

This would all be useful for developers of ebook readers, or web 
interfaces to the Gutenberg texts. This information is often available 
in the files themselves, but it would be difficult to extract it 
through software.

It might be extended to include character lists for novels or plays, 
synopses, summaries, connections to other works, byte offsets to 
chapter starts, file-specific aids to parsing, and other useful bits of 
information.

The information would be supplied by users, piece by piece, similar to 
the way people submit track listings to CDDB. Ideally, etext reader 
apps would have a UI for entering and uploading new information.

There'd be no change to the Gutenberg files themselves. The meta-info 
would all be kept apart from the e-texts. So the format need not 
change, old texts wouldn't need updating, and the files would remain 
universally compatible.

If the user has an etext program which supports it, then after 
downloading a text, they would have the option to download the meta-info
from a separate 'gtdb' server. The program could then use the meta-info 
to enhance the user interface.

Naturally, the "gtdb" database would be non-commercial, and in some 
non-proprietary format, and/or available as SQL dumps.
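A minimal sketch of what one such record might look like, in Python. Everything here is hypothetical: the "gtdb" name comes from the proposal, but the field names, the JSON encoding, and the `chapter_bytes` helper are illustrative assumptions. The point is that a reader app can use stored byte offsets to find chapters without the e-text file itself ever changing.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class GtdbRecord:
    """One hypothetical 'gtdb' entry, keyed by the Gutenberg file name."""
    filename: str                          # key into the database
    long_title: str
    author: str
    revision_of: Optional[str] = None      # earlier file this one revises, if any
    source_edition: Optional[str] = None   # description of the printed source
    chapter_offsets: list[int] = field(default_factory=list)  # byte offset of each chapter start


def to_json(record: GtdbRecord) -> str:
    # Non-proprietary interchange format, as the proposal suggests.
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text: str) -> GtdbRecord:
    return GtdbRecord(**json.loads(text))


def chapter_bytes(etext: bytes, record: GtdbRecord, index: int) -> bytes:
    # Slice one chapter out of the unmodified e-text using the stored offsets;
    # the last chapter runs to the end of the file.
    starts = record.chapter_offsets
    end = starts[index + 1] if index + 1 < len(starts) else len(etext)
    return etext[starts[index]:end]
```

Because the metadata lives entirely outside the e-text, old files need no updating, which matches the compatibility goal described above.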

So, my questions.

1) Is anyone working on such a thing already?
2) Has such a thing been discussed?
3) Does anyone else think it'd be a good thing?

Thanks,

Jon

From hacker at gnu-designs.com  Sun Dec 19 23:35:55 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sun Dec 19 23:37:03 2004
Subject: [gutvol-d] 'CDDB' for Gutenberg texts
In-Reply-To: <300811F0-5253-11D9-ABD1-000A956D5546@mac.com>
References: <300811F0-5253-11D9-ABD1-000A956D5546@mac.com>
Message-ID: <Pine.LNX.4.58.0412200233001.32089@aphrodite.gnu-designs.com>


> It occurs to me that it would be useful if there were something for
> Gutenberg e-texts akin to the CDDB database for MP3s.

	You mean like the RDF catalog of all of the Gutenberg texts?

	http://gutenberg.net/browse/rdf/catalog.rdf.bz2

	I've posted perl here before that splits this apart and
imports it into SQL in about 8 lines of code. Search the archives.
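The Perl script mentioned above is not reproduced here. As a rough Python equivalent, the sketch below loads catalog entries into SQLite with the standard `xml.etree.ElementTree` and `sqlite3` modules. The RDF and Dublin Core namespace URIs are the standard ones, but the `pgterms` URI and the element layout of `SAMPLE` are simplified assumptions, not a copy of the real `catalog.rdf`; check them against the actual file before use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# RDF and Dublin Core URIs are standard; the "pgterms" URI and the layout
# of SAMPLE are assumptions made for illustration.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pgterms": "http://example.org/pgterms#",
}

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:pgterms="http://example.org/pgterms#">
  <pgterms:etext rdf:ID="etext1">
    <dc:title>First Sample Title</dc:title>
    <dc:creator>Author, Sample</dc:creator>
  </pgterms:etext>
  <pgterms:etext rdf:ID="etext2">
    <dc:title>Second Sample Title</dc:title>
  </pgterms:etext>
</rdf:RDF>"""


def load_catalog(xml_text: str, db: sqlite3.Connection) -> int:
    """Insert one row per etext entry; return the number of rows written."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS etext (id TEXT PRIMARY KEY, title TEXT, creator TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = []
    for etext in root.findall("pgterms:etext", NS):
        rows.append((
            etext.get(f"{{{NS['rdf']}}}ID"),  # attribute name in {uri}local form
            etext.findtext("dc:title", default=None, namespaces=NS),
            etext.findtext("dc:creator", default=None, namespaces=NS),
        ))
    db.executemany("INSERT OR REPLACE INTO etext VALUES (?, ?, ?)", rows)
    db.commit()
    return len(rows)
```

The real catalog is bzip2-compressed; `bz2.open(path, "rt")` can supply its text, and for a file that large `ET.iterparse` would use less memory than parsing it all at once.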


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From hart at pglaf.org  Mon Dec 20 09:55:46 2004
From: hart at pglaf.org (Michael Hart)
Date: Mon Dec 20 09:55:49 2004
Subject: [gutvol-d] Need Help With MS Word File
Message-ID: <Pine.LNX.4.60.0412200953010.1091@pglaf.org>


A volunteer who has been working on a book for us for three years
is very near completion, but can't do any more due to medical issues.

If anyone is willing to take a look at this book and help get it
into a final format, please let me know.

Forwarded message:

Date: Mon, 20 Dec 2004 11:13:38 -0500
From: Jeanette Hayward <jeanett@teacher.com>
To: hart@beryl.ils.unc.edu
Subject: Ebook submission

Hello,
My name is Jeanette Hayward.  I began transcribing Henry A. Beers,
A History of English Romanticism in the Eighteenth Century nearly 3
years ago. Because of a number of issues, I am just now getting to the
finished state.  Unfortunately, I will not be able to do much more
with the transcription.  But, I did want to submit the work because
I feel it is important to be able to add it to the collection.

I tried sending this message to the submission team; however, my ISP
apparently had difficulty recognizing the address as a valid e-mail
address.

***

My HUGE thanks to anyone who can help!

Michael


From nihil_obstat at mindspring.com  Mon Dec 20 14:42:02 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Mon Dec 20 14:42:14 2004
Subject: [gutvol-d] Need Help With MS Word File
Message-ID: <3221311.1103582522659.JavaMail.root@wamui10.slb.atl.earthlink.net>


The original message was a little vague on the details, so I contacted Jeanette Hayward for some more information about the help request for:

Henry A. Beers
A History of English Romanticism in the Eighteenth Century


Mrs. Hayward began this text three years ago, but has had to stop in a near finished state to deal with serious family medical issues.

A friend loaned some webspace, so you may download a sample 10 pages of the text at:
http://www.lakeclaire.org/beers/beers_sample.doc

The full text is about 250 pages worth of text.

If you like what you see, and wish to contact Mrs. Hayward to adopt this project, please contact her at: jeanett@teacher.com

She can also give you the web address for the full as-is transcription.

--------------

"Finished state" refers to the proofing, and/or final version (HTML, etc).  She has typed in everything except the index within the book which isn't necessary in this format.

Currently, it is in MS-Word, but can be converted to anything you wish to use.

She has proofed as she went along, but as always, it would be better to have someone else proof also.

There are some "misspellings" but those are the author's words, not typos.  So, she is willing to ship the original copy to the proofer.




---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From ajhaines at shaw.ca  Tue Dec 21 10:35:10 2004
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Tue Dec 21 10:35:10 2004
Subject: [gutvol-d] Help with German text
Message-ID: <000c01c4e78b$cdd9a600$6401a8c0@ahainesp2600>

I'm working on a book that's mostly in English, but that has several short passages in German.  Each German passage is immediately followed by its English translation.  According to my research, the German material is set in Fraktur, a blackletter typeface.  None of the German characters appear to carry any accents (umlauts, etc.).

I think I've generally managed to transliterate the Fraktur characters into their modern Roman equivalents, but since I don't understand German, I'm baffled as to how to tell the difference, in some contexts, between its lower-case f's and s's, and between its lower-case k's and t's.

Is there someone out there to whom I can send the six page scans involved and their matching text files, to have my transliteration checked?
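One mechanical aid while waiting for a German reader: generate every spelling reachable by swapping the confusable letters and keep only those found in a German word list. A minimal Python sketch under stated assumptions; `SAMPLE_LEXICON` is a tiny hypothetical stand-in for a real German dictionary, and the confusable pairs are just the two named above (in Fraktur the long s, ſ, looks much like f).

```python
from itertools import product

# Letter pairs that are easy to confuse in Fraktur: f / long s, and k / t.
CONFUSABLE = {"f": "fs", "s": "sf", "k": "kt", "t": "tk"}

# Hypothetical stand-in; a real check needs a full German word list.
SAMPLE_LEXICON = {"ist", "das", "kleine", "und"}


def candidates(word: str) -> set[str]:
    """Every spelling reachable by swapping confusable lower-case letters."""
    options = [CONFUSABLE.get(ch, ch) for ch in word]
    return {"".join(combo) for combo in product(*options)}


def best_readings(word: str, lexicon: set[str]) -> list[str]:
    """Candidate spellings that actually occur in the word list, sorted."""
    return sorted(candidates(word) & lexicon)


if __name__ == "__main__":
    for guess in ("ift", "tleine", "daf"):
        print(guess, "->", best_readings(guess, SAMPLE_LEXICON))
```

The number of candidates doubles with each ambiguous letter, which is harmless for single words. A word with several valid readings, or none, still needs a human who reads German.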

Al
From hart at pglaf.org  Thu Dec 23 09:58:14 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 23 09:58:17 2004
Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. Roland]
In-Reply-To: <20041223031630.16124F2548@boggle.pobox.com>
References: <20041223031630.16124F2548@boggle.pobox.com>
Message-ID: <Pine.LNX.4.60.0412230923370.9920@pglaf.org>


On Wed, 22 Dec 2004, Lloyd Benson wrote:

> From: Jon Roland <jon.roland@constitution.org>
> Date: Tue, 21 Dec 2004 14:50:22 -0600
> Subject: Google Print questions
>
> The announcements would seem to suggest that Google intends to not only scan 
> the images of all these books, but to OCR and correct the recognition errors 
> of all of them, so they can be made searchable, offer the complete texts of 
> all the public domain works, and excerpts of the copyrighted ones (presumably 
> under the fair use doctrine). One announcement also estimated a cost of $10 
> per volume.

Project Gutenberg has already produced and distributed nearly 15,000 eBooks,
with a budget that has yet to reach a significant total for all 33+ years,
and is projected to reach a million eBooks without undue expense or effort.

We'll just have to wait and see if either Google Print, or any of the various
"Million eBook Projects" will ever come up with even 1% of a million eBooks
that you can carry with you on a one inch stack of plain homemade DVDs.

If it hasn't been proofread, and if you can't take it with you, it is only
of limited value. . .sort of like reading over someone's shoulder.

With Project Gutenberg eBooks, you OWN them. . .forever. . .and can save them
in your own favorite formats, fonts, margination, pagination, or whatever,
and you can search, quote, print, and do all the normal eBook functions.

"A picture of a book is not an eBook."

The term eBook should not be used to describe raw scans or raw OCR,
as has been tried by some of the Google and "Million eBook" participants
over the past decade.

I would say that an eBook has to be at least 99.9% accurate, and that there
should then be a process for people to send in corrections as they read it.

Most of the Project Gutenberg and Distributed Proofreaders volunteers would
say it has to be over 99.99%, and perhaps even over 99.999%.

99.999% would be one error perhaps every 100 pages or so, and I'm pretty
sure the source materials we have are not that accurate. . .not that eBooks
won't become more and more accurate, closer and closer to 100% accuracy,
but I'm not sure they have to be all that much better than 99.9% before
they can be made available.
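The accuracy levels being debated translate into errors per page with simple arithmetic. A minimal Python sketch, assuming about 2,000 characters per printed page (an assumption, not a figure from this thread). At that density 99.9% works out to about 2 errors per page and 99.999% to about one error every 50 pages; "one error every 100 pages" corresponds to a page of roughly 1,000 characters.

```python
# Assumption (not from the thread): a printed page holds about 2,000
# characters. Adjust CHARS_PER_PAGE for a given book.
CHARS_PER_PAGE = 2000


def errors_per_page(accuracy_percent: float, chars_per_page: int = CHARS_PER_PAGE) -> float:
    """Expected character errors on one page at the given accuracy."""
    error_rate = 1 - accuracy_percent / 100
    return error_rate * chars_per_page


def pages_per_error(accuracy_percent: float, chars_per_page: int = CHARS_PER_PAGE) -> float:
    """Average number of pages between errors."""
    return 1 / errors_per_page(accuracy_percent, chars_per_page)


if __name__ == "__main__":
    for level in (99.9, 99.95, 99.99, 99.995, 99.999):
        print(f"{level}%: {errors_per_page(level):.2f} errors/page, "
              f"one error every {pages_per_error(level):.1f} pages")
```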

> This is highly ambitious, even to scan the images. The experience of the U. 
> of Michigan should show that it is not feasible to OCR these works accurately 
> for that cost, or in that timeframe. While uncorrected OCRs might enable 
> search, since most words appear more than once in a work, and at least one of 
> them might be expected to be recognized accurately, searching on entire 
> phrases could be expected to be much more problematic.
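The reasoning in the paragraph above can be made concrete. If each word occurrence is recognized correctly with probability p, independently, then a word that appears n times is findable with probability 1 - (1 - p)^n. A given phrase of k words survives intact only with probability p^k. A short Python sketch; the 95% per-word accuracy is a hypothetical figure, and real OCR errors are not fully independent.

```python
def p_word_findable(p_word: float, occurrences: int) -> float:
    # Chance that at least one occurrence of a word was recognized correctly,
    # assuming each occurrence is recognized independently with prob. p_word.
    return 1 - (1 - p_word) ** occurrences


def p_phrase_findable(p_word: float, words_in_phrase: int) -> float:
    # Chance that one particular occurrence of a phrase survives OCR intact,
    # i.e. every one of its words was recognized correctly.
    return p_word ** words_in_phrase


if __name__ == "__main__":
    p = 0.95  # hypothetical per-word OCR accuracy
    print(f"word seen 3 times: {p_word_findable(p, 3):.4f}")
    print(f"6-word phrase:     {p_phrase_findable(p, 6):.4f}")
```

At 95% per word, a word seen three times is almost certain to be findable (about 0.9999), while a specific six-word phrase comes through intact only about 74% of the time.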

I have heard this described before. . .has anyone tried their test eBooks???


> As one who works from a lot of older works to not only scan and OCR but 
> correct them, I know how much human labor is involved. There are volunteer 
> efforts like Distributed Proofreaders http://www.pgdp.net/c/default.php , but 
> I have concluded that it takes me more time to set up a project for them than 
> it would take for me to do the proofreading myself, and my work would likely 
> be more accurate, since I would understand the underlying content and know 
> how to render obscure text.

While it does take a little time to set up one's first project with the
Distributed Proofreaders, it is usually quite a bit easier the second time,
not to mention that we have volunteers who will walk you through processes
the first few times around, which seems to do the trick for nearly everyone.


> So my basic question and concern is, how do we ensure that this project does 
> not release too many uncorrected texts into the world that never get 
> corrected, and perhaps propagate errors that come to be accepted as accurate 
> even when they are not?

I wonder how many of these will be "released into the world". . .I have a
strong suspicion that the answer is "none."  Unless some outside source
does it.


> I would submit that it would be better to prioritize these works and release 
> fully corrected and annotated digital editions of the most important first, 
> going for quality rather than quantity. This has been the approach used by 
> the online collections such as ours at 
> http://www.constitution.org/liberlib.htm Although we do put some works up 
> before the correcting and reformatting is finished, we always flag those that 
> are still in progress, indicating the state of completion, and we stand by to 
> quickly make corrections that outsiders may discover are needed.

I view all eBooks as "still in progress" as I have never proofread one in which
I didn't find any mistakes. . . .

My own views are that I would prefer to have access to twice as many eBooks
at the 99.95% accuracy level [the Library of Congress standard] than half
as many at the 99.995% level I think is being suggested here.

After all, the books that get read the most will be the ones that get the
most corrections. . .an obvious way to aim effort at the proper targets!

Not only that, but, viewing the entire eBook effort as a 50 year process,
of which I have walked 33+ years, I must state for the record that I think
OCR, spellcheckers, grammar checkers, etc. will be so much better a decade
from now that doing the proofreading on the more obscure works will require
so much less effort than it does today, that it will be a great trade-off.

I'm not at all sure why people want eBooks to be so perfect to start with.
I would prefer to get all 10 million public domain works we can find. . .
or at least a million of them. . .online and freely downloadable before we
try to approach the 100% accuracy level.

Of course, I don't believe in the "raw OCR" idea that seems to be what the
Google Print idea has in mind, even with spelling and "scanno" checkers,
and I also don't believe in going so far in the other direction that we
try for such accuracy levels that the number of eBooks only grows at half
the rate it has been growing.

The path is obviously somewhere in the middle. . .machine production is
obviously not accurate enough [except in certain tests I have seen run
with high contrast new materials] and after a certain point it becomes
inefficient to keep proofreading before letting the public have access.

After all, the public IS what this is all about, is it not?

So let's let the public do the final proofreading, as a process, for all
the years to come. . .at least until we have OCR that makes only 1 error
in a million characters. . .and thus most of the errors we find are from
the original publications.  [By the bye, this is one of the reasons for
using more than one paper edition to produce an eBook, when multiple
paper editions are available.  Then machine processes can compare them
to find even more errors.]
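Comparing two independently produced transcriptions, for example from two paper editions, can be sketched with Python's standard `difflib` module. This is a minimal illustration, not Project Gutenberg's or Distributed Proofreaders' actual tooling. It diffs word by word and reports each place the texts disagree. It assumes both texts share the same wording apart from errors; genuine differences between editions would show up as disagreements too.

```python
import difflib


def disagreements(text_a: str, text_b: str):
    """Yield (a_words, b_words) for every place two transcriptions differ."""
    a = text_a.split()  # word-level comparison; ignores line breaks and spacing
    b = text_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield " ".join(a[i1:i2]), " ".join(b[j1:j2])


if __name__ == "__main__":
    edition_1 = "It was the best of tirnes, it was the worst of times"
    edition_2 = "It was the best of times, it was the worst of tiines"
    for left, right in disagreements(edition_1, edition_2):
        print(f"{left!r} vs {right!r}")
```

Each flagged spot still needs a person to check the page images; the machine only narrows down where to look.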

Well, enough now. . .let's make more eBooks!!!


Thanks!!!


So Nice To Hear From You!

Happy Holidays!!!


Michael


Give FreeBooks!!!
In 39 Languages!!!

As of December 23, 2004
~14,780 FreeBooks at:
~220 to go to 15,000
http://www.gutenberg.org
http://www.gutenberg.net

We are ~95% of the way
from 10,000 to 15,000.

Now even more PG eBooks
In 104 Languages!!!
http://gutenberg.cc
http://gutenberg.us

Michael S. Hart
<hart@pobox.com>
Project Gutenberg
Executive Coordinator
"*Internet User ~#100*"

If you do not receive
a prompt reply, please
resend, keep resending.

From joshua at hutchinson.net  Thu Dec 23 10:12:52 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Dec 23 10:12:59 2004
Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. Roland]
Message-ID: <20041223181252.F236510992F@ws6-4.us4.outblaze.com>


----- Original Message -----
From: "Michael Hart" <hart@pglaf.org>

> > As one who works from a lot of older works to not only scan and OCR but 
> > correct them, I know how much human labor is involved. There are volunteer 
> > efforts like Distributed Proofreaders http://www.pgdp.net/c/default.php , 
> > but I have concluded that it takes me more time to set up a project for them 
> > than it would take for me to do the proofreading myself, and my work would 
> > likely be more accurate, since I would understand the underlying content and 
> > know how to render obscure text.
> 
> While it does take a little time to set up one's first project with the
> Distributed Proofreaders, it is usually quite a bit easier the second time,
> not to mention that we have volunteers who will walk you through processes
> the first few times around, which seems to do the trick for nearly everyone.
> 


I just want to make a quick comment on this part (since I somehow missed the initial e-mail).

Setting up projects at DP is not time consuming (well, the upload of the image files can be, depending on your internet connection), especially once you've done it a few times.  As one of the larger DP project managers (currently at 687 projects created for DP), I can tell you that there is NO WAY to proof even an easy text in the amount of time it takes to create and upload the project to DP.  Even if I take into account OCR time (which I batch up and run overnight), it is still less time than I would take to proof the work.

I can also reiterate Michael's comment that there are plenty of folks ready to help out new content providers on their first few projects.  It can be a little daunting the first time, but it gets easier once you've done it a couple of times.  Also, for folks that don't want to get heavily involved, we can usually work something out with someone that just wants to provide the image scans.  We can usually take it from there (assuming they are public domain scans, of course).

Josh

PS I also haven't created any new projects in many months because of the backlog we've got in the system.  I wanted to help clear out some more work before sending more into the queue.  So those 687 were done in a much shorter frame of time than my login statistics at DP might otherwise imply.
From stephen.thomas at adelaide.edu.au  Wed Dec 22 19:36:44 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Thu Dec 23 10:37:41 2004
Subject: [gutvol-d] Internet Archive to build alternative to Google [Print]
Message-ID: <41CA3D4C.1030002@adelaide.edu.au>

Yet another story -- this one on an alternative to GP:

http://www.iwr.co.uk/IWR/1160176


-- 

Stephen Thomas,
Senior Systems Analyst,
University of Adelaide Library
UNIVERSITY OF ADELAIDE SA 5005
AUSTRALIA
Phone: +61 8 830 35190  Fax: +61 8 830 34369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

CRICOS Provider Number 00123M

From nwolcott at dsdial.net  Thu Dec 23 08:58:26 2004
From: nwolcott at dsdial.net (N Wolcott)
Date: Thu Dec 23 10:37:42 2004
Subject: [gutvol-d] Need Help With MS Word File
References: <3221311.1103582522659.JavaMail.root@wamui10.slb.atl.earthlink.net>
Message-ID: <00bb01c4e910$a4443ae0$7d9895ce@gw98>

ok
----- Original Message -----
From: Dennis McCarthy <nihil_obstat@mindspring.com>
To: Michael S. Hart <hart@pobox.com>; Project Gutenberg Volunteer Discussion
<gutvol-d@lists.pglaf.org>
Cc: <jeanett@teacher.com>
Sent: Monday, December 20, 2004 5:42 PM
Subject: Re: [gutvol-d] Need Help With MS Word File


>
> The original message was a little vague on the details, so I contacted
Jeanette Hayward for some more information the the help request for:
>
> Henry A. Beers
> A History of English Romanticism in the Eighteenth Century
>
>
> Mrs. Hayward began this text three years ago, but has had to stop in a
near finished state to deal with serious family medical issuse.
>
> A friend loaned some webspace, so you may download a sample 10 pages of
the text at:
> http://www.lakeclaire.org/beers/beers_sample.doc
>
> The full text is about 250 pages worth of text.
>
> If like what you see, and wish to contact Mrs. Hayward to adopt this
project, please contact her at: jeanett@teacher.com
>
> She can also give you the web address for the full as-is transcription.
>
> --------------
>
> "Finished state" refers to the proofing, and/or final version (HTML, etc).
She has typed in everything except the index within the book which isn't
necessary in this format.
>
> Currently, it is in MS-Word, but can be converted to anything you wish to
use.
>
> She has proofed as she went along, but as always, it would be better to
have someone else proof also.
>
> There are some "misspellings" but those are the author's words, not typos.
So, she is willing to ship the original copy to the proofer.
>
>
>
> -----Original Message-----
> From: Michael Hart <hart@pglaf.org>
> Sent: Dec 20, 2004 12:55 PM
> To: Project Gutenberg Whitewashers <pgww@lists.pglaf.org>,
> The gutvol-d Mailing List <gutvol-d@lists.pglaf.org>
> Subject: [gutvol-d] Need Help With MS Word File
>
>
> A volunteer who has been working on a book for us for three years
> is very near completion, but can't do any more due to medical issues.
>
> If anyone is willing to take a look at this book and help get it
> into a final format, please let me know.
>
> Forwarded message:
>
> Date: Mon, 20 Dec 2004 11:13:38 -0500
> From: Jeanette Hayward <jeanett@teacher.com>
> To: hart@beryl.ils.unc.edu
> Subject: Ebook submission
>
> Hello,
> My name is Jeanette Hayward.  I began transcribing Henry A. Beers,
> A History of English Romanticism in the Eighteenth Century nearly 3
> years ago. Because of a number of issues,I am just now getting to the
> finished state.  Unfortunately, I will not be able to do much more
> with the transcription.  But, I did want to submit the work because
> I feel it is important to be able to add it to the collection.
>
> I tried sending this message to the submission team; however, my ISP
> apparently had difficulty recognizing the address as a valid e-mail
> address.
>
> ***
>
> My HUGE thanks to anyone who can help!
>
> Michael
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
> ---------------------------
> Dennis McCarthy
> nihil_obstat@mindspring.com
>

From nwolcott at dsdial.net  Thu Dec 23 09:18:29 2004
From: nwolcott at dsdial.net (N Wolcott)
Date: Thu Dec 23 10:37:44 2004
Subject: [gutvol-d] test if going through
Message-ID: <018801c4e913$a03e3ba0$7d9895ce@gw98>

test message x
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041223/d7dc03d5/attachment.html
From Gutenberg9443 at aol.com  Thu Dec 23 15:38:36 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Dec 23 15:38:57 2004
Subject: [gutvol-d] Fwd: Project Googleberg
Message-ID: <110.3fbf95f2.2efcb0fc@aol.com>

Skipped content of type multipart/alternative
-------------- next part --------------
An embedded message was scrubbed...
From: Gutenberg9443@aol.com
Subject: Re: Project Googleberg
Date: Thu, 23 Dec 2004 18:35:27 EST
Size: 6675
Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041223/cda11de6/attachment.mht
From hacker at gnu-designs.com  Thu Dec 23 16:41:13 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Thu Dec 23 16:42:10 2004
Subject: [gutvol-d] Fwd: Project Googleberg
In-Reply-To: <110.3fbf95f2.2efcb0fc@aol.com>
References: <110.3fbf95f2.2efcb0fc@aol.com>
Message-ID: <Pine.LNX.4.58.0412231936220.21400@angst.gnu-designs.com>


>    [ Part 2: "Included Message" ]

> Date: Thu, 23 Dec 2004 18:35:27 EST
> From: Gutenberg9443@aol.com
> To: hart@pobox.com
> Subject: Re: Project Googleberg

	First and foremost, when composing email, the best place for
your text is in the _body_ of the email, not sent as an attachment
(a non-RFC-compliant attachment at that).

	Please don't do that.

> I have examined what seems to be the preliminary "Googleprint"
> catalog. It consists of books scanned and posted by other people
> including us. At least half of them that I looked at are available
> only as page scans, and I have to want a book an awful lot to put
> page scans together just for my own use. They use a LOT of our
> books; in fact, everything they have that they are aware we have
> shows us as the best or only site to go and get the book.

	Second, when you wish to post to a mailing list about a
particular subject, it is best to read the archives first, in full, so
you can see if the subject or question you were about to ask has been
discussed before, as this one has.

	Please go back and re-read the archives of the last few weeks
to bring yourself up to speed on the issues, concerns, support, and
other items related to "Google Print".


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From stephen.thomas at adelaide.edu.au  Thu Dec 23 17:00:01 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Thu Dec 23 17:00:15 2004
Subject: [gutvol-d] Fwd: Project Googleberg
In-Reply-To: <Pine.LNX.4.58.0412231936220.21400@angst.gnu-designs.com>
References: <110.3fbf95f2.2efcb0fc@aol.com>
	<Pine.LNX.4.58.0412231936220.21400@angst.gnu-designs.com>
Message-ID: <41CB6A11.5010104@adelaide.edu.au>

Hey, who put David D. in charge of the Internet?!

David,

if you can't cope with the way other people do things on the 
'net, you'd better save yourself a lot of grief and leave now. 
Or you could try a little tolerance, and accept that some 
people, not having your supreme level of skill, will 
occasionally do "the wrong thing".

Get over it man, and while I'm on your case, get some manners.

Sheesh. Merry Christmas all.

Steve


David A. Desrosiers wrote:

>>   [ Part 2: "Included Message" ]
> 
> 
>>Date: Thu, 23 Dec 2004 18:35:27 EST
>>From: Gutenberg9443@aol.com
>>To: hart@pobox.com
>>Subject: Re: Project Googleberg
> 
> 
> 	First and foremost, when composing email, the best place for
> your text is in the _body_ of the email, not sent as an attachment
> (a non-RFC-compliant attachment at that).
> 
> 	Please don't do that.
> 
> 
>>I have examined what seems to be the preliminary "Googleprint"
>>catalog. It consists of books scanned and posted by other people
>>including us. At least half of them that I looked at are available
>>only as page scans, and I have to want a book an awful lot to put
>>page scans together just for my own use. They use a LOT of our
>>books; in fact, everything they have that they are aware we have
>>shows us as the best or only site to go and get the book.
> 
> 
> 	Second, when you wish to post to a mailing list about a
> particular subject, it is best to read the archives first, in full, so
> you can see if the subject or question you were about to ask has been
> discussed before, as this one has.
> 
> 	Please go back and re-read the archives of the last few weeks
> to bring yourself up to speed on the issues, concerns, support, and
> other items related to "Google Print".
> 
> 
> David A. Desrosiers
> desrod@gnu-designs.com
> http://gnu-designs.com

-- 

Stephen Thomas,
Senior Systems Analyst,
University of Adelaide Library
UNIVERSITY OF ADELAIDE SA 5005
AUSTRALIA
Phone: +61 8 830 35190  Fax: +61 8 830 34369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

CRICOS Provider Number 00123M
-----------------------------------------------------------
This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright.  If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.

From holden.mcgroin at dsl.pipex.com  Thu Dec 23 20:04:10 2004
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Thu Dec 23 20:04:30 2004
Subject: [gutvol-d] Ibn Batuta (Was Re: Fwd: Project Googleberg)
In-Reply-To: <110.3fbf95f2.2efcb0fc@aol.com>
References: <110.3fbf95f2.2efcb0fc@aol.com>
Message-ID: <41CB953A.50402@dsl.pipex.com>

Gutenberg9443@aol.com wrote:
> By the way, does ANYBODY know where we can get a public domain copy of 
> Ibn Batuta? I've had no luck finding one online. I even asked the king 
> of Saudi Arabia for a copy, but His Majesty didn't answer. The few 
> snippets I've seen are fascinating. He left his home to go on a haj, and 
> then kept going, spending 29 years travelling and writing fascinating 
> notes of where he went, namely everywhere you could get to without going 
> to Arctica, Antarctica, or the Americas.

I have to agree with Anne. Every time I hear about Ibn Batuta's amazing 
travels, I feel the urge to read his writings. Is there any chance we 
could get them online as part of Gutenberg's collection?

Cheers,
Holden
From hacker at gnu-designs.com  Thu Dec 23 20:20:22 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Thu Dec 23 20:21:22 2004
Subject: [gutvol-d] Fwd: Project Googleberg
In-Reply-To: <41CB6A11.5010104@adelaide.edu.au>
References: <110.3fbf95f2.2efcb0fc@aol.com>
	<Pine.LNX.4.58.0412231936220.21400@angst.gnu-designs.com>
	<41CB6A11.5010104@adelaide.edu.au>
Message-ID: <Pine.LNX.4.58.0412232309430.22593@angst.gnu-designs.com>


> if you can't cope with the way other people do things on the 'net,
> you'd better save yourself a lot of grief and leave now.

	It's important for others to realize that not everyone reads
their email on desktop machines, or on fully-featured email clients.
What about text-to-speech readers and PDAs? It's best to stick to the
standards, and not make up your own.

	Open your eyes, and realize the world isn't just like you.

> Or you could try a little tolerance, and accept that some people,
> not having your supreme level of skill, will occasionally do "the
> wrong thing".

	I find this to be the case in a lot of things, unfortunately.

> Get over it man, and while I'm on your case, get some manners.

	I've got plenty of manners, but thanks for pointing it out to
others who may not have the same "supreme" level of diplomacy that I
often exhibit. Google for my name, if you feel I'm some sort of rude
person without manners. You might be surprised at what you find.

> Sheesh. Merry Christmas all.

	Happy Christmahanakwanzaka to all as well.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From sly at victoria.tc.ca  Thu Dec 23 23:15:38 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Dec 23 23:15:59 2004
Subject: [gutvol-d] Ibn Batuta
In-Reply-To: <41CB953A.50402@dsl.pipex.com>
References: <110.3fbf95f2.2efcb0fc@aol.com> <41CB953A.50402@dsl.pipex.com>
Message-ID: <Pine.GSO.4.58.0412232259170.8358@vtn1.victoria.tc.ca>


On Fri, 24 Dec 2004, Holden McGroin wrote:

> I have to agree with Anne. Every time I hear about Ibn Batuta's amazing
> travels, I feel the urge to read his writings. Is there any chance we
> could get them online as part of Gutenberg's collection?

Of course there is. I believe it would just be a matter of how much
effort and expense some volunteers would like to go to in order to
make it happen. Oh, and a bit of luck too.

After reading what I could find about Ibn Batuta, I agree, this
could be worth searching out... (I would imagine in an English
translation)

Andrew
From Gutenberg9443 at aol.com  Fri Dec 24 10:26:54 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Dec 24 10:27:17 2004
Subject: [gutvol-d] Fwd: Project Googleberg
Message-ID: <ba.67dcd0cd.2efdb96e@aol.com>

 
In a message dated 12/23/2004 5:42:07 PM Mountain Standard Time,  
hacker@gnu-designs.com writes:

> Second, when you wish to post to a mailing list about a
> particular subject, it is best to read the archives first, in full, so
> you can see if the subject or question you were about to ask has been
> discussed before, as this one has.
>
> Please go back and re-read the archives of the last few weeks
> to bring yourself up to speed on the issues, concerns, support, and
> other items related to "Google Print".


Thank you for your suggestions.
 
Now I am going to ignore them completely, as I have far too much to do to go
back and reread all the archives. I prefer to risk being redundant rather than
risk leaving a question unanswered.
 
A question was asked. I looked into the matter. I answered the question.  
Period. 
 
Anne
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/98b2af58/attachment.html
From Gutenberg9443 at aol.com  Fri Dec 24 10:29:46 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Dec 24 10:29:58 2004
Subject: [gutvol-d] Fwd: Project Googleberg
Message-ID: <9a.1c8862d0.2efdba1a@aol.com>

 
In a message dated 12/23/2004 5:42:07 PM Mountain Standard Time,  
hacker@gnu-designs.com writes:

> First and foremost, when composing email, the best place for
> your text is in the _body_ of the email, not sent as an attachment
> (a non-RFC-compliant attachment at that).
>
> Please don't do that.


Where was the attachment? What was the attachment? I ask because I didn't
remember an attachment, and when I went back and looked at my "send" list I
didn't find an attachment. Therefore, if there was an attachment, somebody else
attached it and I'd like to know who, when, how, and why.
 
Anne
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/3de05947/attachment-0001.html
From jlinden at pglaf.org  Fri Dec 24 10:35:19 2004
From: jlinden at pglaf.org (James Linden)
Date: Fri Dec 24 10:37:34 2004
Subject: [gutvol-d] Fwd: Project Googleberg
In-Reply-To: <9a.1c8862d0.2efdba1a@aol.com>
References: <9a.1c8862d0.2efdba1a@aol.com>
Message-ID: <41CC6167.10704@pglaf.org>

Anne,

All your messages come through with an attachment because you use HTML
formatted mail. While I do hate HTML email, it's nothing to worry about.

-- James

Gutenberg9443@aol.com wrote:

> In a message dated 12/23/2004 5:42:07 PM Mountain Standard Time, 
> hacker@gnu-designs.com writes:
>
>     First and foremost, when composing email, the best place for
>     your text is in the _body_ of the email, not sent as an attachment
>     (a non-RFC-compliant attachment at that).
>
>         Please don't do that.
>
> Where was the attachment? What was the attachment? I ask because I 
> didn't remember an attachment and when I went back and looked at my 
> "send" list I didn't find an attachment. Therefore, if there was an 
> attachment, somebody else attached it and I'd like to know who, when, 
> how, and why.
>  
> Anne
>

From marcello at perathoner.de  Fri Dec 24 09:37:25 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Dec 24 10:38:32 2004
Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. Roland]
In-Reply-To: <Pine.LNX.4.60.0412230923370.9920@pglaf.org>
References: <20041223031630.16124F2548@boggle.pobox.com>
	<Pine.LNX.4.60.0412230923370.9920@pglaf.org>
Message-ID: <41CC53D5.3010005@perathoner.de>

Michael Hart wrote:

> Project Gutenberg has already produced and distributed nearly 15,000 eBooks,
> with a budget that has yet to reach a significant total for all 33+ years,
> and is projected to reach a million eBooks without undue expense or effort.

PG produces books at a lower cost only if you neglect the cost of 
volunteer work. I'm sure a big organized corporation like Google can 
create eBooks way cheaper than a loosely organized group of volunteers 
like PG.


> We'll just have to wait and see if either Google Print, or any of the various
> "Million eBook Projects" will ever come up with even 1% of a million eBooks
> that you can carry with you on a one inch stack of plain homemade DVDs.

Whereas PG already has reached 1.5% of a million books with 98.5% still 
to go.


> If it hasn't been proofread, and if you can't take it with you, it is only
> of limited value. . .sort of like reading over someone's shoulder.

Depends on what you want to do with the book. If you only want to cite
some work, a page scan (that you cannot take with you but is error-free)
is much better than a proofread eBook (which may contain OCR errors).


> With Project Gutenberg eBooks, you OWN them. . .forever. . .and can save them
> in your own favorite formats, fonts, margination, pagination, or whatever,
> and you can search, quote, print, and do all the normal eBook functions.

Yours forever ... until new copyright laws separate you.


> I would say that an eBook has to be at least 99.9% accurate, and that it
> should then be a process as people read the eBooks, to send in corrections.

That is ~ 2 errors per page if you assume a line length of 55 and page 
length of 40 (~ 2000) chars.

> Most of the Project Gutenberg and Distributed Proofreaders would say it has
> to be over 99.99% and perhaps even over 99.999%.

That is approx. one error every 5 pages or every 50 pages, respectively.
Still not very good.
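The per-page figures above come from straightforward arithmetic. A minimal Python sketch, assuming the 55-character, 40-line page named in the message (55 * 40 = 2,200 characters, which the message rounds to ~2,000); the helper `errors_per_page` is hypothetical:

```python
# Page geometry assumed in the message: 55 characters per line, 40 lines per page.
CHARS_PER_LINE = 55
LINES_PER_PAGE = 40
CHARS_PER_PAGE = CHARS_PER_LINE * LINES_PER_PAGE  # 2,200 characters


def errors_per_page(accuracy_percent: float, chars_per_page: int = CHARS_PER_PAGE) -> float:
    """Expected number of wrong characters on one page at a given accuracy."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * chars_per_page


if __name__ == "__main__":
    for acc in (99.9, 99.95, 99.99, 99.999):
        epp = errors_per_page(acc)
        # 1 / epp is the average number of pages between errors.
        print(f"{acc}%: {epp:.2f} errors per page, one error every {1 / epp:.1f} pages")
```

At 2,200 characters per page, 99.95% works out to about 1.1 errors per page, 99.99% to one error about every 4.5 pages, and 99.999% to one about every 45 pages.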


> Not only that, but, viewing the entire eBook effort as a 50 year process,
> of which I have walked 33+ years, I must state for the record that I think
> OCR, spellcheckers, grammarcheckers, etc. will be so much better a decade
> from now that doing the proofreading on the more obscure works will require
> so much less effort than it does today, that it will be a great trade-off.

Which poses the question: isn't Google's approach, just scanning the books
today and waiting, better suited to achieve the 1 million target? Every
advance in OCR technology automatically "proof-reads" all books Google
has scanned.


-- 
Marcello Perathoner
webmaster@gutenberg.org


From sharris at steveharris.net  Thu Dec 23 20:55:10 2004
From: sharris at steveharris.net (steve harris)
Date: Fri Dec 24 14:21:21 2004
Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta 
In-Reply-To: <41CB953A.50402@dsl.pipex.com>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAYXWAK2B810G3anMDOR0xOAKBAAAQAAAAD6QQRcyqIUu0Jth05JrCXQEAAAAA@steveharris.net>

The Library of Congress doesn't list anything before 1929.  The British Library shows:

Author - personal   Batu?ta, Ibn.
Title               The travels of Ibn Batu?ta : translated from the abridged manuscript copies, preserved in the public library of Cambridge with notes, illustrative of the history, geography, botany, antiquities, &c. occurring through the work / by Samuel Lee.
Publisher/year      London : Darf, 1984, 1829.
Added name          Lee, Samuel.
Holdings (1)        All items
Holdings (BL)       89/27495 DSC Request
ISBN                1850770352

Good Luck.

Thx, steve h

sharris@steveharris.net 


> -----Original Message-----
> From: gutvol-d-bounces@lists.pglaf.org 
> [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Holden McGroin
> Sent: Thursday, December 23, 2004 8:04 PM
> To: Project Gutenberg Volunteer Discussion
> Subject: [gutvol-d] Ibn Batuta (Was Re: Fwd: Project Googleberg)
> 
> 
> Gutenberg9443@aol.com wrote:
> > By the way, does ANYBODY know where we can get a public domain copy of
> > Ibn Batuta? I've had no luck finding one online. I even asked the king
> > of Saudi Arabia for a copy, but His Majesty didn't answer. The few
> > snippets I've seen are fascinating. He left his home to go on a haj, and
> > then kept going, spending 29 years travelling and writing fascinating
> > notes of where he went, namely everywhere you could get to without going
> > to Arctica, Antarctica, or the Americas.
> 
> I have to agree with Anne. Every time I hear about Ibn Batuta's amazing
> travels, I feel the urge to read his writings. Is there any chance we
> could get them online as part of Gutenberg's collection?
> 
> Cheers,
> Holden
> 

From gbuchana at rogers.com  Fri Dec 24 14:44:45 2004
From: gbuchana at rogers.com (Gardner Buchanan)
Date: Fri Dec 24 14:45:07 2004
Subject: [gutvol-d] Ibn Batuta
In-Reply-To: <Pine.GSO.4.58.0412232259170.8358@vtn1.victoria.tc.ca>
Message-ID: <XFMail.041224174445.gbuchana@rogers.com>


Andrew Sly wrote:
> 
> 
> On Fri, 24 Dec 2004, Holden McGroin wrote:
> 
>> I have to agree with Anne. Every time I hear about Ibn Batuta's amazing
>> travels, I feel the urge to read his writings. Is there any chance we
>> could get them online as part of Gutenberg's collection?
> 
> Of course there is. I believe it would just be a matter of how much
> effort and expense some volunteers would like to go to in order to
> make it happen. Oh, and a bit of luck too.
> 
> After reading what I could find about Ibn Batuta, I agree, this
> could be worth searching out... (I would imagine in an English
> translation)
> 

The 1829 English translation by (Reverend) Samuel Lee looks like
the best bet:

The Travels of Ibn Batuta; Translated from the Abridged Arabic
Manuscript Copies, preserved in the Public Library of Cambridge.
Translated by Rev. Samuel Lee. London: Printed for the Oriental
Translation Committee, 1829

I see a couple of 1985 reprints going for ~$60US.

The Gibb translation is too new.  It looks like it was published
in the 1950s or so.  Gibb lived until 1971.

============================================================
Gardner Buchanan                       <gbuchana@rogers.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.
From Gutenberg9443 at aol.com  Fri Dec 24 15:14:31 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Dec 24 15:14:48 2004
Subject: [gutvol-d] Fwd: Project Googleberg
Message-ID: <1b9.98e68bb.2efdfcd7@aol.com>

 
In a message dated 12/24/2004 11:37:36 AM Mountain Standard Time,  
jlinden@pglaf.org writes:

> All your messages come through with an attachment because you use HTML
> formatted mail.


I didn't know that. What kind of an attachment is it?
 
Anne
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/f8238e44/attachment.html
From flis at detk.com  Fri Dec 24 17:50:04 2004
From: flis at detk.com (William Flis)
Date: Fri Dec 24 17:43:38 2004
Subject: [gutvol-d] Fwd: Project Googleberg
In-Reply-To: <1b9.98e68bb.2efdfcd7@aol.com>
Message-ID: <LBELIICCBHDEONNDCACJMEJCCIAA.flis@detk.com>

The attachment I get in a lot of messages from this list says this:

<quote>
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d
</quote>

Who would put such an attachment?

Bill Flis
  -----Original Message-----
  From: gutvol-d-bounces@lists.pglaf.org
  [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Gutenberg9443@aol.com
  Sent: Friday, December 24, 2004 6:15 PM
  To: gutvol-d@lists.pglaf.org
  Subject: Re: [gutvol-d] Fwd: Project Googleberg


  In a message dated 12/24/2004 11:37:36 AM Mountain Standard Time,
  jlinden@pglaf.org writes:
    All your messages come through with an attachment because you use HTML
    formatted mail.
  I didn't know that. What kind of an attachment is it?

  Anne
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/51e020b5/attachment.html
From Gutenberg9443 at aol.com  Fri Dec 24 18:15:30 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Dec 24 18:15:53 2004
Subject: [gutvol-d] Fwd: Project Googleberg
Message-ID: <199.35331039.2efe2742@aol.com>

 
In a message dated 12/24/2004 6:43:46 PM Mountain Standard Time,  
flis@detk.com writes:

> Who would put such an attachment?


I haven't the foggiest. Some great guru of the Internet must have done it.  
I'm glad I'm not the culprit.
 
Anne
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041224/db9b3605/attachment.html
From gbuchana at rogers.com  Fri Dec 24 19:41:10 2004
From: gbuchana at rogers.com (Gardner Buchanan)
Date: Fri Dec 24 19:41:44 2004
Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAYXWAK2B810G3anMDOR0xOAKBAAAQAAAAD6QQRcyqIUu0Jth05JrCXQEAAAAA@steveharris.net>
Message-ID: <XFMail.041224224110.gbuchana@rogers.com>


On 04:55:10 steve harris wrote:
> The Library of Congress doesn't list anything before 1929.
> 
> 
> Good Luck.
> 

It's not as bad as all that.  The Lee translation (1829) is available
pretty easily.  Amazon.com has an edition for $12.  Look for ISBN
0486437655.  There was a 1940s reprint of the Lee translation that
seems to go for $60 used.

The French 1859 translation by Defremery and Sanguinetti is based
on more/better source material - but I don't read French.


============================================================
Gardner Buchanan                       <gbuchana@rogers.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.
From hyphen at hyphenologist.co.uk  Fri Dec 24 23:28:09 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Fri Dec 24 23:28:43 2004
Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta
In-Reply-To: <XFMail.041224224110.gbuchana@rogers.com>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAYXWAK2B810G3anMDOR0xOAKBAAAQAAAAD6QQRcyqIUu0Jth05JrCXQEAAAAA@steveharris.net>
	<XFMail.041224224110.gbuchana@rogers.com>
Message-ID: <di5qs0t4ok0o94s3j4u2m9ntrcrchel9um@4ax.com>

On Fri, 24 Dec 2004 22:41:10 -0500 (EST),  Gardner Buchanan
<gbuchana@rogers.com> wrote:

| 
| On 04:55:10 steve harris wrote:
| > The Library of Congress doesn't list anything before 1929.
| > 
| > 
| > Good Luck.
| > 
| 
| It's not as bad as all that.  The Lee translation (1829) is available
| pretty easily.  Amazon.com has an edition for $12.  Look for ISBN
| 0486437655.  There was a 1940s reprint of the Lee translation that
| seems to go for $60 used.
| 
| The French 1859 translation by Defremery and Sanguinetti is based
| on more/better source material - but I don't read French.

The original "Arabic" would be nice.

-- 
Dave F

From shalesller at writeme.com  Fri Dec 24 23:55:11 2004
From: shalesller at writeme.com (D. Starner)
Date: Fri Dec 24 23:55:30 2004
Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta
Message-ID: <20041225075511.0EC874BDAB@ws1-1.us4.outblaze.com>

"Dave Fawthrop" writes:
> The original "Arabic" would be nice. 

In etext form? Yes. In paper form, it'd be downright useless for PG.
We just don't have anyone really capable of handling it. We don't have
OCR--the Urdu team at DP-EU is completely type-in, and I don't know
of anyone interested in proofing more than a few lines of Arabic.

-- 
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm

From hyphen at hyphenologist.co.uk  Sat Dec 25 01:11:03 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Dec 25 01:11:37 2004
Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta
In-Reply-To: <20041225075511.0EC874BDAB@ws1-1.us4.outblaze.com>
References: <20041225075511.0EC874BDAB@ws1-1.us4.outblaze.com>
Message-ID: <8vaqs0plv9mb4vjj0d99lb4rg2ldi8bj7f@4ax.com>

On Fri, 24 Dec 2004 23:55:11 -0800,  "D. Starner" <shalesller@writeme.com>
wrote:

| "Dave Fawthrop" writes:
| > The original "Arabic" would be nice. 
| 
| In etext form? Yes. In paper form, it'd be downright useless for PG.
| We just don't have anyone really capable of handling it. 

Maybe a friend of a friend of someone on gutvol-d?

| We don't have
| OCR--the Urdu team at DP-EU is completely type-in, and I don't know
| of anyone interested in proofing more than a few lines of Arabic.

There are quite a lot of people who can read Classical Arabic, though not
perhaps in the USA.   I guessed at an Arabic word and ended up in a
long discussion with a quadrilingual Muslim lady about the
differences between Classical Arabic and modern Arabics.

Unicode has all the characters required, and it also handles right-to-left writing.
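Unicode gives Arabic letters a right-to-left bidirectional class, which is what lets software lay them out in the correct direction. A minimal sketch using Python's standard `unicodedata` module; the sample string (the traveller's name in Arabic script) is an illustrative assumption, not taken from any edition:

```python
import unicodedata

# The traveller's name in Arabic script (ابن بطوطة), written as escapes so the
# example does not depend on the reader's editor or font.
name = "\u0627\u0628\u0646 \u0628\u0637\u0648\u0637\u0629"

if __name__ == "__main__":
    for ch in name:
        if ch.isspace():
            continue
        # "AL" (Arabic Letter) is one of Unicode's right-to-left bidirectional classes.
        print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch)}  {unicodedata.name(ch)}")
```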

Also, because Arabic is a language designed for calligraphy, a page scan of
a well-written copy would be useful.

Sorry but my knowledge of Arabic is theoretical :-(


-- 
Dave F

From hart at pglaf.org  Sat Dec 25 10:14:07 2004
From: hart at pglaf.org (Michael Hart)
Date: Sat Dec 25 10:14:10 2004
Subject: !@!Re: [gutvol-d] RE: [gavel-d] Ibn Batuta
In-Reply-To: <8vaqs0plv9mb4vjj0d99lb4rg2ldi8bj7f@4ax.com>
References: <20041225075511.0EC874BDAB@ws1-1.us4.outblaze.com>
	<8vaqs0plv9mb4vjj0d99lb4rg2ldi8bj7f@4ax.com>
Message-ID: <Pine.LNX.4.60.0412251013330.13378@pglaf.org>


If you can forward this to me as an attachment,
I think I have someone who can proof it for you.

Michael


On Sat, 25 Dec 2004, Dave Fawthrop wrote:

> On Fri, 24 Dec 2004 23:55:11 -0800,  "D. Starner" <shalesller@writeme.com>
> wrote:
>
> | "Dave Fawthrop" writes:
> | > The original "Arabic" would be nice.
> |
> | In etext form? Yes. In paper form, it'd be downright useless for PG.
> | We just don't have anyone really capable of handling it.
>
> Maybe a friend of a friend of someone on gutvol-d?
>
> | We don't have
> | OCR--the Urdu team at DP-EU is completely type-in, and I don't know
> | of anyone interested in proofing more than a few lines of Arabic.
>
> There are quite a lot of people who can read Classical Arabic, though not
> perhaps in the USA.   I guessed at an Arabic word and ended up in a
> long discussion with a quadrilingual Muslim lady about the
> differences between Classical Arabic and modern Arabics.
>
> Unicode has all the characters required, and it also handles right-to-left writing.
>
> Also, because Arabic is a language designed for calligraphy, a page scan of
> a well-written copy would be useful.
>
> Sorry but my knowledge of Arabic is theoretical :-(
>
>
> --
> Dave F
>
>
From shalesller at writeme.com  Sat Dec 25 10:18:06 2004
From: shalesller at writeme.com (D. Starner)
Date: Sat Dec 25 10:18:12 2004
Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta
Message-ID: <20041225181806.8BEBC101D0@ws1-3.us4.outblaze.com>

"Dave Fawthrop" writes:
> Maybe a friend of a friend of someone on gutvol-d? 

I don't think it's helpful to try and push people into handling
an entire specific book right at the start, especially without
OCR.
 
> There are quite a lot of people who can read Classical Arabic, 

There are a lot of language communities out there that PG doesn't
have much contact with.

> Unicode has all the characters required, and it also handles right-to-left writing. 

Unicode has pretty much all the letters we need, sans the myriad varieties
of early 20th-century phonetic characters. That doesn't mean we can transcribe
them easily.
 
> Also, because Arabic is a language designed for calligraphy, a page scan of 
> a well-written copy would be useful. 

It's not really any different from English. A page of English calligraphy
may be beautiful, but it's not a text version.


From hart at pglaf.org  Sat Dec 25 10:37:58 2004
From: hart at pglaf.org (Michael Hart)
Date: Sat Dec 25 10:37:59 2004
Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. Roland]
In-Reply-To: <41CC53D5.3010005@perathoner.de>
References: <20041223031630.16124F2548@boggle.pobox.com>
	<Pine.LNX.4.60.0412230923370.9920@pglaf.org>
	<41CC53D5.3010005@perathoner.de>
Message-ID: <Pine.LNX.4.60.0412251026010.13378@pglaf.org>


On Fri, 24 Dec 2004, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> Project Gutenberg has already produced and distributed nearly 15,000 eBooks,
>> with a budget that has yet to reach a significant total for all 33+ years,
>> and is projected to reach a million eBooks without undue expense or effort.
>
> PG produces books at a lower cost only if you neglect the cost of volunteer 
> work. I'm sure a big organized corporation like Google can create eBooks way 
> cheaper than a loosely organized group of volunteers like PG.

We'll find out, won't we?

I'm still betting we will be first to 100,000.

Then it'll be fun to see how it goes to 1,000,000.

Of course, after 10,000,000, things will really slow down,
in the sense that it will become hard to find more books.

>> We'll just have to wait and see if either Google Print, or any of the various
>> "Million eBook Projects" will ever come up with even 1% of a million eBooks
>> that you can carry with you on a one inch stack of plain homemade DVDs.
>
> Whereas PG already has reached 1.5% of a million books with 98.5% still
> to go.

Hopefully more news on this front shortly.


>> If it hasn't been proofread, and if you can't take it with you, it is only
>> of limited value. . .sort of like reading over someone's shoulder.
>
> Depends on what you want to do with the book. If you only want to cite some
> work, a page scan (that you cannot take with you but is error-free) is much
> better than a proofread eBook (which may contain OCR errors).

I have yet to read any paper book that is error free. . . .

Eventually the eBook will be more accurate than the source,
perhaps in your lifetime for many eBooks.

>
>
>> With Project Gutenberg eBooks, you OWN them. . .forever. . .and can save them
>> in your own favorite formats, fonts, margination, pagination, or whatever,
>> and you can search, quote, print, and do all the normal eBook functions.
>
> Yours forever ... until new copyright laws separate you.

Luckily US and AU copyright changes are not retroactive,
as are those of more olde worlde countries. . . .

>
>
>> I would say that an eBook has to be at least 99.9% accurate, and that it
>> should then be a process as people read the eBooks, to send in corrections.
>
> That is ~2 errors per page if you assume a line length of 55 and a page length
> of 40 lines (~2000 chars).

The Library of Congress standard is 99.95%. . .one error per page.

Of course, some people count a stray character in the margins as an error,
or a typo in the header/footer/page#. . .I only count the author's words.

>
>> Most of the Project Gutenberg and Distributed Proofreaders volunteers would say it has
>> to be over 99.99% and perhaps even over 99.999%.
>
> That is approx. one error every 5 pages or every 50 pages. Still not very 
> good.

Reading one of Brewster's books with Greg the other day,
it was obvious only the author's words had been proofed,
the headers/footers/page# were often messy, but the book
itself was quite readable.

It had perhaps fewer than 1,000 characters per page,
but only one real error. . .another was a capitalization
error that may bother some and not others. . .in about 10 pages.
That's at least one "hard" error, and one "soft" error per 10K characters,
99.99% or 99.98%. . .if you don't count header/footer/page#
errors. . . .

This is well beyond the Library of Congress standard of 99.95%
if someone were to decide to "sew all the pages together" into
a single-file eBook and eliminate the headers/footers/page#s, etc.

I was quite impressed. . .and I will have to look at more of them.
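The accuracy figures traded above all come down to one piece of arithmetic: the error rate (100% minus the accuracy) times the number of characters on a page. A minimal Python sketch, assuming the ~2,000-character page used in this thread (55 characters by 40 lines is really 2,200, rounded down); the function names are purely illustrative:

```python
from fractions import Fraction

def errors_per_page(accuracy: str, chars_per_page: int = 2000) -> Fraction:
    """Expected character errors on one page at the given accuracy (percent).

    accuracy is passed as a string (e.g. "99.95") so Fraction parses it
    exactly, avoiding binary floating-point rounding.
    """
    error_rate = (100 - Fraction(accuracy)) / 100
    return error_rate * chars_per_page

def accuracy_pct(errors: int, chars: int) -> Fraction:
    """Accuracy percentage implied by `errors` wrong characters out of `chars`."""
    return 100 * (1 - Fraction(errors, chars))

for acc in ["99.9", "99.95", "99.99", "99.999"]:
    e = errors_per_page(acc)
    # e.g. "99.99%: 0.2 errors/page, one error every 5 pages"
    print(f"{acc}%: {float(e):g} errors/page, one error every {float(1 / e):g} pages")
```

At 2,000 characters per page this gives 2 errors per page at 99.9%, 1 at 99.95%, one every 5 pages at 99.99%, and one every 50 pages at 99.999%; `accuracy_pct(1, 10_000)` and `accuracy_pct(2, 10_000)` give the 99.99% and 99.98% estimates for one or two errors in ~10K characters.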

>
>
>> Not only that, but, viewing the entire eBook effort as a 50 year process,
>> of which I have walked 33+ years, I must state for the record that I think
>> OCR, spellcheckers, grammar checkers, etc. will be so much better a decade
>> from now that doing the proofreading on the more obscure works will require
>> so much less effort than it does today, that it will be a great trade-off.
>
> Which poses the question: isn't Google's approach, to just scan the books
> today and wait, better suited to achieving the 1 million target? Every advance
> in OCR technology automatically "proof-reads" all books Google has scanned.

This has been the approach of all the "quick and dirty" eBook projects,
certainly all those that project a million eBooks in the next 10 years.

Except, of course, Project Gutenberg.


Thanks!!!


So Nice To Hear From You!

Happy Holidays!!!


Michael


Give FreeBooks!!!
In 39 Languages!!!

As of December 25, 2004
~14,815 FreeBooks at:
~185 to go to 15,000
http://www.gutenberg.org
http://www.gutenberg.net

We are ~96% of the way
from 10,000 to 15,000.

Now even more PG eBooks
In 104 Languages!!!
http://gutenberg.cc
http://gutenberg.us

Michael S. Hart
<hart@pobox.com>
Project Gutenberg
Executive Coordinator
"*Internet User ~#100*"

If you do not receive
a prompt reply, please
resend, keep resending.

From hart at pglaf.org  Sat Dec 25 10:42:13 2004
From: hart at pglaf.org (Michael Hart)
Date: Sat Dec 25 10:42:15 2004
Subject: !@!Re: [gutvol-d] Fwd: Project Googleberg (fwd)
Message-ID: <Pine.LNX.4.60.0412251040260.13378@pglaf.org>


Request permission to quote/forward your comments,
you can be anonymous if you like. . .as it may get
back to Google. . . .

> I have examined what seems to be the preliminary "Googleprint"
> catalog. It consists of books scanned and posted by other people
> including us. At least half of them that I looked at are available
> only as page scans, and I have to want a book an awful lot to put
> page scans together just for my own use. They use a LOT of our
> books; in fact, everything they have that they are aware we have
> shows us as the best or only site to go and get the book.

Though from this message it wasn't clear whose words these are.
David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d
From hyphen at hyphenologist.co.uk  Sat Dec 25 12:17:32 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Dec 25 12:18:25 2004
Subject: [gutvol-d] RE: [gavel-d] Ibn Batuta
In-Reply-To: <20041225181806.8BEBC101D0@ws1-3.us4.outblaze.com>
References: <20041225181806.8BEBC101D0@ws1-3.us4.outblaze.com>
Message-ID: <akirs05d6eemakpo8ttbi14kfdgkmi91um@4ax.com>

On Sat, 25 Dec 2004 10:18:06 -0800,  "D. Starner" <shalesller@writeme.com>
wrote:

| There's a lot of language communities out there that PG doesn't
| have much contact with.

Shame about that :-(


-- 
Dave F

From holden.mcgroin at dsl.pipex.com  Sat Dec 25 15:15:50 2004
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Sat Dec 25 15:16:08 2004
Subject: [gutvol-d] Ibn Batuta
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAYXWAK2B810G3anMDOR0xOAKBAAAQAAAAD6QQRcyqIUu0Jth05JrCXQEAAAAA@steveharris.net>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAYXWAK2B810G3anMDOR0xOAKBAAAQAAAAD6QQRcyqIUu0Jth05JrCXQEAAAAA@steveharris.net>
Message-ID: <41CDF4A6.3000906@dsl.pipex.com>

steve harris wrote:
> The Library of Congress doesn't list anything before 1929.  The British Library shows:
> 
> Author - personal    Batuta, Ibn.
> Title   	The travels of Ibn Batuta : translated from the abridged manuscript copies, preserved in the public library of Cambridge with notes, illustrative of the history, geography, botany, antiquities, &c. occurring through the work / by Samuel Lee.
> Publisher/year   	London : Darf, 1984, 1829.
> Added name   	Lee, Samuel.
> holdings (1)   	All items
> Holdings (BL)   	89/27495 DSC Request
> ISBN   	1850770352

Hi!

Thanks for the info. I've really wanted to read Ibn Batuta for a while 
now so perhaps this is the golden opportunity to finally get a PG 
version in motion. So, does anybody have any experience ordering copies 
of whole books from the British Library? I'd love to do it (obviously, 
depending on price) if it's at all possible :-)

Cheers,
Holden
From sly at victoria.tc.ca  Sat Dec 25 23:17:06 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Dec 25 23:17:27 2004
Subject: [gutvol-d] Ibn Batuta
In-Reply-To: <41CDF4A6.3000906@dsl.pipex.com>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAYXWAK2B810G3anMDOR0xOAKBAAAQAAAAD6QQRcyqIUu0Jth05JrCXQEAAAAA@steveharris.net>
	<41CDF4A6.3000906@dsl.pipex.com>
Message-ID: <Pine.GSO.4.58.0412252312110.20442@vtn1.victoria.tc.ca>


On Sat, 25 Dec 2004, Holden McGroin wrote:

> Thanks for the info. I've really wanted to read Ibn Batuta for a while
> now so perhaps this is the golden opportunity to finally get a PG
> version in motion. So, does anybody have any experience ordering copies
> of whole books from the British Library? I'd love to do it (obviously,
> depending on price) if it's at all possible :-)
>

Hmmm... you may not need to go that far afield.

It looks as if there is a reprint in the University Library in my
city (Victoria, British Columbia) and given that it was published
in New York, I'd expect you could find it in some American cities...

Here's a full record:

   Author/Creator: Ibn Batuta, 1304-1377.
   Other Author/Creator(s): Lee, Samuel, 1783-1852.
   Title: The travels of Ibn Batuta. Translated from the abridged Arabic
   manuscript copies, preserved in the Public Library of Cambridge. With
   notes illustrative of the history, geography, botany, antiquities,
   occurring throughout the work, by Samuel Lee.
   Uniform Title: Tuhfat al-nuzzar English. 1971
     _________________________________________________________________

   Database: University of Victoria Libraries
   Location: McPherson Library
   Call Number: G370 I23
   Number of Items: 1
   Status: In Library
   Subject(s): Voyages and travels
               Africa--Description and travel--To 1900
               Asia--Description and travel
   Published: New York, B. Franklin [1971]
   Description: xviii, 243 p. 24 cm.
   Series: Burt Franklin research & source works series, 817
   Geography and discovery, 13
   Notes: Reprint of the 1829 ed.
   Translation of Tuhfat al-nuzzar.
   ISBN: 0833720511

From hyphen at hyphenologist.co.uk  Sun Dec 26 01:06:02 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sun Dec 26 01:06:33 2004
Subject: [gutvol-d] Ibn Batuta
In-Reply-To: <41CDF4A6.3000906@dsl.pipex.com>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAYXWAK2B810G3anMDOR0xOAKBAAAQAAAAD6QQRcyqIUu0Jth05JrCXQEAAAAA@steveharris.net>
	<41CDF4A6.3000906@dsl.pipex.com>
Message-ID: <uvtss09ndjduef1mbs0neiqidrdi3ijjho@4ax.com>

On Sat, 25 Dec 2004 23:15:50 +0000,  Holden McGroin
<holden.mcgroin@dsl.pipex.com> wrote:


| Thanks for the info. I've really wanted to read Ibn Batuta for a while 
| now so perhaps this is the golden opportunity to finally get a PG 
| version in motion. So, does anybody have any experience ordering copies 
| of whole books from the British Library? I'd love to do it (obviously, 
| depending on price) if it's at all possible :-)

It all depends on where the copy is stored.
If it is in London, forget it: you cannot borrow it. You must get
readership permission and consult it there.

If it is in Boston Spa, you can consult it at Boston Spa but it takes some
time to get it from storage so organize yourself first.   It is far easier
to get things via your local library.   Fill in a request form, and they
will first look for an ordinary local library copy, and failing this they
will get a copy from Boston Spa.
************  BEWARE THE BL FINES SYSTEM THEY ARE HORRENDOUS  **********
-- 
Dave F

From Gutenberg9443 at aol.com  Sun Dec 26 15:45:51 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Sun Dec 26 15:46:09 2004
Subject: [gutvol-d] Final Report on eBookWise 1150
Message-ID: <9b.5597c5bf.2f00a72f@aol.com>


I promised a final report after the eBookWise 1150 Librarian was released.
It was released last week. It took me a little while to get the computer and
the device to talk to each other, but the problem turned out to be that I had
only turned the computer off and on after each program was installed. I was
supposed to have completely unplugged the computer.
 
Please do not flame me over this. You are authorized to disagree with me. I
know that some people would rather spend 42 days stuck in a semiprivate
hospital room with a roommate who is in love with the soaps than possess a device
that will allow them to read and mentally tune out the television.
 
Forget about all the negative comments in my initial report. Much as I
have loved and been faithful to my Rocket, I have to admit that the 1150 is
better. It will allow me to do things that my Rocket won't allow, including
making handwritten notes with my stylus, so henceforth the Rocket will be my
pleasure-reading device and the 1150 will be my work device. (We have named her
Isis, to go well with my computer, whose name is Sesheta. Sesheta was the ancient
Egyptian goddess of libraries and librarians; Isis was the Lady High Everything Else.)
 
The only problem I'm still having is that Isis holds only about 20
books. But after I get to the computer store and get a SmartMedia card and its
driver, Isis will hold over 300 books very easily.
 
This is a winner.
 
If you have any interest in being able to carry 300, or even 20, books
around in your purse or backpack without tearing your shoulder into shreds, hie
yourself over to eBookWise.com and buy the eBookWise while the price is right.
$110 will buy and deliver it, and you'll then be given $20 to spend on books.
All books are 20% off for about another week.
 
With this device, you can read ANY BOOK ON THE INTERNET, unless it is in an
encrypted format, as opposed to .txt, .htm, or .doc. You can definitely
use it to read anything posted on PG and anything posted on Blackmask.
 
Between FictionWise and eBookWise, literally thousands of commercial titles
are available. Also, anything you can get in .rb will transform itself into the
right format in less than a minute.
 
If you use a PDA or even a cellular phone with a lot of memory, you can
carry around one to three books, if you don't mind reading in teensy-weensy
type with a line that consists of three words (or two if they're longer
words). Blech. Yesterday at the family Christmas party I was showing this thing
to my husband's former wife, and she held it in her hand and said, "Well, it
doesn't weigh less than a book that size." I conceded the point--a Rocket is 18
ounces and I think an eBookWise 1150 is about 22 ounces--but then pointed
out that it weighs far less than 20, or 300, books that size. (When anybody
in our family is in the hospital for more than a day, daily book runs are
necessary to take away the read books and bring new ones. This device will
obviate that necessity. Although you should never leave valuables lying around
your hospital room, as thieves know their way around hospital rooms, you can ask
a kind nurse to lock it away for you when you're about to go to sleep.)
 
Or--you're travelling cross-country. Every evening you stop at a motel. At
the motel, you have your choice of the Gideon Bible if it hasn't been swiped,
the food service menu, or television--or your own selection of 300 books.
 
OR--if you're stuck in the hospital emergency room waiting interminably for
somebody to come and attend to you or your loved one, wouldn't you like to
have something to read with you? Maybe even two somethings, so that the person
in bed, if it's not you, can also have something to read?
 
If I had the money I'd give one of these to everybody I know.
 
If you don't want one, don't buy it. But if you can afford it, at least give
it a one-week trial. Then if you still don't like it, give it to somebody
you don't like.
 
Anne
From Gutenberg9443 at aol.com  Sun Dec 26 15:48:06 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Sun Dec 26 15:48:22 2004
Subject: [gutvol-d] Ibn Batuta
Message-ID: <111.4043b5d3.2f00a7b6@aol.com>

 
In a message dated 12/26/2004 12:17:36 AM Mountain Standard Time,  
sly@victoria.tc.ca writes:

<<<Hmmm... you may not need to go that far  afield.


I really appreciate the way that people are looking for ibn Batuta. He's
enjoyable both because of his interesting writing and the interesting tales he
has to tell, and because of his unique world-view, which illuminates the history
of his period.
 
Anne
From sly at victoria.tc.ca  Sun Dec 26 22:33:29 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Dec 26 22:33:48 2004
Subject: [gutvol-d] Project Gutenberg as musty library?
Message-ID: <Pine.GSO.4.58.0412262230100.5938@vtn1.victoria.tc.ca>


Snippet of conversation found on usenet:

> > I have probably 2,000 books and
> > perhaps 50 videos, not a single one of them clutter. Getting them from the
> > library and then returning them just isn't the same as rereading
> well-worn, familiar pages, either, for instance.
>
>
> Yes, it's quite different. Tho come to think of it, I feel that way about a
> few books I get from the library once a year or so to reread. Elizabeth
> Goudge's CITY OF BELLS for one. But those are old books themselves, been
> there a long time, nice old pictures and nice old typeface....
>
> What I'm trying to do is transfer that feeling to Project Gutenberg's 'plain
> vanilla' texts. I already kind of feel like their site is a very old shabby
> library, smelling of leather, mice, and I forget what all else....
>
>
From shimmin at uiuc.edu  Mon Dec 27 07:30:44 2004
From: shimmin at uiuc.edu (Robert Shimmin)
Date: Mon Dec 27 07:30:50 2004
Subject: [gutvol-d] !@!Re: E-DOCS: Google Print Questions [J. Roland]
In-Reply-To: <Pine.LNX.4.60.0412251026010.13378@pglaf.org>
References: <20041223031630.16124F2548@boggle.pobox.com>	<Pine.LNX.4.60.0412230923370.9920@pglaf.org>	<41CC53D5.3010005@perathoner.de>
	<Pine.LNX.4.60.0412251026010.13378@pglaf.org>
Message-ID: <41D02AA4.3010301@uiuc.edu>

Michael Hart wrote:

> I'm still betting we will be first to 100,000.

Anyone can beat us to 100,000 if they mirror all of our content and add 
some of their own.

-- RS
From stephen.thomas at adelaide.edu.au  Sat Dec 25 15:16:16 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Wed Dec 29 13:35:43 2004
Subject: [gutvol-d] Revolutionary chapter / Google's ambitious book-scanning
 plan seen as key shift in paper-based culture
Message-ID: <41CDF4C0.10200@adelaide.edu.au>

Not sure if I sent this already, but it provides some useful info on the 
process Google is adopting. In the light of discussion on this topic so 
far, this may be enlightening for some readers.

http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2004/12/20/BUGROAD6QT1.DTL

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/


From hart at pglaf.org  Tue Dec 28 03:49:41 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Dec 29 13:35:46 2004
Subject: [gutvol-d] !@!Googleberg eBooks
Message-ID: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>


How many of you have tried Google Print?

Have you noticed that the initial offering of eBooks
strongly resembles the Project Gutenberg catalogue???

We'd love to hear your experiences with Google Print.


Thanks!!!

Michael S. Hart

From nihil_obstat at mindspring.com  Wed Dec 29 14:36:05 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Wed Dec 29 15:23:50 2004
Subject: [gutvol-d] !@!Googleberg eBooks
Message-ID: <22839263.1104359765940.JavaMail.root@wamui02.slb.atl.earthlink.net>


There is no "Google Print" library in the sense that I would think of one, i.e. I cannot seem to get any catalog of its collection or just browse Google Print.  It works as an added feature of its regular search, so Google Print titles come up as well as external links.  This is fine in a way, because if I search for a text, and PG has it, the search results usually include a high-ranking link to a P.G. server.  (So in a way, all of P.G. is essentially as findable to Google users as Google Print is.)  The lack of a catalog is bad in a way, for there is no easy way to see just what is available at Google that is in the public domain, in case you actually wanted to read an entire book online.  You have to search for topics or people and try to wend through what comes up in the search results.

Of course Google does not claim to be a library.  From its own website:
"In general, Google Print is designed to help you discover books, not read them from start to finish. It's like going to a bookstore and browsing -- only with a Google twist."

I do not find it a very useful service.  But it is something that was not there before.  I am not going to complain about it--that would be like looking a semi-lame-gift-nag-with-a-google-twist in the mouth.


-----Original Message-----
From: Michael Hart <hart@pglaf.org>
Sent: Dec 28, 2004 6:49 AM
To: undisclosed-recipients: ;
Subject: [gutvol-d] !@!Googleberg eBooks


How many of you have tried Google Print?

Have you noticed that the initial offering of eBooks
strongly resembles the Project Gutenberg catalogue???

We'd love to hear your experiences with Google Print.


Thanks!!!

Michael S. Hart



---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From servalan at ar.com.au  Thu Dec 30 03:16:58 2004
From: servalan at ar.com.au (Pauline)
Date: Thu Dec 30 03:17:34 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
Message-ID: <41D3E3AA.30508@ar.com.au>

Michael Hart wrote:
> 
> How many of you have tried Google Print?
> 
> Have you noticed that the initial offering of eBooks
> strongly resembles the Project Gutenberg catalogue???

Why not include info in all PG ebooks which make it:
a) easy for readers to identify the source of the book (PG & the 
"Produced by" line)
b) easy for readers/mirror sites/republishers to send corrections back 
to the source (PG &/| the producers)
c) not OK to drop this info from PG ebooks when they are republished

As a reader, knowing the source of the book is exceedingly valuable.

Cheers,
P
--
Distributed Proofreaders: http://www.pgdp.net
"Preserving history one page at a time."
From shimmin at uiuc.edu  Thu Dec 30 08:35:49 2004
From: shimmin at uiuc.edu (Robert Shimmin)
Date: Thu Dec 30 08:35:55 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <41D3E3AA.30508@ar.com.au>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
	<41D3E3AA.30508@ar.com.au>
Message-ID: <41D42E65.3020408@uiuc.edu>

Pauline wrote:

> c) not OK to drop this info from PG ebooks when they are republished

The idea of a public domain is that anyone can do anything they like 
with the text, including edit it, republish it, and package it however 
they wish.
-- RS
From hart at pglaf.org  Thu Dec 30 08:55:14 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 30 08:55:16 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <41D42E65.3020408@uiuc.edu>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
	<41D3E3AA.30508@ar.com.au> <41D42E65.3020408@uiuc.edu>
Message-ID: <Pine.LNX.4.60.0412300854480.24293@pglaf.org>


On Thu, 30 Dec 2004, Robert Shimmin wrote:

> Pauline wrote:
>
>> c) not OK to drop this info from PG ebooks when they are republished
>
> The idea of a public domain is that anyone can do anything they like with the 
> text, including edit it, republish it, and package it however they wish.

But you can't say you are the author. . .and perhaps other things.

mh
From hart at pglaf.org  Thu Dec 30 09:00:40 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Dec 30 09:00:41 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <41D3E3AA.30508@ar.com.au>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
	<41D3E3AA.30508@ar.com.au>
Message-ID: <Pine.LNX.4.60.0412300858180.24293@pglaf.org>


On Thu, 30 Dec 2004, Pauline wrote:

> Michael Hart wrote:
>> 
>> How many of you have tried Google Print?
>> 
>> Have you noticed that the initial offering of eBooks
>> strongly resembles the Project Gutenberg catalogue???
>
> Why not include info in all PG ebooks which make it:
> a) easy for readers to identify the source of the book (PG & the "Produced 
> by" line)

eBooks often have multiple paper sources.


> b) easy for readers/mirror sites/republishers to send corrections back to the 
> source (PG &/| the producers)

There is already an email address for errors in the eBooks,
not to mention bugs@pglaf.org and my own email address.
You can pretty much send error messages to ANY PG address
and they will be fixed.


> c) not OK to drop this info from PG ebooks when they are republished

As stated in earlier messages, we only have something to say if they use
the PG trademark.


mh
From jtinsley at pobox.com  Thu Dec 30 13:03:18 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Thu Dec 30 13:03:35 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
Message-ID: <20041230210318.GA27098@panix.com>

On Tue, 28 Dec 2004 03:49:41 -0800 (PST), Michael Hart <hart@pglaf.org> wrote:

>
>How many of you have tried Google Print?
>
>Have you noticed that the intitial offering of eBooks
>strongly resembles the Project Gutenberg catalogue???
>

This and some responses made me think that some people are thinking
along the lines that they are using our texts in some way, so I checked
it out. I figure that the answer is no, to both the explicit and implied
questions. 

I started by searching for quotes from 20 etexts chosen at random from
etext99, as follows:

book "cardinals, abbots, councillors, legates, bishops, princes"
book "indeed we be no fatted bullocks, we two"
book "Est-ce que je ne connais pas mon filleul?"
book "Suchet's head-quarters at that time was the old palace of the"
book "She always has this man of letters of hers on her"
book "Afterwards," he answered quickly. "A cursed gutta serena."
book "himself with the people, he partially recognizes the truth of his words."
book "Epistles are spurious, as that the Republic, the Timaeus, and the Laws" 
book "You may recall that our mutual and dear friend, old Allan Quatermain,"
book "Where rose the husbandman's abode,"
book "the felicity of his fellow beings, and sit down darkling"
book "by a tub, artesian cold, and a loud and joyous singing of"
book "As desires of waking hours are answered in sleep,"
book "Even while speaking at random, perhaps the better to hide"
book "Calm and proud, Tartarin of Tarascon marched on in the night"
book "Another fallacy is produced which turns on the absoluteness of"
book "The evidence for the steadily growing danger of secession"
book "Morose-minded people may complain of this; for myself I regard it"
book "THAT old bell, presage of a train, had just"

All of them returned normal search results, including a few from PG,
but only the second (Jungle Book 2) offered a Google Print link.

(Incidentally, for those who want to try, I find that preceding your
search term with "book" will often produce a Google Print link when
the bare search term doesn't.)

A search for "book Tarzan" yielded, in Print results:

Tarzan of the Apes - by Edgar Rice Burroughs - 320 pages
Human Computer Interaction - edited by Julie Jacko, Constantine Stephanidis - 1348 pages
C Primer Plus - by Stephen Arata, Stephen Prata, Kathleen Prata - 970 pages

Not what I'd consider a typical PG search result! :-)

"book barsoom" and "book mars" did even less well. No sign of the
ERB series.

Erewhon, Alice, Little Women, Oliver Twist, Tom Sawyer, Huck Finn,
Zenda, Decline and Fall, at least some Sherlock Holmes, Last of the
Mohicans, several from Plato and at least most of Shakespeare, are
present. Richard Feverel is there, but Shagpat is nowhere. Tom Swift is
AWOL. Tartarin of Tarascon can't be found. John Carter is once again
mysteriously missing. Kai Lung has effaced himself into invisibility.
And in the process of searching for these, I turned up about twice as
many modern as pre-23 book titles.

The page images I looked at are all from modern reprints, with
"Copyrighted Material" tags on their sides. I imagine that the
publishers would insist on this, which makes much sense of Google
wanting to work with a collection of PD books from libraries.

This pattern is, I think, consistent with what book publishers might be
willing to provide. Any list of books drawn up by English speakers is
going to have the most popular classics on it. An awful lot of the
search results I found were from Penguin Classics, so it may well be
that they simply have the whole Penguin Classics range. If so, a
significant overlap with PG is inevitable. And the Google Print entries
seem to have a lot more modern books than classics.

Hmmm. Interesting. The only Tarzan link for Google Print is "Tarzan of
the Apes", and the only Tarzan search result at the Penguin Classics
site is, guess what? "Tarzan of the Apes". And Penguin Classics does not
publish the Barsoom series. "Coincidence? I think not!"

Interesting: both the search 
book "she could have seen through a pair of stove-lids just as well."
and
book "A robber is more high-toned"
find Tom Sawyer in Google Print, and
book "Christmas won't be Christmas without any presents,"
finds Little Women, but 
book "Papa was a pickle bottle"
doesn't, and 
book Little Women pickle
does find the book, but with the word pickle much further down in the
book.

Hmmm, I see. The text in the Google Print image reads "pa was a
pickle-bottle" instead. So much for any thought of them using our text.

The larger reason that they can't be using our text is that their search
results point to page images, with the search term highlighted in
yellow. You really couldn't do that unless you had mapped your text to
the dimensions and placing of the image: it would be vastly easier to do
it programmatically from the OCR process than to use an outside text.


>We'd love to hear your experiences with Google Print.

It will be handy, though probably not as handy as Amazon, for confirming
unclear corrections in some older texts. They've somewhat protected
their page images from downloading by the casual browser, but it's easy
to bypass that. The more significant restriction is the number of pages
any one session is allowed to download. This seems, to me, a reasonable
compromise for genuinely-copyrighted books, though an annoyance on these
reprints where the main story is in the PD and only the bookends are in
copyright. It'll be interesting to see what they do with 100% pre-23
guaranteed content.

jim

From shalesller at writeme.com  Thu Dec 30 13:25:36 2004
From: shalesller at writeme.com (D. Starner)
Date: Thu Dec 30 13:25:46 2004
Subject: [gutvol-d] !@!Googleberg eBooks
Message-ID: <20041230212536.30DEE4BDAB@ws1-1.us4.outblaze.com>

Michael Hart writes:
>
> eBooks often have multiple paper sources.

PG eBooks verifiably do not often have multiple paper sources. They
sometimes, occasionally, have multiple paper sources. It is the 
exception that they have multiple paper sources, and even more the 
exception that they come from multiple paper editions.

From gbnewby at pglaf.org  Thu Dec 30 15:39:23 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Dec 30 15:39:24 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230210318.GA27098@panix.com>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
	<20041230210318.GA27098@panix.com>
Message-ID: <20041230233923.GA2406@pglaf.org>

On Thu, Dec 30, 2004 at 04:03:18PM -0500, Jim Tinsley wrote:
> On Tue, 28 Dec 2004 03:49:41 -0800 (PST), Michael Hart <hart@pglaf.org> wrote:
> 
> >
> >How many of you have tried Google Print?
> >
> >Have you noticed that the initial offering of eBooks
> >strongly resembles the Project Gutenberg catalogue???
> >
> 
> This and some responses made me think that some people are thinking
> along the lines that they are using our texts in some way, so I checked
> it out. I figure that the answer is no, to both the explicit and implied
> questions. 
> 
> I started by searching for quotes from 20 etexts chosen at random from
> etext99, as follows:
> ...

Fascinating analysis, thanks.

Just a quick note that Google only indexes the first 100 or 150K
of eBooks (they didn't give me a firm number, but confirmed
there was a limit).  This means that quotes from later parts of
our eBooks > ~150K won't be found.
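A minimal Python sketch of the effect Greg describes. It assumes a hypothetical cap of 150,000 characters: Google gave no firm number, and whether "K" means characters or bytes is not stated. The names `INDEX_LIMIT`, `indexed_portion` and `quote_findable` are illustrative only, not anything Google publishes.

```python
# Hypothetical cap: Greg reports "100 or 150K" with no firm number from Google.
INDEX_LIMIT = 150_000  # assumed unit: characters


def indexed_portion(text: str, limit: int = INDEX_LIMIT) -> str:
    """Return the prefix of an eBook that a size-capped indexer would see."""
    return text[:limit]


def quote_findable(text: str, quote: str, limit: int = INDEX_LIMIT) -> bool:
    """True only if the whole quote lies inside the indexed prefix."""
    return quote in indexed_portion(text, limit)


book = "opening line " + "x" * 180_000 + " closing line"
print(quote_findable(book, "opening line"))  # True
print(quote_findable(book, "closing line"))  # False
```

Under this assumption, a quote that straddles the cutoff is missed as well, because only the complete quote counts as a match.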
  -- Greg
From jtinsley at pobox.com  Thu Dec 30 15:49:20 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Thu Dec 30 15:49:32 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230233923.GA2406@pglaf.org>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
	<20041230210318.GA27098@panix.com>
	<20041230233923.GA2406@pglaf.org>
Message-ID: <20041230234920.GC22506@panix.com>

On Thu, Dec 30, 2004 at 03:39:23PM -0800, Greg Newby wrote:
>
>Just a quick note that Google only indexes the first 100 or 150K
>of eBooks (they didn't give me a firm number, but confirmed
>there was a limit).  This means that quotes from later parts of
>our eBooks > ~150K won't be found.

This is true for our books, as searched for by Google in general,
like any other page, but it is not true for the Google Print
search results; when they search Google Print, they do search
the whole text, regardless of length. I did confirm this by
searching for quotes that were near the ends of books.

jim


From phil at thalasson.com  Thu Dec 30 17:12:41 2004
From: phil at thalasson.com (Philip Baker)
Date: Thu Dec 30 17:23:12 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230210318.GA27098@panix.com>
Message-ID: <Q$XcNKAJeK1BFw3i@thalasson.com>

Jim Tinsley <jtinsley@pobox.com> wrote:

>(Incidentally, for those who want to try, I find that preceding your
>search term with "book" will often produce a Google Print link when
>the bare search term doesn't.)


A few days ago Steve Thomas gave the following link to an article in the
San Francisco Chronicle:
http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2004/12/20/BUGROAD6QT1.DTL

In the article it says:

  Typing in "book" and any search term within the Google window generates a 
  "Book results" listing if a match of the search term is made within an 
  indexed book. These results can be clicked to read excerpts from the 
  book.

Looks as if this may develop into a 'book: key-words' type search which
will only search Google Print.
-- 
Philip Baker
From hart at pglaf.org  Fri Dec 31 10:59:51 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Dec 31 10:59:53 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230234920.GC22506@panix.com>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
	<20041230210318.GA27098@panix.com>
	<20041230233923.GA2406@pglaf.org> <20041230234920.GC22506@panix.com>
Message-ID: <Pine.LNX.4.60.0412311058250.22737@pglaf.org>


On Thu, 30 Dec 2004, Jim Tinsley wrote:

> On Thu, Dec 30, 2004 at 03:39:23PM -0800, Greg Newby wrote:
>>
>> Just a quick note that Google only indexes the first 100 or 150K
>> of eBooks (they didn't give me a firm number, but confirmed
>> there was a limit).  This means that quotes from later parts of
>> our eBooks > ~150K won't be found.
>
> This is true for our books, as searched for by Google in general,
> like any other page, but it is not true for the Google Print
> search results; when they search Google Print, they do search
> the whole text, regardless of length. I did confirm this by
> searching for quotes that were near the ends of books.
>
> jim

Aren't the PG eBooks already in Google Print?

That's what I heard, so I would have figured they would have
re-indexed them to make them complete???

I wonder if they left the old files, and just are making new ones,
still from PG eBooks?

If so, how would you tell the difference?

mh
From hart at pglaf.org  Fri Dec 31 11:04:35 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Dec 31 11:04:35 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <20041230212536.30DEE4BDAB@ws1-1.us4.outblaze.com>
References: <20041230212536.30DEE4BDAB@ws1-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0412311102150.22737@pglaf.org>


On Thu, 30 Dec 2004, D. Starner wrote:

> Michael Hart writes:
>>
>> eBooks often have multiple paper sources.
>
> PG eBooks verifiably do not often have multiple paper sources. They
> sometimes, occasionally, have multiple paper sources. It is the
> exception that they have multiple paper sources, and even more the
> exception that they come from multiple paper editions.

The above might take more than one reading. . . .

In addition, I should add that pretty much ALL the original PG eBooks
came from multiple editions, simply to do better error checking.

Michael

From jon at noring.name  Fri Dec 31 11:25:27 2004
From: jon at noring.name (Jon Noring)
Date: Fri Dec 31 11:25:43 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <Pine.LNX.4.60.0412311102150.22737@pglaf.org>
References: <20041230212536.30DEE4BDAB@ws1-1.us4.outblaze.com>
	<Pine.LNX.4.60.0412311102150.22737@pglaf.org>
Message-ID: <142187819718.20041231122527@noring.name>

Michael Hart wrote:

> In addition, I should add that pretty much ALL the original PG eBooks
> came from multiple editions, simply to do better error checking.

How many of the PG texts fall into the category "the original PG
eBooks"?

There is, of course, a difference between consulting other sources to
clarify a few things with the text derived from the primary source, and
simply kludging together a bunch of different editions to form a "new
edition".

An example of how things got out of whack with the "original PG texts"
is Mary Shelley's "Frankenstein", where there are two quite different
editions, and the version at PG is not even marked as to which edition
it conforms with.

It was a mistake to not include source information with the early PG
texts (even if the work was a derivative). Mistakes happen. Some of
these mistakes can be corrected after-the-fact. And future works can do
it right. No need to apologize for the past, Michael -- all projects
make mistakes. The key is to learn from the mistakes and make the
necessary changes in policies and procedures.

(Am I correct in that the policy has changed, and all new PG texts
are to include the source metadata?)

Jon

From jtinsley at pobox.com  Fri Dec 31 12:19:56 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Fri Dec 31 12:20:06 2004
Subject: [gutvol-d] !@!Googleberg eBooks
In-Reply-To: <Pine.LNX.4.60.0412311058250.22737@pglaf.org>
References: <Pine.LNX.4.60.0412280347130.19277@pglaf.org>
	<20041230210318.GA27098@panix.com>
	<20041230233923.GA2406@pglaf.org>
	<20041230234920.GC22506@panix.com>
	<Pine.LNX.4.60.0412311058250.22737@pglaf.org>
Message-ID: <20041231201956.GA2782@panix.com>

On Fri, 31 Dec 2004 10:59:51 -0800 (PST), Michael Hart <hart@pglaf.org> wrote:

>Aren't the PG eBooks already in Google Print?

No. Definitively, no. That is one of the things my experiments
demonstrated (see "pickle-bottle").

Our texts, or at least, as Greg says, the first 100K or so of them, 
are indexed in Google, and Yahoo!, and other search engines. But that's 
Google, not Google Print.

Google Print is a NEW content source. The content for Google Print is
not directly available on the web now; it is held internally by Google.

I have no inside information, but I think that my reconstruction below,
based on my actually trying the thing, is pretty close.

1. Google agree with Penguin Classics, among others, that they can use
their publications in Google Print.

2. Penguin Classics, et al., ship Google a copy of every book they
currently have in print (which is covered by this agreement -- I imagine
there may be some restrictions).

3. Google cut the pages ('cos the scans are just _beautiful_!) and scan
the pages of the books into images.

4. Google run OCR on the pages. Along with every word, they store its
position in the image. Like: the word "poorer" is on page 62, in a box
1.1 cm wide and 0.4 cm high whose top left corner is 4.2 cm from the top
of the page and 3.1 cm from the left margin, . . . except I'm sure
they're not using cm as their unit. ABBYY does this in the internal
files it saves, so it wouldn't shock me to find that they're using ABBYY
for OCR.

5. Google resize and transform the images to JPEG for display. (I can't
prove that they didn't start with JPEGs of that size, but I think it's
likely that they scanned at 600 dpi or higher initially.)

6. Google store the OCRed text, complete with the co-ordinates of each
word on the pages where it appears, and index that OCRed text. They also
store the JPEG images. Because they know that all the text in a book is
useful (and that a book is of a finite size!) they store _all_ of the
text of each book, not just the first 100K.

7. When a Google search is run, not only the main Google index is
searched, but also the Google Print OCR text.

8. If the search returns results from Google Print, they are displayed
on the search results page, along with the main Google results.

9. If a user clicks on a Google Print result, they are brought to the
first page image -- the JPEG file -- where that search term is found in
the OCRed text. When the page image is displayed, the search term is
highlighted in yellow, using the co-ordinates captured at OCR time.
(Strictly, the page image itself contains no yellow, as I
demonstrated by viewing the page images directly; the HTML is created
dynamically to overlay yellow at the appropriate co-ordinates.)

10. The user can then browse back and forth, with limitations, through
the page images.

11. The text that Google OCRed is never actually displayed as text, or
HTML; it is used only to find the right page and highlight the search
term.
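Steps 4, 6, 9 and 11 amount to a word-coordinate index that is used only to pick a page and place highlights. A minimal Python sketch under Jim's reconstruction follows. The names `Box`, `PageIndex` and `overlay_html` are hypothetical and pixel coordinates are assumed. This is not Google's code, only an illustration of the technique.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Word position on a page image; pixel units are an assumption."""
    left: int
    top: int
    width: int
    height: int


@dataclass(frozen=True)
class WordHit:
    page: int
    box: Box


class PageIndex:
    """Hypothetical index mapping each OCRed word to (page, box) hits."""

    def __init__(self) -> None:
        self._hits = defaultdict(list)

    def add_word(self, word: str, page: int, box: Box) -> None:
        # Steps 4 and 6: keep every OCRed word together with its coordinates.
        self._hits[word.lower()].append(WordHit(page, box))

    def first_page(self, term: str) -> Optional[int]:
        # Step 9: the lowest-numbered page on which the term occurs, if any.
        hits = self._hits.get(term.lower(), [])
        return min((h.page for h in hits), default=None)

    def highlights(self, term: str, page: int) -> list:
        # Boxes to paint yellow on that page; the image itself is untouched.
        return [h.box for h in self._hits.get(term.lower(), []) if h.page == page]


def overlay_html(image_url: str, boxes: list) -> str:
    """Build HTML that lays semi-transparent yellow boxes over a page image."""
    marks = "".join(
        f'<div style="position:absolute;left:{b.left}px;top:{b.top}px;'
        f'width:{b.width}px;height:{b.height}px;'
        f'background:yellow;opacity:0.4"></div>'
        for b in boxes
    )
    return f'<div style="position:relative"><img src="{image_url}">{marks}</div>'


index = PageIndex()
index.add_word("poorer", 62, Box(left=124, top=166, width=43, height=16))
page = index.first_page("poorer")
print(page)  # 62
print(overlay_html(f"page{page}.jpg", index.highlights("poorer", page)))
```

In this sketch the OCR text never reaches the reader. It only decides which page image to show and where to draw the yellow, which is consistent with point 11.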


>
>That's what I heard, 

Then I feel quite certain that you heard wrong.

>so I would have figured they would have
>re-indexed them to make them complete???
>
>I wonder if they left the old files, and just are making new ones,
>still from PG eBooks?
>
>If so, how would you tell the difference?

If they were using our texts, which I am quite sure they are not, we
could tell the difference by seeing whether their text was the same as
our text. I do that quite a lot when checking out corrections to our
texts, and I can actually reel off various errors in various editions
of e-texts around the web by now. Their page images, and their search
index, do not contain the same words as our texts. My "pickle-bottle"
example is the least demonstration of that: many of the Penguin Classics
they have in Google Print include introductions that we do not have.
And, remember, they never display text: they _only_ display page images.

No, I conclude that Google Print overlaps not at all with PG, except
that we both have (different editions of) a large number of classic
books.
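Jim's test ("seeing whether their text was the same as our text") can be sketched as two checks. The first is an overall word-level similarity score using Python's standard `difflib.SequenceMatcher`. The second is a search for edition-specific markers, such as known typos in our copy. The function names and the marker idea are illustrative assumptions. A real comparison would first need to normalise line breaks, hyphenation and OCR noise.

```python
import difflib


def similarity(ours: str, theirs: str) -> float:
    """Word-level similarity ratio between two texts, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(None, ours.split(), theirs.split(), autojunk=False)
    return matcher.ratio()


def shared_markers(markers: list, theirs: str) -> list:
    """Edition-specific strings (e.g. known typos in our text) also found in theirs."""
    return [m for m in markers if m in theirs]


# Invented sample: "our" copy carries a distinctive typo that theirs lacks.
ours = "it was the best of tmies it was the worst of times"
theirs = "it was the best of times it was the worst of times"
print(similarity(ours, theirs))
print(shared_markers(["best of tmies"], theirs))  # []
```

A high similarity on its own proves little for classic texts, since any two faithful editions will agree on most words. Shared idiosyncratic errors, or material such as introductions that appears in one edition only, are the stronger evidence.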

jim

From juliet.sutherland at verizon.net  Fri Dec 31 20:09:25 2004
From: juliet.sutherland at verizon.net (juliet.sutherland@verizon.net)
Date: Fri Dec 31 20:09:41 2004
Subject: [gutvol-d] !@!Googleberg eBooks
Message-ID: <20050101040925.FZKW17379.out008.verizon.net@outgoing.verizon.net>


> 
> From: Jim Tinsley <jtinsley@pobox.com>
> Date: 2004/12/31 Fri PM 12:19:56 PST
> To: gutvol-d@lists.pglaf.org
> Subject: Re: [gutvol-d] !@!Googleberg eBooks

<snip>

> 3. Google cut the pages ('cos the scans are just _beautiful_!) and scan
> the pages of the books into images.

As I've previously noted, destructive scanning of modern reprints is easy and usually results in good images and good OCR.

> 4. Google run OCR on the pages. Along with every word, they store its
> position in the image. Like: the word "poorer" is on page 62, in a box
> 1.1 cm wide and 0.4 cm high whose top left corner is 4.2 cm from the top
> of the page and 3.1 cm from the left margin, . . . except I'm sure
> they're not using cm as their unit. Abbyy does this in its internal
> files it saves, so it wouldn't shock me to find that they're using Abbyy
> for OCR.

The folks at The Million Book Project and The Internet Archive are using something called DjVu that does this. It draws a bounding box around each word in the image, then stores that information along with the text. The OCR associated with DjVu is not ABBYY but another product that does not work quite as well.

A DP volunteer posted the following in our forums:

----------------------------

Here's an interesting experiment... 

Go to http://www.google.com/googleblog/. 

Under "All booked up" (which talks about the Google/Library project), click on the link labelled "the survival of the fittest". This takes you to a beta of Google Print, for the specific book "Darwin, and After Darwin". 

Under "Search within this book", type "Darwin" and hit "Go". You'll get a new window with 3 images, showing the first few occurrences of "Darwin" in the book, where "Darwin" is highlighted in yellow. 

What's interesting is that in the third image, there are two occurrences of the word "Darwin", but the first is not highlighted. 

Similarly, if you search for "Berkeley", one occurrence in the second image is missing its highlight. 

This suggests that their searches are based on unproofed OCR results (where the unhighlighted occurrences correspond to uncorrected scannos). 

... searching for "1 arwin" (one, space, arwin) and having it highlight "Darwin". (Try it, it's neat!) 
---------------
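The volunteer's inference can be illustrated with a tiny Python sketch. Suppose the index is built from unproofread OCR and matches words exactly. A misread word is then invisible to a search for the correct spelling, while a search for the misreading finds it. The token list below is invented for illustration, and exact token matching is an assumption about how the search behaves.

```python
def find_term(ocr_tokens: list, term: str) -> list:
    """Token positions where a (possibly multi-word) term matches the raw OCR exactly."""
    words = term.split()
    n = len(words)
    return [i for i in range(len(ocr_tokens) - n + 1) if ocr_tokens[i:i + n] == words]


# Hypothetical OCR of a line that reads "Darwin and Darwin" on the printed page;
# the first "Darwin" was misread as "1 arwin" and never proofread.
ocr_tokens = ["1", "arwin", "and", "Darwin"]

print(find_term(ocr_tokens, "Darwin"))   # [3]: the misread occurrence is not found
print(find_term(ocr_tokens, "1 arwin"))  # [0]: the scanno itself is found
```

If highlights are drawn from these match positions, the page shows two printed "Darwin"s but only one yellow box, which is the pattern reported above.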

All of the above would appear to confirm Jim's assessment about what Google has done to date.

JulietS