From Bowerbird at aol.com  Sun Jul  2 01:14:33 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Jul  2 01:14:39 2006
Subject: [gutvol-d] the little red hen
Message-ID: <3af.46261f0.31d8da69@aol.com>

well, well, ok, one of my favorite stories
-- the little red hen -- has now hit p.g.
with its illustrations, as e-text #18735.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060702/5ad5e3d9/attachment.html
From tony at baechler.net  Sun Jul  2 23:58:10 2006
From: tony at baechler.net (Tony Baechler)
Date: Sun Jul  2 23:58:01 2006
Subject: [gutvol-d] ftp.archive.org
In-Reply-To: <44A5B5A3.1010703@aol.com>
References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
	<44A04DB1.6070608@aol.com>
	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
	<Pine.LNX.4.64.0606280708370.8333@aphrodite.gnu-designs.com>
	<7.0.1.0.2.20060630005304.03354d60@baechler.net>
	<44A5B5A3.1010703@aol.com>
Message-ID: <7.0.1.0.2.20060702235033.03360330@baechler.net>

Hello list.  I really appreciate the help with rsync, especially 
David's sample command.  However, I don't think people understand 
what I want.  I do not want a full mirror of the PG archive.  I do 
have a partial mirror but it's very specialized.  I don't follow the 
standard PG directory structure.  I only download English books in 
plain text.  I don't want html, non-English, or 8-bit.  I have books 
divided up into 1,000 per directory.  For example, my etext18 
contains files 18000-18999 etc.  For reposts, etext0\ contains books 
0-999.  This is completely different from the PG structure.  Once I 
download a file, I don't have any reason to retrieve it again unless 
it gets updated.  If rsync will grab all the files I need and put 
them in my customized structure, rsync will help.  However, based on 
my reading of the help, it doesn't do that.  My apologies for the 
Windows comment.  I found it and it's part of cygwin as written 
below.  The syntax actually isn't as bad as I thought.  It reminds me 
of wget which I use frequently.  It would be perfect for mirroring a 
primary server to a backup machine.  Again, I'm not trying to mirror 
here so it doesn't do what I want based on my understanding of 
comments made on this list and the rsync help.

At 04:37 PM 6/30/06 -0700, you wrote:
>Rsync's available for Windows as part of the cygwin package.  Just like
>FTP or wget you can tell rsync to get only the stuff you want. and
>unlike FTP or wget it will only download the files that need updating,
>without you having to wait several hours for it to skip over every file
>that hasn't changed.
>
>I admit it can be confusing since it's a very powerful too.  I was
>talking about it with Aaron Cannon and he says it's a better way to make
>a "mirror" of PG (with or without specific files that you want.


-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.1.394 / Virus Database: 268.9.8/380 - Release Date: 6/30/06


From tony at baechler.net  Mon Jul  3 00:06:03 2006
From: tony at baechler.net (Tony Baechler)
Date: Mon Jul  3 00:05:56 2006
Subject: [gutvol-d] ftp.archive.org
In-Reply-To: <Pine.LNX.4.64.0606301956120.4089@aphrodite.gnu-designs.com
 >
References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
	<44A04DB1.6070608@aol.com>
	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
	<Pine.LNX.4.64.0606280708370.8333@aphrodite.gnu-designs.com>
	<7.0.1.0.2.20060630005304.03354d60@baechler.net>
	<44A5B5A3.1010703@aol.com>
	<Pine.LNX.4.64.0606301956120.4089@aphrodite.gnu-designs.com>
Message-ID: <7.0.1.0.2.20060702235853.0336ea40@baechler.net>

Hello David.  I tried Unison extensively because it's available for 
both Windows and Debian.  I thought it would be great for miroring 
the Debian files to the Windows machine etc.  I found it far worse to 
use than rsync.  I followed the instructions exactly but I never got 
it to work.  It uses a complicated url scheme that doesn't conform to 
standards.  While it uses a protocol similar to rsync, it isn't the 
same.  It has no means of encrypting the files.  The documentation 
makes it clear that it is not secure and shouldn't be used for 
anything critical.  I found that scp works a lot better, is far more 
secure and actually works.  I eventually got the files that way and 
had ssh encryption as a bonus.  Believe me, I would rather learn 
rsync than try that again.  I'm sure it's useful rfor some 
applications but not in this case.  Also both systmes must run the 
same version which means that all old versions need to be kept 
available.  As you probably know, Debian stable can be a few versions 
behind the regular development.  As it turned out, the versions 
didn't match so I had to find an older Windows version to get it to 
work.  Thanks anyway.

At 07:56 PM 6/30/06 -0400, you wrote:

>         How about using Unison?
>
>         http://www.cis.upenn.edu/~bcpierce/unison/


-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.1.394 / Virus Database: 268.9.8/380 - Release Date: 6/30/06


From gbnewby at pglaf.org  Mon Jul  3 03:45:05 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Jul  3 03:45:07 2006
Subject: [gutvol-d] ftp.archive.org
In-Reply-To: <7.0.1.0.2.20060628004644.03fd5a20@baechler.net>
References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
	<20060626223514.GA11041@pglaf.org>
	<7.0.1.0.2.20060628004644.03fd5a20@baechler.net>
Message-ID: <20060703104505.GB24484@pglaf.org>

On Wed, Jun 28, 2006 at 12:52:27AM -0700, Tony Baechler wrote:
> Hi.  Thanks very much, the readingroo.ms server seems much 
> faster.  When I checked last, snowy.arsc.alaska.edu seemed to be a 
> few hours behind the other master sites.  I am no longer able to 

This was a little mysterious...  turns out I have two
independent copies on snowy.arsc.alaska.edu.

The one at ftp://snowy.arsc.alaska.edu/mirrors/gutenberg
is not actually a mirror.  It receives a live copy of files
as they are posted to readingoo.ms and our main server
at ibiblio.org (which runs gutenberg.org's server).

The one at http://snowy.arsc.alaska.edu/gutenberg is just
a regular mirror that I pull back from ibiblio, daily.
That explains why it's not quite current.

I'm in the middle of setting up some additional mirrors,
so this will probably continue to change a bit.
  -- Greg

> connect to ftp.archive.org, it just times out.  I am not a Debian 
> expert but I do run a Debian server and know a reasonable amount 
> about it.  What needs doing?  I am not really a programmer but I know 
> how to install packages and set up things for the most part.  If 
> there is something that needs to be done, let me know and I'll see.
> 
> At 03:35 PM 6/26/06 -0700, you wrote:
> 
> >I hope this helps.  My guess is the readingroo.ms server will
> >give you the best throughput (though it will have some
> >brief downtime, then possibly be heavily loaded during the
> >world ebook fair, http://www.worldebookfair.com).
> >
> >Are there any Debian whizzes on this list who might want to help look
> >after the readingroo.ms server with me?
> >
> >  -- Greg
> 
> 
> -- 
> No virus found in this outgoing message.
> Checked by AVG Anti-Virus.
> Version: 7.1.394 / Virus Database: 268.9.5/376 - Release Date: 6/26/06
> 
From gbnewby at pglaf.org  Tue Jul  4 01:44:20 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue Jul  4 01:44:23 2006
Subject: [gutvol-d] New DVD ISO to try
In-Reply-To: <7.0.1.0.2.20060628084333.0426a7b0@baechler.net>
References: <20060626093237.GA27369@pglaf.org>
	<7.0.1.0.2.20060628084333.0426a7b0@baechler.net>
Message-ID: <20060704084420.GA11229@pglaf.org>

Thanks to everyone who provided feedback and ideas for the new DVD
image.  I've made a new image that contains *all* of the plain text
titles (zipped), plus a bunch of multimedia and some nice HTML with
images.

Feedback welcome:

	http://snowy.arsc.alaska.edu/gbn/pgimages/jul06special-work/

my notes on what's included:

	http://snowy.arsc.alaska.edu/gbn/pgimages/newdvd.txt

As you will read at the first URL, I went ahead and included
lots of our copyrighted content.  Then, I said that the DVD
could be given away, but NOT sold.  I like this.

Enjoy!
  -- Greg
From cannona at fireantproductions.com  Tue Jul  4 07:39:20 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Tue Jul  4 07:39:48 2006
Subject: [gutvol-d] New DVD ISO to try
References: <20060626093237.GA27369@pglaf.org><7.0.1.0.2.20060628084333.0426a7b0@baechler.net>
	<20060704084420.GA11229@pglaf.org>
Message-ID: <000201c69f77$af35a190$0132a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This looks great.  One thing you could change would be to redo the title
index and distribute all of the "the" titles to their proper places in the
lists.  The reason is that the t index for titles is huge.  Either that or
split it.  HTML files that are too large cause problems for my screen
reader, and I imagine that they might for some older systems as well, but I
could be wrong.

As an alternative, or in addition, I would suggest also providing a text
index of titles and authors.

Before you make the ISO, you might slap the autorun.inf file into the root
directory.  Use the one from the CD as the DVD autorun doesn't work on older
systems.  You shouldn't need to change anything, as it already points to
index.html.

Finally, I would like to write up a short set of instructions on how to
"Install" a copy of the dvd on your hard drive.  It wouldn't be anything
fancy, just create a folder, copy the contents of the disc to that folder,
and create a short cut to index.html and place it either on the desktop or
under programs.

If any mac users would like to write some instructions for their OS, they
would be appreciated I'm sure.  Anyone who uses Linux or the like shouldn't
need instructions, but if someone disagrees, they could be included as well
if you write them.

That's all for now.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Greg Newby" <gbnewby@pglaf.org>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Cc: "Project Gutenberg CDs" <cd@pglaf.org>
Sent: Tuesday, July 04, 2006 3:44 AM
Subject: [gutvol-d] New DVD ISO to try


> Thanks to everyone who provided feedback and ideas for the new DVD
> image.  I've made a new image that contains *all* of the plain text
> titles (zipped), plus a bunch of multimedia and some nice HTML with
> images.
>
> Feedback welcome:
>
> http://snowy.arsc.alaska.edu/gbn/pgimages/jul06special-work/
>
> my notes on what's included:
>
> http://snowy.arsc.alaska.edu/gbn/pgimages/newdvd.txt
>
> As you will read at the first URL, I went ahead and included
> lots of our copyrighted content.  Then, I said that the DVD
> could be given away, but NOT sold.  I like this.
>
> Enjoy!
>  -- Greg
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFEqn20I7J99hVZuJcRAj7JAJ0SAru+IMO+NrLX4aXe1lvq4svVNACfQEI1
yURPmloPbGZeKGXQEMR1zzY=
=cqkK
-----END PGP SIGNATURE-----

From kouhia at nic.funet.fi  Tue Jul  4 08:15:31 2006
From: kouhia at nic.funet.fi (Juhana Sadeharju)
Date: Tue Jul  4 08:47:50 2006
Subject: [gutvol-d] Copyright question
Message-ID: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>


Hello. Most often I hear that the copyright of the book lasts 80 years
after the death of author. But it is normal that the copyright is
transferred to the publisher in the contract. Then why the copyright
expiration is still tied to the author who don't have the copyright
anymore? Is this misuse of copyright law? Should author keep the
copyright (and publisher only license) so that the death+80 rule
applies?

That is most convenient to publishers, of course, because they
get the copyright and its expiration is still tied to the author.

In the example case, the book writing contract was made 8 years
ago and the contract included the second edition published now.
Because the publisher owns the copyright of the second edition
already due the contract, the author has never owned the copyright.
So how in this case the copyright expiration could never be tied to
the author?

Juhana
From sly at victoria.tc.ca  Tue Jul  4 09:16:35 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Jul  4 09:16:37 2006
Subject: [gutvol-d] Copyright question
In-Reply-To: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
Message-ID: <Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>


Copyright laws are different in every country.

I know that in Canada, the duration of copyright is
determined by the life-span of the creator, regardless
of who actually owns the copyright. I cannot speak for
any other countries.

You are unlikely to find a useful answer here on the
Project Gutenberg Volunteer Discussion list. For a
list dedicated to discussing copyright issues, see:
http://www.cni.org/forums/cni-copyright/

Andrew

On Tue, 4 Jul 2006, Juhana Sadeharju wrote:

>
> Hello. Most often I hear that the copyright of the book lasts 80 years
> after the death of author. But it is normal that the copyright is
> transferred to the publisher in the contract. Then why the copyright
> expiration is still tied to the author who don't have the copyright
> anymore? Is this misuse of copyright law? Should author keep the
> copyright (and publisher only license) so that the death+80 rule
> applies?
>
> That is most convenient to publishers, of course, because they
> get the copyright and its expiration is still tied to the author.
>
> In the example case, the book writing contract was made 8 years
> ago and the contract included the second edition published now.
> Because the publisher owns the copyright of the second edition
> already due the contract, the author has never owned the copyright.
> So how in this case the copyright expiration could never be tied to
> the author?
>
> Juhana
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From Bowerbird at aol.com  Tue Jul  4 11:11:07 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jul  4 11:11:15 2006
Subject: [gutvol-d] New DVD ISO to try
Message-ID: <bb9.4bed10.31dc093b@aol.com>

greg said:
>    http://snowy.arsc.alaska.edu/gbn/pgimages/newdvd.txt

thanks, greg, this document is _extremely_ informative.

have a   happy holiday...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060704/57646858/attachment.html
From gbnewby at pglaf.org  Tue Jul  4 12:37:05 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue Jul  4 12:37:06 2006
Subject: [gutvol-d] New DVD ISO to try
In-Reply-To: <bb9.4bed10.31dc093b@aol.com>
References: <bb9.4bed10.31dc093b@aol.com>
Message-ID: <20060704193705.GD26049@pglaf.org>

On Tue, Jul 04, 2006 at 02:11:07PM -0400, Bowerbird@aol.com wrote:
> greg said:
> >    http://snowy.arsc.alaska.edu/gbn/pgimages/newdvd.txt
> 
> thanks, greg, this document is _extremely_ informative.

That's the part that looks like it should be easy,
but actually took me about 25 hours!  I also have
the list of "best of" titles, which took a long time.
It's in the same location:

	http://snowy.arsc.alaska.edu/gbn/pgimages/

  -- Greg
From Bowerbird at aol.com  Tue Jul  4 16:24:30 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jul  4 16:24:44 2006
Subject: [gutvol-d] New DVD ISO to try
Message-ID: <4ba.35e4a06.31dc52ae@aol.com>

greg said:
>    That's the part that looks like it should be easy,
>    but actually took me about 25 hours!?

oh yeah, i'm well aware that it's harder than it looks.
i'll have a few questions for you about it, tomorrow...

for today, happy birthday to project gutenberg!

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060704/a14a65c4/attachment.html
From desrod at gnu-designs.com  Tue Jul  4 14:43:15 2006
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Tue Jul  4 18:43:32 2006
Subject: [gutvol-d] ftp.archive.org
In-Reply-To: <7.0.1.0.2.20060702235033.03360330@baechler.net>
References: <7.0.1.0.2.20060626114129.032ee4e0@baechler.net>
	<44A04DB1.6070608@aol.com>
	<7.0.1.0.2.20060628003711.03fdd800@baechler.net>
	<Pine.LNX.4.64.0606280708370.8333@aphrodite.gnu-designs.com>
	<7.0.1.0.2.20060630005304.03354d60@baechler.net>
	<44A5B5A3.1010703@aol.com>
	<7.0.1.0.2.20060702235033.03360330@baechler.net>
Message-ID: <1152049396.16394.6.camel@localhost.localdomain>

On Sun, 2006-07-02 at 23:58 -0700, Tony Baechler wrote:
> If rsync will grab all the files I need and put 
> them in my customized structure, rsync will help.  However, based on 
> my reading of the help, it doesn't do that.  

In fact, it does exactly that, depending on how you tell it where to put
the files its copying over. You may have to script a bit of it or run
several rsync commands in a series to get what you want (fetch text
first, indices next and so on). 

rsync -avSP --delete *[0-9].txt /my/custom/directory
rsync -avSP --delete *.gz /other/place

...and so on. I've been using rsync for MANY years now, and tridge is
one of the alumnus' of a previous company we both worked for
(Linuxcare), so I can say with absolute-certainty, that if you're
running into trouble with what rsync does for you, you're doing
something wrong ;) 

If there's one thing rsync does well, its everything. There are even
people out there who use rsync _exclusively_ as their MTA/MDA. Nutty,
but true. 

> It would be perfect for mirroring a primary server to a backup
> machine.  Again, I'm not trying to mirror here so it doesn't do what I
> want based on my understanding of comments made on this list and the
> rsync help. 

It can do a lot of things, incremental backups, snapshots, mirroring,
cloning of directories, complete transposition... pretty much anything
you want. It just takes a remote file, block-copies it to some local
place (or local to local, if you're cloning a drive for example. I've
used rsync quite a bit to upgrade hard drives in laptops, works great). 

In any case, just define your schema and apply the rsync methodology to
it. No need to get complicated or fancy. 

Oh, and lastly... rsync does NOT have to have the same version running
on both ends. If that were true, it would break in thousands of
situations. You simply have to have a version which understands the
options you're passing it (i.e. rsync v1.x isn't future-compatible with
v2.6.6). 

I can't speak for the Windows <-> Linux syncronization, but it should be
moot. You don't need to run rsyncd to rsync files from machine to
machine either, but you can if you wish. 

Good luck!


-- 
David A. Desrosiers
desrod gnu-designs com
http://gnu-designs.com

"Erosion of civil liberties... is a threat to national security."

From rnmscott at netspace.net.au  Wed Jul  5 06:18:02 2006
From: rnmscott at netspace.net.au (rnmscott@netspace.net.au)
Date: Wed Jul  5 06:18:08 2006
Subject: [gutvol-d] Re: Automated readability scores for PG eBooks
Message-ID: <1152105482.44abbc0abcbbd@webmail.netspace.net.au>

 
Interesting idea, Greg.  Amazon has this for some texts, via another method 
(or two, I think, the names escape me currently, Flesch something or other?) 
or two, and I think there are cpan modules that do these.  A GUTREADABILITY 
file from these would be interesting/fun. 
 
Someone would get an interesting academic project or two out of that perhaps, 
as well? 
 
Riohard 


------------------------------------------------------------
This email was sent from Netspace Webmail: http://www.netspace.net.au

From rnmscott at netspace.net.au  Wed Jul  5 06:24:27 2006
From: rnmscott at netspace.net.au (rnmscott@netspace.net.au)
Date: Wed Jul  5 06:24:31 2006
Subject: [gutvol-d] Re: Automated readability scores for PG eBooks
In-Reply-To: <1152105482.44abbc0abcbbd@webmail.netspace.net.au>
References: <1152105482.44abbc0abcbbd@webmail.netspace.net.au>
Message-ID: <1152105867.44abbd8ba3b73@webmail.netspace.net.au>

greg said: 
>   one value of this is that it does  
>    a good job of identifing children's eBooks  
>    (they tend to be "easy").  
 
checklist said: 
>    bigword density 
>   short word density (-) 
>   wordsPerSentences 
>   syllablesPerWords 
>   profainwordsPerWords 
>   numbersPerWords 
>   mostCommon1000WordsPerWord (-) 
>   commascharsPerWords 
>   wordsPerParagraphs 
>   letterFrequencyDistributionError 
>   adjacentLetterPairsFrequencyDistributionError 
>   uniqueStemmedWordsPerWord; 
 
aren't scientists silly?            :+) 
 
look, greg, if you want a list of children's e-books, 
or a list of "easy" e-books, or any kind of list of books, 
just ask the distributed proofreaders people for the list... 
 
-- If I ask them to classify 20000 books for me, will I get a reply any time 
this century? :-) 
 
 
they'll give you a long list of books, any kind of list you want, 
and you won't have to do one little bit of fancy-ass statistics... 
 
i'm serious, they can give a list with p.g. e-text numbers and 
meaningful notes, and funny little stories, and _everything_... 
 
much more vivid than your boring-ass statistics...          :+) 
 
 
-- and anecdotes about 'hey I remember that really weird typo on page 263' and 
'I picked that up at Fred's garage sale on Smith street' are very exciting? 
;-) 


------------------------------------------------------------
This email was sent from Netspace Webmail: http://www.netspace.net.au

From rnmscott at netspace.net.au  Wed Jul  5 06:10:22 2006
From: rnmscott at netspace.net.au (rnmscott@netspace.net.au)
Date: Wed Jul  5 06:29:45 2006
Subject: [gutvol-d] Re: 'Lasker's Manual of Chess'
Message-ID: <1152105022.44abba3e4d275@webmail.netspace.net.au>

Interesting idea.  I had never even thought of chess works, despite having 
actually read this, way back when, I think. 
 
How would you do it, with images?  Some of them could be pretty big, with lots 
of board positions.  Re-doing them as ascii boards like on old chess servers 
wouldn't be too much fun, but possible? 
 
Richard 
 

------------------------------------------------------------
This email was sent from Netspace Webmail: http://www.netspace.net.au

From rnmscott at netspace.net.au  Wed Jul  5 06:45:11 2006
From: rnmscott at netspace.net.au (rnmscott@netspace.net.au)
Date: Wed Jul  5 06:45:15 2006
Subject: [gutvol-d] Re: Automated readability scores for PG eBooks
In-Reply-To: <1152105867.44abbd8ba3b73@webmail.netspace.net.au>
References: <1152105482.44abbc0abcbbd@webmail.netspace.net.au>
	<1152105867.44abbd8ba3b73@webmail.netspace.net.au>
Message-ID: <1152107111.44abc267d5099@webmail.netspace.net.au>

On 6/26/06, Scott Lawton <scott_bulkmail at productarchitect.com> wrote: 
> While I agree that it would not be worth adding readability score if it had 
much 
> impact on these and other worthy goals, 
 
But if it doesn't, then those goals aren't reasons _for_ adding it. 
 
> There are lots and lots of cool things that could be done with the catalog. 
 
We could start with the results of stripping the header and running wc 
on it. That strikes me at least as useful as this result. Also, the 
ten or twelve most common words in the book after stripping the ten or 
twelve most common words in the English language. 
 
-- I'd like to see that too, word count is perhaps a little more meaningful to 
your average reader than size in kilobytes (which is displayed? and useful to 
know as well, of course) 
 
> Even in the context of the above, the scores would provide a great starting 
point for 
> being improved with manual cataloging and literacy labeling. 
 
I don't think so. It's downright useless for manual cataloging, as it 
only handles that one dimension. I don't think it will help literacy 
labeling much, either, which is best done manually. 
 
-- Certainly wouldn't be useless, if you are going to catalogue/tag things 
manually. 
 
> Don't let the perfect stand in the way of the good. 
 
But I don't think having these numbers anywhere prominent is good. 
Right now our pages only have a few pieces of important information; 
minutia like this should go to a page linked to a page linked only 
from the book page, which we can fill with various stats to our hearts 
content. 
 
-- Shouldn't be next to the title, but an index page of all of them would be 
cool.  As the texts stand now lots of them have pages of mind numbing 
legalese, already,  not sure two lines of numbers matter in all of that. 
 
 
It also seems a little weird to have some proprietary reading level 
numbers on the system, instead of the Fog index or the Flesch-Kincaid 
Readability tests. It feels like an advertisement. 
 
-- That was the two I was trying to think of before, thanks! :) 
 
 
This is clipped from the 'Voyages of Doctor Dolittle' 
 
Readability 	Compared with books in All Categories     
Fog Index:  	8.3 	16% are easier 		84% are harder 
Flesch Index:  	74.6 	11% are easier 		89% are harder 
Flesch-Kincaid Index:  	6.5 	17% are easier 		83% are harder 
  
Complexity (learn more) 
Complex Words:  	6% 	8% have fewer 		92% have more 
Syllables per Word:  	1.4 	7% have fewer 		93% have more 
Words per Sentence:  	14.8 	39% have fewer 		61% have more 
  
Number of 
Characters:  	387,512 	47% have fewer 		53% have more 
Words:  	72,671 	55% have fewer 		45% have more 
Sentences:  	4,912 	64% have fewer 		36% have more 
 
 
I suppose with wc etc. you could have percentiles of book by 'length' etc. 
which are relevant to people, too. 
 
Richard 


------------------------------------------------------------
This email was sent from Netspace Webmail: http://www.netspace.net.au

From jon.ingram at gmail.com  Wed Jul  5 08:10:59 2006
From: jon.ingram at gmail.com (Jon Ingram)
Date: Wed Jul  5 08:18:06 2006
Subject: [gutvol-d] Re: 'Lasker's Manual of Chess'
In-Reply-To: <1152105022.44abba3e4d275@webmail.netspace.net.au>
References: <1152105022.44abba3e4d275@webmail.netspace.net.au>
Message-ID: <4baf53720607050810s3f4ad3b8ud15615f88e2ccf31@mail.gmail.com>

On 7/5/06, rnmscott@netspace.net.au <rnmscott@netspace.net.au> wrote:
> Interesting idea.  I had never even thought of chess works, despite having
> actually read this, way back when, I think.
>
> How would you do it, with images?  Some of them could be pretty big, with lots
> of board positions.  Re-doing them as ascii boards like on old chess servers
> wouldn't be too much fun, but possible?

Symbols for all the chess pieces are in Unicode (see
  http://www.unicode.org/charts/PDF/U2600.pdf
), but I don't image the glyphs are in all that many fonts!

Having lots of images isn't that big a problem, especially if the
images are only black-and-white.

-- 
Jon Ingram
From jon at noring.name  Wed Jul  5 08:27:44 2006
From: jon at noring.name (Jon Noring)
Date: Wed Jul  5 08:27:59 2006
Subject: [gutvol-d] Re: 'Lasker's Manual of Chess'
In-Reply-To: <4baf53720607050810s3f4ad3b8ud15615f88e2ccf31@mail.gmail.com>
References: <1152105022.44abba3e4d275@webmail.netspace.net.au>
	<4baf53720607050810s3f4ad3b8ud15615f88e2ccf31@mail.gmail.com>
Message-ID: <1256093780.20060705092744@noring.name>

Jon Ingram wrote:
> rnmscott@netspace.net.au wrote:

>> Interesting idea.  I had never even thought of chess works, despite having
>> actually read this, way back when, I think.
>>
>> How would you do it, with images?  Some of them could be pretty big, with lots
>> of board positions.  Re-doing them as ascii boards like on old chess servers
>> wouldn't be too much fun, but possible?

> Symbols for all the chess pieces are in Unicode (see
>   http://www.unicode.org/charts/PDF/U2600.pdf
> ), but I don't image the glyphs are in all that many fonts!
>
> Having lots of images isn't that big a problem, especially if the
> images are only black-and-white.


Another approach to consider, and with any highly formatted textual
objects where "layout is content" [note], is to use SVG to represent the
chess board positions. With animated SVG, one should even be able to show
the move-by-move board positions.

SVG rendering engines are getting to be ubiquitous. The Mozilla engine
includes support for some flavor of SVG.

Jon


[note: such things as ultra-complex tables, poetry and prose where the
position of the text itself communicates content, etc., are types of
content amenable to representation using SVG.]

From sly at victoria.tc.ca  Wed Jul  5 09:05:56 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Jul  5 09:05:59 2006
Subject: [gutvol-d] Re: 'Lasker's Manual of Chess'
In-Reply-To: <1152105022.44abba3e4d275@webmail.netspace.net.au>
References: <1152105022.44abba3e4d275@webmail.netspace.net.au>
Message-ID: <Pine.GSO.4.58.0607050855260.5666@vtn1.victoria.tc.ca>


Hmmm..... ok summary of ideas so far, and one more
of my own.

1) As you mention, you could do the positions using just
ascii characters. (a little tedious to do, but perhaps the
most portable.)
2) Similar to above, only using high-unicode codepoints for
chess pieces. Good point: standards compliant. Drawback: at
present, not very many people could view it correctly.
3) You could extract images of each board position from
page scans, and create an html file.
4) Using your own software, or what-have-you, you could
create new images showing the same positions. (Probably
result in cleaner images this way.)
5) Jon Noring mentioned using SVG, which I wouldn't have
thought of. Investigate at your pleasure.
6) I'm sure I've seen somewhere in PG some use of PGN
for recording chess games. It might be of use in this
case. See: http://en.wikipedia.org/wiki/Portable_Game_Notation


Andrew

On Wed, 5 Jul 2006 rnmscott@netspace.net.au wrote:

> Interesting idea.  I had never even thought of chess works, despite having
> actually read this, way back when, I think.
>
> How would you do it, with images?  Some of them could be pretty big, with lots
> of board positions.  Re-doing them as ascii boards like on old chess servers
> wouldn't be too much fun, but possible?
>
> Richard
From slybarger at gmail.com  Wed Jul  5 09:35:27 2006
From: slybarger at gmail.com (Suzanne Lybarger)
Date: Wed Jul  5 09:35:31 2006
Subject: [gutvol-d] Re: 'Lasker's Manual of Chess'
In-Reply-To: <Pine.GSO.4.58.0607050855260.5666@vtn1.victoria.tc.ca>
References: <1152105022.44abba3e4d275@webmail.netspace.net.au>
	<Pine.GSO.4.58.0607050855260.5666@vtn1.victoria.tc.ca>
Message-ID: <72f95c520607050935t7ed5b9d7r417abe01da2bf489@mail.gmail.com>

On 7/5/06, Andrew Sly <sly@victoria.tc.ca> wrote:

<snip!>

> 6) I'm sure I've seen somewhere in PG some use of PGN
> for recording chess games. It might be of use in this
> case. See: http://en.wikipedia.org/wiki/Portable_Game_Notation

Yep!  Here is The Blue Book of Chess, post-processed by Peter Barozzi at DP:

http://www.gutenberg.org/etext/16377

He used PGN and provided modern notation for all the games in both
file types.  We created ascii boards for the text version in proofing,
and he separately generated the illustrations of the boards for the
HTML.


Cheers,
Suzanne
=======================================
= Project Gutenberg's Distributed Proofreaders =
       Preserving History One Page at a Time.
                   http:///www.pgdp.net
=======================================
From gbnewby at pglaf.org  Wed Jul  5 17:20:28 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Jul  5 17:20:29 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
Message-ID: <20060706002028.GB18396@pglaf.org>

http://www.worldebookfair.com

It was on an overloaded network connection earlier, but
we moved it this (Wednesday) morning and the site seems to be
performing well.

Take a look - it's pretty neat!

There are a few missing files & broken links, but
for the most part things seem OK.   
  -- Greg
From rnmscott at netspace.net.au  Wed Jul  5 19:46:14 2006
From: rnmscott at netspace.net.au (rnmscott@netspace.net.au)
Date: Wed Jul  5 19:46:20 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <20060706002028.GB18396@pglaf.org>
References: <20060706002028.GB18396@pglaf.org>
Message-ID: <1152153974.44ac7976d0348@webmail.netspace.net.au>

Still seems to be really slow, pretty much the same as yesterday (the first 
time I looked).  I am getting < 1k download a lot of the time. 
 
 
Quoting Greg Newby <gbnewby@pglaf.org>: 
 
> http://www.worldebookfair.com 
>  
> It was on an overloaded network connection earlier, but 
> we moved it this (Wednesday) morning and the site seems to be 
> performing well. 
>  
> Take a look - it's pretty neat! 
>  


------------------------------------------------------------
This email was sent from Netspace Webmail: http://www.netspace.net.au

From brad at chenla.org  Wed Jul  5 21:24:45 2006
From: brad at chenla.org (Brad Collins)
Date: Wed Jul  5 21:22:06 2006
Subject: [gutvol-d] Conference Paper for Electronic Library Markup Language
Message-ID: <m3irmbmkc2.fsf@chenla.org>


I while back I mentioned that I would be presenting a paper at the
Extreme Markup Language Conference in Montreal in August.

The paper is an introduction to BMF (The Burr Metadata Framework)
which is a monster markup language which pulls together concepts from
the FRBR and Z39.19 (NISO Standard for Monolingual Thesuri) and draws
on many concepts from TEI.

BMF is designed to provide a framework for building distributed
electronic libraries which can be annotated and extended by anyone.

A few people expressed interest in reading the paper but I lost the
list.  So if anyone wants to read the paper drop me a note.  It's
about 80K zipped.

Cheers,

b/

-- 
Brad Collins <brad@chenla.org>, Banqwao, Thailand

From joey at joeysmith.com  Wed Jul  5 21:33:35 2006
From: joey at joeysmith.com (joey)
Date: Wed Jul  5 21:34:42 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <1152153974.44ac7976d0348@webmail.netspace.net.au>
References: <20060706002028.GB18396@pglaf.org>
	<1152153974.44ac7976d0348@webmail.netspace.net.au>
Message-ID: <20060706043334.GC20863@joeysmith.com>

Can you check which IP address your machine resolves www.worldebookfair.com
to? I'm getting 2Mb/s from 208.99.202.194 (the readingroo.ms server). Perhaps
your DNS simply hasn't updated, or maybe there's congestion between you and
readingroo.ms, but I'd like to know before I try adding some of the rate
limiting stuff Greg has asked me to look into.

On Thu, Jul 06, 2006 at 12:46:14PM +1000, rnmscott@netspace.net.au wrote:
> Still seems to be really slow, pretty much the same as yesterday (the first 
> time I looked).  I am getting < 1k download a lot of the time. 
>  
>  
> Quoting Greg Newby <gbnewby@pglaf.org>: 
>  
> > http://www.worldebookfair.com 
> >  
> > It was on an overloaded network connection earlier, but 
> > we moved it this (Wednesday) morning and the site seems to be 
> > performing well. 
> >  
> > Take a look - it's pretty neat! 
> >  
> 
From rnmscott at netspace.net.au  Wed Jul  5 21:40:28 2006
From: rnmscott at netspace.net.au (rnmscott@netspace.net.au)
Date: Wed Jul  5 21:40:33 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <20060706043334.GC20863@joeysmith.com>
References: <20060706002028.GB18396@pglaf.org>
	<1152153974.44ac7976d0348@webmail.netspace.net.au>
	<20060706043334.GC20863@joeysmith.com>
Message-ID: <1152160828.44ac943c29349@webmail.netspace.net.au>

 
PING www.worldebookfair.com (72.235.235.66) 56(84) bytes of data 
64 bytes from 72.235.235.66: icmp_seq=1 ttl=110 time=3769 ms 
64 bytes from 72.235.235.66: icmp_seq=2 ttl=110 time=3566 ms 
64 bytes from 72.235.235.66: icmp_seq=3 ttl=110 time=3628 ms 
64 bytes from 72.235.235.66: icmp_seq=4 ttl=110 time=3427 ms 
64 bytes from 72.235.235.66: icmp_seq=5 ttl=110 time=3501 ms 
 
Quoting joey <joey@joeysmith.com>: 
 
> Can you check which IP address your machine resolves www.worldebookfair.com 
> to? I'm getting 2Mb/s from 208.99.202.194 (the readingroo.ms server). 
Perhaps 
> your DNS simply hasn't updated, or maybe there's congestion between you and 
> readingroo.ms, but I'd like to know before I try adding some of the rate 
> limiting stuff Greg has asked me to look into. 
>  
 

------------------------------------------------------------
This email was sent from Netspace Webmail: http://www.netspace.net.au

From joey at joeysmith.com  Wed Jul  5 22:20:27 2006
From: joey at joeysmith.com (joey)
Date: Wed Jul  5 22:21:32 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <1152160828.44ac943c29349@webmail.netspace.net.au>
References: <20060706002028.GB18396@pglaf.org>
	<1152153974.44ac7976d0348@webmail.netspace.net.au>
	<20060706043334.GC20863@joeysmith.com>
	<1152160828.44ac943c29349@webmail.netspace.net.au>
Message-ID: <20060706052027.GE20863@joeysmith.com>

You're still getting a connection to the old server. The new server
is 208.99.202.194

On Thu, Jul 06, 2006 at 02:40:28PM +1000, rnmscott@netspace.net.au wrote:
>  
> PING www.worldebookfair.com (72.235.235.66) 56(84) bytes of data 
> 64 bytes from 72.235.235.66: icmp_seq=1 ttl=110 time=3769 ms 
> 64 bytes from 72.235.235.66: icmp_seq=2 ttl=110 time=3566 ms 
> 64 bytes from 72.235.235.66: icmp_seq=3 ttl=110 time=3628 ms 
> 64 bytes from 72.235.235.66: icmp_seq=4 ttl=110 time=3427 ms 
> 64 bytes from 72.235.235.66: icmp_seq=5 ttl=110 time=3501 ms 
>  
> Quoting joey <joey@joeysmith.com>: 
>  
> > Can you check which IP address your machine resolves www.worldebookfair.com 
> > to? I'm getting 2Mb/s from 208.99.202.194 (the readingroo.ms server). 
> Perhaps 
> > your DNS simply hasn't updated, or maybe there's congestion between you and 
> > readingroo.ms, but I'd like to know before I try adding some of the rate 
> > limiting stuff Greg has asked me to look into. 
> >  
From gbnewby at pglaf.org  Wed Jul  5 23:31:54 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Jul  5 23:31:56 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <20060706052027.GE20863@joeysmith.com>
References: <20060706002028.GB18396@pglaf.org>
	<1152153974.44ac7976d0348@webmail.netspace.net.au>
	<20060706043334.GC20863@joeysmith.com>
	<1152160828.44ac943c29349@webmail.netspace.net.au>
	<20060706052027.GE20863@joeysmith.com>
Message-ID: <20060706063154.GB24389@pglaf.org>

On Wed, Jul 05, 2006 at 11:20:27PM -0600, joey wrote:
> You're still getting a connection to the old server. The new server
> is 208.99.202.194

Right.  I changed the network TTL (the time before
a cached IP address expires) from 1 day to 1 hour, so 
further changes will propagate faster.  Depending on what
network connection you're using, you might be able to 
force a cache reload (rebooting your system often works).

During my testing earlier, I had it pushing 50Mbp/s.  It's
been averaging about 8Mbps all day.  
  -- Greg

> On Thu, Jul 06, 2006 at 02:40:28PM +1000, rnmscott@netspace.net.au wrote:
> >  
> > PING www.worldebookfair.com (72.235.235.66) 56(84) bytes of data 
> > 64 bytes from 72.235.235.66: icmp_seq=1 ttl=110 time=3769 ms 
> > 64 bytes from 72.235.235.66: icmp_seq=2 ttl=110 time=3566 ms 
> > 64 bytes from 72.235.235.66: icmp_seq=3 ttl=110 time=3628 ms 
> > 64 bytes from 72.235.235.66: icmp_seq=4 ttl=110 time=3427 ms 
> > 64 bytes from 72.235.235.66: icmp_seq=5 ttl=110 time=3501 ms 
> >  
> > Quoting joey <joey@joeysmith.com>: 
> >  
> > > Can you check which IP address your machine resolves www.worldebookfair.com 
> > > to? I'm getting 2Mb/s from 208.99.202.194 (the readingroo.ms server). 
> > Perhaps 
> > > your DNS simply hasn't updated, or maybe there's congestion between you and 
> > > readingroo.ms, but I'd like to know before I try adding some of the rate 
> > > limiting stuff Greg has asked me to look into. 
> > >  
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
From sly at victoria.tc.ca  Wed Jul  5 23:34:14 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Jul  5 23:34:18 2006
Subject: [gutvol-d] Looking for feedback on color image files in html book.
Message-ID: <Pine.GSO.4.58.0607052320170.8450@vtn1.victoria.tc.ca>


I have an 1875 picture book to prepare for PG, with 36
pages of color illustrations. There is not much text
to go with these illustrations. (The first section is
an alphabet, followed by a couple fairy-tale-type
stories based on nursery rhymes.)

The scans I've made are 8mb each, and when I crop them
and convert to png, I can get each one down to just over
a megabyte. However, from reading the PG guidelines and
looking at a number of other example PG postings, this
is still much too large for the purpose of an easily
downloadable html file. So, what seems to make sense
is to scale down and make some jpg images that would
fit better.

It would be nice though, to have somewhere to preserve
the high-resolution images, even if not at PG.

You can see a rough draft, with the first six images included,
at: http://www.victoria.tc.ca/~sly/pb.htm

Any comments would be welcome...

Andrew
From jon.ingram at gmail.com  Thu Jul  6 00:31:03 2006
From: jon.ingram at gmail.com (Jon Ingram)
Date: Thu Jul  6 00:31:05 2006
Subject: [gutvol-d] Looking for feedback on color image files in html book.
In-Reply-To: <Pine.GSO.4.58.0607052320170.8450@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0607052320170.8450@vtn1.victoria.tc.ca>
Message-ID: <4baf53720607060031s7f5fbda2md6a23cb9abd4c18b@mail.gmail.com>

On 7/6/06, Andrew Sly <sly@victoria.tc.ca> wrote:
>
> I have an 1875 picture book to prepare for PG, with 36
> pages of color illustrations. There is not much text
> to go with these illustrations. (The first section is
> an alphabet, followed by a couple fairy-tale-type
> stories based on nursery rhymes.)
>
> The scans I've made are 8mb each, and when I crop them
> and convert to png, I can get each one down to just over
> a megabyte. However, from reading the PG guidelines and
> looking at a number of other example PG postings, this
> is still much too large for the purpose of an easily
> downloadable html file. So, what seems to make sense
> is to scale down and make some jpg images that would
> fit better.
>
> It would be nice though, to have somewhere to preserve
> the high-resolution images, even if not at PG.
>
> You can see a rough draft, with the first six images included,
> at: http://www.victoria.tc.ca/~sly/pb.htm

You can display the lower resolution versions in the main page, and
link each image to a high resolution version. We do a similar thing
with the illustrated magazines we put through DP.

-- 
Jon Ingram
From tony at baechler.net  Thu Jul  6 00:53:54 2006
From: tony at baechler.net (Tony Baechler)
Date: Thu Jul  6 00:53:39 2006
Subject: [gutvol-d] New DVD ISO to try
In-Reply-To: <000201c69f77$af35a190$0132a8c0@blackbox>
References: <20060626093237.GA27369@pglaf.org>
	<7.0.1.0.2.20060628084333.0426a7b0@baechler.net>
	<20060704084420.GA11229@pglaf.org>
	<000201c69f77$af35a190$0132a8c0@blackbox>
Message-ID: <7.0.1.0.2.20060706004856.033c04a0@baechler.net>


Hi, I'm really surprised at this comment.  I admit that huge html 
pages take some time to load into the buffer, but I've never had a 
problem with them regardless of size in most cases.  For older 
systems, I recommend Lynx for DOS or Linux.  It is text-based but 
that shouldn't impose a problem.  It has a free license so binaries 
could be distributed on the DVD.  As far as graphical browsers, again 
I've never had a problem with huge html pages regardless of size and 
screen reader.  I'm not sure that older systems will have a problem 
either.  I'll have to actually look at the title index to be sure but 
I don't think a large html file should be considered a problem.  I 
use Window-Eyes 5.5.  You can contact me off list if you want since 
I'm not sure how your screen reader, at least nowadays, could be an 
issue.  If it was several years ago, I would agree with the screen 
reader issue.

At 09:39 AM 7/4/06 -0500, you wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>This looks great.  One thing you could change would be to redo the title
>index and distribute all of the "the" titles to their proper places in the
>lists.  The reason is that the t index for titles is huge.  Either that or
>split it.  HTML files that are too large cause problems for my screen
>reader, and I imagine that they might for some older systems as well, but I
>could be wrong.


-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.1.394 / Virus Database: 268.9.9/382 - Release Date: 7/4/06


From tony at baechler.net  Thu Jul  6 01:03:46 2006
From: tony at baechler.net (Tony Baechler)
Date: Thu Jul  6 01:03:31 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <20060706002028.GB18396@pglaf.org>
References: <20060706002028.GB18396@pglaf.org>
Message-ID: <7.0.1.0.2.20060706005451.033c4650@baechler.net>

Yes, I noticed the slowness.  It seems much better now.  I have a 
question though.  It says that you can download all ebooks.  I don't 
care about many of them but I would like to grab at least a few 
hundred if not a few thousands.  How?  Do I really have to 
individually download every single pdf file by hand?  I don't expect 
a nice ftp/rsync/http directory listing, but it would at least be 
nice if all the titles from a certain collection could be on one 
search page.  If you use the search form, you only get the first 10 
results.  The "browse collections" page only shows a random sampling 
of titles from any particular collection.  Also, the numbers are 
wrong or I'm doing something wrong.  One figure shows about 250,000 
pdf files, another shows 330,000 depending on whether you search or 
not.  The Census page shows about 30,000 pdf files but the search 
shows about 52,000.  I tried the advanced search but that seems to 
only be a help document unless I did something wrong.

I freely admit that I'm missing something here.  What am I 
missing?  Should I be using a more specific search syntax to get what 
I want, i.e. all books from one collection on a page?  Is there a way 
to show 50 results instead of 10?  Also, what about the missing 
files?  I looked at some rocketry links on the NASA collection and 
got error 404.  Where are they?  Other than security, is there any 
reason to not allow raw directory lists?  That would make downloading 
much easier.  With the Baen books, how do I find titles?  The page 
only lists ISBNs and authors but no titles except for mp3 
samples.  Finally, are any of these going to eventually make it to 
the main PG site?  Some are public domain and there is no reason why 
they can't be part of PG except for possible layout and pdf issues.

At 05:20 PM 7/5/06 -0700, you wrote:
>http://www.worldebookfair.com
>
>It was on an overloaded network connection earlier, but
>we moved it this (Wednesday) morning and the site seems to be
>performing well.
>
>Take a look - it's pretty neat!
>
>There are a few missing files & broken links, but
>for the most part things seem OK.
>   -- Greg


-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.1.394 / Virus Database: 268.9.9/382 - Release Date: 7/4/06


From cannona at fireantproductions.com  Thu Jul  6 05:52:34 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Jul  6 05:52:57 2006
Subject: [gutvol-d] New DVD ISO to try
References: <20060626093237.GA27369@pglaf.org><7.0.1.0.2.20060628084333.0426a7b0@baechler.net><20060704084420.GA11229@pglaf.org><000201c69f77$af35a190$0132a8c0@blackbox>
	<7.0.1.0.2.20060706004856.033c04a0@baechler.net>
Message-ID: <000c01c6a0fb$1b23e310$0132a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Interesting.  Perhaps it's a Jaws thing, as it's done so on many systems
over the years.

By the way, it's the T title list.

Sincerely
Aaron Cannon

- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Tony Baechler" <tony@baechler.net>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Thursday, July 06, 2006 2:53 AM
Subject: Re: [gutvol-d] New DVD ISO to try


>
>
> Hi, I'm really surprised at this comment.  I admit that huge html pages
> take some time to load into the buffer, but I've never had a problem with
> them regardless of size in most cases.  For older systems, I recommend
> Lynx for DOS or Linux.  It is text-based but that shouldn't impose a
> problem.  It has a free license so binaries could be distributed on the
> DVD.  As far as graphical browsers, again I've never had a problem with
> huge html pages regardless of size and screen reader.  I'm not sure that
> older systems will have a problem either.  I'll have to actually look at
> the title index to be sure but I don't think a large html file should be
> considered a problem.  I use Window-Eyes 5.5.  You can contact me off list
> if you want since I'm not sure how your screen reader, at least nowadays,
> could be an issue.  If it was several years ago, I would agree with the
> screen reader issue.
>
> At 09:39 AM 7/4/06 -0500, you wrote:
>>-----BEGIN PGP SIGNED MESSAGE-----
>>Hash: SHA1
>>
>>This looks great.  One thing you could change would be to redo the title
>>index and distribute all of the "the" titles to their proper places in the
>>lists.  The reason is that the t index for titles is huge.  Either that or
>>split it.  HTML files that are too large cause problems for my screen
>>reader, and I imagine that they might for some older systems as well, but
>>I
>>could be wrong.
>
>
> --
> No virus found in this outgoing message.
> Checked by AVG Anti-Virus.
> Version: 7.1.394 / Virus Database: 268.9.9/382 - Release Date: 7/4/06
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFErQepI7J99hVZuJcRAtdFAKCuueilBp8JK4BdD8NolCn212tNRACgnjZR
eBXfuMq+L50Q4JRfBwqwpfA=
=Kc6G
-----END PGP SIGNATURE-----

From gbnewby at pglaf.org  Thu Jul  6 09:39:24 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Jul  6 09:39:25 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <7.0.1.0.2.20060706005451.033c4650@baechler.net>
References: <20060706002028.GB18396@pglaf.org>
	<7.0.1.0.2.20060706005451.033c4650@baechler.net>
Message-ID: <20060706163924.GB1852@pglaf.org>

On Thu, Jul 06, 2006 at 01:03:46AM -0700, Tony Baechler wrote:
> Yes, I noticed the slowness.  It seems much better now.  I have a 
> question though.  It says that you can download all ebooks.  I don't 
> care about many of them but I would like to grab at least a few 
> hundred if not a few thousands.  How?  Do I really have to 

Tony, please send WEF questions directly to John, cc'd or
  John Guagliardo <john@guagliardo.cc>

Most of the collections (though not all) do not have
access except via search.


> individually download every single pdf file by hand?  I don't expect 
> a nice ftp/rsync/http directory listing, but it would at least be 
> nice if all the titles from a certain collection could be on one 
> search page.  If you use the search form, you only get the first 10 
> results.  The "browse collections" page only shows a random sampling 
> of titles from any particular collection.  Also, the numbers are 
> wrong or I'm doing something wrong.  One figure shows about 250,000 
> pdf files, another shows 330,000 depending on whether you search or 
> not.  The Census page shows about 30,000 pdf files but the search 
> shows about 52,000.  I tried the advanced search but that seems to 
> only be a help document unless I did something wrong.

There are about 330,000 files; we talk about 250,000 (1/4
million) to take overlap into account.

I don't know about the Census docs,

You're right that "Advanced Search" really just gives help.

> I freely admit that I'm missing something here.  What am I 
> missing?  Should I be using a more specific search syntax to get what 
> I want, i.e. all books from one collection on a page?  Is there a way 
> to show 50 results instead of 10?  Also, what about the missing 
> files?  I looked at some rocketry links on the NASA collection and 
> got error 404.  Where are they?  Other than security, is there any 

There are two main sources of 404s now:
1) Some files are case-sensitive (many came from a Windoze system).
   We're working on this.

2) A few collections are still being loaded into the different
   servers.  We're working on this, too.

You can email John or me specific failed filenames, and I can try to 
locate them.  That's something I can do.

> reason to not allow raw directory lists?  That would make downloading 
> much easier.  With the Baen books, how do I find titles?  The page 
> only lists ISBNs and authors but no titles except for mp3 
> samples.  Finally, are any of these going to eventually make it to 
> the main PG site?  Some are public domain and there is no reason why 
> they can't be part of PG except for possible layout and pdf issues.

Nothing will make it to the main PG site without "someone" doing
the work!  But most of the public domain content is already
on pgcc.net , so it's not going to go away after August 4.

I don't know about directory listings etc., those are questions
for John. 
  -- Greg


> At 05:20 PM 7/5/06 -0700, you wrote:
> >http://www.worldebookfair.com
> >
> >It was on an overloaded network connection earlier, but
> >we moved it this (Wednesday) morning and the site seems to be
> >performing well.
> >
> >Take a look - it's pretty neat!
> >
> >There are a few missing files & broken links, but
> >for the most part things seem OK.
> >  -- Greg
> 
> 
> -- 
> No virus found in this outgoing message.
> Checked by AVG Anti-Virus.
> Version: 7.1.394 / Virus Database: 268.9.9/382 - Release Date: 7/4/06
> 
From gbnewby at pglaf.org  Thu Jul  6 09:44:46 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Jul  6 09:44:47 2006
Subject: [gutvol-d] New DVD ISO to try
In-Reply-To: <7.0.1.0.2.20060706004856.033c04a0@baechler.net>
References: <20060626093237.GA27369@pglaf.org>
	<7.0.1.0.2.20060628084333.0426a7b0@baechler.net>
	<20060704084420.GA11229@pglaf.org>
	<000201c69f77$af35a190$0132a8c0@blackbox>
	<7.0.1.0.2.20060706004856.033c04a0@baechler.net>
Message-ID: <20060706164446.GD1852@pglaf.org>

1) Yes, I'll have a few different indexes, so they're not so
    uneven in size.  Also a "whole DVD" listing.

2) "The" as the first word in the title is an artifact of
    the back-end catalog.  These basically need to be fixed
    by hand.  (Yes, I'm sure some could be automated....  Marcello
    would like to hear your thoughts on this, I'm certain.)

I should have the "final" version up within 24 hours.  I said that
yesterday, then we had a big rainstorm and (maybe coincidentally) my
'net went out.
  -- Greg

On Thu, Jul 06, 2006 at 12:53:54AM -0700, Tony Baechler wrote:
> 
> 
> Hi, I'm really surprised at this comment.  I admit that huge html 
> pages take some time to load into the buffer, but I've never had a 
> problem with them regardless of size in most cases.  For older 
> systems, I recommend Lynx for DOS or Linux.  It is text-based but 
> that shouldn't impose a problem.  It has a free license so binaries 
> could be distributed on the DVD.  As far as graphical browsers, again 
> I've never had a problem with huge html pages regardless of size and 
> screen reader.  I'm not sure that older systems will have a problem 
> either.  I'll have to actually look at the title index to be sure but 
> I don't think a large html file should be considered a problem.  I 
> use Window-Eyes 5.5.  You can contact me off list if you want since 
> I'm not sure how your screen reader, at least nowadays, could be an 
> issue.  If it was several years ago, I would agree with the screen 
> reader issue.
> 
> At 09:39 AM 7/4/06 -0500, you wrote:
> >-----BEGIN PGP SIGNED MESSAGE-----
> >Hash: SHA1
> >
> >This looks great.  One thing you could change would be to redo the title
> >index and distribute all of the "the" titles to their proper places in the
> >lists.  The reason is that the t index for titles is huge.  Either that or
> >split it.  HTML files that are too large cause problems for my screen
> >reader, and I imagine that they might for some older systems as well, but I
> >could be wrong.
> 
> 
> -- 
> No virus found in this outgoing message.
> Checked by AVG Anti-Virus.
> Version: 7.1.394 / Virus Database: 268.9.9/382 - Release Date: 7/4/06
> 
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
From gbnewby at pglaf.org  Thu Jul  6 09:56:20 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Jul  6 09:56:22 2006
Subject: [gutvol-d] Showcase eBook #18131
Message-ID: <20060706165620.GA2836@pglaf.org>

I have been perusing our collection for interesting stuff
to put on the new DVD image ("interesting" being as broadly
defined as possible).

Here's one that is pretty new, but I didn't notice it
earlier.  It includes MIDI files and sheet music, as part
of an eBook.  It's a great example, to me, of the type of
thing you can do with a computer-based eBook, but not so
easily with plain old paper.

The Rescue of the Princess Winsome, by Fellows-Johnston and Bacon        18131

http://www.gutenberg.org/etext/18131  (just view the HTML file, it 
links to the rest)

Wow!
  -- Greg

From ajhaines at shaw.ca  Thu Jul  6 10:19:09 2006
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Thu Jul  6 10:21:39 2006
Subject: [gutvol-d] Showcase eBook #18131
References: <20060706165620.GA2836@pglaf.org>
Message-ID: <000501c6a120$4b0df870$6401a8c0@ahainesp2400>

There seems to be a problem with the book's HTML version.  If you look at 
the first stage direction, just before the first speech by Ogre, the first 
few characters of the direction have been chopped off.  It looks like this 
has happened with all stage directions where the first line is left indented 
relative to the rest of the direction (a hanging indent).

This is with Internet Explorer V6.  Mozilla Firefox (V 1.5) displays the 
stage directions properly.

Al

----- Original Message ----- 
From: "Greg Newby" <gbnewby@pglaf.org>
To: <gutvol-d@lists.pglaf.org>
Sent: Thursday, July 06, 2006 9:56 AM
Subject: [gutvol-d] Showcase eBook #18131


>I have been perusing our collection for interesting stuff
> to put on the new DVD image ("interesting" being as broadly
> defined as possible).
>
> Here's one that is pretty new, but I didn't notice it
> earlier.  It includes MIDI files and sheet music, as part
> of an eBook.  It's a great example, to me, of the type of
> thing you can do with a computer-based eBook, but not so
> easily with plain old paper.
>
> The Rescue of the Princess Winsome, by Fellows-Johnston and Bacon 
> 18131
>
> http://www.gutenberg.org/etext/18131  (just view the HTML file, it
> links to the rest)
>
> Wow!
>  -- Greg
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
> 


From sly at victoria.tc.ca  Thu Jul  6 11:31:29 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Jul  6 11:31:33 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <20060706163924.GB1852@pglaf.org>
References: <20060706002028.GB18396@pglaf.org>
	<7.0.1.0.2.20060706005451.033c4650@baechler.net>
	<20060706163924.GB1852@pglaf.org>
Message-ID: <Pine.GSO.4.58.0607061121220.5468@vtn1.victoria.tc.ca>


As I am looking up authors etc. for the PG online catalog,
and just generally browsing, I seem to be constantly running
into more websites that have transcribed material that could
be added to PG. With a little effort, I could make a list for
you of dozens of sites, with thousands of books that could be
adapted. However, getting copyright clearance, and reformatting
these is perhaps not as "glamarous" as Distributed Proofreading,
so it does not attract as many people. :)

Though I already have too many different PG projects I'm in
the middle of, I would be willing to help if you'd like to
start processing some of these texts.

Andrew

On Thu, 6 Jul 2006, Greg Newby wrote:

> On Thu, Jul 06, 2006 at 01:03:46AM -0700, Tony Baechler wrote:
> > samples.  Finally, are any of these going to eventually make it to
> > the main PG site?  Some are public domain and there is no reason why
> > they can't be part of PG except for possible layout and pdf issues.
>
> Nothing will make it to the main PG site without "someone" doing
> the work!  But most of the public domain content is already
> on pgcc.net , so it's not going to go away after August 4.
>
From sly at victoria.tc.ca  Thu Jul  6 11:47:53 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Jul  6 11:47:56 2006
Subject: [gutvol-d] New DVD ISO to try
In-Reply-To: <20060706164446.GD1852@pglaf.org>
References: <20060626093237.GA27369@pglaf.org>
	<7.0.1.0.2.20060628084333.0426a7b0@baechler.net>
	<20060704084420.GA11229@pglaf.org>
	<000201c69f77$af35a190$0132a8c0@blackbox>
	<7.0.1.0.2.20060706004856.033c04a0@baechler.net>
	<20060706164446.GD1852@pglaf.org>
Message-ID: <Pine.GSO.4.58.0607061132310.5468@vtn1.victoria.tc.ca>


On Thu, 6 Jul 2006, Greg Newby wrote:

> 2) "The" as the first word in the title is an artifact of
>     the back-end catalog.  These basically need to be fixed
>     by hand.  (Yes, I'm sure some could be automated....  Marcello
>     would like to hear your thoughts on this, I'm certain.)

If you are taking your information from the PG online catalog,
there should be a field for "non-filing characters" for each
title, etc. which indicates how many characters to ignore for
sorting purposes. This is used for initial articles.
For example the title "The Adventures of Billy"
would be marked as having 4 non-filing characters; the title
"An adventure with Billy", 3; "Das Leben Billys", 4;
"A Long day with Billy", 2; "Les Amours de Billie", 4;
"La Maraj Vojagxoj de Bilio", 3.

Right now, as new titles added, the common
initial articles for German and English are
looked after automatically. I've dealt with
many of the French ones manually.

Hope this helps,
Andrew
From Bowerbird at aol.com  Thu Jul  6 12:01:20 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jul  6 12:01:38 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
Message-ID: <542.2785522.31deb800@aol.com>

andrew said:
>    With a little effort, I could make a list for you 
>    of dozens of sites, with thousands of books 
>    that could be adapted.

the real work, of course, is doing the "adapting".

but even if p.g. doesn't want the list you'd make
-- and i can't see why they would turn it down --
i'm sure plenty of people would find it useful...

so go ahead!          :+)

i'd suggest a wiki, though, so other people could
augment it.   you can create a free wiki over here:
>    http://pbwiki.com

-bowerbird

p.s.   that's the first time i've heard d.p. work called
"glamorous"...             :+)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060706/7500da31/attachment.html
From Bowerbird at aol.com  Thu Jul  6 12:08:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jul  6 12:08:31 2006
Subject: [gutvol-d] =?iso-8859-1?q?re=3A_the=A0only_literature_people_car?=
 =?iso-8859-1?q?e_enough_about_to_steal?=
Message-ID: <549.229560e.31deb9a7@aol.com>

cory doctorow said:
>    science fiction is the?only literature people 
>    care enough about to steal on the Internet.?
>    It's the only literature that regularly shows up, 
>    scanned and run through optical character recognition 
>    software and lovingly hand-edited on darknet newsgroups, 
>    Russian websites, IRC channels and?elsewhere.

you can find the whole article at:
>    http://www.locusmag.com/2006/Issues/07DoctorowCommentary.html

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060706/7eab5e86/attachment.html
From cannona at fireantproductions.com  Thu Jul  6 13:34:15 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Jul  6 13:44:31 2006
Subject: [gutvol-d] New DVD ISO to try
References: <20060626093237.GA27369@pglaf.org><7.0.1.0.2.20060628084333.0426a7b0@baechler.net><20060704084420.GA11229@pglaf.org><000201c69f77$af35a190$0132a8c0@blackbox><7.0.1.0.2.20060706004856.033c04a0@baechler.net><20060706164446.GD1852@pglaf.org>
	<Pine.GSO.4.58.0607061132310.5468@vtn1.victoria.tc.ca>
Message-ID: <007b01c6a13c$f03df5e0$0132a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hmmm... I wonder if these are included in the RDF output, as that is what
the DVD creation system uses.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Andrew Sly" <sly@victoria.tc.ca>
To: <gbnewby@pglaf.org>; "Project Gutenberg Volunteer Discussion"
<gutvol-d@pglaf.org>
Sent: Thursday, July 06, 2006 1:47 PM
Subject: Re: [gutvol-d] New DVD ISO to try


>
>
> On Thu, 6 Jul 2006, Greg Newby wrote:
>
>> 2) "The" as the first word in the title is an artifact of
>>     the back-end catalog.  These basically need to be fixed
>>     by hand.  (Yes, I'm sure some could be automated....  Marcello
>>     would like to hear your thoughts on this, I'm certain.)
>
> If you are taking your information from the PG online catalog,
> there should be a field for "non-filing characters" for each
> title, etc. which indicates how many characters to ignore for
> sorting purposes. This is used for initial articles.
> For example the title "The Adventures of Billy"
> would be marked as having 4 non-filing characters; the title
> "An adventure with Billy", 3; "Das Leben Billys", 4;
> "A Long day with Billy", 2; "Les Amours de Billie", 4;
> "La Maraj Vojagxoj de Bilio", 3.
>
> Right now, as new titles added, the common
> initial articles for German and English are
> looked after automatically. I've dealt with
> many of the French ones manually.
>
> Hope this helps,
> Andrew
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFErXY1I7J99hVZuJcRAoXOAKDrOaVyJxlMAl1nXdGJhThVO4mGAwCfcxwF
l9NMGtWuve+YIXDRJ2aBk5Q=
=Z/MM
-----END PGP SIGNATURE-----

From prosfilaes at gmail.com  Thu Jul  6 13:52:04 2006
From: prosfilaes at gmail.com (David Starner)
Date: Thu Jul  6 13:52:12 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
In-Reply-To: <Pine.GSO.4.58.0607061121220.5468@vtn1.victoria.tc.ca>
References: <20060706002028.GB18396@pglaf.org>
	<7.0.1.0.2.20060706005451.033c4650@baechler.net>
	<20060706163924.GB1852@pglaf.org>
	<Pine.GSO.4.58.0607061121220.5468@vtn1.victoria.tc.ca>
Message-ID: <6d99d1fd0607061352p20f2cc31v4f8da61d69d131ff@mail.gmail.com>

On 7/6/06, Andrew Sly <sly@victoria.tc.ca> wrote:
> However, getting copyright clearance, and reformatting
> these is perhaps not as "glamarous" as Distributed Proofreading,
> so it does not attract as many people. :)

It's not just glamarous, it's hard. You have to go through all the
work of finding a specific edition that may not be well-identified in
the ebook. You have to dump the text in such a way that doesn't lose
all the formatting information, which may range from easy to hard, but
will certainly require custom code and massaging. You have to work
with a text that is unlikely to be the quality of what DP can produce
after five rounds, and could turn out to be pretty bad. And it
requires some tedious comparison. I'd actually rather rescan and
reprocess and compare after a lot of times rather than try and
reformat existing material.

If we can't get information for clearing from the source, or at least
handle them as a group, they're pretty hard to do.
From joshua at hutchinson.net  Thu Jul  6 14:00:06 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Jul  6 14:00:11 2006
Subject: [gutvol-d] Pls. test worldebookfair.com
Message-ID: <20060706210006.CF24DEE6F6@ws6-1.us4.outblaze.com>

I'm currently finishing up a raid of the English texts on reference.bahai.org (I'm scared of the Arabic and Persian originals/translations ;) ).  When I'm done with that, I'd love to have a new site to raid.  If you have a site or two you want to send my way, I'll look into clearances and raiding their text.

Josh

> ----- Original Message -----
> From: "Andrew Sly" <sly@victoria.tc.ca>
> 
> As I am looking up authors etc. for the PG online catalog,
> and just generally browsing, I seem to be constantly running
> into more websites that have transcribed material that could
> be added to PG. With a little effort, I could make a list for
> you of dozens of sites, with thousands of books that could be
> adapted. However, getting copyright clearance, and reformatting
> these is perhaps not as "glamarous" as Distributed Proofreading,
> so it does not attract as many people. :)
> 
> Though I already have too many different PG projects I'm in
> the middle of, I would be willing to help if you'd like to
> start processing some of these texts.
> 
> Andrew
> 
From phil at thalasson.com  Thu Jul  6 15:36:50 2006
From: phil at thalasson.com (Philip Baker)
Date: Thu Jul  6 15:40:31 2006
Subject: [gutvol-d] Review  of "The Wealth of Networks" in the TLS
Message-ID: <Oc7rFBACCZrEFwj8@thalasson.com>


This week's Times Literary Supplement has a review of "The Wealth of
Networks" by Yochai Benkler. The book mentions Project Gutenberg. Of more
immediate interest is what the reviewer has to say about Project Gutenberg.
The relevant part of the review is quoted below. The reviewer is Paul Duguid,
Visiting Professor at the School of Information and Management Systems at the
University of California, Berkeley. 

  'Given their openness, both Project Gutenberg and Wikipedia are 
   surprisingly good and unsurprisingly bad. Some thirty years in the  
   making, Gutenberg offers about 17,000 "etexts". Many seem unexceptional, 
   but for some the need to avoid copyright entanglements has led 
   contributors to resurrect editions that were better left buried. Its 
   version of Pan, the novel by Nobel-Prizewinner Knut Hamsun, for example, 
   puts William Wurster's ridiculously prudish translation of 1921 before 
   unsuspecting readers. Relying on a communications medium admired for its 
   ability to "route around censorship", yet driven by a certain contempt for 
   scholarship, Project Gutenberg threatens to make a number of poor editions 
   - some bowdlerized, some originally corrupt, and some newly corrupted for 
   the new medium - the internet standard.'

-- 
Philip Baker
From Bowerbird at aol.com  Thu Jul  6 16:23:51 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jul  6 16:23:58 2006
Subject: =?ISO-8859-1?Q?re:=20[gutvol-d]=20Review=A0=20of=20"The=20Wealth?=
	=?ISO-8859-1?Q?=20of=20Networks"=20in=20the=20TLS?=
Message-ID: <494.4f728c8.31def587@aol.com>

professor duguid ("do good"?) should digitize
_his_ choice of the "best" version of that book,
and donate it to project gutenberg.   tell him so.

then p.g. can make his choice "the internet standard".           ;+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060706/6f3be59e/attachment.html
From sly at victoria.tc.ca  Thu Jul  6 16:41:30 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Jul  6 16:41:32 2006
Subject: [gutvol-d] Adapting texts from other collections
In-Reply-To: <20060706210006.CF24DEE6F6@ws6-1.us4.outblaze.com>
References: <20060706210006.CF24DEE6F6@ws6-1.us4.outblaze.com>
Message-ID: <Pine.GSO.4.58.0607061640390.22122@vtn1.victoria.tc.ca>


I'll reply to three different fellow PG'ers here...

bowerbird said:

>but even if p.g. doesn't want the list you'd make
>-- and i can't see why they would turn it down --
>i'm sure plenty of people would find it useful...
>
>so go ahead! :+)
>
>i'd suggest a wiki, though, so other people could
>augment it.

My thoughts exactly! (Is it a bad sign if I'm thinking
the same as bowerbird?) Actually, there is a wiki on the
PG website now, as Marcello announced here not long ago,
and this is one of the I've had that I've put on a list
of possible uses to make of it.

David Starner said:

>It's not just glamarous, it's hard. You have to go through all the
>work of finding a specific edition that may not be well-identified in
>the ebook. You have to dump the text in such a way that doesn't lose
>all the formatting information, which may range from easy to hard, but
>will certainly require custom code and massaging. You have to work
>with a text that is unlikely to be the quality of what DP can produce
>after five rounds, and could turn out to be pretty bad. And it
>requires some tedious comparison. I'd actually rather rescan and
>reprocess and compare after a lot of times rather than try and
>reformat existing material.

You're right. I've gone through a process like this
perhaps 25 times in reformatting a text for PG, and
I think almost every time it's ended up being a bigger
job than I had intended. And yet, I keep seeing more
material that has had a lot of effort put into it
already, and may disappear some year, if not put into
PG.

Joshua Hutchinson said:

>I'm currently finishing up a raid of the English texts on
reference.bahai.org
>(I'm scared of the Arabic and Persian originals/translations ;) ).  When
I'm
>done with that, I'd love to have a new site to raid.  If you have a site
or two
>you want to send my way, I'll look into clearances and raiding their
text.

How about some Russian? <g> Now that I'm able to recognize
letters of the cyrillic alphabet, I've discussed with a
Russian-speaking DP volunteer the possibility of adapting
some of the classics texts from from lib.ru
(My local university library has a surprising amount
of pre-1923 volumes in Russian, including complete
Pushkin, Lermontov, Gogol, et al.)

Andrew

From Bowerbird at aol.com  Thu Jul  6 16:44:37 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jul  6 16:44:43 2006
Subject: [gutvol-d] re: a bad sign
Message-ID: <426.5b00444.31defa65@aol.com>

andrew said:
>    (Is it a bad sign if I'm thinking the same as bowerbird?)

you betcha.   check yourself in to the psych ward tomorrow.         ;+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060706/399a9f16/attachment.html
From joshua at hutchinson.net  Thu Jul  6 18:53:22 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Jul  6 18:53:26 2006
Subject: [gutvol-d] Adapting texts from other collections
Message-ID: <20060707015322.695BCDA59F@ws6-6.us4.outblaze.com>


> ----- Original Message -----
> From: "Andrew Sly" <sly@victoria.tc.ca>
> 
> How about some Russian? <g> Now that I'm able to recognize
> letters of the cyrillic alphabet, I've discussed with a
> Russian-speaking DP volunteer the possibility of adapting
> some of the classics texts from from lib.ru
> (My local university library has a surprising amount
> of pre-1923 volumes in Russian, including complete
> Pushkin, Lermontov, Gogol, et al.)
> 

Actually, as long as I don't have to doing any spell-checking, I'm good!  :)

It wasn't so much the foreign language aspects of the Arabic and Persian texts that scared me ... It was the fact the text flows from left to right and doesn't seem to have the same paragraph type structure that I'm used to in European languages.  I imagine Chinese and Japanese texts would scare me equally!  :)

Josh
From Bowerbird at aol.com  Thu Jul  6 23:04:41 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jul  6 23:04:53 2006
Subject: [gutvol-d] the long tail, v2006
Message-ID: <576.310e96.31df5379@aol.com>


see chris anderson's 2006 take on the long tail:
>    http://www.wired.com/wired/archive/14.07/longtail.html

this article is just _dripping_ with juicy quotes, info, and insight.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060707/2cdb14a4/attachment.html
From jon.ingram at gmail.com  Fri Jul  7 00:22:57 2006
From: jon.ingram at gmail.com (Jon Ingram)
Date: Fri Jul  7 00:23:00 2006
Subject: [gutvol-d] re: the only literature people care enough about to
	steal
Message-ID: <4baf53720607070022ha442e4eof687c6fdd389258b@mail.gmail.com>

On 7/6/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
> cory doctorow said:
> >   science fiction is the only literature people
> >   care enough about to steal on the Internet.
> >   It's the only literature that regularly shows up,
> >   scanned and run through optical character recognition
> >   software and lovingly hand-edited on darknet newsgroups,
> >   Russian websites, IRC channels and elsewhere.
>
> you can find the whole article at:
> >
> http://www.locusmag.com/2006/Issues/07DoctorowCommentary.html

An odd article, because the title really has nothing to do with the
rest of the text. In a way this is good, because the title is
incorrect. There's plenty of non-Science Fiction being 'stolen' on the
internet, all the way from technical manuals, scans of magazines (both
porn and non-porn), and non-fiction, to literary fiction. It's
probably true that SciFi is over-represented, but that's a more subtle
and interesting point.

-- 
Jon Ingram
From JBuck814366460 at aol.com  Fri Jul  7 02:13:30 2006
From: JBuck814366460 at aol.com (Jared Buck)
Date: Fri Jul  7 02:13:32 2006
Subject: [gutvol-d] re: the only literature people care enough about to
	steal
In-Reply-To: <4baf53720607070022ha442e4eof687c6fdd389258b@mail.gmail.com>
References: <4baf53720607070022ha442e4eof687c6fdd389258b@mail.gmail.com>
Message-ID: <44AE25BA.10003@aol.com>

I've spoken to Dr. Doctorow on a number of occasions online and i really 
like his writing. Haven't read the article yet but i plan to soon.

Jared

Jon Ingram wrote on 07/07/2006, 12:22 AM:

 > On 7/6/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
 > > cory doctorow said:
 > > >   science fiction is the only literature people
 > > >   care enough about to steal on the Internet.
 > > >   It's the only literature that regularly shows up,
 > > >   scanned and run through optical character recognition
 > > >   software and lovingly hand-edited on darknet newsgroups,
 > > >   Russian websites, IRC channels and elsewhere.
 > >
 > > you can find the whole article at:
 > > >
 > > http://www.locusmag.com/2006/Issues/07DoctorowCommentary.html
 >
 > An odd article, because the title really has nothing to do with the
 > rest of the text. In a way this is good, because the title is
 > incorrect. There's plenty of non-Science Fiction being 'stolen' on the
 > internet, all the way from technical manuals, scans of magazines (both
 > porn and non-porn), and non-fiction, to literary fiction. It's
 > probably true that SciFi is over-represented, but that's a more subtle
 > and interesting point.
 >
 > --
 > Jon Ingram
 > _______________________________________________
 > gutvol-d mailing list
 > gutvol-d@lists.pglaf.org
 > http://lists.pglaf.org/listinfo.cgi/gutvol-d
 >

-- 


From gbnewby at pglaf.org  Fri Jul  7 03:30:44 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Jul  7 03:30:45 2006
Subject: [gutvol-d] DVD: last check
Message-ID: <20060707103044.GA21769@pglaf.org>

I have not confirmed this will actually fit on a DVD...that's
for tomorrow.  The ISO will be done in a few minutes, but
you can also browse the DVD online:

  http://snowy.arsc.alaska.edu/gbn/pgimages/jul06special/index.htm

  http://snowy.arsc.alaska.edu/gbn/pgimages/jul06special.iso

There were good suggestions, and I think I managed to act
on all of them.  Thanks for all your thoughts on this...
  -- Greg

From cannona at fireantproductions.com  Fri Jul  7 06:26:41 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri Jul  7 06:27:06 2006
Subject: [gutvol-d] DVD: last check
References: <20060707103044.GA21769@pglaf.org>
Message-ID: <000301c6a1c8$fee501b0$0132a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You'll need to edit the autorun.inf file.  Since you changed from index.html
to .htm, you'll need to change the autorun.inf file as well.  Just change
the .html to .htm and it will work fine.

Also, it should fit with no problems.  I believe you even have about 350 MB
left over.

You might think about changing the label of the disc to something which
mentions PG, if only in abbreviation.  I think the original DVD was PGDVD or
some such, and the CD was pg-2003-08 I believe.  Perhaps something like
PGDVD072006.  Just a thought, but not a big deal.

Finally, it might be cool if someone were to create an .ico file of the PG
logo, and that could be added to the autorun.inf file to give the disc an
icon under windows.  Totally without purpose, except that it might look
cool.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Greg Newby" <gbnewby@pglaf.org>
To: <gutvol-d@lists.pglaf.org>
Sent: Friday, July 07, 2006 5:30 AM
Subject: [gutvol-d] DVD: last check


>I have not confirmed this will actually fit on a DVD...that's
> for tomorrow.  The ISO will be done in a few minutes, but
> you can also browse the DVD online:
>
>  http://snowy.arsc.alaska.edu/gbn/pgimages/jul06special/index.htm
>
>  http://snowy.arsc.alaska.edu/gbn/pgimages/jul06special.iso
>
> There were good suggestions, and I think I managed to act
> on all of them.  Thanks for all your thoughts on this...
>  -- Greg
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFErmExI7J99hVZuJcRAgzBAJ9QZUyV/C3Pt6T1qZjr+RufwGoQEACfbCIP
4cWfIsIrgEDoE4fCJLzxY24=
=hff/
-----END PGP SIGNATURE-----

From donovan at abs.net  Fri Jul  7 06:54:34 2006
From: donovan at abs.net (D Garcia)
Date: Fri Jul  7 06:55:07 2006
Subject: [dp-pg] Re: [gutvol-d] DVD: last check
In-Reply-To: <000301c6a1c8$fee501b0$0132a8c0@blackbox>
References: <20060707103044.GA21769@pglaf.org>
	<000301c6a1c8$fee501b0$0132a8c0@blackbox>
Message-ID: <200607070954.34942.donovan@abs.net>

On Friday 07 July 2006 09:26 am, Aaron Cannon wrote:
> Finally, it might be cool if someone were to create an .ico file of the PG
> logo, and that could be added to the autorun.inf file to give the disc an
> icon under windows.  Totally without purpose, except that it might look
> cool.

Create?
The PG website icon favicon.ico (name might not be right, I have a cold and am 
working from memory ... the one that shows up in your browser) is an .ico 
file. Should be able to use it as-is.
From cannona at fireantproductions.com  Fri Jul  7 07:53:03 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri Jul  7 07:54:22 2006
Subject: [dp-pg] Re: [gutvol-d] DVD: last check
References: <20060707103044.GA21769@pglaf.org><000301c6a1c8$fee501b0$0132a8c0@blackbox>
	<200607070954.34942.donovan@abs.net>
Message-ID: <000601c6a1d5$20793c40$0132a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Right you are.  It's at http://www.gutenberg.org/favicon.ico .  All that
needs to be done is add it to the root directory of the disc and the line:
icon=favicon.ico
to the autorun file and it should work.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "D Garcia" <donovan@abs.net>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@pglaf.org>
Sent: Friday, July 07, 2006 8:54 AM
Subject: Re: [dp-pg] Re: [gutvol-d] DVD: last check


> On Friday 07 July 2006 09:26 am, Aaron Cannon wrote:
>> Finally, it might be cool if someone were to create an .ico file of the
>> PG
>> logo, and that could be added to the autorun.inf file to give the disc an
>> icon under windows.  Totally without purpose, except that it might look
>> cool.
>
> Create?
> The PG website icon favicon.ico (name might not be right, I have a cold
> and am
> working from memory ... the one that shows up in your browser) is an .ico
> file. Should be able to use it as-is.
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFErnWmI7J99hVZuJcRAtSLAKCfVftMzGPlF9D4gb7IV6Rll0gBxgCg28gV
ebFKoC1E138JT9qUniSK/vU=
=TuGc
-----END PGP SIGNATURE-----

From Bowerbird at aol.com  Fri Jul  7 12:10:54 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jul  7 12:11:00 2006
Subject: [gutvol-d] DVD: last check
Message-ID: <53f.2b98e6e.31e00bbe@aol.com>

greg-

here is a slight reworking of that info-page you made on the d.v.d.

search for question-marks to find questions i had...

-bowerbird

===========================================================

"july 2006 special" -- current as of ebook #18739

===========================================================

baseline: everything but the hgp:

1-2199 (then skip 2200-2224)
2225-3500 (then skip 3501-3524)
3525-11774 (then skip 11775-11799)
11800-20000

===========================================================

particular items of interest included that are not text and not html:

116 (zip/avi, select "all") -- apollo 11 moon landing movie
156 (midi, select "all") -- beethoven's 5th symphony audio
249 (zip/html) -- french cave-paintings pictures
256 (zip/mpg) -- rotating-earth movie
3002 (mp3) -- janis ian, society's child audio
5212-5216 ("all") -- a-bomb videos (was "5212-5215"? -- is 5216 a 
compilation?)
9551 ("all") -- human-read sherlock holmes audio
10177 ("all") -- ride of the valkyries, audio
17246 ("all") , but it doesn't include all the mp3s -- wrong e-text #?

===========================================================

selected top-100 titles to specify as html:

11 -- alice in wonderland (does this mean you used #928?)
132 -- art of war (does this mean you used #17405?)
5000 (da vinci notebooks, html?, complete set of #4998 and #4999?)
5001 -- einstein's relativity
5200 --   "metamorphosis" (anything special about this html file?)
8710 -- dore bible illustrations
8800 -- dante's divine comedy (only available as html download)
9551 -- human-read sherlock holmes
10681 -- roget's thesaurus (heavily formatted with styles)
13510 -- "knots, splices and rope work"

===========================================================

a few extras to specify as html:

10600 -- kerr's "voyages and travels" (but no images in this file?)

illustrated beatrix potter:
17089, 15575, 15284, 15234, 15137, 15077, 14877, 14872, 14868, 14848,
14838, 14837, 14814, 14797, 14407, 14304, 14220, 12103

the first 20 punch: -- (all of the punch are listed separately below)
18114, 17994, 17654, 17653, 17634, 17629, 17596, 17471, 17397, 17216,
16877, 16727, 16717, 16707, 16684, 16673, 16640, 16628, 16619, 16592,

the sciam (232mb) -- (listed below) -- (so, were all these included on the 
dvd?)

===========================================================

eliminate some titles that are part of series.
these "complete" volumes were skipped,
and their individual volumes were retained.)

to skip (a total of 245 duplicate "completes"):

17216, 16205, 16190, 16146, 13260, 13042, 12242, 12215, 12161, 11996, 11976,
10876, 9774, 9761, 9755, 9670, 9600, 9450, 9320, 9170, 9169, 8800, 8726, 
8710,
8562, 8525, 8516, 8505, 8460, 8100, 7878, 7852, 7761, 7756, 7749, 7735, 7727, 

7714, 7701, 7691, 7684, 7671, 7658, 7649, 7639, 7630, 7623, 7614, 7608, 7605, 

7535, 7420, 7400, 7332, 7317, 7290, 7140, 7025, 7005, 6944, 6941, 6780, 6775, 

6761, 6615, 6516, 6478, 6400, 6300, 6299, 6295, 6291, 6288, 6284, 6280, 6274, 

6271, 6267, 6260, 6253, 6249, 6241, 6236, 6229, 6222, 6217, 6214, 6210, 6205, 

6201, 6194, 6191, 6179, 6156, 6098, 5999, 5998, 5946, 5921, 5668, 5650, 5600, 

5587, 5583, 5577, 5571, 5560, 5551, 5542, 5529, 5516, 5507, 5499, 5493, 5482, 

5472, 5466, 5460, 5449, 5416, 5400, 5396, 5387, 5382, 5373, 5364, 5355, 5300, 

5240, 5225, 5060, 5059, 5058, 5057, 5056, 5055, 5000, 4973, 4912, 4900, 4899, 

4885, 4884, 4872, 4860, 4847, 4836, 4800, 4645, 4546, 4500, 4491, 4488, 4482, 

4476, 4470, 4464, 4460, 4452, 4443, 4434, 4426, 4420, 4412, 4405, 4397, 4367, 

4362, 4361, 4330, 4270, 4269, 4264, 4261, 4200, 4199, 4195, 4184, 4171, 4162, 

4153, 4145, 4138, 4131, 4125, 4116, 4107, 3999, 3995, 3990, 3985, 3980, 3975, 

3971, 3967, 3962, 3957, 3953, 3946, 3942, 3938, 3934, 3930, 3926, 3922, 3918, 

3913, 3899, 3883, 3859, 3854, 3846, 3841, 3766, 3739, 3684, 3649, 3600, 3580, 

3567, 3545, 3534, 3374, 3350, 3254, 3253, 3252, 3199, 3189, 3178, 3177, 3176, 

3125, 3090, 3072, 2988, 2895, 2760, 2270, 2144, 1837, 100, 86, 76, 74

===========================================================

these individual volumes skipped, their "complete" version retained.   (do i 
have that right?)

11801-11856
8301-8373
8228-8293
8001-8066
6419-6420
6348-6349
6161
5010-5049
1609-1610
1581-1582

===========================================================

hgp items to skip (the reverse of the first list above):

11775-11799
3501-3524 -- (the original "4501-3524" was a mistake?)
2200-2224

===========================================================

here are all the punch/punchinello issues (about 660mb):

18114, 17994, 17654, 17653, 17634, 17629, 17596, 17471, 17397, 17216,
16877, 16727, 16717, 16707, 16684, 16673, 16640, 16628, 16619, 16592,
16563, 16509, 16401, 16394, 16364, 16281, 16271, 16263, 16213, 16152,
16113, 16107, 15973, 15957, 15912, 15742, 15688, 15677, 15657, 15615,
15605, 15594, 15512, 15453, 15442, 15441, 15439, 15377, 15366, 15332,
15330, 15196, 15166, 15144, 15142, 15121, 15064, 15049, 15026, 15021,
15012, 14991, 14974, 14973, 14966, 14965, 14942, 14941, 14940, 14939,
14938, 14937, 14936, 14935, 14934, 14933, 14932, 14931, 14930, 14929,
14928, 14927, 14926, 14925, 14924, 14923, 14922, 14921, 14920, 14919,
14856, 14846, 14845, 14808, 14787, 14769, 14767, 14747, 14745, 14707,
14695, 14694, 14690, 14652, 14639, 14601, 14592, 14544, 14516, 14514,
14483, 14455, 14452, 14450, 14390, 14389, 14365, 14364, 14344, 14341,
14321, 14277, 14272, 14250, 14231, 14229, 14217, 14199, 14186, 14166,
14165, 14146, 14141, 14135, 14123, 14122, 14093, 14074, 14067, 14057,
14053, 14046, 13995, 13994, 13966, 13961, 13954, 13927, 13903, 13710,
13639, 13563, 13538, 13503, 13502, 13491, 13466, 13465, 13446, 13422,
13421, 13391, 13390, 13373, 13352, 13348, 13327, 13323, 13313, 13297,
13283, 13281, 13270, 13269, 13253, 13252, 13244, 13186, 13185, 13098,
13074, 13067, 12951, 12944, 12934, 12917, 12905, 12872, 12866, 12860,
12825, 12739, 12738, 12737, 12614, 12536, 12517, 12469, 12468, 12467,
12466, 12465, 12395, 12394, 12393, 12392, 12378, 12323, 12306, 12305,
12294, 12292, 12262, 12232, 12231, 12114, 12079, 12043, 11963, 11919,
11910, 11908, 11907, 11872, 11868, 11732, 11726, 11712, 11704, 11670,
11638, 11630, 11629, 11619, 11617, 11571, 11570, 11491, 11466, 11444,
11443, 11429, 11428, 11425, 11359, 11284, 11225, 11201, 11177, 11169,
11133, 11109, 11094, 11076, 10964, 10952, 10934, 10933, 10923, 10903,
10721, 10711, 10663, 10614, 10595, 10594, 10544, 10450, 10292, 10144,
10143, 10106, 10105, 10104, 10092, 10091, 10047, 10036, 10035, 10034,
10033, 10032, 10019, 10018, 10017, 10016, 10015, 10014, 10013, 9962,
9961, 9960, 9953, 9898, 9885, 9877, 9819, 9797, 9658, 9636, 9549, 9545,
9544, 9481, 8643, 8433

===========================================================

all the scientific american supplements (about 235mb)

18345, 18265, 17817, 17755, 17167, 16972, 16948, 16792, 16773, 16671,
16360, 16354, 16353, 16270, 15889, 15833, 15831, 15708, 15417, 15193,
15052, 15051, 15050, 14990, 14989, 14097, 14041, 14009, 13962, 13939,
13640, 13443, 13401, 13399, 13358, 12490, 11761, 11736, 11735, 11734,
11662, 11649, 11648, 11647, 11498, 11385, 11383, 11344, 9666, 9266, 4],
9163, 9076, 8952, 8951, 8950, 8862, 8742, 8718, 8717, 8687, 8559, 8504,
8484, 8483, 8452, 8408, 8391, 8297, 8296, 8195,

===========================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060707/92e9af28/attachment.html
From gbnewby at pglaf.org  Fri Jul  7 12:12:10 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Jul  7 12:12:13 2006
Subject: [dp-pg] Re: [gutvol-d] DVD: last check
In-Reply-To: <000601c6a1d5$20793c40$0132a8c0@blackbox>
References: <200607070954.34942.donovan@abs.net>
	<000601c6a1d5$20793c40$0132a8c0@blackbox>
Message-ID: <20060707191210.GC2905@pglaf.org>

On Fri, Jul 07, 2006 at 09:53:03AM -0500, Aaron Cannon wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Right you are.  It's at http://www.gutenberg.org/favicon.ico .  All that
> needs to be done is add it to the root directory of the disc and the line:
> icon=favicon.ico
> to the autorun file and it should work.
> 
> Sincerely
> Aaron Cannon
> 

Got it:

[autorun]
open=rundll32.exe url.dll,FileProtocolHandler index.htm 
icon=favicon.ico

> - --
> Skype: cannona
> MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
> address.)
> - ----- Original Message -----
> From: "D Garcia" <donovan@abs.net>
> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@pglaf.org>
> Sent: Friday, July 07, 2006 8:54 AM
> Subject: Re: [dp-pg] Re: [gutvol-d] DVD: last check
> 
> 
> >On Friday 07 July 2006 09:26 am, Aaron Cannon wrote:
> >>Finally, it might be cool if someone were to create an .ico file of the
> >>PG
> >>logo, and that could be added to the autorun.inf file to give the disc an
> >>icon under windows.  Totally without purpose, except that it might look
> >>cool.
> >
> >Create?
> >The PG website icon favicon.ico (name might not be right, I have a cold
> >and am
> >working from memory ... the one that shows up in your browser) is an .ico
> >file. Should be able to use it as-is.
> >_______________________________________________
> >gutvol-d mailing list
> >gutvol-d@lists.pglaf.org
> >http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
> >
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
> Comment: Key available from all major key servers.
> 
> iD8DBQFErnWmI7J99hVZuJcRAtSLAKCfVftMzGPlF9D4gb7IV6Rll0gBxgCg28gV
> ebFKoC1E138JT9qUniSK/vU=
> =TuGc
> -----END PGP SIGNATURE-----
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
From gbnewby at pglaf.org  Fri Jul  7 13:54:34 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Jul  7 13:54:35 2006
Subject: [gutvol-d] Re: DVD: last check
In-Reply-To: <53f.2b98e6e.31e00bbe@aol.com>
References: <53f.2b98e6e.31e00bbe@aol.com>
Message-ID: <20060707205434.GA5067@pglaf.org>

More fixes applied....added the .ico,
added a few inspirational quotes from our content,

  -- Greg
From cannona at fireantproductions.com  Fri Jul  7 14:18:26 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri Jul  7 14:30:39 2006
Subject: [gutvol-d] Re: DVD: last check
References: <53f.2b98e6e.31e00bbe@aol.com> <20060707205434.GA5067@pglaf.org>
Message-ID: <000b01c6a20c$96ba8030$0132a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Looks like the dvd was also moved.
http://snowy.arsc.alaska.edu/gbn/pgimages/pgdvd072006/ is the online
version.

Just say the word and I'll get it up on the torrent tracker.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Greg Newby" <gbnewby@pglaf.org>
To: <gutvol-d@lists.pglaf.org>
Sent: Friday, July 07, 2006 3:54 PM
Subject: [gutvol-d] Re: DVD: last check


> More fixes applied....added the .ico,
> added a few inspirational quotes from our content,
>
>  -- Greg
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFErtKMI7J99hVZuJcRAvd4AKCtpw0k7lSJteGX45OzO04xUxCYIACgnB1v
uvQIQv1tCi04A+BluD2rXk0=
=Fcwy
-----END PGP SIGNATURE-----

From gbnewby at pglaf.org  Fri Jul  7 16:05:13 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Jul  7 16:05:15 2006
Subject: [gutvol-d] Re: DVD: last check
In-Reply-To: <000b01c6a20c$96ba8030$0132a8c0@blackbox>
References: <53f.2b98e6e.31e00bbe@aol.com> <20060707205434.GA5067@pglaf.org>
	<000b01c6a20c$96ba8030$0132a8c0@blackbox>
Message-ID: <20060707230513.GA8091@pglaf.org>

On Fri, Jul 07, 2006 at 04:18:26PM -0500, Aaron Cannon wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Looks like the dvd was also moved.
> http://snowy.arsc.alaska.edu/gbn/pgimages/pgdvd072006/ is the online
> version.

Right.

The ISO:
http://snowy.arsc.alaska.edu/gbn/pgimages/pgdvd072006.iso

The checksum:
http://snowy.arsc.alaska.edu/gbn/pgimages/pgdvd072006.md5

> Just say the word and I'll get it up on the torrent tracker.

Go for it!  Though I might relocate it at some point.

I made a physical DVD, it's just fine.  I don't know why it's not as full
as I thought...the automated tool must do some poor math somewhere.

The main anomaly, which I won't try to fix right now, is
that we have many cases where there is a -8.zip, a -0.zip
and a .zip for different character sets but the same eBook.

I also found a title that just didn't make it, and assume
there are more.  So, we might get these fixed, but the DVD 
is "good enough" to make some copies, and to put up for
people to experiment with.
  -- Greg


> Sincerely
> Aaron Cannon
> 
> 
> - --
> Skype: cannona
> MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
> address.)
> - ----- Original Message -----
> From: "Greg Newby" <gbnewby@pglaf.org>
> To: <gutvol-d@lists.pglaf.org>
> Sent: Friday, July 07, 2006 3:54 PM
> Subject: [gutvol-d] Re: DVD: last check
> 
> 
> >More fixes applied....added the .ico,
> >added a few inspirational quotes from our content,
> >
> > -- Greg
> >_______________________________________________
> >gutvol-d mailing list
> >gutvol-d@lists.pglaf.org
> >http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
> >
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
> Comment: Key available from all major key servers.
> 
> iD8DBQFErtKMI7J99hVZuJcRAvd4AKCtpw0k7lSJteGX45OzO04xUxCYIACgnB1v
> uvQIQv1tCi04A+BluD2rXk0=
> =Fcwy
> -----END PGP SIGNATURE-----
From urbangleaner56 at yahoo.com  Sat Jul  8 17:48:35 2006
From: urbangleaner56 at yahoo.com (Jacqulyn Perry)
Date: Sat Jul  8 17:55:18 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
Message-ID: <20060709004835.98347.qmail@web38510.mail.mud.yahoo.com>

Hi;
  Obviously, I'm new here. I realize that the Project is mainly text driven, but...  I down loaded Hans Christian Anderson Fairy Tales edition with Edmund Dulac's illustrations-after jumping around the room in joyful glee-then realized that the dpi on the images isn't high enough to print them out, with the rest of the book. At least, not very well. Then, the images-or scans of them-were so dark that you can't even really see them. I'm assuming that that's mainly due to the age of the book. Anyway, I took the liberty of working with the illos a little... I lightened them a little, so they were viewable, and placed them in a folder in my computer, leaving the original image file in the ebook folder alone.
  I'm more than happy to send these lightened images to someone, to see if they want to re-place the exsisting image file in the ebook with THIS file. They aren't any higher dpi ofcourse, but at least people will be able to see them. But if there is an interest, I need to know who as well as how to send them. 
   
  My question is this... why aren't the images higher res? I realize it would take up more space, but it would be wonderful to be able to have good quality illos to go along with the text. But then, besides being a reader, I'm a visual artist myself. But it would be wonderful, especially given the fact that so much of this artwork is completly inaccessable to the public, or if available in poster/print form, its anywhere from $45-$80 or more for a small reproduction, and this is work thats in the public domain!
   
  Anyway, sorry about the rant.
  Leigh

 		
---------------------------------
Yahoo! Messenger with Voice. Make PC-to-Phone Calls to the US (and 30+ countries) for 2?/min or less.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060708/714c91d3/attachment.html
From grythumn at gmail.com  Sat Jul  8 18:44:07 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sat Jul  8 18:45:42 2006
Subject: [gutvol-d] Comments & Questions About Book Illustration
In-Reply-To: <20060709004835.98347.qmail@web38510.mail.mud.yahoo.com>
References: <20060709004835.98347.qmail@web38510.mail.mud.yahoo.com>
Message-ID: <15cfa2a50607081844i5b8dd42fkad50d9253bc55455@mail.gmail.com>

Current policy on DP books[1] is to have high-resolution scans of the
illustrations archived along with the projects..  they will eventually be
made available for people to use. Keep in mind raw scans are extremely
expensive in terms of disk space/bandwidth.. a mildly graphics heavy book
can easily hit 200-300 megs, while a book with a lot of color plates can
easily exceed several gigabytes. (Before descreening, etc. I scan all illos
at 600 DPI as PNGs; after descreening and downscaling to 300 DPI the size
drops dramatically.)

R C
[1] Not sure when it went into effect. Probably sometime after the Project
Manager/Post Processor jobs were split.

On 7/8/06, Jacqulyn Perry <urbangleaner56@yahoo.com> wrote:
>
> Hi;
> Obviously, I'm new here. I realize that the Project is mainly text driven,
> but...  I down loaded Hans Christian Anderson Fairy Tales edition with
> Edmund Dulac's illustrations-after jumping around the room in joyful
> glee-then realized that the dpi on the images isn't high enough to print
> them out, with the rest of the book. At least, not very well. Then, the
> images-or scans of them-were so dark that you can't even really see them.
> I'm assuming that that's mainly due to the age of the book. Anyway, I took
> the liberty of working with the illos a little... I lightened them a little,
> so they were viewable, and placed them in a folder in my computer, leaving
> the original image file in the ebook folder alone.
> I'm more than happy to send these lightened images to someone, to see if
> they want to re-place the exsisting image file in the ebook with THIS file.
> They aren't any higher dpi ofcourse, but at least people will be able to see
> them. But if there is an interest, I need to know who as well as how to send
> them.
>
> My question is this... why aren't the images higher res? I realize it
> would take up more space, but it would be wonderful to be able to have good
> quality illos to go along with the text. But then, besides being a reader,
> I'm a visual artist myself. But it would be wonderful, especially given the
> fact that so much of this artwork is completly inaccessable to the public,
> or if available in poster/print form, its anywhere from $45-$80 or more for
> a small reproduction, and this is work thats in the public domain!
>
> Anyway, sorry about the rant.
> Leigh
>
> ------------------------------
> Yahoo! Messenger with Voice. Make PC-to-Phone Calls<http://us.rd.yahoo.com/mail_us/taglines/postman1/*http://us.rd.yahoo.com/evt=39663/*http://voice.yahoo.com>to the US (and 30+ countries) for 2?/min or less.
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060708/3d165f19/attachment.html
From urbangleaner56 at yahoo.com  Sat Jul  8 19:51:29 2006
From: urbangleaner56 at yahoo.com (Jacqulyn Perry)
Date: Sat Jul  8 19:51:32 2006
Subject: [gutvol-d] Comments & Questions About Book Illustration
In-Reply-To: <15cfa2a50607081844i5b8dd42fkad50d9253bc55455@mail.gmail.com>
Message-ID: <20060709025129.40318.qmail@web38507.mail.mud.yahoo.com>

Okay, that makes sense... would the powers that be want to take a looksee at the image file of 'lightened' images? 
  Leigh

Robert Cicconetti <grythumn@gmail.com> wrote:
  Current policy on DP books[1] is to have high-resolution scans of the illustrations archived along with the projects..  they will eventually be made available for people to use. Keep in mind raw scans are extremely expensive in terms of disk space/bandwidth.. a mildly graphics heavy book can easily hit 200-300 megs, while a book with a lot of color plates can easily exceed several gigabytes. (Before descreening, etc. I scan all illos at 600 DPI as PNGs; after descreening and downscaling to 300 DPI the size drops dramatically.) 

R C
[1] Not sure when it went into effect. Probably sometime after the Project Manager/Post Processor jobs were split.

  On 7/8/06, Jacqulyn Perry <urbangleaner56@yahoo.com> wrote:      Hi;
  Obviously, I'm new here. I realize that the Project is mainly text driven, but...  I down loaded Hans Christian Anderson Fairy Tales edition with Edmund Dulac's illustrations-after jumping around the room in joyful glee-then realized that the dpi on the images isn't high enough to print them out, with the rest of the book. At least, not very well. Then, the images-or scans of them-were so dark that you can't even really see them. I'm assuming that that's mainly due to the age of the book. Anyway, I took the liberty of working with the illos a little... I lightened them a little, so they were viewable, and placed them in a folder in my computer, leaving the original image file in the ebook folder alone. 
  I'm more than happy to send these lightened images to someone, to see if they want to re-place the exsisting image file in the ebook with THIS file. They aren't any higher dpi ofcourse, but at least people will be able to see them. But if there is an interest, I need to know who as well as how to send them. 
   
  My question is this... why aren't the images higher res? I realize it would take up more space, but it would be wonderful to be able to have good quality illos to go along with the text. But then, besides being a reader, I'm a visual artist myself. But it would be wonderful, especially given the fact that so much of this artwork is completly inaccessable to the public, or if available in poster/print form, its anywhere from $45-$80 or more for a small reproduction, and this is work thats in the public domain! 
   
  Anyway, sorry about the rant.
  Leigh

    
---------------------------------
  Yahoo! Messenger with Voice. Make PC-to-Phone Calls to the US (and 30+ countries) for 2?/min or less.   


_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org 
http://lists.pglaf.org/listinfo.cgi/gutvol-d


_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d


---------------------------------
Yahoo! Messenger with Voice. Make PC-to-Phone Calls to the US (and 30+ countries) for 2?/min or less.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060708/03bf4f62/attachment.html
From sly at victoria.tc.ca  Sat Jul  8 23:42:56 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Jul  8 23:42:59 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <20060709004835.98347.qmail@web38510.mail.mud.yahoo.com>
References: <20060709004835.98347.qmail@web38510.mail.mud.yahoo.com>
Message-ID: <Pine.GSO.4.58.0607082329220.24886@vtn1.victoria.tc.ca>

Hi Leigh.

Thanks for sharing your ideas.
I suppose an answer for your question is that, as you mention,
PG is "mainly text driven". For a general user, images are needed
that are accessible for download over a non-high-speed connection.
Also, we need to remember that, in these cases, how the images
are prepared is up to each individual volunteer. I find dealing
with images a pain, because unlike the relatively straight-forward
text, there are so many variables in digatizing images.

I'm now leaning towards transcribing this picture book using
reduced-size jpgs (averaging 375 by 530 pixels) that will hopefully
make for a smoothly loading html file for most users. Then,
I'll include ziped hi-res page images for the occasional
person who might want them.

Andrew

On Sat, 8 Jul 2006, Jacqulyn Perry wrote:

> Hi;
>   Obviously, I'm new here. I realize that the Project is mainly text driven, but...  I down loaded Hans Christian Anderson Fairy Tales edition with Edmund Dulac's illustrations-after jumping around the room in joyful glee-then realized that the dpi on the images isn't high enough to print them out, with the rest of the book. At least, not very well. Then, the images-or scans of them-were so dark that you can't even really see them. I'm assuming that that's mainly due to the age of the book. Anyway, I took the liberty of working with the illos a little... I lightened them a little, so they were viewable, and placed them in a folder in my computer, leaving the original image file in the ebook folder alone.
>   I'm more than happy to send these lightened images to someone, to see if they want to re-place the exsisting image file in the ebook with THIS file. They aren't any higher dpi ofcourse, but at least people will be able to see them. But if there is an interest, I need to know who as well as how to send them.
>
>   My question is this... why aren't the images higher res? I realize it would take up more space, but it would be wonderful to be able to have good quality illos to go along with the text. But then, besides being a reader, I'm a visual artist myself. But it would be wonderful, especially given the fact that so much of this artwork is completly inaccessable to the public, or if available in poster/print form, its anywhere from $45-$80 or more for a small reproduction, and this is work thats in the public domain!
>
>   Anyway, sorry about the rant.
>   Leigh
>
From urbangleaner56 at yahoo.com  Sun Jul  9 02:37:16 2006
From: urbangleaner56 at yahoo.com (Jacqulyn Perry)
Date: Sun Jul  9 02:37:20 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <Pine.GSO.4.58.0607082329220.24886@vtn1.victoria.tc.ca>
Message-ID: <20060709093716.13839.qmail@web38509.mail.mud.yahoo.com>

Hi Andrew!
  That sounds like a great idea, and a sensible compromise! Is there a way I could help with this? I'm not sure HOW I could help, I don't have access to the books, as I seriously doubt I could get my hands on a copy of any of these books-most are collectors items and way outside my budget.
   
  In the meantime, I still have the re-worked images I mentioned, which I would very much like to pass on to you or whomever. They are the same ones in the original file, just re-worked a little so you can actually see them, rather than a dark blur.
   
  Leigh
  

Andrew Sly <sly@victoria.tc.ca> wrote:
  Hi Leigh.

Thanks for sharing your ideas.
I suppose an answer for your question is that, as you mention,
PG is "mainly text driven". For a general user, images are needed
that are accessible for download over a non-high-speed connection.
Also, we need to remember that, in these cases, how the images
are prepared is up to each individual volunteer. I find dealing
with images a pain, because unlike the relatively straight-forward
text, there are so many variables in digatizing images.

I'm now leaning towards transcribing this picture book using
reduced-size jpgs (averaging 375 by 530 pixels) that will hopefully
make for a smoothly loading html file for most users. Then,
I'll include ziped hi-res page images for the occasional
person who might want them.

Andrew

On Sat, 8 Jul 2006, Jacqulyn Perry wrote:

> Hi;
> Obviously, I'm new here. I realize that the Project is mainly text driven, but... I down loaded Hans Christian Anderson Fairy Tales edition with Edmund Dulac's illustrations-after jumping around the room in joyful glee-then realized that the dpi on the images isn't high enough to print them out, with the rest of the book. At least, not very well. Then, the images-or scans of them-were so dark that you can't even really see them. I'm assuming that that's mainly due to the age of the book. Anyway, I took the liberty of working with the illos a little... I lightened them a little, so they were viewable, and placed them in a folder in my computer, leaving the original image file in the ebook folder alone.
> I'm more than happy to send these lightened images to someone, to see if they want to re-place the exsisting image file in the ebook with THIS file. They aren't any higher dpi ofcourse, but at least people will be able to see them. But if there is an interest, I need to know who as well as how to send them.
>
> My question is this... why aren't the images higher res? I realize it would take up more space, but it would be wonderful to be able to have good quality illos to go along with the text. But then, besides being a reader, I'm a visual artist myself. But it would be wonderful, especially given the fact that so much of this artwork is completly inaccessable to the public, or if available in poster/print form, its anywhere from $45-$80 or more for a small reproduction, and this is work thats in the public domain!
>
> Anyway, sorry about the rant.
> Leigh
>
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d


---------------------------------
Sneak preview the  all-new Yahoo.com. It's not radically different. Just radically better. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060709/1fba70bc/attachment-0001.html
From cannona at fireantproductions.com  Sun Jul  9 05:07:48 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Jul  9 05:08:09 2006
Subject: [gutvol-d] new dvd image
Message-ID: <000301c6a350$508ac260$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello all.  The new dvd image is available as a torrent.
http://snowy.arsc.alaska.edu:6969 .

My email is once again doing strange things, so I seem to only be able to
send, but not receive.  However, it doesn't seem to be bouncing, so if you
sent me something, hopefully I'll get it eventually, but perhaps not.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFEsPGlI7J99hVZuJcRAivqAJ41UaZW0AFrqHLRRWxKtVXPJgozRQCg0x51
2X5uERI+EPFoCQ398haa6jw=
=eh2z
-----END PGP SIGNATURE-----

From sly at victoria.tc.ca  Sun Jul  9 09:20:45 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Jul  9 09:20:48 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <20060709093716.13839.qmail@web38509.mail.mud.yahoo.com>
References: <20060709093716.13839.qmail@web38509.mail.mud.yahoo.com>
Message-ID: <Pine.GSO.4.58.0607090848020.15320@vtn1.victoria.tc.ca>


Leigh:

How accomplished are you as editing images? If you'd like to help
with this particular item I have underway, one image could use a
small clean-up where there seems to be a smudge of blue ink.
Take a look at http://www.victoria.tc.ca/~sly/pb.htm
(See the first "alphabet" image, on the baby's forehead.)
This is beyond what I feel comfortable dealing with.
If you are interested, I have a 1.4mb png file that I would
request to have edited and returned in the same format.

Or, were you asking in a more general point of view?
By far the easiest way for a new volunteer to contribute
to Project Gutenberg is to sign up at Distributed
Proofreaders (pgdp.net) and help with one page at a time.

And about your re-worked images, it would be best to ask
one of the white-washers, (people who actually post files).
Perhaps via errata [at] pglaf.org. In that case, please
include more full details, such as PG number of the ebook.

In this case, I can predict that a likely response would be
that lightened images would be welcome if you can make them
from an original source. In the case at hand the files included
are jpgs. Jpgs use a lossy compression, which means that each
time you save them, some information is lost. So if you take a
jpg, edit it and save it again, you are probably going to have
a file which is both larger in size and poorer in quality than
what you started with. If you really want to perserver on this
text, you could then try to contact whoever prepared this
text and ask if high-resolution images are availible you could
work from. (And all of this provides a good argument why I'd
like to preserve hi-res images somewhere for the text I'm working
on.)

Andrew

On Sun, 9 Jul 2006, Jacqulyn Perry wrote:

> Hi Andrew!
>   That sounds like a great idea, and a sensible compromise! Is there a way I could help with this? I'm not sure HOW I could help, I don't have access to the books, as I seriously doubt I could get my hands on a copy of any of these books-most are collectors items and way outside my budget.
>
>   In the meantime, I still have the re-worked images I mentioned, which I would very much like to pass on to you or whomever. They are the same ones in the original file, just re-worked a little so you can actually see them, rather than a dark blur.
>
>   Leigh

From Bowerbird at aol.com  Sun Jul  9 09:31:49 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Jul  9 09:31:58 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
Message-ID: <3f4.63db38f.31e28975@aol.com>

andrew said:
>    By far the easiest way for a new volunteer to contribute to 
>    Project Gutenberg is to sign up at Distributed Proofreaders 
>    (pgdp.net) and help with one page at a time.

that might be the "easiest" way for a new volunteer to contribute,
but it won't make the best use of the skill-set leigh has to offer...

instead, leigh, trot on over to distributed proofreaders and find,
in the _forums_ there, the group of people who are focused on
handling images, and let them know you'd like to go to work...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060709/e904cf14/attachment.html
From urbangleaner56 at yahoo.com  Sun Jul  9 13:03:05 2006
From: urbangleaner56 at yahoo.com (Jacqulyn Perry)
Date: Sun Jul  9 13:03:11 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <Pine.GSO.4.58.0607090848020.15320@vtn1.victoria.tc.ca>
Message-ID: <20060709200305.43554.qmail@web38505.mail.mud.yahoo.com>

My real interest is in working with the images. My problem is I'm not very computer literate-I'm an 'old fashion' painter-so I'm not sure how much I can do with the limited graphics programs I have.
   
  I'm pretty sure I can take care of the smudge, but it would require me using Paint to remove the smudge, then printing the image out and doing the retouch by hand-I used to work as a photo re-touch person-then scanning the re-touched image to send back to you. Which I would be glad to do.
   
  I've posted the image at an artist website I belong to, that has a VERY active computer graphics forum and asked their advice. I've also asked about a good graphics program. Though from what I've seen so far, most of the images you folks have, at the most just require a little brightening and maybe a tiny bite of color adjustment. That I CAN do with what I have. Anything I CAN'T do, I will say so. 
   
  I'm sure that Adobe Photoshop would take care of anything at all I would need to do, but due to lack of cash, buying it is out of the question for now.
   
  Oh yes, I figured I would need to contact the person who originally did the book, and ask for a high res file of the images. I just wanted someone to see what a difference lightening them, makes.
   
  Leigh
  

Andrew Sly <sly@victoria.tc.ca> wrote:
  
Leigh:

How accomplished are you as editing images? If you'd like to help
with this particular item I have underway, one image could use a
small clean-up where there seems to be a smudge of blue ink.
Take a look at http://www.victoria.tc.ca/~sly/pb.htm
(See the first "alphabet" image, on the baby's forehead.)
This is beyond what I feel comfortable dealing with.
If you are interested, I have a 1.4mb png file that I would
request to have edited and returned in the same format.

Or, were you asking in a more general point of view?
By far the easiest way for a new volunteer to contribute
to Project Gutenberg is to sign up at Distributed
Proofreaders (pgdp.net) and help with one page at a time.

And about your re-worked images, it would be best to ask
one of the white-washers, (people who actually post files).
Perhaps via errata [at] pglaf.org. In that case, please
include more full details, such as PG number of the ebook.

In this case, I can predict that a likely response would be
that lightened images would be welcome if you can make them
from an original source. In the case at hand the files included
are jpgs. Jpgs use a lossy compression, which means that each
time you save them, some information is lost. So if you take a
jpg, edit it and save it again, you are probably going to have
a file which is both larger in size and poorer in quality than
what you started with. If you really want to perserver on this
text, you could then try to contact whoever prepared this
text and ask if high-resolution images are availible you could
work from. (And all of this provides a good argument why I'd
like to preserve hi-res images somewhere for the text I'm working
on.)

Andrew

On Sun, 9 Jul 2006, Jacqulyn Perry wrote:

> Hi Andrew!
> That sounds like a great idea, and a sensible compromise! Is there a way I could help with this? I'm not sure HOW I could help, I don't have access to the books, as I seriously doubt I could get my hands on a copy of any of these books-most are collectors items and way outside my budget.
>
> In the meantime, I still have the re-worked images I mentioned, which I would very much like to pass on to you or whomever. They are the same ones in the original file, just re-worked a little so you can actually see them, rather than a dark blur.
>
> Leigh

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d


---------------------------------
Talk is cheap. Use Yahoo! Messenger to make PC-to-Phone calls.  Great rates starting at 1?/min.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060709/38b5c28d/attachment.html
From cannona at fireantproductions.com  Sun Jul  9 14:46:57 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Jul  9 14:54:06 2006
Subject: [gutvol-d] PG Wiki
Message-ID: <000001c6a3a2$31797270$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Marcello and all.  Any word on when the static content on Gutenberg.org
will be done away with and replaced by the wiki?

Also, congratulations Italy!  That was an intense game. :)

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFEsXr7I7J99hVZuJcRAi4SAJ9VGglvtmHeHrqaq8xGjtDgyKTckwCgo2Ow
eWxF9+N2OTZk7QHmv6Qof7Y=
=/56y
-----END PGP SIGNATURE-----

From vze3rknp at verizon.net  Sun Jul  9 15:01:34 2006
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Sun Jul  9 15:01:36 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <20060709200305.43554.qmail@web38505.mail.mud.yahoo.com>
References: <20060709200305.43554.qmail@web38505.mail.mud.yahoo.com>
Message-ID: <44B17CBE.6020907@verizon.net>

Hi Jacqulyn,

You might have a look at The GIMP, which does almost everything 
photoshop does and is free.

There is an Illustrators team at DP that always needs help. I hope that 
you, and perhaps some of the folks you are in contact with, will join 
and give us a hand with our illustrations. I hope that eventually DP 
will have a parallel process that will have experts preparing 
illustrations while the text is being proofed and formatted.

Whether for DP or otherwise, there are several common steps in prepping 
illustrations for a PG book. First, the originals have to be scanned. 
Getting good scans of illustrations takes practice and not all of our 
volunteer content providers are good at it. But everyone does the best 
they can and we encourage them all to scan illustrations at a decent 
resolution and, in the case of DP, upload those scans to our server. 
Another stage of the process is taking the raw scans and making them as 
good as possible, while still leaving them large for archiving. I 
usually do this before I upload to the DP server (stuff like deskewing, 
making sure the colors match as best as possible, etc). But not all 
volunteers have learned enough about graphics programs to do that part. 
Then further, PG usually wants illustrations that will look good on a 
screen, and to keep the overall file sizes down, so there is another 
stage of processing that reduces the image as much as possible without 
unacceptable loss of detail. There are definitely tricks to doing that 
(which I don't know). Often folks will choose to make a smaller version 
for display within the ebook and a larger one that can be obtained by 
clicking on the picture. Also, what's considered "reasonable" for size 
and detail depends to some extent on the book. A children's picture 
book, or a book about art, can reasonably have larger illustrations than 
something that was starting with not-so-good B&W photographs poorly printed.

We deal with everything from simple line art to steel-cut engravings 
(very fine detail) to printed color illustrations (needing descreening) 
to the xyz-gravure stuff that seems to scan beautifully (I don't know 
what the process is for the various -gravure stuff but it doesn't seem 
to result in the same kind of screen dots that one sees in most color or 
B&W photo stuff), to beat-up decorative book covers. There are also 
illustrations and maps that are too large to be scanned in one piece and 
need to be put back together. Lots of challenges for people who like to 
do restoration.

JulietS
DP Site Admin

Jacqulyn Perry wrote:

> My real interest is in working with the images. My problem is I'm not 
> very computer literate-I'm an 'old fashion' painter-so I'm not sure 
> how much I can do with the limited graphics programs I have.
>  
> I'm pretty sure I can take care of the smudge, but it would require me 
> using Paint to remove the smudge, then printing the image out and 
> doing the retouch by hand-I used to work as a photo re-touch 
> person-then scanning the re-touched image to send back to you. Which I 
> would be glad to do.
>  
> I've posted the image at an artist website I belong to, that has a 
> VERY active computer graphics forum and asked their advice. I've also 
> asked about a good graphics program. Though from what I've seen so 
> far, most of the images you folks have, at the most just require a 
> little brightening and maybe a tiny bite of color adjustment. That I 
> CAN do with what I have. Anything I CAN'T do, I will say so.
>  
> I'm sure that Adobe Photoshop would take care of anything at all I 
> would need to do, but due to lack of cash, buying it is out of the 
> question for now.
>  
> Oh yes, I figured I would need to contact the person who originally 
> did the book, and ask for a high res file of the images. I just wanted 
> someone to see what a difference lightening them, makes.
>  
> Leigh


From urbangleaner56 at yahoo.com  Sun Jul  9 20:32:07 2006
From: urbangleaner56 at yahoo.com (Jacqulyn Perry)
Date: Sun Jul  9 20:32:11 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <53a.2df95ce.31e2deeb@aol.com>
Message-ID: <20060710033207.33340.qmail@web38506.mail.mud.yahoo.com>

Thanks! 
  These are only a few of the re-worked images I have. I still have some left to do, and I need to put all of the re-worked ones in a file to send off after that.
   
  Yeah, that's one of the things I'm concerned about, because I don't have the actual book so I can compare to the printed images. So as long as someone who DOES have access to the books can check my work and as long as I'm VERY conservative about what I do to the images-and how much-then we should be golden.
   
  Couple of other things... I'm teaching this week at our local art center-kids summer arts program-and wont have much time this coming week, as well as my own projects and job hunting.
  The other thing is, that I need to educate myself about computer imaging programs ect, because until I did a web search, I had no idea WHAT PNG is, or that computers are capable of 'true color'. I need to talk to my son-in-law and ask him how to go about finding out what my computer and monitor are capable of. So obviously I have some homework ahead of me.
  Leigh

Bowerbird@aol.com wrote:
  leigh-

>   It WORKED!!!

yes it did, and they look nice.
(except it seems you skipped over plate05.)

i'll compare them to the original,
but so far i'd say you did a good job.

-bowerbird


 __________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060709/d6ddac84/attachment.html
From sly at victoria.tc.ca  Sun Jul  9 20:40:52 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Jul  9 20:40:54 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <20060710033207.33340.qmail@web38506.mail.mud.yahoo.com>
References: <20060710033207.33340.qmail@web38506.mail.mud.yahoo.com>
Message-ID: <Pine.GSO.4.58.0607092038181.21456@vtn1.victoria.tc.ca>


That's a good point. Through being involved with Project Gutenberg,
I've learned many different things in more areas than I would have
imagined.

Andrew

On Sun, 9 Jul 2006, Jacqulyn Perry wrote:

>   The other thing is, that I need to educate myself about computer imaging programs ect, because until I did a web search, I had no idea WHAT PNG is, or that computers are capable of 'true color'. I need to talk to my son-in-law and ask him how to go about finding out what my computer and monitor are capable of. So obviously I have some homework ahead of me.
>   Leigh
>
From urbangleaner56 at yahoo.com  Sun Jul  9 20:47:02 2006
From: urbangleaner56 at yahoo.com (Jacqulyn Perry)
Date: Sun Jul  9 20:47:05 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <44B17CBE.6020907@verizon.net>
Message-ID: <20060710034702.36120.qmail@web38510.mail.mud.yahoo.com>

Oh good!  Does the GIMP software have a Help file? I hope?
   
  Okay, what is DP? Is it part of PG?  I'm really interested in working with childrens book illus, as that is what I have some background in as an artist. After I've learned more about working with computer graphics programs, I would probably be willing to help out with the other types of imaging work as well.
  Leigh

Juliet Sutherland <vze3rknp@verizon.net> wrote:
  Hi Jacqulyn,

You might have a look at The GIMP, which does almost everything 
photoshop does and is free.

There is an Illustrators team at DP that always needs help. I hope that 
you, and perhaps some of the folks you are in contact with, will join 
and give us a hand with our illustrations. I hope that eventually DP 
will have a parallel process that will have experts preparing 
illustrations while the text is being proofed and formatted.

Whether for DP or otherwise, there are several common steps in prepping 
illustrations for a PG book. First, the originals have to be scanned. 
Getting good scans of illustrations takes practice and not all of our 
volunteer content providers are good at it. But everyone does the best 
they can and we encourage them all to scan illustrations at a decent 
resolution and, in the case of DP, upload those scans to our server. 
Another stage of the process is taking the raw scans and making them as 
good as possible, while still leaving them large for archiving. I 
usually do this before I upload to the DP server (stuff like deskewing, 
making sure the colors match as best as possible, etc). But not all 
volunteers have learned enough about graphics programs to do that part. 
Then further, PG usually wants illustrations that will look good on a 
screen, and to keep the overall file sizes down, so there is another 
stage of processing that reduces the image as much as possible without 
unacceptable loss of detail. There are definitely tricks to doing that 
(which I don't know). Often folks will choose to make a smaller version 
for display within the ebook and a larger one that can be obtained by 
clicking on the picture. Also, what's considered "reasonable" for size 
and detail depends to some extent on the book. A children's picture 
book, or a book about art, can reasonably have larger illustrations than 
something that was starting with not-so-good B&W photographs poorly printed.

We deal with everything from simple line art to steel-cut engravings 
(very fine detail) to printed color illustrations (needing descreening) 
to the xyz-gravure stuff that seems to scan beautifully (I don't know 
what the process is for the various -gravure stuff but it doesn't seem 
to result in the same kind of screen dots that one sees in most color or 
B&W photo stuff), to beat-up decorative book covers. There are also 
illustrations and maps that are too large to be scanned in one piece and 
need to be put back together. Lots of challenges for people who like to 
do restoration.

JulietS
DP Site Admin

Jacqulyn Perry wrote:

> My real interest is in working with the images. My problem is I'm not 
> very computer literate-I'm an 'old fashion' painter-so I'm not sure 
> how much I can do with the limited graphics programs I have.
> 
> I'm pretty sure I can take care of the smudge, but it would require me 
> using Paint to remove the smudge, then printing the image out and 
> doing the retouch by hand-I used to work as a photo re-touch 
> person-then scanning the re-touched image to send back to you. Which I 
> would be glad to do.
> 
> I've posted the image at an artist website I belong to, that has a 
> VERY active computer graphics forum and asked their advice. I've also 
> asked about a good graphics program. Though from what I've seen so 
> far, most of the images you folks have, at the most just require a 
> little brightening and maybe a tiny bite of color adjustment. That I 
> CAN do with what I have. Anything I CAN'T do, I will say so.
> 
> I'm sure that Adobe Photoshop would take care of anything at all I 
> would need to do, but due to lack of cash, buying it is out of the 
> question for now.
> 
> Oh yes, I figured I would need to contact the person who originally 
> did the book, and ask for a high res file of the images. I just wanted 
> someone to see what a difference lightening them, makes.
> 
> Leigh


_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d


---------------------------------
Talk is cheap. Use Yahoo! Messenger to make PC-to-Phone calls.  Great rates starting at 1?/min.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060709/af27622a/attachment-0001.html
From urbangleaner56 at yahoo.com  Sun Jul  9 20:50:29 2006
From: urbangleaner56 at yahoo.com (Jacqulyn Perry)
Date: Sun Jul  9 20:50:33 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <Pine.GSO.4.58.0607092038181.21456@vtn1.victoria.tc.ca>
Message-ID: <20060710035029.19529.qmail@web38514.mail.mud.yahoo.com>

LOL! Yeah, this looks like its going to combine some of my favorite things... art, and learning new stuff! Can't get much better than that!
  Leigh

Andrew Sly <sly@victoria.tc.ca> wrote:
  
That's a good point. Through being involved with Project Gutenberg,
I've learned many different things in more areas than I would have
imagined.

Andrew

On Sun, 9 Jul 2006, Jacqulyn Perry wrote:

> The other thing is, that I need to educate myself about computer imaging programs ect, because until I did a web search, I had no idea WHAT PNG is, or that computers are capable of 'true color'. I need to talk to my son-in-law and ask him how to go about finding out what my computer and monitor are capable of. So obviously I have some homework ahead of me.
> Leigh
>
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d


---------------------------------
Do you Yahoo!?
 Get on board. You're invited to try the new Yahoo! Mail Beta.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060709/d785884c/attachment.html
From grythumn at gmail.com  Sun Jul  9 21:23:13 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sun Jul  9 21:23:15 2006
Subject: [gutvol-d] Comments & Questions About Book Illustration
In-Reply-To: <20060710034702.36120.qmail@web38510.mail.mud.yahoo.com>
References: <44B17CBE.6020907@verizon.net>
	<20060710034702.36120.qmail@web38510.mail.mud.yahoo.com>
Message-ID: <15cfa2a50607092123g5982d349g3c8cb9233d76de1c@mail.gmail.com>

There is some documentation at:

http://www.gimp.org/docs/

And a list of books in print at:

http://www.gimp.org/books/

I have done most of the image processing on the Beatrix Potter books (Not
all, but most of them), and while I'm not 100% happy with them, they are
fairly decent. For an example:

http://www.gutenberg.org/files/17089/17089-h/17089-h.htm

My workflow for the potter books (color, screened images):
(Note, this also assumes that you are working within the DP workflow; not
everyone chooses to do so.)

1) Scan at 600 DPI 24-bit color, using a scanner profile, and not
auto-exposure. (Auto-exposure was causing some hard-to-track-down color
shifts.) I have calibrated my scanner with a standard target, but the bigger
issue is to get the scans consistent across the whole scan. A rough
calibration of your monitor settings is not a bad idea, either. There are a
number of websites with color bars to assist this.
2) Load two or three images into Gimp/Photoshop, and play with the settings
for that particular book.. radius for gaussian blur in the descreening
process, levels, unsharp settings, etc. After determining the settings, I'll
load them into an action and apply them to all images.

a) Descreen. There are some alternative methods (the russian descreening
plugin is pretty good at preserving detail), and some scanner drivers do
better than others when descreening at stage 1. Typically, if descreening in
software, I'll magnify to 200%, and adjust the gaussian blur radius to just
show a slight visible hatching in darker areas.
b) Downscale (bicubic) to 300 DPI. This reduces file size and makes image
easier to work with, now that we have removed any possibility of moire
patterns. (Staying at 600 DPI does not help as we threw away any extra
information with the blur.)
c) Adjust levels, hue, and saturation. It is best done with a light hand, as
it is difficult to tell exactly how the colors looked before they faded.
Typically I'll show a sample to a few people as a sanity check at this
stage.
d) If I don't think the post processor will do one after the final scaling,
I will do a moderate sharpening (usually unsharp mask) at this stage to
compensate for the blurring effect from  steps b and a.

(These steps can't be scripted and have to be done to each image by hand.)

e) Clean up dust marks, printing artifacts, etc. I haven't found anything
particularly effective at repairing misaligned screens (it is usually not
visible at the final resolution in any case), but most other problems can be
cleaned up.
f) Crop, and force remaining background (if any) to white. Remember, your
image will be rectangular and displayed against a white background. You can
specify an alpha layer in your PNG, but this will not be preserved in the
final JPG.

g) Save as PNG. At this stage, images will be on the order of 1 MB apiece.

I'd prefer to save the raw scans as well, but they are much bigger.

The post processor downloads these images, scales them to fit current
guidelines (usually 400-600 pixels across), compresses them to an
appropriate format (usually JPG for these types of images), and inserts them
into the HTML. Most images end up in the 50-100 kb range.

R C

On 7/9/06, Jacqulyn Perry <urbangleaner56@yahoo.com> wrote:
>
> Oh good!  Does the GIMP software have a Help file? I hope?
>
> Okay, what is DP? Is it part of PG?  I'm really interested in working with
> childrens book illus, as that is what I have some background in as an
> artist. After I've learned more about working with computer graphics
> programs, I would probably be willing to help out with the other types of
> imaging work as well.
> Leigh
>
> *Juliet Sutherland <vze3rknp@verizon.net>* wrote:
>
> Hi Jacqulyn,
>
> You might have a look at The GIMP, which does almost everything
> photoshop does and is free.
>
> There is an Illustrators team at DP that always needs help. I hope that
> you, and perhaps some of the folks you are in contact with, will join
> and give us a hand with our illustrations. I hope that eventually DP
> will have a parallel process that will have experts preparing
> illustrations while the text is being proofed and formatted.
>
> Whether for DP or otherwise, there are several common steps in prepping
> illustrations for a PG book. First, the originals have to be scanned.
> Getting good scans of illustrations takes practice and not all of our
> volunteer content providers are good at it. But everyone does the best
> they can and we encourage them all to scan illustrations at a decent
> resolution and, in the case of DP, upload those scans to our server.
> Another stage of the process is taking the raw scans and making them as
> good as possible, while still leaving them large for archiving. I
> usually do this before I upload to the DP server (stuff like deskewing,
> making sure the colors match as best as possible, etc). But not all
> volunteers have learned enough about graphics programs to do that part.
> Then further, PG usually wants illustrations that will look good on a
> screen, and to keep the overall file sizes down, so there is another
> stage of processing that reduces the image as much as possible without
> unacceptable loss of detail. There are definitely tricks to doing that
> (which I don't know). Often folks will choose to make a smaller version
> for display within the ebook and a larger one that can be obtained by
> clicking on the picture. Also, what's considered "reasonable" for size
> and detail depends to some extent on the book. A children's picture
> book, or a book about art, can reasonably have larger illustrations than
> something that was starting with not-so-good B&W photographs poorly
> printed.
>
> We deal with everything from simple line art to steel-cut engravings
> (very fine detail) to printed color illustrations (needing descreening)
> to the xyz-gravure stuff that seems to scan beautifully (I don't know
> what the process is for the various -gravure stuff but it doesn't seem
> to result in the same kind of screen dots that one sees in most color or
> B&W photo stuff), to beat-up decorative book covers. There are also
> illustrations and maps that are too large to be scanned in one piece and
> need to be put back together. Lots of challenges for people who like to
> do restoration.
>
> JulietS
> DP Site Admin
>
>
>
> Jacqulyn Perry wrote:
>
> > My real interest is in working with the images. My problem is I'm not
> > very computer literate-I'm an 'old fashion' painter-so I'm not sure
> > how much I can do with the limited graphics programs I have.
> >
> > I'm pretty sure I can take care of the smudge, but it would require me
> > using Paint to remove the smudge, then printing the image out and
> > doing the retouch by hand-I used to work as a photo re-touch
> > person-then scanning the re-touched image to send back to you. Which I
> > would be glad to do.
> >
> > I've posted the image at an artist website I belong to, that has a
> > VERY active computer graphics forum and asked their advice. I've also
> > asked about a good graphics program. Though from what I've seen so
> > far, most of the images you folks have, at the most just require a
> > little brightening and maybe a tiny bite of color adjustment. That I
> > CAN do with what I have. Anything I CAN'T do, I will say so.
> >
> > I'm sure that Adobe Photoshop would take care of anything at all I
> > would need to do, but due to lack of cash, buying it is out of the
> > question for now.
> >
> > Oh yes, I figured I would need to contact the person who originally
> > did the book, and ask for a high res file of the images. I just wanted
> > someone to see what a difference lightening them, makes.
> >
> > Leigh
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
> ------------------------------
> Talk is cheap. Use Yahoo! Messenger to make PC-to-Phone calls. Great rates
> starting at 1?/min.
> <http://us.rd.yahoo.com/mail_us/taglines/postman7/*http://us.rd.yahoo.com/evt=39666/*http://messenger.yahoo.com>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060710/401b772e/attachment.html
From sly at victoria.tc.ca  Mon Jul 10 00:49:32 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Jul 10 00:49:38 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <Pine.GSO.4.58.0607100048510.933@vtn1.victoria.tc.ca>


I've just sent a message to the PG catalog list exploring
categorizing posibilities for PG.

I've put a copy of it at:
http://www.victoria.tc.ca/~sly/pgcat.txt

Any extensive discussions might be better placed
on the gutcat list. To subscribe, see:
http://lists.pglaf.org/listinfo.cgi

Andrew
From hyphen at hyphenologist.co.uk  Mon Jul 10 02:15:14 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Mon Jul 10 02:15:27 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <Pine.GSO.4.58.0607100048510.933@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0607100048510.933@vtn1.victoria.tc.ca>
Message-ID: <nf64b25vm2gbh0o51qvveokdu7nfm3ervh@4ax.com>

On Mon, 10 Jul 2006 00:49:32 -0700 (PDT),  Andrew Sly <sly@victoria.tc.ca>
wrote:

|
|I've just sent a message to the PG catalog list exploring
|categorizing posibilities for PG.
|
|I've put a copy of it at:
|http://www.victoria.tc.ca/~sly/pgcat.txt
|
|Any extensive discussions might be better placed
|on the gutcat list. To subscribe, see:
|http://lists.pglaf.org/listinfo.cgi

Just a plea for Dewey Decimal.
http://www.oclc.org/dewey/

-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From Bowerbird at aol.com  Mon Jul 10 02:43:12 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Jul 10 02:43:24 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <4f8.289c0c6.31e37b30@aol.com>

andrew said:
>    Any extensive discussions might be better placed on the gutcat list.

extensive discussions on this topic were already held here on gutvol-d.
why go through it all again?   and again and again and again?

if you don't put this stuff on a wiki, you're just on a merry-go-round...

it stifles participation when contributions regularly go down the drain.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060710/dd01bfb3/attachment-0001.html
From marcello at perathoner.de  Mon Jul 10 04:27:45 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Jul 10 04:28:11 2006
Subject: [gutvol-d] PG Wiki
In-Reply-To: <000001c6a3a2$31797270$0300a8c0@blackbox>
References: <000001c6a3a2$31797270$0300a8c0@blackbox>
Message-ID: <44B239B1.4000003@perathoner.de>

Aaron Cannon wrote:

> Hi Marcello and all.  Any word on when the static content on Gutenberg.org
> will be done away with and replaced by the wiki?

I still have to port the FAQ and then I'd like to export all book
reviews also.

Currently there's a lot of traffic due to the 1M book drive, I think
I'll switch when things return normal.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From sly at victoria.tc.ca  Mon Jul 10 08:32:20 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Jul 10 08:32:46 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <nf64b25vm2gbh0o51qvveokdu7nfm3ervh@4ax.com>
References: <Pine.GSO.4.58.0607100048510.933@vtn1.victoria.tc.ca>
	<nf64b25vm2gbh0o51qvveokdu7nfm3ervh@4ax.com>
Message-ID: <Pine.GSO.4.58.0607100827180.10036@vtn1.victoria.tc.ca>


On Mon, 10 Jul 2006, Dave Fawthrop wrote:

> On Mon, 10 Jul 2006 00:49:32 -0700 (PDT),  Andrew Sly <sly@victoria.tc.ca>
> wrote:
>
> |
> |I've just sent a message to the PG catalog list exploring
> |categorizing posibilities for PG.
> |
> |I've put a copy of it at:
> |http://www.victoria.tc.ca/~sly/pgcat.txt
> |

> Just a plea for Dewey Decimal.
> http://www.oclc.org/dewey/
>

As I say in my message:

>Dewey-Decimal Classification
>
>This appeals to me for being strongly language independent.
>That is, this could be perhaps the easiest way to classify
>PG texts, which could, in the future, be translated into
>different interfaces in different languages.
>
>Drawbacks:
>Intellectual rights claims may limit usage. (OCLC claims rights
>to use this system and licences it out to libraries.)

A while ago, the Internet Public Library had a large
index of online books, organized on a dewy-decimal system.
I wonder if pressure from OCLC had anything to do with it
disappearing.

Andrew
From joshua at hutchinson.net  Mon Jul 10 08:40:58 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Jul 10 08:41:04 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>

How is it that the OCLC can enforce such a claim when the DDS was first written in 1870 (according to their website)?  Shouldn't it be out of copyright and therefore open for anyone to use?


> ----- Original Message -----
> From: "Andrew Sly" <sly@victoria.tc.ca>
> > Drawbacks:
> > Intellectual rights claims may limit usage. (OCLC claims rights
> > to use this system and licences it out to libraries.)
> 

From sly at victoria.tc.ca  Mon Jul 10 08:51:13 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Jul 10 08:51:15 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <4f8.289c0c6.31e37b30@aol.com>
References: <4f8.289c0c6.31e37b30@aol.com>
Message-ID: <Pine.GSO.4.58.0607100833110.10036@vtn1.victoria.tc.ca>


Thanks for trying to help here bb.

What I'm doing here is not the same as unfocused, meandering
exchange of possible ideas you can often find on a mailing list.
(which is sometimes a good thing to have). Instead, I'm trying
to move ahead with something that will actually contribute to
making PG content more accessible to many people out there.

I've already discussed this with Marcello, and we could already
"have this stuff on a wiki", but I'm overly cautious, and wanted
to get input from other people, that might help implement it
in a better way, and also give people an idea of what's coming.

Andrew

On Mon, 10 Jul 2006 Bowerbird@aol.com wrote:

> andrew said:
> >    Any extensive discussions might be better placed on the gutcat list.
>
> extensive discussions on this topic were already held here on gutvol-d.
> why go through it all again?   and again and again and again?
>
> if you don't put this stuff on a wiki, you're just on a merry-go-round...
>
> it stifles participation when contributions regularly go down the drain.
>
> -bowerbird
>
From hyphen at hyphenologist.co.uk  Mon Jul 10 09:01:31 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Mon Jul 10 09:01:46 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>
References: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>
Message-ID: <hcu4b21gpnrj30blfpllv4c8s0ahehsjs8@4ax.com>

On Mon, 10 Jul 2006 10:40:58 -0500,  "Joshua Hutchinson"
<joshua@hutchinson.net> wrote:

|How is it that the OCLC can enforce such a claim when the DDS was first written in 1870 (according to their website)?  Shouldn't it be out of copyright and therefore open for anyone to use?
|
|
|> ----- Original Message -----
|> From: "Andrew Sly" <sly@victoria.tc.ca>
|> > Drawbacks:
|> > Intellectual rights claims may limit usage. (OCLC claims rights
|> > to use this system and licences it out to libraries.)

There will be a 1922 version which we could use.
-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From sly at victoria.tc.ca  Mon Jul 10 09:13:52 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Jul 10 09:13:55 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <hcu4b21gpnrj30blfpllv4c8s0ahehsjs8@4ax.com>
References: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>
	<hcu4b21gpnrj30blfpllv4c8s0ahehsjs8@4ax.com>
Message-ID: <Pine.GSO.4.58.0607100913150.1190@vtn1.victoria.tc.ca>


On Mon, 10 Jul 2006, Dave Fawthrop wrote:

> On Mon, 10 Jul 2006 10:40:58 -0500,  "Joshua Hutchinson"
> <joshua@hutchinson.net> wrote:
>
> |How is it that the OCLC can enforce such a claim when the DDS was first written in 1870 (according to their website)?  Shouldn't it be out of copyright and therefore open for anyone to use?
> |
> |
> |> ----- Original Message -----
> |> From: "Andrew Sly" <sly@victoria.tc.ca>
> |> > Drawbacks:
> |> > Intellectual rights claims may limit usage. (OCLC claims rights
> |> > to use this system and licences it out to libraries.)
>
> There will be a 1922 version which we could use.
>

Looking at the site: http://www.oclc.org/dewey/
I see this notice at the bottom of the page:

   All copyright rights in the Dewey Decimal Classification system are
   owned by OCLC.
   Dewey, Dewey Decimal Classification, DDC, OCLC and WebDewey are
   registered trademarks of OCLC.

In other words, they are taking everything they can get.
DDC is regularly revised. (I believe the 22nd edition is latest)
Modern Dewey is significantly different from its original publication.
Also, the term is trademarked.

However, I'm not saying "No, this is impossible." A good thing
about the wiki approach is that it (hopefully) encourages
different concurrent approaches. I'm just suggesting that
if PG ends up having a high-profile use of DDC, OCLC might
object.

A drawback of using some old version is consistency with
what is current. Not only have many new headings been added
over time, but there has been much revising and moving
headings from one place to another.
From cannona at fireantproductions.com  Mon Jul 10 09:36:50 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Mon Jul 10 09:37:05 2006
Subject: [gutvol-d] Categorizing PG content
References: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>
	<hcu4b21gpnrj30blfpllv4c8s0ahehsjs8@4ax.com>
Message-ID: <001301c6a43f$0e2f8cd0$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here is my $0.02:

I think that the Dewey-decimal system is the best of all the options.  The
main advantage I see it having is that it would be easy to extract and work
with when building collections.  It would also be easy to use to find books
from a particular subject.  However, just because we have the Dewey system,
I don't think that would mean that we couldn't have a wiki as well.  A wiki
is great because it would allow readers to categorize books beyond what was
offered by Dewey.

As far as the legal constraints go, you can see what Wikipedia is doing to
over come (or not) this limitation at
http://en.wikipedia.org/wiki/Wikipedia:Dewey_Decimal_System

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFEsoIvI7J99hVZuJcRAgXlAJ9QaBmGsH715P1IxMx7Hy+jTq5b4wCg423d
DvZedGOX4nbqPVsFUord/VI=
=p9UN
-----END PGP SIGNATURE-----

From Bowerbird at aol.com  Mon Jul 10 09:57:17 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Jul 10 09:57:23 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <515.37dcd19.31e3e0ed@aol.com>

andrew said:
>    Thanks for trying to help here bb.

thanks for recognizing the intent.


>   I'm trying to move ahead with something that 
>    will actually contribute to making PG content 
>    more accessible to many people out there.

that's a great goal.   putting your plan on a wiki
-- the _plan_, not necessarily the catalog itself,
although that might be a rather good idea too --
will ensure that other people will cumulate on it,
if your current execution doesn't take it all the way
to completion.   otherwise, the institutional memory
will be lost, and they'll be starting from scratch again,
just like you are now.   make sure your work _persists_.

a wiki will even help in getting new people "up to speed"
as the come on-board your effort.   it's rather unwieldy to
access the archives of these listserves, as well as to keep up
with "the current policy" when it necessarily evolves quickly...

***

as for dewey decimal and the o.c.l.c., why don't you approach
them and tell them you'd like to use their system for p.g., and
maybe even ask them to contribute some of their _expertise_
in addition to simply granting permission.   i would think that
some people over there might be happy to do pro bono for p.g.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060710/363ee4d5/attachment.html
From dixonm at pobox.com  Mon Jul 10 10:00:15 2006
From: dixonm at pobox.com (Meredith Dixon)
Date: Mon Jul 10 10:00:12 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <Pine.GSO.4.58.0607100913150.1190@vtn1.victoria.tc.ca>
References: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>	<hcu4b21gpnrj30blfpllv4c8s0ahehsjs8@4ax.com>
	<Pine.GSO.4.58.0607100913150.1190@vtn1.victoria.tc.ca>
Message-ID: <44B2879F.7000502@pobox.com>

There's also the problem that there's not just one "modern Dewey".  The 
DDC as used outside the U.S.  (the UDC)
is significantly different from the DDC within the U.S.   At least, that 
was true when I was last a cataloger; I've been
out of the field for fifteen years.


-- 
Meredith Dixon <dixonm@pobox.com>
Check out *Raven Days* <www.ravendays.org>
For victims and survivors of bullying at school.
And for those who want to help.
From joey at joeysmith.com  Mon Jul 10 12:09:43 2006
From: joey at joeysmith.com (joey)
Date: Mon Jul 10 12:11:34 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <Pine.GSO.4.58.0607100913150.1190@vtn1.victoria.tc.ca>
References: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>
	<hcu4b21gpnrj30blfpllv4c8s0ahehsjs8@4ax.com>
	<Pine.GSO.4.58.0607100913150.1190@vtn1.victoria.tc.ca>
Message-ID: <20060710190942.GL20863@joeysmith.com>

I don't think a wiki entry per book is a very elegant or scalable way
to approach this. I have an alternate suggestion that I'd like to put
together, but won't have anything to show until Friday.
From Bowerbird at aol.com  Mon Jul 10 13:28:01 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Jul 10 13:28:09 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <24d.d9fdbcc.31e41251@aol.com>

joey said:
>     I don't think a wiki entry per book is 
>    a very elegant or scalable way to approach this.

again, i suggested a wiki for _coordination_,
not for cataloging per se.   there's a whole
lot of coordination that needs to be done
before you can even begin the cataloging.

but having said that, i would agree with joey
that it wouldn't be elegant or scalable to have
a wiki-page for each book, and i don't think
anyone would even suggest that for "a catalog".

(you might want to have a separate wiki-page
for each book for _discussion_ about that book,
but that would be an entirely different animal.)

i'd suggest a wiki-page for each "big category"
-- e.g., reference, fiction, nonfiction, serials --
with a list of e-book numbers and titles in each.

when a page gets too big, split it into sub-pages
depending on what kind of split of it makes sense.
(that's assuming that a split _does_ make sense.)

but again, much of the thought-work on this has
already been done previously on this very listserve,
so someone should first recover all of that work
instead of doing it all over again from scratch...

further, it should be possible to leverage some of
the work that greg just did in creating the d.v.d.

for instance, i draw your attention to the files here:
>    http://snowy.arsc.alaska.edu/gbn/pgimages/amazon/

there are amazon pages for the penguin classics library;
one list is sorted by title, the other list is sorted by author.
although the book-links on these copied pages don't work,
if you go to the current amazon pages, the links will work.

those individual-book pages could be quite useful to you.
for instance, the one for "around the world in 80 days" has:

>    Subjects
>    > Literature & Fiction > General > Classics
>    > Literature & Fiction > World Literature > British > 19th Century
>    > Literature & Fiction > World Literature > French
>    
>    Look for similar items by subject Classics
>    Fiction
>    French Novel And Short Story
>    Literature - Classics / Criticism
>    Literature: Classics
>    Voyages around the world
>    19th century fiction
>    Classic fiction
>    Fiction / Classics
>    French

if you were to scrape the amazon page for every e-text that p.g. has,
you'd end up with a lot of information to help you create a catalog...

perhaps the first thing you need to do is make a skeleton of exactly
how you want your catalog to look, and how you want it to behave...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060710/8c016547/attachment.html
From sly at victoria.tc.ca  Mon Jul 10 14:06:32 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Jul 10 14:06:36 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <24d.d9fdbcc.31e41251@aol.com>
References: <24d.d9fdbcc.31e41251@aol.com>
Message-ID: <Pine.GSO.4.58.0607101350320.29000@vtn1.victoria.tc.ca>


Ok, more clarification then. I'm not talking about
trying to make something which strives towards
the ideals of traditional library cataloging.
Personally, I do not believe that that could
happen with a group of volunteers.

Please notice that the subject heading used
for this discussion uses the word "Categorizing",
not "cataloging". I am interested in, as you
suggest below, creating broader catagories.

And yes, I am also aware of many sources we
could look to for some ideas. Probably the most
productive would be taking a look at places that
have reformatted PG texts and presented them
anew, such as Blackmask, or Samizdat.

Andrew
On Mon, 10 Jul 2006 Bowerbird@aol.com wrote:

> again, i suggested a wiki for _coordination_,
> not for cataloging per se.   there's a whole
> lot of coordination that needs to be done
> before you can even begin the cataloging.

> but having said that, i would agree with joey
> that it wouldn't be elegant or scalable to have
> a wiki-page for each book, and i don't think
> anyone would even suggest that for "a catalog".
>
> (you might want to have a separate wiki-page
> for each book for _discussion_ about that book,
> but that would be an entirely different animal.)
>
> i'd suggest a wiki-page for each "big category"
> -- e.g., reference, fiction, nonfiction, serials --
> with a list of e-book numbers and titles in each.
>
> when a page gets too big, split it into sub-pages
> depending on what kind of split of it makes sense.
> (that's assuming that a split _does_ make sense.)
>
> but again, much of the thought-work on this has
> already been done previously on this very listserve,
> so someone should first recover all of that work
> instead of doing it all over again from scratch...
>
> further, it should be possible to leverage some of
> the work that greg just did in creating the d.v.d.
>
> for instance, i draw your attention to the files here:
> >    http://snowy.arsc.alaska.edu/gbn/pgimages/amazon/
>
> there are amazon pages for the penguin classics library;
> one list is sorted by title, the other list is sorted by author.
> although the book-links on these copied pages don't work,
> if you go to the current amazon pages, the links will work.
>
> those individual-book pages could be quite useful to you.
> for instance, the one for "around the world in 80 days" has:
>
> >    Subjects
> >    > Literature & Fiction > General > Classics
> >    > Literature & Fiction > World Literature > British > 19th Century
> >    > Literature & Fiction > World Literature > French
> >
> >    Look for similar items by subject Classics
> >    Fiction
> >    French Novel And Short Story
> >    Literature - Classics / Criticism
> >    Literature: Classics
> >    Voyages around the world
> >    19th century fiction
> >    Classic fiction
> >    Fiction / Classics
> >    French
>
> if you were to scrape the amazon page for every e-text that p.g. has,
> you'd end up with a lot of information to help you create a catalog...
>
> perhaps the first thing you need to do is make a skeleton of exactly
> how you want your catalog to look, and how you want it to behave...
>
> -bowerbird
>
From Bowerbird at aol.com  Mon Jul 10 14:53:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Jul 10 14:53:49 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <542.2eb5a78.31e42653@aol.com>

andrew said:
>    And yes, I am also aware of many sources we
>    could look to for some ideas. 

scraping will give you more than "ideas"...         ;+)


>    Probably the most productive would be 
>    taking a look at places that have reformatted 
>    PG texts and presented them anew, such as 
>    Blackmask, or Samizdat.

fantastic idea!

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060710/3100cd05/attachment.html
From greg at durendal.org  Mon Jul 10 16:00:08 2006
From: greg at durendal.org (Greg Weeks)
Date: Mon Jul 10 16:30:03 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <542.2eb5a78.31e42653@aol.com>
References: <542.2eb5a78.31e42653@aol.com>
Message-ID: <Pine.LNX.4.63.0607101858160.26796@durendal.durendal.org>

On Mon, 10 Jul 2006, Bowerbird@aol.com wrote:

>>    Blackmask, or Samizdat.

Speaking of Blackmask, has anyone heard anything about them? They are 
still down from the DMCA takedown and I've not heard anything since.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From phil at thalasson.com  Mon Jul 10 18:15:44 2006
From: phil at thalasson.com (Philip Baker)
Date: Mon Jul 10 18:18:08 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>
Message-ID: <jKsAwAAAvvsEFwfZ@thalasson.com>

In article <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>, Joshua
Hutchinson <joshua@hutchinson.net> writes
>How is it that the OCLC can enforce such a claim when the DDS was first written 
>in 1870 (according to their website)?  Shouldn't it be out of copyright and 
>therefore open for anyone to use?

Simply using the DDS does not necessarily require making a copy of any
DDS specification. To give an analogy. You don't breach copyright by
following the diet presented a diet book.
-- 
Philip Baker
From joshua at hutchinson.net  Mon Jul 10 19:18:11 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Jul 10 19:18:17 2006
Subject: [gutvol-d] Copyrighted texts in PG catalog
Message-ID: <20060711021812.0C9F99EEB4@ws6-2.us4.outblaze.com>

What does the catalog system check for to see if a text is copyrighted?  For instance, http://www.gutenberg.org/etext/16697, is copyrighted, and shows up as such in each file, but the catalog page shows that it is NOT copyrighted.  Is this something I make have done wrong (or David did wrong in posting) or is it a problem with the catalog system?

Thanks,
Josh
From sly at victoria.tc.ca  Mon Jul 10 19:56:03 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Jul 10 19:56:07 2006
Subject: [gutvol-d] Copyrighted texts in PG catalog
In-Reply-To: <20060711021812.0C9F99EEB4@ws6-2.us4.outblaze.com>
References: <20060711021812.0C9F99EEB4@ws6-2.us4.outblaze.com>
Message-ID: <Pine.GSO.4.58.0607101939500.5287@vtn1.victoria.tc.ca>

That is a good question.

Well, I have one guess. When I look at the copyrighted PG texts
posted in 2005, it looks like they all have these two lines in
the header:

** This is a COPYRIGHTED Project Gutenberg eBook, Details Below **
**     Please follow the copyright guidelines in this file.     **

In the text in question, I found this:

This is a _copyrighted_ Project Gutenberg eBook, details below. Please
follow the copyright guidelines in this file.

I would guess that it is something to do with the particular
formatting with the **'s

I've marked PG#16697 as copyrighted and generated a new page
for it. How many others might be like this...?

Andrew

On Mon, 10 Jul 2006, Joshua Hutchinson wrote:

> What does the catalog system check for to see if a text is copyrighted?  For instance, http://www.gutenberg.org/etext/16697, is copyrighted, and shows up as such in each file, but the catalog page shows that it is NOT copyrighted.  Is this something I make have done wrong (or David did wrong in posting) or is it a problem with the catalog system?
>
> Thanks,
> Josh
From gbuchana at teksavvy.com  Mon Jul 10 20:18:02 2006
From: gbuchana at teksavvy.com (Gardner Buchanan)
Date: Mon Jul 10 20:26:30 2006
Subject: [gutvol-d] Comments & Questions About Book Illustrations
In-Reply-To: <Pine.GSO.4.58.0607090848020.15320@vtn1.victoria.tc.ca>
References: <20060709093716.13839.qmail@web38509.mail.mud.yahoo.com>
	<Pine.GSO.4.58.0607090848020.15320@vtn1.victoria.tc.ca>
Message-ID: <44B3186A.8000302@teksavvy.com>

Hi Andrew,

Andrew Sly wrote:

> How accomplished are you as editing images? If you'd like to help
> with this particular item I have underway, one image could use a
> small clean-up where there seems to be a smudge of blue ink.
> Take a look at http://www.victoria.tc.ca/~sly/pb.htm
> (See the first "alphabet" image, on the baby's forehead.)
> This is beyond what I feel comfortable dealing with.
> If you are interested, I have a 1.4mb png file that I would
> request to have edited and returned in the same format.
> 

If you have not already found someone with the necessary software, I
can take care of that in a couple of secs.  It would be best to
work from a larger sized image -- I assume the large PNG is basically
at the original scanned resolution.

Also, if it seems desirable, I can set you up with Photoshop and/or
any other Adobe software you might need or want.  I work for Adobe
and can get anything PG might need.

============================================================
Gardner Buchanan                     <gbuchana@teksavvy.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.
From marcello at perathoner.de  Tue Jul 11 05:43:40 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Jul 11 05:44:08 2006
Subject: [gutvol-d] Copyrighted texts in PG catalog
In-Reply-To: <20060711021812.0C9F99EEB4@ws6-2.us4.outblaze.com>
References: <20060711021812.0C9F99EEB4@ws6-2.us4.outblaze.com>
Message-ID: <44B39CFC.6030108@perathoner.de>

Joshua Hutchinson wrote:

> What does the catalog system check for to see if a text is
> copyrighted?  For instance, http://www.gutenberg.org/etext/16697, is
> copyrighted, and shows up as such in each file, but the catalog page
> shows that it is NOT copyrighted.  Is this something I make have done
> wrong (or David did wrong in posting) or is it a problem with the
> catalog system?


        if (/COPYRIGHTED Project Gutenberg eBook/i) {
            $o->{'copyright'} = 1;
        }


See also:

  http://www.gutenberg.org/howto/header-howto


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Tue Jul 11 05:48:32 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Jul 11 05:48:34 2006
Subject: [gutvol-d] Copyrighted texts in PG catalog
Message-ID: <20060711124832.222149E836@ws6-2.us4.outblaze.com>

Quick look through my recently posted shows these:

16697 - marked as copyright, but has an extra .htm file.  Keep the .html and delete the .htm

16939
16940
16941
16983  (This one also has an extra .htm file.  Keep the .html and delete the .htm)
16984  (This one also has an extra .htm file.  Keep the .html and delete the .htm)
16985  (This one also has an extra .htm file.  Keep the .html and delete the .htm)
16986  (This one also has an extra .htm file.  Keep the .html and delete the .htm)
17309 
17310

Josh

NOTE: For those curious, I believe the extra .htm file came about because we reposted these recently to fix a problem with a missing PGHeader/Footer text and the old scripts generated .htm and the new scripts generate .html


> ----- Original Message -----
> From: "Andrew Sly" <sly@victoria.tc.ca>
> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@pglaf.org>
> Subject: Re: [gutvol-d] Copyrighted texts in PG catalog
> Date: Mon, 10 Jul 2006 19:56:03 -0700 (PDT)
> 
> 
> That is a good question.
> 
> Well, I have one guess. When I look at the copyrighted PG texts
> posted in 2005, it looks like they all have these two lines in
> the header:
> 
> ** This is a COPYRIGHTED Project Gutenberg eBook, Details Below **
> **     Please follow the copyright guidelines in this file.     **
> 
> In the text in question, I found this:
> 
> This is a _copyrighted_ Project Gutenberg eBook, details below. Please
> follow the copyright guidelines in this file.
> 
> I would guess that it is something to do with the particular
> formatting with the **'s
> 
> I've marked PG#16697 as copyrighted and generated a new page
> for it. How many others might be like this...?
> 
> Andrew
> 
> On Mon, 10 Jul 2006, Joshua Hutchinson wrote:
> 
> > What does the catalog system check for to see if a text is copyrighted?  For 
> > instance, http://www.gutenberg.org/etext/16697, is copyrighted, and shows up 
> > as such in each file, but the catalog page shows that it is NOT copyrighted. 
> >  Is this something I make have done wrong (or David did wrong in posting) or 
> > is it a problem with the catalog system?
> >
> > Thanks,
> > Josh
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

>

From marcello at perathoner.de  Tue Jul 11 06:02:29 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Jul 11 06:02:57 2006
Subject: [gutvol-d] Copyrighted texts in PG catalog
In-Reply-To: <20060711124832.222149E836@ws6-2.us4.outblaze.com>
References: <20060711124832.222149E836@ws6-2.us4.outblaze.com>
Message-ID: <44B3A165.9080609@perathoner.de>

Joshua Hutchinson wrote:

> 16939
> 16940
> 16941
> 16983  (This one also has an extra .htm file.  Keep the .html and delete the .htm)
> 16984  (This one also has an extra .htm file.  Keep the .html and delete the .htm)
> 16985  (This one also has an extra .htm file.  Keep the .html and delete the .htm)
> 16986  (This one also has an extra .htm file.  Keep the .html and delete the .htm)
> 17309 
> 17310

gutenberg=> update books set copyrighted = 1 where pk in
(16939,16940,16941,16983,16984,16985,16986,17309,17310);
UPDATE 9
gutenberg=>

Bibrec pages will get fixed when their caching expires (max. 24 h) or
you can rebuild them manually with:

  http://www.gutenberg.org/etext/16697r

(notice the 'r' appended to url)


-- 
Marcello Perathoner
webmaster@gutenberg.org

From prosfilaes at gmail.com  Tue Jul 11 13:07:57 2006
From: prosfilaes at gmail.com (David Starner)
Date: Tue Jul 11 13:08:06 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <jKsAwAAAvvsEFwfZ@thalasson.com>
References: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>
	<jKsAwAAAvvsEFwfZ@thalasson.com>
Message-ID: <6d99d1fd0607111307ic18afd6n8184ed4c5398fc4d@mail.gmail.com>

On 7/10/06, Philip Baker <phil@thalasson.com> wrote:
> Simply using the DDS does not necessarily require making a copy of any
> DDS specification. To give an analogy. You don't breach copyright by
> following the diet presented a diet book.

But following a diet is a personal action that doesn't fix anything in
a permanent format. Using the DDS to categorize a library is to create
something in physical permanent form, that basically embodies the
system in such a way that the system could more or less be extracted
from our catalog. That's a whole different issue, and I think a judge
might well rule in favor of them on it. It is, IMO, creating a legally
actionable deriviative work.
From klofstrom at gmail.com  Tue Jul 11 13:33:46 2006
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Tue Jul 11 14:05:35 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <6d99d1fd0607111307ic18afd6n8184ed4c5398fc4d@mail.gmail.com>
References: <20060710154059.1A38E9EEE3@ws6-2.us4.outblaze.com>
	<jKsAwAAAvvsEFwfZ@thalasson.com>
	<6d99d1fd0607111307ic18afd6n8184ed4c5398fc4d@mail.gmail.com>
Message-ID: <1e8e65080607111333n7eb4f54fna8d41ccdbcb41bfe@mail.gmail.com>

Suggestion: have a competition to design an open-source cataloging
system for e-books, where there are no physical constraints on
"shelving." Publicize it in library schools. Major ego-boo for the
teacher/graduate student whose scheme is accepted, free design for PG.

--
Zora
aka Karen Lofstrom
From ricardofdiogo at gmail.com  Tue Jul 11 20:07:35 2006
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Tue Jul 11 20:07:37 2006
Subject: [gutvol-d] Copyright question
In-Reply-To: <Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
Message-ID: <9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>

Hi. Is it possible to send PG a collaborative/"distribued" translation
(made, for instance, at Wikisource), based on an already PG published
eBook? (This wouldn't actually be a self-submitted translation, since
Wikisource works based on GFDL... And the original PG etext is already
copyright cleared...)

Ricardo

2006/7/4, Andrew Sly <sly@victoria.tc.ca>:
>
> Copyright laws are different in every country.
>
> I know that in Canada, the duration of copyright is
> determined by the life-span of the creator, regardless
> of who actually owns the copyright. I cannot speak for
> any other countries.
>
> You are unlikely to find a useful answer here on the
> Project Gutenberg Volunteer Discussion list. For a
> list dedicated to discussing copyright issues, see:
> http://www.cni.org/forums/cni-copyright/
>
> Andrew
>
> On Tue, 4 Jul 2006, Juhana Sadeharju wrote:
>
> >
> > Hello. Most often I hear that the copyright of the book lasts 80 years
> > after the death of author. But it is normal that the copyright is
> > transferred to the publisher in the contract. Then why the copyright
> > expiration is still tied to the author who don't have the copyright
> > anymore? Is this misuse of copyright law? Should author keep the
> > copyright (and publisher only license) so that the death+80 rule
> > applies?
> >
> > That is most convenient to publishers, of course, because they
> > get the copyright and its expiration is still tied to the author.
> >
> > In the example case, the book writing contract was made 8 years
> > ago and the contract included the second edition published now.
> > Because the publisher owns the copyright of the second edition
> > already due the contract, the author has never owned the copyright.
> > So how in this case the copyright expiration could never be tied to
> > the author?
> >
> > Juhana
> > _______________________________________________
> > gutvol-d mailing list
> > gutvol-d@lists.pglaf.org
> > http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


-- 
?Vi de que noite ? feita a luz do dia!?

(Antero de Quental)

D? livros electr?nicos ao Mundo. Ajude em http://www.pgdp.net e em
http://dp.rastko.net
From sly at victoria.tc.ca  Tue Jul 11 21:34:47 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Jul 11 21:34:50 2006
Subject: [gutvol-d] Copyright question
In-Reply-To: <9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
Message-ID: <Pine.GSO.4.58.0607112130330.28341@vtn1.victoria.tc.ca>


Ricardo:

Interesting idea. At one point I had thought about what
it would take to set up some kind of distributed
translation process.

I would say the best way to proceed is to send an
email to Greg Newby (gbnewby [at] pglaf.org)
asking this question, with complete details of
the item in question, and a link to it.

And I'm curious, could you let me know what
it is too?

Andrew

On Wed, 12 Jul 2006, Ricardo F Diogo wrote:

> Hi. Is it possible to send PG a collaborative/"distribued" translation
> (made, for instance, at Wikisource), based on an already PG published
> eBook? (This wouldn't actually be a self-submitted translation, since
> Wikisource works based on GFDL... And the original PG etext is already
> copyright cleared...)
>
> Ricardo
>
From gbnewby at pglaf.org  Wed Jul 12 00:08:34 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Jul 12 00:08:36 2006
Subject: [gutvol-d] Copyright question
In-Reply-To: <Pine.GSO.4.58.0607112130330.28341@vtn1.victoria.tc.ca>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
	<Pine.GSO.4.58.0607112130330.28341@vtn1.victoria.tc.ca>
Message-ID: <20060712070834.GA32363@pglaf.org>

On Tue, Jul 11, 2006 at 09:34:47PM -0700, Andrew Sly wrote:
> 
> Ricardo:
> 
> Interesting idea. At one point I had thought about what
> it would take to set up some kind of distributed
> translation process.
> 
> I would say the best way to proceed is to send an
> email to Greg Newby (gbnewby [at] pglaf.org)
> asking this question, with complete details of
> the item in question, and a link to it.

Hi, Ricardo.  As long as it's formatted OK for us, we would probably
accept it.  We don't go for a lot of stuff in a few categories of items
(tech docs and religion are two examples), and don't publish PDF-only
documents nor, usually, HTML without plain text.

You can see more guidance here on formatting:
	http://www.gutenberg.org/faq
and here for our general non-public domain submission
guidelines:
	http://www.gutenberg.org/howto/scopy-howto

Note that we don't need the same level of permission
letter for a GFDL/CC/etc. free license, but we still
like to ask for permission.  And, of course, it's still
copyrighted.

Finally, note that we don't have the personnel to 
handle frequent updates.   For documents in flux, PG
is likely not the right destination.

I hope this helps, and thanks for your suggestions &
ideas.
  -- Greg


> 
> And I'm curious, could you let me know what
> it is too?
> 
> Andrew
> 
> On Wed, 12 Jul 2006, Ricardo F Diogo wrote:
> 
> > Hi. Is it possible to send PG a collaborative/"distribued" translation
> > (made, for instance, at Wikisource), based on an already PG published
> > eBook? (This wouldn't actually be a self-submitted translation, since
> > Wikisource works based on GFDL... And the original PG etext is already
> > copyright cleared...)
> >
> > Ricardo
> >
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
From walter.van.holst at xs4all.nl  Wed Jul 12 05:12:40 2006
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Wed Jul 12 05:48:28 2006
Subject: [gutvol-d] Copyright question
In-Reply-To: <9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
Message-ID: <44B4E738.8070501@xs4all.nl>

Ricardo F Diogo wrote:
> Hi. Is it possible to send PG a collaborative/"distribued" translation
> (made, for instance, at Wikisource), based on an already PG published
> eBook? (This wouldn't actually be a self-submitted translation, since
> Wikisource works based on GFDL... And the original PG etext is already
> copyright cleared...)

That would require some license from the translators that would fit with 
PG. In several countries you have copyrights for translators.

Regards,

  Walter

From ricardofdiogo at gmail.com  Wed Jul 12 05:56:42 2006
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Wed Jul 12 05:56:44 2006
Subject: [gutvol-d] Copyright question
In-Reply-To: <20060712070834.GA32363@pglaf.org>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
	<Pine.GSO.4.58.0607112130330.28341@vtn1.victoria.tc.ca>
	<20060712070834.GA32363@pglaf.org>
Message-ID: <9c6138c50607120556t35cab646qbf40781c1c4c13fc@mail.gmail.com>

Hi again

2006/7/12, Greg Newby <gbnewby@pglaf.org>:
> On Tue, Jul 11, 2006 at 09:34:47PM -0700, Andrew Sly wrote:
> >
> > Ricardo:
> >
> > Interesting idea. At one point I had thought about what
> > it would take to set up some kind of distributed
> > translation process.
> >
Theoretically, something like PGDP would do (with some improvements
and special documentation), since we could make "png's" from the txt
files, with a few lines per page for guidance. Project
comments/discussions could support the translation and the
Post-Processor would need to check not only the format but also the
consistency of the translation. It would be definitely harder than
"simple" proofing, but.... it would be possible.

This system could also be used not only for translations but also for
other tasks PG could benefit from but PGDP doesn't support. For these
other tasks it would be nice to have some sort of a "Distributed
Gutenbergers". For instance, due to deep spelling changes, Portuguese
ebooks we're publishing now will probably be read only by University
teachers/students. (I suppose the same happens with German). Ordinary
people simply give up after a paragraph or may think they have lots of
mistakes. I'm modernizing myself the spelling of an average etext and
pre-calculate the differences in more than two thousand. In a
distributed basis it would be a lot easier.

This discussion on distributed translation started from a post in the
Portuguese PGDP forum. I'm not thinking of any particular translation,
Andrew, but perhaps I''ll had _Alice's Adventures in Wonderland_ to
the translations in progress section at http://pt.wikisource.org to
see what happens.

Here's how they do it there:
http://pt.wikisource.org/wiki/Predefini%C3%A7%C3%A3o:Lista_dos_Textos_em_Tradu%C3%A7%C3%A3o

>
> Hi, Ricardo.  As long as it's formatted OK for us, we would probably
> accept it.  We don't go for a lot of stuff in a few categories of items
> (tech docs and religion are two examples), and don't publish PDF-only
> documents nor, usually, HTML without plain text.
>
> You can see more guidance here on formatting:

Thanks Greg. I'm familiar with PG rules (actually I've already
translated some of them and I'm just waiting for PG's wiki to go
public in order to add them).

> Note that we don't need the same level of permission
> letter for a GFDL/CC/etc. free license, but we still
> like to ask for permission.  And, of course, it's still
> copyrighted.
>
How would this work? Would someone just need to send you an email
saying "Greg, I guess the distributed translation made at... is pretty
good" so you could ask (eg Wikisource) permission to add it to PG's
catalog? Then would you answer your volunteer "Wikisource said it's OK
for us to redistribute that translation. You are now allowed to format
the Wikisource file according to our rules." And PG license would be
something like "Public domain as soon as you do not change this file.
Produced by Wikisource".?

> Finally, note that we don't have the personnel to
> handle frequent updates.   For documents in flux, PG
> is likely not the right destination.
>
Yes. I guess that anyone who actually wants to add a distributed
translation to PG's collection must perform some kind of a translation
project management. The hard decision is, I think,  to know when to
stop, since translations can be changed _ad aeternum_. But even if the
translation suffers deep changes, PG could always add another version.
And if only a few changes are made, another edition.

We can use PG's wiki itself to make such translations. Hopefully, it
will be easier than having to relate with outer projects. (Unless
there's a specially-designed-for-PG project, like PGDP).

>
>
> >
> > And I'm curious, could you let me know what
> > it is too?
> >
> > Andrew
> >
> > On Wed, 12 Jul 2006, Ricardo F Diogo wrote:
> >
> > > Hi. Is it possible to send PG a collaborative/"distribued" translation
> > > (made, for instance, at Wikisource), based on an already PG published
> > > eBook? (This wouldn't actually be a self-submitted translation, since
> > > Wikisource works based on GFDL... And the original PG etext is already
> > > copyright cleared...)
> > >
> > > Ricardo
> > >
> > _______________________________________________
> > gutvol-d mailing list
> > gutvol-d@lists.pglaf.org
> > http://lists.pglaf.org/listinfo.cgi/gutvol-d
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


-- 
?Vi de que noite ? feita a luz do dia!?

(Antero de Quental)

D? livros electr?nicos ao Mundo. Ajude em http://www.pgdp.net e em
http://dp.rastko.net
From hart at pglaf.org  Wed Jul 12 09:38:47 2006
From: hart at pglaf.org (Michael Hart)
Date: Wed Jul 12 09:38:49 2006
Subject: !@!Re: [gutvol-d] Copyright question
In-Reply-To: <44B4E738.8070501@xs4all.nl>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
	<44B4E738.8070501@xs4all.nl>
Message-ID: <Pine.LNX.4.60.0607120935150.7383@pglaf.org>


On Wed, 12 Jul 2006, Walter van Holst wrote:

> Ricardo F Diogo wrote:
>> Hi. Is it possible to send PG a collaborative/"distribued" translation
>> (made, for instance, at Wikisource), based on an already PG published
>> eBook? (This wouldn't actually be a self-submitted translation, since
>> Wikisource works based on GFDL... And the original PG etext is already
>> copyright cleared...)
>
> That would require some license from the translators that would fit with PG. 
> In several countries you have copyrights for translators.

I don't think anything would be required if the translation were made for PG
of a work PG already had published, as stated above.  It would be implicit
in the offering of the translations to PG that it was meant for distribution
through the normal PG channels.  However, if the translator did want works
in translation copyrighted, and distributed though PG that way, this would
be the same as with any copyrighted work, and we would need a permission
letter stating that this was a copyrighted work for PG ditribution and we
would use the normal copyright header/footer.

Michael


>
> Regards,
>
> Walter
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From sly at victoria.tc.ca  Wed Jul 12 13:00:47 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Jul 12 13:00:50 2006
Subject: !@!Re: [gutvol-d] Copyright question
In-Reply-To: <Pine.LNX.4.60.0607120935150.7383@pglaf.org>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
	<44B4E738.8070501@xs4all.nl>
	<Pine.LNX.4.60.0607120935150.7383@pglaf.org>
Message-ID: <Pine.GSO.4.58.0607121256160.6699@vtn1.victoria.tc.ca>

I would think that in some kind of collaberative translation
project as discussed, it might be ideal to have a little
notice, something along that lines of "By contributing,
I agree that all my contributions are released to the public
domain (or released under some CC licence, etc.)"

For an example of a copyrighted translation of a work that
came from PG, take a look at:
Szachy i Warcaby: Droga do mistrzostwa
http://www.gutenberg.org/etext/15201

A Polish translation of Edward Lasker's
Chess and Checkers: The Way to Mastership


Andrew

On Wed, 12 Jul 2006, Michael Hart wrote:

> I don't think anything would be required if the translation were made for PG
> of a work PG already had published, as stated above.  It would be implicit
> in the offering of the translations to PG that it was meant for distribution
> through the normal PG channels.  However, if the translator did want works
> in translation copyrighted, and distributed though PG that way, this would
> be the same as with any copyrighted work, and we would need a permission
> letter stating that this was a copyrighted work for PG ditribution and we
> would use the normal copyright header/footer.
>
> Michael
>
>
From walter.van.holst at xs4all.nl  Wed Jul 12 13:15:17 2006
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Wed Jul 12 13:16:08 2006
Subject: !@!Re: [gutvol-d] Copyright question
In-Reply-To: <Pine.LNX.4.60.0607120935150.7383@pglaf.org>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>	<44B4E738.8070501@xs4all.nl>
	<Pine.LNX.4.60.0607120935150.7383@pglaf.org>
Message-ID: <44B55855.2030800@xs4all.nl>

Michael Hart wrote:

> I don't think anything would be required if the translation were made 
> for PG
> of a work PG already had published, as stated above.  It would be implicit
> in the offering of the translations to PG that it was meant for 
> distribution
> through the normal PG channels.  However, if the translator did want works

Yes, I would agree that there is a implicit licence. However, as soon as 
the translation would be printed etc., you might run into issues. I may 
be a nitpicker, but I'd prefer some clear understanding, for example a 
CC licence.

Regards,

  Walter
From Bowerbird at aol.com  Wed Jul 12 14:15:22 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Jul 12 14:15:28 2006
Subject: [gutvol-d] writely and so on
Message-ID: <55f.264ae44.31e6c06a@aol.com>

given the emergence of rich-text editing on the web,
exemplified by writely (which will be widespread soon,
since google bought 'em), has distributed proofreaders
explored this new possibility yet?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060712/a2383a6f/attachment.html
From gbnewby at pglaf.org  Wed Jul 12 23:56:46 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Jul 12 23:56:48 2006
Subject: [gutvol-d] DVD: last check
In-Reply-To: <53f.2b98e6e.31e00bbe@aol.com>
References: <53f.2b98e6e.31e00bbe@aol.com>
Message-ID: <20060713065646.GA19596@pglaf.org>

Thanks for this - a few good catches.

I've uploaded a replacement newdvd.txt -- needed
to cut a few to fit on the DVD, and fixed a few
minor things.  The main thing that should be fixed,
but will await a technical fix to the program, is
that *8.txt, *7.txt, *0.txt and *.txt are not mutually
exclusive (that is, they are all included to match
.txt, or their counterparts to match txt/zip).

More:

On Fri, Jul 07, 2006 at 03:10:54PM -0400, Bowerbird@aol.com wrote:
> greg-
> 
> here is a slight reworking of that info-page you made on the d.v.d.
> 
> search for question-marks to find questions i had...
> 
> -bowerbird
> 
> ===========================================================
> 
> "july 2006 special" -- current as of ebook #18739

18815

> ===========================================================
> 
> baseline: everything but the hgp:
> 
> 1-2199 (then skip 2200-2224)
> 2225-3500 (then skip 3501-3524)
> 3525-11774 (then skip 11775-11799)
> 11800-20000
> 
> ===========================================================
> 
> particular items of interest included that are not text and not html:
> 
> 116 (zip/avi, select "all") -- apollo 11 moon landing movie
> 156 (midi, select "all") -- beethoven's 5th symphony audio
> 249 (zip/html) -- french cave-paintings pictures
> 256 (zip/mpg) -- rotating-earth movie
> 3002 (mp3) -- janis ian, society's child audio
> 5212-5216 ("all") -- a-bomb videos (was "5212-5215"? -- is 5216 a 
> compilation?)

I ditched #5216

> 9551 ("all") -- human-read sherlock holmes audio
> 10177 ("all") -- ride of the valkyries, audio
> 17246 ("all") , but it doesn't include all the mp3s -- wrong e-text #?

Mistake...I'm not sure what I had in mind.

> ===========================================================
> 
> selected top-100 titles to specify as html:
> 
> 11 -- alice in wonderland (does this mean you used #928?)

Added 928.

> 132 -- art of war (does this mean you used #17405?)

Added 17405

> 5000 (da vinci notebooks, html?, complete set of #4998 and #4999?)

yes.

> 5001 -- einstein's relativity
> 5200 --   "metamorphosis" (anything special about this html file?)

No, but it's one of my favorite titles :)

> 8710 -- dore bible illustrations
> 8800 -- dante's divine comedy (only available as html download)

Fixed...dropped 8800, added 8779-8799 in plain HTML w/ images.
(Also a favorite, though Paradiso is a lot less entertaining
than Inferno!)

> 9551 -- human-read sherlock holmes
> 10681 -- roget's thesaurus (heavily formatted with styles)
> 13510 -- "knots, splices and rope work"
> 
> ===========================================================
> 
> a few extras to specify as html:
> 
> 10600 -- kerr's "voyages and travels" (but no images in this file?)

I'm not sure what I had in mind...

> 
> illustrated beatrix potter:
> 17089, 15575, 15284, 15234, 15137, 15077, 14877, 14872, 14868, 14848,
> 14838, 14837, 14814, 14797, 14407, 14304, 14220, 12103
> 
> the first 20 punch: -- (all of the punch are listed separately below)
> 18114, 17994, 17654, 17653, 17634, 17629, 17596, 17471, 17397, 17216,
> 16877, 16727, 16717, 16707, 16684, 16673, 16640, 16628, 16619, 16592,

Dropped 10 to save space

> the sciam (232mb) -- (listed below) -- (so, were all these included on the 
> dvd?)

Dropped about 17 to save space

> ===========================================================
> 
> eliminate some titles that are part of series.
> these "complete" volumes were skipped,
> and their individual volumes were retained.)
> 
> to skip (a total of 245 duplicate "completes"):
> ...

> 
> hgp items to skip (the reverse of the first list above):
> 
> 11775-11799
> 3501-3524 -- (the original "4501-3524" was a mistake?)

Yes....it's 3501, not 4501

> 2200-2224
> 

  -- Greg
From ricardofdiogo at gmail.com  Thu Jul 13 01:55:50 2006
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Thu Jul 13 01:55:53 2006
Subject: !@!Re: [gutvol-d] Copyright question
In-Reply-To: <44B55855.2030800@xs4all.nl>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
	<44B4E738.8070501@xs4all.nl>
	<Pine.LNX.4.60.0607120935150.7383@pglaf.org>
	<44B55855.2030800@xs4all.nl>
Message-ID: <9c6138c50607130155k5479e855we0742a76ebd2540a@mail.gmail.com>

>
> On Wed, 12 Jul 2006, Michael Hart wrote:
>
> I don't think anything would be required if the translation were made for PG
>  of a work PG already had published, as stated above.  It would be implicit
>  in the offering of the translations to PG that it was meant for distribution
>  through the normal PG channels.

Thing is, when those translations are made in websites that have a
GFDL, like Wikisource, I suppose that in order to distribute them we'd
have to add the GFDL itself, and PG would have to do the same
(right?). But PG has its own licence, and GFDL says no clauses can be
added.

> However, if the translator did want works
>  in translation copyrighted, and distributed though PG that way, this would
>  be the same as with any copyrighted work, and we would need a permission letter

Yes, but for a massive/distributed/collaborative translation who would
write that letter? Only those who want to keep copyright? Even if one
in a hundred? And if s/he doesn't write it?

2006/7/12, Andrew Sly <sly@victoria.tc.ca>:
> I would think that in some kind of collaberative translation
> project as discussed, it might be ideal to have a little
> notice, something along that lines of "By contributing,
> I agree that all my contributions are released to the public
> domain (or released under some CC licence, etc.)"
>
Under US law, if a website's general license is GFDL but for a given
project we make such public domain notice, would it be effective?

In some countries we can't release books to the public domain. What we
could do would be something like "By contributing,
you agree that all your contributions are released to the public
domain. If that doesn't apply to your country, you agree that all your
contributions can be freely distributed, changed... etc."

2006/7/12, Walter van Holst <walter.van.holst@xs4all.nl>:

> Yes, I would agree that there is a implicit licence. However, as soon as
> the translation would be printed etc., you might run into issues. I may
> be a nitpicker, but I'd prefer some clear understanding, for example a
> CC licence.

Maybe an understanding between PG and other major projects like
Wikimedia could make this issue a lot easier. (A default procedure for
PG<-->Wikisource distributed/collaborative translations could save a
lot of trouble and increase the number of translations.)

-- 
?Vi de que noite ? feita a luz do dia!?

(Antero de Quental)

D? livros electr?nicos ao Mundo. Ajude em http://www.pgdp.net e em
http://dp.rastko.net
From Bowerbird at aol.com  Thu Jul 13 14:42:29 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jul 13 14:42:34 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <57b.fcf4c8.31e81845@aol.com>

karen said:
>    Suggestion: have a competition to design 
>    an open-source cataloging system for e-books,
>    where there are no physical constraints on "shelving." 
>    Publicize it in library schools. Major ego-boo for the
>    teacher/graduate student whose scheme is accepted, 
>    free design for PG.

um, i don't know that i'm seeing much quality thinking
coming out of the library schools, i am chagrined to say.

besides, it's not so much the "design" that is so difficult,
but rather the _implementation_, and the grunt work of
assigning e-texts within the system.   so it'd be far better
to have the competition at the _programming_ level...

and again, much of the design work has already been done,
when this thread had an earlier incantation on this listserve.
if no one is willing to check the archives, what's the point?...

finally, i'm not sure that y'all understand the major need here.
and i'm quite certain that library-school students will miss it.

answer this question:   why should we categorize the e-texts?

i'm serious.   formulate an answer.   i'll wait...

got one?   ok, great...

if your response runs along the lines of "so end-users can find
the book they want, and download it", you're on the wrong path.

that's the function catalogs used to serve, in the dead-tree world.
after all, since a person had made a trip to a library to get a book,
and would have to be making another trip to bring it back, it made
a lot of sense for that person to find a book that they would enjoy.
in that scenario, the catalog helped avoid the cost of a wrong choice.
the physical nature of bound pages creates a situation of obligation.

but in our new era of high-bandwidth and terrabyte hard-drives,
it's silly for a person to spend even mere seconds trying to decide
_whether_or_not_ to download a book.   it's _far_ more convenient
to download vast portions of the library, since they can have their
computer do it automatically while they are partying, or sleeping...

even the dial-up people can request the d.v.d., for free, and have
the entire p.g. library sitting on their hard-disk in a week or so...

not only is it not wise to make people spend any time "choosing",
it's at odds with the important concept of _unlimited_distribution_.

and that's why the library-school people don't understand this.
because unlike them, we _want_ people to take a whole bunch!

it's not just that there's "no shortage of shelf-space" with e-books,
it's that we have an endless source of production.   so take 'em all!

we are all still trapped, to a large degree, by our history of scarcity,
so it's difficult for us to realize how deeply it pervades our thinking.
(especially since we all live in the real world too, where scarcity still
is a hard fact of life.)   but this is one place where we can shed that...

these implications of unlimited-production-and-distribution turn
our thinking on its head.   instead of helping users choose what to
_pick_ in the library, we have to help 'em choose what to _discard_.

in many ways, this is a much easier task.

human genome project files?   ya, you probably won't want 'em.
e-texts in a language that you cannot read?   you can skip those.
text-to-speech files?   videos?   magazines?   maybe yes, maybe no.

they start with 20,000 "possibles", weeding 'em out to their taste,
thanks to our handy-dandy program, which then auto-downloads
the ones that are left, in the background, with zero input needed.

at that point, the cost of selecting a book is double-clicking it and
starting to read it.   and if it doesn't appeal to you, just stop reading
and go on to the next one.   you don't have much need for a catalog.

oh sure, it might still be kind of handy to have be guided to e-texts,
so some means of categorizing an e-text as being "similar to" others
would be nice.   but that's how we need to _approach_ this project,
from the get-go, and not from our implicit notions about "a catalog",
because those are outdated and irrelevant to the task now at hand.
you're barking up the wrong tree if you don't rearrange that thinking.

but anyway, as i said, a system of categorization would be handy,
and i'll have some work to show in that regard in a separate post.
i believe it's important to start out with the philosophical point...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060713/180337d5/attachment.html
From jon at noring.name  Thu Jul 13 15:28:54 2006
From: jon at noring.name (Jon Noring)
Date: Thu Jul 13 15:29:02 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <57b.fcf4c8.31e81845@aol.com>
References: <57b.fcf4c8.31e81845@aol.com>
Message-ID: <1681483192.20060713162854@noring.name>

Bowerbird wrote:
> karen said:

>>??  Suggestion: have a competition to design 
>>??  an open-source cataloging system for e-books,
>>??  where there are no physical constraints on "shelving." 
>>??  Publicize it in library schools. Major ego-boo for the
>>??  teacher/graduate student whose scheme is accepted, 
>>??  free design for PG.

>  answer this question:?  why should we categorize the e-texts?

Actually, I think what we'd like to do is to "categorize" the texts
using one or more categorical systems, and then embed that information
right into the book (which is a digital object).

This is essentially adding metadata, or what the Yahoo folk call
"microformats" (which is a terrible name), right into the object.
This is done now in many kinds of digital objects, such as audio,
video and some ebook formats.

This way no external categorization need to be applied -- it is all
recorded internally, meaning each book can become autonomous of the
others since it carries its own metadata. Particular "libraries" can
build a lookup table of their choosing by simply sniffing through all
the texts it holds. It doesn't really matter where the text files are
placed or organized in a file structure. Multiple categorization
systems can be supported in parallel provided the texts carry the
requisite information.

In XML, there's a number of ways this info could be embedded. In plain
text documents, some sort of machine recognizable "plain text" syntax
has to be developed -- it'd be quite simple, actually. I think those
who advocate plain text should develop a "plain text" metadata system
(such as one based on Dublin Core) to insert somewhere in the file.

Jon Noring


From Bowerbird at aol.com  Thu Jul 13 15:54:56 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jul 13 15:55:06 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <32d.1305f980.31e82940@aol.com>

it's a good thing i don't respond to jon noring any more or 
i'd just get bogged down in shit like markup and metadata.

programming is a whole lot more fun.           :+)

-bowerbird

p.s.   luv ya, jon, no, seriously!   thanks for all the work you do!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060713/080202d3/attachment.html
From jon at noring.name  Thu Jul 13 16:07:55 2006
From: jon at noring.name (Jon Noring)
Date: Thu Jul 13 16:08:06 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <32d.1305f980.31e82940@aol.com>
References: <32d.1305f980.31e82940@aol.com>
Message-ID: <292503298.20060713170755@noring.name>

bowerbird wrote:

>  p.s.?  luv ya, jon, no, seriously!?  thanks for all the work you do!

Luv ya, too, buddy.  :^)

And keep up the work on ZML. As I've noted before, we do need a
standardized way to express plain text books.

Jon

From sly at victoria.tc.ca  Thu Jul 13 23:27:33 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Jul 13 23:27:40 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <1681483192.20060713162854@noring.name>
References: <57b.fcf4c8.31e81845@aol.com>
	<1681483192.20060713162854@noring.name>
Message-ID: <Pine.GSO.4.58.0607132322411.23654@vtn1.victoria.tc.ca>

Jon:

One point to take into account is that the upcoming
wiki catgorizing will be flexible, never "finished",
changing as needed. Embedding this in the files would
take a much larger amount of effort, and remove much
of the possibility for collaberative effort.
Andrew

On Thu, 13 Jul 2006, Jon Noring wrote:

> Actually, I think what we'd like to do is to "categorize" the texts
> using one or more categorical systems, and then embed that information
> right into the book (which is a digital object).
>
> This is essentially adding metadata, or what the Yahoo folk call
> "microformats" (which is a terrible name), right into the object.
> This is done now in many kinds of digital objects, such as audio,
> video and some ebook formats.
>
> This way no external categorization need to be applied -- it is all
> recorded internally, meaning each book can become autonomous of the
> others since it carries its own metadata. Particular "libraries" can
> build a lookup table of their choosing by simply sniffing through all
> the texts it holds. It doesn't really matter where the text files are
> placed or organized in a file structure. Multiple categorization
> systems can be supported in parallel provided the texts carry the
> requisite information.
>
From sly at victoria.tc.ca  Fri Jul 14 00:01:50 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Fri Jul 14 00:02:40 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <57b.fcf4c8.31e81845@aol.com>
References: <57b.fcf4c8.31e81845@aol.com>
Message-ID: <Pine.GSO.4.58.0607132347390.23654@vtn1.victoria.tc.ca>

Hi bb.

I'm replying to some statements from a few different messages
of yours here.


>extensive discussions on this topic were already held here on gutvol-d.
>why go through it all again? and again and again and again?

You've hinted a few times that I'm just starting from
scratch here, ignoring what has already been done.

This is not true. I've read with interest the previous
discussion you mentioned. However, ideas seem to, all too
often, be of the variety that would only be practicable
if we had a couple of highly-trained, professional
librarians who decided to donate their full-time services
to PG. (I can dream can't I?)

So, I've actually tried doing something productive. I've
spent countless hours editing parts of the PG online catalog
(focusing mostly on author headings, having given up on
trying to make title statements that would be acceptable
to the Library Sciences community)


>besides, it's not so much the "design" that is so difficult,
>but rather the _implementation_, and the grunt work of
>assigning e-texts within the system. so it'd be far better
>to have the competition at the _programming_ level...

In this you are very correct. "The grunt work" is a very
real factor here. What I am hoping is to eventually get
these wiki pages working in a way that will _invite_
people to contribute, making it more a collaborative
effort. If I just go ahead and do as much as I can myself,
there will really be no advantage over what I could have
done just in editing the PG online catalog.

>answer this question: why should we categorize the e-texts?

>if your response runs along the lines of "so end-users can find
>the book they want, and download it", you're on the wrong path.

You then argue that:

>in our new era of high-bandwidth and terrabyte hard-drives,
>it's silly for a person to spend even mere seconds trying to decide
>_whether_or_not_ to download a book. it's _far_ more convenient
>to download vast portions of the library, since they can have their
>computer do it automatically while they are partying, or sleeping...


So, let's assume that someone is interested in the Science Fiction
books that we've posted a decent number of lately. Should this person
have to download a few hundred books and then do his own time-consuming
search of these books now on his own system, trying to identify which
ones might be science fiction?

I think the need for something like categorizing is apparent, because
I've seen a decent number of independent web sites which present
a subset of PG books relating to a certain subject. Ones that
spring to mind are collections of Australaiana, Canadiana,
Esperanto-related topics (not necessarily _in_ Esperanto),
and books related to the Philippines. Also, not long ago, I
had someone ask if there was some way he could look through
just 18th century books.

I would argue that having general categories was one reason that
Blackmask was so popular.

>these implications of unlimited-production-and-distribution turn
>our thinking on its head. instead of helping users choose what to
>_pick_ in the library, we have to help 'em choose what to _discard_.

I must disagree here.

People by nature, prefer to have a smaller number of choices. (How
many people will look at an extensive menu in a restaurant, be
intimidated, and just pick something off the small "feature"
list?--having worked in such a place, I can tell you: lots of people.)

Would you rather have a selection of items in one particular category
that may be of interest, or have a massive list where you have to
go "Nope, don't want that one. Nope, don't want that one" two-hundred
times?


>from the get-go, and not from our implicit notions about "a catalog",
>because those are outdated and irrelevant to the task now at hand.

Careful now. The traditional library catalog is still an
extremely useful resource (for those who know how to use it).
I might be susceptible to the argument, however, that its
limits get stretched uncomfortably trying to describe digital
material.

Andrew

From Bowerbird at aol.com  Fri Jul 14 00:15:15 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jul 14 00:15:32 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <558.12396c80.31e89e83@aol.com>

andrew said:
>    You've hinted a few times that I'm just starting from
>    scratch here, ignoring what has already been done.

not you in particular.   all of us in general.          :+)

it seems to me that _you_ are far ahead of the crowd
by virtue of the fact that you're actively working on it.

and i look forward to your results when you show us.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060714/1c9381a0/attachment-0001.html
From joey at joeysmith.com  Fri Jul 14 00:34:02 2006
From: joey at joeysmith.com (joey)
Date: Fri Jul 14 00:36:30 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <57b.fcf4c8.31e81845@aol.com>
References: <57b.fcf4c8.31e81845@aol.com>
Message-ID: <20060714073402.GM20863@joeysmith.com>

Sorry for the length, everyone, but I wanted to try and cover
in words what I was unable to cover in production software.

On Thu, Jul 13, 2006 at 05:42:29PM -0400, Bowerbird@aol.com wrote:
	...
> finally, i'm not sure that y'all understand the major need here.
> and i'm quite certain that library-school students will miss it.
> 
> answer this question:   why should we categorize the e-texts?
> 
> if your response runs along the lines of "so end-users can find
> the book they want, and download it", you're on the wrong path.
> 
> that's the function catalogs used to serve, in the dead-tree world.
	...
> but in our new era of high-bandwidth and terrabyte hard-drives,
> it's silly for a person to spend even mere seconds trying to decide
> _whether_or_not_ to download a book.   it's _far_ more convenient
> to download vast portions of the library, since they can have their
> computer do it automatically while they are partying, or sleeping...

	I disagree. I have a 100Mb/s municipal fiber connection and almost
2 terabytes of disk space available, and "download[ing] vast portions
of the library" is not an option for me. I don't find it difficult to
imagine that if I have a hard time accepting this answer, there are
going to be others who do so as well, with far fewer resources at
their command.

> even the dial-up people can request the d.v.d., for free, and have
> the entire p.g. library sitting on their hard-disk in a week or so...

	I also don't agree with the implied assertion here that having the
full (or even "vast portions of the") library means that users don't
want help identifying and locating content within that collection.

	Of course, this means that we'll want to help people who download
the library get the catalog data that matches their portion of the
library!

> not only is it not wise to make people spend any time "choosing",
> it's at odds with the important concept of _unlimited_distribution_.

	Having a catalog does not equate to making people use it. It's a
tool for those who want to make use of it. That said, let's make sure
that whatever tool(s) we come up with fit as many of the percieved needs
as we possibly can! You clearly have different ideas of the use of a
catalog than do I. As you've already enumerated some of the points of
*my* use, perhaps you could elaborate on your ideas?

	(On the other hand, if you already did this, ignore this request.
I generally avoid topics once you start weighing in on them, so I may
have missed the applicable portions from the last time this topic
came up.)


---

So, on to my proposal. I had hoped to actually be able to provide a
tool demonstrating it, but my day job interfered too much this week
to allow me to realize that hope. So instead, let me see if I can lay
out the concept.

It's based on the tagging system known as the "Debian Package
Browser" [1].

Some important parts of the idea that might be missed initially:
* Every book gets tagged initially with a placeholder value
* Wherever we can identify existing valuable tags, they are
	added to the initial load. Some examples of tags I'd want
	include: year published in PG; Author/Creator; Language;
	LoC Class; Copyright Status (sounding familiar to anyone?)
* Tags need to be nestable. This is something the Debian system
	is not able to support, but I think it's very important. One
	example Browerbird already pointed out is the Amazon.com
	categorization scheme.
* The default behaviour of the tagging system should be marking
	which of the existing tags are best applied to this book, but
	it also needs to be flexible enough to add new tags (and
	hierarchies thereof). Setting the default behaviour this way
	is one way of preventing the "del.icio.us syndrome" found in
	many folksonomies, where there are as many different ways of
	tagging a piece of content as there are users of the system.
* It should be easy, when viewing a particular ebook, to do any 
	of the following actions: view tags already on this book;
	see a list of "suggested tags", based on a weighted list of
	tags attached to content that has other tags in common with
	the current content; view other content tagged in common;
	add / remove tags.
* It needs to be easy to see all content with a particular tag
	or tagset. I'm envisioning something akin to the Flamenco [2]
	system here.

I envision a lot of things coming out of this effort, including
an easier way for people to suggest content for the "Best Of"
DVDs so that Greg doesn't have to do so much of the leg-work
himself. As people come across suggestions, they tag them, then
Greg can just pull a list of ebooks with that tag.

I've done some work on a prototype, but as I said, the real world
invaded and sapped my time. Then again, I know there are many
others on this list that are talented software developers, so
perhaps one of you will beat me to it...or propose an even better
system.

[1]: http://debian.vitavonni.de/packagebrowser/
[2]: http://flamenco.berkeley.edu/
	 If you'd like to see Flamenco at work, but don't have the
	 resources to set it up yourself, drop me a line off-list
	 and I'll provide you with a URL to one I've setup.
From kth at srv.net  Fri Jul 14 09:17:08 2006
From: kth at srv.net (Kevin Handy)
Date: Fri Jul 14 09:02:17 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <1681483192.20060713162854@noring.name>
References: <57b.fcf4c8.31e81845@aol.com>
	<1681483192.20060713162854@noring.name>
Message-ID: <44B7C384.5080700@srv.net>

Jon Noring wrote:

> Actually, I think what we'd like to do is to "categorize" the texts
>
>using one or more categorical systems, and then embed that information
>right into the book (which is a digital object).
>  
>
Instead of embedding it into the e-book, I think it would work
better as a seperate file.

If you embed it into the ebooks,  you will need to put it in all the
versions (html, text, pdf, tei, etc..), and keep ALL of them up-to-date.
Also, you would have to search the entire text of the book to find
all the meta-data.

As a seperate file, it would also by easier to download just that
when you want to be able to do "local" searches, without needing
to download the full text of every e-book.

Also, if you want to make it "user" editable, however you want
to define "user", it would be better as seperate file, so that the
original files don't constantly get flagged as modified.

Also, make it easy to join the meta-files into a single file
(cat *.meta > all.meta would be ideal) so that large number
of books could be munged at once, or catalogues of specific
groupings could easily be created (i.e. science-fiction/german).
This would just require having a header in each file specifying
which book it applies to.

The format could be text, or XML, or even tei. If you use an
XML based version, a text version could be easily created.

>This is essentially adding metadata, or what the Yahoo folk call
>"microformats" (which is a terrible name), right into the object.
>This is done now in many kinds of digital objects, such as audio,
>video and some ebook formats.
>  
>
Instead of just category, you could store all sorts of information
in the "meta" file. Authors name, copyright date(s), categories
(science finction, horticulture, cook-book), available formats (text,
html, tei, pdf, etc.), language(s), links to web sites, link to author
meta file, and any other information like you would like to find
in a card catalog,

>This way no external categorization need to be applied -- it is all
>recorded internally, meaning each book can become autonomous of the
>others since it carries its own metadata. Particular "libraries" can
>build a lookup table of their choosing by simply sniffing through all
>the texts it holds. It doesn't really matter where the text files are
>placed or organized in a file structure. Multiple categorization
>systems can be supported in parallel provided the texts carry the
>requisite information.
>  
>
I think that it could become a problem if the meta-data in the
different formats were found to be different. Which one has the
most correct information, the text version or the html one?

>In XML, there's a number of ways this info could be embedded. In plain
>text documents, some sort of machine recognizable "plain text" syntax
>has to be developed -- it'd be quite simple, actually. I think those
>who advocate plain text should develop a "plain text" metadata system
>(such as one based on Dublin Core) to insert somewhere in the file.
>  
>
If you wanted to search for all polish math books, how would you write
the query program so that you would get all of them, without
duplicates because of the different formats, and without wasting a
lot of CPU cycles. Not all texts have a .txt version,

From hart at pglaf.org  Fri Jul 14 09:18:12 2006
From: hart at pglaf.org (Michael Hart)
Date: Fri Jul 14 09:18:14 2006
Subject: !@!Re: [gutvol-d] Copyright question
In-Reply-To: <9c6138c50607130155k5479e855we0742a76ebd2540a@mail.gmail.com>
References: <S581646AbWGDPPbGZhj4/20060704151531Z+12203@nic.funet.fi>
	<Pine.GSO.4.58.0607040900410.25387@vtn1.victoria.tc.ca>
	<9c6138c50607112007r50633afdtb960db2ffe887b3f@mail.gmail.com>
	<44B4E738.8070501@xs4all.nl>
	<Pine.LNX.4.60.0607120935150.7383@pglaf.org>
	<44B55855.2030800@xs4all.nl>
	<9c6138c50607130155k5479e855we0742a76ebd2540a@mail.gmail.com>
Message-ID: <Pine.LNX.4.60.0607140916100.14664@pglaf.org>


On Thu, 13 Jul 2006, Ricardo F Diogo wrote:

>> 
>> On Wed, 12 Jul 2006, Michael Hart wrote:
>> 
>> I don't think anything would be required if the translation were made for 
>> PG
>>  of a work PG already had published, as stated above.  It would be 
>> implicit
>>  in the offering of the translations to PG that it was meant for 
>> distribution
>>  through the normal PG channels.
>
> Thing is, when those translations are made in websites that have a
> GFDL, like Wikisource, I suppose that in order to distribute them we'd
> have to add the GFDL itself, and PG would have to do the same
> (right?). But PG has its own licence, and GFDL says no clauses can be
> added.

Yet one more reason to stay away from such licences, just more trouble.

The PG licence works just fine for this, better than GPL, or others.

Best to just make sure everyone working on such projects understands
and approves the process before they start.

Keep it simple. . . .


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org


>
>> However, if the translator did want works
>>  in translation copyrighted, and distributed though PG that way, this 
>> would
>>  be the same as with any copyrighted work, and we would need a permission 
>> letter
>
> Yes, but for a massive/distributed/collaborative translation who would
> write that letter? Only those who want to keep copyright? Even if one
> in a hundred? And if s/he doesn't write it?
>
> 2006/7/12, Andrew Sly <sly@victoria.tc.ca>:
>> I would think that in some kind of collaberative translation
>> project as discussed, it might be ideal to have a little
>> notice, something along that lines of "By contributing,
>> I agree that all my contributions are released to the public
>> domain (or released under some CC licence, etc.)"
>> 
> Under US law, if a website's general license is GFDL but for a given
> project we make such public domain notice, would it be effective?
>
> In some countries we can't release books to the public domain. What we
> could do would be something like "By contributing,
> you agree that all your contributions are released to the public
> domain. If that doesn't apply to your country, you agree that all your
> contributions can be freely distributed, changed... etc."
>
> 2006/7/12, Walter van Holst <walter.van.holst@xs4all.nl>:
>
>> Yes, I would agree that there is a implicit licence. However, as soon as
>> the translation would be printed etc., you might run into issues. I may
>> be a nitpicker, but I'd prefer some clear understanding, for example a
>> CC licence.
>
> Maybe an understanding between PG and other major projects like
> Wikimedia could make this issue a lot easier. (A default procedure for
> PG<-->Wikisource distributed/collaborative translations could save a
> lot of trouble and increase the number of translations.)
>
> -- 
> ?Vi de que noite ? feita a luz do dia!?
>
> (Antero de Quental)
>
> D? livros electr?nicos ao Mundo. Ajude em http://www.pgdp.net e em
> http://dp.rastko.net
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From Bowerbird at aol.com  Fri Jul 14 10:19:28 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jul 14 10:19:39 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <483.5cfc7ed.31e92c20@aol.com>

joey said:
>   I have a 100Mb/s municipal fiber connection 
>    and almost 2 terabytes of disk space available, 
>    and "download[ing] vast portions of the library" 
>    is not an option for me.

well joey, i do look forward to your tool,
when you find time to create it, because
these general discussions we are having
around this topic have a lot of fuzziness
about them, which must all be resolved
when one starts writing code.

so i won't respond to all your points until
i can see exactly what you meant by them.

but this point here is quite easy to deal with.
downloading the project gutenberg library
-- even the whole thing -- can be a breeze.

first of all, as is always the default with me,
i'm only concerned with one version of each
-- the "master version", in z.m.l. format --
as the other versions can be spun out of it.

second, as i said, it's reasonable to eliminate
big classes of e-texts from the downloading,
such as the human genome files, audio/video,
and books in languages that you don't read...

third, there are a lot of duplicate files where
pieces of a volume were presented separately,
and then the volume as a whole in another file.
now that we have the information (thanks greg),
those separate-piece files can easily be ignored.

fourth, there are some people who will not want
the magazines that are being added increasingly.

once you've eliminated all of these files from your
download queue, you find the list is much smaller.

on to the next step...   i have written a program that
lets a person click one button to start downloading
e-texts as a background process on their machine.

as soon as one e-text has been completely received,
the next one is requested, thus the downloading is
_relentless_, and you'd be surprised how fast it goes.

for a d.s.l. person like myself, after doing the deletions
i mentioned above, it will merely take _a_few_days_ to
download all the e-texts.   to get the _whole_ library,
it might take you a week or so.   but remember, during
this whole time, you will not have to do a single thing.
all you had to do was click that one button.

plus, you do have to enter a code every 108 minutes,
but it's just this sequence of 6 numbers, no big deal.      ;+)


>    I also don't agree with the implied assertion here 
>    that having the full (or even "vast portions of the") 
>    library means that users don't want help identifying 
>    and locating content within that collection.

it was only because i knew some might _infer_ such an
"assertion" that i closed my post with the explicit note
that this later purpose _is_ still "handy", and therefore
should be the _focus_ of this task.   did you read that?


>   I generally avoid topics once you start weighing in on them, 
>    so I may have missed the applicable portions from the last time 
>    this topic came up.

well that's a remarkable admission.

since i "weigh in" on every topic that is _interesting_
and usually "start" doing so fairly early in the thread,
that must mean you're "avoiding" most of the posts,
and all the interesting threads.   life must be sad.         :+)

at any rate, i thank you for your candor.

perhaps you will thank me for mine when i tell you that
if you didn't read what i have written on this topic before,
you're likely to take a path that will end up biting your ass.

***

anyway, as i read your proposal, it's a social tagging scheme.
as a general approach, that would be one way of doing things.
again, the specifics are vital, so let us know when you have 'em.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060714/bae55553/attachment.html
From ajhaines at shaw.ca  Fri Jul 14 10:38:46 2006
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Fri Jul 14 10:41:42 2006
Subject: [gutvol-d] Page numbers in text e-books
Message-ID: <000a01c6a76c$5bd99310$6401a8c0@ahainesp2400>

I'm looking for examples of how page numbers are handled/formatted 
throughout the main portion of a text e-book (that material between its 
Table of Contents and its Index).  Can someone point me to a few examples?

I've tried looking myself, but finding a text e-book with page numbers 
(aside from those in its TOC and Index), in a collection of nearly 19,000 
e-books, is kind of needle-in-haystackish <g>.

Thanks,
Al 


From Bowerbird at aol.com  Fri Jul 14 10:41:56 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jul 14 10:42:05 2006
Subject: [gutvol-d] Categorizing PG content
Message-ID: <490.5f0e857.31e93164@aol.com>

kevin said:
>    Instead of embedding it into the e-book, 
>    I think it would work better as a seperate file.

if i had to choose between the two, i'd agree with you.
but there's no reason we can't do it both ways.


>    If you embed it into the ebooks,? you will need to 
>    put it in all the versions (html, text, pdf, tei, etc..), 
>    and keep ALL of them up-to-date.

you put it in the master version (z.m.l.) and 
then re-propagate the auxiliary versions...


>    Also, if you want to make it "user" editable, 
>    however you want to define "user", it would be 
>    better as seperate file, so that the original files 
>    don't constantly get flagged as modified.

social tagging is an ongoing process, so yes,
it doesn't make sense to put that into the file,
because your files will be constantly changing.

you could roll social tags into your documents on
a regular basis, however, and that might be useful.

(and every e-text should have a changelog anyway.
until you install that, you'll never have a good handle
on controlling the contents of your library.   never.)

but until we see a social tagging system that really
works for our purposes, this planning is premature.


>    make it easy to join the meta-files into a single file
>    (cat *.meta > all.meta would be ideal)

yes, of course.   indeed the single-file version should
be the one that is public-facing, for easy download.
we can give 'em a tool that splits it on their machine.


>    The format could be text, or XML, or even tei. If you use an
>    XML based version, a text version could be easily created.

at one time, i looked at the x.m.l. version of the catalog.
what a bloated crufty mess!    kevin, please demonstrate 
that there is some reality behind what you have said here
by showing us "the text version that could be easily created".

because in order to make any of these plans really _work_,
we will need a simple list of the e-texts.   i'd like to see one
with about 20,000 lines, each line looking something like this:
>    00011 -- alice's advertures in wonderland -- lewis carroll


>    Instead of just category, you could store all sorts of information
>    in the "meta" file. Authors name, copyright date(s), categories
>    (science finction, horticulture, cook-book), available formats 
>    (text, html, tei, pdf, etc.), language(s), links to web sites, 
>    link to author meta file, and any other information 
>    like you would like to find in a card catalog,

you'll find much of that data in the existing x.m.l. catalog.
so have at it.   show us what you can do with it.


>    Which one has the most correct information, 
>    the text version or the html one?

if such a difference comes into existence, you have a bigger problem,
which is that your workflow has some bug in it that needs to be fixed.


>    If you wanted to search for all polish math books, 
>    how would you write the query program so that 
>    you would get all of them, without duplicates because 
>    of the different formats, and without wasting a
>    lot of CPU cycles. Not all texts have a .txt version

good question.   got an answer?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060714/f8964029/attachment-0001.html
From joshua at hutchinson.net  Fri Jul 14 11:50:51 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Jul 14 11:50:53 2006
Subject: [gutvol-d] Page numbers in text e-books
Message-ID: <20060714185051.7EB83109ADE@ws6-4.us4.outblaze.com>

Here is one I posted yesterday.

http://www.gutenberg.org/etext/18827

The HTML and PDF (well, and the TEI master) versions have original page numbers, the plain text does not.

Josh

> ----- Original Message -----
> From: "Al Haines (shaw)" <ajhaines@shaw.ca>
> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
> Subject: [gutvol-d] Page numbers in text e-books
> Date: Fri, 14 Jul 2006 10:38:46 -0700
> 
> 
> I'm looking for examples of how page numbers are handled/formatted throughout 
> the main portion of a text e-book (that material between its Table of Contents 
> and its Index).  Can someone point me to a few examples?
> 
> I've tried looking myself, but finding a text e-book with page numbers (aside 
> from those in its TOC and Index), in a collection of nearly 19,000 e-books, is 
> kind of needle-in-haystackish <g>.
> 
> Thanks,
> Al _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

>

From Bowerbird at aol.com  Fri Jul 14 12:15:20 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jul 14 12:15:32 2006
Subject: [gutvol-d] Page numbers in text e-books
Message-ID: <bd7.112b15.31e94748@aol.com>

al said:
>    I'm looking for examples of how page numbers are handled/
>    formatted throughout the main portion of a text e-book 
>    (that material between its Table of Contents and its Index).? 
>    Can someone point me to a few examples?

i don't know of any _text_ examples in the p.g. library.

but here's a demo of one using my zen markup language:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myant.zml

this example was created for the purpose of coordinating
the _scans_ for the pages with the text, so it's a little more
broad that just the incorporation of the page-numbers...

to see how these individual pages are presented
to people for convenient viewing on the web, go to:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp003.html
(the number in the u.r.l. indicates the page-number,
so you can quickly and easily navigate to any page.)

i can't say for sure that this is the _final_ version of
how page-oriented markers will be formatted, but
the final version won't be much different from this.

for instance, here's the break between pages 11 and 12:

>    she talked. "But first you come down to the
>    kitchen with me, and have a nice warm bath
>    
>     [[9]]
>     {{myantp010.png}} || My Antonia ||
>    
>    behind the stove. Bring your things; there's
>    nobody about.

the page-number of a page is put underneath the
text for the page, surrounded by double-brackets.
(this is irrespective of where it was in the p-book.)

and right underneath that is the name of the scan
for the next page (in this case, p. 10), surrounded
by double-curly-brackets, and the running-head
for that page is also included on that same line...
(the or-bars indicate left/center/right justification.)

in this particular example, there is one blank line
above the double-bracketed page-number and
one below the double-curly-bracket scan-name.
this indicates that the paragraph is continued...

in the case where a _new_ paragraph starts at
the top of a page, there will be _two_ blank lines
above the page-number on the previous page,
as well as _two_ blank lines after the scan-name.

this is because each page is an entity unto itself.
thus, the preceding page needs to know that its
bottom line is the concluding line of a paragraph
-- because such lines are not to be justified --
and the first line on the following page needs to
know that it's the first line of a paragraph, so that
it's indented if the user specified such indentation.

you can see a case where a new paragraph starts
on a page by searching for "{{myantp004.png}}".

this situation of new-versus-continued paragraphs
is one that even abbyy hasn't quite perfected yet, so
it's not at all uncommon to find errors in this regard.
and sometimes the decision isn't all that easy to make.

for example, look at this scan:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp014.png
at the page bottom, the last line, is that the end of a paragraph?

now look at this one and answer that same question:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp040.png

and this one:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp074.png

and this one:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp093.png

and this one:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp113.png

and this one:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp123.png

and this one:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp137.png

and this one:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp268.png

and this one:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp407.png

(to discover the answers, just add 1 to the number in each u.r.l.,
which will take you to the next page, where an indentation will
indicate that a new paragraph has started, which means that
the last line on the previous page was the end of that paragraph.
for those who are too lazy to do this, the answers are in the p.s.)

and there is another variant on this, the page on which
a new _chapter_ starts.   search for "{{myantp009.png}}"
and you'll find an example of this.   whenever this occurs,
you'll see there are 3 blank lines above the page-number on 
the preceding page, and 4 blank lines below the scan-name
(and thus above the title of the chapter, as per p.g. standard).

pages that start a new _section_ have 3 blank lines above
the page-number on the preceding page, and 7 blank lines
below the scan-name (and above the title of the section).

and yes, of course, the program that presents this z.m.l. file
knows how to collapse all of that page-number/scan-name
information appropriately, so the person reading the e-text
doesn't have to deal with all of that disorienting clutter.

the reader gets nicely formatted text -- indented paragraphs
if they want, section and chapter headings that are big and bold,
if they want, and page-numbers corresponding to the original
p-book source, if they want -- but still the task of "authoring"
the e-text (doing the zen markup, if you will) is very elementary.
even a fourth-grader could do it.

-bowerbird

p.s.   for the answers to the questions posed above, scroll down...


respectively, the answers are no/no/no/no/no/yes/yes/yes/yes.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060714/f2fc6bc0/attachment.html
From Bowerbird at aol.com  Fri Jul 14 15:21:18 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jul 14 15:21:39 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
Message-ID: <386.82c81af.31e972de@aol.com>

appended is a "first pass" at a p.g. categorization scheme.

unlike the "tagging" models, i'm using here a "folder" model.

inside each folder will be a bunch of alias "files" that point to
the e-texts proper.   this allows you to put one e-text into
multiple folders, by simply generating another alias file.
the possibility of nesting folders is also present.

as i've said, i think the cataloging tool needs to give people
an ability to rule out unwanted classes of e-texts, including:

first cut -- by language
second cut -- human genome
third cut -- audio
fourth cut -- video
fifth cut -- magazines
sixth cut -- copyrighted
seventh cut -- reference
eighth cut -- religious
ninth cut -- poetry
tenth cut -- plays
eleventh cut -- short story collections
twelfth cut -- anthologies

and once a person has ruled out whatever they don't want,
the downloading of the other e-texts can go automatically.

-bowerbird

p.s.   

books

top-100 titles

selected top-100 titles in html

fiction

nonfiction

reference
   dictionary
   encyclopedia
   thesaurus
   quotations

poetry

plays

short story collections

anthologies

religious

heavily illustrated
   beatrix potter

children's
   beatrix potter

cookbooks

how-to guides

magazines
   punch/punchinello
   scientific american

items of interest that are not text and not html

audio
   music
   human-read books
   text-to-speech books

video

copyrighted work

human genome project

languages other than english
   french
   german
   italian
   ...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060714/519f2296/attachment.html
From brad at chenla.org  Fri Jul 14 19:07:17 2006
From: brad at chenla.org (Brad Collins)
Date: Fri Jul 14 19:04:37 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <1681483192.20060713162854@noring.name> (Jon Noring's message
	of "Thu, 13 Jul 2006 16:28:54 -0600")
References: <57b.fcf4c8.31e81845@aol.com>
	<1681483192.20060713162854@noring.name>
Message-ID: <m3mzbbsjsa.fsf@chenla.org>

Jon Noring <jon@noring.name> writes:

> In plain text documents, some sort of machine recognizable "plain
> text" syntax has to be developed -- it'd be quite simple,
> actually. I think those who advocate plain text should develop a
> "plain text" metadata system (such as one based on Dublin Core) to
> insert somewhere in the file.

I would suggest using YAML -- there are a number of applications for
processing it, and it can be mapped to dublin core elements easily.

The following is a complete YAML Dublin Core document:

---
- title: <scalar>
- creator: <scalar>
- subject: <scalar>
- description: <scalar>
- publisher: <scalar>
- contributor: <scalar>
- date: <iso 8601>
- type: <uri>
- format: <mime-type> 
- identifier: <scalar>
- source: <scalar>
- language: <rfc-3066_iso639>
- relation: <scalar>
- coverage: <scalar>
- rights: <scalar>


This can easily be parsed, it's human readable and maps well to
html/xml structures.

b/

-- 
Brad Collins <brad@chenla.org>, Banqwao, Thailand
From klofstrom at gmail.com  Fri Jul 14 19:24:39 2006
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri Jul 14 19:24:41 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <m3mzbbsjsa.fsf@chenla.org>
References: <57b.fcf4c8.31e81845@aol.com>
	<1681483192.20060713162854@noring.name> <m3mzbbsjsa.fsf@chenla.org>
Message-ID: <1e8e65080607141924j38a25b8boe6a43c58f8fe6b06@mail.gmail.com>

On 7/14/06, Brad Collins <brad@chenla.org> wrote:

> I would suggest using YAML -- there are a number of applications for
> processing it, and it can be mapped to dublin core elements easily.

To my untutored eye, it looks good. I wonder if we could make a start
at adding that info to all PG texts and ALSO develop an extra-textual
cataloguing system that might contain more detail.

My dream library would also contain info on how often a text had been
downloaded and a rating/recommendation system like the various book
and movie rating systems out there. You know, "Readers who liked
"Campfire Girls Go Bananas" also liked "Campfire Girls Make Whoopee".

-- 
Karen Lofstrom
From brad at chenla.org  Fri Jul 14 21:29:48 2006
From: brad at chenla.org (Brad Collins)
Date: Fri Jul 14 21:27:05 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
In-Reply-To: <386.82c81af.31e972de@aol.com> (Bowerbird@aol.com's message of
	"Fri, 14 Jul 2006 18:21:18 EDT")
References: <386.82c81af.31e972de@aol.com>
Message-ID: <m3irlzsd6r.fsf@chenla.org>

Bowerbird@aol.com writes:

> as i've said, i think the cataloging tool needs to give people
> an ability to rule out unwanted classes of e-texts, including:
>
> first cut -- by language
> second cut -- human genome
> third cut -- audio
> fourth cut -- video
> fifth cut -- magazines
> sixth cut -- copyrighted
> seventh cut -- reference
> eighth cut -- religious
> ninth cut -- poetry
> tenth cut -- plays
> eleventh cut -- short story collections
> twelfth cut -- anthologies

Functionally this is no different from using Borges' fictional Chinese
encyclopedia for dividing different kinds of animals.

first cut -- belonging to the Emperor
second cut --  embalmed
third cut --  tame
fourth cut -- sucking pigs
fifth cut -- sirens
sixth cut -- fabulous
seventh cut -- stray dogs
eigth cut -- included in the present classification
ninth cut -- frenzied
tenth cut --  innumerable
eleventh cut -- drawn with a very fine camelhair brush
twelfth cut -- et cetera
thirteenth cut -- having just broken the water pitcher
fifteenth cut -- that from a long way off look like flies

Your folders are just as semantically flat as tags.

You're also mixing different classes of metadata. 

  language : English, French, Finnish etc. 
  format   : audio, video, html, plain_text
  form     : prose, poetry, drama, anthology, serial, reference
  nature   : fiction, biography, religious, textbook
  licence  : pg_licence, cc_licence, restricted, gpl etc.

Though personally, I would love to be able to rule out all ebooks
"having just broken the water pitcher".

b/

-- 
Brad Collins <brad@chenla.org>, Banqwao, Thailand
From Bowerbird at aol.com  Fri Jul 14 22:59:43 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jul 14 22:59:50 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
Message-ID: <2e3.9e537dd.31e9de4f@aol.com>

brad said:
>    Functionally this is no different from using 
>    Borges' fictional Chinese encyclopedia for 
>    dividing different kinds of animals.
>
>    first cut -- belonging to the Emperor
>   second cut --? embalmed
>   third cut --? tame
>   fourth cut -- sucking pigs
>   fifth cut -- sirens
>   sixth cut -- fabulous
>   seventh cut -- stray dogs
>   eigth cut -- included in the present classification
>   ninth cut -- frenzied
>   tenth cut --? innumerable
>   eleventh cut -- drawn with a very fine camelhair brush
>   twelfth cut -- et cetera
>   thirteenth cut -- having just broken the water pitcher
>   fifteenth cut -- that from a long way off look like flies

ok, brad, you create that categorization scheme,
and i'll continue with the work on creating mine
(because i'm not looking for help from anyone),
and we'll see which one appeals to users more...        :+)


>    Your folders are just as semantically flat as tags.

and your "semantically rich" system is a pipedream.       :+)


>    You're also mixing different classes of metadata.

which just goes to show how pointless "metadata" is.        :+)

dublin core.   yeah, right...

-bowerbird

p.s.   and if you want the first pass at collaborative filtering,
just scrape amazon screens for their recommendations...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060715/32f2ce97/attachment-0001.html
From cannona at fireantproductions.com  Sat Jul 15 06:47:26 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sat Jul 15 06:48:57 2006
Subject: [gutvol-d] final DVD up on the torrent tracker, with fixes
Message-ID: <000401c6a815$5bad4720$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello all.  As you know, we announced a DVD a few days ago and said that it
was ready.  Well, turns out that it was missing several hundred files.  This
has been fixed,and you can now download the true final version from
http://snowy.arsc.alaska.edu:6969 .

Thanks Greg for getting this done, and sorry for any inconvenience that this
might have caused anyone.

Sincerely
Aaron Cannon


 --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
 address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFEuPJLI7J99hVZuJcRAqinAKCBTSpqojhT9Vq+mRM2cOGiXqvUAACg+v0X
y/7To5652Hj09weFlCpwxlA=
=8F32
-----END PGP SIGNATURE-----

From brad at chenla.org  Sat Jul 15 07:20:39 2006
From: brad at chenla.org (Brad Collins)
Date: Sat Jul 15 07:17:53 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
In-Reply-To: <2e3.9e537dd.31e9de4f@aol.com> (Bowerbird@aol.com's message of
	"Sat, 15 Jul 2006 01:59:43 EDT")
References: <2e3.9e537dd.31e9de4f@aol.com>
Message-ID: <m3ejwnrlu0.fsf@chenla.org>

Bowerbird@aol.com writes:

> which just goes to show how pointless "metadata" is.        :+)
>
> dublin core.   yeah, right...

Okay, perhaps I was a tad harsh.  But at the same time you are missing
the two points I was trying to make.

  a) It's not trivial to create a taxonomy because there are so many
     different ways that people organize things.

  b) Metadata is simply breaking down information that describes
     something into well defined key/value pairs which have some 
     commonality

When the Internet came along a lot of people (including Yahoo)
thought, ah, this ain't so tough, we don't need no stink'n
librarians.  By and large, those systems suck.

Librarians think in long time frames, so often they are a bit behind
what is happening on the edge.  But that doesn't mean that the
centuries of knowledge and experience they have accumulated it
worthless.

For stuff that has been created in the last five minutes or even
fifteen months, tags are a fantastic means of categorizing content.

But for anything that has survived longer than that and should be
preserved, a solid cataloging regime should be used, supervised by
folks who know what they are doing.

Even for material that has already been formally cataloged, adding
tags will still be a useful means of providing immediate context which
a formal catalog can't provide.

But I'm sorry.  A Zen ML approach to cataloging?  That dog don't hunt.

b/

-- 
Brad Collins <brad@chenla.org>, Banqwao, Thailand
From Bowerbird at aol.com  Sat Jul 15 10:58:17 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Jul 15 10:58:27 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
Message-ID: <553.3224165.31ea86b9@aol.com>

brad said:
>    For stuff that has been created in the last five minutes or even
>    fifteen months, tags are a fantastic means of categorizing content.

perhaps we should define our terms.
do you think what i'm using is "tags"?


>    But for anything that has survived longer than that and 
>    should be preserved, a solid cataloging regime should be used, 
>    supervised by folks who know what they are doing.

that's an interesting postulate.   at least until we ask about
where this "solid cataloging regime" is, and who among us
is the "folks who know what they are doing" who should be
"supervising"... i dunno, i guess the rest of us, presumably.

you seem to be laboring under the impression that there are
a fleet of highly-trained employees waiting for your leadership,
and will jump to the task as soon as they receive instructions...

i'm laying out a system that _i_ can create, all by _myself_,
if necessary, which can be deployed easily, using software
that i will write myself, all by myself, which i am _certain_
will have some usefulness to some end-users out there,
without imposing any requirements on p.g. as a whole,
and thus is totally "non-exclusive", which means that you
are free to do the same thing, and our two methodologies
can compete on the level playing-field of real-life users...

so, have at it, my friend, have at it...               :+)


>    But I'm sorry.? A Zen ML approach to cataloging?? 
>    That dog don't hunt.

then i shouldn't be able to come home with any birds.

right?

so let us see who can actually feed the end-users and
who is left standing at the chalkboard in front of an
empty classroom while they go hungry, shall we?      ;+)

the proof will be in the usage.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060715/265e7f03/attachment.html
From joey at joeysmith.com  Sun Jul 16 00:46:35 2006
From: joey at joeysmith.com (joey)
Date: Sun Jul 16 00:49:26 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
In-Reply-To: <553.3224165.31ea86b9@aol.com>
References: <553.3224165.31ea86b9@aol.com>
Message-ID: <20060716074635.GN20863@joeysmith.com>

On Sat, Jul 15, 2006 at 01:58:17PM -0400, Bowerbird@aol.com wrote:
> perhaps we should define our terms.
> do you think what i'm using is "tags"?

I see no distinction between your model and mine, other than what
they're called.

From joey at joeysmith.com  Sun Jul 16 05:38:35 2006
From: joey at joeysmith.com (joey)
Date: Sun Jul 16 05:41:28 2006
Subject: [gutvol-d] Categorizing PG content
In-Reply-To: <1e8e65080607141924j38a25b8boe6a43c58f8fe6b06@mail.gmail.com>
References: <57b.fcf4c8.31e81845@aol.com>
	<1681483192.20060713162854@noring.name> <m3mzbbsjsa.fsf@chenla.org>
	<1e8e65080607141924j38a25b8boe6a43c58f8fe6b06@mail.gmail.com>
Message-ID: <20060716123835.GO20863@joeysmith.com>

On Fri, Jul 14, 2006 at 04:24:39PM -1000, Karen Lofstrom wrote:
> On 7/14/06, Brad Collins <brad@chenla.org> wrote:
> 
> To my untutored eye, it looks good. I wonder if we could make a start
> at adding that info to all PG texts and ALSO develop an extra-textual
> cataloguing system that might contain more detail.

	I don't think this is a bad idea at all.

> My dream library would also contain info on how often a text had been
> downloaded and a rating/recommendation system like the various book
> and movie rating systems out there. You know, "Readers who liked
> "Campfire Girls Go Bananas" also liked "Campfire Girls Make Whoopee".

	This is one of the reasons I wanted to go with a tagging/folksonomy
model. I've already begun reading some of the available research on how
to creative collaborative filtering engines using folksonomies as seeds.
From Bowerbird at aol.com  Sun Jul 16 10:35:35 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Jul 16 10:35:46 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
Message-ID: <2d6.2e0d6680.31ebd2e7@aol.com>

joey said:
>    I see no distinction between your model and mine, 
>    other than what they're called.

i can't really say that there _is_ a difference between them,
joey, not until your model is fleshed out with a real live app.

in my version, a book is represented by alias files that live
in various folders.   the names of those folders _could_be_
considered as "tags".   but that's not how "tagging systems"
are generally architected, not if i understand 'em correctly.

what i'm looking to create is a simple system that people can
understand implicitly, and operate easily on their machines...

how that system is labeled is nothing but a semantic matter,
just as long as everyone understands exactly how it works.

i'll give people a "starter-set" of folders, but after that, they
can develop things from there according to their own aims.

if they want a category called "phat books", they tell my app
to create a "phat books" folder, then they start checking off
the books that they want to have represented in that folder.

i see this as more disciplined and restrained than tagging,
where idiosyncratic tags are more-or-less routinely applied.
but, you know, i guess there's nothing to stop a person from
generating many folders with just one or two books in each.

at any rate, it is this _personalization_ of the categorization
that i see as the main difference between folders and tags.
tagging systems usually operate in a social network arena.

and this is perhaps an important distinction as well, in that
i see my system running as an app on a person's machine.
although i think people haven't been too clear on it thus far,
it seems that most of you see this operating on a webserver.
as an aside, y'all might want to look at ning.com for a means
by which you can easily create a social-networking web-app.

now, it may well be that the best start-set of folders is created
via a tagging system, perhaps one that is generated on a wiki.
as i said yesterday, though, i'm more interested in doing this
_by_myself_, because it seems too difficult to get any helpers,
and too unwieldy to build a system that captures all their help.

(it's far easier to just write the program for the end-user and
leverage the work that's already done in regard to cataloging.
for instance, one of the first cuts will be on the _language_,
and i can determine that by simply checking that in the file.)

but, speaking of a wiki, i think what you would get from that
would be more amenable to my "folder" structure than tags,
because each "page" on the wiki would represent a "folder".

at least that's how _i_ would organize the wiki.   for instance,
i'd have a "beatrice potter" wikipage that listed all her e-texts.
and i'd have an "esperanto" wikipage listing all those e-texts.

(of course, you could also organize the wiki with each page
representing an e-text, and then apply the tags on the page.
but i think you would find that approach to be cumbersome.
again, until i can see an actual working example on your end,
it's difficult for me to comment positively or negatively on it.)

but since i'm doing my thing by myself, the architecture of my
catalog depends on being able to collect almost all of the data
_programmatically_, via computerized analysis of the e-texts.

the other source of information i will use in generating the
starter-set of folders is the catalog-structure richard seltzer
has set up over at samizdat.com.   (and, if i could recover it,
i'd add the one that david moynihan had at blackmask.com.)

to sum up, i don't want to spend a lot of time generating
the initial catalog structure, and i don't want to spend a lot
of time assigning e-texts within that catalog structure.   ok?
my third concern is that people can modify to their desire.

there are other concerns, too, such as being able to capture
any additional information that people might contribute in
the long run while modifying their catalog (the arena where
tags really shine), but my 3 main concerns are the ones listed.

also, a main goal of the starter-set is to give end-users a way
to quickly eliminate the parts of the library they do not want
to have downloaded to their machine, and give the rest of it
a basic structure that can be navigated easily and intuitively,
and i think the "folder" model qualifies well in that regard...

so yes, you might be right that the same program that can
administer the folder-structure might be an equivalent of
one that can administer a tagging model.   or it might not.
i know what my app will look like.   i need to see the other.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060716/e1ef9d10/attachment.html
From marcello at perathoner.de  Sun Jul 16 16:59:56 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Jul 16 17:00:36 2006
Subject: [gutvol-d] Categorizing in Wiki
In-Reply-To: <20060716123835.GO20863@joeysmith.com>
References: <57b.fcf4c8.31e81845@aol.com>	<1681483192.20060713162854@noring.name>
	<m3mzbbsjsa.fsf@chenla.org>	<1e8e65080607141924j38a25b8boe6a43c58f8fe6b06@mail.gmail.com>
	<20060716123835.GO20863@joeysmith.com>
Message-ID: <44BAD2FC.7020303@perathoner.de>

Before everybody goes all warm and fuzzy about his/her pet
categorization scheme, let me remind you that the discussion started
about how to use the wiki for categorizing.

A wiki has no built-in authority control. If we want to end up with
useful categories we need to develop a restricted vocabulary. The good
news is: if we use one page per category, the vocabulary will build
itself. Pages can easily be split or merged whenever the vocabulary changes.

Also, it is very easy to harvest sites that already have categorized pg
books and convert their data into a wiki list.


The easiest way to start is to:

 1. Create an account
 2. Create a page containing a list of books
 3. Add the page to the "Bookshelf" category

like here:

  http://www.gutenberg.org/wiki/Detective_Fiction


Remember that this is a wiki. Don't expect things you edit to stay
edited. If you want to express a personal opinion use a subpage of your
user page like this:

  http://www.gutenberg.org/wiki/User:Marcello/Marcello's_Tops_and_Flops


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joey at joeysmith.com  Sun Jul 16 17:25:45 2006
From: joey at joeysmith.com (joey)
Date: Sun Jul 16 17:28:40 2006
Subject: First Pass at Tagging Site [was Re: [gutvol-d] Categorizing PG
	content]
In-Reply-To: <Pine.GSO.4.58.0607100048510.933@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0607100048510.933@vtn1.victoria.tc.ca>
Message-ID: <20060717002545.GC22029@joeysmith.com>

I've got a first prototype up. Keep in mind I've only spent about half an hour
on this so far, but I wanted to get something out so that it's not pure vapour.

I don't have a lot of bandwidth on this host, so please don't do anything like
trying to index/crawl the site. The code and database are available for anyone
who'd like to see it.

http://www.joeysmith.com:8080/

Some things I should point out:

1) The database model isn't where I want it yet. It only supports the parent/child
	level of tag nesting.

2) I only seeded it with 1000 books to start.

3) I seeded the tag list with just some VERY basic tags initially.
	special/untagged - All books have this tag
	special/none     - This is a placeholder. It will probably not be in future
						releases.
	special/all		 - Also a placeholder.
	language/		 - This is a list of all the languages we have books for.

4) This is still in the beginning stages, so the add and remove tag links do
	not yet work.
From joey at joeysmith.com  Sun Jul 16 22:14:47 2006
From: joey at joeysmith.com (joey)
Date: Sun Jul 16 22:17:44 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
In-Reply-To: <2d6.2e0d6680.31ebd2e7@aol.com>
References: <2d6.2e0d6680.31ebd2e7@aol.com>
Message-ID: <20060717051447.GD22029@joeysmith.com>

On Sun, Jul 16, 2006 at 01:35:35PM -0400, Bowerbird@aol.com wrote:
> joey said:
> >    I see no distinction between your model and mine, 
> >    other than what they're called.
> 
> i can't really say that there _is_ a difference between them,
> joey, not until your model is fleshed out with a real live app.

Really, the only difference I can see between your model and mine
is that mine is a server-side, collaborative effort, while yours
builds destkop-oriented data islands. And I don't see why they
would need to be mutually exclusive, either. Perhaps you could
populate your client-side app with a list of "folders" and "alias
files" generated by polling my server-side app, and perhaps people
could publish their data island back to the world as a set of
tags.
From Bowerbird at aol.com  Mon Jul 17 10:50:21 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Jul 17 10:50:32 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
Message-ID: <2fc.28b29100.31ed27dd@aol.com>

joey said:
>   Really, the only difference I can see between your model and mine
>    is that mine is a server-side, collaborative effort, while yours
>    builds destkop-oriented data islands.

well, you've done some semantic loading there,
contrasting "collaborative" with "data-islands".

i could counter with "nonpersonalized" versus
"individualized", but i don't see much purpose.

ultimately, the catalog should work both online
and offline.   and either approach is capable of it.

the real difference is shown when you contrast
your system with the wiki system marcello made,
which operates in a fashion similar to my system.

in your system, tags are applied toward e-texts.
in marcello's system (or mine), the e-texts are
applied toward categories.   that's the difference.

again, whether we want to consider that to be
a "significant" difference doesn't really matter,
as long as we understand that _is_ the difference.

all this boils down to the catalogers:   would they
rather start with a _category_, and apply e-texts,
or would they rather start with an e-text and then
apply categories.   i tend to think it'd be the former.

(and certainly, from my standpoint of programmatic
cataloging, rather than human-enacted, the former
approach is far more easy for me to write code for.)

there are still some things you haven't shown us 
with your example.   the first is how nimble it is,
once you have scaled it up to the 20,000 e-texts.

so i multiplied your 1000-set by 20 times.   it's here:
>    http://snowy.arsc.alaska.edu/bowerbird/misc/joeytags.html
at 3.1 megs, it'll be a bit of a pain for dial-up users.

the second is the back-end work that you will do
to assign tags automatically.   (for instance, although
i recognize you were just using it for your example,
language tags are unnecessary, because that info
is already available in the current catalog, and just
needs to be converted.)

the third is how you're gonna let your catalogers
add new tags to your standard set.

the fourth -- a big one -- is how the catalog will be
made into a public-facing entity end-users navigate.

but none of these are especially _difficult_ challenges,
so i'm sure you can manage them, and i look forward
to experimenting with your system when it's finished.


>   And I don't see why they would need to be 
>    mutually exclusive, either.

sure, the data can be munged to work either way;
as usual, it is just a question of _doing_ that work.

and giving cataloger volunteers a choice of models
would seem to me to be the best way to proceed...

but my mission is not to work with volunteers at all,
but rather to build a system myself and then put it
directly into the hands of end-users.   that's my focus.

my gut feeling is that a wiki-style approach will be
more successful at attracting volunteers than your
tag model, but i wouldn't be surprised if i'm wrong.

marcello's wiki has an inordinate level of difficulty;
i'd allow people to simply enter the e-text number,
and then automatically generate all the other info.
having them enter that info (and then formatting 
all the links in wiki-markup) is a recipe for errors,
plus it raises the cost of contributing to the point
where i think you would have very few volunteers...


>   Perhaps you could populate your client-side app 
>    with a list of "folders" and "alias files" generated by 
>    polling my server-side app, and perhaps people could 
>    publish their data island back to the world as a set of tags.

you are welcome to collect data from my model if you want.

and just as i'll be getting info from samizdat and blackmask,
i'll also collect data from your server-side approach if i can...

except i expect to be finished with my effort (limited as it is)
before you even get fully started with your (unlimited) one...
 
-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060717/0581a771/attachment.html
From Bowerbird at aol.com  Wed Jul 19 00:44:30 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Jul 19 00:44:40 2006
Subject: [gutvol-d] 20, 000 e-texts versus 300 billion dollars (a rather
	remarkable ant)
Message-ID: <48a.62de9c2.31ef3cde@aol.com>


juxtaposition with p.g. hitting 20,000 e-texts...

the cost of the war will soon hit $300 _billion_.

(very soon.   it just crossed the $297 billion mark,
and a billion dollars just ain't what it used to be.)

the power to create, the power to destroy.
the human is a rather remarkable ant, eh?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060719/66ab3e3b/attachment.html
From joey at joeysmith.com  Thu Jul 20 18:19:32 2006
From: joey at joeysmith.com (joey)
Date: Thu Jul 20 18:23:09 2006
Subject: [gutvol-d] first pass at a p.g. categorization scheme
In-Reply-To: <2fc.28b29100.31ed27dd@aol.com>
References: <2fc.28b29100.31ed27dd@aol.com>
Message-ID: <20060721011932.GD7576@joeysmith.com>

Once again, I've found that my interest in PG cannot overcome my
dislike for certain people who've chosen to involve themselves.

I tried to convince myself I could get past it this time, and even
got so far as to write some code, but my anger and apathy have won
out again. If there's anyone who's interested in the Turbogears
project I put up the other day, let me know, but I doubt I'll put
any more effort into it at this point.
From Bowerbird at aol.com  Sat Jul 22 14:19:02 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Jul 22 14:19:13 2006
Subject: [gutvol-d] 1:5 scale
Message-ID: <2d3.bb42e4c.31f3f046@aol.com>

i'm doing a 1:5 scale reworking of the p.g. library.

i have selected e-texts 10000-14000 to work on.

one early step is a massive clean-up of the catalog.

this might involve serious breakage of compatibility
with the existing library.   (in other words, you won't
necessarily be able to easily import my corrections.)

if there is anyone out there who would like me to
maintain a limited compatibility, so as to dovetail
with their work, please do inform me immediately,
and we can have a discussion about cooperation...

thank you.

-bowerbird

p.s.   if anyone would like to design the user interface
for a program for end-users to access this reworking,
feel free to share that too, frontchannel or backchannel.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060722/67c2a051/attachment.html
From Bowerbird at aol.com  Sat Jul 22 16:27:12 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Jul 22 16:27:22 2006
Subject: [gutvol-d] re: 1:5 scale
Message-ID: <bbc.1aefc07.31f40e50@aol.com>

speaking of interfaces, wow,
a whole shitload of librarians
just got their asses kicked bad...

>    http://www.amitgupta.info/E41ST/

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060722/c6f184bc/attachment.html
From hyphen at hyphenologist.co.uk  Sat Jul 22 21:43:30 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Jul 22 21:43:43 2006
Subject: [gutvol-d] 1:5 scale
In-Reply-To: <2d3.bb42e4c.31f3f046@aol.com>
References: <2d3.bb42e4c.31f3f046@aol.com>
Message-ID: <bfv5c2hettk245ltovbinb1kcuquosp5hi@4ax.com>

On Sat, 22 Jul 2006 17:19:02 EDT,  Bowerbird@aol.com wrote:

|i'm doing a 1:5 scale reworking of the p.g. library.

What on earth is one fifth of a library?

-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From Bowerbird at aol.com  Sun Jul 23 14:08:24 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Jul 23 14:08:34 2006
Subject: [gutvol-d] 1:5 scale
Message-ID: <4a7.589392f.31f53f48@aol.com>

dave said:
>    What on earth is one fifth of a library?

first and foremost, it means porta-potties instead of rest-rooms.

other than that, it means we have all the books, but just up
through page 18 with each one, because statistics tell us that
that's only how far most people read in a book that they buy...

>   Most readers do not get past page 18 in a book they have purchased.
>   http://www.parapublishing.com/sites/para/resources/statistics.cfm

many of you might know dan poynter, who collected all these
statistics from various places, as the guru of self-publishing...

michael, once you get past the frustration of poynter's statistics,
i'm positive you will find his web-page to be utterly fascinating...

here are a few samples, on publishers:
>   18. On average, they pay $465.17 for a simple cover design 
>    to as much as $3,533.26 for a complex cover design. 
>    Typical cover costs range $450 to $3,000.
...
>    24. An average of 10 to 15 hours are spent designing a book cover.
>    25. On average, 61 hours are spent in the editing process.
>    26. On average, 29 hours are spent producing a news release for a new 
book.

really, michael, i look forward to a ton of new posts from you
based on the motivation you'll get from all of these statistics...        :+)

-bowerbird

p.s.   dave, actually it just means that i started out with 4,000
out of the (roughly) 20,000 e-texts in the p.g. library, so 1/5.
for some reason (i'm not sure why), model-builders have always
been fond of the 1/5 scale.   it's big enough to be "realistic", yet
still small enough to be a "model".   it makes things very cute, too.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060723/339b1484/attachment.html
From kth at srv.net  Mon Jul 24 09:24:23 2006
From: kth at srv.net (Kevin Handy)
Date: Mon Jul 24 09:09:30 2006
Subject: [gutvol-d] 1:5 scale
In-Reply-To: <bfv5c2hettk245ltovbinb1kcuquosp5hi@4ax.com>
References: <2d3.bb42e4c.31f3f046@aol.com>
	<bfv5c2hettk245ltovbinb1kcuquosp5hi@4ax.com>
Message-ID: <44C4F437.8060404@srv.net>

Dave Fawthrop wrote:

>On Sat, 22 Jul 2006 17:19:02 EDT,  Bowerbird@aol.com wrote:
>
>|i'm doing a 1:5 scale reworking of the p.g. library.
>
>What on earth is one fifth of a library?
>
>  
>
Using a smaller font to save disk space?  ;)

From marcello at perathoner.de  Mon Jul 24 09:31:20 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Jul 24 09:31:24 2006
Subject: [gutvol-d] 1:5 scale
In-Reply-To: <44C4F437.8060404@srv.net>
References: <2d3.bb42e4c.31f3f046@aol.com>	<bfv5c2hettk245ltovbinb1kcuquosp5hi@4ax.com>
	<44C4F437.8060404@srv.net>
Message-ID: <44C4F5D8.7080409@perathoner.de>

Kevin Handy wrote:

>> What on earth is one fifth of a library?
>
> Using a smaller font to save disk space?  ;)

Some people just don't play with a full deck.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Mon Jul 24 11:44:42 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Jul 24 11:44:50 2006
Subject: [gutvol-d] re: 1:5 scale
Message-ID: <56b.264f307.31f66f1a@aol.com>

well, obviously, in prototyping a system,
you use just part of the data, not all of it.

especially on something like this.         :+)

i subset the content so i can _examine_ it,
molding it into shape manually if necessary.

in the process of that, i gain an understanding
of what needs to be done, so i can program it,
and i develop the first pass at those routines...

for instance, as i said, one of the first tasks is
whipping the catalog into the shape i want it...

that job has already taken me a number of hours,
and it's not done yet.   the catalog was quite a mess
-- and hey, it's just titles and author-names! -- so
all told, it'll probably take me some 20 hours, and
maybe 30, just for this 1:5 subset... 

you can see the current state of my clean-up work here:
>    http://snowy.arsc.alaska.edu/bowerbird/mcl/-catalog

there's still work that needs to be done on subtitles,
and on the "mirror" titles (which were a total disaster),
but other than that, this data is now very consistent...

by the time i'm done with this, i'll have good routines
to clean it up automatically, to the extent it's possible.

so i expect the next 1/5 of the catalog to be cleaned
in half the time -- 10-15 hours.   during each phase,
i'll pick up more information on how to automate it.

so the next 1/5 of the catalog will take half the time,
about 6-8 hours.   and the next 1/5 will take half that,
about 3-4 hours.   and the last 1/5 will take 1-2 hours.

and by then, i'll have some very well-polished routines.
so if i decided to do the whole job over, from scratch,
for maximal consistency, it'd take about 4-10 hours...

this time-savings, via automation, is what you want.
that's why you just do a subset of the data in a model.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060724/22648fad/attachment.html
From jon at noring.name  Wed Jul 26 10:17:53 2006
From: jon at noring.name (Jon Noring)
Date: Wed Jul 26 10:24:19 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
Message-ID: <459964460.20060726111753@noring.name>

Everyone,

I just posted a TeleRead blog article, which in turn links to the blog
article posted by Catherine Hodge at DPP Store, about the World eBook
Fair (WeBF).

My blog article:

   http://www.teleread.org/blog/?p=5230

Catherine's blog article: 

   http://dppebookstore.blogspot.com/2006/07/world-ebook-fair-12-million-downloads.html


Both Catherine and I are perplexed by the lack of public discussion
about the WeBF on the various ebook-related forums such as this one.
What are your thoughts?

Jon Noring

From sly at victoria.tc.ca  Wed Jul 26 13:23:30 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Jul 26 13:23:34 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <459964460.20060726111753@noring.name>
References: <459964460.20060726111753@noring.name>
Message-ID: <Pine.GSO.4.58.0607261309220.14091@vtn1.victoria.tc.ca>


On Wed, 26 Jul 2006, Jon Noring wrote:

> Both Catherine and I are perplexed by the lack of public discussion
> about the WeBF on the various ebook-related forums such as this one.
> What are your thoughts?
>

Ok, since you ask, I'll share my viewpoint.

I think that most PG volunteers are aware that PG
texts are widely reused, reformatted, and re-presented,
on many websites (and sometimes in print, as well).
The WEBF can be seen as just one more of these
instances.

And yes, I know that it also contains many texts
from other collections. For years there has been
thousands of other texts online that cannot be
found in PG. Again, the WEBF is just one more of
these instances.

What I do give it credit for is good marketing.
It's like putting up a big sign saying:
"Free for a limited time only!" and giving
away material which you can find freely any
time you want. However, for some people, that
might be the best way to get their interest.

It certainly does fit into Michael Hart's vision
of giving away as many eBooks as possible to as
many people as possible in as many ways as possible.

Andrew
From joey at joeysmith.com  Wed Jul 26 23:00:11 2006
From: joey at joeysmith.com (joey)
Date: Wed Jul 26 23:04:50 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <459964460.20060726111753@noring.name>
References: <459964460.20060726111753@noring.name>
Message-ID: <20060727060011.GE7576@joeysmith.com>

For my part, I volunteered to help Greg with readingroo.ms administration
and then promptly forgot that you can't all see the graphs I can about the
amount of throughput that server generated. :)
From gbnewby at pglaf.org  Thu Jul 27 02:41:16 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Jul 27 02:41:18 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <20060727060011.GE7576@joeysmith.com>
References: <459964460.20060726111753@noring.name>
	<20060727060011.GE7576@joeysmith.com>
Message-ID: <20060727094116.GB2352@pglaf.org>

On Thu, Jul 27, 2006 at 12:00:11AM -0600, joey wrote:
> For my part, I volunteered to help Greg with readingroo.ms administration
> and then promptly forgot that you can't all see the graphs I can about the
> amount of throughput that server generated. :)

They're here (though maybe someday they should be
password-protected):

	http://ibis.riseup.net/munin/ms/readingroo.ms.html

We maxed at 100Mbps last week, and have been pushing
40-60Mbps daily, with peaks during the daytime in
Europe/Asia.
  -- Greg

From marcello at perathoner.de  Thu Jul 27 06:15:49 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Jul 27 06:15:53 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <459964460.20060726111753@noring.name>
References: <459964460.20060726111753@noring.name>
Message-ID: <44C8BC85.5060700@perathoner.de>

According to worldebookfair.com they serve 1 million ebooks / day.

gutenberg.org serves 60.000 ebooks / day.

According to alexa? worldebookfair.com gets less traffic than
gutenberg.org and still they manage to serve 16 times as many ebooks. I
wonder how they do that?

?) http://www.gutenberg.org/internal/stats/alexa
   user: internal
   pass: books


On the plus side gutenberg.org gets some traffic from
worldebookfair.com. This is where people came from in July:

Listing the top 30 referring sites by the number of requests, sorted by
the number of requests.

reqs	%reqs	site
236070	19.04%	http://www.google.com/
125094	10.09%	http://en.wikipedia.org/
107354	8.66%	http://worldebookfair.com/
57974	4.68%	http://search.yahoo.com/
31210	2.52%	http://www.google.co.uk/
25132	2.03%	http://www.promo.net/
18850	1.52%	http://www.google.co.in/
17347	1.40%	http://www.google.ca/
16011	1.29%	http://www.google.de/
15762	1.27%	http://www.stumbleupon.com/
13664	1.10%	http://profile.myspace.com/
13238	1.07%	http://www.google.com.au/
12807	1.03%	http://my.yahoo.com/
12650	1.02%	http://www.google.fr/
12649	1.02%	http://64.233.179.104/
11854	0.96%	http://www.digg.com/
11694	0.94%	http://digg.com/
9228	0.74%	http://www.google.com.ph/
8621	0.70%	http://search.msn.com/
7801	0.63%	http://www.ovelho.com/
7671	0.62%	http://www.worldebookfair.com/
6568	0.53%	http://66.249.93.104/
6487	0.52%	http://oldfashionededucation.com/
6475	0.52%	http://www.google.es/
6023	0.49%	http://www.google.it/
5894	0.48%	http://www.google.pl/
5854	0.47%	http://librivox.org/
5824	0.47%	http://www.google.com.br/
5751	0.46%	http://www.google.nl/
5106	0.41%	http://luminis1.wright.edu/
413228	33.33%	[not listed: 20,347 sites]

-- 
Marcello Perathoner
webmaster@gutenberg.org

From JBuck814366460 at aol.com  Thu Jul 27 07:21:17 2006
From: JBuck814366460 at aol.com (Jared Buck)
Date: Thu Jul 27 07:21:24 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <44C8BC85.5060700@perathoner.de>
References: <459964460.20060726111753@noring.name>
	<44C8BC85.5060700@perathoner.de>
Message-ID: <44C8CBDD.1040002@aol.com>

Marcello Perathoner wrote on 27/07/2006, 6:15 AM:

 > According to worldebookfair.com they serve 1 million ebooks / day.
 >
 > gutenberg.org serves 60.000 ebooks / day.
 >
 > According to alexa? worldebookfair.com gets less traffic than
 > gutenberg.org and still they manage to serve 16 times as many ebooks. I
 > wonder how they do that?
 >
 > ?) http://www.gutenberg.org/internal/stats/alexa
 >    user: internal
 >    pass: books
 >
 >
 > On the plus side gutenberg.org gets some traffic from
 > worldebookfair.com. This is where people came from in July:
 >
 > Listing the top 30 referring sites by the number of requests, sorted by
 > the number of requests.
 >
 > reqs    %reqs   site
 > 236070  19.04%  http://www.google.com/
 > 125094  10.09%  http://en.wikipedia.org/
 > 107354  8.66%   http://worldebookfair.com/
 > 57974   4.68%   http://search.yahoo.com/
 > 31210   2.52%   http://www.google.co.uk/
 > 25132   2.03%   http://www.promo.net/
 > 18850   1.52%   http://www.google.co.in/
 > 17347   1.40%   http://www.google.ca/
 > 16011   1.29%   http://www.google.de/
 > 15762   1.27%   http://www.stumbleupon.com/
 > 13664   1.10%   http://profile.myspace.com/
 > 13238   1.07%   http://www.google.com.au/
 > 12807   1.03%   http://my.yahoo.com/
 > 12650   1.02%   http://www.google.fr/
 > 12649   1.02%   http://64.233.179.104/
 > 11854   0.96%   http://www.digg.com/
 > 11694   0.94%   http://digg.com/
 > 9228    0.74%   http://www.google.com.ph/
 > 8621    0.70%   http://search.msn.com/
 > 7801    0.63%   http://www.ovelho.com/
 > 7671    0.62%   http://www.worldebookfair.com/
 > 6568    0.53%   http://66.249.93.104/
 > 6487    0.52%   http://oldfashionededucation.com/
 > 6475    0.52%   http://www.google.es/
 > 6023    0.49%   http://www.google.it/
 > 5894    0.48%   http://www.google.pl/
 > 5854    0.47%   http://librivox.org/
 > 5824    0.47%   http://www.google.com.br/
 > 5751    0.46%   http://www.google.nl/
 > 5106    0.41%   http://luminis1.wright.edu/
 > 413228  33.33%  [not listed: 20,347 sites]
 >
 > --
 > Marcello Perathoner
 > webmaster@gutenberg.org
 >
 > _______________________________________________
 > gutvol-d mailing list
 > gutvol-d@lists.pglaf.org
 > http://lists.pglaf.org/listinfo.cgi/gutvol-d
 >


I do say, that's good word of mouth to get over 100,000 requests from 
the book fair site.  Of course indeed, a lot of our traffic comes from 
wikipedia (which has a nice article on PG) and from people searching for 
books or for PG itself.

Any word of mouth is great, no matter where it comes from :)

Jared

-- 
            .

From rnmscott at netspace.net.au  Thu Jul 27 07:21:30 2006
From: rnmscott at netspace.net.au (rnmscott@netspace.net.au)
Date: Thu Jul 27 07:21:34 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <44C8BC85.5060700@perathoner.de>
References: <459964460.20060726111753@noring.name>
	<44C8BC85.5060700@perathoner.de>
Message-ID: <1154010090.44c8cbeae2248@webmail.netspace.net.au>

Has it been about the same (gutenberg downloads) while said fair has been on? 
 
Quoting Marcello Perathoner <marcello@perathoner.de>: 
 
> According to worldebookfair.com they serve 1 million ebooks / day. 
>  
> gutenberg.org serves 60.000 ebooks / day. 
>  
> According to alexa? worldebookfair.com gets less traffic than 
> gutenberg.org and still they manage to serve 16 times as many ebooks. I 
> wonder how they do that? 
>  
 

------------------------------------------------------------
This email was sent from Netspace Webmail: http://www.netspace.net.au

From marcello at perathoner.de  Thu Jul 27 08:30:25 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Jul 27 08:30:30 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <1154010090.44c8cbeae2248@webmail.netspace.net.au>
References: <459964460.20060726111753@noring.name>	<44C8BC85.5060700@perathoner.de>
	<1154010090.44c8cbeae2248@webmail.netspace.net.au>
Message-ID: <44C8DC11.2090803@perathoner.de>

rnmscott@netspace.net.au wrote:

> Has it been about the same (gutenberg downloads) while said fair has been on? 

I'd say we got about 20% more book downloads in the first two weeks ...
its back to normal now.

  http://www.gutenberg.org/browse/scores/books-downloaded.png


-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org  Thu Jul 27 11:11:11 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Jul 27 11:11:14 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <44C8BC85.5060700@perathoner.de>
References: <459964460.20060726111753@noring.name>
	<44C8BC85.5060700@perathoner.de>
Message-ID: <20060727181111.GA8585@pglaf.org>

On Thu, Jul 27, 2006 at 03:15:49PM +0200, Marcello Perathoner wrote:
> According to worldebookfair.com they serve 1 million ebooks / day.
> 
> gutenberg.org serves 60.000 ebooks / day.
> 
> According to alexa? worldebookfair.com gets less traffic than
> gutenberg.org and still they manage to serve 16 times as many ebooks. I
> wonder how they do that?

My first guess is that since Alexa is based on sampling, their
estimate is incorrect.  I've watched traffic from 
wef since it started, and we've been pushing anywhere from
20Mbps to as high as 100Mbps (with typical daily peaks
of 40-60Mbps).  That's a lot of data.  The last time
I heard, UNC (where iBiblio is based) has 600Mbps total
capacity, and about 1/3 of that (200Mbps) is allocated to
iBiblio, where gutenberg.org lives.  Those numbers might
have increased in the last few years, however.

On the other hand, maybe I'm counting wrong.  I'll be
looking at the 7GB access_log (currently) in detail once the WEF is
over, and maybe Marcello can help so we can compare apples
to apples.  I have tried to only include successful/completed
downloads, and also to only include eBooks (not stuff like
front page images and the catalog page), but the count
is based on a simple "grep" so could be off.

One other factoid: We are using iptables to limit the number
of simultaneous connections from a single IP address.  (This
might make for some unhappy proxy users, unfortunately.)

The download total as of right now is just over 19 million.

  -- Greg


> ?) http://www.gutenberg.org/internal/stats/alexa
>    user: internal
>    pass: books
> 
> 
> On the plus side gutenberg.org gets some traffic from
> worldebookfair.com. This is where people came from in July:
> 
> Listing the top 30 referring sites by the number of requests, sorted by
> the number of requests.
> 
> reqs	%reqs	site
> 236070	19.04%	http://www.google.com/
> 125094	10.09%	http://en.wikipedia.org/
> 107354	8.66%	http://worldebookfair.com/
> 57974	4.68%	http://search.yahoo.com/
> 31210	2.52%	http://www.google.co.uk/
> 25132	2.03%	http://www.promo.net/
> 18850	1.52%	http://www.google.co.in/
> 17347	1.40%	http://www.google.ca/
> 16011	1.29%	http://www.google.de/
> 15762	1.27%	http://www.stumbleupon.com/
> 13664	1.10%	http://profile.myspace.com/
> 13238	1.07%	http://www.google.com.au/
> 12807	1.03%	http://my.yahoo.com/
> 12650	1.02%	http://www.google.fr/
> 12649	1.02%	http://64.233.179.104/
> 11854	0.96%	http://www.digg.com/
> 11694	0.94%	http://digg.com/
> 9228	0.74%	http://www.google.com.ph/
> 8621	0.70%	http://search.msn.com/
> 7801	0.63%	http://www.ovelho.com/
> 7671	0.62%	http://www.worldebookfair.com/
> 6568	0.53%	http://66.249.93.104/
> 6487	0.52%	http://oldfashionededucation.com/
> 6475	0.52%	http://www.google.es/
> 6023	0.49%	http://www.google.it/
> 5894	0.48%	http://www.google.pl/
> 5854	0.47%	http://librivox.org/
> 5824	0.47%	http://www.google.com.br/
> 5751	0.46%	http://www.google.nl/
> 5106	0.41%	http://luminis1.wright.edu/
> 413228	33.33%	[not listed: 20,347 sites]
> 
> -- 
> Marcello Perathoner
> webmaster@gutenberg.org
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
From jon at noring.name  Thu Jul 27 11:43:46 2006
From: jon at noring.name (Jon Noring)
Date: Thu Jul 27 11:44:00 2006
Subject: [gutvol-d] World eBook Fair: 12 million downloads. Anyone notice?
In-Reply-To: <20060727181111.GA8585@pglaf.org>
References: <459964460.20060726111753@noring.name>
	<44C8BC85.5060700@perathoner.de> <20060727181111.GA8585@pglaf.org>
Message-ID: <475770793.20060727124346@noring.name>

Greg wrote:

> On the other hand, maybe I'm counting wrong.  I'll be
> looking at the 7GB access_log (currently) in detail once the WEF is
> over, and maybe Marcello can help so we can compare apples
> to apples.  I have tried to only include successful/completed
> downloads, and also to only include eBooks (not stuff like
> front page images and the catalog page), but the count
> is based on a simple "grep" so could be off.
>
> One other factoid: We are using iptables to limit the number
> of simultaneous connections from a single IP address.  (This
> might make for some unhappy proxy users, unfortunately.)
>
> The download total as of right now is just over 19 million.

A more telling statistic would be the number of unique downloaders
rather than books downloaded. I hypothesize that a sizable chunk of
the downloads for the WeBF are being done by a relatively small
number of people who are massively downloading the collection,
especially the non-PG stuff.

Jon Noring


From Bowerbird at aol.com  Thu Jul 27 12:24:46 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jul 27 12:24:58 2006
Subject: [gutvol-d] re: 19 million
Message-ID: <584.1b5aff3.31fa6cfe@aol.com>

greg said:
>   The download total as of right now is just over 19 million.

19 million downloads out of that measly p.r.?   i'm impressed!

i wouldn't have been surprised if nobody even heard about it.

these days, if you don't have a multi-million-dollar ad budget,
the big boys will usually drown you out with all their big noise.

but i guess that word "free" still hasn't lost its magic touch, eh?
people are sheep.

so good job on the hype, michael!   but now get back to work.        :+)

-bowerbird

p.s.   i think to become a discussion topic on one certain listserve
and another certain blog, you have to be promoting some vapor,
and make grandiose promises that e-books will soon cure cancer.
reality -- especially .pdf reality -- is too humdrum for some people.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060727/c5b4f73f/attachment.html