From phil at thalasson.com  Fri Jul  3 13:46:31 2009
From: phil at thalasson.com (Philip Baker)
Date: Fri, 3 Jul 2009 21:46:31 +0100
Subject: [gutvol-d] Re: Has anyone noticed that the gutindex files for this
	year haven't been updated since January?
In-Reply-To: <15cfa2a50906301405r449a85amf884edb102d77ad0@mail.gmail.com>
References: <dc3e58a00906261143t5ef4a9ddh84433343e0c7d782@mail.gmail.com>
	<Pine.GSO.4.58.0906262103200.9881@vtn1.victoria.tc.ca>
	<2A81022B-1955-4BB2-A3AE-E1E3062A900B@uni-trier.de>
	<dc3e58a00906271928y69d8db13k76572d9e8a3474ed@mail.gmail.com>
	<30447D202735492EBF4302C4D4F96955@alp2400>
	<Pine.GSO.4.58.0906272123440.25637@vtn1.victoria.tc.ca>
	<alpine.DEB.1.00.0906272107580.6821@snowy.arsc.alaska.edu>
	<dc3e58a00906272213q3222c1e4hf2b0d2727292e173@mail.gmail.com>
	<5E822D6C-65B2-11DE-BE50-000D93B743B8@thalasson.com>
	<15cfa2a50906301405r449a85amf884edb102d77ad0@mail.gmail.com>
Message-ID: <96BEBE7C-6812-11DE-85B8-000D93B743B8@thalasson.com>


On 30 Jun 2009, at 22:05, Robert Cicconetti wrote:

>
> On Tue, Jun 30, 2009 at 4:12 PM, Philip Baker <phil at thalasson.com> 
> wrote:
>  There are some gaps - the most recent emails, and a big hole - 27825 
> to 28013 where I do not have the posted emails. Where is the archive? 
> GUTINDEX.ALL goes up to 27930.
>
>
> The archive is at http://lists.pglaf.org/pipermail/posted/
>
> However, there is a gap due to technical problems earlier this year.
>
>  I have the posted emails from that period and can send them to you, 
> if you want.
>
> /me rummages through mail folders.
>

Thanks for the offer but don't do anything, at least for the moment. I 
am going to look at the possibility of using the RDF file for what I 
want.


Philip Baker


From Bowerbird at aol.com  Fri Jul  3 16:26:44 2009
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 3 Jul 2009 19:26:44 EDT
Subject: [gutvol-d] happy birthday, project gutenberg
Message-ID: <c69.4befaf0f.377fedb4@aol.com>

so let me be among the first to say
"happy b-day" to project gutenberg
on this sunny 4th-of-july weekend.

the longevity of this cyberlibrary is
due to the intelligent vision of its
founder michael hart who saw the
power of keep-it-simple philosophy
with insistence on plain text, baby...

so it is not without massive irony
that we mark a new development:
the w3c working group for xhtml2
will stop work at the end of 2009...
>    http://www.w3.org/News/2009#item119

instead, there'll be a new emphasis
on html5 -- as "the future of html".

in other words...

all those technocrats telling you that
"xml is the future" for the last decade
were just plain flat-out purely wrong.

good thing you didn't listen to 'em, eh?

thanks, michael, for your vision and
for the tenacity of your persistence...

-bowerbird



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090703/ea6f898c/attachment.html>

From cannona at fireantproductions.com  Thu Jul 16 17:06:52 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu, 16 Jul 2009 19:06:52 -0500
Subject: [gutvol-d] A new DVD?
Message-ID: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>

Hi all.

I'm sorry for disappearing for a while.  I've been dealing with some
health issues, work, and school, but mainly health.  Anyway, I think
it may be time to create a new DVD.  The latest one is 3 years old
this month.  However, the project has obviously grown drastically in
that time, so I'm wondering if one DVD still makes sense?

The drawbacks to creating a 2 DVD collection that immediately come to mind are:
1. In the DVD/CD mailing project, we have consistently been sending
two copies of the DVD, or a DVD and CD.  This will either have to
change, or we will have to pay more for postage.
2. It takes twice as long to download.  On the other hand, if you can
download 4GB, it's probably not that big of a stretch to download 8.
3. It's not as elegant as one DVD.  This is probably the least
important, but just thought I'd mention it as it might prove important
to some.

It is of course possible to stick with just one DVD, but it will
require leaving out a lot.

Any thoughts/ideas?

Aaron

From cannona at fireantproductions.com  Thu Jul 16 17:50:49 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu, 16 Jul 2009 19:50:49 -0500
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
Message-ID: <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>

Hi Michael.

You are probably right about the CDs.  When CDs are requested, we
always send two.

I don't think I know who Richard Seltzer is or what he does.

Like you, I have been watching dual layer DVDs, but the price is still
quite high.  Shop4tech.com has them for $2.50.  It doesn't make sense
considering that one single layer DVD costs just $0.26.

Thanks.

Aaron

On 7/16/09, Michael S. Hart <hart at pobox.com> wrote:
>
>
> First thoughts:
>
> I'm not sure there is any need to send CDs except when requested,
> and they will be, probably by those who need them most, but we are
> here and now living mostly in an age of DVDs, so send two of them.
>
> When they ask for CDs, send two of those. . .even if the same one.
>
> We can also refer people to Richard Seltzer.
>
> I should also mention dual layered DVDs, but every time I look the
> price still seems too high.
>
> More thoughts?
>
> MH
>
>
>
> On Thu, 16 Jul 2009, Aaron Cannon wrote:
>
>> Hi all.
>>
>> I'm sorry for disappearing for a while.  I've been dealing with some
>> health issues, work, and school, but mainly health.  Anyway, I think
>> it may be time to create a new DVD.  The latest one is 3 years old
>> this month.  However, the project has obviously grown drastically in
>> that time, so I'm wondering if one DVD still makes sense?
>>
>> The drawbacks to creating a 2 DVD collection that immediately come to mind
>> are:
>> 1. In the DVD/CD mailing project, we have consistently been sending
>> two copies of the DVD, or a DVD and CD.  This will either have to
>> change, or we will have to pay more for postage.
>> 2. It takes twice as long to download.  On the other hand, if you can
>> download 4GB, it's probably not that big of a stretch to download 8.
>> 3. It's not as elegant as one DVD.  This is probably the least
>> important, but just thought I'd mention it as it might prove important
>> to some.
>>
>> It is of course possible to stick with just one DVD, but it will
>> require leaving out a lot.
>>
>> Any thoughts/ideas?
>>
>> Aaron
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d at lists.pglaf.org
>> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From prosfilaes at gmail.com  Thu Jul 16 18:15:40 2009
From: prosfilaes at gmail.com (David Starner)
Date: Thu, 16 Jul 2009 21:15:40 -0400
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
	<628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
Message-ID: <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>

On Thu, Jul 16, 2009 at 8:50 PM, Aaron
Cannon<cannona at fireantproductions.com> wrote:
> Like you, I have been watching dual layer DVDs, but the price is still
> quite high. ?Shop4tech.com has them for $2.50. ?It doesn't make sense
> considering that one single layer DVD costs just $0.26.

It's not about simple production costs; it's about the value of
producing in quantity. Everyone and their brother have single layer
DVD drives, so the media is produced in the billions. Dual layer
drives are much rarer, so the media isn't mass-produced in the same
quantities. However, looking at shop4tech.com, I'm seeing several
offers of dual layer DVDs at ~$1.00 a DVD; are those not suitable for
us for some reason?

-- 
Kie ekzistas vivo, ekzistas espero.

From cannona at fireantproductions.com  Thu Jul 16 18:28:23 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu, 16 Jul 2009 20:28:23 -0500
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
	<628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
	<6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>
Message-ID: <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com>

Hi David.

There's no reason that media wouldn't work.  I apparently just didn't
look hard enough.  However, the other problem with publishing a dual
layer DVD is that not as many people would be able to burn it,
because, as you say, there aren't as many burners out there that can
do dual layers.

Other thoughts?

Aaron

On 7/16/09, David Starner <prosfilaes at gmail.com> wrote:
> On Thu, Jul 16, 2009 at 8:50 PM, Aaron
> Cannon<cannona at fireantproductions.com> wrote:
>> Like you, I have been watching dual layer DVDs, but the price is still
>> quite high.  Shop4tech.com has them for $2.50.  It doesn't make sense
>> considering that one single layer DVD costs just $0.26.
>
> It's not about simple production costs; it's about the value of
> producing in quantity. Everyone and their brother have single layer
> DVD drives, so the media is produced in the billions. Dual layer
> drives are much rarer, so the media isn't mass-produced in the same
> quantities. However, looking at shop4tech.com, I'm seeing several
> offers of dual layer DVDs at ~$1.00 a DVD; are those not suitable for
> us for some reason?
>
> --
> Kie ekzistas vivo, ekzistas espero.
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From gbnewby at pglaf.org  Thu Jul 16 22:15:37 2009
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu, 16 Jul 2009 22:15:37 -0700
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
	<628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
	<6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>
	<628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com>
Message-ID: <20090717051537.GB31087@mail.pglaf.org>

On Thu, Jul 16, 2009 at 08:28:23PM -0500, Aaron Cannon wrote:
> Hi David.
> 
> There's no reason that media wouldn't work.  I apparently just didn't
> look hard enough.  However, the other problem with publishing a dual
> layer DVD is that not as many people would be able to burn it,
> because, as you say, there aren't as many burners out there that can
> do dual layers.
> 
> Other thoughts?
> 
> Aaron

It's time for a new DVD.

The current DVD has the majority of all the PG content as
.zip txt, plus a variety of other content in other formats.
So, to do the same thing today would take somewhat more
space (I don't know how much).

It seems many modern drives can read dual-layer discs.  Has
anyone seen statistics on this?  

I think we could also release the dual-layer content as a pair
of DVD images, for those who would prefer it that way.  It
would allow us to retire the older DVD image.  In fact, I would
probably start a new dual-layer DVD image with the full contents
of the "best of" CD (updated to reflect changes to the eBooks
since then...maybe with a new call for "best of" nominations).
These days, it seems fair to have only DVDs, not CDs.

We can certainly afford to purchase a handful of external
or internal dual-layer DVD writers for people willing to
do the burning.  Media are more expensive, but as David mentioned,
we can shop around and buy in bulk to help offset costs.  Generally,
the CD/DVD giveaways have paid for themselves in returned donations,
so I suspect they will remain self-supporting even if costs
go up - we just need to ask, when discs are sent.
  -- Greg

> On 7/16/09, David Starner <prosfilaes at gmail.com> wrote:
> > On Thu, Jul 16, 2009 at 8:50 PM, Aaron
> > Cannon<cannona at fireantproductions.com> wrote:
> >> Like you, I have been watching dual layer DVDs, but the price is still
> >> quite high.  Shop4tech.com has them for $2.50.  It doesn't make sense
> >> considering that one single layer DVD costs just $0.26.
> >
> > It's not about simple production costs; it's about the value of
> > producing in quantity. Everyone and their brother have single layer
> > DVD drives, so the media is produced in the billions. Dual layer
> > drives are much rarer, so the media isn't mass-produced in the same
> > quantities. However, looking at shop4tech.com, I'm seeing several
> > offers of dual layer DVDs at ~$1.00 a DVD; are those not suitable for
> > us for some reason?
> >
> > --
> > Kie ekzistas vivo, ekzistas espero.
> > _______________________________________________
> > gutvol-d mailing list
> > gutvol-d at lists.pglaf.org
> > http://lists.pglaf.org/mailman/listinfo/gutvol-d
> >
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d

From schultzk at uni-trier.de  Thu Jul 16 23:49:02 2009
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri, 17 Jul 2009 08:49:02 +0200
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <20090717051537.GB31087@mail.pglaf.org>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
	<628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
	<6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>
	<628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com>
	<20090717051537.GB31087@mail.pglaf.org>
Message-ID: <EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de>

Hi All,

	Actually, there are more DL burners out there than you think. As
	mentioned, most drives should be able to read DL DVDs.

	One thought, though: if somebody is willing to download the 8 GB,
	whether as a DL image or as two DVDs, I do not think they will burn
	it. At least I have not burned the images I have. I keep the images
	on my drive and just mount one when I need it. No fuss with having
	to carry a DVD along. It is also possible to put the image or its
	content on a USB stick.

	Second, if the users can burn a DVD they could divide it themselves.
	With a carefully crafted index.html and a little javascript magic one
	could easily divide the content between two DVDs. All they need to do
	is edit the index.html in one place for the two DVDs. Another approach
	would be to use two directories for the content, say DVD1 and DVD2.
	Then somebody with a DL burner can use that, and whoever can only
	burn single layer can do that instead.

	I see no big problem using DL images. Then again, maybe I am too
	savvy. Personally I would prefer an image with all the PG content
	zipped. I used to ftp the PG directories, but at one point the guys
	at the university asked me to take it easy because one day I had
	effectively used almost all the bandwidth with "get -r *.*". That was
	a long time ago; maybe I should try that again.

	All aside, I would suggest the second model with directories. That
	way, in another couple of years we can use the same model for even
	larger images, and we do not need to bother with what type of burner
	the user has, whether it be single layer, DL, Blu-ray, or whatever
	might appear.


	regards
		Keith.


From cannona at fireantproductions.com  Fri Jul 17 05:45:16 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri, 17 Jul 2009 07:45:16 -0500
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
	<628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
	<6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>
	<628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com>
	<20090717051537.GB31087@mail.pglaf.org>
	<EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de>
Message-ID: <628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com>

Hi Greg, Keith and all.

I found this quote at
http://www.burnworld.com/howto/articles/intro-to-dual-layer.htm:

"Dual layer DVD recordable discs offer up to four hours of high
quality MPEG-2 video, or up to 8.5GB of data on a single-sided disc
with two individual recordable ?layers.? Dual layer capable recorders
will have the ability to record on the new dual layer DVD recordable
discs, as well as on traditional single layer DVD discs and CDs too.
Want more? Because a recorded dual layer DVD disc is compliant with
the DVD9 specification, the discs are compatible with most consumer
DVD players and computer DVD-ROM drives already installed in the
market."

It reads as if it were written before DL burners became available (or
shortly after the first ones were released), so hopefully the author
knows of what he speaks.

My initial tests have shown that just the zipped text content of all
the books (excluding all HTML, and non-ebooks except for a little
sheet music, and also excluding some of the larger data files such as
the HGP files and the numbers) weighs in at about 5.5 GB.  This is
also excluding ASCII encoded files when UTF-8 or ISO-8859-X files are
available.  This does not exclude any images that were included in the
zip files with the text version, nor does it exclude any copyrighted
texts.

5.5 GB leaves us a good 3 GB more to play with.

I think it would make more sense to offer the dual layer DVD ISO, and
also offer two single-layer DVD .iso images for folks with only a
single layer DVD burner.  A lot of people who have emailed us in the
past have had a hard time just burning the .ISO.  If possible, I would
like to keep things as simple as we can for them.

I'll have to check to see if PG's 11-disc burner can handle dual layer
drives, or if by any chance it already has such drives installed.  It
might take a firmware upgrade, but I would be surprised if it can't at
least use DL drives.  Greg, do you by chance have an easily accessible
record of what model of duplicator we bought?  If not, I can open it
up and check the model numbers on the controler.  I just don't have
the email anymore, and there aren't any labels on the outside of the
case.

Thanks.

Aaron

On 7/17/09, Keith J. Schultz <schultzk at uni-trier.de> wrote:
> Hi All,
>
> 	Actually, there are more DL burners out there than you think. As
> 	mentioned, most drives should be able to read DL DVDs.
>
> 	One thought, though: if somebody is willing to download the 8 GB,
> 	whether as a DL image or as two DVDs, I do not think they will burn
> 	it. At least I have not burned the images I have. I keep the images
> 	on my drive and just mount one when I need it. No fuss with having
> 	to carry a DVD along. It is also possible to put the image or its
> 	content on a USB stick.
>
> 	Second, if the users can burn a DVD they could divide it themselves.
> 	With a carefully crafted index.html and a little javascript magic one
> 	could easily divide the content between two DVDs. All they need to do
> 	is edit the index.html in one place for the two DVDs. Another approach
> 	would be to use two directories for the content, say DVD1 and DVD2.
> 	Then somebody with a DL burner can use that, and whoever can only
> 	burn single layer can do that instead.
>
> 	I see no big problem using DL images. Then again, maybe I am too
> 	savvy. Personally I would prefer an image with all the PG content
> 	zipped. I used to ftp the PG directories, but at one point the guys
> 	at the university asked me to take it easy because one day I had
> 	effectively used almost all the bandwidth with "get -r *.*". That was
> 	a long time ago; maybe I should try that again.
>
> 	All aside, I would suggest the second model with directories. That
> 	way, in another couple of years we can use the same model for even
> 	larger images, and we do not need to bother with what type of burner
> 	the user has, whether it be single layer, DL, Blu-ray, or whatever
> 	might appear.
>
>
> 	regards
> 		Keith.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From cannona at fireantproductions.com  Fri Jul 17 15:15:54 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri, 17 Jul 2009 17:15:54 -0500
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
	<628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
	<6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>
	<628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com>
	<20090717051537.GB31087@mail.pglaf.org>
	<EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de>
	<628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com>
Message-ID: <628c29180907171515o760354ack47b731478b97db6a@mail.gmail.com>

I was mistaken.  It actually weighs in at 4.5 GB.  So, it does in fact
fit on a single layer DVD (verified with Nero).  In fact, we have
about 87MB to spare.  So, in light of this new information, the
question is whether we want to create a DL DVD or not.

Thoughts?

Aaron

On 7/17/09, Aaron Cannon <cannona at fireantproductions.com> wrote:
> Hi Greg, Keith and all.
>
> I found this quote at
> http://www.burnworld.com/howto/articles/intro-to-dual-layer.htm:
>
> "Dual layer DVD recordable discs offer up to four hours of high
> quality MPEG-2 video, or up to 8.5GB of data on a single-sided disc
> with two individual recordable "layers." Dual layer capable recorders
> will have the ability to record on the new dual layer DVD recordable
> discs, as well as on traditional single layer DVD discs and CDs too.
> Want more? Because a recorded dual layer DVD disc is compliant with
> the DVD9 specification, the discs are compatible with most consumer
> DVD players and computer DVD-ROM drives already installed in the
> market."
>
> It reads as if it were written before DL burners became available (or
> shortly after the first ones were released), so hopefully the author
> knows of what he speaks.
>
> My initial tests have shown that just the zipped text content of all
> the books (excluding all HTML, and non-ebooks except for a little
> sheet music, and also excluding some of the larger data files such as
> the HGP files and the numbers) weighs in at about 5.5 GB.  This is
> also excluding ASCII encoded files when UTF-8 or ISO-8859-X files are
> available.  This does not exclude any images that were included in the
> zip files with the text version, nor does it exclude any copyrighted
> texts.
>
> 5.5 GB leaves us a good 3 GB more to play with.
>
> I think it would make more sense to offer the dual layer DVD ISO, and
> also offer two single-layer DVD .iso images for folks with only a
> single layer DVD burner.  A lot of people who have emailed us in the
> past have had a hard time just burning the .ISO.  If possible, I would
> like to keep things as simple as we can for them.
>
> I'll have to check to see if PG's 11-disc burner can handle dual layer
> drives, or if by any chance it already has such drives installed.  It
> might take a firmware upgrade, but I would be surprised if it can't at
> least use DL drives.  Greg, do you by chance have an easily accessible
> record of what model of duplicator we bought?  If not, I can open it
> up and check the model numbers on the controller.  I just don't have
> the email anymore, and there aren't any labels on the outside of the
> case.
>
> Thanks.
>
> Aaron
>
> On 7/17/09, Keith J. Schultz <schultzk at uni-trier.de> wrote:
>> Hi All,
>>
>> 	Actually, there are more DL burners out there than you think. As
>> 	mentioned, most drives should be able to read DL DVDs.
>>
>> 	One thought, though: if somebody is willing to download the 8 GB,
>> 	whether as a DL image or as two DVDs, I do not think they will burn
>> 	it. At least I have not burned the images I have. I keep the images
>> 	on my drive and just mount one when I need it. No fuss with having
>> 	to carry a DVD along. It is also possible to put the image or its
>> 	content on a USB stick.
>>
>> 	Second, if the users can burn a DVD they could divide it themselves.
>> 	With a carefully crafted index.html and a little javascript magic one
>> 	could easily divide the content between two DVDs. All they need to do
>> 	is edit the index.html in one place for the two DVDs. Another approach
>> 	would be to use two directories for the content, say DVD1 and DVD2.
>> 	Then somebody with a DL burner can use that, and whoever can only
>> 	burn single layer can do that instead.
>>
>> 	I see no big problem using DL images. Then again, maybe I am too
>> 	savvy. Personally I would prefer an image with all the PG content
>> 	zipped. I used to ftp the PG directories, but at one point the guys
>> 	at the university asked me to take it easy because one day I had
>> 	effectively used almost all the bandwidth with "get -r *.*". That was
>> 	a long time ago; maybe I should try that again.
>>
>> 	All aside, I would suggest the second model with directories. That
>> 	way, in another couple of years we can use the same model for even
>> 	larger images, and we do not need to bother with what type of burner
>> 	the user has, whether it be single layer, DL, Blu-ray, or whatever
>> 	might appear.
>>
>>
>> 	regards
>> 		Keith.
>>
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d at lists.pglaf.org
>> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>>
>

From schultzk at uni-trier.de  Sat Jul 18 02:58:57 2009
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Sat, 18 Jul 2009 11:58:57 +0200
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <628c29180907171515o760354ack47b731478b97db6a@mail.gmail.com>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
	<628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
	<6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>
	<628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com>
	<20090717051537.GB31087@mail.pglaf.org>
	<EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de>
	<628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com>
	<628c29180907171515o760354ack47b731478b97db6a@mail.gmail.com>
Message-ID: <6962E1B2-E316-413F-A112-52F7A3483BD2@uni-trier.de>

Hi Aaron,

	Well, it depends! Do we want to put the zipped html
	versions on it, or even non-ebooks?

	I believe some prefer html over text. Maybe a dual
	approach: a "normal" DVD with the text versions and a DL
	with more on it. Of course it is possible to make
	two "normal" DVDs. I would prefer a DL.

	regards
		Keith.

Am 18.07.2009 um 00:15 schrieb Aaron Cannon:

> I was mistaken.  It actually weighs in at 4.5 GB.  So, it does in fact
> fit on a single layer DVD (verified with Nero).  In fact, we have
> about 87MB to spare.  So, in light of this new information, the
> question is whether we want to create a DL DVD or not.
>
> Thoughts?
>


From user5013 at aol.com  Sat Jul 18 06:10:37 2009
From: user5013 at aol.com (Christa & Jay Toser)
Date: Sat, 18 Jul 2009 08:10:37 -0500
Subject: [gutvol-d] Re: A new DVD?
Message-ID: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com>

Hi,  Time for me to put my two cents worth in.

I do a good-sized chunk of the international mailings for the  
Gutenberg project.  These are the physical discs that are sent out,  
not any of the internet downloads.

Point One:  We cannot abandon the legacy of a CD.  Most of Africa,  
and many ex-Soviet satellite nations, do not have personal computers  
that can read DVD's.  For instance, the One Laptop Per Child computer  
can only read CD's.  Perhaps the Gutenberg Project can update their  
CD to include the 630 most popular books from 2009 (instead of from  
2003), but we must maintain a CD.

Point Two:  We must update the DVD.  Folk in India have noticed that  
the current DVD holds about 10,000 fewer books than are on-line.   
They wonder why we are behind the times.

Point Three:  On the internet downloads, yes we can have a dual-layer  
or Blu-ray disk image.  Wonderful.  BUT in the physical world, we  
will have to stick with the standard DVD format.  We must understand  
that much of the world will continue with older legacy formats for at  
least the next decade.  Which means, the next release of the DVD will  
probably have to be a set of two discs.

Point Four:  Mailing of discs.  Currently, I mail two copies of the  
DVD (or CD) for every request.  The philosophy has always been:   
"Keep one copy, and give the other copy to your local school or  
library."  This has worked pretty well at the current postage costs.   
However, if I have to mail four discs, then costs of mailing will go up.

I propose this change:  Any requester may ask for one single personal  
copy of the two DVD's.  _AND_ there should be an additional checkbox  
for them to ask for a second set of DVD's for them to give away.   
That way, I would normally send out only the two DVD's (for the  
single personal copy) -- unless the requester wants more -- and then  
I would spend the greater postage to send the extra discs.

Critics may ask: "If I send a second set of discs, will that extra  
set be personally delivered to the other destination?"  I say, Oh  
yes, I GUARANTEE IT.  Americans simply do not realize just how much  
the rest of the world values books.  If someone says they will deliver  
the duplicates -- then they WILL.  It would be worth it to the  
Gutenberg Project to pay the extra postage for the duplicate DVD's.

Point Four-and-a-Half:  No, DVD BURNERS are not as common as you  
think.  Lots of folk across the world can READ a DVD, but not so many  
can BURN a DVD.  As much as these requesters would want to make  
copies -- they just can't.

Point Five:  Download time.  The debate I have read in the last  
couple of days does not seem to take internet access into account.  
They seem to think just anyone can download the DVD (or Blu-ray)  
images easily.  That is not the case in most of the world.

I currently have 56K dial-up.  If I were to download a (hypothetical)  
2-DVD set, that download would complete, oh, about October.  And yet,  
I do not experience the incredible delays seen in the Palestinian  
Territories, or some of the emerging Communist nations, or much of  
Africa -- there the best speed can be as low as 9600 baud (2.4K).

Whenever Project Gutenberg creates the be-all-end-all Blu-ray DVD  
disk which contains everything in the project -- then about 2/3rds of  
the world will not have the bandwidth to access any of it.


So, I recommend moderation in whatever you create.  Please keep in  
mind the legacy DVD & CD standards that are already in place; that  
the world currently can read; and try to create the new discs so that  
(almost) anyone can read them.

Hope this helps,
Jay Toser

P.S. Should any of these new discs be created, will someone remember  
to mail to me, a hard copy?  As I've said, I'm not going to be able  
to download a copy.  That's why I do the international mailings.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090718/6f4528b1/attachment.html>

From grythumn at gmail.com  Sat Jul 18 08:58:29 2009
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sat, 18 Jul 2009 11:58:29 -0400
Subject: [gutvol-d] Re: A new DVD?
In-Reply-To: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com>
References: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com>
Message-ID: <15cfa2a50907180858w7c97edb6s62a1a6e0ce3edcb4@mail.gmail.com>

On Sat, Jul 18, 2009 at 9:10 AM, Christa & Jay Toser <user5013 at aol.com>wrote:

> Point Three:  On the internet downloads, yes we can have a dual-layer or
> Blu-ray disk image.  Wonderful.  BUT in the physical world, we will have to
> stick with the standard DVD format.  We must understand that much of the
> world will continue with older legacy formats for at least the next decade.
> Which means, the next release of the DVD will probably have to be a set of
> two discs.
>

Dual-layer disks are PART of the standard DVD format[0], and just about any
DVD-ROM[1] will read a burned dual layer disk. There were issues with burned
DL disks in very early consumer DVD players, but DVD-ROMs are more robust,
especially if you use DVD+R DL disks[2].

The bigger issue is burners... DL burners were only really standard in the
last 3 or 4 years, but anything can read them.

So you actually have it backwards... for internet downloads, you want single
layer (DVD-5) images, but for physical mailings you want dual-layer images
(DVD-9).

R C
[0] In fact the majority of commercial DVDs are dual-layer. Pressed
dual-layer, not burned, but still dual-layer.
[1] I won't guarantee every device will, but even my ancient 2x DVD-ROM from
~1998 can read burned DL disks.
[2] Some older devices have trouble with -R DL disks if the two layers
aren't burned to the same length.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090718/5b39c99a/attachment.html>

From cannona at fireantproductions.com  Sat Jul 18 09:18:35 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sat, 18 Jul 2009 11:18:35 -0500
Subject: [gutvol-d] Re: A new DVD?
In-Reply-To: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com>
References: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com>
Message-ID: <628c29180907180918g442e12aas5c4e43847a77d889@mail.gmail.com>

Hi Jay.

Thanks for your insights on this, and especially thanks for all your
work mailing DVDs.

Points 1 and 2: agreed.

Point 3. Based on what I've read, all drives (including the very
oldest) should be able to read dual layer DVDs.  The reason is that
even though DL burners only became available relatively recently, the
dual layer format has existed from the beginning, and in fact many
DVDs from the mid to late 90's were dual layer.  So, theoretically at
least, if a person can read our current DVDs, they should be able to
read dual layer discs.  I have no way of knowing if this is true in
practice however.

Point 4.  It's a good idea.  The decision to send two discs, whether
it be two CDs, two DVDs, or one of each is so that the shipping weight
for every package will be uniform.  That was a decision I made, and
the reason I did so was just to make things a little simpler for
volunteers.  However, once we know what the next generation DVD will
look like, we can reevaluate this decision.

Points 4 and 4.5 are also quite valid and definitely something that
needs to be kept in mind.

Thanks again.

Aaron

On 7/18/09, Christa & Jay Toser <user5013 at aol.com> wrote:
> Hi,  Time for me to put my two cents worth in.
>
> I do a good-sized chunk of the international mailings for the
> Gutenberg project.  These are the physical discs that are sent out,
> not any of the internet downloads.
>
> Point One:  We cannot abandon the legacy of a CD.  Most of Africa,
> and many ex-Soviet satellite nations, do not have personal computers
> that can read DVD's.  For instance, the One Laptop Per Child computer
> can only read CD's.  Perhaps the Gutenberg Project can update their
> CD to include the 630 most popular books from 2009 (instead of from
> 2003), but we must maintain a CD.
>
> Point Two:  We must update the DVD.  Folk in India have noticed that
> the current DVD holds about 10,000 fewer books than are on-line.
> They wonder why we are behind the times.
>
> Point Three:  On the internet downloads, yes we can have a dual-layer
> or Blu-ray disk image.  Wonderful.  BUT in the physical world, we
> will have to stick with the standard DVD format.  We must understand
> that much of the world will continue with older legacy formats for at
> least the next decade.  Which means, the next release of the DVD will
> probably have to be a set of two discs.
>
> Point Four:  Mailing of discs.  Currently, I mail two copies of the
> DVD (or CD) for every request.  The philosophy has always been:
> "Keep one copy, and give the other copy to your local school or
> library."  This has worked pretty well at the current postage costs.
> However, if I have to mail four discs, then costs of mailing will go up.
>
> I propose this change:  Any requester may ask for one single personal
> copy of the two DVD's.  _AND_ there should be an additional checkbox
> for them to ask for a second set of DVD's for them to give away.
> That way, I would normally send out only the two DVD's (for the
> single personal copy) -- unless the requester wants more -- and then
> I would spend the greater postage to send the extra discs.
>
> Critics may ask: "If I send a second set of discs, will that extra
> set be personally delivered to the other destination?"  I say, Oh
> yes, I GUARANTEE IT.  Americans simply do not realize just how much
> the rest of the world values books.  If someone says they will deliver
> the duplicates -- then they WILL.  It would be worth it to the
> Gutenberg Project to pay the extra postage for the duplicate DVD's.
>
> Point Four-and-a-Half:  No, DVD BURNERS are not as common as you
> think.  Lots of folk across the world can READ a DVD, but not so many
> can BURN a DVD.  As much as these requesters would want to make
> copies -- they just can't.
>
> Point Five:  Download time.  The debate I have read in the last
> couple of days does not seem to take internet access into account.
> They seem to think just anyone can download the DVD (or Blu-ray)
> images easily.  That is not the case in most of the world.
>
> I currently have 56K dial-up.  If I were to download a (hypothetical)
> 2-DVD set, that download would complete, oh, about October.  And yet,
> I do not experience the incredible delays seen in the Palestinian
> Territories, or some of the emerging Communist nations, or much of
> Africa -- there the best speed can be as low as 9600 baud (2.4K).
>
> Whenever Project Gutenberg creates the be-all-end-all Blu-ray DVD
> disk which contains everything in the project -- then about 2/3rds of
> the world will not have the bandwidth to access any of it.
>
>
> So, I recommend moderation in whatever you create.  Please keep in
> mind the legacy DVD & CD standards that are already in place; that
> the world currently can read; and try to create the new discs so that
> (almost) anyone can read them.
>
> Hope this helps,
> Jay Toser
>
> P.S. Should any of these new discs be created, will someone remember
> to mail to me, a hard copy?  As I've said, I'm not going to be able
> to download a copy.  That's why I do the international mailings.
>

From cannona at fireantproductions.com  Sat Jul 18 10:15:35 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sat, 18 Jul 2009 12:15:35 -0500
Subject: [gutvol-d] Re: !@! Re: A new DVD?
In-Reply-To: <6962E1B2-E316-413F-A112-52F7A3483BD2@uni-trier.de>
References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com>
	<alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu>
	<628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com>
	<6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com>
	<628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com>
	<20090717051537.GB31087@mail.pglaf.org>
	<EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de>
	<628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com>
	<628c29180907171515o760354ack47b731478b97db6a@mail.gmail.com>
	<6962E1B2-E316-413F-A112-52F7A3483BD2@uni-trier.de>
Message-ID: <628c29180907181015t5d80f0c9nbb099c5d97664d98@mail.gmail.com>

That is in fact the question.  Is the extra content worth the
inconvenience and expense of the DL format and/or the creation of two
discs?  I personally could go either way, though just releasing one
single layer DVD would be easier.

Incidentally, I have compiled a list of files which will fit on 1
single layer DVD with about 87MB to spare.  Whether we choose the
single or dual layer, I propose that this list serve as a starting
point.

If anyone has any files that they feel should be included or excluded,
please let me know.  The list is available as tab delimited data at:
(as a .zip file) http://snowy.arsc.alaska.edu/cdproject/dvdfiles.zip
(as a .bz2 file) http://snowy.arsc.alaska.edu/cdproject/dvdfiles.csv.bz2

The data was generated as follows (a rough sketch of steps 5 and 6 appears
after the list):
1. Using the catalog.rdf file downloaded on July 14.
2. Removing all books that have a "type" assigned in the RDF record.
This basically gets rid of almost everything that isn't an ebook,
including audio books, music, data, etc.
3. Removing books 2201 through 2224, books 11775 through 11855, and
books 19159, 10802, 11220, and 3202.
4. Removing files with formats pageimages, msword, text/xml,
audio/mpeg, application/octet-stream type=anything, tei, html, pdf,
rtf, tex, folio, palm, raider, and unspecified.  In a few cases this
removed entire titles, but in most cases it just decreased the
number of formats a given title was available in.
5. If a title was available in UTF-8 and/or ISO-8859-X, and also
available in ASCII, then the ASCII version was not included.
6. If a book had zipped and unzipped versions in the archive, then
only the zipped versions were kept.
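
To make steps 5 and 6 concrete, here is a rough sketch in Python of how the
encoding and zip preferences work for one book's file list.  This is not the
exact script that produced dvdfiles.zip, and the file names and encodings
below are made up for illustration.

# Sketch of steps 5 and 6 (encoding and zip preferences) for one book.
def prefer(files):
    """files: list of (name, encoding, zipped) tuples for a single book."""
    # Step 5: drop ASCII copies when a UTF-8 or ISO-8859-X copy exists.
    encodings = {enc for _, enc, _ in files}
    if "utf-8" in encodings or any(e.startswith("iso-8859") for e in encodings):
        files = [f for f in files if f[1] != "us-ascii"]
    # Step 6: if both zipped and unzipped copies remain, keep only the zipped ones.
    if any(f[2] for f in files) and any(not f[2] for f in files):
        files = [f for f in files if f[2]]
    return files

# Made-up example: one book with an ASCII text, an 8-bit text, and a zip.
example = [("12345.txt", "us-ascii", False),
           ("12345-8.txt", "iso-8859-1", False),
           ("12345-8.zip", "iso-8859-1", True)]
print(prefer(example))   # -> [('12345-8.zip', 'iso-8859-1', True)]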

Again, suggestions are very welcome.

Thanks.

Aaron

On 7/18/09, Keith J. Schultz <schultzk at uni-trier.de> wrote:
> Hi Aaron,
>
> 	Well, it depends! Do we want to put the zipped html
> 	versions on it, or even non-ebooks?
>
> 	I believe some prefer html over text. Maybe a dual
> 	approach: a "normal" DVD with the text versions and a DL
> 	with more on it. Of course it is possible to make
> 	two "normal" DVDs. I would prefer a DL.
>
> 	regards
> 		Keith.
>
> Am 18.07.2009 um 00:15 schrieb Aaron Cannon:
>
>> I was mistaken.  It actually weighs in at 4.5 GB.  So, it does in fact
>> fit on a single layer DVD (verified with Nero).  In fact, we have
>> about 87MB to spare.  So, in light of this new information, the
>> question is whether we want to create a DL DVD or not.
>>
>> Thoughts?
>>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From cannona at fireantproductions.com  Mon Jul 20 10:44:19 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Mon, 20 Jul 2009 12:44:19 -0500
Subject: [gutvol-d] Suggestions for the DVD must haves?
Message-ID: <628c29180907201044s76b48906t246ca18ce2fe2c8b@mail.gmail.com>

Hi all.

I would like some recommendations on some books from the collection
that you feel should be included on the DVD.  Remember that all books
that have text versions and that aren't data get included.  However,
what books should have their HTML versions included as well?  Which
other non-text works (such as mp3s, films, etc.) should be included?

Email me your recommendations privately or reply to the list, as you prefer.

Thanks.

Aaron

From i30817 at gmail.com  Wed Jul 15 17:28:24 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Thu, 16 Jul 2009 01:28:24 +0100
Subject: [gutvol-d] Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>
References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>
Message-ID: <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>

---------- Forwarded message ----------
From: Paulo Levi <i30817 at gmail.com>
Date: Thu, Jul 16, 2009 at 1:25 AM
Subject: Programmatic fetching books from Gutenberg
To: gutvol-p at lists.pglaf.org


I made an ebook reader
(here: http://code.google.com/p/bookjar/downloads/list)

and I'd like to search and download Gutenberg books. I already have a
searcher prototype using LuceneSail, a library that uses Lucene to index RDF
documents, indexing only what I want from the catalog.rdf.zip.

Now I'd like to know how, from the URL inside the catalog, I can fetch the
book itself, and what the variants for the formats are.
An example query result:
 author: Shakespeare, William, 1564-1616
 url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802
 title: King Henry VIII

So, I'd like to know how, from the etext1802 number, I can get a working URL
to download the book, and how to construct variants for each format.

Thank you in advance.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090716/ea48a4cc/attachment.html>

From pterandon at gmail.com  Thu Jul 16 20:02:18 2009
From: pterandon at gmail.com (Greg M. Johnson)
Date: Thu, 16 Jul 2009 23:02:18 -0400
Subject: [gutvol-d] Hey, I've been working on a DVD!
Message-ID: <a0bf3e960907162002k15b3b256m990b6af5d5dddcf9@mail.gmail.com>

Hi.  I'm getting the email from this list but am not sure if I'm able to
send to the list.

Anyway, I've been working on a DVD.  I've got 3500 of your HTML files on it
organized into 21 different genres.   It's *everything*  under the sun, but
from my own editorial stance, I'm most interested in Christian sermons,
scifi, old sea tales, and the (US) Civil War, so there might be too much in
those genres for some people's taste.  At one point I was trying to avoid
controversial material (eugenics, slavery, women's suffrage), but then
thought it would be more valuable to include all kinds of viewpoints, even
ones universally understood to be wrong today.

The files are HTML, which I find to be the best way to read the material.
I've stressed those files which are <1MB, but of course granting waivers to
image-rich classics like Beatrix Potter's work.  I've seen the DVD you
put out with 19000 zipped txt files on it.  That DVD is probably the thing you
want to have in every home basement in case civilization needs to survive a
nuclear war, but it's not very user friendly.  I entitled mine, "Some of the
Best of Project Gutenberg."

So, I'm planning to start distributing it to all my friends and every local
nursing home.  But I'd gladly give the current draft to your group under
a CC zero license.



ADVENTURE IN THE AGE OF STEAM
SCIENCE FICTION
ADVENTURE NOVELS FOR YOUTH
BEDTIME STORIES
MANUALS & HOW-TO BOOKS
SCHOOLBOOKS & EDUCATION THEORY
ENGLISH
LITERATURE
DRAMA
POETRY
MYSTERY
RELIGION- CHRISTIAN
RELIGION-OTHER, MYTHOLOGY, & PHILOSOPHY
SCIENCE- MEDICINE, PHYSIOLOGY, & PSYCHOLOGY
SCIENCE- NATURAL
SCIENCE- PHYSICS & ENGINEERING
SOCIAL SCIENCE, THE ARTS, WORLDWIDE HISTORY
U.S. & THE AMERICAS HISTORY & CULTURE
EUROPEAN HISTORY & CULTURE
ASIAN & AFRICAN HISTORY & CULTURE
MAGAZINES





-- 
Greg M. Johnson
http://pterandon.blogspot.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090716/65f71c57/attachment.html>

From hart at pobox.com  Mon Jul 20 17:42:02 2009
From: hart at pobox.com (Michael S. Hart)
Date: Mon, 20 Jul 2009 16:42:02 -0800 (AKDT)
Subject: [gutvol-d] !@! Barnes & Noble Unveils Online Bookstore to Compete
 Directly With Amazon
Message-ID: <alpine.DEB.1.00.0907201637410.21137@snowy.arsc.alaska.edu>


http://tinyurl.com/np5ke2

"Top U.S. bookseller Barnes & Noble (BKS) announced Monday the
launch of the world's largest online bookstore, with over 700,000
titles that can be read on a range of platforms from Apple's iPhone
to personal computers. Sounding a challenge to online retailer
Amazon, the company said its selection would grow to over 1 million
titles within the next year and include every available e-book from
every book publisher."

From marcello at perathoner.de  Mon Jul 27 00:24:41 2009
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 27 Jul 2009 09:24:41 +0200
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com>
References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>	<212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>
	<a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com>
Message-ID: <4A6D5639.5080600@perathoner.de>

David A. Desrosiers wrote:
> On Wed, Jul 15, 2009 at 8:28 PM, Paulo Levi<i30817 at gmail.com> wrote:
>> So, i like to know how from the etext1802 number can i get a working url to
>> download the book, and how to construct variants for each format.
> 
> I do something very similar on the Plucker "samples" page:
> 
> http://www.plkr.org/samples
> 
> I check HEAD on each resource (using an intelligent caching mechanism
> on my side), and then either present a working link, or a struck-out
> link, depending on whether the format is available or not.

That seems a horrible waste of resources, seeing that you only need to 
scan the rdf file to see what files we have.



From marcello at perathoner.de  Mon Jul 27 00:41:19 2009
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 27 Jul 2009 09:41:19 +0200
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>
References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>
	<212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>
Message-ID: <4A6D5A1F.9010403@perathoner.de>

Paulo Levi wrote:
> 
> ---------- Forwarded message ----------
> From: *Paulo Levi* <i30817 at gmail.com <mailto:i30817 at gmail.com>>
> Date: Thu, Jul 16, 2009 at 1:25 AM
> Subject: Programmatic fetching books from Gutenberg
> To: gutvol-p at lists.pglaf.org <mailto:gutvol-p at lists.pglaf.org>
> 
> 
> I made an ebook reader
> (here: http://code.google.com/p/bookjar/downloads/list)
> 
> and I'd like to search and download Gutenberg books. I already have a
> searcher prototype using LuceneSail, a library that uses Lucene to index RDF
> documents, indexing only what I want from the catalog.rdf.zip.
> 
> Now I'd like to know how, from the URL inside the catalog, I can fetch the
> book itself, and what the variants for the formats are.
> An example query result:
>  author: Shakespeare, William, 1564-1616
>  url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802
>  title: King Henry VIII
> 
> So, I'd like to know how, from the etext1802 number, I can get a working URL
> to download the book, and how to construct variants for each format.
> 
> Thank you in advance.


I already told you how to do that on gutvol-p.

You make a very simple thing very complicated because you refuse to use 
xml tools to scan an xml file.

This simple xpath query:

   xpath("//pgterms:file[dcterms:isFormatOf[@rdf:resource='#etext29514']]")

will get all files we have for book 29514 with mimetype, size and last 
modification date.
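
For what it's worth, the same query can be run from Python with lxml. This is
only a sketch: the namespace URIs below are my assumption of what catalog.rdf
declares and should be checked against the actual file before relying on them.

from lxml import etree

# Assumed namespace URIs; verify them against the declarations in catalog.rdf.
NS = {
    "rdf":     "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc":      "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/rdfterms/",
}

tree = etree.parse("catalog.rdf")
files = tree.xpath(
    "//pgterms:file[dcterms:isFormatOf[@rdf:resource='#etext29514']]",
    namespaces=NS)

for f in files:
    url = f.get("{%s}about" % NS["rdf"])   # file path from rdf:about
    fmts = f.xpath("dc:format//rdf:value/text()", namespaces=NS)
    size = f.xpath("string(dcterms:extent)", namespaces=NS)
    date = f.xpath("string(dcterms:modified//rdf:value)", namespaces=NS)
    print(url, fmts, size, date)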





--- excerpt from catalog.rdf ---

<pgterms:file rdf:about="&f;2/9/5/1/29514/29514-8.txt">
   <dc:format><dcterms:IMT><rdf:value>text/plain; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
   <dcterms:extent>27727</dcterms:extent>
   <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified>
   <dcterms:isFormatOf rdf:resource="#etext29514" />
</pgterms:file>

<pgterms:file rdf:about="&f;2/9/5/1/29514/29514-8.zip">
   <dc:format><dcterms:IMT><rdf:value>text/plain; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
   <dc:format><dcterms:IMT><rdf:value>application/zip</rdf:value></dcterms:IMT></dc:format>
   <dcterms:extent>10751</dcterms:extent>
   <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified>
   <dcterms:isFormatOf rdf:resource="#etext29514" />
</pgterms:file>

<pgterms:file rdf:about="&f;2/9/5/1/29514/29514-h/29514-h.htm">
   <dc:format><dcterms:IMT><rdf:value>text/html; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
   <dcterms:extent>29847</dcterms:extent>
   <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified>
   <dcterms:isFormatOf rdf:resource="#etext29514" />
</pgterms:file>

<pgterms:file rdf:about="&f;2/9/5/1/29514/29514-h.zip">
   <dc:format><dcterms:IMT><rdf:value>text/html; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format>
   <dc:format><dcterms:IMT><rdf:value>application/zip</rdf:value></dcterms:IMT></dc:format>
   <dcterms:extent>18787</dcterms:extent>
   <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified>
   <dcterms:isFormatOf rdf:resource="#etext29514" />
</pgterms:file>



From pterandon at gmail.com  Mon Jul 27 05:23:38 2009
From: pterandon at gmail.com (Greg M. Johnson)
Date: Mon, 27 Jul 2009 08:23:38 -0400
Subject: [gutvol-d] Compilation ideas
Message-ID: <a0bf3e960907270523t3ead7c35lc59db00679a77f33@mail.gmail.com>

First of all, apologies for the triplicate submissions on the same topic. I
wasn't getting through to the list.

Secondly, the question about resource use with the plucker list underscores
my idea that we should be encouraging torrent-downloads of format-specific
DVDs, arranged with some sort of genre or index stronger than merely
alphabetical author & title.    In a real bookstore, folks might want to
peek at the first two pages of dozens of books before they buy.  In ebooks,
is there a way to serve the customer and save bandwidth?  This is one way.

I'm working on a compilation DVD in HTML format.  If I were to do it over
again, and had a computer to select things, I'd do something like this:

1)  Find  "Best of" lists.  Some IIRC were on the pg web site.  I'd start
with the one about 100 books to read for a liberal education.  Then add a
well-rounded selection of items from "best of " lists for other genres:
mystery, history of science, history, scifi, religion, etc., etc...

2) Figure out how much space you have and what format scheme (all, plucker,
zipped txt's, HTML, etc.) you're going to concentrate on.

3)  Start with the #1 item on every list.   Add the complete works of each
author. (My idea is that the most obscure work of Author #1 may be more fun
to read than the everyone-knows item from Author #99.  )  Continue down the
list until you've filled up your space requirement.   (If you're doing HTML
format, skip the >20 MB books. )

4)  Indices:  Author Alphabetical, Title Alphabetical,  LOC classification
(not my original idea this one), and editors' picks. ( Strongly encourage
3-12 well-read and opinionated folks to come up with "Editor's Picks"
lists.)






-- 
Greg M. Johnson
http://pterandon.blogspot.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090727/2eb111da/attachment.html>

From desrod at gnu-designs.com  Mon Jul 27 07:34:39 2009
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Mon, 27 Jul 2009 10:34:39 -0400
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <4A6D5639.5080600@perathoner.de>
References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>
	<212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>
	<a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com>
	<4A6D5639.5080600@perathoner.de>
Message-ID: <a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com>

On Mon, Jul 27, 2009 at 3:24 AM, Marcello
Perathoner<marcello at perathoner.de> wrote:
> That seems a horrible waste of resources, seeing that you only need to scan
> the rdf file to see what files we have.

Scanning the RDF file tells me absolutely nothing about the
availability of the actual target format itself. Checking HEAD on each
target link does, however. Since I'm caching it on the server-side, I
only have to remotely check it the first time, which is not a
"horrible waste of resources" at all.

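A minimal sketch of that kind of HEAD-check-plus-cache, for anyone who wants
to try it: this is not the plkr.org code, the cache here is just an in-memory
dict, and the example URL in the comment is hypothetical.

import urllib.request
import urllib.error

_cache = {}  # url -> bool, so each target link is only checked remotely once

def format_available(url, timeout=10):
    """Return True if a HEAD request for url succeeds, caching the answer."""
    if url in _cache:
        return _cache[url]
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            _cache[url] = 200 <= resp.status < 300
    except (urllib.error.HTTPError, urllib.error.URLError):
        _cache[url] = False
    return _cache[url]

# Hypothetical usage: decide whether to render a working or struck-out link.
# if format_available("http://www.gutenberg.org/files/1802/1802-h/1802-h.htm"):
#     ...present the working link...
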
From Bowerbird at aol.com  Mon Jul 27 09:40:57 2009
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 27 Jul 2009 12:40:57 EDT
Subject: [gutvol-d] Re: !@! Barnes & Noble Unveils Online Bookstore to
	Compete Directly With Amazon
Message-ID: <d12.25bbad71.379f3299@aol.com>

b&n won't be able to compete with the kindle 3:
>    http://www.youtube.com/watch?v=GI0Zry_R4RQ

"i can't hear you; i'm reading!"

-bowerbird



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090727/8b0b639f/attachment.html>

From ralf at ark.in-berlin.de  Mon Jul 27 10:45:05 2009
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 27 Jul 2009 19:45:05 +0200
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com>
References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>
	<212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>
	<a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com>
	<4A6D5639.5080600@perathoner.de>
	<a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com>
Message-ID: <002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de>


On Jul 27, 2009, at 4:34 PM, David A. Desrosiers wrote:

> On Mon, Jul 27, 2009 at 3:24 AM, Marcello
> Perathoner<marcello at perathoner.de> wrote:
>> That seems an horrible waste of resources seeing that you only need  
>> to scan
>> the rdf file to see what files we have.
>
> Scanning the RDF file tells me absolutely nothing about the
> availability of the actual target format itself. Checking HEAD on each
> target link does, however. Since I'm caching it on the server-side, I
> only have to remotely check it the first time, which is not a
> "horrible waste of resources" at all.

My, can't we admit that XPath is a bit over our head,
so we prefer confronting the admin we're supposed
to be cooperating with? Wrt resources, my guess it's
about par traffic-wise (1-5k per book vs. megabytes
of RDF) but much better CPU-wise. That is, if you don't
want the RDF for other fine things like metadata etc.


ralf



From desrod at gnu-designs.com  Mon Jul 27 11:42:36 2009
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Mon, 27 Jul 2009 14:42:36 -0400
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de>
References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>
	<212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>
	<a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com>
	<4A6D5639.5080600@perathoner.de>
	<a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com>
	<002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de>
Message-ID: <a82cdbb90907271142p34e72bbdv86211a862580560f@mail.gmail.com>

On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan<ralf at ark.in-berlin.de> wrote:
> My, can't we admit that XPath is a bit over our head,
> so we prefer confronting the admin we're supposed
> to be cooperating with? Wrt resources, my guess it's
> about par traffic-wise (1-5k per book vs. megabytes
> of RDF) but much better CPU-wise. That is, if you don't
> want the RDF for other fine things like metadata etc.

I think you've missed my point.

The RDF flat-out cannot tell me which of the target _formats_ are
available for immediate download to the users. I'm not looking for
which _titles_ are available in the catalog, I'm looking for which
_formats_ are available. Also note that I'm already parsing the feeds
to see what the top 'n' titles are already, so parsing XML via
whatever methods I need is not the blocker here.

Let me give you an example of two titles available in the catalog:

Vergänglichkeit by Sigmund Freud
http://www.gutenberg.org/cache/plucker/29514/29514

The Lost Word by Henry Van Dyke
http://www.gutenberg.org/cache/plucker/4384/4384

Both of these _titles_ are available in the Gutenberg catalog, but the
second one is not available in the Plucker _format_ for immediate
download. Big difference from parsing title availability from the
catalog.rdf file.

Make sense now?

From ralf at ark.in-berlin.de  Tue Jul 28 00:16:41 2009
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Tue, 28 Jul 2009 09:16:41 +0200
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <a82cdbb90907271142p34e72bbdv86211a862580560f@mail.gmail.com>
References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>
	<212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>
	<a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com>
	<4A6D5639.5080600@perathoner.de>
	<a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com>
	<002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de>
	<a82cdbb90907271142p34e72bbdv86211a862580560f@mail.gmail.com>
Message-ID: <79902BC0-1EED-4021-B529-60A6CE413F38@ark.in-berlin.de>

I confirm that neither the Plucker nor the Mobile formats
are mentioned in the catalog file. Do you have an
explanation, Marcello?


ralf

On Jul 27, 2009, at 8:42 PM, David A. Desrosiers wrote:

> On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan<ralf at ark.in-berlin.de>  
> wrote:
>> My, can't we admit that XPath is a bit over our head,
>> so we prefer confronting the admin we're supposed
>> to be cooperating with? Wrt resources, my guess it's
>> about par traffic-wise (1-5k per book vs. megabytes
>> of RDF) but much better CPU-wise. That is, if you don't
>> want the RDF for other fine things like metadata etc.
>
> I think you've missed my point.
>
> The RDF flat-out cannot tell me which of the target _formats_ are
> available for immediate download to the users. I'm not looking for
> which _titles_ are available in the catalog, I'm looking for which
> _formats_ are available. Also note that I'm already parsing the feeds
> to see what the top 'n' titles are already, so parsing XML via
> whatever methods I need is not the blocker here.
>
> Let me give you an example of two titles available in the catalog:
>
> Vergänglichkeit by Sigmund Freud
> http://www.gutenberg.org/cache/plucker/29514/29514
>
> The Lost Word by Henry Van Dyke
> http://www.gutenberg.org/cache/plucker/4384/4384
>
> Both of these _titles_ are available in the Gutenberg catalog, but the
> second one is not available in the Plucker _format_ for immediate
> download. Big difference from parsing title availability from the
> catalog.rdf file.
>
> Make sense now?
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d

Ralf Stephan
http://www.ark.in-berlin.de
pub   1024D/C5114CB2 2009-06-07 [expires: 2011-06-06]
       Key fingerprint = 76AE 0D21 C06C CBF9 24F8  7835 1809 DE97 C511  
4CB2





From gbnewby at pglaf.org  Tue Jul 28 05:50:29 2009
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue, 28 Jul 2009 05:50:29 -0700
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <79902BC0-1EED-4021-B529-60A6CE413F38@ark.in-berlin.de>
References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com>
	<212322090907151728x84b224by697776eec6e265a4@mail.gmail.com>
	<a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com>
	<4A6D5639.5080600@perathoner.de>
	<a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com>
	<002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de>
	<a82cdbb90907271142p34e72bbdv86211a862580560f@mail.gmail.com>
	<79902BC0-1EED-4021-B529-60A6CE413F38@ark.in-berlin.de>
Message-ID: <20090728125029.GA24834@mail.pglaf.org>

On Tue, Jul 28, 2009 at 09:16:41AM +0200, Ralf Stephan wrote:
> I confirm that neither the Plucker nor the Mobile formats
> are mentioned in the catalog file. Do you have an
> explanation, Marcello?

I believe Marcello is out on vacation for 2 weeks.

But I know the explanation: the epub, mobi and a few other
formats are not part of the Project Gutenberg collection's
files, so not part of the database.

They are generated on-demand (or cached if they were generated
recently enough), from HTML or text.

We are planning many more "on the fly" conversion options for
the future.  I have one for a mobile eBook format (for cell
phones), and hope to have a PDF converter (with lots of options).
We've been working on some text-to-speech converters, too, but
that work has gone slowly.

The catalog file only tracks the actual files that are stored
as part of the collection (stuff you can view while navigating
the directory tree via FTP or other methods).
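
For anyone scripting against catalog.rdf, a sketch like the one below
lists what the catalog actually records for a given etext, which also
makes the absence of the on-the-fly epub/mobi/plucker files visible.
The namespace URIs are the ones normally declared at the top of the
catalog file, so verify them against your copy.

<?php
// List the formats catalog.rdf records for one etext.  Cache-built files
// (epub, mobi, plucker) will not show up, because only files stored in
// the collection are in the catalog.
$etext = 29514;

$doc = new DOMDocument ();
$doc->load ("catalog.rdf");            // path to a local copy

$xpath = new DOMXPath ($doc);
$xpath->registerNamespace ("rdf",     "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
$xpath->registerNamespace ("dc",      "http://purl.org/dc/elements/1.1/");
$xpath->registerNamespace ("dcterms", "http://purl.org/dc/terms/");
$xpath->registerNamespace ("pgterms", "http://www.gutenberg.org/rdfterms/");

$files = $xpath->query (
  "//pgterms:file[dcterms:isFormatOf/@rdf:resource='#etext$etext']");

foreach ($files as $file) {
  $url    = $file->getAttributeNS (
              "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "about");
  $values = $xpath->query (".//dc:format//rdf:value", $file);
  foreach ($values as $value) {
    echo $value->nodeValue . "  " . $url . "\n";   // e.g. "application/zip  ..."
  }
}
?>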
  -- Greg

> On Jul 27, 2009, at 8:42 PM, David A. Desrosiers wrote:
>
>> On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan<ralf at ark.in-berlin.de>  
>> wrote:
>>> My, can't we admit that XPath is a bit over our head,
>>> so we prefer confronting the admin we're supposed
>>> to be cooperating with? Wrt resources, my guess it's
>>> about par traffic-wise (1-5k per book vs. megabytes
>>> of RDF) but much better CPU-wise. That is, if you don't
>>> want the RDF for other fine things like metadata etc.
>>
>> I think you've missed my point.
>>
>> The RDF flat-out cannot tell me which of the target _formats_ are
>> available for immediate download to the users. I'm not looking for
>> which _titles_ are available in the catalog, I'm looking for which
>> _formats_ are available. Also note that I'm already parsing the feeds
>> to see what the top 'n' titles are already, so parsing XML via
>> whatever methods I need is not the blocker here.
>>
>> Let me give you an example of two titles available in the catalog:
>>
>> Vergänglichkeit by Sigmund Freud
>> http://www.gutenberg.org/cache/plucker/29514/29514
>>
>> The Lost Word by Henry Van Dyke
>> http://www.gutenberg.org/cache/plucker/4384/4384
>>
>> Both of these _titles_ are available in the Gutenberg catalog, but the
>> second one is not available in the Plucker _format_ for immediate
>> download. Big difference from parsing title availability from the
>> catalog.rdf file.
>>
>> Make sense now?
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d at lists.pglaf.org
>> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>
> Ralf Stephan
> http://www.ark.in-berlin.de
> pub   1024D/C5114CB2 2009-06-07 [expires: 2011-06-06]
>       Key fingerprint = 76AE 0D21 C06C CBF9 24F8  7835 1809 DE97 C511  
> 4CB2
>
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d

From joshua at hutchinson.net  Tue Jul 28 07:16:15 2009
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue, 28 Jul 2009 14:16:15 +0000 (GMT)
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
Message-ID: <891262632.99249.1248790575502.JavaMail.mail@webmail05>

An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090728/97a29e33/attachment.html>

From gbnewby at pglaf.org  Tue Jul 28 07:33:06 2009
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue, 28 Jul 2009 07:33:06 -0700
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
Message-ID: <20090728143306.GA28986@mail.pglaf.org>

On Tue, Jul 28, 2009 at 02:16:15PM +0000, Joshua Hutchinson wrote:
>    Any chance of creating on the fly zips of some of the books?  For
>    instance, the audio books are huge and usually divided along chapter
>    lines.  Single file zips are very useful (and something we've done on some
>    of them manually) but the space waste is huge.  On the fly zipping of
>    those files would save huge in storage space.
> 
>    Josh

Somebody would need to write the software :)

Zipping an mp3 is not a winning strategy: they really don't
compress much, if at all.

Putting multiple mp3 files for a single eBook in one file,
on the fly, would be a great move - making it easier to 
download a group of files.

A more general approach would be to let visitors to www.gutenberg.org
put their selected files (including those generated on-the-fly)
on a bookshelf (i.e., shopping cart), then download in one big file,
or several small ones.  

This would involve some fairly significant additions to the
current PHP-based back-end at www.gutenberg.org, but is certainly
not a huge technical feat.
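
The bundling piece by itself is small; here is a minimal sketch with
PHP's ZipArchive (how the list of mp3 paths is obtained, and the cache
location, are placeholders):

<?php
// Bundle the mp3 files of one audio book into a single zip, built on
// demand and cached, then hand it to the client.
function send_mp3_bundle ($fk_books, $mp3files, $cachedir = "/tmp/pg-zip-cache")
{
  if (!is_dir ($cachedir)) {
    mkdir ($cachedir, 0777, true);
  }
  $zipfile = "$cachedir/pg$fk_books-mp3.zip";

  if (!is_readable ($zipfile)) {
    $zip = new ZipArchive ();
    if ($zip->open ($zipfile, ZipArchive::CREATE) !== true) {
      return false;
    }
    foreach ($mp3files as $path) {
      // mp3 barely compresses, so the point is one download, not size.
      $zip->addFile ($path, basename ($path));
    }
    $zip->close ();
  }

  header ("Content-Type: application/zip");
  header ("Content-Length: " . filesize ($zipfile));
  header ("Content-Disposition: attachment; filename=\"pg$fk_books-mp3.zip\"");
  readfile ($zipfile);
  return true;
}
?>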
  -- Greg


>    On Jul 28, 2009, Greg Newby <gbnewby at pglaf.org> wrote:
> 
>      On Tue, Jul 28, 2009 at 09:16:41AM +0200, Ralf Stephan wrote:
>      > I confirm that neither the Plucker nor the Mobile formats
>      > are mentioned in the catalog file. Do you have an
>      > explanation, Marcello?
> 
>      I believe Marcello is out on vacation for 2 weeks.
> 
>      But I know the explanation: the epub, mobi and a few other
>      formats are not part of the Project Gutenberg collection's
>      files, so not part of the database.
> 
>      They are generated on-demand (or cached if they were generated
>      recently enough), from HTML or text.
> 
>      We are planning many more "on the fly" conversion options for
>      the future. I have one for a mobile eBook format (for cell
>      phones), and hope to have a PDF converter (with lots of options).
>      We've been working on some text-to-speech converters, too, but
>      that work has gone slowly.
> 
>      The catalog file only tracks the actual files that are stored
>      as part of the collection (stuff you can view while navigating
>      the directory tree via FTP or other methods).
>      -- Greg
> 
>      > On Jul 27, 2009, at 8:42 PM, David A. Desrosiers wrote:
>      >
>      >> On Mon, Jul 27, 2009 at 1:45 PM, Ralf
>      Stephan<[1]ralf at ark.in-berlin.de>
>      >> wrote:
>      >>> My, can't we admit that XPath is a bit over our head,
>      >>> so we prefer confronting the admin we're supposed
>      >>> to be cooperating with? Wrt resources, my guess it's
>      >>> about par traffic-wise (1-5k per book vs. megabytes
>      >>> of RDF) but much better CPU-wise. That is, if you don't
>      >>> want the RDF for other fine things like metadata etc.
>      >>
>      >> I think you've missed my point.
>      >>
>      >> The RDF flat-out cannot tell me which of the target _formats_ are
>      >> available for immediate download to the users. I'm not looking for
>      >> which _titles_ are available in the catalog, I'm looking for which
>      >> _formats_ are available. Also note that I'm already parsing the feeds
>      >> to see what the top 'n' titles are already, so parsing XML via
>      >> whatever methods I need is not the blocker here.
>      >>
>      >> Let me give you an example of two titles available in the catalog:
>      >>
>      >> Vergänglichkeit by Sigmund Freud
>      >> [2]http://www.gutenberg.org/cache/plucker/29514/29514
>      >>
>      >> The Lost Word by Henry Van Dyke
>      >> [3]http://www.gutenberg.org/cache/plucker/4384/4384
>      >>
>      >> Both of these _titles_ are available in the Gutenberg catalog, but
>      the
>      >> second one is not available in the Plucker _format_ for immediate
>      >> download. Big difference from parsing title availability from the
>      >> catalog.rdf file.
>      >>
>      >> Make sense now?
>      >> _______________________________________________
>      >> gutvol-d mailing list
>      >> [4]gutvol-d at lists.pglaf.org
>      >> [5]http://lists.pglaf.org/mailman/listinfo/gutvol-d
>      >
>      > Ralf Stephan
>      > [6]http://www.ark.in-berlin.de
>      > pub 1024D/C5114CB2 2009-06-07 [expires: 2011-06-06]
>      > Key fingerprint = 76AE 0D21 C06C CBF9 24F8 7835 1809 DE97 C511
>      > 4CB2
>      >
>      >
>      >
>      >
>      > _______________________________________________
>      > gutvol-d mailing list
>      > [7]gutvol-d at lists.pglaf.org
>      > [8]http://lists.pglaf.org/mailman/listinfo/gutvol-d
>      _______________________________________________
>      gutvol-d mailing list
>      [9]gutvol-d at lists.pglaf.org
>      [10]http://lists.pglaf.org/mailman/listinfo/gutvol-d
> 
> References
> 
>    Visible links
>    1. mailto:ralf at ark.in-berlin.de
>    2. http://www.gutenberg.org/cache/plucker/29514/29514
>    3. http://www.gutenberg.org/cache/plucker/4384/4384
>    4. mailto:gutvol-d at lists.pglaf.org
>    5. http://lists.pglaf.org/mailman/listinfo/gutvol-d
>    6. http://www.ark.in-berlin.de/
>    7. mailto:gutvol-d at lists.pglaf.org
>    8. http://lists.pglaf.org/mailman/listinfo/gutvol-d
>    9. mailto:gutvol-d at lists.pglaf.org
>   10. http://lists.pglaf.org/mailman/listinfo/gutvol-d

> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d


From joey at joeysmith.com  Tue Jul 28 17:24:08 2009
From: joey at joeysmith.com (Joey Smith)
Date: Tue, 28 Jul 2009 18:24:08 -0600
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <20090728143306.GA28986@mail.pglaf.org>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
	<20090728143306.GA28986@mail.pglaf.org>
Message-ID: <20090729002407.GA5389@joeysmith.com>

On Tue, Jul 28, 2009 at 07:33:06AM -0700, Greg Newby wrote:
> Somebody would need to write the software :)
> 
> Zipping an mp3 is not a winning strategy: they really don't
> compress much, if at all.
> 
> Putting multiple mp3 files for a single eBook in one file,
> on the fly, would be a great move - making it easier to 
> download a group of files.
> 
> A more general approach would be to let visitors to www.gutenberg.org
> put their selected files (including those generated on-the-fly)
> on a bookshelf (i.e., shopping cart), then download in one big file,
> or several small ones.  
> 
> This would involve some fairly significant additions to the
> current PHP-based back-end at www.gutenberg.org, but is certainly
> not a huge technical feat.
>   -- Greg

Where can one find the code for the "current PHP-based back-end at www.gutenberg.org"
to begin looking into how feasible this would be?

From gbnewby at pglaf.org  Wed Jul 29 06:27:08 2009
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed, 29 Jul 2009 06:27:08 -0700
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <20090729002407.GA5389@joeysmith.com>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
	<20090728143306.GA28986@mail.pglaf.org>
	<20090729002407.GA5389@joeysmith.com>
Message-ID: <20090729132708.GA31539@mail.pglaf.org>

On Tue, Jul 28, 2009 at 06:24:08PM -0600, Joey Smith wrote:
> On Tue, Jul 28, 2009 at 07:33:06AM -0700, Greg Newby wrote:
> > Somebody would need to write the software :)
> > 
> > Zipping an mp3 is not a winning strategy: they really don't
> > compress much, if at all.
> > 
> > Putting multiple mp3 files for a single eBook in one file,
> > on the fly, would be a great move - making it easier to 
> > download a group of files.
> > 
> > A more general approach would be to let visitors to www.gutenberg.org
> > put their selected files (including those generated on-the-fly)
> > on a bookshelf (i.e., shopping cart), then download in one big file,
> > or several small ones.  
> > 
> > This would involve some fairly significant additions to the
> > current PHP-based back-end at www.gutenberg.org, but is certainly
> > not a huge technical feat.
> >   -- Greg
> 
> Where can one find the code for the "current PHP-based back-end at www.gutenberg.org"
> to begin doing looking into how feasible this would be?

Thanks for your interest :)

It isn't bundled up for download anywhere.  We'll probably need to wait
for Marcello's return from vacation to provide details on how to add
components like this.  The current system is modular & (I think)
well-organized, but complex...including lots of stuff that readers never
see (such as the cataloger interface and various programs that add new
files).  Plus, as you know, there is a lot of stuff that is in the
Wiki, rather than PHP.  The Wiki might be where new features could
be added, or there might be modules "out there" that could make it easier. 

I did grab catalog/world/bibrec.php , where bibrecs like this are
made:
  http://www.gutenberg.org/etext/11

It is below.  This should give you an idea of how and where various things are
tied in from the database, the on-disk cached records, and stuff that is
generated on the fly.  The various .phh files it references (which cascade
to include a whole bunch of stuff) are mostly for presentation (html
and css), not functionality.

A bookshelf/shopping cart would probably be a brand new set of files,
with just a little overlap with the existing php.  It would need to
access the database, and presumably would need a table or two to keep
track of bookshelf users & entries.  (Maybe a separate database...maybe
part of the Wiki instead of a standalone set of PHP programs.)  Cookies,
or something similar, could be used to track user sessions and their
bookshelves/shopping carts/whatever, and add an entry to various pages
at www.gutenberg.org for them to access it (sort of like a regular
ecommerce site).
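
Very roughly, and reusing the same $db helper that pgcat.phh provides to
bibrec.php below, the bookshelf plumbing might start out like the sketch
here; the table layout and the cookie name are made up for illustration.

<?php
// Rough sketch of the bookshelf idea: one table for shelves keyed by a
// cookie token, one for the entries on a shelf.  Table and column names
// are invented; $db is the same helper object bibrec.php uses.
include_once ("pgcat.phh");
$db = $config->db ();

// One-time setup (would live in a migration, not in page code):
//   create table bookshelves (token varchar primary key, created timestamp);
//   create table bookshelf_entries (token varchar, fk_books int, fk_files int);

// Identify the visitor by a cookie, creating a shelf on first use.
if (isset ($_COOKIE['pg_bookshelf'])) {
  $token = preg_replace ("/[^0-9a-f]/", "", $_COOKIE['pg_bookshelf']);
} else {
  $token = md5 (uniqid (rand (), true));
  setcookie ("pg_bookshelf", $token, time () + 30 * 24 * 3600, "/");
  $db->exec ("insert into bookshelves (token, created) values ('$token', now())");
}

// Add a file to the shelf; fk_books / fk_files are the same values the
// bibliographic record page already works with.
if (isset ($_GET['add_fk_books']) && isset ($_GET['add_fk_files'])) {
  $fk_books = intval ($_GET['add_fk_books']);
  $fk_files = intval ($_GET['add_fk_files']);
  $db->exec ("insert into bookshelf_entries (token, fk_books, fk_files) " .
             "values ('$token', $fk_books, $fk_files)");
}
?>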


-------- bibrec.php
<?php

include_once ("pgcat.phh");

$cli = php_sapi_name () == "cli";
if ($cli) {
  $fk_books = intval ($_SERVER['argv'][1]);
} else {
  $devserver = preg_match ("/www-dev/", $_SERVER['HTTP_HOST']);
  if ($devserver) {
    nocache ();
  }
  getint ("fk_books");
}

$db = $config->db ();

$keywords     = array ();
$frontispiece = null;
$category     = 0;
$newfilesys   = false;
$help         = "/wiki/Gutenberg:Help_on_Bibliographic_Record_Page";
$helpicon     = "<img src=\"/pics/help.png\" class=\"helpicon\" alt=\"[help]\"$config->endtag>";

$db->exec ("select * from mn_books_categories where fk_books = $fk_books order by fk_categories");
if ($db->FirstRow ()) {
  $category = $db->get ("fk_categories", SQLINT);
}

$friendlytitle = friendlytitle ($fk_books, 80);
$config->description = htmlspecialchars ("Download the free {$category_descriptions[$category]}: $friendlytitle");

for ($i = 0; $i < 26; ++$i) {
  $base32[sprintf ("%05b", $i)] = chr (0x41 + $i);
}
for ($i = 26; $i < 32; ++$i) {
  $base32[sprintf ("%05b", $i)] = chr (0x32 + $i - 26);
}

// find best file for recode facility

class recode_candidate {
  function recode_candidate () {
    $this->score = 0;
    $this->fk_files = null;
    $this->filename = null;
    $this->encoding = null;
    $this->type     = null;
  }
}

function find_recode_candidate ($fk_books) {
  global $db;

  $candidate = new recode_candidate ();

  $db->exec ("select pk, filename, fk_encodings from files " . 
             "where fk_books = $fk_books and fk_filetypes = 'txt' " . 
             "and fk_compressions = 'none' and diskstatus = 0 and obsoleted = 0");

  if ($db->FirstRow ()) {
    do {
      $tmp = new recode_candidate ();
      $tmp->fk_files = $db->get ("pk", SQLINT);
      $tmp->filename = $db->get ("filename", SQLCHAR);
      $tmp->encoding = $db->get ("fk_encodings", SQLCHAR);

      if ((!isset ($tmp->encoding) || $tmp->encoding == "us-ascii")) {
        $tmp->score = 1;
        $tmp->encoding = "ASCII";
      }
      if ($tmp->encoding == "big5") {
        $tmp->score = 2;
        $tmp->encoding = "BIG-5";
      }
      if ($tmp->encoding == "euc-kr") {
        $tmp->score = 2;
        $tmp->encoding = "EUC-KR";
      }
      if ($tmp->encoding == "Shift_JIS") {
        $tmp->score = 2;
        $tmp->encoding = "SHIFT-JIS";
      }
      if (!strncmp ($tmp->encoding, "iso-", 4)) {
        $tmp->score = 3;
      }
      if (!strncmp ($tmp->encoding, "windows-", 8)) {
        $tmp->score = 4;
      }
      if ($tmp->encoding == "utf-8") {
        $tmp->score = 5;
        $tmp->encoding = "UTF-8";
      }

      if ($tmp->score > $candidate->score) {
        $candidate = $tmp;
      }
    } while ($db->NextRow ());
  }

  return $candidate;
}

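// Find the best source file (txt or html, html preferred) to feed the
// on-the-fly Plucker / mobile conversion.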
function find_plucker_candidate ($fk_books) {
  global $db;

  $candidate = new recode_candidate ();

  $db->exec ("select pk, filename, fk_encodings, fk_filetypes from files " . 
             "where fk_books = $fk_books and (fk_filetypes = 'txt' or fk_filetypes = 'html')" . 
             "and fk_compressions = 'none' and diskstatus = 0 and obsoleted = 0");

  if ($db->FirstRow ()) {
    do {
      $tmp = new recode_candidate ();
      $tmp->fk_files = $db->get ("pk", SQLINT);
      $tmp->filename = $db->get ("filename", SQLCHAR);
      $tmp->encoding = $db->get ("fk_encodings", SQLCHAR);
      $tmp->type     = $db->get ("fk_filetypes", SQLCHAR);

      if ((!isset ($tmp->encoding) || $tmp->encoding == "us-ascii")) {
        $tmp->score = 1;
      }
      if ($tmp->encoding == "iso-8859-1") {
        $tmp->score = 2;
      }
      /* if ($tmp->encoding == "windows-1252") {
        $tmp->score = 3;
      } */
      if ($tmp->type == "html") {
        $tmp->score = 4;
      }
      if ($tmp->score > $candidate->score) {
        $candidate = $tmp;
      }
    } while ($db->NextRow ());
  }

  return $candidate;
}

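// Encode a hex digest string as base32 (A-Z, 2-7) for the magnet link URNs built below.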
function base32_encode ($in) {
  global $base32;

  $bits = "";

  $in = @pack ("H*", $in);
  $len = strlen ($in);
  for ($i = 0; $i < $len; $i++) {
    $bits .= sprintf ("%08b", ord ($in{$i}));
  }

  if ($mod = strlen ($bits) % 5) {
    $bits .= str_repeat ("0", 5 - $mod);
  }

  return strtr ($bits, $base32);
}

class DownloadColumn extends dbtSimpleColumn {
  function DownloadColumn () {
    global $help, $helpicon;
    parent::dbtSimpleColumn (null, "Download&nbsp;Links <a href=\"$help#Download_Links\" title=\"Explain Download Links.\">$helpicon</a>", "pgdbfilesdownload");
  }
  function Data ($db) {
    global $config, $friendlytitle, $fk_books, $newfsbasedir;

    $filename = $db->get ("filename", SQLCHAR);

    $extension = "";
    if (preg_match ("/(\.[^.]+)$/", $filename, $matches)) {
      $extension = $matches[1];
    }

    $dir = etext2dir ($fk_books);
    if (preg_match ("!^$dir!", $filename)) {
      $symlink = preg_replace ("!^$dir!", $newfsbasedir, $filename);
    } else {
      $symlink = "$config->downloadbase/$filename";
    }

    $links = array ();

    $links[] = "<a href=\"$symlink\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main&nbsp;site</span></a>";
    $links[] = "<a href=\"$config->world/mirror-redirect?file=$filename\" title=\"Download from mirror site.\" rel=\"nofollow\">mirror&nbsp;sites</a>";

    $sha1 = base32_encode ($db->get ("sha1hash", SQLCHAR));
    $tt   = base32_encode ($db->get ("tigertreehash", SQLCHAR));

    $links[] = "<a href=\"magnet:?xt=urn:sha1:$sha1" .
      "&amp;xt=urn:kzhash:" . $db->get ("kzhash", SQLCHAR) .
      "&amp;xt=urn:ed2k:"   . $db->get ("ed2khash", SQLCHAR) .
      "&amp;xt=urn:bitprint:$sha1.$tt" .
      "&amp;xs=http://$config->domain$symlink" . 
      "&amp;dn=" . urlencode ("$friendlytitle$extension") .
      "\" title=\"Magnetlink to download from P2P network.\">P2P</a>";

    return "<td class=\"pgdbfilesdownload\">" . join (" ", $links) . "</td>";
  }
}

$array = array ();

$db->exec ("select * from books where pk = $fk_books;");

if (!$db->FirstRow ()) {
  error_msg ("No etext no. $fk_books.");
}

$release_date = $db->get ("release_date");
$copyrighted  = $db->get ("copyrighted") ? 
                "Copyrighted. You may download this ebook but you may be limited in other uses. Check the license inside the ebook." : 
                "Not copyrighted in the United States. If you live elsewhere check the laws of your country before downloading this ebook.";

$db->exec (
"select * from authors, roles, mn_books_authors 
where mn_books_authors.fk_books = $fk_books
and authors.pk = mn_books_authors.fk_authors
and roles.pk = mn_books_authors.fk_roles order by role, author"
);
$db->calcfields ["c_author"] = new CalcFieldAuthorDate ();

if ($db->FirstRow ()) {
  do {
    $pk   = $db->get ("fk_authors", SQLINT);
    $name = $db->get ("c_author",   SQLCHAR);
    $role = htmlspecialchars ($db->get ("role",     SQLCHAR));
    $array [] = preg_replace ("/ /", "&nbsp;", $role);
    $array [] = "<a href=\"/browse/authors/" . find_browse_page ($name) . "#a$pk\">$name</a>";
    $keywords [] = htmlspecialchars ($db->get ("author", SQLCHAR));
  } while ($db->NextRow ());
}

$db->exec ("select attributes.*, attriblist.name, attriblist.caption from attributes, attriblist " . 
           "where attributes.fk_books = $fk_books and " . 
           "attributes.fk_attriblist = attriblist.pk " . 
           "order by attriblist.name;");

if ($db->FirstRow ()) {
  do {
    $note    = htmlspecialchars ($db->get ("text", SQLCHAR));
    $caption = htmlspecialchars ($db->get ("caption", SQLCHAR));
    $note    = preg_replace ("/\n/", "<br$config->endtag>", $note);

    if ($caption) {
      $name  = $db->get ("name", SQLCHAR);
      switch (intval ($name)) {
      case 901:
        $note = "<a href=\"$note?nocount\"><img src=\"$note?nocount\" title=\"$caption\" alt=\"$caption\" $config->endtag></a>";
        break;
      case 902:
      case 903:
        $note = "<a href=\"$note?nocount\">$caption</a>";
        break;
      case 10:
	$note = "$note <img src=\"/pics/link.png\" alt=\"\" $config->endtag> <a href=\"http://lccn.loc.gov/$note\" title=\"Look up this book in the Library of Congress catalog.\">LoC catalog record</a>";
	break;
      default:
	$note = strip_marc_subfields ($note);
	if (substr ($name, 0, 1) == '5') {
	  $patterns = array ("/http:\/\/\S+/", "/#(\d+)/");
	  $replaces = array ("<a href=\"$0\">$0</a>", "<a href=\"/ebooks/$1\">$0</a>");
	  $note = preg_replace ($patterns, $replaces, $note);
	}
      }
      $array [] = preg_replace ("/ /", "&nbsp;", $caption);
      $array [] = $note;
    }
  } while ($db->NextRow ());
}

$db->exec ("select * from langs, mn_books_langs 
where langs.pk = mn_books_langs.fk_langs
and mn_books_langs.fk_books = $fk_books;"
);

if ($db->FirstRow ()) {
  do {
    $pk   = $db->get ("pk", SQLCHAR);
    $lang = htmlspecialchars ($db->get ("lang", SQLCHAR));
    $array [] = "Language";
    if ($pk != 'en') {
      $array [] = "<a href=\"/browse/languages/$pk\">$lang</a>";
    } else {
      $array [] = $lang;
    }
  } while ($db->NextRow ());
}

$db->exec ("select * from loccs, mn_books_loccs 
where loccs.pk = mn_books_loccs.fk_loccs
and mn_books_loccs.fk_books = $fk_books;"
);

if ($db->FirstRow ()) {
  do {
    $pk   = $db->get ("pk", SQLCHAR);
    $pkl  = strtolower ($pk);
    $locc = htmlspecialchars ($db->get ("locc", SQLCHAR));
    $array [] = "LoC&nbsp;Class";
    $array [] = "<a href=\"/browse/loccs/$pkl\">$pk: $locc</a>";
    $keywords [] = $locc;
  } while ($db->NextRow ());
}

$db->exec ("select * from subjects, mn_books_subjects 
where subjects.pk = mn_books_subjects.fk_subjects
and mn_books_subjects.fk_books = $fk_books;"
);

if ($db->FirstRow ()) {
  do {
    $subject = htmlspecialchars ($db->get ("subject", SQLCHAR));
    // $url = urlencode ($subject);
    $array [] = "Subject";
    //    $array [] = "<a href=\"$config->world/results?subject=$url\">$subject</a>";
    $array [] = $subject;
    $keywords [] = $subject;
  } while ($db->NextRow ());
}

$db->exec ("select * from categories, mn_books_categories 
where categories.pk = mn_books_categories.fk_categories 
and mn_books_categories.fk_books = $fk_books;");

if ($db->FirstRow ()) {
  do {
    $pk       = $db->get ("pk", SQLINT);
    $category = $db->get ("category", SQLCHAR);
    $array [] = "Category";
    $array [] = "<a href=\"/browse/categories/$pk\">$category</a>";
  } while ($db->NextRow ());
}


$array [] = "EText-No.";
$array [] = $fk_books;
$array [] = "Release&nbsp;Date";
$array [] = $release_date;
$array [] = "Copyright Status";
$array [] = $copyrighted;

$db->exec ("select count (*) as cnt from reviews.reviews where fk_books = $fk_books");
if (($cnt = $db->get ("cnt", SQLINT)) > 0) {
  $s = ($cnt == 1) ? "is a review" : "are $cnt reviews";
  $array [] = "Reviews";
  $array [] = "<a href=\"$config->world/reviews?fk_books=$fk_books\">There $s of this book available.</a>";
}

$newfsbasedir = "$config->files/$fk_books/";
$db->exec ("select filename from files where fk_books = $fk_books and filename ~ '^[1-9]/'");
if ($db->FirstRow ()) {
  $newfilesys = true;
  $array [] = "Base&nbsp;Directory";
  $array [] = "<a href=\"$newfsbasedir\">$newfsbasedir</a>";
}

for ($i = 0; $i < count ($keywords); $i++) {
  $keywords[$i] = preg_replace ("/,\s*/", " ", $keywords[$i]);
}
$config->keywords = htmlspecialchars (join (", ", $keywords)) . ", $config->keywords";

$recode_candidate  = find_recode_candidate  ($fk_books);
$plucker_candidate = find_plucker_candidate ($fk_books);
$offer_recode  = $recode_candidate->score  > 0;
$offer_plucker = $plucker_candidate->score > 0;

///////////////////////////////////////////////////////////////////////////////
// start page output

pageheader (htmlspecialchars ($friendlytitle));

$menubar = array ();

$menubar[] = "<a href=\"$help\" title=\"Explain this page.\" rel=\"Help\">Help</a>";

if ($offer_recode) {
  $menubar[] = "<a href=\"$config->world/readfile?fk_files=$recode_candidate->fk_files\" title=\"Read this book online.\" rel=\"nofollow\">Read online</a>";
}

p (join (" &mdash; ", $menubar));

echo ("<div class=\"pgdbdata\">\n\n");

$table = new BibrecTable ();
$table->summary = "Bibliographic data of author and book.";
$table->toprows = $array;
$table->PrintTable (null, "Bibliographic Record <a href=\"$help#Table:_Bibliographic_Record\" title=\"Explain this table.\">$helpicon</a>");

echo ("</div>\n\n");

$db->exec ("select filetype, sortorder, compression, " . 
           "case files.fk_filetypes when 'txt' then fk_encodings when 'mp3' then fk_encodings else null end as fk_encodings, " . 
	   "edition, filename, filesize, sha1hash, kzhash, tigertreehash, ed2khash " . 
	   "from files " . 
	   "left join filetypes on files.fk_filetypes = filetypes.pk " . 
	   "left join compressions on files.fk_compressions = compressions.pk " .
	   "where fk_books = $fk_books and obsoleted = 0 and diskstatus = 0 " . 
	   "order by edition desc, sortorder, filetype, fk_encodings, compression, filename;");
$db->calcfields ["c_hrsize"] = new CalcFieldHRSize ();

echo ("<div class=\"pgdbfiles\">\n\n");

echo ("<h2>Download this ebook for free</h2>\n\n");

class FilesTable extends ListTable {
  function FilesTable () {
    global $newfilesys, $offer_recode, $help, $helpicon;
    if (!$newfilesys) {
      $this->AddSimpleColumn ("edition",      "Edition",     "narrow pgdbfilesedition");
    }
    $footnote = ($offer_recode) ? "&nbsp;\xC2\xB9" : "";
    $this->AddSimpleColumn ("filetype",     "Format <a href=\"$help#Format\" title=\"Explain Format.\">$helpicon</a>", "pgdbfilesformat");
    $this->AddSimpleColumn ("fk_encodings", "Encoding$footnote <a href=\"$help#Encoding\" title=\"Explain Encoding.\">$helpicon</a>",    "pgdbfilesencoding");
    $this->AddSimpleColumn ("compression",  "Compression <a href=\"$help#Compression\" title=\"Explain Compression.\">$helpicon</a>", "pgdbfilescompression");
    $this->AddSimpleColumn ("c_hrsize",     "Size",        "right narrow pgdbfilessize");
    $this->AddColumnObject (new DownloadColumn ());

    $this->limit = -1;
  }
}

$array = array ();

function epub_file ($fk_books) {
  return "/cache/epub/$fk_books/pg$fk_books.epub";
}

function epub_images_file ($fk_books) {
  return "/cache/epub/$fk_books/pg${fk_books}-images.epub";
}

function mobi_file ($fk_books) {
  return "/cache/epub/$fk_books/pg$fk_books.mobi";
}

function mobi_images_file ($fk_books) {
  return "/cache/epub/$fk_books/pg${fk_books}-images.mobi";
}

$epub        = epub_file ($fk_books);
$epub_images = epub_images_file ($fk_books);
$mobi        = mobi_file ($fk_books);
$mobi_images = mobi_images_file ($fk_books);

// epub stuff

if (is_readable ("$config->documentroot$epub") && filesize ("$config->documentroot$epub") > 1024) {
  if (!$newfilesys) {
    $array [] = "";
  }
  $array [] = "EPUB (experimental) <a href=\"$help#EPUB\" title=\"Explain EPUB.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = human_readable_size (filesize ("$config->documentroot$epub"));
  $array [] = "<a href=\"$epub\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main&nbsp;site</span></a>";
}

if (is_readable ("$config->documentroot$epub_images") && filesize ("$config->documentroot$epub_images") > 1024) {
  if (!$newfilesys) {
    $array [] = "";
  }
  $array [] = "EPUB with images (experimental) <a href=\"$help#EPUB\" title=\"Explain EPUB.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = human_readable_size (filesize ("$config->documentroot$epub_images"));
  $array [] = "<a href=\"$epub_images\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main&nbsp;site</span></a>";
}

// mobi stuff

if (is_readable ("$config->documentroot$mobi") && filesize ("$config->documentroot$mobi") > 1024) {
  if (!$newfilesys) {
    $array [] = "";
  }
  $array [] = "MOBI (experimental) <a href=\"$help#MOBI\" title=\"Explain MOBI.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = human_readable_size (filesize ("$config->documentroot$mobi"));
  $array [] = "<a href=\"$mobi\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main&nbsp;site</span></a>";
}

if (is_readable ("$config->documentroot$mobi_images") && filesize ("$config->documentroot$mobi_images") > 1024) {
  if (!$newfilesys) {
    $array [] = "";
  }
  $array [] = "MOBI with images (experimental) <a href=\"$help#MOBI\" title=\"Explain MOBI.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = human_readable_size (filesize ("$config->documentroot$mobi_images"));
  $array [] = "<a href=\"$mobi_images\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main&nbsp;site</span></a>";
}


// plucker stuff

if ($offer_plucker) {
  if (!$newfilesys) {
    $array [] = "";
  }
  $array [] = "Plucker <a href=\"$help#Plucker\" title=\"Explain Plucker.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = "unknown";
  $array [] = "<a href=\"/cache/plucker/$fk_books/$fk_books\" 
title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main&nbsp;site</span></a>";

# gbn: mobile ebooks.  If Plucker conversion works, this should work, too:
  if (!$newfilesys) {
    $array [] = "";
  }
  $array [] = "Mobile eBooks <a href=\"$help#Mobile\" title=\"Explain Mobile.\">$helpicon</a>";
  $array [] = "";
  $array [] = "";
  $array [] = "unknown";
  $array [] = "<a href=\"mobile/mobile.php?fk_books=$fk_books\" 
title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main&nbsp;site</span></a>";

}

$table = new FilesTable ();
$table->summary = "Table of available file types and sizes.";
$table->toprows = $array;
$table->PrintTable ($db, "Formats Available For Download <a href=\"$help#Table:_Formats_Available_For_Download\" title=\"Explain this table.\">$helpicon</a>", "pgdbfiles");

echo ("</div>\n\n");

if ($offer_recode) {
  $recode_encoding = strtoupper ($recode_candidate->encoding);
  p ("\xC2\xB9 If you need a special character set, try our " . 
     "<a href=\"$config->world/recode?file=$recode_candidate->filename" . 
     "&amp;from=$recode_encoding\" rel=\"nofollow\">" . 
     "online recoding service</a>.");
}

pagefooter (0);

// implements a page cache
// if this page is viewed it will write a static version
// into the etext cache directory
// MultiViews and mod_rewrite then take care to serve
// the static page to the next requester

$cachedir  = "$config->documentroot/cache/bibrec/$fk_books";
umask (0);
mkdir ($cachedir);
$cachefile = "$cachedir/$fk_books.html.utf8";

$hd = fopen ($cachefile, "w");
if ($hd) {
  fwrite ($hd, $output);
  fclose ($hd);
}

$hd = gzopen ("$cachefile.gz", "w9");
if ($hd) {
  gzwrite ($hd, $output);
  gzclose ($hd);
}

exit ();

?>

From joey at joeysmith.com  Wed Jul 29 08:48:23 2009
From: joey at joeysmith.com (Joey Smith)
Date: Wed, 29 Jul 2009 09:48:23 -0600
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <20090729132708.GA31539@mail.pglaf.org>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
	<20090728143306.GA28986@mail.pglaf.org>
	<20090729002407.GA5389@joeysmith.com>
	<20090729132708.GA31539@mail.pglaf.org>
Message-ID: <20090729154823.GB5389@joeysmith.com>

On Wed, Jul 29, 2009 at 06:27:08AM -0700, Greg Newby wrote:
> Thanks for your interest :)
> 
> It isn't bundled up for download anywhere.  We'll probably need to wait
> for Marcello's return from vacation to provide details on how to add
> components like this.  The current system is modular & (I think)
> well-organized, but complex...including lots of stuff that readers never
> see (such as the cataloger interface and various programs that add new
> files).  Plus, as you know, there is a lot of stuff that is in the
> Wiki, rather than PHP.  The Wiki might be where new features could
> be added, or there might be modules "out there" that could make it easier. 
> 
> I did grab catalog/world/bibrec.php , where bibrecs like this are
> made:
>   http://www.gutenberg.org/etext/11
> 
> It is below.  This should give you an idea of how and where various things are
> tied in from the database, the on-disk cached records, and stuff that is
> generated on the fly.  The various .phh files it references (which cascade
> to include a whole bunch of stuff) are mostly for presentation (html
> and css), not functionality.
> 
> A bookshelf/shopping cart would probably be a brand new set of files,
> with just a little overlap with the existing php.  It would need to
> access the database, and presumably would need a table or two to keep
> track of bookshelf users & entries.  (Maybe a separate database...maybe
> part of the Wiki instead of a standalone set of PHP programs.)  Cookies,
> or something similar, could be used to track user sessions and their
> bookshelves/shopping carts/whatever, and add an entry to various pages
> at www.gutenberg.org for them to access it (sort of like a regular
> ecommerce site).


You know, now that I look at this code, I recall looking over this stuff
with Marcello once, years ago...doesn't look like it has changed much. I'll
drop a note to Marcello and wait to hear from him. Thanks, Greg!

From desrod at gnu-designs.com  Wed Jul 29 09:13:07 2009
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Wed, 29 Jul 2009 12:13:07 -0400
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <20090729154823.GB5389@joeysmith.com>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
	<20090728143306.GA28986@mail.pglaf.org>
	<20090729002407.GA5389@joeysmith.com>
	<20090729132708.GA31539@mail.pglaf.org>
	<20090729154823.GB5389@joeysmith.com>
Message-ID: <a82cdbb90907290913m156cfdcepfa72a2b4bf6f5898@mail.gmail.com>

On Wed, Jul 29, 2009 at 11:48 AM, Joey Smith<joey at joeysmith.com> wrote:
> You know, now that I look at this code, I recall looking over this stuff
> with Marcello once, years ago...doesn't look like it has changed much. I'll
> drop a note to Marcello and wait to hear from him.

That code could easily be 1/3 the size, but if it works... no need to
go breaking things. :)

From gbnewby at pglaf.org  Wed Jul 29 10:25:10 2009
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed, 29 Jul 2009 10:25:10 -0700
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
	<20090728143306.GA28986@mail.pglaf.org>
	<a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com>
Message-ID: <20090729172510.GC8946@mail.pglaf.org>

On Tue, Jul 28, 2009 at 12:17:23PM -0400, David A. Desrosiers wrote:
> On Tue, Jul 28, 2009 at 10:33 AM, Greg Newby<gbnewby at pglaf.org> wrote:
> > A more general approach would be to let visitors to www.gutenberg.org
> > put their selected files (including those generated on-the-fly)
> > on a bookshelf (i.e., shopping cart), then download in one big file,
> > or several small ones.
> 
> If you're looking at it at that level, why not just offer some
> streaming audio of the books as well? You could do this very simply
> with any number of dozens of dynamic content streaming applications in
> whatever language you choose (Perl, PHP, Python, Java, etc.)

This is a good point.  I don't know why we don't have streaming,
especially since iBiblio does have streaming (I think).  

If you could suggest some software that seems likely to work
on the iBiblio server (Apache, PHP, Perl all on Linux; free),
especially something that could just be dropped into bibrec.php that I sent
earlier, that would be a tremendous help.  

The funny part is that I get inquiries all the time via help@
on "how do I save an audio file locally?"  It seems the most
common audio listening experience is to download & play back
(perhaps with a delay for the download to complete), so people
are doing the same thing as streaming (i.e., immediate listening),
but needing to wait for the download to complete.  It would
be nice to offer streaming, instead.
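
Even without dedicated streaming software, a small passthrough script
that serves the mp3 with an audio Content-Type lets most players start
playing while the rest of the file downloads.  A minimal sketch (the
base directory and the parameter handling are placeholders, not the
real site layout):

<?php
// Minimal "pseudo-streaming" passthrough for the audio books: send the
// mp3 with an audio Content-Type so players can begin playback while
// the rest of the file is still downloading.
$audiodir = "/path/to/audio";                      // placeholder base directory

$file = isset ($_GET['file']) ? $_GET['file'] : "";

// Accept only plain relative mp3 paths, no parent-directory tricks.
if (!preg_match ("!^[0-9A-Za-z._/-]+\\.mp3$!", $file) || strpos ($file, "..") !== false) {
  header ("HTTP/1.0 400 Bad Request");
  exit ();
}

$path = "$audiodir/$file";

if (!is_readable ($path)) {
  header ("HTTP/1.0 404 Not Found");
  exit ();
}

header ("Content-Type: audio/mpeg");
header ("Content-Length: " . filesize ($path));
readfile ($path);
?>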

  -- Greg


> I actually used one to demo for a DJ/Amtrak train conductor several
> months back. He wanted a way to pull the tags/artists out of his
> enormous mp3 collection, and in 15 minutes on the train (with 'net), I
> found one that would let him "radio-enable" his entire mp3 collection,
> including a web interface to stream, play, download, view, sort,
> browse all of the artists by collection, tag, album art, date, etc.
> all in Perl.
> 
> It should be a simple matter to have something similar latched onto
> the Gutenberg audio collection, so anyone can click on the audiobook
> to either download, stream, convert, etc. the book in whatever format
> they prefer.
> 
> Just an idea...

From Bowerbird at aol.com  Wed Jul 29 15:00:16 2009
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 29 Jul 2009 18:00:16 EDT
Subject: [gutvol-d] someone needs to
Message-ID: <d22.49174583.37a22070@aol.com>

someone needs to remount independently the scan-set from
every single public-domain book that google has scanned...

might that be project gutenberg?

if not, then someone else needs to do it.

because it needs to be done...

those books don't belong to google, or to sony, or to
barnes & noble, or to anyone else that google decides
to share them with.   those books belong to the public.

-bowerbird



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090729/3760e5e3/attachment.html>

From i30817 at gmail.com  Wed Jul 29 15:11:04 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Wed, 29 Jul 2009 23:11:04 +0100
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <20090729172510.GC8946@mail.pglaf.org>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> 
	<20090728143306.GA28986@mail.pglaf.org>
	<a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> 
	<20090729172510.GC8946@mail.pglaf.org>
Message-ID: <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com>

Tell me, please, if the gutenberg rdf index file, besides being
autogenerated, is also sorted.

What I mean is, I'm indexing parts of the file, and I gain a major
speed up by treating the file as:

a massive list of pgterms:etext definitions

followed by a (more massive) list of pgterms:file definitions

This roughly halves the string comparisons I have to do (to about
n/2), but at the cost that any etext record sitting in the middle of the
second list is not picked up.

That won't be changed, but I ask only for my peace of mind.
Also, in the pgterms:file records, are the records referring to the
same file consecutive?  I ask because if so, I could do the same sort
of filtering Aaron Cannon is doing in his dvd project, to speed up the
index some more and remove duplicates.
(If they aren't consecutive I would have to issue queries while
building the index to see if they were already inserted.)

I have nothing against xpath; indeed I think the scanning of the file
in lucene already uses something similar.  But I need free text
searches, and they have to be fast (I'm already experimenting with a
memory cache after the query too, and it works OK-ish for my
application).

From i30817 at gmail.com  Wed Jul 29 15:13:01 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Wed, 29 Jul 2009 23:13:01 +0100
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> 
	<20090728143306.GA28986@mail.pglaf.org>
	<a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> 
	<20090729172510.GC8946@mail.pglaf.org>
	<212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com>
Message-ID: <212322090907291513h478330a8h36701a67bea021d@mail.gmail.com>

Of course, if they were sorted by priority, say most feature-full free
format first: html -> rtf -> text UTF-8 -> ASCII, that would be very nice too.

From hart at pobox.com  Wed Jul 29 18:49:02 2009
From: hart at pobox.com (Michael S. Hart)
Date: Wed, 29 Jul 2009 17:49:02 -0800 (AKDT)
Subject: [gutvol-d] How To Get and Use eBooks on Cell Phones
Message-ID: <alpine.DEB.1.00.0907291748440.12450@snowy.arsc.alaska.edu>


How To Get and Use eBooks on Cell Phones


This is a request for anyone to submit ideas for little "How To's"
for their particular brands and models of cell phones.

Nothing fancy to start off with, just the bare bones of how to get
the eBook into the phones and how to read them in comfortable ways
in terms of setting font size, zoom, or the like.

Eventually we hope to create "How To" files for each model, and in
ways that will encourage a greater usage of the nearly 4.5 billion
cell phones now in use, and to give some worthwhile uses to phones
that are no longer in service, but still work well as eReaders.

If your phone has WiFi built in, so much the better.

If nothing more, please just send the barest outline, and we would
hope to get others to fill it in to make it more user friendly.


Many Many Thanks!!!



Michael S. Hart
Founder
Project Gutenberg
Inventor of ebooks


If you ever do not get a prompt response, please resend, then
keep resending, I won't mind getting several copies per week.


From cannona at fireantproductions.com  Wed Jul 29 20:50:51 2009
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Wed, 29 Jul 2009 22:50:51 -0500
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <212322090907291513h478330a8h36701a67bea021d@mail.gmail.com>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
	<20090728143306.GA28986@mail.pglaf.org>
	<a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com>
	<20090729172510.GC8946@mail.pglaf.org>
	<212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com>
	<212322090907291513h478330a8h36701a67bea021d@mail.gmail.com>
Message-ID: <628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com>

Just to clarify, you wrote:
"Also, in the pgterms:file records, are the records referring to the
same file consecutive?  I ask because if so, I could do the same sort
of filtering Aaron Cannon is doing in his dvd project, to speed up the
index some more and remove duplicates.
(If they aren't consecutive I would have to issue queries while
building the index to see if they were already inserted.)"

There are actually not really any duplicates that I am aware of in the
RDF catalog.  It's just that most books are in the archive in more
than one encoding or format.  What I have been filtering out are the
more lossy encodings (like ASCII) when there is a less lossy one
available (like UTF-8).

As for the sorting, I don't know for sure, but it seems like the
current ordering is likely an artifact of the way the RDF was
generated.  Whether or not you want to rely on that never changing is
up to you.

I haven't followed the thread closely enough to know what you're
trying to do, but it sounds as though you might be using the RDF in a
way for which it was never intended.  What I mean by that is you seem to
be trying to read directly from it like a database when someone does a
search, rather than loading the RDF into an actual database, and
reading from that.  Having just recently worked on a Python app which
parses the RDF into memory, I can tell you that parsing the XML is the
slowest part of the process, at least in my application.  Your mileage
may vary, but when you have to do tens-of-thousands of string
comparisons from a file which is roughly 100MB in size before you can
return a result in a web app (I'm assuming it's a web app), you're
likely going to have problems.

Good luck.

Aaron

On 7/29/09, Paulo Levi <i30817 at gmail.com> wrote:
> Of course, if they were sorted by priority, say most feature-full free
> format first: html -> rtf -> text UTF-8 -> ASCII, that would be very nice too.
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From i30817 at gmail.com  Wed Jul 29 21:40:08 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Thu, 30 Jul 2009 05:40:08 +0100
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> 
	<20090728143306.GA28986@mail.pglaf.org>
	<a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> 
	<20090729172510.GC8946@mail.pglaf.org>
	<212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> 
	<212322090907291513h478330a8h36701a67bea021d@mail.gmail.com> 
	<628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com>
Message-ID: <212322090907292140t454d0390r8a21253f6b41a0fa@mail.gmail.com>

But I am reading the rdf into a (file) database. That is more or less
what Lucene is.
What I am filtering is just what I insert into the database, so that its
creation is faster / searches only on the fields that interest me.

Sure, it's a lot of code that will break if the format changes, but it
reduced the creation step from 5 minutes or so to 40 seconds (this on
a fast dual-core computer - I shudder to think what would happen if a
user tried to re-index on a 1 GHz machine).

The index is at about 33.5 MB, and should compress to < 10 MB.
Probably enough to be included in the application.

From schultzk at uni-trier.de  Wed Jul 29 22:55:01 2009
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Thu, 30 Jul 2009 07:55:01 +0200
Subject: [gutvol-d] Re: How To Get and Use eBooks on Cell Phones
In-Reply-To: <alpine.DEB.1.00.0907291748440.12450@snowy.arsc.alaska.edu>
References: <alpine.DEB.1.00.0907291748440.12450@snowy.arsc.alaska.edu>
Message-ID: <D9159201-1785-4F9E-ACBC-2894584F143A@uni-trier.de>

Hi Michael,

	The problem is slightly more complicated than you
	state, though you probably know that!

	1) What programs do you have on the cell phone?
	2) What is available for free / can still be gotten?
	3) What software/hardware is there for uploading to the cell phone?
	4) If there is no WiFi (or other connection available), what
	   hardware/software the user has.

	If a cell phone has WiFi it is a relatively new phone, so it will be
	able to handle html, most likely PDF, some rtf, doc - it all depends.


	regards
		Keith.

Am 30.07.2009 um 03:49 schrieb Michael S. Hart:

>
> How To Get and Use eBooks on Cell Phones
>
>
> This is a request for anyone to submit ideas for little "How To's"
> for their particular brands and models of cell phones.
>
> Nothing fancy to start off with, just the bare bones of how to get
> the eBook into the phones and how to read them in comfortable ways
> in terms of setting font size, zoom, or the like.
>
> Eventually we hope to create "How To" files for each model, and in
> ways that will encourage a greater usage of the nearly 4.5 billion
> cell phones now in use, and to give some worthwhile uses to phones
> that are no longer in service, but still work well as eReaders.
>
> If your phone has WiFi built in, so much the better.
>
> If nothing more, please just send the barest outline, and we would
> hope to get others to fill it in to make it more user friendly.
>
>
> Many Many Thanks!!!
>
>
>
> Michael S. Hart
> Founder
> Project Gutenberg
> Inventor of ebooks
>
>
> If you ever do not get a prompt response, please resend, then
> keep resending, I won't mind getting several copies per week.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d


From schultzk at uni-trier.de  Wed Jul 29 23:03:06 2009
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Thu, 30 Jul 2009 08:03:06 +0200
Subject: [gutvol-d] Re: someone needs to
In-Reply-To: <d22.49174583.37a22070@aol.com>
References: <d22.49174583.37a22070@aol.com>
Message-ID: <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de>

Tsk, Tsk, Tsk,

	Bowerbird, you should know better.

	I agree with you morally, but ...

	Shakespeare is public domain! But if I scan, say, the
	folios, the scans are mine to do with as I want, and I can
	use whatever copyright I personally care for!! The same
	goes for Google.

	Of course it would be nice to have these scans.

	Just as a side note, I thought you considered the Google
	scans not that good!?

	regards
		Keith

Am 30.07.2009 um 00:00 schrieb Bowerbird at aol.com:

> someone needs to remount independently the scan-set from
> every single public-domain book that google has scanned...
>
> might that be project gutenberg?
>
> if not, then someone else needs to do it.
>
> because it needs to be done...
>
> those books don't belong to google, or to sony, or to
> barnes & noble, or to anyone else that google decides
> to share them with.  those books belong to the public.
>
> -bowerbird
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090730/e7b0f016/attachment.html>

From schultzk at uni-trier.de  Wed Jul 29 23:26:17 2009
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Thu, 30 Jul 2009 08:26:17 +0200
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <212322090907292140t454d0390r8a21253f6b41a0fa@mail.gmail.com>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05>
	<20090728143306.GA28986@mail.pglaf.org>
	<a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com>
	<20090729172510.GC8946@mail.pglaf.org>
	<212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com>
	<212322090907291513h478330a8h36701a67bea021d@mail.gmail.com>
	<628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com>
	<212322090907292140t454d0390r8a21253f6b41a0fa@mail.gmail.com>
Message-ID: <C1FEBF04-128F-419D-9AEA-0AA8B25A1B8E@uni-trier.de>

Hi Paulo,


Am 30.07.2009 um 06:40 schrieb Paulo Levi:

> But I am reading the RDF into a (file) database. That is more or less
> what Lucene is.
> What I am filtering is just what I insert into the database, so that its
> creation is faster and searches run only on the fields that interest me.
>
> Sure, it's a lot of code that will break if the format changes, but it
	If you program modularly that would be no problem.

> reduced the creation step from 5 minutes or so to 40 seconds (this on
	If you are filtering and only get a factor of 13, I'd say it is your
	system that is slow. If I remember correctly you are just requesting
	certain information, so somebody else is doing the work!

> a fast dual-core computer - I shudder to think what would happen if a
> user tried to re-index on a 1 GHz machine).
	Let's see. My Mac SE was a 1 MHz machine. That was twenty years
	ago. It would handle something like this in about ten minutes. I do
	not know what database system I was using.

>
> The index is about 33.5 MB, and should compress to < 10 MB.
> Probably small enough to be included in the application.
	Hardcoding data of that size into the program is not
	feasible. Though most newer computers can load it into
	memory quite quickly. That gives you a factor of 100 if everything
	is in memory; that is why Perl is so fast.

regards
	Keith



From joey at joeysmith.com  Wed Jul 29 23:27:03 2009
From: joey at joeysmith.com (Joey Smith)
Date: Thu, 30 Jul 2009 00:27:03 -0600
Subject: [gutvol-d] Re: someone needs to
In-Reply-To: <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de>
References: <d22.49174583.37a22070@aol.com>
	<FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de>
Message-ID: <20090730062703.GE5389@joeysmith.com>

On Thu, Jul 30, 2009 at 08:03:06AM +0200, Keith J. Schultz wrote:
> Tsk, Tsk, Tsk,
>
> 	Bowerbird, you should know better.
>
> 	I agree with you morally, but ...
>
> 	Shakespeare is public domain! But if I scan, say, the
> 	folios, the scans are mine to do with as I want, and I can
> 	use whatever copyright I personally care for!! The same
> 	goes for Google.
>
> 	Of course it would be nice to have these scans.
>
> 	Just as a side note, I thought you considered the Google
> 	scans not that good!?
>
> 	regards
> 		Keith
>

Wouldn't this amount to a "sweat of the brow" copyright? As I understand it,
the USA (at least) has rejected the concept of "sweat of the brow"
copyright. Or am I missing something?

From hart at pobox.com  Thu Jul 30 04:37:54 2009
From: hart at pobox.com (Michael S. Hart)
Date: Thu, 30 Jul 2009 03:37:54 -0800 (AKDT)
Subject: [gutvol-d] Re: someone needs to
In-Reply-To: <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de>
References: <d22.49174583.37a22070@aol.com>
	<FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de>
Message-ID: <alpine.DEB.1.00.0907300333230.30609@snowy.arsc.alaska.edu>


No. . .under U.S. law, merely scanning, xeroxing, photographing or
otherwise reproducing a two-dimensional object into a two-dimensional
result that closely resembles the original gets you. . .no copyrights!!!

There must be intellectual input to get a new copyright. . . .

However, this might not be true in other countries where sweat
of the brow, as it is called, gets you a new copyright, so the
answer could be yes for Mr. Schultz and no for Mr. Bowerbird--
or vice versa, depending on how you ask the question.


mh



On Thu, 30 Jul 2009, Keith J. Schultz wrote:

> Tsk, Tsk, Tsk,
>
> 	Bowerbird, you should know better.
>
> 	I agree with you morally, but ...
>
> 	Shakespeare is public domain! But if I scan, say, the
> 	folios, the scans are mine to do with as I want, and I can
> 	use whatever copyright I personally care for!! The same
> 	goes for Google.
>
> 	Of course it would be nice to have these scans.
>
> 	Just as a side note, I thought you considered the Google
> 	scans not that good!?
>
> 	regards
> 		Keith
>
> Am 30.07.2009 um 00:00 schrieb Bowerbird at aol.com:
>
> > someone needs to remount independently the scan-set from
> > every single public-domain book that google has scanned...
> >
> > might that be project gutenberg?
> >
> > if not, then someone else needs to do it.
> >
> > because it needs to be done...
> >
> > those books don't belong to google, or to sony, or to
> > barnes & noble, or to anyone else that google decides
> > to share them with.  those books belong to the public.
> >
> > -bowerbird
> >
> >
> >
> > _______________________________________________
> > gutvol-d mailing list
> > gutvol-d at lists.pglaf.org
> > http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From i30817 at gmail.com  Thu Jul 30 10:18:50 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Thu, 30 Jul 2009 18:18:50 +0100
Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg
In-Reply-To: <C1FEBF04-128F-419D-9AEA-0AA8B25A1B8E@uni-trier.de>
References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> 
	<20090728143306.GA28986@mail.pglaf.org>
	<a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> 
	<20090729172510.GC8946@mail.pglaf.org>
	<212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> 
	<212322090907291513h478330a8h36701a67bea021d@mail.gmail.com> 
	<628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com> 
	<212322090907292140t454d0390r8a21253f6b41a0fa@mail.gmail.com> 
	<C1FEBF04-128F-419D-9AEA-0AA8B25A1B8E@uni-trier.de>
Message-ID: <212322090907301018u65b18eefgc7854e38aa2ada67@mail.gmail.com>

>> reduced the creation step from 5 minutes or so to 40 seconds (this on
>
> 	If you are filtering and only get a factor of 13, I'd say it is your
> 	system that is slow. If I remember correctly you are just requesting
> 	certain information, so somebody else is doing the work!
>

It's not a server application, so the client is (potentially) doing
the indexing if he wants to update the catalog.
It's the indexing that takes 40 s.


>
>>
>> The index is about 33.5 MB, and should compress to < 10 MB.
>> Probably small enough to be included in the application.
>
> 	Hardcoding data of that size into the program is not
> 	feasible. Though most newer computers can load it into
> 	memory quite quickly. That gives you a factor of 100 if everything
> 	is in memory; that is why Perl is so fast.

Including everything in memory would more than double my program's heap,
and don't forget that this is a Java application, so that memory would
never be released before the program ends (or at least a sub-process
ends). Besides, as Lucene uses files, I think I can't use an in-memory
index to search the RDF (using LuceneSail, which uses Sesame and Lucene).
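
(For reference, a minimal sketch of the in-memory variant being
discussed, in plain Lucene only - loading an existing on-disk index
into a RAMDirectory and searching it; whether this can be wired
through LuceneSail/Sesame is exactly the open question above. Assumes
a Lucene 2.9/3.x-era API; the index path and field name are placeholders.)

    // Sketch: copy an on-disk Lucene index into RAM and search it there.
    // "catalog-index" and "title" are assumed placeholders.
    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class InMemorySearch {
        public static void main(String[] args) throws Exception {
            // The RAMDirectory constructor copies the whole on-disk index into
            // the heap - that duplication is the memory cost mentioned above.
            RAMDirectory ram = new RAMDirectory(
                    FSDirectory.open(new File("catalog-index")));

            IndexSearcher searcher = new IndexSearcher(ram);
            QueryParser parser = new QueryParser("title", new StandardAnalyzer());
            TopDocs hits = searcher.search(parser.parse("shakespeare"), 10);

            for (ScoreDoc hit : hits.scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("title"));
            }
            searcher.close();
        }
    }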

From Bowerbird at aol.com  Thu Jul 30 23:50:18 2009
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 31 Jul 2009 02:50:18 EDT
Subject: [gutvol-d] brewster kahle responds --  someone needs to (fwd)
Message-ID: <cdd.5592ca09.37a3ee2a@aol.com>

michael hart forwarded my post to brewster kahle, who responded:

>    bowerbird--
>
>    some very active volunteers have been taking the downloadable 
>    pdf's from google's site and uploading them to the archive.
>    the archive stores these, OCR's them to make them searchable, 
>    and if someone wants the pdf-- points them back to google.
>
>    we would like to see more of this.  if there are volunteers
>    to expand this program, we would like to play our part.
>
>    -brewster


i admit the internet archive has fallen off my radar somewhat
-- ever since i was banned from their listserves because i had
the temerity to continue complaining about their o.c.r. flaws --
but i should have remembered that they are indeed remounting
google's public-domain scans, and deserve kudos for doing so.

so thank you, brewster, and internet archive.

"universal access to knowledge" has always appealed to me,
by virtue of its deep simplicity, but i find that more and more,
lately, brewster, i am chagrined that google has subverted it...
instead of a global library, we're getting a global book-store...

and "free to all" increasingly means "dig out your wallet"...

-bowerbird



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090731/ff7a0a6d/attachment-0001.html>

From gbnewby at pglaf.org  Fri Jul 31 12:28:49 2009
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri, 31 Jul 2009 12:28:49 -0700
Subject: [gutvol-d] Re: [Fwd: Re:  someone needs to (fwd)] (fwd)
In-Reply-To: <alpine.DEB.1.00.0907301823030.25934@snowy.arsc.alaska.edu>
References: <alpine.DEB.1.00.0907301823030.25934@snowy.arsc.alaska.edu>
Message-ID: <20090731192849.GA17410@mail.pglaf.org>

Forwarding from Brewster Kahle:

> From: Brewster Kahle <brewster at archive.org>
> To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com
> CC: "Michael S. Hart" <hart at pglaf.org>, John Guagliardo <john at gutenberg.cc>
> Subject: Re: [gutvol-d] someone needs to (fwd)
> 
>
> bowerbird--
>
> some very active volunteers have been taking the downloadable pdf's
> from google's site and uploading them to the archive.  the archive  
> stores these, OCR's them to make them searchable, and if someone wants  
> the pdf-- points them back to google.
>
> we would like to see more of this.  if there are volunteers to expand  
> this program, we would like to play our part.
>
> -brewster
>
>
>
> Michael S. Hart wrote:
>>
>> ---------- Forwarded message ----------
>> Date: Wed, 29 Jul 2009 18:00:16 EDT
>> From: Bowerbird at aol.com
>> Reply-To: Project Gutenberg Volunteer Discussion <gutvol-d at lists.pglaf.org>
>> To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com
>> Subject: [gutvol-d] someone needs to
>>
>> someone needs to remount independently the scan-set from
>> every single public-domain book that google has scanned...
>>
>> might that be project gutenberg?
>>
>> if not, then someone else needs to do it.
>>
>> because it needs to be done...
>>
>> those books don't belong to google, or to sony, or to
>> barnes & noble, or to anyone else that google decides
>> to share them with.   those books belong to the public.
>>
>> -bowerbird

From hart at pobox.com  Fri Jul 31 19:49:17 2009
From: hart at pobox.com (Michael S. Hart)
Date: Fri, 31 Jul 2009 18:49:17 -0800 (AKDT)
Subject: [gutvol-d] 1984 and Animal Farm
Message-ID: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu>


Does anyone know about how many Kindle copies were erased?


thanks!!!


Michael

From jimad at msn.com  Fri Jul 31 20:36:02 2009
From: jimad at msn.com (Jim Adcock)
Date: Fri, 31 Jul 2009 20:36:02 -0700
Subject: [gutvol-d] Re: 1984 and Animal Farm
In-Reply-To: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu>
References: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu>
Message-ID: <BAY120-DAV2AA5FC2B2C54CD4529123AE110@phx.gbl>

I can't find a count of books sold anywhere, but here is an interesting
apology from Bezos:

http://tinyurl.com/m3c9z5



From jimad at msn.com  Fri Jul 31 20:40:30 2009
From: jimad at msn.com (Jim Adcock)
Date: Fri, 31 Jul 2009 20:40:30 -0700
Subject: [gutvol-d] Re: 1984 and Animal Farm
In-Reply-To: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu>
References: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu>
Message-ID: <BAY120-DAV10105319ABB9FDA54AFC45AE110@phx.gbl>


And here's a copy of the "Kindle Ate My Homework" Lawsuit:

http://tinyurl.com/n9tm6s