From phil at thalasson.com Fri Jul 3 13:46:31 2009 From: phil at thalasson.com (Philip Baker) Date: Fri, 3 Jul 2009 21:46:31 +0100 Subject: [gutvol-d] Re: Has anyone noticed that the gutindex files for this year haven't been updated since January? In-Reply-To: <15cfa2a50906301405r449a85amf884edb102d77ad0@mail.gmail.com> References: <dc3e58a00906261143t5ef4a9ddh84433343e0c7d782@mail.gmail.com> <Pine.GSO.4.58.0906262103200.9881@vtn1.victoria.tc.ca> <2A81022B-1955-4BB2-A3AE-E1E3062A900B@uni-trier.de> <dc3e58a00906271928y69d8db13k76572d9e8a3474ed@mail.gmail.com> <30447D202735492EBF4302C4D4F96955@alp2400> <Pine.GSO.4.58.0906272123440.25637@vtn1.victoria.tc.ca> <alpine.DEB.1.00.0906272107580.6821@snowy.arsc.alaska.edu> <dc3e58a00906272213q3222c1e4hf2b0d2727292e173@mail.gmail.com> <5E822D6C-65B2-11DE-BE50-000D93B743B8@thalasson.com> <15cfa2a50906301405r449a85amf884edb102d77ad0@mail.gmail.com> Message-ID: <96BEBE7C-6812-11DE-85B8-000D93B743B8@thalasson.com> On 30 Jun 2009, at 22:05, Robert Cicconetti wrote: > > On Tue, Jun 30, 2009 at 4:12 PM, Philip Baker <phil at thalasson.com> > wrote: > There are some gaps - the most recent emails, and a big hole - 27825 > to 28013 where I do not have the posted emails. Where is the archive? > GUTINDEX.ALL goes up to 27930. > > > The archive is at http://lists.pglaf.org/pipermail/posted/ > > However, there is a gap due to technical problems earlier this year. > > I have the posted emails from that period and can send them to you, > if you want. > > /me rummages through mail folders. > Thanks for the offer but don't do anything, at least for the moment. I am going to look at the possibility of using the RDF file for what I want. Philip Baker From Bowerbird at aol.com Fri Jul 3 16:26:44 2009 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 3 Jul 2009 19:26:44 EDT Subject: [gutvol-d] happy birthday, project gutenberg Message-ID: <c69.4befaf0f.377fedb4@aol.com> so let me be among the first to say "happy b-day" to project gutenberg on this sunny 4th-of-july weekend. the longevity of this cyberlibrary is due to the intelligent vision of its founder michael hart who saw the power of keep-it-simple philosophy with insistence on plain text, baby... so it is not without massive irony that we mark a new development: the w3c working group for xhtml2 will stop work at the end of 2009... > http://www.w3.org/News/2009#item119 instead, there'll be a new emphasis on html5 -- as "the future of html". in other words... all those technocrats telling you that "xml is the future" for the last decade were just plain flat-out purely wrong. good thing you didn't listen to 'em, eh? thanks, michael, for your vision and for the tenacity of your persistence... -bowerbird ************** A Good Credit Score is 700 or Above. See yours in just 2 easy steps! (http://pr.atwola.com/promoclk/100126575x1222585087x1201462804/aol?redir=http://www.freecreditreport.com/pm/default.aspx?sc=668072&hmpgID=62& bcd=JulystepsfooterNO62) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090703/ea6f898c/attachment.html> From cannona at fireantproductions.com Thu Jul 16 17:06:52 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 16 Jul 2009 19:06:52 -0500 Subject: [gutvol-d] A new DVD? Message-ID: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> Hi all. I'm sorry for disappearing for a while. I've been dealing with some health issues, work, and school, but mainly health. 
Anyway, I think it may be time to create a new DVD. The latest one is 3 years old this month. However, the project has obviously grown drastically in that time, so I'm wondering if one DVD still makes sense? The drawbacks to creating a 2 DVD collection that immediately come to mind are: 1. In the DVD/CD mailing project, we have consistently been sending two copies of the DVD, or a DVD and CD. This will either have to change, or we will have to pay more for postage. 2. It takes twice as long to download. On the other hand, if you can download 4GB, it's probably not that big of a stretch to download 8. 3. It's not as elegant as one DVD. This is probably the least important, but just thought I'd mention it as it might prove important to some. It is of course possible to stick with just one DVD, but it will require leaving out a lot. Any thoughts/ideas? Aaron From cannona at fireantproductions.com Thu Jul 16 17:50:49 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 16 Jul 2009 19:50:49 -0500 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> Message-ID: <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> Hi Michael. You are probably right about the CDs. When CDs are requested, we always send two. I don't think I know who Richard Seltzer is or what he does. Like you, I have been watching dual layer DVDs, but the price is still quite high. Shop4tech.com has them for $2.50. It doesn't make sense considering that one single layer DVD costs just $0.26. Thanks. Aaron On 7/16/09, Michael S. Hart <hart at pobox.com> wrote: > > > First thoughts: > > I've not sure there is any need to send CDs except when requested, > and they will be, probably by those who need them most, but we are > here and now living mostly in an age DVDs, so send two of them. > > When they ask for CDs, send two of those. . .even if the same one. > > We can also refer people to Richard Seltzer. > > I should also mention dual layered DVDs, but every time I look the > price still seems too high. > > More thoughts? > > MH > > > > On Thu, 16 Jul 2009, Aaron Cannon wrote: > >> Hi all. >> >> I'm sorry for disappearing for a while. I've been dealing with some >> health issues, work, and school, but mainly health. Anyway, I think >> it may be time to create a new DVD. The latest one is 3 years old >> this month. However, the project has obviously grown drastically in >> that time, so I'm wondering if one DVD still makes sense? >> >> The drawbacks to creating a 2 DVD collection that immediately come to mind >> are: >> 1. In the DVD/CD mailing project, we have consistently been sending >> two copies of the DVD, or a DVD and CD. This will either have to >> change, or we will have to pay more for postage. >> 2. It takes twice as long to download. On the other hand, if you can >> download 4GB, it's probably not that big of a stretch to download 8. >> 3. It's not as elegant as one DVD. This is probably the least >> important, but just thought I'd mention it as it might prove important >> to some. >> >> It is of course possible to stick with just one DVD, but it will >> require leaving out a lot. >> >> Any thoughts/ideas? 
>> >> Aaron >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/mailman/listinfo/gutvol-d >> > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From prosfilaes at gmail.com Thu Jul 16 18:15:40 2009 From: prosfilaes at gmail.com (David Starner) Date: Thu, 16 Jul 2009 21:15:40 -0400 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> Message-ID: <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> On Thu, Jul 16, 2009 at 8:50 PM, Aaron Cannon<cannona at fireantproductions.com> wrote: > Like you, I have been watching dual layer DVDs, but the price is still > quite high. ?Shop4tech.com has them for $2.50. ?It doesn't make sense > considering that one single layer DVD costs just $0.26. It's not about simple production costs; it's about the value of producing in quantity. Everyone and their brother have single layer DVD drives, so the media is produced in the billions. Dual layer drives are much rarer, so the media isn't mass-produced in the same quantities. However, looking at shop4tech.com, I'm seeing several offers of dual layer DVDs at ~$1.00 a DVD; are those not suitable for us for some reason? -- Kie ekzistas vivo, ekzistas espero. From cannona at fireantproductions.com Thu Jul 16 18:28:23 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 16 Jul 2009 20:28:23 -0500 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> Message-ID: <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com> Hi David. There's no reason that media wouldn't work. I apparently just didn't look hard enough. However, the other problem with publishing a dual layer DVD is that not as many people would be able to burn it, because, as you say, there aren't as many burners out there that can do dual layers. Other thoughts? Aaron On 7/16/09, David Starner <prosfilaes at gmail.com> wrote: > On Thu, Jul 16, 2009 at 8:50 PM, Aaron > Cannon<cannona at fireantproductions.com> wrote: >> Like you, I have been watching dual layer DVDs, but the price is still >> quite high. ?Shop4tech.com has them for $2.50. ?It doesn't make sense >> considering that one single layer DVD costs just $0.26. > > It's not about simple production costs; it's about the value of > producing in quantity. Everyone and their brother have single layer > DVD drives, so the media is produced in the billions. Dual layer > drives are much rarer, so the media isn't mass-produced in the same > quantities. However, looking at shop4tech.com, I'm seeing several > offers of dual layer DVDs at ~$1.00 a DVD; are those not suitable for > us for some reason? > > -- > Kie ekzistas vivo, ekzistas espero. 
> _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From gbnewby at pglaf.org Thu Jul 16 22:15:37 2009 From: gbnewby at pglaf.org (Greg Newby) Date: Thu, 16 Jul 2009 22:15:37 -0700 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com> Message-ID: <20090717051537.GB31087@mail.pglaf.org> On Thu, Jul 16, 2009 at 08:28:23PM -0500, Aaron Cannon wrote: > Hi David. > > There's no reason that media wouldn't work. I apparently just didn't > look hard enough. However, the other problem with publishing a dual > layer DVD is that not as many people would be able to burn it, > because, as you say, there aren't as many burners out there that can > do dual layers. > > Other thoughts? > > Aaron It's time for a new DVD. The current DVD has the majority of all the PG content as .zip txt, plus a variety of other content in other formats. So, to do the same thing today would take somewhat more space (I don't know how much). It seems many modern drives can read dual-layer discs. Has anyone seen statistics on this? I think we could also releas the dual-layer content as a pair of DVD images, for those who would prefer it that way. It would allow us to retire the older DVD image. In fact, I would probably start a new dual-layer DVD image with the full contents of the "best of" CD (updated to reflect changes to the eBooks since then...maybe with a new call for "best of" nominations). These days, it seems fair to have only DVDs, not CDs. We can certainly afford to purchase a handful of external or internal dual-layer DVD writers for people willing to do the burning. Media are more expensive, but as David mentioned, we can shop around and buy in bulk to help offset costs. Generally, the CD/DVD giveaways have paid for themselves in returned donations, so I suspect they will remain self-supporting even if costs go up - we just need to ask, when discs are sent. -- Greg > On 7/16/09, David Starner <prosfilaes at gmail.com> wrote: > > On Thu, Jul 16, 2009 at 8:50 PM, Aaron > > Cannon<cannona at fireantproductions.com> wrote: > >> Like you, I have been watching dual layer DVDs, but the price is still > >> quite high. ?Shop4tech.com has them for $2.50. ?It doesn't make sense > >> considering that one single layer DVD costs just $0.26. > > > > It's not about simple production costs; it's about the value of > > producing in quantity. Everyone and their brother have single layer > > DVD drives, so the media is produced in the billions. Dual layer > > drives are much rarer, so the media isn't mass-produced in the same > > quantities. However, looking at shop4tech.com, I'm seeing several > > offers of dual layer DVDs at ~$1.00 a DVD; are those not suitable for > > us for some reason? > > > > -- > > Kie ekzistas vivo, ekzistas espero. 
> > _______________________________________________ > > gutvol-d mailing list > > gutvol-d at lists.pglaf.org > > http://lists.pglaf.org/mailman/listinfo/gutvol-d > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From schultzk at uni-trier.de Thu Jul 16 23:49:02 2009 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Fri, 17 Jul 2009 08:49:02 +0200 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <20090717051537.GB31087@mail.pglaf.org> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com> <20090717051537.GB31087@mail.pglaf.org> Message-ID: <EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de> Hi All, Actually, there are more DL-burners out thier than you think. As mentioned most should be able to read DL-DVDs. One thought though: If somebody is willing to download the 8 GB whether DL or two DVDs I do not think they will burn it. At least I have not burned the images I have. I have the images on my drive and just mount it when I need it. No fuss with having to carry a DVD along. It is also, possible to put the image or its content on a USB-stick. Second, if the users can burn a DVD they could divide it themselves. With a carefully crafted index.html and a little javascript magic one could easily divide the content among two DVDs. All they need to do edit the index.html in one place for the two DVDs. Another approach would be to use two directories for the content containing say DVD1 and DVD2. Then if somebody has a DL-burner s/he can you that and who only can burn single layer can do that. I see no big problem using DL-images. Then again I maybe to savvy. Personally I would prefer an image with all the PG-content zipped. I use to ftp the PG directories. But, at one point the guys at the unversity ask to me to take it easy because one day I had effectively used almost all bandwidth by using "get -r *.*". That was a long time ago, maybe I should try that again. All aside, I would suggest using the second model containing directories, this way in another couple of years we can use the same model for even larger images and we do not need to bother with what type of burner the user has whether it be single, DL or even Blue-ray, or whatever might appear. regards Keith. From cannona at fireantproductions.com Fri Jul 17 05:45:16 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Fri, 17 Jul 2009 07:45:16 -0500 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com> <20090717051537.GB31087@mail.pglaf.org> <EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de> Message-ID: <628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com> Hi Greg, Keith and all. 
I found this quote at http://www.burnworld.com/howto/articles/intro-to-dual-layer.htm: "Dual layer DVD recordable discs offer up to four hours of high quality MPEG-2 video, or up to 8.5GB of data on a single-sided disc with two individual recordable ?layers.? Dual layer capable recorders will have the ability to record on the new dual layer DVD recordable discs, as well as on traditional single layer DVD discs and CDs too. Want more? Because a recorded dual layer DVD disc is compliant with the DVD9 specification, the discs are compatible with most consumer DVD players and computer DVD-ROM drives already installed in the market." It reads as if it were written before DL burners became available (or shortly after the first ones were released), so hopefully the author knows of what he speaks. My initial tests have shown that just the zipped text content of all the books (excluding all HTML, and non-ebooks except for a little sheet music, and also excluding some of the larger data files such as the HGP files and the numbers) weighs in at about 5.5 GB. This is also excluding ASCII encoded files when UTF-8 or ISO-8859-X files are available. This does not exclude any images that were included in the zip files with the text version, nor does it exclude any copyrighted texts. 5.5 GB leaves us a good 3 GB more to play with. I think it would make more sense to offer the dual layer DVD ISO, and also offer two single-layer DVD .iso images for folks with only a single layer DVD burner. A lot of people who have emailed us in the past have had a hard time just burning the .ISO. If possible, I would like to keep things as simple as we can for them. I'll have to check to see if PG's 11-disc burner can handle dual layer drives, or if by any chance it already has such drives installed. It might take a firmware upgrade, but I would be surprised if it can't at least use DL drives. Greg, do you by chance have an easily accessible record of what model of duplicator we bought? If not, I can open it up and check the model numbers on the controler. I just don't have the email anymore, and there aren't any labels on the outside of the case. Thanks. Aaron On 7/17/09, Keith J. Schultz <schultzk at uni-trier.de> wrote: > Hi All, > > Actually, there are more DL-burners out thier than you think. As > mentioned > most should be able to read DL-DVDs. > > One thought though: If somebody is willing to download the 8 GB whether > DL or two DVDs I do not think they will burn it. At least I have not > burned > the images I have. I have the images on my drive and just mount it > when I need > it. No fuss with having to carry a DVD along. It is also, possible to > put the image > or its content on a USB-stick. > > Second, if the users can burn a DVD they could divide it themselves. > With a carefully > crafted index.html and a little javascript magic one could easily > divide the content among > two DVDs. All they need to do edit the index.html in one place for > the two DVDs. > Another approach would be to use two directories for the content > containing say > DVD1 and DVD2. Then if somebody has a DL-burner s/he can you that > and who only can burn single layer can do that. > > I see no big problem using DL-images. Then again I maybe to savvy. > Personally I would > prefer an image with all the PG-content zipped. I use to ftp the PG > directories. But, at one point > the guys at the unversity ask to me to take it easy because one day I > had effectively > used almost all bandwidth by using "get -r *.*". 
That was a long > time ago, maybe I should try > that again. > > All aside, I would suggest using the second model containing > directories, this way in another > couple of years we can use the same model for even larger images and > we do not need > to bother with what type of burner the user has whether it be single, > DL or even Blue-ray, or > whatever might appear. > > > regards > Keith. > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From cannona at fireantproductions.com Fri Jul 17 15:15:54 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Fri, 17 Jul 2009 17:15:54 -0500 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com> <20090717051537.GB31087@mail.pglaf.org> <EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de> <628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com> Message-ID: <628c29180907171515o760354ack47b731478b97db6a@mail.gmail.com> I was mistaken. It actually weighs in at 4.5. So, it does in fact fit on a single layer DVD (verified with Nero). In fact, we have about 87MB to spare. So, in light of this new information, the question is whether we want to create a DL DVD or not. Thoughts? Aaron On 7/17/09, Aaron Cannon <cannona at fireantproductions.com> wrote: > Hi Greg, Keith and all. > > I found this quote at > http://www.burnworld.com/howto/articles/intro-to-dual-layer.htm: > > "Dual layer DVD recordable discs offer up to four hours of high > quality MPEG-2 video, or up to 8.5GB of data on a single-sided disc > with two individual recordable ?layers.? Dual layer capable recorders > will have the ability to record on the new dual layer DVD recordable > discs, as well as on traditional single layer DVD discs and CDs too. > Want more? Because a recorded dual layer DVD disc is compliant with > the DVD9 specification, the discs are compatible with most consumer > DVD players and computer DVD-ROM drives already installed in the > market." > > It reads as if it were written before DL burners became available (or > shortly after the first ones were released), so hopefully the author > knows of what he speaks. > > My initial tests have shown that just the zipped text content of all > the books (excluding all HTML, and non-ebooks except for a little > sheet music, and also excluding some of the larger data files such as > the HGP files and the numbers) weighs in at about 5.5 GB. This is > also excluding ASCII encoded files when UTF-8 or ISO-8859-X files are > available. This does not exclude any images that were included in the > zip files with the text version, nor does it exclude any copyrighted > texts. > > 5.5 GB leaves us a good 3 GB more to play with. > > I think it would make more sense to offer the dual layer DVD ISO, and > also offer two single-layer DVD .iso images for folks with only a > single layer DVD burner. A lot of people who have emailed us in the > past have had a hard time just burning the .ISO. If possible, I would > like to keep things as simple as we can for them. 
> > I'll have to check to see if PG's 11-disc burner can handle dual layer > drives, or if by any chance it already has such drives installed. It > might take a firmware upgrade, but I would be surprised if it can't at > least use DL drives. Greg, do you by chance have an easily accessible > record of what model of duplicator we bought? If not, I can open it > up and check the model numbers on the controler. I just don't have > the email anymore, and there aren't any labels on the outside of the > case. > > Thanks. > > Aaron > > On 7/17/09, Keith J. Schultz <schultzk at uni-trier.de> wrote: >> Hi All, >> >> Actually, there are more DL-burners out thier than you think. As >> mentioned >> most should be able to read DL-DVDs. >> >> One thought though: If somebody is willing to download the 8 GB whether >> DL or two DVDs I do not think they will burn it. At least I have not >> burned >> the images I have. I have the images on my drive and just mount it >> when I need >> it. No fuss with having to carry a DVD along. It is also, possible to >> put the image >> or its content on a USB-stick. >> >> Second, if the users can burn a DVD they could divide it themselves. >> With a carefully >> crafted index.html and a little javascript magic one could easily >> divide the content among >> two DVDs. All they need to do edit the index.html in one place for >> the two DVDs. >> Another approach would be to use two directories for the content >> containing say >> DVD1 and DVD2. Then if somebody has a DL-burner s/he can you that >> and who only can burn single layer can do that. >> >> I see no big problem using DL-images. Then again I maybe to savvy. >> Personally I would >> prefer an image with all the PG-content zipped. I use to ftp the PG >> directories. But, at one point >> the guys at the unversity ask to me to take it easy because one day I >> had effectively >> used almost all bandwidth by using "get -r *.*". That was a long >> time ago, maybe I should try >> that again. >> >> All aside, I would suggest using the second model containing >> directories, this way in another >> couple of years we can use the same model for even larger images and >> we do not need >> to bother with what type of burner the user has whether it be single, >> DL or even Blue-ray, or >> whatever might appear. >> >> >> regards >> Keith. >> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/mailman/listinfo/gutvol-d >> > From schultzk at uni-trier.de Sat Jul 18 02:58:57 2009 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Sat, 18 Jul 2009 11:58:57 +0200 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <628c29180907171515o760354ack47b731478b97db6a@mail.gmail.com> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com> <20090717051537.GB31087@mail.pglaf.org> <EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de> <628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com> <628c29180907171515o760354ack47b731478b97db6a@mail.gmail.com> Message-ID: <6962E1B2-E316-413F-A112-52F7A3483BD2@uni-trier.de> Hi Aaron, Well it depends! Do we want to put the html zipped version on it or even non-ebooks? I believe some prefer html over text. Maybe a dual aproach. 
"normal DVD with text version and a DL with more on it. Of course it is possible to make two "normal" DVDs. I would prefer a DL. regards Keith. Am 18.07.2009 um 00:15 schrieb Aaron Cannon: > I was mistaken. It actually weighs in at 4.5. So, it does in fact > fit on a single layer DVD (verified with Nero). In fact, we have > about 87MB to spare. So, in light of this new information, the > question is whether we want to create a DL DVD or not. > > Thoughts? > From user5013 at aol.com Sat Jul 18 06:10:37 2009 From: user5013 at aol.com (Christa & Jay Toser) Date: Sat, 18 Jul 2009 08:10:37 -0500 Subject: [gutvol-d] Re: A new DVD? Message-ID: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com> Hi, Time for me to put my two cents worth in. I do a good-sized chunk of the international mailings for the Gutenberg project. These are the physical discs that are sent out, not any of the internet downloads. Point One: We cannot abandon the legacy of a CD. Most of Africa, and many ex-Soviet satellite nations, do not have personal computers that can read DVD's. For instance, the One Laptop Per Child computer can only read CD's. Perhaps the Gutenberg Project can update their CD to include the 630 most popular books from 2009 (instead of from 2003), but we must maintain a CD. Point Two: We must update the DVD. Folk in India have noticed that the current DVD holds about 10,000 fewer books than are on-line. They wonder why we are behind the times. Point Three: On the internet downloads, yes we can have a dual-layer or Blue-Ray disk image. Wonderful. BUT in the physical world, we will have to stick with the standard DVD format. We must understand that much of the world will continue with older legacy formats for at least the next decade. Which means, the next release of the DVD will probably have to be a set of two discs. Point Four: Mailing of discs. Currently, I mail two copies of the DVD (or CD) for every request. The philosophy has always been: "Keep one copy, and give the other copy to your local school or library." This has worked pretty well at the current postage costs. However, if I have to mail four discs, then costs of mailing will go up. I propose this change: Any requester may ask for one single personal copy of the two DVD's. _AND_ there should be an additional checkbox for them to ask for a second set of DVD's for them to give away. That way, I would normally send out only the two DVD's (for the single personal copy) -- unless the requester wants more -- and then I would spend the greater postage to send the extra discs. Critics may ask: "If I send a second set of discs, will that extra set be personally delivered to the other destination?" I say, Oh yes, I GUARANTEE IT. Americans simply do not realize just how much the rest of the wold values books. If someone says they will deliver the duplicates -- then they WILL. It would be worth it to the Gutenberg Project to pay the extra postage for the duplicate DVD's. Point Four-and-a-Half: No, DVD BURNERS are not as common as you think. Lots of folk across the world can READ a DVD, but not so many can BURN a DVD. As much as these requesters would want to make copies -- they just can't. Point Five: Download time. The debate I have read in the last couple days, does not seem to take into account internet access. They seem to think just anyone can download the DVD (or Blue-Ray) images easily. That is not the case in most of the world. I currently have 56K dial-up. 
If I were to download a (hypothetical) 2-DVD set, that download would complete, oh, about October. And yet, I do not experience the incredible delays seen in the Palestinian Territories, or some of the emerging Communist nations, or much of Africa -- there the best speed can be as low as 9600 baud (2.4K). Whenever Project Gutenberg creates the be-all-end-all Blue-Ray DVD disk which contains everything in the project -- then about 2/3rds of the world will not have the bandwidth to access any of it. So, I recommend moderation in whatever you create. Please keep in mind the legacy DVD & CD standards that are already in place; that the world currently can read; and try to create the new discs so that (almost) anyone can read them. Hope this helps, Jay Toser P.S. Should any of these new discs be created, will someone remember to mail to me, a hard copy? As I've said, I'm not going to be able to download a copy. That's why I do the international mailings. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090718/6f4528b1/attachment.html> From grythumn at gmail.com Sat Jul 18 08:58:29 2009 From: grythumn at gmail.com (Robert Cicconetti) Date: Sat, 18 Jul 2009 11:58:29 -0400 Subject: [gutvol-d] Re: A new DVD? In-Reply-To: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com> References: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com> Message-ID: <15cfa2a50907180858w7c97edb6s62a1a6e0ce3edcb4@mail.gmail.com> On Sat, Jul 18, 2009 at 9:10 AM, Christa & Jay Toser <user5013 at aol.com>wrote: > Point Three: On the internet downloads, yes we can have a dual-layer or > Blue-Ray disk image. Wonderful. BUT in the physical world, we will have to > stick with the standard DVD format. We must understand that much of the > world will continue with older legacy formats for at least the next decade. > Which means, the next release of the DVD will probably have to be a set of > two discs. > Dual-layer disks are PART of the standard DVD format[0], and just about any DVD-ROM[1] will read a burned dual layer disk. There were issues with burned DL disks in very early consumer DVD players, but DVD-ROMs are more robust, especially if you use DVD+R DL disks[2]. The bigger issue is burners... DL burners were only really standard in the last 3 or 4 years, but anything can read them. So you actually have it backwards... for internet downloads, you want single layer (DVD-5) images, but for physical mailings you want dual-layer images (DVD-9). R C [0] In fact the majority of commercial DVDs are dual-layer. Pressed dual-layer, not burned, but still dual-layer. [1] I won't guarantee every device will, but even my ancient 2x DVD-ROM from ~1998 can read burned DL disks. [2] Some older devices have trouble with -R DL disks if the two layers aren't burned to the same length. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090718/5b39c99a/attachment.html> From cannona at fireantproductions.com Sat Jul 18 09:18:35 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Sat, 18 Jul 2009 11:18:35 -0500 Subject: [gutvol-d] Re: A new DVD? In-Reply-To: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com> References: <22178060-2060-44B5-88A3-F48982C4E24F@aol.com> Message-ID: <628c29180907180918g442e12aas5c4e43847a77d889@mail.gmail.com> Hi Jay. Thanks for your insights on this, and especially thanks for all your work mailing DVDs. Points 1 and 2. agreed. 
Point 3. Based on what I've read, all drives (including the very oldest) should be able to read dual layer DVDs. The reason is that even though DL burners only became available relatively recently, the dual layer format has existed from the beginning, and in fact many DVDs from the mid to late 90's were dual layer. So, theoretically at least, if a person can read our current DVDs, they should be able to read dual layer discs. I have no way of knowing if this is true in practice however. Point 4. It's a good idea. The decision to send two discs, whether it be two CDs, two DVDs, or one of each is so that the shipping weight for every package will be uniform. That was a decision I made, and the reason I did so was just to make things a little simpler for volunteers. However, once we know what the next generation DVD will look like, we can reevaluate this decision. Points 4 and 4.5 are also quite valid and definitly something that needs to be kept in mind. Thanks again. Aaron On 7/18/09, Christa & Jay Toser <user5013 at aol.com> wrote: > Hi, Time for me to put my two cents worth in. > > I do a good-sized chunk of the international mailings for the > Gutenberg project. These are the physical discs that are sent out, > not any of the internet downloads. > > Point One: We cannot abandon the legacy of a CD. Most of Africa, > and many ex-Soviet satellite nations, do not have personal computers > that can read DVD's. For instance, the One Laptop Per Child computer > can only read CD's. Perhaps the Gutenberg Project can update their > CD to include the 630 most popular books from 2009 (instead of from > 2003), but we must maintain a CD. > > Point Two: We must update the DVD. Folk in India have noticed that > the current DVD holds about 10,000 fewer books than are on-line. > They wonder why we are behind the times. > > Point Three: On the internet downloads, yes we can have a dual-layer > or Blue-Ray disk image. Wonderful. BUT in the physical world, we > will have to stick with the standard DVD format. We must understand > that much of the world will continue with older legacy formats for at > least the next decade. Which means, the next release of the DVD will > probably have to be a set of two discs. > > Point Four: Mailing of discs. Currently, I mail two copies of the > DVD (or CD) for every request. The philosophy has always been: > "Keep one copy, and give the other copy to your local school or > library." This has worked pretty well at the current postage costs. > However, if I have to mail four discs, then costs of mailing will go up. > > I propose this change: Any requester may ask for one single personal > copy of the two DVD's. _AND_ there should be an additional checkbox > for them to ask for a second set of DVD's for them to give away. > That way, I would normally send out only the two DVD's (for the > single personal copy) -- unless the requester wants more -- and then > I would spend the greater postage to send the extra discs. > > Critics may ask: "If I send a second set of discs, will that extra > set be personally delivered to the other destination?" I say, Oh > yes, I GUARANTEE IT. Americans simply do not realize just how much > the rest of the wold values books. If someone says they will deliver > the duplicates -- then they WILL. It would be worth it to the > Gutenberg Project to pay the extra postage for the duplicate DVD's. > > Point Four-and-a-Half: No, DVD BURNERS are not as common as you > think. Lots of folk across the world can READ a DVD, but not so many > can BURN a DVD. 
As much as these requesters would want to make > copies -- they just can't. > > Point Five: Download time. The debate I have read in the last > couple days, does not seem to take into account internet access. > They seem to think just anyone can download the DVD (or Blue-Ray) > images easily. That is not the case in most of the world. > > I currently have 56K dial-up. If I were to download a (hypothetical) > 2-DVD set, that download would complete, oh, about October. And yet, > I do not experience the incredible delays seen in the Palestinian > Territories, or some of the emerging Communist nations, or much of > Africa -- there the best speed can be as low as 9600 baud (2.4K). > > Whenever Project Gutenberg creates the be-all-end-all Blue-Ray DVD > disk which contains everything in the project -- then about 2/3rds of > the world will not have the bandwidth to access any of it. > > > So, I recommend moderation in whatever you create. Please keep in > mind the legacy DVD & CD standards that are already in place; that > the world currently can read; and try to create the new discs so that > (almost) anyone can read them. > > Hope this helps, > Jay Toser > > P.S. Should any of these new discs be created, will someone remember > to mail to me, a hard copy? As I've said, I'm not going to be able > to download a copy. That's why I do the international mailings. > From cannona at fireantproductions.com Sat Jul 18 10:15:35 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Sat, 18 Jul 2009 12:15:35 -0500 Subject: [gutvol-d] Re: !@! Re: A new DVD? In-Reply-To: <6962E1B2-E316-413F-A112-52F7A3483BD2@uni-trier.de> References: <628c29180907161706v4a784116vbf7f870905d47e57@mail.gmail.com> <alpine.DEB.1.00.0907161618470.6025@snowy.arsc.alaska.edu> <628c29180907161750p5e4c81b5ne373605ca3304efb@mail.gmail.com> <6d99d1fd0907161815g72171d5dvcdc8cef35b068ec5@mail.gmail.com> <628c29180907161828u1034cfe3od6deb71b3037fc5f@mail.gmail.com> <20090717051537.GB31087@mail.pglaf.org> <EAF6B4D3-3F9D-488F-A7D1-CE8DA4920DE1@uni-trier.de> <628c29180907170545i48bb1ccavb818de49dfec81ad@mail.gmail.com> <628c29180907171515o760354ack47b731478b97db6a@mail.gmail.com> <6962E1B2-E316-413F-A112-52F7A3483BD2@uni-trier.de> Message-ID: <628c29180907181015t5d80f0c9nbb099c5d97664d98@mail.gmail.com> That is in fact the question. Is the extra content worth the inconvenience and expense of the DL format and/or the creation of two discs? I personally could go either way, though just releasing one single layer DVD would be easier. Incidentally, I have compiled a list of files which will fit on 1 single layer DVD with about 87MB to spare. Whether we choose the single or dual layer, I propose that this list serve as a starting point. If anyone has any files that they feel should be included or excluded, please let me know. The list is available as tab delimited data at: (as a .zip file) http://snowy.arsc.alaska.edu/cdproject/dvdfiles.zip (as a .bz2 file) http://snowy.arsc.alaska.edu/cdproject/dvdfiles.csv.bz2 The data was generated as follows: 1. using the catalog.rdf file downloaded on July 14. 2. Removing all books that have a "type" assigned in the RDF record. This basically gets rid of almost everything that isn't an Ebook, including audio books, music, data, ETC. 3. removed books 2201 through 2224, books 11775 through 11855, and books 19159, 10802, 11220, and 3202. 4. 
removed files with formats pageimages, msword, text/xml, audio/mpeg, application/octet-stream type=anything, tei, html, pdf, rtf, tex, folio, palm, raider, and unspecified. In a few cases this removed entire titles, but in most cases, this just decreased the number of formats a given title was available in. 5. If a title was available in UTF-8 and/or ISO-8859-X, and also available in ASCII, then the ASCII version was not included. 6. If a book has a zipped and unzipped version in the archive, then only the zipped versions were kept. Again, suggestions are very welcome. Thanks. Aaron On 7/18/09, Keith J. Schultz <schultzk at uni-trier.de> wrote: > Hi Aaron, > > Well it depends! Do we want to put the html zipped > version on it or even non-ebooks? > > I believe some prefer html over text. Maybe a dual > aproach. "normal DVD with text version and a DL > with more on it. Of course it is possible to make > two "normal" DVDs. I would prefer a DL. > > regards > Keith. > > Am 18.07.2009 um 00:15 schrieb Aaron Cannon: > >> I was mistaken. It actually weighs in at 4.5. So, it does in fact >> fit on a single layer DVD (verified with Nero). In fact, we have >> about 87MB to spare. So, in light of this new information, the >> question is whether we want to create a DL DVD or not. >> >> Thoughts? >> > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From cannona at fireantproductions.com Mon Jul 20 10:44:19 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Mon, 20 Jul 2009 12:44:19 -0500 Subject: [gutvol-d] Suggestions for the DVD must haves? Message-ID: <628c29180907201044s76b48906t246ca18ce2fe2c8b@mail.gmail.com> Hi all. I would like some recommendations on some books from the collection that you feel should be included on the DVD. Remember that all books that have text versions and that aren't data get included. However, what books should have their HTML versions included as well? Which other nontext works (such as mp3, films, ETC.) should be included? Email me your recommendations privately or reply to the list, as you prefer. Thanks. Aaron From i30817 at gmail.com Wed Jul 15 17:28:24 2009 From: i30817 at gmail.com (Paulo Levi) Date: Thu, 16 Jul 2009 01:28:24 +0100 Subject: [gutvol-d] Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> Message-ID: <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> ---------- Forwarded message ---------- From: Paulo Levi <i30817 at gmail.com> Date: Thu, Jul 16, 2009 at 1:25 AM Subject: Programmatic fetching books from Gutenberg To: gutvol-p at lists.pglaf.org I made a ebook reader (here) http://code.google.com/p/bookjar/downloads/list and i'd like to search and download Gutenberg books. I already have a searcher prototype using LuceneSail a library that uses Lucene to index rdf documents and only indexing what i want from the catalog.rdf.zip. Now i'd like to know how from the url inside the catalog i can fetch the book itself, and what are the variants for the formats. A example query result: author: Shakespeare, William, 1564-1616 url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802 title: King Henry VIII So, i like to know how from the etext1802 number can i get a working url to download the book, and how to construct variants for each format. Thank you in advance. 
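One way to get from an ebook number to candidate download URLs, sketched below for illustration only: the file paths recorded in catalog.rdf nest the digits of the number (for example, book 29514 lives under 2/9/5/1/29514/, as the catalog excerpt further down this thread shows). The mirror base URL, the handling of single-digit numbers, and the suffix list in this sketch are assumptions, not anything the archive guarantees; the authoritative list of files and formats for a given book is the set of pgterms:file records in catalog.rdf itself.

--- illustrative sketch (Python) ---

# Sketch only: derive the mirror directory for a Project Gutenberg ebook
# number from the digit-split layout used by file paths in catalog.rdf
# (e.g. 29514 -> 2/9/5/1/29514/).  MIRROR_BASE and the suffixes below are
# assumptions for illustration; read the real file names from the catalog.

MIRROR_BASE = "http://www.gutenberg.org/dirs"   # assumed mirror root

def ebook_dir(num: int) -> str:
    """Every digit except the last as a nested directory, then the full number."""
    s = str(num)
    if len(s) == 1:
        raise ValueError("single-digit numbers are not covered by this sketch")
    return "/".join(s[:-1]) + "/" + s

def candidate_urls(num: int, suffixes=(".txt", "-8.txt", "-h.zip")):
    """Guess a few common variants; whether they exist must be checked against catalog.rdf."""
    d = ebook_dir(num)
    return ["%s/%s/%d%s" % (MIRROR_BASE, d, num, sfx) for sfx in suffixes]

print(candidate_urls(1802))
# e.g. ['http://www.gutenberg.org/dirs/1/8/0/1802/1802.txt', ...]

In practice, reading the pgterms:file records directly, as in the replies below, avoids guessing suffixes at all.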
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090716/ea48a4cc/attachment.html> From pterandon at gmail.com Thu Jul 16 20:02:18 2009 From: pterandon at gmail.com (Greg M. Johnson) Date: Thu, 16 Jul 2009 23:02:18 -0400 Subject: [gutvol-d] Hey, I've been working on a DVD! Message-ID: <a0bf3e960907162002k15b3b256m990b6af5d5dddcf9@mail.gmail.com> Hi. I'm getting the email from this list but am not sure if I'm able to send to the list. Anyway, I've been working on a DVD. I've got 3500 of your HTML files on it organized into 21 different genres. It's *everything* under the sun, but from my own editorial stance, I'm most interested in Christian sermons, scifi, old sea tales, and the (US) Civil War, so there might be too much in those genres there for some's taste. At one point I was trying to avoid controversial material (eugenics, slavery, women's suffrage), but then thought it would be more valuable to include all kinds of viewpoints, even ones universally understood to be wrong today. The files are HTML, which i find to be the best way to read the material. I've stressed those files which are <1MB, but of course granting waivers to image-rich classics like Beatrix Potter's work. I've seen the DVD you put out with 19000 zipped txt files on it. That DVD probably the thing you want to have in every home basement in case civilization needs to survive a nuclear war, but it's not very user friendly. I entitled mine, "Some of the Best of Project Gutenberg." So, I'm planning to start distributing it to all my friends and every local nursing home. But I'd gladly give the current draft up to your group under a CC zero license. ADVENTURE IN THE AGE OF STEAM SCIENCE FICTION ADVENTURE NOVELS FOR YOUTH BEDTIME STORIES MANUALS & HOW-TO BOOKS SCHOOLBOOKS & EDUCATION THEORY ENGLISH LITERATURE DRAMA POETRY MYSTERY RELIGION- CHRISTIAN RELIGION-OTHER, MYTHOLOGY, & PHILOSOPHY SCIENCE- MEDICINE, PHYSIOLOGY, & PSYCHOLOGY SCIENCE- NATURAL SCIENCE- PHYSICS & ENGINEERING SOCIAL SCIENCE, THE ARTS, WORLDWIDE HISTORY U.S. & THE AMERICAS HISTORY & CULTURE EUROPEAN HISTORY & CULTURE ASIAN & AFRICAN HISTORY & CULTURE MAGAZINES -- Greg M. Johnson http://pterandon.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090716/65f71c57/attachment.html> From hart at pobox.com Mon Jul 20 17:42:02 2009 From: hart at pobox.com (Michael S. Hart) Date: Mon, 20 Jul 2009 16:42:02 -0800 (AKDT) Subject: [gutvol-d] !@! Barnes & Noble Unveils Online Bookstore to Compete Directly With Amazon Message-ID: <alpine.DEB.1.00.0907201637410.21137@snowy.arsc.alaska.edu> http://tinyurl.com/np5ke2 "Top U.S. bookseller Barnes & Noble (BKS) announced Monday the launch of the world's largest online bookstore, with over 700,000 titles that can be read on a range of platforms from Apple's iPhone to personal computers. Sounding a challenge to online retailer Amazon, the company said its selection would grow to over 1 million titles within the next year and include every available e-book from every book publisher." 
From marcello at perathoner.de Mon Jul 27 00:24:41 2009 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 27 Jul 2009 09:24:41 +0200 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com> References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> <a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com> Message-ID: <4A6D5639.5080600@perathoner.de> David A. Desrosiers wrote: > On Wed, Jul 15, 2009 at 8:28 PM, Paulo Levi<i30817 at gmail.com> wrote: >> So, i like to know how from the etext1802 number can i get a working url to >> download the book, and how to construct variants for each format. > > I do something very similar on the Plucker "samples" page: > > http://www.plkr.org/samples > > I check HEAD on each resource (using an intelligent caching mechanism > on my side), and then either present a working link, or a striked-out > link, depending on whether the format is available or not. That seems an horrible waste of resources seeing that you only need to scan the rdf file to see what files we have. From marcello at perathoner.de Mon Jul 27 00:41:19 2009 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 27 Jul 2009 09:41:19 +0200 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> Message-ID: <4A6D5A1F.9010403@perathoner.de> Paulo Levi wrote: > > ---------- Forwarded message ---------- > From: *Paulo Levi* <i30817 at gmail.com <mailto:i30817 at gmail.com>> > Date: Thu, Jul 16, 2009 at 1:25 AM > Subject: Programmatic fetching books from Gutenberg > To: gutvol-p at lists.pglaf.org <mailto:gutvol-p at lists.pglaf.org> > > > I made a ebook reader > (here) http://code.google.com/p/bookjar/downloads/list > > and i'd like to search and download Gutenberg books. I already have a searcher > prototype using LuceneSail a library that uses Lucene to index rdf documents and > only indexing what i want from the catalog.rdf.zip. > > Now i'd like to know how from the url inside the catalog i can fetch the book > itself, and what are the variants for the formats. > A example query result: > author: Shakespeare, William, 1564-1616 > url: http://www.gutenberg.org/feeds/catalog.rdf#etext1802 > title: King Henry VIII > > So, i like to know how from the etext1802 number can i get a working url to > download the book, and how to construct variants for each format. > > Thank you in advance. I already told you how to do that on gutvol-p. You make a very simple thing very complicated because you refuse to use xml tools to scan an xml file. This simple xpath query: xpath ("//pgterms:file[dcterms::isFormatOf[@rdf:resource='#etext29514']]") will get all files we have for book 29514 with mimetype, size and last modification date. 
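For anyone who wants to run that query from a script rather than an XPath shell, the following is a small illustrative sketch using Python and lxml against a local copy of catalog.rdf; it is not PG-supplied code. The predicate is written here with a single colon, dcterms:isFormatOf, since a double colon would be read as an XPath axis and fail to parse, and the namespace prefixes are taken from the rdf:RDF root rather than hard-coded.

--- illustrative sketch (Python) ---

# Sketch: run the equivalent XPath query with lxml against catalog.rdf.
# Prefixes (pgterms, dcterms, dc, rdf) are read from the document root,
# so whatever namespace URIs the file declares are used.
from lxml import etree

root = etree.parse("catalog.rdf").getroot()
ns = {k: v for k, v in root.nsmap.items() if k}   # drop any default namespace

files = root.xpath(
    "//pgterms:file[dcterms:isFormatOf[@rdf:resource='#etext29514']]",
    namespaces=ns,
)
for f in files:
    url = f.get("{%s}about" % ns["rdf"])          # rdf:about, entities already expanded
    fmt = f.xpath("string(dc:format//rdf:value)", namespaces=ns)
    size = f.findtext("dcterms:extent", namespaces=ns)
    modified = f.xpath("string(dcterms:modified//rdf:value)", namespaces=ns)
    print(url, fmt, size, modified)

The excerpt below shows the kind of records such a query matches.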
--- excerpt from catalog.rdf --- <pgterms:file rdf:about="&f;2/9/5/1/29514/29514-8.txt"> <dc:format><dcterms:IMT><rdf:value>text/plain; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format> <dcterms:extent>27727</dcterms:extent> <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified> <dcterms:isFormatOf rdf:resource="#etext29514" /> </pgterms:file> <pgterms:file rdf:about="&f;2/9/5/1/29514/29514-8.zip"> <dc:format><dcterms:IMT><rdf:value>text/plain; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format> <dc:format><dcterms:IMT><rdf:value>application/zip</rdf:value></dcterms:IMT></dc:format> <dcterms:extent>10751</dcterms:extent> <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified> <dcterms:isFormatOf rdf:resource="#etext29514" /> </pgterms:file> <pgterms:file rdf:about="&f;2/9/5/1/29514/29514-h/29514-h.htm"> <dc:format><dcterms:IMT><rdf:value>text/html; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format> <dcterms:extent>29847</dcterms:extent> <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified> <dcterms:isFormatOf rdf:resource="#etext29514" /> </pgterms:file> <pgterms:file rdf:about="&f;2/9/5/1/29514/29514-h.zip"> <dc:format><dcterms:IMT><rdf:value>text/html; charset="iso-8859-1"</rdf:value></dcterms:IMT></dc:format> <dc:format><dcterms:IMT><rdf:value>application/zip</rdf:value></dcterms:IMT></dc:format> <dcterms:extent>18787</dcterms:extent> <dcterms:modified><dcterms:W3CDTF><rdf:value>2009-07-25</rdf:value></dcterms:W3CDTF></dcterms:modified> <dcterms:isFormatOf rdf:resource="#etext29514" /> </pgterms:file> From pterandon at gmail.com Mon Jul 27 05:23:38 2009 From: pterandon at gmail.com (Greg M. Johnson) Date: Mon, 27 Jul 2009 08:23:38 -0400 Subject: [gutvol-d] Compilation ideas Message-ID: <a0bf3e960907270523t3ead7c35lc59db00679a77f33@mail.gmail.com> First of all, apologies for the triplicate submissions on the same topic. I wasn't getting through to the list. Secondly, the question about resource use with the plucker list underscores my idea that we should be encouraging torrent-downloads of format-specific DVDs, arranged with some sort of genre or index stronger than merely alphabetical author & title. In a real bookstore, folks might want to peek at the first two pages of dozens of books before they buy. In ebooks, is there a way to serve the customer and save bandwidth? This is one way. I'm working on a compilation DVD in HTML format. If I were to do it over again, and had a computer to select things, I'd do something like this: 1) Find "Best of" lists. Some IIRC were on the pg web site. I'd start with the one about 100 books to read for a liberal education. Then add a well-rounded selection of items from "best of " lists for other genres: mystery, history of science, history, scifi, religion, etc., etc... 2) Figure out how much space you have and what format sceme (all, plucker, zipped txt's, HTML, etc.) you're going to concentrate on. 3) Start with the #1 item on every list. Add the complete works of each author. (My idea is that the most obscure work of Author #1 may be more fun to read than the everyone-knows item from Author #99. ) Continue down the list until you've filled up your space requirement. (If you're doing HTML format, skip the >20 MB books. ) 4) Indices: Author Alphabetical, Title Alphabetical, LOC classification (not my original idea this one), and editors' picks. 
( Strongly encourage 3-12 well-read and opinionated folks to come up with "Editor's Picks" lists.) -- Greg M. Johnson http://pterandon.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090727/2eb111da/attachment.html> From desrod at gnu-designs.com Mon Jul 27 07:34:39 2009 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Mon, 27 Jul 2009 10:34:39 -0400 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <4A6D5639.5080600@perathoner.de> References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> <a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com> <4A6D5639.5080600@perathoner.de> Message-ID: <a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com> On Mon, Jul 27, 2009 at 3:24 AM, Marcello Perathoner<marcello at perathoner.de> wrote: > That seems an horrible waste of resources seeing that you only need to scan > the rdf file to see what files we have. Scanning the RDF file tells me absolutely nothing about the availability of the actual target format itself. Checking HEAD on each target link does, however. Since I'm caching it on the server-side, I only have to remotely check it the first time, which is not a "horrible waste of resources" at all. From Bowerbird at aol.com Mon Jul 27 09:40:57 2009 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 27 Jul 2009 12:40:57 EDT Subject: [gutvol-d] Re: !@! Barnes & Noble Unveils Online Bookstore to Compete Directly With Amazon Message-ID: <d12.25bbad71.379f3299@aol.com> b&n won't be able to compete with the kindle 3: > http://www.youtube.com/watch?v=GI0Zry_R4RQ "i can't hear you; i'm reading!" -bowerbird ************** An Excellent Credit Score is 750. See Yours in Just 2 Easy Steps! (http://pr.atwola.com/promoclk/100126575x1221823322x1201398723/aol?redir=http://www.freecreditreport.com/pm/default.aspx?sc=668072&hmpgID=62& bcd=JulyExcfooterNO62) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090727/8b0b639f/attachment.html> From ralf at ark.in-berlin.de Mon Jul 27 10:45:05 2009 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Mon, 27 Jul 2009 19:45:05 +0200 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com> References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> <a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com> <4A6D5639.5080600@perathoner.de> <a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com> Message-ID: <002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de> On Jul 27, 2009, at 4:34 PM, David A. Desrosiers wrote: > On Mon, Jul 27, 2009 at 3:24 AM, Marcello > Perathoner<marcello at perathoner.de> wrote: >> That seems an horrible waste of resources seeing that you only need >> to scan >> the rdf file to see what files we have. > > Scanning the RDF file tells me absolutely nothing about the > availability of the actual target format itself. Checking HEAD on each > target link does, however. Since I'm caching it on the server-side, I > only have to remotely check it the first time, which is not a > "horrible waste of resources" at all. 
My, can't we admit that XPath is a bit over our head, so we prefer confronting the admin we're supposed to be cooperating with? Wrt resources, my guess it's about par traffic-wise (1-5k per book vs. megabytes of RDF) but much better CPU-wise. That is, if you don't want the RDF for other fine things like metadata etc. ralf From desrod at gnu-designs.com Mon Jul 27 11:42:36 2009 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Mon, 27 Jul 2009 14:42:36 -0400 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de> References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> <a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com> <4A6D5639.5080600@perathoner.de> <a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com> <002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de> Message-ID: <a82cdbb90907271142p34e72bbdv86211a862580560f@mail.gmail.com> On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan<ralf at ark.in-berlin.de> wrote: > My, can't we admit that XPath is a bit over our head, > so we prefer confronting the admin we're supposed > to be cooperating with? Wrt resources, my guess it's > about par traffic-wise (1-5k per book vs. megabytes > of RDF) but much better CPU-wise. That is, if you don't > want the RDF for other fine things like metadata etc. I think you've missed my point. The RDF flat-out cannot tell me which of the target _formats_ are available for immediate download to the users. I'm not looking for which _titles_ are available in the catalog, I'm looking for which _formats_ are available. Also note that I'm already parsing the feeds to see what the top 'n' titles are already, so parsing XML via whatever methods I need is not the blocker here. Let me give you an example of two titles available in the catalog: Verg?nglichkeit by Sigmund Freud http://www.gutenberg.org/cache/plucker/29514/29514 The Lost Word by Henry Van Dyke http://www.gutenberg.org/cache/plucker/4384/4384 Both of these _titles_ are available in the Gutenberg catalog, but the second one is not available in the Plucker _format_ for immediate download. Big difference from parsing title availability from the catalog.rdf file. Make sense now? From ralf at ark.in-berlin.de Tue Jul 28 00:16:41 2009 From: ralf at ark.in-berlin.de (Ralf Stephan) Date: Tue, 28 Jul 2009 09:16:41 +0200 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <a82cdbb90907271142p34e72bbdv86211a862580560f@mail.gmail.com> References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> <a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com> <4A6D5639.5080600@perathoner.de> <a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com> <002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de> <a82cdbb90907271142p34e72bbdv86211a862580560f@mail.gmail.com> Message-ID: <79902BC0-1EED-4021-B529-60A6CE413F38@ark.in-berlin.de> I confirm that neither the Plucker nor the Mobile formats are mentioned in the catalog file. Do you have an explanation, Marcello? ralf On Jul 27, 2009, at 8:42 PM, David A. Desrosiers wrote: > On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan<ralf at ark.in-berlin.de> > wrote: >> My, can't we admit that XPath is a bit over our head, >> so we prefer confronting the admin we're supposed >> to be cooperating with? 
Wrt resources, my guess it's >> about par traffic-wise (1-5k per book vs. megabytes >> of RDF) but much better CPU-wise. That is, if you don't >> want the RDF for other fine things like metadata etc. > > I think you've missed my point. > > The RDF flat-out cannot tell me which of the target _formats_ are > available for immediate download to the users. I'm not looking for > which _titles_ are available in the catalog, I'm looking for which > _formats_ are available. Also note that I'm already parsing the feeds > to see what the top 'n' titles are already, so parsing XML via > whatever methods I need is not the blocker here. > > Let me give you an example of two titles available in the catalog: > > Verg?nglichkeit by Sigmund Freud > http://www.gutenberg.org/cache/plucker/29514/29514 > > The Lost Word by Henry Van Dyke > http://www.gutenberg.org/cache/plucker/4384/4384 > > Both of these _titles_ are available in the Gutenberg catalog, but the > second one is not available in the Plucker _format_ for immediate > download. Big difference from parsing title availability from the > catalog.rdf file. > > Make sense now? > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d Ralf Stephan http://www.ark.in-berlin.de pub 1024D/C5114CB2 2009-06-07 [expires: 2011-06-06] Key fingerprint = 76AE 0D21 C06C CBF9 24F8 7835 1809 DE97 C511 4CB2 From gbnewby at pglaf.org Tue Jul 28 05:50:29 2009 From: gbnewby at pglaf.org (Greg Newby) Date: Tue, 28 Jul 2009 05:50:29 -0700 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <79902BC0-1EED-4021-B529-60A6CE413F38@ark.in-berlin.de> References: <212322090907151725x194f5361j2741f331dbf775f1@mail.gmail.com> <212322090907151728x84b224by697776eec6e265a4@mail.gmail.com> <a82cdbb90907261925l1ba27052m1813b59b14f2e785@mail.gmail.com> <4A6D5639.5080600@perathoner.de> <a82cdbb90907270734k24ecd8b5y6a75e3b3e8297eb6@mail.gmail.com> <002494FB-121D-4CC8-80B0-AE3EFA238ADB@ark.in-berlin.de> <a82cdbb90907271142p34e72bbdv86211a862580560f@mail.gmail.com> <79902BC0-1EED-4021-B529-60A6CE413F38@ark.in-berlin.de> Message-ID: <20090728125029.GA24834@mail.pglaf.org> On Tue, Jul 28, 2009 at 09:16:41AM +0200, Ralf Stephan wrote: > I confirm that neither the Plucker nor the Mobile formats > are mentioned in the catalog file. Do you have an > explanation, Marcello? I believe Marcello is out on vacation for 2 weeks. But I know the explanation: the epub, mobi and a few other formats are not part of the Project Gutenberg collection's files, so not part of the database. They are generated on-demand (or cached if they were generated recently enough), from HTML or text. We are planning many more "on the fly" conversion options for the future. I have one for a mobile eBook format (for cell phones), and hope to have a PDF converter (with lots of options). We've been working on some text-to-speech converters, too, but that work has gone slowly. The catalog file only tracks the actual files that are stored as part of the collection (stuff you can view while navigating the directory tree via FTP or other methods). -- Greg > On Jul 27, 2009, at 8:42 PM, David A. Desrosiers wrote: > >> On Mon, Jul 27, 2009 at 1:45 PM, Ralf Stephan<ralf at ark.in-berlin.de> >> wrote: >>> My, can't we admit that XPath is a bit over our head, >>> so we prefer confronting the admin we're supposed >>> to be cooperating with? Wrt resources, my guess it's >>> about par traffic-wise (1-5k per book vs. 
megabytes >>> of RDF) but much better CPU-wise. That is, if you don't >>> want the RDF for other fine things like metadata etc. >> >> I think you've missed my point. >> >> The RDF flat-out cannot tell me which of the target _formats_ are >> available for immediate download to the users. I'm not looking for >> which _titles_ are available in the catalog, I'm looking for which >> _formats_ are available. Also note that I'm already parsing the feeds >> to see what the top 'n' titles are already, so parsing XML via >> whatever methods I need is not the blocker here. >> >> Let me give you an example of two titles available in the catalog: >> >> Verg?nglichkeit by Sigmund Freud >> http://www.gutenberg.org/cache/plucker/29514/29514 >> >> The Lost Word by Henry Van Dyke >> http://www.gutenberg.org/cache/plucker/4384/4384 >> >> Both of these _titles_ are available in the Gutenberg catalog, but the >> second one is not available in the Plucker _format_ for immediate >> download. Big difference from parsing title availability from the >> catalog.rdf file. >> >> Make sense now? >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/mailman/listinfo/gutvol-d > > Ralf Stephan > http://www.ark.in-berlin.de > pub 1024D/C5114CB2 2009-06-07 [expires: 2011-06-06] > Key fingerprint = 76AE 0D21 C06C CBF9 24F8 7835 1809 DE97 C511 > 4CB2 > > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From joshua at hutchinson.net Tue Jul 28 07:16:15 2009 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue, 28 Jul 2009 14:16:15 +0000 (GMT) Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg Message-ID: <891262632.99249.1248790575502.JavaMail.mail@webmail05> An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090728/97a29e33/attachment.html> From gbnewby at pglaf.org Tue Jul 28 07:33:06 2009 From: gbnewby at pglaf.org (Greg Newby) Date: Tue, 28 Jul 2009 07:33:06 -0700 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <891262632.99249.1248790575502.JavaMail.mail@webmail05> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> Message-ID: <20090728143306.GA28986@mail.pglaf.org> On Tue, Jul 28, 2009 at 02:16:15PM +0000, Joshua Hutchinson wrote: > Any chance of creating on the fly zips of some of the books??? For > instance, the audio books are huge and usually divided along chapter > lines.?? Single file zips are very useful (and something we've done on some > of them manually) but the space waste is huge.?? On the fly zipping of > those files would save huge in storage space. > > Josh Somebody would need to write the software :) Zipping an mp3 is not a winning strategy: they really don't compress much, if at all. Putting multiple mp3 files for a single eBook in one file, on the fly, would be a great move - making it easier to download a group of files. A more general approach would be to let visitors to www.gutenberg.org put their selected files (including those generated on-the-fly) on a bookshelf (i.e., shopping cart), then download in one big file, or several small ones. This would involve some fairly significant additions to the current PHP-based back-end at www.gutenberg.org, but is certainly not a huge technical feat. 
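For the narrower case above -- bundling all the audio files of one eBook into a single download, built on request -- a minimal sketch with PHP's ZipArchive could look like the following. The column names mirror the "files" table the catalog code queries; the DSN and on-disk base directory are placeholders rather than the real server configuration:

<?php
// zipbook.php?fk_books=12345 -- build a zip of one eBook's mp3 files on the fly.
$fk_books = intval($_GET['fk_books']);
$base     = '/data/gutenberg/files';               // placeholder path

$db   = new PDO('pgsql:dbname=gutenberg');         // placeholder DSN
$stmt = $db->prepare(
    "SELECT filename FROM files
      WHERE fk_books = ? AND fk_filetypes = 'mp3'
        AND obsoleted = 0 AND diskstatus = 0");
$stmt->execute(array($fk_books));
$files = $stmt->fetchAll(PDO::FETCH_COLUMN);
if (!$files) { header('HTTP/1.1 404 Not Found'); exit; }

// mp3 barely compresses, so the zip is only a convenient one-file bundle.
$tmp = tempnam(sys_get_temp_dir(), 'pgzip');
$zip = new ZipArchive();
$zip->open($tmp, ZipArchive::OVERWRITE);
foreach ($files as $f) {
    $zip->addFile("$base/$f", basename($f));
}
$zip->close();

header('Content-Type: application/zip');
header("Content-Disposition: attachment; filename=pg{$fk_books}-audio.zip");
header('Content-Length: ' . filesize($tmp));
readfile($tmp);
unlink($tmp);
?>

The same handler, minus the fk_filetypes filter and fed from a per-session bookshelf table instead, would cover the shopping-cart download as well.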
-- Greg > On Jul 28, 2009, Greg Newby <gbnewby at pglaf.org> wrote: > > On Tue, Jul 28, 2009 at 09:16:41AM +0200, Ralf Stephan wrote: > > I confirm that neither the Plucker nor the Mobile formats > > are mentioned in the catalog file. Do you have an > > explanation, Marcello? > > I believe Marcello is out on vacation for 2 weeks. > > But I know the explanation: the epub, mobi and a few other > formats are not part of the Project Gutenberg collection's > files, so not part of the database. > > They are generated on-demand (or cached if they were generated > recently enough), from HTML or text. > > We are planning many more "on the fly" conversion options for > the future. I have one for a mobile eBook format (for cell > phones), and hope to have a PDF converter (with lots of options). > We've been working on some text-to-speech converters, too, but > that work has gone slowly. > > The catalog file only tracks the actual files that are stored > as part of the collection (stuff you can view while navigating > the directory tree via FTP or other methods). > -- Greg > > > On Jul 27, 2009, at 8:42 PM, David A. Desrosiers wrote: > > > >> On Mon, Jul 27, 2009 at 1:45 PM, Ralf > Stephan<[1]ralf at ark.in-berlin.de> > >> wrote: > >>> My, can't we admit that XPath is a bit over our head, > >>> so we prefer confronting the admin we're supposed > >>> to be cooperating with? Wrt resources, my guess it's > >>> about par traffic-wise (1-5k per book vs. megabytes > >>> of RDF) but much better CPU-wise. That is, if you don't > >>> want the RDF for other fine things like metadata etc. > >> > >> I think you've missed my point. > >> > >> The RDF flat-out cannot tell me which of the target _formats_ are > >> available for immediate download to the users. I'm not looking for > >> which _titles_ are available in the catalog, I'm looking for which > >> _formats_ are available. Also note that I'm already parsing the feeds > >> to see what the top 'n' titles are already, so parsing XML via > >> whatever methods I need is not the blocker here. > >> > >> Let me give you an example of two titles available in the catalog: > >> > >> Verg??nglichkeit by Sigmund Freud > >> [2]http://www.gutenberg.org/cache/plucker/29514/29514 > >> > >> The Lost Word by Henry Van Dyke > >> [3]http://www.gutenberg.org/cache/plucker/4384/4384 > >> > >> Both of these _titles_ are available in the Gutenberg catalog, but > the > >> second one is not available in the Plucker _format_ for immediate > >> download. Big difference from parsing title availability from the > >> catalog.rdf file. > >> > >> Make sense now? > >> _______________________________________________ > >> gutvol-d mailing list > >> [4]gutvol-d at lists.pglaf.org > >> [5]http://lists.pglaf.org/mailman/listinfo/gutvol-d > > > > Ralf Stephan > > [6]http://www.ark.in-berlin.de > > pub 1024D/C5114CB2 2009-06-07 [expires: 2011-06-06] > > Key fingerprint = 76AE 0D21 C06C CBF9 24F8 7835 1809 DE97 C511 > > 4CB2 > > > > > > > > > > _______________________________________________ > > gutvol-d mailing list > > [7]gutvol-d at lists.pglaf.org > > [8]http://lists.pglaf.org/mailman/listinfo/gutvol-d > _______________________________________________ > gutvol-d mailing list > [9]gutvol-d at lists.pglaf.org > [10]http://lists.pglaf.org/mailman/listinfo/gutvol-d > > References > > Visible links > 1. mailto:ralf at ark.in-berlin.de > 2. http://www.gutenberg.org/cache/plucker/29514/29514 > 3. http://www.gutenberg.org/cache/plucker/4384/4384 > 4. mailto:gutvol-d at lists.pglaf.org > 5. 
http://lists.pglaf.org/mailman/listinfo/gutvol-d > 6. http://www.ark.in-berlin.de/ > 7. mailto:gutvol-d at lists.pglaf.org > 8. http://lists.pglaf.org/mailman/listinfo/gutvol-d > 9. mailto:gutvol-d at lists.pglaf.org > 10. http://lists.pglaf.org/mailman/listinfo/gutvol-d > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From joey at joeysmith.com Tue Jul 28 17:24:08 2009 From: joey at joeysmith.com (Joey Smith) Date: Tue, 28 Jul 2009 18:24:08 -0600 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <20090728143306.GA28986@mail.pglaf.org> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> Message-ID: <20090729002407.GA5389@joeysmith.com> On Tue, Jul 28, 2009 at 07:33:06AM -0700, Greg Newby wrote: > Somebody would need to write the software :) > > Zipping an mp3 is not a winning strategy: they really don't > compress much, if at all. > > Putting multiple mp3 files for a single eBook in one file, > on the fly, would be a great move - making it easier to > download a group of files. > > A more general approach would be to let visitors to www.gutenberg.org > put their selected files (including those generated on-the-fly) > on a bookshelf (i.e., shopping cart), then download in one big file, > or several small ones. > > This would involve some fairly significant additions to the > current PHP-based back-end at www.gutenberg.org, but is certainly > not a huge technical feat. > -- Greg Where can one find the code for the "current PHP-based back-end at www.gutenberg.org" to begin doing looking into how feasible this would be? From gbnewby at pglaf.org Wed Jul 29 06:27:08 2009 From: gbnewby at pglaf.org (Greg Newby) Date: Wed, 29 Jul 2009 06:27:08 -0700 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <20090729002407.GA5389@joeysmith.com> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <20090729002407.GA5389@joeysmith.com> Message-ID: <20090729132708.GA31539@mail.pglaf.org> On Tue, Jul 28, 2009 at 06:24:08PM -0600, Joey Smith wrote: > On Tue, Jul 28, 2009 at 07:33:06AM -0700, Greg Newby wrote: > > Somebody would need to write the software :) > > > > Zipping an mp3 is not a winning strategy: they really don't > > compress much, if at all. > > > > Putting multiple mp3 files for a single eBook in one file, > > on the fly, would be a great move - making it easier to > > download a group of files. > > > > A more general approach would be to let visitors to www.gutenberg.org > > put their selected files (including those generated on-the-fly) > > on a bookshelf (i.e., shopping cart), then download in one big file, > > or several small ones. > > > > This would involve some fairly significant additions to the > > current PHP-based back-end at www.gutenberg.org, but is certainly > > not a huge technical feat. > > -- Greg > > Where can one find the code for the "current PHP-based back-end at www.gutenberg.org" > to begin doing looking into how feasible this would be? Thanks for your interest :) It isn't bundled up for download anywhere. We'll probably need to wait for Marcello's return from vacation to provide details on how to add components like this. 
The current system is modular & (I think) well-organized, but complex...including lots of stuff that readers never see (such as the cataloger interface and various programs that add new files). Plus, as you know, there is a lot of stuff that is in the Wiki, rather than PHP. The Wiki might be where new features could be added, or there might be modules "out there" that could make it easier. I did grab catalog/world/bibrec.php , where bibrecs like this are made: http://www.gutenberg.org/etext/11 It is below. This should give you an idea how where various things are tied in from the database, the on-disk cached records, and stuff that is generated on the fly. The various .phh files it references (which cascade to include a whole bunch of stuff) are mostly for presentation (html and css), not functionality. A bookshelf/shopping cart would probably be a brand new set of files, with just a little overlap with the existing php. It would need to access the database, and presumably would need a table or two to keep track of bookshelf users & entries. (Maybe a separate database...maybe part of the Wiki instead of a standalone set of PHP programs.) Cookies, or something similar, could be used to track user sessions and their bookshelves/shopping carts/whatever, and add an entry to various pages at www.gutenberg.org for them to access it (sort of like a regular ecommerce site). -------- bibrec.php <?php include_once ("pgcat.phh"); $cli = php_sapi_name () == "cli"; if ($cli) { $fk_books = intval ($_SERVER['argv'][1]); } else { $devserver = preg_match ("/www-dev/", $_SERVER['HTTP_HOST']); if ($devserver) { nocache (); } getint ("fk_books"); } $db = $config->db (); $keywords = array (); $frontispiece = null; $category = 0; $newfilesys = false; $help = "/wiki/Gutenberg:Help_on_Bibliographic_Record_Page"; $helpicon = "<img src=\"/pics/help.png\" class=\"helpicon\" alt=\"[help]\"$config->endtag>"; $db->exec ("select * from mn_books_categories where fk_books = $fk_books order by fk_categories"); if ($db->FirstRow ()) { $category = $db->get ("fk_categories", SQLINT); } $friendlytitle = friendlytitle ($fk_books, 80); $config->description = htmlspecialchars ("Download the free {$category_descriptions[$category]}: $friendlytitle"); for ($i = 0; $i < 26; ++$i) { $base32[sprintf ("%05b", $i)] = chr (0x41 + $i); } for ($i = 26; $i < 32; ++$i) { $base32[sprintf ("%05b", $i)] = chr (0x32 + $i - 26); } // find best file for recode facility class recode_candidate { function recode_candidate () { $this->score = 0; $this->fk_files = null; $this->filename = null; $this->encoding = null; $this->type = null; } } function find_recode_candidate ($fk_books) { global $db; $candidate = new recode_candidate (); $db->exec ("select pk, filename, fk_encodings from files " . "where fk_books = $fk_books and fk_filetypes = 'txt' " . 
"and fk_compressions = 'none' and diskstatus = 0 and obsoleted = 0"); if ($db->FirstRow ()) { do { $tmp = new recode_candidate (); $tmp->fk_files = $db->get ("pk", SQLINT); $tmp->filename = $db->get ("filename", SQLCHAR); $tmp->encoding = $db->get ("fk_encodings", SQLCHAR); if ((!isset ($tmp->encoding) || $tmp->encoding == "us-ascii")) { $tmp->score = 1; $tmp->encoding = "ASCII"; } if ($tmp->encoding == "big5") { $tmp->score = 2; $tmp->encoding = "BIG-5"; } if ($tmp->encoding == "euc-kr") { $tmp->score = 2; $tmp->encoding = "EUC-KR"; } if ($tmp->encoding == "Shift_JIS") { $tmp->score = 2; $tmp->encoding = "SHIFT-JIS"; } if (!strncmp ($tmp->encoding, "iso-", 4)) { $tmp->score = 3; } if (!strncmp ($tmp->encoding, "windows-", 8)) { $tmp->score = 4; } if ($tmp->encoding == "utf-8") { $tmp->score = 5; $tmp->encoding = "UTF-8"; } if ($tmp->score > $candidate->score) { $candidate = $tmp; } } while ($db->NextRow ()); } return $candidate; } function find_plucker_candidate ($fk_books) { global $db; $candidate = new recode_candidate (); $db->exec ("select pk, filename, fk_encodings, fk_filetypes from files " . "where fk_books = $fk_books and (fk_filetypes = 'txt' or fk_filetypes = 'html')" . "and fk_compressions = 'none' and diskstatus = 0 and obsoleted = 0"); if ($db->FirstRow ()) { do { $tmp = new recode_candidate (); $tmp->fk_files = $db->get ("pk", SQLINT); $tmp->filename = $db->get ("filename", SQLCHAR); $tmp->encoding = $db->get ("fk_encodings", SQLCHAR); $tmp->type = $db->get ("fk_filetypes", SQLCHAR); if ((!isset ($tmp->encoding) || $tmp->encoding == "us-ascii")) { $tmp->score = 1; } if ($tmp->encoding == "iso-8859-1") { $tmp->score = 2; } /* if ($tmp->encoding == "windows-1252") { $tmp->score = 3; } */ if ($tmp->type == "html") { $tmp->score = 4; } if ($tmp->score > $candidate->score) { $candidate = $tmp; } } while ($db->NextRow ()); } return $candidate; } function base32_encode ($in) { global $base32; $bits = ""; $in = @pack ("H*", $in); $len = strlen ($in); for ($i = 0; $i < $len; $i++) { $bits .= sprintf ("%08b", ord ($in{$i})); } if ($mod = strlen ($bits) % 5) { $bits .= str_repeat ("0", 5 - $mod); } return strtr ($bits, $base32); } class DownloadColumn extends dbtSimpleColumn { function DownloadColumn () { global $help, $helpicon; parent::dbtSimpleColumn (null, "Download Links <a href=\"$help#Download_Links\" title=\"Explain Download Links.\">$helpicon</a>", "pgdbfilesdownload"); } function Data ($db) { global $config, $friendlytitle, $fk_books, $newfsbasedir; $filename = $db->get ("filename", SQLCHAR); $extension = ""; if (preg_match ("/(\.[^.]+)$/", $filename, $matches)) { $extension = $matches[1]; } $dir = etext2dir ($fk_books); if (preg_match ("!^$dir!", $filename)) { $symlink = preg_replace ("!^$dir!", $newfsbasedir, $filename); } else { $symlink = "$config->downloadbase/$filename"; } $links = array (); $links[] = "<a href=\"$symlink\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>"; $links[] = "<a href=\"$config->world/mirror-redirect?file=$filename\" title=\"Download from mirror site.\" rel=\"nofollow\">mirror sites</a>"; $sha1 = base32_encode ($db->get ("sha1hash", SQLCHAR)); $tt = base32_encode ($db->get ("tigertreehash", SQLCHAR)); $links[] = "<a href=\"magnet:?xt=urn:sha1:$sha1" . "&xt=urn:kzhash:" . $db->get ("kzhash", SQLCHAR) . "&xt=urn:ed2k:" . $db->get ("ed2khash", SQLCHAR) . "&xt=urn:bitprint:$sha1.$tt" . "&xs=http://$config->domain$symlink" . "&dn=" . urlencode ("$friendlytitle$extension") . 
"\" title=\"Magnetlink to download from P2P network.\">P2P</a>"; return "<td class=\"pgdbfilesdownload\">" . join (" ", $links) . "</td>"; } } $array = array (); $db->exec ("select * from books where pk = $fk_books;"); if (!$db->FirstRow ()) { error_msg ("No etext no. $fk_books."); } $release_date = $db->get ("release_date"); $copyrighted = $db->get ("copyrighted") ? "Copyrighted. You may download this ebook but you may be limited in other uses. Check the license inside the ebook." : "Not copyrighted in the United States. If you live elsewhere check the laws of your country before downloading this ebook."; $db->exec ( "select * from authors, roles, mn_books_authors where mn_books_authors.fk_books = $fk_books and authors.pk = mn_books_authors.fk_authors and roles.pk = mn_books_authors.fk_roles order by role, author" ); $db->calcfields ["c_author"] = new CalcFieldAuthorDate (); if ($db->FirstRow ()) { do { $pk = $db->get ("fk_authors", SQLINT); $name = $db->get ("c_author", SQLCHAR); $role = htmlspecialchars ($db->get ("role", SQLCHAR)); $array [] = preg_replace ("/ /", " ", $role); $array [] = "<a href=\"/browse/authors/" . find_browse_page ($name) . "#a$pk\">$name</a>"; $keywords [] = htmlspecialchars ($db->get ("author", SQLCHAR)); } while ($db->NextRow ()); } $db->exec ("select attributes.*, attriblist.name, attriblist.caption from attributes, attriblist " . "where attributes.fk_books = $fk_books and " . "attributes.fk_attriblist = attriblist.pk " . "order by attriblist.name;"); if ($db->FirstRow ()) { do { $note = htmlspecialchars ($db->get ("text", SQLCHAR)); $caption = htmlspecialchars ($db->get ("caption", SQLCHAR)); $note = preg_replace ("/\n/", "<br$config->endtag>", $note); if ($caption) { $name = $db->get ("name", SQLCHAR); switch (intval ($name)) { case 901: $note = "<a href=\"$note?nocount\"><img src=\"$note?nocount\" title=\"$caption\" alt=\"$caption\" $config->endtag></a>"; break; case 902: case 903: $note = "<a href=\"$note?nocount\">$caption</a>"; break; case 10: $note = "$note <img src=\"/pics/link.png\" alt=\"\" $config->endtag> <a href=\"http://lccn.loc.gov/$note\" title=\"Look up this book in the Library of Congress catalog.\">LoC catalog record</a>"; break; default: $note = strip_marc_subfields ($note); if (substr ($name, 0, 1) == '5') { $patterns = array ("/http:\/\/\S+/", "/#(\d+)/"); $replaces = array ("<a href=\"$0\">$0</a>", "<a href=\"/ebooks/$1\">$0</a>"); $note = preg_replace ($patterns, $replaces, $note); } } $array [] = preg_replace ("/ /", " ", $caption); $array [] = $note; } } while ($db->NextRow ()); } $db->exec ("select * from langs, mn_books_langs where langs.pk = mn_books_langs.fk_langs and mn_books_langs.fk_books = $fk_books;" ); if ($db->FirstRow ()) { do { $pk = $db->get ("pk", SQLCHAR); $lang = htmlspecialchars ($db->get ("lang", SQLCHAR)); $array [] = "Language"; if ($pk != 'en') { $array [] = "<a href=\"/browse/languages/$pk\">$lang</a>"; } else { $array [] = $lang; } } while ($db->NextRow ()); } $db->exec ("select * from loccs, mn_books_loccs where loccs.pk = mn_books_loccs.fk_loccs and mn_books_loccs.fk_books = $fk_books;" ); if ($db->FirstRow ()) { do { $pk = $db->get ("pk", SQLCHAR); $pkl = strtolower ($pk); $locc = htmlspecialchars ($db->get ("locc", SQLCHAR)); $array [] = "LoC Class"; $array [] = "<a href=\"/browse/loccs/$pkl\">$pk: $locc</a>"; $keywords [] = $locc; } while ($db->NextRow ()); } $db->exec ("select * from subjects, mn_books_subjects where subjects.pk = mn_books_subjects.fk_subjects and mn_books_subjects.fk_books = $fk_books;" 
); if ($db->FirstRow ()) { do { $subject = htmlspecialchars ($db->get ("subject", SQLCHAR)); // $url = urlencode ($subject); $array [] = "Subject"; // $array [] = "<a href=\"$config->world/results?subject=$url\">$subject</a>"; $array [] = $subject; $keywords [] = $subject; } while ($db->NextRow ()); } $db->exec ("select * from categories, mn_books_categories where categories.pk = mn_books_categories.fk_categories and mn_books_categories.fk_books = $fk_books;"); if ($db->FirstRow ()) { do { $pk = $db->get ("pk", SQLINT); $category = $db->get ("category", SQLCHAR); $array [] = "Category"; $array [] = "<a href=\"/browse/categories/$pk\">$category</a>"; } while ($db->NextRow ()); } $array [] = "EText-No."; $array [] = $fk_books; $array [] = "Release Date"; $array [] = $release_date; $array [] = "Copyright Status"; $array [] = $copyrighted; $db->exec ("select count (*) as cnt from reviews.reviews where fk_books = $fk_books"); if (($cnt = $db->get ("cnt", SQLINT)) > 0) { $s = ($cnt == 1) ? "is a review" : "are $cnt reviews"; $array [] = "Reviews"; $array [] = "<a href=\"$config->world/reviews?fk_books=$fk_books\">There $s of this book available.</a>"; } $newfsbasedir = "$config->files/$fk_books/"; $db->exec ("select filename from files where fk_books = $fk_books and filename ~ '^[1-9]/'"); if ($db->FirstRow ()) { $newfilesys = true; $array [] = "Base Directory"; $array [] = "<a href=\"$newfsbasedir\">$newfsbasedir</a>"; } for ($i = 0; $i < count ($keywords); $i++) { $keywords[$i] = preg_replace ("/,\s*/", " ", $keywords[$i]); } $config->keywords = htmlspecialchars (join (", ", $keywords)) . ", $config->keywords"; $recode_candidate = find_recode_candidate ($fk_books); $plucker_candidate = find_plucker_candidate ($fk_books); $offer_recode = $recode_candidate->score > 0; $offer_plucker = $plucker_candidate->score > 0; /////////////////////////////////////////////////////////////////////////////// // start page output pageheader (htmlspecialchars ($friendlytitle)); $manubar = array (); $menubar[] = "<a href=\"$help\" title=\"Explain this page.\" rel=\"Help\">Help</a>"; if ($offer_recode) { $menubar[] = "<a href=\"$config->world/readfile?fk_files=$recode_candidate->fk_files\" title=\"Read this book online.\"rel=\"nofollow\">Read online</a>"; } p (join (" — ", $menubar)); echo ("<div class=\"pgdbdata\">\n\n"); $table = new BibrecTable (); $table->summary = "Bibliographic data of author and book."; $table->toprows = $array; $table->PrintTable (null, "Bibliographic Record <a href=\"$help#Table:_Bibliographic_Record\" title=\"Explain this table.\">$helpicon</a>"); echo ("</div>\n\n"); $db->exec ("select filetype, sortorder, compression, " . "case files.fk_filetypes when 'txt' then fk_encodings when 'mp3' then fk_encodings else null end as fk_encodings, " . "edition, filename, filesize, sha1hash, kzhash, tigertreehash, ed2khash " . "from files " . "left join filetypes on files.fk_filetypes = filetypes.pk " . "left join compressions on files.fk_compressions = compressions.pk " . "where fk_books = $fk_books and obsoleted = 0 and diskstatus = 0 " . 
"order by edition desc, sortorder, filetype, fk_encodings, compression, filename;"); $db->calcfields ["c_hrsize"] = new CalcFieldHRSize (); echo ("<div class=\"pgdbfiles\">\n\n"); echo ("<h2>Download this ebook for free</h2>\n\n"); class FilesTable extends ListTable { function FilesTable () { global $newfilesys, $offer_recode, $help, $helpicon; if (!$newfilesys) { $this->AddSimpleColumn ("edition", "Edition", "narrow pgdbfilesedition"); } $footnote = ($offer_recode) ? " \xC2\xB9" : ""; $this->AddSimpleColumn ("filetype", "Format <a href=\"$help#Format\" title=\"Explain Format.\">$helpicon</a>", "pgdbfilesformat"); $this->AddSimpleColumn ("fk_encodings", "Encoding$footnote <a href=\"$help#Encoding\" title=\"Explain Encoding.\">$helpicon</a>", "pgdbfilesencoding"); $this->AddSimpleColumn ("compression", "Compression <a href=\"$help#Compression\" title=\"Explain Compression.\">$helpicon</a>", "pgdbfilescompression"); $this->AddSimpleColumn ("c_hrsize", "Size", "right narrow pgdbfilessize"); $this->AddColumnObject (new DownloadColumn ()); $this->limit = -1; } } $array = array (); function epub_file ($fk_books) { return "/cache/epub/$fk_books/pg$fk_books.epub"; } function epub_images_file ($fk_books) { return "/cache/epub/$fk_books/pg${fk_books}-images.epub"; } function mobi_file ($fk_books) { return "/cache/epub/$fk_books/pg$fk_books.mobi"; } function mobi_images_file ($fk_books) { return "/cache/epub/$fk_books/pg${fk_books}-images.mobi"; } $epub = epub_file ($fk_books); $epub_images = epub_images_file ($fk_books); $mobi = mobi_file ($fk_books); $mobi_images = mobi_images_file ($fk_books); // epub stuff if (is_readable ("$config->documentroot$epub") && filesize ("$config->documentroot$epub") > 1024) { if (!$newfilesys) { $array [] = ""; } $array [] = "EPUB (experimental) <a href=\"$help#EPUB\" title=\"Explain EPUB.\">$helpicon</a>"; $array [] = ""; $array [] = ""; $array [] = human_readable_size (filesize ("$config->documentroot$epub")); $array [] = "<a href=\"$epub\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>"; } if (is_readable ("$config->documentroot$epub_images") && filesize ("$config->documentroot$epub_images") > 1024) { if (!$newfilesys) { $array [] = ""; } $array [] = "EPUB with images (experimental) <a href=\"$help#EPUB\" title=\"Explain EPUB.\">$helpicon</a>"; $array [] = ""; $array [] = ""; $array [] = human_readable_size (filesize ("$config->documentroot$epub_images")); $array [] = "<a href=\"$epub_images\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>"; } // mobi stuff if (is_readable ("$config->documentroot$mobi") && filesize ("$config->documentroot$mobi") > 1024) { if (!$newfilesys) { $array [] = ""; } $array [] = "MOBI (experimental) <a href=\"$help#MOBI\" title=\"Explain MOBI.\">$helpicon</a>"; $array [] = ""; $array [] = ""; $array [] = human_readable_size (filesize ("$config->documentroot$mobi")); $array [] = "<a href=\"$mobi\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>"; } if (is_readable ("$config->documentroot$mobi_images") && filesize ("$config->documentroot$mobi_images") > 1024) { if (!$newfilesys) { $array [] = ""; } $array [] = "MOBI with images (experimental) <a href=\"$help#MOBI\" title=\"Explain MOBI.\">$helpicon</a>"; $array [] = ""; $array [] = ""; $array [] = human_readable_size (filesize ("$config->documentroot$mobi_images")); $array [] = "<a href=\"$mobi_images\" title=\"Download from ibiblio.org.\"><span 
style=\"font-weight: bold\">main site</span></a>"; } // plucker stuff if ($offer_plucker) { if (!$newfilesys) { $array [] = ""; } $array [] = "Plucker <a href=\"$help#Plucker\" title=\"Explain Plucker.\">$helpicon</a>"; $array [] = ""; $array [] = ""; $array [] = "unknown"; $array [] = "<a href=\"/cache/plucker/$fk_books/$fk_books\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>"; # gbn: mobile ebooks. If Plucker conversion works, this should work, too: if (!$newfilesys) { $array [] = ""; } $array [] = "Mobile eBooks <a href=\"$help#Mobile\" title=\"Explain Mobile.\">$helpicon</a>"; $array [] = ""; $array [] = ""; $array [] = "unknown"; $array [] = "<a href=\"mobile/mobile.php?fk_books=$fk_books\" title=\"Download from ibiblio.org.\"><span style=\"font-weight: bold\">main site</span></a>"; } $table = new FilesTable (); $table->summary = "Table of available file types and sizes."; $table->toprows = $array; $table->PrintTable ($db, "Formats Available For Download <a href=\"$help#Table:_Formats_Available_For_Download\" title=\"Explain this table.\">$helpicon</a>", "pgdbfiles"); echo ("</div>\n\n"); if ($offer_recode) { $recode_encoding = strtoupper ($recode_candidate->encoding); p ("\xC2\xB9 If you need a special character set, try our " . "<a href=\"$config->world/recode?file=$recode_candidate->filename" . "&from=$recode_encoding\" rel=\"nofollow\">" . "online recoding service</a>."); } pagefooter (0); // implements a page cache // if this page is viewed it will write a static version // into the etext cache directory // MultiViews and mod_rewrite then take care to serve // the static page to the next requester $cachedir = "$config->documentroot/cache/bibrec/$fk_books"; umask (0); mkdir ($cachedir); $cachefile = "$cachedir/$fk_books.html.utf8"; $hd = fopen ($cachefile, "w"); if ($hd) { fwrite ($hd, $output); fclose ($hd); } $hd = gzopen ("$cachefile.gz", "w9"); if ($hd) { gzwrite ($hd, $output); gzclose ($hd); } exit (); ?> From joey at joeysmith.com Wed Jul 29 08:48:23 2009 From: joey at joeysmith.com (Joey Smith) Date: Wed, 29 Jul 2009 09:48:23 -0600 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <20090729132708.GA31539@mail.pglaf.org> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <20090729002407.GA5389@joeysmith.com> <20090729132708.GA31539@mail.pglaf.org> Message-ID: <20090729154823.GB5389@joeysmith.com> On Wed, Jul 29, 2009 at 06:27:08AM -0700, Greg Newby wrote: > Thanks for your interest :) > > It isn't bundled up for download anywhere. We'll probably need to wait > for Marcello's return from vacation to provide details on how to add > components like this. The current system is modular & (I think) > well-organized, but complex...including lots of stuff that readers never > see (such as the cataloger interface and various programs that add new > files). Plus, as you know, there is a lot of stuff that is in the > Wiki, rather than PHP. The Wiki might be where new features could > be added, or there might be modules "out there" that could make it easier. > > I did grab catalog/world/bibrec.php , where bibrecs like this are > made: > http://www.gutenberg.org/etext/11 > > It is below. This should give you an idea how where various things are > tied in from the database, the on-disk cached records, and stuff that is > generated on the fly. 
The various .phh files it references (which cascade > to include a whole bunch of stuff) are mostly for presentation (html > and css), not functionality. > > A bookshelf/shopping cart would probably be a brand new set of files, > with just a little overlap with the existing php. It would need to > access the database, and presumably would need a table or two to keep > track of bookshelf users & entries. (Maybe a separate database...maybe > part of the Wiki instead of a standalone set of PHP programs.) Cookies, > or something similar, could be used to track user sessions and their > bookshelves/shopping carts/whatever, and add an entry to various pages > at www.gutenberg.org for them to access it (sort of like a regular > ecommerce site). You know, now that I look at this code, I recall looking over this stuff with Marcello once, years ago...doesn't look like it has changed much. I'll drop a note to Marcello and wait to hear from him. Thanks, Greg! From desrod at gnu-designs.com Wed Jul 29 09:13:07 2009 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Wed, 29 Jul 2009 12:13:07 -0400 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <20090729154823.GB5389@joeysmith.com> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <20090729002407.GA5389@joeysmith.com> <20090729132708.GA31539@mail.pglaf.org> <20090729154823.GB5389@joeysmith.com> Message-ID: <a82cdbb90907290913m156cfdcepfa72a2b4bf6f5898@mail.gmail.com> On Wed, Jul 29, 2009 at 11:48 AM, Joey Smith<joey at joeysmith.com> wrote: > You know, now that I look at this code, I recall looking over this stuff > with Marcello once, years ago...doesn't look like it has changed much. I'll > drop a note to Marcello and wait to hear from him. That code could easily be 1/3 the size, but if it works... no need to go breaking things. :) From gbnewby at pglaf.org Wed Jul 29 10:25:10 2009 From: gbnewby at pglaf.org (Greg Newby) Date: Wed, 29 Jul 2009 10:25:10 -0700 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> Message-ID: <20090729172510.GC8946@mail.pglaf.org> On Tue, Jul 28, 2009 at 12:17:23PM -0400, David A. Desrosiers wrote: > On Tue, Jul 28, 2009 at 10:33 AM, Greg Newby<gbnewby at pglaf.org> wrote: > > A more general approach would be to let visitors to www.gutenberg.org > > put their selected files (including those generated on-the-fly) > > on a bookshelf (i.e., shopping cart), then download in one big file, > > or several small ones. > > If you're looking at it at that level, why not just offer some > streaming audio of the books as well? You could do this very simply > with any number of dozens of dynamic content streaming applications in > whatever language you choose (Perl, PHP, Python, Java, etc.) This is a good point. I don't know why we don't have streaming, especially since iBiblio does have streaming (I think). If you could suggest some software that seems likely to work on the iBiblio server (Apache, PHP, Perl all on Linux; free), especially that could just be dropped into bibrec.php that I sent earlier, that would be a tremendous help. The funny part is that I get inquiries all the time via help@ on "how do I save an audio file locally?" 
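One near-drop-in option, needing nothing beyond the Apache/PHP already on the server, is to emit an .m3u playlist of the mp3 URLs: most players treat a playlist as a stream and start playing while the files are still arriving. A rough sketch; the table columns and the public URL prefix are assumptions here:

<?php
// playlist.php?fk_books=12345 -- hand back an .m3u playlist of a book's mp3s.
$fk_books = intval($_GET['fk_books']);
$urlbase  = 'http://www.gutenberg.org/files';      // assumed public prefix

$db   = new PDO('pgsql:dbname=gutenberg');         // placeholder DSN
$stmt = $db->prepare(
    "SELECT filename FROM files
      WHERE fk_books = ? AND fk_filetypes = 'mp3'
        AND obsoleted = 0 AND diskstatus = 0
      ORDER BY filename");
$stmt->execute(array($fk_books));

header('Content-Type: audio/x-mpegurl');
header("Content-Disposition: inline; filename=pg{$fk_books}.m3u");
echo "#EXTM3U\n";
foreach ($stmt->fetchAll(PDO::FETCH_COLUMN) as $f) {
    echo "$urlbase/$f\n";
}
?>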
It seems the most common audio listening experience is to download & play back (perhaps with a delay for the download to complete), so people are doing the same thing as streaming (i.e., immediate listening), but needing to wait for the download to complete. It would be nice to offer streaming, instead. -- Greg > I actually used one to demo for a DJ/Amtrak train conductor several > months back. He wanted a way to pull the tags/artists out of his > enormous mp3 collection, and in 15 minutes on the train (with 'net), I > found one that would let him "radio-enable" his entire mp3 collection, > including a web interface to stream, play, download, view, sort, > browse all of the artists by collection, tag, album art, date, etc. > all in Perl. > > It should be a simple matter to have something similar latched onto > the Gutenberg audio collection, so anyone can click on the audiobook > to either download, stream, convert, etc. the book in whatever format > they prefer. > > Just an idea... From Bowerbird at aol.com Wed Jul 29 15:00:16 2009 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 29 Jul 2009 18:00:16 EDT Subject: [gutvol-d] someone needs to Message-ID: <d22.49174583.37a22070@aol.com> someone needs to remount independently the scan-set from every single public-domain book that google has scanned... might that be project gutenberg? if not, then someone else needs to do it. because it needs to be done... those books don't belong to google, or to sony, or to barnes & noble, or to anyone else that google decides to share them with. those books belong to the public. -bowerbird ************** Hot Deals at Dell on Popular Laptops perfect for Back to School (http://pr.atwola.com/promoclk/100126575x1223106546x1201717234/aol?redir=http:%2F%2Faltfarm.mediaplex.com%2Fad%2Fck%2F12309%2D81939%2D1629%2D8) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090729/3760e5e3/attachment.html> From i30817 at gmail.com Wed Jul 29 15:11:04 2009 From: i30817 at gmail.com (Paulo Levi) Date: Wed, 29 Jul 2009 23:11:04 +0100 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <20090729172510.GC8946@mail.pglaf.org> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> <20090729172510.GC8946@mail.pglaf.org> Message-ID: <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> Tell me, please, if the gutenberg rtf index file, besides being autogenerated, is also sorted. What i mean is, i'm indexing parts of the file, and i gain a major speed up of treating the file as: a massive list of pgterms:etext definitions followed by a (more massive) list of pgterms:file definitions This allows the string comparisons i have to do to be lower (about n/2), but at the cost that any etext record in between the second list is not picked up. That won't be changed, but is only for my peace of mind. Also, in the pgterms:file records, are the records referring to the same file consecutive ? I ask because if so, i could do the same sort of filtering Aaron Cannon is doing in its dvd project, to speed up the index some more and remove duplicates. (If they aren't consecutive i would have to issue queries between building the index to see if they were already inserted). I have nothing against xpath, indeed i think the scanning of the file in lucene already uses something similar. 
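Whether or not the pgterms:file records turn out to be consecutive, keying each record on its dcterms:isFormatOf reference while streaming through the file makes the physical order irrelevant. A sketch with PHP's XMLReader (a Java StAX version would have the same shape); the element names come from the catalog excerpt quoted earlier in the thread, and the format ranking is only an example:

<?php
// One streaming pass over catalog.rdf: remember the single best-ranked file
// per etext, keyed on dcterms:isFormatOf, so duplicates and ordering never matter.
$rank = array('text/html' => 3, 'application/rtf' => 2, 'text/plain' => 1);
$best = array();                  // '#etextNNNN' => array(score, file URI)

$r = new XMLReader();
$r->open('catalog.rdf');
$inFile = false; $href = ''; $etext = ''; $format = '';

while ($r->read()) {
    if ($r->nodeType == XMLReader::ELEMENT && $r->name == 'pgterms:file') {
        $inFile = true;
        $href   = $r->getAttribute('rdf:about');
        $etext  = $format = '';
    } elseif ($inFile && $r->nodeType == XMLReader::ELEMENT
              && $r->name == 'dcterms:isFormatOf') {
        $etext = $r->getAttribute('rdf:resource');     // e.g. "#etext29514"
    } elseif ($inFile && $r->nodeType == XMLReader::ELEMENT
              && $r->name == 'rdf:value' && $format === '') {
        $format = $r->readString();                    // e.g. "text/plain; charset=..."
    } elseif ($r->nodeType == XMLReader::END_ELEMENT && $r->name == 'pgterms:file') {
        $inFile = false;
        $score  = 0;
        foreach ($rank as $mime => $s) {
            if (strpos($format, $mime) === 0) { $score = $s; break; }
        }
        if ($score > 0 && $etext !== ''
            && (!isset($best[$etext]) || $score > $best[$etext][0])) {
            $best[$etext] = array($score, $href);
        }
    }
}
$r->close();
// $best is now one entry per etext, ready to be fed to the indexer.
?>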
But i need free text searches, and they have to be fast (i'm already experimenting with a memory cache after the query too, and it works okish for my application) From i30817 at gmail.com Wed Jul 29 15:13:01 2009 From: i30817 at gmail.com (Paulo Levi) Date: Wed, 29 Jul 2009 23:13:01 +0100 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> <20090729172510.GC8946@mail.pglaf.org> <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> Message-ID: <212322090907291513h478330a8h36701a67bea021d@mail.gmail.com> Of course if they were sorted by priory say, most feature full free format : html -> rtf -> text UTF-8 -> ASCII would be very nice too. From hart at pobox.com Wed Jul 29 18:49:02 2009 From: hart at pobox.com (Michael S. Hart) Date: Wed, 29 Jul 2009 17:49:02 -0800 (AKDT) Subject: [gutvol-d] How To Get and Use eBooks on Cell Phones Message-ID: <alpine.DEB.1.00.0907291748440.12450@snowy.arsc.alaska.edu> How To Get and Use eBooks on Cell Phones This is a request for anyone to submit ideas for little "How To's" for their particular brands and models of cell phones. Nothing fancy to start off with, just the bare bones of how to get the eBook into the phones and how to read them in comfortable ways in terms of setting font size, zoom, or the like. Eventually we hope to create "How To" files for each model, and in ways that will encourage a greater usage of the nearly 4.5 billion cell phones now in use, and to give some worthwhile uses to phones that are no longer in service, but still work well as eReaders. If your phone has WiFi built in, so much the better. If nothing more, please just send the barest outline, and we would hope to get others to fill it in to make it more user friendly. Many Many Thanks!!! Michael S. Hart Founder Project Gutenberg Inventor of ebooks If you ever do not get a prompt response, please resend, then keep resending, I won't mind getting several copies per week. From cannona at fireantproductions.com Wed Jul 29 20:50:51 2009 From: cannona at fireantproductions.com (Aaron Cannon) Date: Wed, 29 Jul 2009 22:50:51 -0500 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <212322090907291513h478330a8h36701a67bea021d@mail.gmail.com> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> <20090729172510.GC8946@mail.pglaf.org> <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> <212322090907291513h478330a8h36701a67bea021d@mail.gmail.com> Message-ID: <628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com> Just to clarify, you wrote: "Also, in the pgterms:file records, are the records referring to the same file consecutive ? I ask because if so, i could do the same sort of filtering Aaron Cannon is doing in its dvd project, to speed up the index some more and remove duplicates. (If they aren't consecutive i would have to ?issue queries ?between building the index to see if they were already inserted)." There are actually not really any duplicates that I am aware of in the RDF catalog. It's just that most books are in the archive in more than one encoding or format. 
What I have been filtering out are the more lossey encodeings (like ASCII) when there is a less lossey one available (like UTF-8). As for the sorting, I don't know for sure, but it seems like the current ordering is likely an artifact of the way the RDF was generated. Whether or not you want to rely on that never changing is up to you. I haven't followed the thread closely enough to know what you're trying to do, but it sounds as though you might be using the RDF in a way which it was never intended. What I mean by that is you seem to be trying to read directly from it like a database when someone does a search, rather than loading the RDF into an actual database, and reading from that. Having just recently worked on a Python app which parses the RDF into memory, I can tell you that parsing the XML is the slowest part of the process, at least in my application. Your mileage may vary, but when you have to do tens-of-thousands of string comparisons from a file which is roughly 100MB in size before you can return a result in a web app (I'm assuming it's a web app), you're likely going to have problems. Good luck. Aaron On 7/29/09, Paulo Levi <i30817 at gmail.com> wrote: > Of course if they were sorted by priory say, most feature full free > format : html -> rtf -> text UTF-8 -> ASCII would be very nice too. > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From i30817 at gmail.com Wed Jul 29 21:40:08 2009 From: i30817 at gmail.com (Paulo Levi) Date: Thu, 30 Jul 2009 05:40:08 +0100 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> <20090729172510.GC8946@mail.pglaf.org> <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> <212322090907291513h478330a8h36701a67bea021d@mail.gmail.com> <628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com> Message-ID: <212322090907292140t454d0390r8a21253f6b41a0fa@mail.gmail.com> But i am reading the rdf into a (file) database. That is more or less what Lucene is. What i am filtering is just what insert into the database, so that its creation is faster / searches only on the fields that interest me. Sure its a lot of code, that will break if the format changes, but it reduced the creation step from 5 minutes or so to 40 seconds (this on a fast dual-core computer - i shudder to think what would happen if a user tried to re-index in a 1000 hertz machine). The index is at about 33.5 mb, and should compress into < 10mb. Probably enough to be included into the application. From schultzk at uni-trier.de Wed Jul 29 22:55:01 2009 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Thu, 30 Jul 2009 07:55:01 +0200 Subject: [gutvol-d] Re: How To Get and Use eBooks on Cell Phones In-Reply-To: <alpine.DEB.1.00.0907291748440.12450@snowy.arsc.alaska.edu> References: <alpine.DEB.1.00.0907291748440.12450@snowy.arsc.alaska.edu> Message-ID: <D9159201-1785-4F9E-ACBC-2894584F143A@uni-trier.de> Hi Micheal, The problem is a slightly more complicated than you state, though you probably know that! 1) What programs do you have on the cell phone. 2) What is availible for free! can be still gotten. 
3) What software/hardwarte for uploading to the cell phone 4) If no WifI(or connection availible) what harware/software the user has If a cell phone has Wifi it is a relatively new phone. so it will be able to handle html, most likely PDF, some rtf, doc, it all depends. regards Keith. Am 30.07.2009 um 03:49 schrieb Michael S. Hart: > > How To Get and Use eBooks on Cell Phones > > > This is a request for anyone to submit ideas for little "How To's" > for their particular brands and models of cell phones. > > Nothing fancy to start off with, just the bare bones of how to get > the eBook into the phones and how to read them in comfortable ways > in terms of setting font size, zoom, or the like. > > Eventually we hope to create "How To" files for each model, and in > ways that will encourage a greater usage of the nearly 4.5 billion > cell phones now in use, and to give some worthwhile uses to phones > that are no longer in service, but still work well as eReaders. > > If your phone has WiFi built in, so much the better. > > If nothing more, please just send the barest outline, and we would > hope to get others to fill it in to make it more user friendly. > > > Many Many Thanks!!! > > > > Michael S. Hart > Founder > Project Gutenberg > Inventor of ebooks > > > If you ever do not get a prompt response, please resend, then > keep resending, I won't mind getting several copies per week. > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From schultzk at uni-trier.de Wed Jul 29 23:03:06 2009 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Thu, 30 Jul 2009 08:03:06 +0200 Subject: [gutvol-d] Re: someone needs to In-Reply-To: <d22.49174583.37a22070@aol.com> References: <d22.49174583.37a22070@aol.com> Message-ID: <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> Tsk, Tsk, Tsk, Bower you should know better. Though I agree with you morally, but ... Shakespeare is public domain! But, is I scan say the folios, it is mine to do with it I want to and use what whatever copyright I personally care for!! The same goes for google. Of course it would be nice to have these scans. Just on a side nopte I thought you thought the google scans were not that good according to you!?? regards Keith Am 30.07.2009 um 00:00 schrieb Bowerbird at aol.com: > someone needs to remount independently the scan-set from > every single public-domain book that google has scanned... > > might that be project gutenberg? > > if not, then someone else needs to do it. > > because it needs to be done... > > those books don't belong to google, or to sony, or to > barnes & noble, or to anyone else that google decides > to share them with. those books belong to the public. > > -bowerbird > > > > ************** > Hot Deals at Dell on Popular Laptops perfect for Back to School (http://pr.atwola.com/promoclk/100126575x1223106546x1201717234/aol?redir=http:%2F%2Faltfarm.mediaplex.com%2Fad%2Fck%2F12309%2D81939%2D1629%2D8 > ) _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090730/e7b0f016/attachment.html> From schultzk at uni-trier.de Wed Jul 29 23:26:17 2009 From: schultzk at uni-trier.de (Keith J. 
Schultz) Date: Thu, 30 Jul 2009 08:26:17 +0200 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <212322090907292140t454d0390r8a21253f6b41a0fa@mail.gmail.com> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> <20090729172510.GC8946@mail.pglaf.org> <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> <212322090907291513h478330a8h36701a67bea021d@mail.gmail.com> <628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com> <212322090907292140t454d0390r8a21253f6b41a0fa@mail.gmail.com> Message-ID: <C1FEBF04-128F-419D-9AEA-0AA8B25A1B8E@uni-trier.de> Hi Paulo, Am 30.07.2009 um 06:40 schrieb Paulo Levi: > But i am reading the rdf into a (file) database. That is more or less > what Lucene is. > What i am filtering is just what insert into the database, so that its > creation is faster / searches only on the fields that interest me. > > Sure its a lot of code, that will break if the format changes, but it If you program modularely that would be no problem > reduced the creation step from 5 minutes or so to 40 seconds (this on If you are filtering and just get a factor of 13 I said it your system that is slow. If I remember correctly you are just request?ng the certain information so somebody else is doing the work! > a fast dual-core computer - i shudder to think what would happen if a > user tried to re-index in a 1000 hertz machine). Let's see. My Mac SE was a 1 Mega Hertz machine. That was twenty years ago. It would handle something like this in about ten minutes. I do not know what dbase system I was using. > > The index is at about 33.5 mb, and should compress into < 10mb. > Probably enough to be included into the application. Hardcoding data of that size into the program is not feasible. Though most newer computers can load it into memory quite quickly. gives you a factor of 100 if everything is in memory that is why perl is so fast. regard Keith From joey at joeysmith.com Wed Jul 29 23:27:03 2009 From: joey at joeysmith.com (Joey Smith) Date: Thu, 30 Jul 2009 00:27:03 -0600 Subject: [gutvol-d] Re: someone needs to In-Reply-To: <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> References: <d22.49174583.37a22070@aol.com> <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> Message-ID: <20090730062703.GE5389@joeysmith.com> On Thu, Jul 30, 2009 at 08:03:06AM +0200, Keith J. Schultz wrote: > Tsk, Tsk, Tsk, > > Bower you should know better. > > Though I agree with you morally, but ... > > Shakespeare is public domain! But, is I scan say the > folios, it is mine to do with it I want to and use what > whatever copyright I personally care for!! The same > goes for google. > > Of course it would be nice to have these scans. > > Just on a side nopte I thought you thought the google > scans were not that good according to you!?? > > regards > Keith > Wouldn't this amount to a "sweat of the brow" copyright? As I understand it, the USA (at least) has rejected the concept of "sweat of the brow" copyright. Or am I missing something? From hart at pobox.com Thu Jul 30 04:37:54 2009 From: hart at pobox.com (Michael S. Hart) Date: Thu, 30 Jul 2009 03:37:54 -0800 (AKDT) Subject: [gutvol-d] Re: someone needs to In-Reply-To: <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> References: <d22.49174583.37a22070@aol.com> <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> Message-ID: <alpine.DEB.1.00.0907300333230.30609@snowy.arsc.alaska.edu> No. . 
From joey at joeysmith.com Wed Jul 29 23:27:03 2009 From: joey at joeysmith.com (Joey Smith) Date: Thu, 30 Jul 2009 00:27:03 -0600 Subject: [gutvol-d] Re: someone needs to In-Reply-To: <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> References: <d22.49174583.37a22070@aol.com> <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> Message-ID: <20090730062703.GE5389@joeysmith.com> On Thu, Jul 30, 2009 at 08:03:06AM +0200, Keith J. Schultz wrote: > Tsk, Tsk, Tsk, > > Bower you should know better. > > I agree with you morally, but ... > > Shakespeare is public domain! But if I scan, say, the > folios, it is mine to do with as I want, and I can use > whatever copyright I personally care for!! The same > goes for google. > > Of course it would be nice to have these scans. > > Just on a side note, I thought the google > scans were not that good, according to you!? > > regards > Keith > Wouldn't this amount to a "sweat of the brow" copyright? As I understand it, the USA (at least) has rejected the concept of "sweat of the brow" copyright. Or am I missing something?

From hart at pobox.com Thu Jul 30 04:37:54 2009 From: hart at pobox.com (Michael S. Hart) Date: Thu, 30 Jul 2009 03:37:54 -0800 (AKDT) Subject: [gutvol-d] Re: someone needs to In-Reply-To: <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> References: <d22.49174583.37a22070@aol.com> <FBEBAD1D-AAA5-467F-B3F4-9DDD29CAE1E3@uni-trier.de> Message-ID: <alpine.DEB.1.00.0907300333230.30609@snowy.arsc.alaska.edu> No. . .under U.S. law, just scanning, xeroxing, photographing or otherwise reproducing a two-dimensional object into a two-dimensional result that closely resembles the original earns. . .no copyrights!!! There must be intellectual input to get a new copyright. . . . However, this might not be true in other countries where sweat of the brow, as it is called, gets you a new copyright, so the answer could be yes for Mr. Schultz and no for Mr. Bowerbird-- or vice versa, depending on how you ask the question. mh On Thu, 30 Jul 2009, Keith J. Schultz wrote: > Tsk, Tsk, Tsk, > > Bower you should know better. > > I agree with you morally, but ... > > Shakespeare is public domain! But if I scan, say, the > folios, it is mine to do with as I want, and I can use > whatever copyright I personally care for!! The same > goes for google. > > Of course it would be nice to have these scans. > > Just on a side note, I thought the google > scans were not that good, according to you!? > > regards > Keith > > Am 30.07.2009 um 00:00 schrieb Bowerbird at aol.com: > > > someone needs to remount independently the scan-set from > > every single public-domain book that google has scanned... > > > > might that be project gutenberg? > > > > if not, then someone else needs to do it. > > > > because it needs to be done... > > > > those books don't belong to google, or to sony, or to > > barnes & noble, or to anyone else that google decides > > to share them with. those books belong to the public. > > > > -bowerbird > > > > > > > > ************** > > Hot Deals at Dell on Popular Laptops perfect for Back to School > > (http://pr.atwola.com/promoclk/100126575x1223106546x1201717234/aol?redir=http:%2F%2Faltfarm.mediaplex.com%2Fad%2Fck%2F12309%2D81939%2D1629%2D8) > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d at lists.pglaf.org > > http://lists.pglaf.org/mailman/listinfo/gutvol-d >

From i30817 at gmail.com Thu Jul 30 10:18:50 2009 From: i30817 at gmail.com (Paulo Levi) Date: Thu, 30 Jul 2009 18:18:50 +0100 Subject: [gutvol-d] Re: Fwd: Programmatic fetching books from Gutenberg In-Reply-To: <C1FEBF04-128F-419D-9AEA-0AA8B25A1B8E@uni-trier.de> References: <891262632.99249.1248790575502.JavaMail.mail@webmail05> <20090728143306.GA28986@mail.pglaf.org> <a82cdbb90907280917h26fda315he470a9520b07d65c@mail.gmail.com> <20090729172510.GC8946@mail.pglaf.org> <212322090907291511r2d2a2ebfw198542bca2c2eb76@mail.gmail.com> <212322090907291513h478330a8h36701a67bea021d@mail.gmail.com> <628c29180907292050o60dc29e0n2600a37ebcdf4677@mail.gmail.com> <212322090907292140t454d0390r8a21253f6b41a0fa@mail.gmail.com> <C1FEBF04-128F-419D-9AEA-0AA8B25A1B8E@uni-trier.de> Message-ID: <212322090907301018u65b18eefgc7854e38aa2ada67@mail.gmail.com> >> reduced the creation step from 5 minutes or so to 40 seconds (this on > > If you are filtering and only get a factor of 13 then, as I said, it is > your system that is slow. If I remember correctly you are just requesting > certain information, so somebody else is doing the work! > It's not a server application, so the client is (potentially) doing the indexing if he wants to update the catalog. It's the indexing that takes 40 s. > >> >> The index is at about 33.5 mb, and should compress into < 10mb. >> Probably enough to be included into the application. > > Hardcoding data of that size into the program is not feasible, though > most newer computers can load it into memory quite quickly. That gives you > a factor of 100 if everything is in memory; that is why Perl is so fast. Including everything in memory would more than double my program heap, and don't forget that this is a java application, so that memory would never be released before the program ends (or at least a sub-process ends). Besides, as lucene uses files, i think i can't use an in-memory index to search the rdf (using LuceneSail, which uses Sesame and Lucene)
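As an aside on the in-memory question being weighed above: with plain Lucene it is possible to copy an existing file-based index into a RAMDirectory at startup, trading heap space for seek-free searches. The sketch below is only that trade-off in miniature; it assumes the Lucene 2.9 era API and reuses the hypothetical pg-index directory and field names from the earlier sketch, and it says nothing about the LuceneSail/Sesame stack Paulo mentions, where the same option may indeed not be available.

import java.io.File;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class InMemorySearch {
    public static void main(String[] args) throws Exception {
        // Copy the on-disk index into the Java heap once at startup;
        // for a ~35 MB index that is a real, permanent heap cost.
        RAMDirectory ram = new RAMDirectory(FSDirectory.open(new File("pg-index")));
        IndexSearcher searcher = new IndexSearcher(ram, true);   // read-only searcher

        // StandardAnalyzer lower-cases at index time, so query terms are lower-case too.
        TopDocs hits = searcher.search(
                new TermQuery(new Term("creator", "shakespeare")), 10);
        System.out.println(hits.totalHits + " catalogue entries match");

        searcher.close();
        ram.close();
    }
}

Whether the faster lookups justify doubling the heap is exactly the question in the exchange above; for a client-side application the file-based index is the more conservative choice.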
From Bowerbird at aol.com Thu Jul 30 23:50:18 2009 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 31 Jul 2009 02:50:18 EDT Subject: [gutvol-d] brewster kahle responds -- someone needs to (fwd) Message-ID: <cdd.5592ca09.37a3ee2a@aol.com> michael hart forwarded my post to brewster kahle, who responded: > bowerbird-- > > some very active volunteers have been taking the downloadable > pdf's from google's site and uploading them to the archive. > the archive stores these, OCR's them to make them searchable, > and if someone wants the pdf-- points them back to google. > > we would like to see more of this. if there are volunteers > to expand this program, we would like to play our part. > > -brewster i admit the internet archive has fallen off my radar somewhat -- ever since i was banned from their listserves because i had the temerity to continue complaining about their o.c.r. flaws -- but i should have remembered that they are indeed remounting google's public-domain scans, and deserve kudos for doing so. so thank you, brewster, and internet archive. "universal access to knowledge" has always appealed to me, by virtue of its deep simplicity, but i find that more and more, lately, brewster, i am chagrined that google has subverted it... instead of a global library, we're getting a global book-store... and "free to all" increasingly means "dig out your wallet"... -bowerbird ************** An Excellent Credit Score is 750. See Yours in Just 2 Easy Steps! (http://pr.atwola.com/promoclk/100126575x1222846709x1201493018/aol?redir=http://www.freecreditreport.com/pm/default.aspx?sc=668072&hmpgID=62& bcd=JulyExcfooterNO62) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20090731/ff7a0a6d/attachment-0001.html>

From gbnewby at pglaf.org Fri Jul 31 12:28:49 2009 From: gbnewby at pglaf.org (Greg Newby) Date: Fri, 31 Jul 2009 12:28:49 -0700 Subject: [gutvol-d] Re: [Fwd: Re: someone needs to (fwd)] (fwd) In-Reply-To: <alpine.DEB.1.00.0907301823030.25934@snowy.arsc.alaska.edu> References: <alpine.DEB.1.00.0907301823030.25934@snowy.arsc.alaska.edu> Message-ID: <20090731192849.GA17410@mail.pglaf.org> Forwarding from Brewster Kahle: > From: Brewster Kahle <brewster at archive.org> > To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com > CC: "Michael S. Hart" <hart at pglaf.org>, John Guagliardo <john at gutenberg.cc> > Subject: Re: [gutvol-d] someone needs to (fwd) > > > bowerbird-- > > some very active volunteers have been taking the downloadable pdf's > from google's site and uploading them to the archive. the archive > stores these, OCR's them to make them searchable, and if someone wants > the pdf-- points them back to google. > > we would like to see more of this. if there are volunteers to expand > this program, we would like to play our part. > > -brewster > > > > Michael S.
Hart wrote: >> >> ---------- Forwarded message ---------- >> Date: Wed, 29 Jul 2009 18:00:16 EDT >> From: Bowerbird at aol.com >> Reply-To: Project Gutenberg Volunteer Discussion <gutvol-d at lists.pglaf.org> >> To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com >> Subject: [gutvol-d] someone needs to >> >> someone needs to remount independently the scan-set from >> every single public-domain book that google has scanned... >> >> might that be project gutenberg? >> >> if not, then someone else needs to do it. >> >> because it needs to be done... >> >> those books don't belong to google, or to sony, or to >> barnes & noble, or to anyone else that google decides >> to share them with. those books belong to the public. >> >> -bowerbird From hart at pobox.com Fri Jul 31 19:49:17 2009 From: hart at pobox.com (Michael S. Hart) Date: Fri, 31 Jul 2009 18:49:17 -0800 (AKDT) Subject: [gutvol-d] 1984 and Animal Farm Message-ID: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu> Does anyone know about how many Kindle copies were erased? thanks!!! Michael From jimad at msn.com Fri Jul 31 20:36:02 2009 From: jimad at msn.com (Jim Adcock) Date: Fri, 31 Jul 2009 20:36:02 -0700 Subject: [gutvol-d] Re: 1984 and Animal Farm In-Reply-To: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu> References: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu> Message-ID: <BAY120-DAV2AA5FC2B2C54CD4529123AE110@phx.gbl> I can't find a count of books sold anywhere, but here is an interesting apology from Bezos: http://tinyurl.com/m3c9z5 From jimad at msn.com Fri Jul 31 20:40:30 2009 From: jimad at msn.com (Jim Adcock) Date: Fri, 31 Jul 2009 20:40:30 -0700 Subject: [gutvol-d] Re: 1984 and Animal Farm In-Reply-To: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu> References: <alpine.DEB.1.00.0907311849000.4445@snowy.arsc.alaska.edu> Message-ID: <BAY120-DAV10105319ABB9FDA54AFC45AE110@phx.gbl> And here's a copy of the "Kindle Ate My Homework" Lawsuit: http://tinyurl.com/n9tm6s