From marcello at perathoner.de Wed Jun 1 11:32:49 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Jun 1 11:32:59 2005 Subject: [gutvol-d] WIPO Online Forum on Intellectual Property in the Information Society Message-ID: <429DFF51.7080007@perathoner.de> Welcome to the Online Forum on Intellectual Property in the Information Society, hosted by the World Intellectual Property Organization (WIPO) from June 1 to 15, 2005. The WIPO Online Forum is designed to enable and encourage an open debate on issues related to intellectual property in the information society, and in light of the goals of the World Summit on the Information Society (WSIS). This presents a unique opportunity for all to engage in the emerging debate on intellectual property in our day. The 10 themes for discussion are listed below - scroll down to select a theme. The WIPO Online Forum is open to participation by all interested persons - you are invited to join in online discussions over a period of two weeks from June 1, 2005. It is hoped that the Online Forum will further inform the discussions taking place during the second phase of WSIS. The conclusions of the Online Forum will form part of WIPO's contribution to the WSIS Tunis Summit. http://www.wipo.int/ipisforum/en/ -- Marcello Perathoner webmaster@gutenberg.org From servalan at ar.com.au Thu Jun 2 19:10:27 2005 From: servalan at ar.com.au (Pauline) Date: Thu Jun 2 19:11:15 2005 Subject: [gutvol-d] DP is back up In-Reply-To: <429C1CE9.7030206@ar.com.au> References: <20050531072555.GA20636@pglaf.org> <429C1CE9.7030206@ar.com.au> Message-ID: <429FBC13.9050503@ar.com.au> Hi All, DP is back up now. Come & have a look at our new site. http://www.pgdp.net Thanks for your patience, P -- Help digitise public domain books: Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." 
Set free dead-tree books: http://bookcrossing.com/referral/servalan From kouhia at nic.funet.fi Mon Jun 6 10:18:24 2005 From: kouhia at nic.funet.fi (Juhana Sadeharju) Date: Mon Jun 6 10:18:35 2005 Subject: [gutvol-d] Re: WIPO Online Forum on Intellectual Property Message-ID: >From: Marcello Perathoner < > >Welcome to the Online Forum on Intellectual Property in the Information >Society, hosted by the World Intellectual Property Organization (WIPO) >from June 1 to 15, 2005. What is the aim of this project? I suggest the copyright period would be changed so that each book has fixed 50 years copyright protection. Would this suggestion be seriously considered in WIPO? Or does the Disn... money talk. How about my suggestion on making anything patentable without costs so that we who develop free software could protect our intellectual property as well? All disagreements on IPs would be settled in the courts with money. Patent offices would not spend money in examining the patents. Now patent system discriminates us who don't take money from our products. Whos intellectual property WIPO is after? Who or what companies are behind the WIPO? Juhana -- http://music.columbia.edu/mailman/listinfo/linux-graphics-dev for developers of open source graphics software From marcello at perathoner.de Mon Jun 6 11:01:03 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Jun 6 11:01:13 2005 Subject: [gutvol-d] Re: WIPO Online Forum on Intellectual Property In-Reply-To: References: Message-ID: <42A48F5F.5040708@perathoner.de> Juhana Sadeharju wrote: >>From: Marcello Perathoner < >> >>Welcome to the Online Forum on Intellectual Property in the Information >>Society, hosted by the World Intellectual Property Organization (WIPO) > >>from June 1 to 15, 2005. > > What is the aim of this project? > > I suggest the copyright period would be changed so that each book > has fixed 50 years copyright protection. Would this suggestion > be seriously considered in WIPO? 
Or does the Disn... money talk. > > How about my suggestion on making anything patentable without costs > so that we who develop free software could protect our intellectual > property as well? All disagreements on IPs would be settled in the > courts with money. Patent offices would not spend money in examining > the patents. Now patent system discriminates us who don't take > money from our products. > > Whos intellectual property WIPO is after? Who or what companies > are behind the WIPO? WIPO stands for World Intellectual Property Organisation. Very basically, it's a treaty governing international patent and copyright issues. More information at: www.wipo.int We at PG should comment about the detrimental effects of overly long copyrights on culture and education. -- Marcello Perathoner webmaster@gutenberg.org From sly at victoria.tc.ca Mon Jun 6 18:29:02 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon Jun 6 18:29:18 2005 Subject: [gutvol-d] Re: WIPO Online Forum on Intellectual Property In-Reply-To: References: Message-ID: On Mon, 6 Jun 2005, Juhana Sadeharju wrote: > > I suggest the copyright period would be changed so that each book > has fixed 50 years copyright protection. Would this suggestion > be seriously considered in WIPO? Or does the Disn... money talk. > I believe you would need to go back in copyright history a little bit. I believe the terms etc. under WIPO are based on the Berne Convention. This convention (first formulated in 1886) is the most widespread international copyright agreement. It sets out a basic minimum copyright term of life+50. The US avoided signing onto this treaty until near the end of the twentieth century. Unfortunately, they, along with a few other countries, have enacted laws which grant a copyright longer than the minimum. I would suggest that at this point in time, attempts to change the minimum term enacted in the Berne Convention would be useless. 
If possible, it might be good to encourage National laws to stay with that minimum--to present countries which do so as progressive. (Some people will argue the opposite--that countries with a life+50 term are backwards, behind the times, and should "catch up" with the U.S., the U.K., et al.) Andrew From webmaster at gutenberg.org Wed Jun 8 13:29:07 2005 From: webmaster at gutenberg.org (Marcello Perathoner) Date: Wed Jun 8 13:29:21 2005 Subject: [gutvol-d] [Fwd: Ebook Reading device?] Message-ID: <42A75513.1080508@gutenberg.org> Anybody want to answer this one? -------- Original Message -------- Subject: Ebook Reading device? Date: Wed, 08 Jun 2005 21:11:35 +0100 From: Robert Sutherland To: webmaster@gutenberg.org Being now in retirement I lately became interested in E-books and was delighted - amazed, more like! - to discover Project Gutenberg. However, I have been very puzzled by the apparent absence of a simple portable device designed for reading downloaded e.books. All my searches on the internet and my inquiries of the trade have failed to trace one. I wonder if you can put me on track of one? The trade just assume that a lap-top or a PDA would be quite adequate, but neither is really suitable. I use a lap-top mostly but they are far bigger than is required, and are far from being as portable as I am sure a specific device could be. I have not found a PDA with a large enough screen to provide comfortable reading - indeed, even to take the kind of line-length used in PG, or if they do it would excessively reduce the print size, which begins to matter as one gets older. To anyone making any considerable use of e.books a specific device designed for the purpose would be a distinct asset. 
As far as I see from the internet, there used to be a few such devices available but they seem to have been dedicated to special file formats used exclusively by firms producing e.books for sale: the indications seem to be that their efforts to establish monopolies mostly failed and their devices ceased to be available in the market. Some at least were exclusive to USA anyway, which would not have helped someone like myself resident in UK. I raised this matter with one of the main UK computing magazines but they came back only with the standard view that a PDA would do, which of course it would not, being designed for quite different purposes. I have also enquired of several of the main computing retailers, none of whom has shown the slightest interest. I feel quite surprised that nothing specific is available - have I missed something in my researches? If I have, I'd be very grateful if you could point me in the right direction; but if I have not, then could PG perhaps set a spark to some manufacturer's imagination? I thought that perhaps a modern DVD portable player might be the answer - some very cheap models are becoming available - but from the specifications I have seen and the advice given by retailers they are unlikely to be able to take .txt, .rtf or .pdf files. If they did, one could simply put the e.books onto CD or DVD as data files - although slightly bigger than Captain Picard uses when at leisure in his quarters, a portable DVD player would be much more convenient to use than a laptop. I am currently trying to ascertain whether it might be possible to charge an existing model with a program to make it compatible? One just needs .txt, .rtf and .pdf. Yours sincerely, Robert Sutherland ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Wed Jun 8 13:36:09 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Jun 8 13:36:18 2005 Subject: [gutvol-d] [Fwd: Ebook Reading device?] 
Message-ID: <20050608203609.2EC9A9E792@ws6-2.us4.outblaze.com> Robert, You might be more pleased with the results on a PDA if you try an HTML edition of the book as opposed to the Text versions. In my experience, the simplified web browser in most PDAs is quite up to the task of formatting the text to nicely fit on a PDA screen. It is the hard return marks in the text files that cause the line length issues you've seen. NOTE: This doesn't help with those texts that don't have an HTML edition available, I realize. There are others on the list that may be better suited to answer about dedicated eBook readers (which I have heard of, but have no direct experience with). Josh ----- Original Message ----- From: "Marcello Perathoner" To: "Project Gutenberg volunteer discussion" Subject: [gutvol-d] [Fwd: Ebook Reading device?] Date: Wed, 08 Jun 2005 22:29:07 +0200 > > Anybody want to answer this one? > > > -------- Original Message -------- > Subject: Ebook Reading device? > Date: Wed, 08 Jun 2005 21:11:35 +0100 > From: Robert Sutherland > To: webmaster@gutenberg.org > > > > Being now in retirement I lately became interested in E-books and was > delighted - amazed, more like! - to discover Project Gutenberg. However, I > have been very puzzled by the apparent absence of a simple portable device > designed for reading downloaded e.books. All my searches on the internet > and my inquiries of the trade have failed to trace one. I wonder if you can > put me on track of one? > > The trade just assume that a lap-top or a PDA would be quite adequate, but > neither is really suitable. I use a lap-top mostly but they are far bigger > than is required, and are far from being as portable as I am sure a > specific device could be. I have not found a PDA with a large enough screen > to provide comfortable reading - indeed, even to take the kind of > line-length used in PG, or if they do it would excessively reduce the print > size, which begins to matter as one gets older. 
To anyone making any > considerable use of e.books a specific device designed for the purpose > would be a distinct asset. > > As far as I see from the internet, there used to be a few such devices > available but they seem to have been dedicated to special file formats used > exclusively by firms producing e.books for sale: the indications seem to be > that their efforts to establish monopolies mostly failed and their devices > ceased to be available in the market. Some at least were exclusive to USA > anyway, which would not have helped someone like myself resident in UK. > > I raised this matter with one of the main UK computing magazines but they > came back only with the standard view that a PDA would do, which of course > it would not, being designed for quite different purposes. I have also > enquired of several of the main computing retailers, none of whom has shown > the slightest interest. > > I feel quite surprised that nothing specific is available - have I missed > something in my researches? If I have, I'd be very grateful if you could > point me in the right direction; but if I have not, then could PG perhaps > set a spark to some manufacturer's imagination? > > I thought that perhaps a modern DVD portable player might be the answer - > some very cheap models are becoming available - but from the specifications > I have seen and the advice given by retailers they are unlikely to be able > to take .txt, .rtf or .pdf files. If they did, one could simply put the > e.books onto CD or DVD as data files - although slightly bigger than > Captain Picard uses when at leisure in his quarters, a portable DVD player > would be much more convenient to use than a laptop. I am currently trying > to ascertain whether it might be possible to charge an existing model with > a program to make it compatible? One just needs .txt, .rtf and .pdf. 
> > Yours sincerely, > Robert Sutherland > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > -- Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From hart at pglaf.org Wed Jun 8 13:44:24 2005 From: hart at pglaf.org (Michael Hart) Date: Wed Jun 8 13:44:25 2005 Subject: [gutvol-d] [Fwd: Ebook Reading device?] In-Reply-To: <20050608203609.2EC9A9E792@ws6-2.us4.outblaze.com> References: <20050608203609.2EC9A9E792@ws6-2.us4.outblaze.com> Message-ID: Palmreader and a number of other programs seem to have functions that can do at least some of what you need, not to mention the simple stripping of hard returns you can do before loading. mh On Wed, 8 Jun 2005, Joshua Hutchinson wrote: > Robert, > > You might be more pleased with the results on a PDA if you try an HTML edition of the book as opposed to the Text versions. In my experience, the simplified web browser in most PDAs is quite up to the task of formatting the text to nicely fit on a PDA screen. It is the hard return marks in the text files that cause the line length issues you've seen. > > NOTE: This doesn't help with those texts that don't have an HTML edition available, I realize. There are others on the list that may be better suited to answer about dedicated eBook readers (which I have heard of, but have no direct experience with). > > Josh > > > ----- Original Message ----- > From: "Marcello Perathoner" > To: "Project Gutenberg volunteer discussion" > Subject: [gutvol-d] [Fwd: Ebook Reading device?] > Date: Wed, 08 Jun 2005 22:29:07 +0200 > >> >> Anybody want to answer this one? >> >> >> -------- Original Message -------- >> Subject: Ebook Reading device? >> Date: Wed, 08 Jun 2005 21:11:35 +0100 >> From: Robert Sutherland >> To: webmaster@gutenberg.org >> >> >> >> Being now in retirement I lately became interested in E-books and was >> delighted - amazed, more like! 
- to discover Project Gutenberg. However, I >> have been very puzzled by the apparent absence of a simple portable device >> designed for reading downloaded e.books. All my searches on the internet >> and my inquiries of the trade have failed to trace one. I wonder if you can >> put me on track of one? >> >> The trade just assume that a lap-top or a PDA would be quite adequate, but >> neither is really suitable. I use a lap-top mostly but they are far bigger >> than is required, and are far from being as portable as I am sure a >> specific device could be. I have not found a PDA with a large enough screen >> to provide comfortable reading - indeed, even to take the kind of >> line-length used in PG, or if they do it would excessively reduce the print >> size, which begins to matter as one gets older. To anyone making any >> considerable use of e.books a specific device designed for the purpose >> would be a distinct asset. >> >> As far as I see from the internet, there used to be a few such devices >> available but they seem to have been dedicated to special file formats used >> exclusively by firms producing e.books for sale: the indications seem to be >> that their efforts to establish monopolies mostly failed and their devices >> ceased to be available in the market. Some at least were exclusive to USA >> anyway, which would not have helped someone like myself resident in UK. >> >> I raised this matter with one of the main UK computing magazines but they >> came back only with the standard view that a PDA would do, which of course >> it would not, being designed for quite different purposes. I have also >> enquired of several of the main computing retailers, none of whom has shown >> the slightest interest. >> >> I feel quite surprised that nothing specific is available - have I missed >> something in my researches? 
If I have, I'd be very grateful if you could >> point me in the right direction; but if I have not, then could PG perhaps >> set a spark to some manufacturer's imagination? >> >> I thought that perhaps a modern DVD portable player might be the answer - >> some very cheap models are becoming available - but from the specifications >> I have seen and the advice given by retailers they are unlikely to be able >> to take .txt, .rtf or .pdf files. If they did, one could simply put the >> e.books onto CD or DVD as data files - although slightly bigger than >> Captain Picard uses when at leisure in his quarters, a portable DVD player >> would be much more convenient to use than a laptop. I am currently trying >> to ascertain whether it might be possible to charge an existing model with >> a program to make it compatible? One just needs .txt, .rtf and .pdf. >> >> Yours sincerely, >> Robert Sutherland >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> >> >> >> -- Marcello Perathoner >> webmaster@gutenberg.org >> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d@lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jon_niehof at yahoo.com Wed Jun 8 14:10:59 2005 From: jon_niehof at yahoo.com (Jon Niehof) Date: Wed Jun 8 14:11:09 2005 Subject: [gutvol-d] [Fwd: Ebook Reading device?] In-Reply-To: <20050608203609.2EC9A9E792@ws6-2.us4.outblaze.com> Message-ID: <20050608211059.84786.qmail@web32904.mail.mud.yahoo.com> Joshua Hutchinson wrote: > You might be more pleased with the results on a PDA if you try > an HTML edition of the book as opposed to the Text versions. > In my experience, the simplified web browser in most PDAs is > quite up to the task of formatting the text to nicely fit on a > PDA screen. 
It is the hard return marks in the text files > that cause the line length issues you've seen. Whereas I take the opposite tack and use the plain text version coupled with Weasel ( http://gutenpalm.sourceforge.net/ ). It has an autoscroll mode that fills the screen from top to bottom and then wraps around to start filling from the top again--so by the time you reach the bottom of a screenful, the top has new text. It rewraps the text for you (with a couple of options on how to do it) so line length's not an issue. I find reading on a computer much less convenient simply because there isn't good software. With lighter laptops and especially tablets the hardware side is less of an issue; the bulkiness of a tablet per unit screen area probably isn't worse than a PDA or DVD player. There are workarounds for using portable DVD players. Most of them play VCD's and one can make VCD's that are a sequence of stills. I'm sure similar hacks are possible with DVD's as well, and it's always possible to create a movie of scrolling text. But the resolution wouldn't be much better than a modern PDA, and there'd be a lot of work involved in setting up such a system. If you're looking for the most "booklike" solution, a tablet PC is probably it. A PDA is the most cost-effective approach, which gives a slightly different "feel" to reading but one that I find just as enjoyable. Good luck, and I hope you find something that works for you. From collin at xs4all.nl Wed Jun 8 15:33:58 2005 From: collin at xs4all.nl (Branko Collin) Date: Wed Jun 8 15:20:34 2005 Subject: [gutvol-d] [Fwd: Ebook Reading device?] 
In-Reply-To: <42A75513.1080508@gutenberg.org> Message-ID: <42A78E76.21922.35DE58D@localhost> On 8 Jun 2005, at 22:29, Marcello Perathoner wrote: > Being now in retirement I lately became interested in E-books and was > delighted - amazed, more like! - to discover Project Gutenberg. > However, I have been very puzzled by the apparent absence of a simple > portable device designed for reading downloaded e.books. All my > searches on the internet and my inquiries of the trade have failed to > trace one. I wonder if you can put me on track of one? [snip] > I raised this matter with one of the main UK computing magazines but > they came back only with the standard view that a PDA would do, which > of course it would not, being designed for quite different purposes. I > have also enquired of several of the main computing retailers, none of > whom has shown the slightest interest. > > I feel quite surprised that nothing specific is available - have I > missed something in my researches? I am afraid you haven't missed much. There are a few devices that have been developed specifically for reading ebooks, notably the Sony Librie () and the Ebookwise 1150 (). But as you noted: > As far as I see from the internet, there used to be a few > such devices available but they seem to have been > dedicated to special file formats used exclusively by > firms producing e.books for sale: the indications seem to > be that their efforts to establish monopolies mostly failed > and their devices ceased to be available in the market. However, since you don't mind asking Project Gutenberg, which produces very raw and unadorned ebooks, you probably do not mind having to put in some extra work. Both the Librie and the Ebookwise can handle other formats once you have made a conversion step. 
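What the conversion step looks like depends on the device and its software, which are not described here. As one small, hedged illustration of preparing a PG plain-text file for a narrow screen, the sketch below removes the hard returns discussed earlier in this thread so that a reader program can rewrap the text itself. It assumes paragraphs are separated by blank lines; the function name `unwrap_paragraphs` is made up for this example and is not part of any PG or device tool.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by one or more blank lines,
    as in typical Project Gutenberg plain-text files.
    """
    # Normalise Windows line endings so the splitting below is uniform.
    text = text.replace("\r\n", "\n")
    # A blank line (possibly containing spaces) ends a paragraph.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = []
    for paragraph in paragraphs:
        # Drop empty fragments and glue the remaining lines with single spaces.
        lines = [line.strip() for line in paragraph.split("\n") if line.strip()]
        if lines:
            joined.append(" ".join(lines))
    # Keep one blank line between paragraphs.
    return "\n\n".join(joined)
```

Note that this simple approach would also join lines of verse, tables, and indented quotations, so it is only suitable for ordinary prose.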
> I thought that perhaps a modern DVD portable player might be the > answer - some very cheap models are becoming available - but from the > specifications I have seen and the advice given by retailers they are > unlikely to be able to take .txt, .rtf or .pdf files. If they did, one > could simply put the e.books onto CD or DVD as data files - although > slightly bigger than Captain Picard uses when at leisure in his > quarters, a portable DVD player would be much more convenient to use > than a laptop. I am currently trying to ascertain whether it might be > possible to charge an existing model with a program to make it > compatible? One just needs .txt, .rtf and .pdf. A Play Station Portable may approach what you are looking for; I am not sure how well developed interfaces for DVD portables are. There used to be a small computer somewhere halfway between a PDA and a notebook that sounded promising, with wireless ethernet, sub 1-kg weight, 7 inch screen (VGA), and 11 hours of battery life. It was called the Psion Netbook, and it was pretty much stillborn. But the folks at The Register liked it () and to me it always sounded like a good ebook reading device. Psion followed it up with the Netbook Pro, which is way too heavy. If I were you, I would focus on the device first, and only then look if there is conversion software available. -- branko collin collin@xs4all.nl From jon_niehof at yahoo.com Wed Jun 8 15:44:59 2005 From: jon_niehof at yahoo.com (Jon Niehof) Date: Wed Jun 8 15:45:11 2005 Subject: [gutvol-d] [Fwd: Ebook Reading device?] 
In-Reply-To: <42A78E76.21922.35DE58D@localhost> Message-ID: <20050608224459.60922.qmail@web32910.mail.mud.yahoo.com> I apologize for hammering your inbox, Robert, but Branko has an excellent idea: --- Branko Collin wrote: > A Play Station Portable may approach what you are looking for; It's a bit pricey for *just* an ebook reader (not that a laptop is cheap), but here are two resources: http://gamefries.blogspot.com/2005/03/how-to-get-e-books-on-your-psp.html http://pdf2psp.sourceforge.net/ It's a "batch of images" approach so you can't search or anything, but it gets the job done. Most of the portable DVD players listed on Amazon also offer JPEG or Kodak Photo CD support, and you could probably use some of the same software as used for the PSP. > There used to be a small computer somewhere halfway between a > PDA and a notebook that sounded promising, with wireless > ethernet, sub 1-kg weight, 7 inch screen (VGA), and 11 hours of > battery life. It was called the Psion Netbook, and it was > pretty much stillborn. The Oqo is similar and is finally "available", at $2600. The ThinkPad X41 would probably be a worthy competitor for e-booking--4 lbs., but 12" screen and "only" $1900. From gbnewby at pglaf.org Thu Jun 9 00:50:22 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Jun 9 00:50:24 2005 Subject: [gutvol-d] New "draft" DVD image Message-ID: <20050609075022.GA15457@pglaf.org> There is a new DVD image some folks might like to check out. You can get it interactively here: http://snowy.arsc.alaska.edu/pgjun05 or download the full ISO here (size=4668391424 bytes, MD5sum=eb9d00a4b1e4cb30d801709ced6da282): ftp://snowy.arsc.alaska.edu/pub/gbn/pgjun05.iso This is the first major output of Craig Stephenson's program to allow people to build their own CD/DVD ISOs. 
I'll send a URL to the program in another week or two (it's not quite ready yet for multiple users). We started with the Best Of CD titles as core, getting updated files with an emphasis on HTML. Then, we blindly added lots more HTML, uncompressed, for a pleasurable "unzip-free" reading experience. I also made sure a few particular authors were included, in the Best Of tradition. There are a few things I know are problematic, but please inform me of any others that you spot: - a few copyrighted files snuck in (some MP3 audio and a Kafka) - the author/title index files are mixed case, and would be better in a subdirectory - there might be some Complete volumes that are partially duplicated by individual volumes. If you spot any, let me know - the author/title index pages need something like a "Link: " label for the eBook file, and also a "Language: " field. We might add a "by-language" index, in addition to the Author and Title indexes. Although I made a bunch of these for Michael Hart's visit to Alaska (public talk=Wednesday June 22 at the Fairbanks Public Library 7:00 pm), and to try to give away to AK libraries, I don't expect this to be quite polished enough to redistribute en masse. But I hope it might be the core of a new DVD option to supplement our "PG 10K Special" from December 2004. (That DVD, which is eBook 11800, is mostly zipped .txt files -- about 9400 titles). This new DVD image contains about 5100 eBooks. In a nutshell, Craig's program parses the RDF/XML catalog into a MySQL database. Then, PHP is used to provide a user with an iterative, interactive set of steps to add and delete eBooks and their formats from the ISO. Building an online browsable prototype of the ISO is simple and fast, because we use hard links (on the same filesystem as the collection mirror). Once it looks good, the actual ISO is built with mkisofs (which takes a little while) and becomes available for download via FTP (or HTTP if it's < 2GB). 
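The hard-link step described above can be sketched roughly as follows. This is not Craig's code (which is PHP and has not been posted); it is a minimal Python illustration of why the browsable prototype is cheap to build: a hard link adds a second directory entry for the same file data, so nothing is copied. The function name, arguments, and directory layout are assumptions made for this example.

```python
import os


def stage_with_hard_links(mirror_root, staging_root, relative_paths):
    """Hard-link selected mirror files into a staging tree.

    Assumption: mirror_root and staging_root are on the same filesystem,
    since hard links cannot cross filesystems. Returns the number of new
    links created.
    """
    linked = 0
    for rel in relative_paths:
        src = os.path.join(mirror_root, rel)
        dst = os.path.join(staging_root, rel)
        # Recreate the mirror's directory structure inside the staging tree.
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if not os.path.exists(dst):
            # No file data is copied; only a new directory entry is made.
            os.link(src, dst)
            linked += 1
    return linked
```

The staging directory would then be what gets handed to mkisofs once the selection looks right; that step is left out here because the options used are not given in the message.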
We'll be doing features etc., and making the code widely available (though it basically requires a complete PG mirror to work). Enjoy, and please send feedback! -- Greg From marcello at perathoner.de Thu Jun 9 02:38:11 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Jun 9 02:38:33 2005 Subject: [gutvol-d] New "draft" DVD image In-Reply-To: <20050609075022.GA15457@pglaf.org> References: <20050609075022.GA15457@pglaf.org> Message-ID: <42A80E03.6010700@perathoner.de> Greg Newby wrote: > In a nutshell, Craig's program parses the RDF/XML catalog into a MySQL > database. Then, PHP is used to provide a user with an iterative, > interactive set of steps to add and delete eBooks and their formats from > the ISO. Who do we target, the PG DVD team or the user at large? Where is this program supposed to run when it is ready? -- Marcello Perathoner webmaster@gutenberg.org From kouhia at nic.funet.fi Thu Jun 9 11:34:45 2005 From: kouhia at nic.funet.fi (Juhana Sadeharju) Date: Thu Jun 9 11:34:57 2005 Subject: [gutvol-d] Re: WIPO Online Forum on Intellectual Property Message-ID: >From: Andrew Sly > >This [Berne] convention (first formulated in 1886) is the most >wide-spread international copyright agreement. > >It sets out a basic minimum copyright term of life+50. Maybe they were wrong then as well. The term should decrease nowadays. The trend today is to have the old material available. Nobody gains if the old books are out-of-print. But we can also blame the authors who give their soul... work for life+70+. If Berne and equivalents cannot be changed, then authors should sign only contracts which do not sell their soul. Does anyone have statistics on how books sell? For how many years do books sell at a profit? Have we asked permission to make out-of-print and still copyrighted books available? That would save the publisher the trouble. 
Juhana -- http://music.columbia.edu/mailman/listinfo/linux-graphics-dev for developers of open source graphics software From gbnewby at pglaf.org Thu Jun 9 16:04:26 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Jun 9 16:04:29 2005 Subject: [gutvol-d] New "draft" DVD image In-Reply-To: <42A80E03.6010700@perathoner.de> References: <20050609075022.GA15457@pglaf.org> <42A80E03.6010700@perathoner.de> Message-ID: <20050609230426.GE1218@pglaf.org> On Thu, Jun 09, 2005 at 11:38:11AM +0200, Marcello Perathoner wrote: > Greg Newby wrote: > > >In a nutshell, Craig's program parses the RDF/XML catalog into a MySQL > >database. Then, PHP is used to provide a user with an iterative, > >interactive set of steps to add and delete eBooks and their formats from > >the ISO. > > Who do we target, the PG DVD team or the user at large? The user at large. But there are benefits for the DVD team and other purposes, as well. For example, someone will be able to "save" their ISO configuration, then return later to get *updated* files for the same eBooks. This will be particularly useful for doing things like quarterly updates of "theme" CDs or DVDs, such as Col Choat's idea of an "explorers" collection. > Where is this program supposed to run when it is ready? On a beefy server. Right now it's on snowy.arsc.alaska.edu, and I imagine snowy will be suitable for relatively large-scale use. I hope the program will be available at other mirror sites, too. I think it will be too intensive in disk & CPU for iBiblio, but you never know... if this sounds computationally unrealistic to offer to the general reader, to you, read my work .sig below :-) -- Greg Dr. Gregory B. 
Newby, Chief Scientist, Arctic Region Supercomputing Center Univ of Alaska Fairbanks-909 Koyukuk Dr-PO Box 756020-Fairbanks-AK 99775-6020 e: newby AT arsc.edu v: 907-450-8663 f: 907-450-8601 w: www.arsc.edu/~newby From marcello at perathoner.de Fri Jun 10 02:54:53 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Jun 10 02:55:21 2005 Subject: [gutvol-d] New "draft" DVD image In-Reply-To: <20050609230426.GE1218@pglaf.org> References: <20050609075022.GA15457@pglaf.org> <42A80E03.6010700@perathoner.de> <20050609230426.GE1218@pglaf.org> Message-ID: <42A9636D.5030301@perathoner.de> Greg Newby wrote: > For example, someone will be able to "save" their ISO configuration, > then return later to get *updated* files for the same eBooks. This will > be particularly useful for doing things like quarterly updates of > "theme" CDs or DVDs, such as Col Choat's idea of an "explorers" > collection. This will be great for the DVD team. I don't know about the users at large though. Some people (not mirrors!) are roboting our whole site once a week in search for new books. I wonder how the DVD maker will scale under similar load conditions. I was just wondering if it wasn't more realistic to use jigdo on the users side. People who burn DVDs do have a little knowledge so they could manage to install that. Jigdo advantages: no big single chunk file transfers. jigdo will get the ebook files from the ftp server and build the DVD image on the users PC. On updates the user has to transfer just the changed files not the whole DVD image. By building our own jigdo files we could round robin the ftp load to different mirrors. Jigdo disadvantages: user has to install the jigdo client. We have to somehow build a jigdo control file (but jigdo is open source, so we can figure that out.) >>Where is this program supposed to run when it is ready? > > On a beefy server. Right now it's on snowy.arsc.alaska.edu, and I > imagine snowy will be suitable for relatively large-scale use. 
I hope > the program will be available at other mirror sites, too. I think it > will be too intensive in disk & CPU for iBiblio, but you never know... > if this sounds computationally unrealistic to offer to the general > reader, to you, read my work .sig below :-) Of course, if you throw a NetApp terabyte server at the problem... You'll need a place to store all those custom DVD images until the user has retrieved them. (How to detect that? You can't rely on the user notifying you.) Retrieving DVD images has been a PITA even with fast DSL modems, so you'll have to save the images for at least a couple of days. > Dr. Gregory B. Newby, Chief Scientist, Arctic Region Supercomputing Center That's a good idea. You will save big on your air-conditioning bill. :-) -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Fri Jun 10 09:26:42 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Fri Jun 10 09:26:44 2005 Subject: [gutvol-d] New "draft" DVD image In-Reply-To: <42A9636D.5030301@perathoner.de> References: <20050609075022.GA15457@pglaf.org> <42A80E03.6010700@perathoner.de> <20050609230426.GE1218@pglaf.org> <42A9636D.5030301@perathoner.de> Message-ID: <20050610162642.GB27558@pglaf.org> On Fri, Jun 10, 2005 at 11:54:53AM +0200, Marcello Perathoner wrote: > Greg Newby wrote: > > >For example, someone will be able to "save" their ISO configuration, > >then return later to get *updated* files for the same eBooks. This will > >be particularly useful for doing things like quarterly updates of > >"theme" CDs or DVDs, such as Col Choat's idea of an "explorers" > >collection. > > This will be great for the DVD team. I don't know about the users at > large though. > > Some people (not mirrors!) are roboting our whole site once a week in > search for new books. I wonder how the DVD maker will scale under > similar load conditions. We will see, but I don't think the DVD maker will be robot-able at all. 
There are also provisions for load balancing....for example, when a user has the CD/DVD contents specified and says, "make me the ISO file," the ISO happens on an "as-available" basis, and the user gets email when it's ready. It's not going to be a viable tool for resource discovery. > I was just wondering if it wasn't more realistic to use jigdo on the > users side. People who burn DVDs do have a little knowledge so they > could manage to install that. > > Jigdo advantages: no big single chunk file transfers. jigdo will get the > ebook files from the ftp server and build the DVD image on the users PC. > On updates the user has to transfer just the changed files not the whole > DVD image. By building our own jigdo files we could round robin the ftp > load to different mirrors. > > Jigdo disadvantages: user has to install the jigdo client. We have to > somehow build a jigdo control file (but jigdo is open source, so we can > figure that out.) I'm 100% in favor of jigdo, and can set you up on snowy if you (or someone else) would like to get it configured. -- Greg > >>Where is this program supposed to run when it is ready? > > > >On a beefy server. Right now it's on snowy.arsc.alaska.edu, and I > >imagine snowy will be suitable for relatively large-scale use. I hope > >the program will be available at other mirror sites, too. I think it > >will be too intensive in disk & CPU for iBiblio, but you never know... > >if this sounds computationally unrealistic to offer to the general > >reader, to you, read my work .sig below :-) > > Of course, if you throw a NetApp terabyte server at the problem... > > You'll need a place to store all those custom DVD images until the user > has retrieved them. (How to detect that? You can't rely on the user > notifying you.) Retrieving DVD images has been a PITA even with fast DSL > modems, so you'll have to save the images for at least a couple of days. > > > >Dr. Gregory B. 
Newby, Chief Scientist, Arctic Region Supercomputing Center > > That's a good idea. You will save big on your air-conditioning bill. :-) > > > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From gbnewby at pglaf.org Sun Jun 12 17:09:22 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Jun 12 17:09:24 2005 Subject: [gutvol-d] WIPO Online Forum on Intellectual Property in the Information Society In-Reply-To: <429DFF51.7080007@perathoner.de> References: <429DFF51.7080007@perathoner.de> Message-ID: <20050613000922.GA25595@pglaf.org> Just a few more days to enter a comment. My comment is below. On Wed, Jun 01, 2005 at 08:32:49PM +0200, Marcello Perathoner wrote: > Welcome to the Online Forum on Intellectual Property in the Information > Society, hosted by the World Intellectual Property Organization (WIPO) > from June 1 to 15, 2005. > > The WIPO Online Forum is designed to enable and encourage an open debate > on issues related to intellectual property in the information society, > and in light of the goals of the World Summit on the Information Society > (WSIS). This presents a unique opportunity for all to engage in the > emerging debate on intellectual property in our day. > > The 10 themes for discussion are listed below - scroll down to select a > theme. > > The WIPO Online Forum is open to participation by all interested persons > - you are invited to join in online discussions over a period of two > weeks from June 1, 2005. It is hoped that the Online Forum will further > inform the discussions taking place during the second phase of WSIS. > The conclusions of the Online Forum will form part of WIPO's > contribution to the WSIS Tunis Summit.
> I posted here, in their "Public Domain" topic: http://www.wipo.int/roller/comments/ipisforum/Weblog/theme_three_the_public_domain What I posted: I agree with many of the earlier comments that question the motivations of WIPO's raising these questions. Certainly the past history of WIPO's role in copyright has shown their interests to be aligned with moneyed interests. Nevertheless, I offer a few comments. As the URL suggests, I'm affiliated with Project Gutenberg, which is an all-electronic library of digitized works. The vast majority of our 16,000+ titles are in the public domain in the US. We constantly strive to expand the accessibility of public domain eBooks by seeking older literary works. We also seek to identify public domain items that might not, at first glance, appear to be public domain. These might include: - items published from 1923-1964 in the US which did not have their copyright renewed: these are public domain. - items that are no longer commercially available or for which a copyright owner cannot be identified. Under the US Title 17 section 108(h), these may be public domain, and the US Librarian of Congress seems interested in making them accessible. - items published prior to 1989 without a copyright notice in the US: these are public domain. We believe there are more than adequate protections for copyright owners to benefit from their works. Unfortunately, copyright term extensions, combined with unduly harsh penalties for copyright-related infringement (especially in the US under the DMCA), has pushed the balance so that the public domain is deemphasized. Prior to 1998, one year's worth of copyrighted items (from 1923, in that case) would enter the public domain, even as the current year's items started their multi-year journey under copyright protection. 
But thanks to the copyright term extension of 1998, the most astounding growth in the quantity of information in the world -- fueled by the Internet -- has not been accompanied by any significant growth in the public domain. As others have pointed out in this topic, open source software and creative commons licenses are welcome, but no substitute for the public domain. Such items still have the full force and duration of copyright law. Simply put, a healthy public domain is a prerequisite for support of the creative arts. It is very much possible to provide for ongoing commercial potential for some works, while maintaining growth in the public domain. This can be accomplished in many ways, but the most straightforward is to return to the need for active renewal of copyrights beyond a modest term. Such procedures would give the very long copyright terms desired by moneyed interests, while the vast majority of copyrighted items without such interests would enter the public domain after a limited term. WIPO's leadership role should include fostering a growing public domain. -- Greg From jefferydouglaswaddell at gmail.com Mon Jun 13 16:20:10 2005 From: jefferydouglaswaddell at gmail.com (Jeff Waddell) Date: Mon Jun 13 16:20:24 2005 Subject: [gutvol-d] Greetings ebook makers ;) In-Reply-To: <8a44f71c050613161325117bf3@mail.gmail.com> References: <8a44f71c05061316037c592bbb@mail.gmail.com> <8a44f71c050613161325117bf3@mail.gmail.com> Message-ID: <8a44f71c0506131620d308ebd@mail.gmail.com> Hello fellow ebook creators, Some of you may know me and many perhaps do not. Many long years ago I started a project called Kids Games which has spawned many newer projects and coordinated with others. At the moment that project is basically defunct due to many factors. However, I intend to use the gutenberg project works with open source software to produce more educational software both for individuals and specifically for schools.
I appreciate all the work that various members of this list have done in the past and look forward to what shall be created in the future. I recently rewrote my resume to reflect more deeply who I am and what I am about regarding my career. Because I feel that I will be utilizing the gutenberg project as one of the many resources to reach some of the goals that my career implies, I offer that document to all of you (please forward/share with anyone you deem it pertinent). You can access it at my personal website (www.spunge.org/~jwaddell) just by following the links. It is there in 5 different formats and I hope one of them will work for you. Thank you for your time and I look forward to continuing this adventure of creating open source educational software for children. Sincerely, JefferyDouglasWaddell@gmail.com From grendelkhan at gmail.com Tue Jun 14 14:28:19 2005 From: grendelkhan at gmail.com (grendelkhan) Date: Tue Jun 14 14:28:31 2005 Subject: [gutvol-d] Print-on-demand and dead-tree copies of Gutenberg texts. Message-ID: <26b51c32050614142824831367@mail.gmail.com> I was having a discussion with my father, and I thought I would bring it up on the mailing list, as it seems to be the place for it. We'd just come out of our local Wal-Mart, and I'd noticed the out-of-copyright books (classics and such) being sold for $6 to $11 each. I commented that folks could just download the books for free if they wanted to read them, but he asked how many people owned a computer, and how many of those had heard of Project Gutenberg?
So I did a bit of researching, and discovered that there exist "print on demand" publishers, which instead of doing the offset-printing runs of thousands and thousands of books, will, once a book has been prepared and typeset, sometimes keep none at all in stock, and print them only when ordered. It seems that it would be a good idea to come up with some way to offer the majority of PG's catalog through some method of print-on-demand publishing, selling at-cost. Many Gutenberg works are obscure, and not of general enough interest to warrant a print run from a traditional publisher. I'm aware that I could clearly run off and do this myself, but (a) I wanted to get some feedback from the community at large, and (b) print-on-demand publishing still requires start-up costs, and a per-book "setup" fee of some kind, above and beyond the per-copy materials cost. Given that PG has the Distributed Proofreaders to provide lots and lots of work on worthy projects, and given that PGLAF is a charitable organization which lots of people love, is there some way to get around that issue? Would it be worth it to provide a source of dead-tree editions of many of the archive's works? Thoughts? Objections? Pointers to some guy who's been doing this for the last ten years that I failed to Google up? --grendelkhan From Bowerbird at aol.com Tue Jun 14 15:43:44 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Jun 14 15:44:04 2005 Subject: [gutvol-d] Print-on-demand and dead-tree copies of Gutenberg texts. Message-ID: <128.5ed20c96.2fe0b7a0@aol.com> grendelkhan said: > Would it be worth it to provide > a source of dead-tree editions of > many of the archive's works? "worth it" to whom? :+) > Thoughts? it's a nice thought, to be sure. one that many people have had. but not an easy one to implement. > Objections? it's very smart of you to ask. so i will tell you some pitfalls. 
i tell you this not to dissuade you, so if you think you can make it work anyway, just go right ahead and do it. but know that these are some issues... the start-up costs are very real, so you won't get a p.o.d.-printer to waive them, even for a charity. and printed in runs of a few copies, the books still are fairly expensive. plus the shipping costs will eat you up. in addition, if you just ran the books off as they are -- as plain old ascii text -- people would largely turn their noses up. there is a certain minimum standard that we have come to expect from "a book", and failure to meet that is a recipe for failure. even the .html versions of the books won't create a p-version that would be acceptable. so you'd have to invest time/money/energy in some desktop-publishing capabilities... even after all that, in today's marketplace, simply creating a product won't do very much. today's customers are subjected to such heavy marketing that they simply won't move at all unless you bombard 'em with more of the same. that means you'd have to do hype and marketing, and probably pay for shelf-space in bookstores. and by then, you've just become another publisher... but you'd still lose out to the publishing houses, because their versions would have slicker covers. finally, if you ever _did_ make it work, by some miracle or other, you should then expect to receive vicious _flak_ from people who will _resent_ you, because you're "getting rich" off their volunteer labor and "selling something that should be given away free". so unless you have a _very_ thick skin... > Pointers to some guy who's been doing this > for the last ten years that I failed to Google up? nobody has been doing it, for the reasons i listed. that doesn't mean that nobody _will_ be doing it, however. if you're really serious about the idea, see where daniel moynihan is working these days... at blackmask.com, he demonstrated clearly that a plain-text master-file can take you a long way. 
he hasn't said so directly, but reading in between the pages, i'm guessing he's going even farther now. ;+) -bowerbird From ian at babcockbrown.com Tue Jun 14 15:51:36 2005 From: ian at babcockbrown.com (Ian Stoba) Date: Tue Jun 14 15:50:16 2005 Subject: [gutvol-d] Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <26b51c32050614142824831367@mail.gmail.com> References: <26b51c32050614142824831367@mail.gmail.com> Message-ID: <618BC402-468D-4B38-A35E-9AA5AAADA704@babcockbrown.com> I read your message and realized that I just might be the "some guy" you were talking about. A while back (probably 10 - 12 years ago) I spoke with some professors about using PG texts in their classes. I thought that ebooks would be a good alternative to overpriced short press runs aimed at impoverished college students. At that time, the response I got from everyone I spoke to was that the quality of PG texts was just not high enough for academics to endorse them as a teaching tool. After hearing that, I thought very seriously about starting a publishing business that would match up young professors (badly in need of publishing credits in their hope of becoming tenured) with PG texts and bringing out edited, and possibly annotated, editions. I suspected that this could be done at a very reasonable cost and would be a benefit to the students as well as the professors. I ended up not pursuing that idea, and it's probably just as well that I didn't. A lot has changed in the past decade, notably: - Thanks to Distributed Proofreaders the accuracy of PG texts has increased -enormously-, likewise the breadth of the PG collection. - I learned that annotated editions would likely encumber public domain work with newly copyrighted material. This would limit students' ability to modify and redistribute the materials. - The rise of Creative Commons and Science Commons has given academics many new venues to publish outside of the mainstream presses. 
- Print on demand has become a viable business, with press runs of one copy now being profitable. - Brewster Kahle's bookmobile was just flat out cooler than anything I ever imagined, and it works really well. With all that said, I -still- think it would be great to have good quality editions of public domain works available at a reasonable cost to students and anyone else who doesn't want to overpay for a book. The limitation now in creating a print on demand service for PG books is that the main POD publishers tend to want an upfront fee to cover their setup and storage costs. In many cases, this may be about $500 per title. My guess is that this cost would be prohibitive for PG. I do not know if any POD publishers have expressed interest in waiving this fee for Project Gutenberg. Also, some PG volunteers are working on a standard system for publishing texts in a markup language called TEI-Lite. As I understand it, this markup language (it's a dialect of XML) would make it much easier to offer electronic texts in a variety of formats, including some that would be suitable for printing and binding. This markup would largely replace the academic editor I had imagined. If you decide to go forward with this idea, I would be very interested to hear more. I'm not interested in running a publishing house at this point in my life, but I would certainly want to order some books! --Ian On Jun 14, 2005, at 2:28 PM, grendelkhan wrote: > I was having a discussion with my father, and I thought I would bring > it up on the mailing list, as it seems to be the place for it. > > We'd just come out of our local Wal-Mart, and I'd noticed the > out-of-copyright books (classics and such) being sold for $6 to $11 > each. I commented that folks could just download the books for free if > they wanted to read them, but he asked how many people owned a > computer, and how many of those had heard of Project Gutenberg?
> > So I did a bit of researching, and discovered that there exist "print > on demand" publishers, which instead of doing the offset-printing runs > of thousands and thousands of books, will, once a book has been > prepared and typeset, sometimes keep none at all in stock, and print > them only when ordered. > > It seems that it would be a good idea to come up with some way to > offer the majority of PG's catalog through some method of > print-on-demand publishing, selling at-cost. Many Gutenberg works are > obscure, and not of general enough interest to warrant a print run > from a traditional publisher. > > I'm aware that I could clearly run off and do this myself, but (a) I > wanted to get some feedback from the community at large, and (b) > print-on-demand publishing still requires start-up costs, and a > per-book "setup" fee of some kind, above and beyond the per-copy > materials cost. Given that PG has the Distributed Proofreaders to > provide lots and lots of work on worthy projects, and given that PGLAF > is a charitable organization which lots of people love, is there some > way to get around that issue? > > Would it be worth it to provide a source of dead-tree editions of many > of the archive's works? Thoughts? Objections? Pointers to some guy > who's been doing this for the last ten years that I failed to Google > up? > > --grendelkhan
From cannona at fireantproductions.com Tue Jun 14 16:44:56 2005 From: cannona at fireantproductions.com (Aaron Cannon) Date: Tue Jun 14 16:49:04 2005 Subject: [gutvol-d] Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <26b51c32050614142824831367@mail.gmail.com> References: <26b51c32050614142824831367@mail.gmail.com> Message-ID: <6.2.1.2.0.20050614184411.041802a8@mail.fireantproductions.com> There was some talk about working with lulu.com, but I'm not sure where that ended up. Good luck. Sincerely aaron Cannon At 04:28 PM 6/14/2005, you wrote: >I was having a discussion with my father, and I thought I would bring >it up on the mailing list, as it seems to be the place for it. > >We'd just come out of our local Wal-Mart, and I'd noticed the >out-of-copyright books (classics and such) being sold for $6 to $11 >each. I commented that folks could just download the books for free if >they wanted to read them, but he asked how many people owned a >computer, and how many of those had heard of Project Gutenberg? > >So I did a bit of researching, and discovered that there exist "print >on demand" publishers, which instead of doing the offset-printing runs >of thousands and thousands of books, will, once a book has been >prepared and typeset, sometimes keep none at all in stock, and print >them only when ordered. > >It seems that it would be a good idea to come up with some way to >offer the majority of PG's catalog through some method of >print-on-demand publishing, selling at-cost. Many Gutenberg works are >obscure, and not of general enough interest to warrant a print run >from a traditional publisher.
> >I'm aware that I could clearly run off and do this myself, but (a) I >wanted to get some feedback from the community at large, and (b) >print-on-demand publishing still requires start-up costs, and a >per-book "setup" fee of some kind, above and beyond the per-copy >materials cost. Given that PG has the Distributed Proofreaders to >provide lots and lots of work on worthy projects, and given that PGLAF >is a charitable organization which lots of people love, is there some >way to get around that issue? > >Would it be worth it to provide a source of dead-tree editions of many >of the archive's works? Thoughts? Objections? Pointers to some guy >who's been doing this for the last ten years that I failed to Google >up? > >--grendelkhan -- E-mail: cannona@fireantproductions.com Skype: cannona MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) From Bowerbird at aol.com Wed Jun 15 00:44:23 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 15 00:44:45 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf Message-ID: <62.5703acdd.2fe13657@aol.com> recently i've worked on "roundtripping" styled z.m.l. text through a .pdf. my viewer-program can write z.m.l. text to a .pdf such that copying the text out of the .pdf gives a user the same text that went in. make a few global changes -- which restores the whitespace acrobat usually strips from text -- and you can load the text back into my z.m.l. viewer-program and generate the same .pdf once again... the proof is in the pudding, and the structure is in the presentation. that is _not_ something you can do with text that's copied out of a .pdf created with other programs i know. generally, the .pdf format is known as the "roach motel" of file-formats -- content goes in and can't get out...
:+) -bowerbird From hart at pglaf.org Wed Jun 15 08:14:09 2005 From: hart at pglaf.org (Michael Hart) Date: Wed Jun 15 08:14:11 2005 Subject: [gutvol-d] Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <26b51c32050614142824831367@mail.gmail.com> References: <26b51c32050614142824831367@mail.gmail.com> Message-ID: On Tue, 14 Jun 2005, grendelkhan wrote: > I was having a discussion with my father, and I thought I would bring > it up on the mailing list, as it seems to be the place for it. > > We'd just come out of our local Wal-Mart, and I'd noticed the > out-of-copyright books (classics and such) being sold for $6 to $11 > each. I commented that folks could just download the books for free if > they wanted to read them, but he asked how many people owned a > computer, and how many of those had heard of Project Gutenberg? There have been over a billion computers in use in the world for some time now, and thus well over a billion computer users. In the US the computer saturation rate is somewhere around ~7/8 of all US households. [Anyone have the latest figures?] Not to mention that ~3/4 of these households have hi-speed access. As to how many of these people know about Project Gutenberg, that's hard to measure. . .perhaps we should do a survey. As for the rest of the world, the US is far from being the most saturated in terms of either computers or access, and in some lists doesn't even make the top ten. . .for some reason the Scandinavian countries seemed to beat us there. More later, Michael From distributedmel at gmail.com Wed Jun 15 08:35:00 2005 From: distributedmel at gmail.com (Melissa Er-Raqabi) Date: Wed Jun 15 08:35:10 2005 Subject: [gutvol-d] Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: References: <26b51c32050614142824831367@mail.gmail.com> Message-ID: Michael, where are you getting these numbers? Can you provide some sources please? I find them rather incredible.
Melissa On 6/15/05, Michael Hart wrote: > > > > On Tue, 14 Jun 2005, grendelkhan wrote: > > > I was having a discussion with my father, and I thought I would bring > > it up on the mailing list, as it seems to be the place for it. > > > > We'd just come out of our local Wal-Mart, and I'd noticed the > > out-of-copyright books (classics and such) being sold for $6 to $11 > > each. I commented that folks could just download the books for free if > > they wanted to read them, but he asked how many people owned a > > computer, and how many of those had heard of Project Gutenberg? > > There have been over a billion computers in use in the world for > some time now, and thus well over a billion computer users. > > In the US the computer saturation rate is somewhere around ~7/8 > of all US households. [Anyone have the latest figures?] > > Not to mention that ~3/4 of these households have hi-speed access. > > As to how many of these people know about Project Gutenberg, > that's hard to measure. . .perhaps we should do a survey. > > As for the rest of the world, the US is far from being the > most saturated in terms of either computers or access, and > in some lists doesn't even make the top ten. . .for some > reason the Scandinavian countries seemed to beat us there. > > More later, > > Michael From grendelkhan at gmail.com Wed Jun 15 13:22:58 2005 From: grendelkhan at gmail.com (grendelkhan) Date: Wed Jun 15 13:23:08 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. Message-ID: <26b51c32050615132219efb74e@mail.gmail.com> Thanks to everyone for their comments so far! I'm learning quite a bit as I go.
As of October 2003, the US Commerce Department reported that about three-fifths of households had a computer; a little over half had internet access. https://www.esa.doc.gov/Reports/NationOnlineBroadband04.htm So it's not as bad as I was led to believe. Still, perhaps I should have stated my goals a little more clearly. I have no particular interest in making money or making a business out of this. I'd simply like to make the books available---through whatever means that may be---in dead-tree form. I suppose it's a terrible idea to tie the actual Project to a commercial entity by developing a working relationship with them---I don't think an "Official Project Gutenberg Edition" is a good idea. lulu.com, as mentioned, has no setup fees, but their pricing is a mite stiff---$4.53 plus $0.02/page. Certainly better than buying stuff from most university presses, but not exactly bargain-basement. Lightning Source charges (based on some quick googling at http://com1.runboard.com/bthescribesmessageboard.fwritingarchives.t45%7Coffset=15 ), $0.90 plus $0.013 per page, but I don't know what kind of binding that requires, or what sort of setup fees they charge. Perhaps they'd waive them if DP put out some sort of print-ready version in addition to human-readable text. I'm thinking TeX->PDF here, as it's pretty much the stablest human-readable-yet-fully-marked-up format available. Thoughts? I suppose I should take a relatively short etext, mark it up and see how it looks. I concur that simply throwing plain text, or even decent HTML, at paper is a horrible idea. So, what I ask is---is there a way to prepare the etexts as, in addition to HTML, whatever format is print-ready for these machines? Since typesetting a ready copy is a simple matter of feeding it to a Xerox DocuTech or whatever the $100,000 piece of hardware the print shop uses is, how can we do the necessary preprocessing ourselves? What exactly does the "setup fee" include?
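To put the two price formulas quoted above side by side, here is a minimal sketch. The 300-page length is an assumed example, and the dictionary and function names are made up for illustration. Setup fees, binding, and shipping are left out, since no figures for them are given here.

```python
from decimal import Decimal

# Per-copy prices as quoted in the thread: (base charge, charge per page), USD.
QUOTED_PRICES = {
    "lulu.com": (Decimal("4.53"), Decimal("0.02")),
    "Lightning Source": (Decimal("0.90"), Decimal("0.013")),
}


def per_copy_cost(base, per_page, pages):
    """Printing cost of one copy, excluding setup fees and shipping."""
    return base + per_page * pages


def compare(pages):
    """Return {printer name: per-copy cost} for a book of the given length."""
    return {
        name: per_copy_cost(base, per_page, pages)
        for name, (base, per_page) in QUOTED_PRICES.items()
    }


if __name__ == "__main__":
    # 300 pages is an assumed example length, not a figure from the thread.
    for name, cost in compare(300).items():
        print(f"{name}: ${cost:.2f}")
```

Decimal is used instead of float so that prices such as $0.013 per page add up exactly. The gap between the two quotes widens with page count, but any setup fee would have to be spread over the number of copies sold before the comparison means much.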
Thanks to everyone again for being so helpful with this. --grendelkhan From hyphen at hyphenologist.co.uk Wed Jun 15 13:35:08 2005 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Wed Jun 15 13:35:31 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <26b51c32050615132219efb74e@mail.gmail.com> References: <26b51c32050615132219efb74e@mail.gmail.com> Message-ID: <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> On Wed, 15 Jun 2005 16:22:58 -0400, grendelkhan wrote: | Thanks to everyone for their comments so far! I'm learning quite a bit as I go. | | As of October 2003, the US Commerce Department reported that about | three-fifths of households had a computer; a little over half had | internet access. | | https://www.esa.doc.gov/Reports/NationOnlineBroadband04.htm | | So it's not as bad as I was led to believe. Still, | | Perhaps I should have stated my goals a little more clearly. I have no | particular interest in making money or making a business out of this. | I'd simply like to make the books available---through whatever means | that may be---in dead-tree form. I suppose it's a terrible idea to tie | the actual Project to a commercial entity by developing a working | relationship with them---I don't think an "Official Project Gutenberg | Edition" is a good idea. | | lulu.com, as mentioned, has no setup fees, but their pricing is a mite | stiff---$4.53 plus $0.02/page. Certainly better than buying stuff from | most university presses, but not exactly bargain-basement. Lightning | Source charges (based on some quick googling at | http://com1.runboard.com/bthescribesmessageboard.fwritingarchives.t45%7Coffset=15 | ), $0.90 plus $0.013 per page, but I don't know what kind of binding | that requires, or what sort of setup fees they charge. Perhaps they'd | waive them if DP put out some sort of print-ready version in addition | to human-readable text.
I'm thinking TeX->PDF here, as it's pretty | much the stablest human-readable-yet-fully-marked-up format available. | Thoughts? I suppose I should take a relatively short etext, mark it up | and see how it looks. | | I concur that simply throwing plain text, or even decent HTML, at | paper is a horrible idea. So, what I ask is---is there a way to | prepare the etexts as, in addition to HTML, whatever format is | print-ready for these machines? Since typesetting a ready copy is a | simple matter of feeding it to a Xerox DocuTech or whatever the | $100,000 piece of hardware the print shop uses is, how can we do the | necessary preprocessing ourselves? What exactly does the "setup fee" | include? Just a mention that all Europe uses A4 paper. Anything designed solely for American paper sizes will be useless to typesetters in Europe. -- Dave Fawthrop http://www.webshots.com Thousands of wonderful professional photos for your Wallpaper and Screensaver. also 200,000 amateur pics. Four new pics each day. From prosfilaes at gmail.com Wed Jun 15 13:39:26 2005 From: prosfilaes at gmail.com (David Starner) Date: Wed Jun 15 13:39:37 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> References: <26b51c32050615132219efb74e@mail.gmail.com> <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> Message-ID: <6d99d1fd050615133945b30ce3@mail.gmail.com> On 6/15/05, Dave Fawthrop wrote: > Just a mention that all Europe uses A4 paper. > Anything designed solely for American paper sizes will be useless to > typesetters in Europe. Why? With decent margins, you can print letter on A4 or vice versa. And we aren't really talking typesetters here; typesetters don't care why size paper it is, since they're going to rip it apart and re-set it anyway. We're talking about people who are dumping our preformed blob to paper. 
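The letter-versus-A4 point can be checked with a little arithmetic. The sketch below assumes ISO A4 at 210 x 297 mm and US Letter at 8.5 x 11 in; the 20 mm margin is an arbitrary illustrative value, not something anyone in the thread proposed. It computes the largest text block that fits on both sheets with that margin on every side.

```python
MM_PER_INCH = 25.4

# Sheet sizes as (width, height) in millimetres.
A4 = (210.0, 297.0)
LETTER = (8.5 * MM_PER_INCH, 11.0 * MM_PER_INCH)


def common_text_block(sheets, margin_mm):
    """Largest text block (width, height) that fits every sheet with the
    given margin on all four sides."""
    width = min(w for w, _ in sheets) - 2 * margin_mm
    height = min(h for _, h in sheets) - 2 * margin_mm
    return width, height


# 20 mm is a hypothetical margin chosen only for the example.
block_w, block_h = common_text_block([A4, LETTER], margin_mm=20.0)
print(f"text block: {block_w:.1f} x {block_h:.1f} mm")
```

A4 is the narrower sheet and Letter the shorter one, so the shared block is limited by A4's width and Letter's height.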
From grendelkhan at gmail.com Wed Jun 15 14:09:47 2005 From: grendelkhan at gmail.com (grendelkhan) Date: Wed Jun 15 14:09:58 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> References: <26b51c32050615132219efb74e@mail.gmail.com> <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> Message-ID: <26b51c3205061514095c5e8f8@mail.gmail.com> On 6/15/05, Dave Fawthrop wrote: > Just a mention that all Europe uses A4 paper. > Anything designed solely for American paper sizes will be useless to > typesetters in Europe. I was planning on 6"x9", which I think is the standard trade paperback size. Except... hmm. http://www.cafepress.com/cp/info/help/learn_book_info.aspx Cafe Press states that 4.18in x 6.88in is the standard 'Mass Market Paperback' size. Also 5in x 8in for 'Standard Paperback'. http://www.whitehallprinting.com/TrimSize.html Some random printing company lists 6x9 and 5.5x8.5 as 'Standard Trim Sizes'. http://www.powerhomebiz.com/vol93/selfpublishing2.htm Another random tutorial lists 6x9 and 5-3/8x8. http://www.josephzitt.com/books/smwb-howto.php#pod Says here that apparently the 6x9 format is standard, at least with Lightning Source. Ah, and Cafe Press offers printing for $7 plus $0.03 per page with no setup fees. So, probably not the cheapest option. Perhaps I'll prep something and approach Lightning Source asking what they need in the way of preparation supplies---that is, what can be done for them. Is 6x9 a standard paperback size in Europe? I suppose that's of less interest. I'll be measuring some of my paperbacks at home this evening once I get back from work. Maybe print and trim a few test pages or something. --grendelkhan From Bowerbird at aol.com Wed Jun 15 14:43:44 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 15 14:44:07 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. 
Message-ID: <15d.52dcf2d7.2fe1fb10@aol.com> grendelkhan said: > Perhaps I should have stated my goals a little more clearly. > I have no particular interest in making money > or making a business out of this. but it costs money to do it. so you have to "make a business out of it" in order to do it in the first place... unless you have a lot of money to throw down the toilet. and you don't have to worry about "making money", because unless some big miracle strikes, or you are particularly clever about how you go about it, you won't make any money. you're far more likely to lose your shirt. so "avoiding the loss of money" is your real objective here. if you can't afford to lose any money, you should stay away. > I'd simply like to make the books available > ---through whatever means that may be--- > in dead-tree form. right. but you can't just wave a wand and make it happen. > lulu.com, as mentioned, has no setup fees, > but their pricing is a mite stiff---$4.53 plus $0.02/page. that $4.53 _is_ a setup fee, whether they call it that or not. and since it's $4.53 _per_ book, it's a rather high one at that. (if you really want the best p.o.d. price, i'll dig that up for you. there's one site offering quotes based on several p.o.d. places.) > I'm thinking TeX->PDF here ok, but who puts all the e-texts into tex format? that's a real cost, a very real cost, and it's huge. > Thoughts? I suppose I should take a relatively short etext, > mark it up and see how it looks. why a "relatively short" one? that'll just lead you to underestimate the actual cost, which is the best way to lose money fast. mark up one of average size and difficulty, and then multiply it by about 11,000, and then you'll have a good idea of the true cost. > So, what I ask is---is there a way to > prepare the etexts as, in addition to HTML, > whatever format is print-ready for these machines? there will be, very shortly, yes -- my viewer-program. 
given some minor editing of an e-text for consistency, usually 5-10 minutes for most e-texts, it will format the book according to the user's specifications (as to font, size, leading, colors, and paper-size, which dave mentioned in regard to european users) and create a .pdf. putting this program into the hands of end-users, so they can create their own output, to their own specs, and print it out on their own machines, is one route to giving them hard-copy versions of all the e-texts. it's not the only route, but since it puts all the power and all the _costs_ on their shoulders, it is likely to be the one that gets implemented more than others... -bowerbird From hyphen at hyphenologist.co.uk Wed Jun 15 23:51:54 2005 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Wed Jun 15 23:52:27 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <6d99d1fd050615133945b30ce3@mail.gmail.com> References: <26b51c32050615132219efb74e@mail.gmail.com> <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> <6d99d1fd050615133945b30ce3@mail.gmail.com> Message-ID: On Wed, 15 Jun 2005 15:39:26 -0500, David Starner wrote: | On 6/15/05, Dave Fawthrop wrote: | > Just a mention that all Europe uses A4 paper. | > Anything designed solely for American paper sizes will be useless to | > typesetters in Europe. | | Why? With decent margins, you can print letter on A4 or vice versa. | And we aren't really talking typesetters here; typesetters don't care | why size paper it is, since they're going to rip it apart and re-set | it anyway. We're talking about people who are dumping our preformed | blob to paper. Please note the Subject of this thread: Print-on-demand and dead-tree copies of Gutenberg texts. IMO felling trees is a Bad Idea, especially when with a little thought fewer trees could be felled. -- Dave Fawthrop http://www.webshots.com Thousands of wonderful professional photos for your Wallpaper and Screensaver. also 200,000 amateur pics. 
Four new pics each day. From hyphen at hyphenologist.co.uk Thu Jun 16 00:22:38 2005 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Jun 16 00:23:13 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <26b51c3205061514095c5e8f8@mail.gmail.com> References: <26b51c32050615132219efb74e@mail.gmail.com> <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> <26b51c3205061514095c5e8f8@mail.gmail.com> Message-ID: On Wed, 15 Jun 2005 17:09:47 -0400, grendelkhan wrote: | On 6/15/05, Dave Fawthrop wrote: | > Just a mention that all Europe uses A4 paper. | > Anything designed solely for American paper sizes will be useless to | > typesetters in Europe. | | I was planning on 6"x9", which I think is the standard trade paperback | size. Except... hmm. But absolutely nobody uses inches any more, at least where I live. I was using a GPS which gives some measurements in feet, last weekend and found that I could not envisage how long a foot was. Even though I spent *more* than half my life using those insane measurements, ft, ins, lb, gallons (not US), perch, pole, peck, and so on. | http://www.cafepress.com/cp/info/help/learn_book_info.aspx San Leandro, California 94577 | Cafe Press states that 4.18in x 6.88in is the standard 'Mass Market | Paperback' size. Also 5in x 8in for 'Standard Paperback'. | | http://www.whitehallprinting.com/TrimSize.html | Naples, FL 34104 USA | Some random printing company lists 6x9 and 5.5x8.5 as 'Standard Trim Sizes'. | | http://www.powerhomebiz.com/vol93/selfpublishing2.htm Virginia, USA | Another random tutorial lists 6x9 and 5-3/8x8. | | http://www.josephzitt.com/books/smwb-howto.php#pod Berkeley, CA 94709 | Says here that apparently the 6x9 format is standard, at least with | Lightning Source. | | Ah, and Cafe Press offers printing for $7 plus $0.03 per page with no | setup fees. So, probably not the cheapest option. 
Perhaps I'll prep | something and approach Lightning Source asking what they need in the | way of preparation supplies---that is, what can be done for them. | | Is 6x9 a standard paperback size in Europe? No! A5 usually | I suppose that's of less | interest. I'll be measuring some of my ?American? | paperbacks at home this evening | once I get back from work. Maybe print and trim a few test pages or | something. Anything but A4 and A3 paper is *impossible*, for your ordinary person to get in Europe. The jobbing Printers use A0 sheets. Printing anything but A sizes produces waste trimmings :-( http://www.cl.cam.ac.uk/~mgk25/iso-paper.html >>> International standard paper sizes Standard paper sizes like ISO A4 are widely used all over the world today. This text explains the ISO 216 paper size system and the ideas behind its design. Globalization starts with getting the details right. Inconsistent use of SI units and international standard paper sizes remain today a primary cause for U.S. businesses failing to meet the expectations of customers worldwide. <<< Basically fold/cut A0 in two and you get A1. *with No Waste* What I am suggesting is that the design should be A5 for the world, with an alternative for US use. -- Dave Fawthrop http://www.webshots.com Thousands of wonderful professional photos for your Wallpaper and Screensaver. also 200,000 amateur pics. Four new pics each day. From Bowerbird at aol.com Thu Jun 16 02:27:00 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 16 02:27:28 2005 Subject: [gutvol-d] is that 60 gigs in your pocket? Message-ID: it's a photo-ipod in my pocket, but yes, i am glad to see you... -bowerbird From hacker at gnu-designs.com Thu Jun 16 06:04:50 2005 From: hacker at gnu-designs.com (David A. 
Desrosiers) Date: Thu Jun 16 06:05:31 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: <62.5703acdd.2fe13657@aol.com> References: <62.5703acdd.2fe13657@aol.com> Message-ID: > make a few global changes -- which restores the whitespace acrobat > usually strips from text -- and you can load the text back into my > z.m.l. viewer-program and generate the same .pdf once again... Acrobat doesn't store text in PDFs, they store pixels and vectors and OCR'd coordinates. Most definitely not text. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From jonathan.gorman at gmail.com Thu Jun 16 08:02:19 2005 From: jonathan.gorman at gmail.com (Jon Gorman) Date: Thu Jun 16 08:02:53 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: References: <62.5703acdd.2fe13657@aol.com> Message-ID: <4a6dc7605061608024b004c32@mail.gmail.com> On 6/16/05, David A. Desrosiers wrote: > > Acrobat doesn't store text in PDFs, they store pixels and > vectors and OCR'd coordinates. Most definitely not text. Ummm....so that's why Chapter 5 of the reference is all about text? Seriously though, it is possible to put text into pdfs. That's why you can copy and paste out of them. Granted, there are a lot of places that just scan in material and post that, but it is not the only thing that you can do with pdfs. PDF is derived from PostScript after all. Unless I missed something in the conversation, in which case I'm sorry. Or you're being sarcastic and I just misread ;). Just didn't want anyone to be misled. Jon Gorman From hacker at gnu-designs.com Thu Jun 16 08:19:54 2005 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Jun 16 08:20:32 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: <4a6dc7605061608024b004c32@mail.gmail.com> References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> Message-ID: > Seriously though, it is possible to put text into pdfs.
That's why > you can copy and paste out of them. Granted, there are a lot of > places that just scan in material and post that, but it is not the > only thing that you can do with pdfs. PDF is derived from > postscript after all. Just because you can put down a cursor and go from one x,y to another x,y does not mean you are "selecting" what is visible on the screen, as your human eyes see it. PDF is pure layout, no structure. Tables are positioned text and lines, columns are positioned text... its basically OCR, without any character detection. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From jonathan.gorman at gmail.com Thu Jun 16 08:52:50 2005 From: jonathan.gorman at gmail.com (Jon Gorman) Date: Thu Jun 16 08:53:01 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> Message-ID: <4a6dc760506160852491a30b5@mail.gmail.com> > Just because you can put down a cursor and go from one x,y to > another x,y does not mean you are "selecting" what is visible on the > screen, as your human eyes see it. Whoever said they were human eyes ;). Seriously though, while there are always encoding issues and the like, given a reasonable application/clipboard type the region you select should be what is visible, so I'm not sure what you're suggesting. Are you just making the point that when I select the text it's converting an essentially drawn image into an encoding text. But of course anything displayed on the monitor or printed out could be argued to be just pixels and/or vectors. > > PDF is pure layout, no structure. Tables are positioned text > and lines, columns are positioned text... its basically OCR, without > any character detection. Sorry, just guess I'm confused. You said there was no text in pdfs (implying to me just images). Chapter 5 of the Reference has a lot of info of how to include text. 
I'm not sure what OCR (Optical Character Recognition) means if it doesn't do any character detection... I didn't see any mention of structure anywhere in the email you sent. Just that it was impossible to have text. Which is odd since there are regions of text in a pdf document with instructions on how to draw that text. They can be encoded or just inserted when creating the document. Granted, the encoded streams are a bit of a pain, but they're arguably just as much text as any other I'm not trying to make a mountain of a molehill here. Just didn't want some people to get the impression that pdfs were solely graphic-orientated (like say...jpeg). Perhaps we have different ideas of text. Seriously, no offense to anyone. Just wanted to clarify things. I'm skeptical about bowerbird's claims as well, but it's misleading to say that Acrobat doesn't store text in the document. It is possible to make the text rather obscure, but that doesn't mean that if formatted correctly you could not scan through the file in a text editor and read it. Granted, it's rarely done, but doesn't mean it's impossible. Jon > > > > David A. Desrosiers > desrod@gnu-designs.com > http://gnu-designs.com > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Bowerbird at aol.com Thu Jun 16 09:53:18 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 16 09:53:32 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf Message-ID: <1e8.3d51866f.2fe3087e@aol.com> david said: > Acrobat doesn't store text in PDFs, they store pixels and > vectors and OCR'd coordinates. Most-definately not text. you must be making some kind of semantic argument that i don't grasp. because i can copy out my text just peachy. and most people can copy text out of many .pdfs just fine. 
it typically loses a good deal of its formatting, and it is not unusual for chunks of it to be ordered "out of place", and the users' ability to copy out text _can_ be disabled, or subverted in other ways (i.e., by converting text to an image format before writing it to the .pdf originally) but the experience of copying text from a .pdf is common. however, if you'd like to explain the point you're making, whether it is semantic or otherwise, do please feel free. :+) it probably won't matter much to me, but i don't mind keeping my brain exercised by doing a little thinking... -bowerbird From tim at tmeekins.com Thu Jun 16 10:03:29 2005 From: tim at tmeekins.com (Tim Meekins) Date: Thu Jun 16 10:03:47 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf References: <62.5703acdd.2fe13657@aol.com> Message-ID: <021401c57295$524a6cf0$3201a8c0@pink> Wrong! PDF most definately stores text. ----- Original Message ----- From: "David A. Desrosiers" To: "Project Gutenberg Volunteer Discussion" Sent: Thursday, June 16, 2005 6:04 AM Subject: Re: [gutvol-d] roundtripping formatted text through a .pdf > >> make a few global changes -- which restores the whitespace acrobat >> usually strips from text -- and you can load the text back into my >> z.m.l. viewer-program and generate the same .pdf once again... > > Acrobat doesn't store text in PDFs, they store pixels and > vectors and OCR'd coordinates. Most-definately not text. > > > David A. Desrosiers > desrod@gnu-designs.com > http://gnu-designs.com > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From marcello at perathoner.de Thu Jun 16 10:18:22 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Jun 16 10:18:31 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: References: <62.5703acdd.2fe13657@aol.com> Message-ID: <42B1B45E.8030806@perathoner.de> David A. 
Desrosiers wrote: > Acrobat doesn't store text in PDFs, they store pixels and vectors and > OCR'd coordinates. Most-definately not text. You are most definitely wrong there. How else would the "find" function work? Here's an example of a pdf file contents: /F23 17.215 Tf 56.693 509.046 Td[(Chapter)-250(I)]TJ/F23 24.787 Tf 0 -74.229 Td[(Do)10(wn)-250(the)-250(Rab)10(bit-Hole)]TJ/F20 11.955 Tf 0 -44.334 Td[(Alice)-300(w)10(as)-299(be)15(ginning)-300(to)-299(get)-300(v)15(ery)-299(tired)-300(of)-300(sitting)-299(by)-300(her)-299(sister)-300(on)]TJ 0 -14.446 Td[(the)-354(bank,)-380(and)-354(of)-354(ha)20(ving)-354(nothing)-353(to)-354(do:)-518(once)-354(or)-354(twice)-354(she)-354(had)]TJ 0 -14.446 Td[(peeped)-198(into)-199(the)-198(book)-199(her)-198(sister)-199(w)10(as)-198(reading,)-209(b)20(ut)-198(it)-199(had)-198(no)-199(pictures)]TJ 0 -14.446 Td[(or)-321(con)40(v)15(ersations)-321(in)-321(it,)-339(`and)-321(what)-321(is)-321(the)-321(use)-321(of)-321(a)-321(book,')-339(thought)]TJ 0 -14.445 Td[(Alice)-250(`without)-250(pictures)-250(or)-250(con)40(v)15(ersation?')]TJ You see that all the text is there. Spaces are simulated by horizontal movement and kernings also. It would not be too difficult to write a perl script to recover the text out of the pdf. -- Marcello Perathoner webmaster@gutenberg.org From jhowse at nf.sympatico.ca Thu Jun 16 15:10:17 2005 From: jhowse at nf.sympatico.ca (JHowse) Date: Thu Jun 16 10:41:12 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: <42B1B45E.8030806@perathoner.de> References: <62.5703acdd.2fe13657@aol.com> Message-ID: <5.1.0.14.0.20050616150642.00a688a0@pop1.nf.sympatico.ca> At 07:18 PM 16/06/05 +0200, you wrote: >David A. Desrosiers wrote: > >>Acrobat doesn't store text in PDFs, they store pixels and vectors and >>OCR'd coordinates. Most-definately not text. > >You are most definitely wrong there. How else would the "find" function >work? [snip] And fonts are imbedding into a pdf file! 
>You see that all the text is there. Spaces are simulated by horizontal >movement and kernings also. It would not be too difficult to write a perl >script to recover the text out of the pdf. or if you have the full adobe acrobat programme you can simply export to a rtf file. I did that sort of thing at work for three years. You may have to do some formatting to pretty it up, but it's definitely text. JHowse ================================================================================ "I'm not likely to write a great novel or compose a song or save a baby from a burning building...but I can help make sure that there is an electronic library of free knowledge available for future people to access."--jhutch. Preserving History One Page at a Time!! Celebrating our 6750th book posted to Project Gutenberg Join Project Gutenberg's Distributed Proofreaders http://www.pgdp.net/c/ ================================================================================ From hacker at gnu-designs.com Thu Jun 16 10:43:14 2005 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Jun 16 10:43:37 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: <4a6dc760506160852491a30b5@mail.gmail.com> References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> <4a6dc760506160852491a30b5@mail.gmail.com> Message-ID: > It is possible to make the text rather obscure, but that doesn't > mean that if formatted correctly you could not scan through the file > in a text editor and read it. Granted, it's rarely done, but > doesn't mean it's impossible. I just ran strings(1) across about 40 of the PDFs I have here from various clients, online resources and PDFs I've created in Windows and with OpenOffice.org, and not a single one contained any readible strings that are actually in the _content_ of the documents themselves, other than the strings which comprise URLs embedded in the document itself. So where is the text of the document stored? 
If it's somewhere in here, why is it obfuscated by default, in every single PDF I have? The document content itself is most-definitely NOT stored as "plain text" in the pdf documents I have here, which is a pretty broad sample set. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From marcello at perathoner.de Thu Jun 16 11:07:46 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Jun 16 11:07:55 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> <4a6dc760506160852491a30b5@mail.gmail.com> Message-ID: <42B1BFF2.1040905@perathoner.de> David A. Desrosiers wrote: > I just ran strings(1) across about 40 of the PDFs I have here from > various clients, online resources and PDFs I've created in Windows > and with OpenOffice.org, and not a single one contained any readable > strings that are actually in the _content_ of the documents > themselves, other than the strings which comprise URLs embedded in > the document itself. > > So where is the text of the document stored? If it's somewhere in > here, why is it obfuscated by default, in every single PDF I have? > > The document content itself is most-definitely NOT stored as "plain > text" in the pdf documents I have here, which is a pretty broad > sample set. A pdf is a chunked file format and each chunk can be compressed or even encrypted. A run-of-the-mill pdf is always at least compressed. If you create your own pdf with pdftex you can set the compression level to 0 and lo! the text magically appears inside the pdf.
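The two points made in this thread (streams are usually compressed, and the text sits in TJ arrays with word spaces done as horizontal movement) can be sketched in a few lines. This is a rough Python illustration rather than the perl script mentioned earlier, and it rests on several assumptions: only zlib (FlateDecode) streams are inflated, anything else is scanned as-is; only parenthesised strings inside `[...] TJ` arrays are read; and the -100 threshold for treating a movement as a word space is a guess based on the sample shown earlier in the thread, where kerning values are small and word gaps are around -250. Encrypted files, hex strings and fonts with custom encodings are not handled.

```python
import re
import zlib

# Body of every "stream ... endstream" section in the file.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A "[ ... ] TJ" array made of (strings) and numbers.
TJ_ARRAY = re.compile(rb"\[((?:\((?:\\.|[^\\()])*\)|[^\]()])*)\]\s*TJ")
# One array item: either a (string) or a number.
ITEM = re.compile(rb"\(((?:\\.|[^\\()])*)\)|(-?\d*\.?\d+)")


def content_streams(pdf_bytes):
    """Yield each stream body, inflated when it is zlib-compressed."""
    for match in STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # not zlib data: assume it is stored uncompressed


def tj_to_text(array_body, space_threshold=-100.0):
    """Join the strings of one TJ array, turning large negative
    movements into spaces (threshold is an assumed heuristic)."""
    parts = []
    for match in ITEM.finditer(array_body):
        if match.group(1) is not None:
            # Undo the backslash escapes for ( ) and \ inside strings.
            text = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
            parts.append(text.decode("latin-1"))
        elif float(match.group(2)) <= space_threshold:
            parts.append(" ")
    return "".join(parts)


def extract_text(pdf_bytes):
    """Return one string per TJ array found in any stream."""
    return [
        tj_to_text(m.group(1))
        for stream in content_streams(pdf_bytes)
        for m in TJ_ARRAY.finditer(stream)
    ]


# Demo on a fragment of the content stream quoted earlier in the thread,
# wrapped in a compressed stream the way a typical generator would store it.
SAMPLE = (b"/F23 17.215 Tf 56.693 509.046 Td[(Chapter)-250(I)]TJ"
          b"/F23 24.787 Tf 0 -74.229 Td"
          b"[(Do)10(wn)-250(the)-250(Rab)10(bit-Hole)]TJ")
FAKE_PDF = b"stream\n" + zlib.compress(SAMPLE) + b"\nendstream\n"
print(extract_text(FAKE_PDF))
```

This also shows why strings(1) comes up empty on ordinary files: the readable operators only appear after the stream has been inflated.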
-- Marcello Perathoner webmaster@gutenberg.org From jonathan.gorman at gmail.com Thu Jun 16 11:09:07 2005 From: jonathan.gorman at gmail.com (Jon Gorman) Date: Thu Jun 16 11:10:02 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> <4a6dc760506160852491a30b5@mail.gmail.com> Message-ID: <4a6dc760506161109330131f7@mail.gmail.com> On 6/16/05, David A. Desrosiers wrote: > > > It is possible to make the text rather obscure, but that doesn't > > mean that if formatted correctly you could not scan through the file > > in a text editor and read it. Granted, it's rarely done, but > > doesn't mean it's impossible. > > I just ran strings(1) across about 40 of the PDFs I have here > from various clients, online resources and PDFs I've created in > Windows and with OpenOffice.org, and not a single one contained any > readible strings that are actually in the _content_ of the documents > themselves, other than the strings which comprise URLs embedded in the > document itself. > > So where is the text of the document stored? If its somewhere > in here, why is it obfuscated by default, in every single PDF I have? > In text blocks within the documents which can be encoded and are referenced from the part of the document that sets up the layout. > The document content itself is most-definitely NOT stored as > "plain text" in the pdf documents I have here, which is a pretty broad > sample set. People are not arguing the average case. Like I said, it's rare for it not to be obfuscated. But guess what, improbable != impossible. You said it was impossible, that the information was stored purely as pixels and vectors. It's not. There is a whole subculture that is quite used to the idea of there being embedded text from when direct tinkering with postscript/tex processing was more common. 
You might need a tool more complex than strings to grab the textual information out if it is obfuscated (since it can really be an encoding within an encoding). I'm at a loss as to what your example run proved. Just that it's rare. And Marcello was kind enough to provide an example where it was not obfuscated. See those ()? Simple definition of them (It's been a while since I read the Reference, so this isn't 100%) means that the characters are not in another encoding so there is no need to convert them when generating the page. It's pretty well known that the great number of automatic pdf generators can create some very unreadable code. I knew someone who was bitterly disappointed at the amount of cruft and difficulty it brings to working with them. But ideally they still follow the rules in the Reference (it's annoying to find, but it is available through the Adobe site). If it's not in that syntax, it's no more a pdf than if an "almost-XML" document had elements with no closing tags. If I had time, I'd write one by hand for ya that had none of the encoding mess. I'd agree most pdf documents would be a pain to handle by hand, but you wouldn't have to apply OCR-like techniques to most. Just write a parser based off specs. I'm confused at the point of all of this. You seemed to be implying that bowerbird couldn't be doing what he claimed because: " Acrobat doesn't store text in PDFs, they store pixels and vectors and OCR'd coordinates. " Multiple people have pointed out that this is wrong, that there is text within pdfs. They've shown examples. Remember, probably most of the obfuscation code is there for more nefarious reasons, but some of the ideas come from valid problems with multiple character encoding sets. (We're talking about techniques established well before Unicode) I'm not arguing that the format is good or bad, that we should abandon ASCII files here at Gutenberg or anything along those lines. Just that your statement was misleading. A pdf is not like a jpeg.
In fact, as far as vector-based systems go I'm not familiar with any vector system that doesn't store text in a file instead of just pure vector representation of characters due to efficiency reasons. Jon Gorman From Bowerbird at aol.com Thu Jun 16 11:48:27 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 16 11:48:39 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf Message-ID: <20f.31afa78.2fe3237b@aol.com> jon gorman said: > Just wanted to clarify things. that's good. i like clarification... :+) > I'm skeptical about bowerbird's claims as well that's good. i like skeptics... ;+) but the proof is in the pudding, jon, the proof is in the pudding... > but it's misleading to say that > Acrobat doesn't store text in the document. i believe, like you, that that would be a misleading statement. > It is possible to make the text rather obscure well, as i said, one _can_ make it rather totally "obscure" by converting it to graphic format before writing it to the .pdf. in that case, the user cannot copy out the text -- as text -- to the clipboard. such "text" is not found by "find" either. (here i'm largely speaking, of course, as a _programmer_ who is actually outputting the content to the .pdf driver. most people creating a .pdf don't have that luxury, in that they're stuck with whatever their authoring tool might do. as a sidebar here, i will note that the problems involved in copying text from a .pdf are well-known and long-standing, so they _should_ have been addressed by the programmers of common authoring tools, like word-processors, by this time. in programming my tool, i have sought to empower my users, including in this arena of round-tripping text put into a .pdf.) > but that doesn't mean that if formatted correctly > you could not scan through the file in a text editor and > read it. Granted, it's rarely done, but doesn't mean > it's impossible. well, i believe your statement is misleading as well, jon... 
(and if you're striving to "clarify" things, you really should try something to see if you _can_ do it before you _say_ you can...) load a .pdf into an editor; you won't find much (if any) text qua text, not in a recognizable form you can easily copy out to the clipboard. (it's not _impossible_ you will find some text, depending upon how the .pdf was created, since there is text in some .ps files. but it's never a long unbroken stretch before it is interrupted by postscript commands, so this approach is doomed to failure.) so one shouldn't expect to find text -- stored as text -- in a .pdf, not in the traditional sense. (however, see the p.s. on this post.) nonetheless, if the text wasn't stored in the .pdf in _some_ way, users wouldn't be able to copy it out to the clipboard, would they? and acrobat wouldn't be able to do "find" operations on it, would it? (notably, though, you'll discover that acrobat's "find" capabilities don't extend to whitespace. for instance, you can't do a search for two spaces, even if there were such instances in the original file.) -bowerbird p.s. it might be possible to store text in the comments of a .pdf, i'm not sure. if you could, then that _might_ be interesting to do. (i will explore the possibility, especially when my app starts to create .pdfs directly without running them through a .pdf driver.) with such storage, one wouldn't need to pull the .pdf into acrobat in order to retrieve the text from it, which might be a capability that some people would find useful. (it would also allow ordinary search programs to search the .pdf.) but that's just gravy to me; as long as users can "roundtrip" text out of a .pdf, my goal is met. once people get used to my viewer, they won't even _want_ .pdfs. From hacker at gnu-designs.com Thu Jun 16 11:52:37 2005 From: hacker at gnu-designs.com (David A. 
Desrosiers) Date: Thu Jun 16 11:53:39 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: <4a6dc760506161109330131f7@mail.gmail.com> References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> <4a6dc760506160852491a30b5@mail.gmail.com> <4a6dc760506161109330131f7@mail.gmail.com> Message-ID: > You said it was impossible, that the information was stored purely > as pixels and vectors. It's not. I'll let it drop... except this one point: I never said it was "impossible" for a pdf to contain text in any of my messages (and further, I've never even used the word "impossible" in any message I've ever posted to this list, ever.) Every single pdf I have here is exactly that: 7-bit ascii text, and nothing more, but the text in the pdfs is definitely not the text that comprises the content of the pdf itself. I have heard of binary pdfs, but I don't have one here and couldn't find one out there. My collection includes pdfs which are heavily encrypted with the latest-n-greatest Adobe 7.whatever product, and they're still 100% ascii text, but none of the text (except urls) is document "content". > You might need a tool more complex than strings to grab the textual > information out if obfuscated (since it can really be an encoding > within an encoding). I've got many here, and even seen quite a few commercial (proprietary, no source available) products hijacking pdftohtml's source for their pdf rendering. I think I may have found yet-another one last night that converts PDFs for display on a Palm handheld device (a commercial "Office Documents on Palm" product). Of course the output is absolutely horrible, as is the output of most PDFs, but that's another matter. > You seemed to be implying that bowerbird couldn't be doing what he > claimed because: " Acrobat doesn't store text in PDFs, they store > pixels and vectors and OCR'd coordinates.
" Actually, no tools that can decompose PDF back to readable text produce anything worth using. In 100% of the cases I've found, which includes Open Source and commercial tools, you have to go back in and reformat the entire output by hand anyway. I've tried automating the rewrap, paragraph layout and many other aspects, and it's just not worth it. It's easier to load it up in xpdf or acroread and cut and paste from the GUI into another file and format from that baseline. But back to the Bowerbird case... he contends that his Z.M.L. tool written in gwbasic (or whatever it's using these days) can do everything including make coffee, walk the dog, and oh yeah, convert pdfs to a pleasant-to-read format. If this is true, this would be the first tool out of literally dozens that I've tried to accomplish this feat successfully. But I'm not going to go install DOS and gwbasic to find out. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Thu Jun 16 12:49:29 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 16 12:49:45 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf Message-ID: <1f6.bdf00f6.2fe331c9@aol.com> david said: > But back to the Bowerbird case... i was wondering when we were gonna stop wasting time talking about frivolous topics like .pdf and usability, and get back to the most important topic of all -- me! so thanks for getting us back on-point, david... ;+) > he contends that his Z.M.L. tool > written in gwbasic (or whatever it's using these days) realbasic. http://www.realsoftware.com it runs on mac (classic and o.s.x.) and windows (95 and up) and even some flavors of linux. likewise, it creates programs that run on all those platforms as well... and just this week, they announced a new version -- rb2005, which is written in realbasic, so it is a wonderful example of eating your own pudding -- and they are making the entry-level linux version free (as in free beer).
i take it you're one of those language snobs who wouldn't even consider basic, but if i'm wrong, you should take a good look at it. lots of power in it, and cross-plat that really works. i have no linux experience, so i haven't compiled my viewer-program out to linux yet, but if you want to be my guinea-pig, i mean "alpha-tester", let me know. > can do everything including make coffee i don't drink coffee, so there are no plans in that regard. > walk the dog we have a cat. she walks herself. and given my gut, i should take my own walks. so again, no plans. i _do_ eat, however. and i like toasted-cheese sandwiches. so i _do_ have plans to put a routine in my viewer-program that will make a toasted-cheese sandwich. you'll be able to specify the type of bread, how light/dark you want it toasted, and any amount of several different types of cheeses, so i am quite excited about this. i just wish i knew how to program it. perhaps i'll start an open-source effort. got any advice for me? > and oh yeah, convert pdfs to a pleasant-to-read format. no, that's not really my objective. yes, the .pdfs that my program creates _are_ pleasant-to-read, because they're just a .pdf version of what my viewer displays... but the "roundtripping" goal is that when my program makes a .pdf, the end-user can copy the text out of it, make a few global changes, and then stick it right back into my viewer and it will look the same. create another .pdf from that and it'll look identical to the first .pdf; and you can again copy the text out of that, make the global changes, and then stick it right back into my viewer and it will look the same. no fuss, no muss, no reapplication of markup, just roundtrip usage... > If this is true, this would be the first tool out of literally dozens > that I've tried to accomplish this feat successfully. actually, getting the exact same text out that you put in is not all _that_ remarkable. or _shouldn't_ be, anyway. 
but yeah, i know of no other tool that can do it either... > But I'm not going to go install DOS and gwbasic to find out. you silly boy. i moved out of dos well over a decade ago. and gwbasic was always quite inferior compared to qbasic. (although, as a command-line processor, dos was a _very_ friendly interface for a power-user like myself. my word, i had .bat files that would interactively create .bat files! two-letter .bat files give you 500+ quickly-run commands. i tell ya, there were many times my efficiency could _fly_. compared to that, a graphical-user-interface is molasses. but hey, it's all about selling units to the masses, right?) anyway, david, wanna alpha-test? or would you prefer to wait for the toasted-cheese sandwich feature? -bowerbird From prosfilaes at gmail.com Thu Jun 16 14:23:53 2005 From: prosfilaes at gmail.com (David Starner) Date: Thu Jun 16 14:24:04 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: References: <26b51c32050615132219efb74e@mail.gmail.com> <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> <6d99d1fd050615133945b30ce3@mail.gmail.com> Message-ID: <6d99d1fd05061614235190a033@mail.gmail.com> On 6/16/05, Dave Fawthrop wrote: > On Wed, 15 Jun 2005 15:39:26 -0500, David Starner > wrote: > | Why? With decent margins, you can print letter on A4 or vice versa. > | And we aren't really talking typesetters here; typesetters don't care > | why size paper it is, since they're going to rip it apart and re-set > | it anyway. We're talking about people who are dumping our preformed > | blob to paper. > > Please note the Subject of this thread: > Print-on-demand and dead-tree copies of Gutenberg texts. I noted it. I don't see how it changes anything. > IMO felling trees is a Bad Idea, especially when with a little thought > fewer trees could be felled. 
Generous margins are always nice, and printing letter on A4 doesn't cause more trees to be cut down; it's the same amount of text, shaped differently. > Inconsistent use of SI units and international > standard paper sizes remain today a primary > cause for U.S. businesses failing to meet > the expectations of customers worldwide. And use of international standard paper sizes remains today a primary cause of international businesses failing to meet the expectations of American customers. It's cute how you point out that all the print-on-demand places are in America; perhaps that means that we should use American paper sizes, then? From jonathan.gorman at gmail.com Thu Jun 16 14:39:44 2005 From: jonathan.gorman at gmail.com (Jon Gorman) Date: Thu Jun 16 14:39:54 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> <4a6dc760506160852491a30b5@mail.gmail.com> <4a6dc760506161109330131f7@mail.gmail.com> Message-ID: <4a6dc76050616143950c11c7e@mail.gmail.com> > I never said it was "impossible" for a pdf to contain text in > any of my messages (and further, I've never even used the word > "impossible" in any message I've ever posted to this list, ever.) And I shouldn't put words in your mouth. I'm sorry. I just interpreted " Acrobat doesn't store text in PDFs" to mean that the specification says it never stores text in pdfs, hence it would be impossible to add. I realized later it could also be interpreted slightly differently (either that the applications that create pdfs don't do it, or that it isn't common practice). It is a real pain to get the text out and getting worse with each version of pdf. > But back to the Bowerbird case... he contends that his Z.M.L. > tool written in gwbasic (or whatever it's using these days) can do > everything including make coffee, walk the dog, and oh yeah, convert > pdfs to a pleasant-to-read format.
I must admit perhaps I wasn't following closely but I think bowerbird just claimed the pdf that he exported was easy to import back as pdf (via copying out the text), not necessarily that he converted an existing pdf file. Of course, I'm probably wrong about that. Without capitals I sometimes get lost ;). Except for reading e.e. cummings I suppose. Again David, I'm sorry if I hurt any feelings or anything along those lines. I know some Palm developers so your name is familiar. They're happy for the help you contributed to the community so I would be in some hot water if I ticked you off by being a little too flippant. For some reason certain threads on this mailing list tend to warp my brain I think. Wonder if it has anything to do with the odd nesting sensation I get when I read certain parts of gutvol-d. Jon From marcello at perathoner.de Thu Jun 16 15:27:46 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Jun 16 15:27:59 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> <4a6dc760506160852491a30b5@mail.gmail.com> <4a6dc760506161109330131f7@mail.gmail.com> Message-ID: <42B1FCE2.5020405@perathoner.de> David A. Desrosiers wrote: > Every single pdf I have here is exactly that: 7-bit ascii text, and > nothing more, The encoding used in a pdf depends on the font technology: Type-1, Type-3, TrueType etc. You can link a dictionary to every font and thus change the standard encoding in any way you like. pdf can even accommodate multi-byte encodings.
-- Marcello Perathoner webmaster@gutenberg.org From donovan at abs.net Thu Jun 16 15:42:17 2005 From: donovan at abs.net (D Garcia) Date: Thu Jun 16 15:39:53 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: <42B1BFF2.1040905@perathoner.de> References: <62.5703acdd.2fe13657@aol.com> <42B1BFF2.1040905@perathoner.de> Message-ID: <200506161842.17244.donovan@abs.net> On Thursday 16 June 2005 02:07 pm, Marcello Perathoner wrote: > David A. Desrosiers wrote: > A pdf is a chunked file format and each chunk can be compressed or even > encrypted. A run-of-the-mill pdf is always at least compressed. > > If you create your own pdf with pdftex you can set the compression > level to 0 and lo! the text magically appears inside the pdf. And if you're truly insane (and or interested) in the format, you can obtain the specs and learn how to write a PDF by hand in a standard text editor. (Which, yes, I have done, including writing vector graphics.) If you understand the technique, you can even write simple scripts in (your interpreted language of choice) to output simple PDF files directly, which is great for doing things like cgi report generation without library dependencies and the like. iirc, the most commonly used compression in PDF is FLATE, which is relatively trivial and fast/good enough for the majority of cases. From Bowerbird at aol.com Thu Jun 16 15:52:10 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jun 16 15:52:27 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf Message-ID: <55.75704963.2fe35c9a@aol.com> jon gorman said: > I think bowerbird just claimed the pdf that he exported > was easy to import back as pdf (via copying out the text), > not necessarily that he converted an existing pdf file. > Of course, I'm probably wrong about that. > Without capitols I sometimes get lost ;) i understand... :+) and what you've said is pretty-much correct. 
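A minimal sketch of the hand-written PDF idea described above, in Python. It assumes a single page, the built-in Helvetica font, and plain ASCII text; the function name build_minimal_pdf, its compress flag, and the sample string are made up for illustration and are not taken from any tool mentioned in this thread. With compress left off, the text sits in the file where a text editor can see it; with it on, the same content stream is run through FLATE (zlib) and is no longer readable by eye.

```python
import zlib


def build_minimal_pdf(text: str, compress: bool = False) -> bytes:
    """Assemble a one-page PDF by hand; optionally FLATE-compress the text stream."""
    # Backslash and parentheses are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: begin text, pick font F1 at 12pt, move to (72, 720), show text.
    content = f"BT /F1 12 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    if compress:
        data = zlib.compress(content)
        stream_dict = b"<< /Length %d /Filter /FlateDecode >>" % len(data)
    else:
        data = content
        stream_dict = b"<< /Length %d >>" % len(data)

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        stream_dict + b"\nstream\n" + data + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as handle:
        handle.write(build_minimal_pdf("Hello, gutvol-d"))
```

Opening the uncompressed hello.pdf in a text editor should show the line ending in "Tj" with the text in parentheses, which is roughly what the "compression level 0" remark is getting at.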
yes, i'm _only_ talking about .pdfs that _my_ viewer-app creates. (if some other program created the .pdf, then blame that program.) so, you put plain-text into my viewer, and it formats it nicely. you can print that nice formatting to a .pdf (which looks nice). and then you can copy the text out of the .pdf. when you do that, much of the nice formatting has been stripped away, of course, and we're back to plain-text again. (if i remember correctly, .pdf _does_ retain italicizing, but it _doesn't_ retain bolding. i don't have the faintest idea why, it's kinda weird like that. and it definitely stores the color of the text, which is cute. but it definitely strips the _size_ of the text, which is bad. all of this is in _my_ version of acrobat reader, which is v4. we talk about acrobat/.pdf like it's one straightforward thing, but it's a crazy mish-mash of different-and-changing versions, so all of our discussion needs to be couched in careful clauses.) but the loss of formatting doesn't matter, because after you have made a few global changes (which, among other things, restore the blank lines between paragraphs that get stripped), you can put the text back into my viewer-program, and it will redo the nice formatting, just like it did it in the first place... with zen markup, this is all pretty easy to accomplish... :+) > If I had time, I'd write one by hand for ya > that had none of the encoding mess. i'd love to see that! -bowerbird From hacker at gnu-designs.com Fri Jun 17 02:46:46 2005 From: hacker at gnu-designs.com (David A. 
Desrosiers) Date: Fri Jun 17 02:48:02 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: <4a6dc76050616143950c11c7e@mail.gmail.com> References: <62.5703acdd.2fe13657@aol.com> <4a6dc7605061608024b004c32@mail.gmail.com> <4a6dc760506160852491a30b5@mail.gmail.com> <4a6dc760506161109330131f7@mail.gmail.com> <4a6dc76050616143950c11c7e@mail.gmail.com> Message-ID: > Again David, I'm sorry if I hurt any feelings or anything along > those lines. I known some Palm developers so your name is familiar. > They're happy for the help you contributed to the community so I > would be in some hot water if I ticked you off by being a little too > flippant. I have very thick skin, it takes a lot to hurt my feelings ;) No harm, no foul. Your comments (and those of others) were informative and worthwhile. > Wonder if it has anything to do with the odd nesting sensation I get > when I read certain parts of gutvol-d. I know... let's blame Bowerbird! ;) j/k David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From marcello at perathoner.de Fri Jun 17 03:15:16 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Jun 17 03:15:40 2005 Subject: [gutvol-d] roundtripping formatted text through a .pdf In-Reply-To: <55.75704963.2fe35c9a@aol.com> References: <55.75704963.2fe35c9a@aol.com> Message-ID: <42B2A2B4.1020402@perathoner.de> Bowerbird@aol.com wrote: > but the loss of formatting doesn't matter, because after you > have made a few global changes (which, among other things, > restore the blank lines between paragraphs that get stripped), > you can put the text back into my viewer-program, and it will > redo the nice formatting, just like it did it in the first place... It's no round-tripping if you have to hand-tweak the files. Before I'd have to re-apply by hand all things your program fumbled along the way, I'd "round-trip" the pdf thru images and Abbyy Finereader. (That works for *any* pdf.) 
What use is this feature anyway, if you just `round-trip' pdfs produced by your program? Then why not keep the zml file around? If you could convert *all* pdf files into zml, that would be something. Or did you just learn a new buzz-word: "round-trip", and are milking it for what it's worth? -- Marcello Perathoner webmaster@gutenberg.org From hyphen at hyphenologist.co.uk Fri Jun 17 03:34:46 2005 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Jun 17 03:35:34 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenberg texts. In-Reply-To: <6d99d1fd05061614235190a033@mail.gmail.com> References: <26b51c32050615132219efb74e@mail.gmail.com> <8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> <6d99d1fd050615133945b30ce3@mail.gmail.com> <6d99d1fd05061614235190a033@mail.gmail.com> Message-ID: On Thu, 16 Jun 2005 16:23:53 -0500, David Starner wrote: | > Inconsistent use of SI units and international | > standard paper sizes remain today a primary | > cause for U.S. businesses failing to meet | > the expectations of customers worldwide. | | And use of international standard paper sizes remains today a primary | cause of international businesses failing to meet the expectations of | American customers. What is it about *international* which you do not understand? | It's cute how you point out that all the print-on-demand places are in | America; perhaps that means that we should use American paper sizes, | then? Now that is a strange attitude in Project Gutenberg which is named after a person who lived in Mainz, which is now part of Germany and was at the time part of Europe. http://www.greatsite.com/timeline-english-bible-history/gutenberg.html Incidentally he died in 1468, and the Pilgrim Fathers sailed from Plymouth, Devon, England, in the Mayflower on 16 September 1620, some 150 years after Gutenberg died. -- Dave Fawthrop http://www.webshots.com Thousands of wonderful professional photos for your Wallpaper and Screensaver.
also 200,000 amateur pics. Four new pics each day. From nwolcott at dsdial.net Thu Jun 16 20:47:15 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Fri Jun 17 07:57:33 2005 Subject: [gutvol-d] Print-on-demand and dead-tree copies of Gutenberg texts. References: <26b51c32050614142824831367@mail.gmail.com> Message-ID: <00cf01c5734c$c2891680$049495ce@gw98> Bowerbird's comments are very appropriate. If you have a special book you wish to republish, then you can afford the time and effort to get it ready for POD. I have published 2 books this way, by Jules Verne, which I did as an experiment in republishing a 100-year-old book with illustrations. You can see it at WWW.LULU.COM; search for Verne and you will see "The Blockade Runners". Lulu is the only way to go if you do not want to pay up-front charges; all other POD publishers require $500 up front. The disadvantage with Lulu is that you have to be prepared to get the book ready for press. There are a lot of things necessary to do this -- just take page numbers for example, getting them on the right place on the page (different for right and left maybe), running headers, page breaks or not for chapters, illustrations, cover design, back cover design, blurb for cover insert, art work for cover, footnotes properly numbered and placed on the page, choice of fonts, you may need type 1 fonts for a good appearance, you may need Adobe or Quark to do a halfway presentable job for your book. It took me about 6 weeks to get my books (they are partly identical) ready for Lulu. They came out very well, and even selling them at cost comes out at $6 and then there is Uncle Sam's $2 minimum for postage so none have been sold. Admittedly this is not a barn burner of a book, but as a dual-language text it would be very useful as the French is quite elementary. I will probably do another book or two; I would like to get better pictures.
The POD presses use 600 dpi lasers, 1200 dpi lasers are available and are a must for decent half tone pictures.(Letterpress uses 400 lpi plus). Unfortunately Lulu does not yet have them. Good luck on your first project! ----- Original Message ----- From: "grendelkhan" To: Sent: Tuesday, June 14, 2005 5:28 PM Subject: [gutvol-d] Print-on-demand and dead-tree copies of Gutenberg texts. > I was having a discussion with my father, and I thought I would bring > it up on the mailing list, as it seems to be the place for it. > > We'd just come out of our local Wal-Mart, and I'd noticed the > out-of-copyright books (classics and such) being sold for $6 to $11 > each. I commented that folks could just download the books for free if > they wanted to read them, but he asked how many people owned a > computer, and how many of those had heard of Project Gutenberg? > > So I did a bit of researching, and discovered that there exist "print > on demand" publishers, which instead of doing the offset-printing runs > of thousands and thousands of books, will, once a book has been > prepared and typeset, sometimes keep none at all in stock, and print > them only when ordered. > > It seems that it would be a good idea to come up with some way to > offer the majority of PG's catalog through some method of > print-on-demand publishing, selling at-cost. Many Gutenberg works are > obscure, and not of general enough interest to warrant a print run > from a traditional publisher. > > I'm aware that I could clearly run off and do this myself, but (a) I > wanted to get some feedback from the community at large, and (b) > print-on-demand publishing still requires start-up costs, and a > per-book "setup" fee of some kind, above and beyond the per-copy > materials cost. Given that PG has the Distributed Proofreaders to > provide lots and lots of work on worthy projects, and given that PGLAF > is a charitable organization which lots of people love, is there some > way to get around that issue? 
> > Would it be worth it to provide a source of dead-tree editions of many > of the archive's works? Thoughts? Objections? Pointers to some guy > who's been doing this for the last ten years that I failed to Google > up? > > --grendelkhan > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott at dsdial.net Fri Jun 17 07:28:41 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Fri Jun 17 07:57:37 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenbergtexts. References: <26b51c32050615132219efb74e@mail.gmail.com><8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com> <26b51c3205061514095c5e8f8@mail.gmail.com> Message-ID: <00d101c5734c$c3c05e00$049495ce@gw98> If you google "Print on Demand" or "pod publishing" you will find that there are several surveys which review all the POD publishers and list their costs, minimums, etc. These are not exactly up to date but will give you a good basis for comparison. Many charge you for making the cover and for a book jacket too. Don't forget those if you are going hardback. Also be aware of limits on book size. Cafe has an unrealistically low limit which precludes illustrations. But other than Lulu and Cafe you are dealing with the vanity press market where you are paying up front for "marketing" and whatever else that entails. ----- Original Message ----- From: "grendelkhan" To: Sent: Wednesday, June 15, 2005 5:09 PM Subject: Re: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenbergtexts. > On 6/15/05, Dave Fawthrop wrote: > > Just a mention that all Europe uses A4 paper. > > Anything designed solely for American paper sizes will be useless to > > typesetters in Europe. > > I was planning on 6"x9", which I think is the standard trade paperback > size. Except... hmm.
> > http://www.cafepress.com/cp/info/help/learn_book_info.aspx > > Cafe Press states that 4.18in x 6.88in is the standard 'Mass Market > Paperback' size. Also 5in x 8in for 'Standard Paperback'. > > http://www.whitehallprinting.com/TrimSize.html > > Some random printing company lists 6x9 and 5.5x8.5 as 'Standard Trim Sizes'. > > http://www.powerhomebiz.com/vol93/selfpublishing2.htm > > Another random tutorial lists 6x9 and 5-3/8x8. > > http://www.josephzitt.com/books/smwb-howto.php#pod > > Says here that apparently the 6x9 format is standard, at least with > Lightning Source. > > Ah, and Cafe Press offers printing for $7 plus $0.03 per page with no > setup fees. So, probably not the cheapest option. Perhaps I'll prep > something and approach Lightning Source asking what they need in the > way of preparation supplies---that is, what can be done for them. > > Is 6x9 a standard paperback size in Europe? I suppose that's of less > interest. I'll be measuring some of my paperbacks at home this evening > once I get back from work. Maybe print and trim a few test pages or > something. > > --grendelkhan > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott at dsdial.net Fri Jun 17 07:22:33 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Fri Jun 17 07:57:44 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenbergtexts. References: <26b51c32050615132219efb74e@mail.gmail.com> Message-ID: <00d001c5734c$c33a1700$049495ce@gw98> The best way to see what is involved in publishing POD texts is to actually do that on Lulu. I have no interest financially one way or another in Lulu, but they do offer a service: for the up-front per-item charge of $4 and $0.02 per page (cheaper than xerox) they keep your book available on their hard drive in perpetuity, or until they go out of business.
This $4 charge is buried in the $500 up-front charge by other POD publishers. Please note that Lulu is not doing the publishing; they are just providing a needed service between the producer of the book and Ingram, Lightspeed, and other POD sites which print thousands of texts per day. These biggies are not interested in answering your phone call, and their product is marketed through publishing channels for $20 to $30 per paperback copy, something Lulu provides for $6. Not that Lulu, like any small company in a niche market, has not had some problems. But these are largely faced up front on their message boards and addressed conscientiously by management. Taking a book through Lulu involves going through 6 steps that are the bare minimum for a publishing process. I encourage those who are interested in POD to actually get their feet wet and produce a book. Have you thought about cover art? Are you a professional illustrator? Can you afford an artist? And if, as I found, no one will buy a Lulu book even at their cost of $6, then even lowering the price to $1 would not produce any sales in this marketing-oriented world. And do not forget there are mass-market publishers of pd books at $3 to $5 such as the Wordsworth Classics; you just do not see them in bookstores and must special order them. Also due to marketing processes they are not handled by book distributors but by newsvendors, which makes them even more difficult to obtain. And if, as is often the case in PG, the source of the 1800 version is not noted but is left to booksleuths to determine, it is not unusual that "sales" are low. Case in point: Journey to the Centre of the Earth, available on PG as a Journey to the Interior of the Earth, tr by Frederick A. Malleson, a fairly complete and literary Victorian translation, $3.95 special order at Barnes and Noble.
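The per-copy prices mentioned in this message and the one it replies to can be compared with a little arithmetic. A rough sketch in Python, using only the figures quoted in the thread ($7 plus $0.03 per page for Cafe Press, $4 plus $0.02 per page for Lulu); the 200-page book is an arbitrary example, the names QUOTED_PRICES and cost_per_copy_cents are made up here, and real quotes may carry cover, binding, or postage charges that are not modelled.

```python
# Prices in US cents, as quoted in the thread in June 2005; not a current price list.
QUOTED_PRICES = {
    "Cafe Press": (700, 3),  # $7.00 base + $0.03 per page
    "Lulu": (400, 2),        # $4.00 base + $0.02 per page
}


def cost_per_copy_cents(base_cents: int, per_page_cents: int, pages: int) -> int:
    """Per-copy printing cost: a fixed base charge plus a per-page charge."""
    return base_cents + per_page_cents * pages


def compare(pages: int) -> dict:
    """Return the per-copy cost in cents at each quoted printer for a given page count."""
    return {
        name: cost_per_copy_cents(base, per_page, pages)
        for name, (base, per_page) in QUOTED_PRICES.items()
    }


if __name__ == "__main__":
    for name, cents in compare(200).items():
        print(f"{name}: ${cents / 100:.2f} per copy at 200 pages")
```

Working in whole cents keeps the sums exact. At 200 pages the quoted rates come to $13.00 and $8.00 respectively, before postage.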
----- Original Message ----- From: "grendelkhan" To: Sent: Wednesday, June 15, 2005 4:22 PM Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenbergtexts. > Thanks to everyone for their comments so far! I'm learning quite a bit as I go. > > As of October 2003, the US Commerce Department reported that about > three-fifths of households had a computer; a little over half had > internet access. > > https://www.esa.doc.gov/Reports/NationOnlineBroadband04.htm > > So it's not as bad as I was led to believe. Still, > > Perhaps I should have stated my goals a little more clearly. I have no > particular interest in making money or making a business out of this. > I'd simply like to make the books available---through whatever means > that may be---in dead-tree form. I suppose it's a terrible idea to tie > the actual Project to a commercial entity by developing a working > relationship with them---I don't think an "Official Project Gutenberg > Edition" is a good idea. > > lulu.com, as mentioned, has no setup fees, but their pricing is a mite > stiff---$4.53 plus $0.02/page. Certainly better than buying stuff from > most university presses, but not exactly bargain-basement. Lightning > Source charges (based on some quick googling at > http://com1.runboard.com/bthescribesmessageboard.fwritingarchives.t45%7Coffset=15 > ), $0.90 plus $0.013 per page, but I don't know what kind of binding > that requires, or what sort of setup fees they charge. Perhaps they'd > waive them if DP put out some sort of print-ready version in addition > to human-readable text. I'm thinking TeX->PDF here, as it's pretty > much the stablest human-readable-yet-fully-marked-up format available. > Thoughts? I suppose I should take a relatively short etext, mark it up > and see how it looks. > > I concur that simply throwing plain text, or even decent HTML, at > paper is a horrible idea.
So, what I ask is---is there a way to > prepare the etexts as, in addition to HTML, whatever format is > print-ready for these machines? Since typesetting a ready copy is a > simple matter of feeding it to a Xerox DocuTech or whatever the > $100,000 piece of hardware the print shop uses is, how can we do the > necessary preprocessing ourselves? What exactly does the "setup fee" > include? > > Thanks to everyone again for being so helpful with this. > > --grendelkhan > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott at dsdial.net Fri Jun 17 08:01:20 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Fri Jun 17 10:51:55 2005 Subject: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenbergtexts. References: <26b51c32050615132219efb74e@mail.gmail.com><8v31b15dh04t720s70696hg7ip3n8djklk@4ax.com><6d99d1fd050615133945b30ce3@mail.gmail.com><6d99d1fd05061614235190a033@mail.gmail.com> Message-ID: <000201c57365$207ed000$0b9495ce@gw98> If I'm not mistaken Lightspeed and some of the biggest POD publishers are UK based. ----- Original Message ----- From: "Dave Fawthrop" To: "David Starner" ; "Project Gutenberg Volunteer Discussion" Sent: Friday, June 17, 2005 6:34 AM Subject: Re: [gutvol-d] Re: Print-on-demand and dead-tree copies of Gutenbergtexts. > On Thu, 16 Jun 2005 16:23:53 -0500, David Starner > wrote: > > > | > Inconsistent use of SI units and international > | > standard paper sizes remain today a primary > | > cause for U.S. businesses failing to meet > | > the expectations of customers worldwide. > | > | And use of international standard paper sizes remains today a primary > | cause of international businesses failing to meet the expectations of > | American customers. > > What is it about *international* which you do not understand. 
> > | It's cute how you point out that all the print-on-demand places are in > | America; perhaps that means that we should use American paper sizes, > | then? > > Now that is a strange attitude in Project Gutenberg which is named after a > person who lived in Mainz, which is now part of Germany and was at the time > part of Europe. > http://www.greatsite.com/timeline-english-bible-history/gutenberg.html > Incidentally he died in 1468, and the Pilgrim fathers sailed from Plymouth, > Devon, England, in the Mayflower on 16 September 1620, some 150 years after > Gutenberg died. > > -- > Dave Fawthrop http://www.webshots.com > Thousands of wonderful professional photos for your Wallpaper and > Screensaver. also 200,000 amateur pics. Four new pics each day. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > From grythumn at gmail.com Sun Jun 19 14:06:27 2005 From: grythumn at gmail.com (Robert Cicconetti) Date: Sun Jun 19 14:06:37 2005 Subject: [gutvol-d] Derivative works, or, what is copyrightable? Message-ID: <15cfa2a505061914061d3cfda9@mail.gmail.com> I've been going through my files trying to close out some partially finished projects. I have several Beatrix Potter books that had missing or damaged pages, and I went to the library today to try to fill in the blanks. Unfortunately, they only had the newer editions with a modern copyright, claiming a copyright because they had made a new transfer of the old watercolors. As far as I understand copyright law, this claim is bogus; a derivative work must be different enough from the original to be considered a new work; a slight technical improvement on the reproduction is not enough. An original lithograph, sure, but not making new screens. This may or may not be complicated by the fact that the publisher operates both out of the UK and the US. 
Is my understanding correct enough to go through with an official clearance request? Or shall I hunt for older copies? Potter books are not rare, but finding the older ones is more difficult. Thanks, R C From collin at xs4all.nl Sun Jun 19 15:04:14 2005 From: collin at xs4all.nl (Branko Collin) Date: Sun Jun 19 14:50:28 2005 Subject: [gutvol-d] Derivative works, or, what is copyrightable? In-Reply-To: <15cfa2a505061914061d3cfda9@mail.gmail.com> Message-ID: <42B607FE.401.4C8C51@localhost> On 19 Jun 2005, at 17:06, Robert Cicconetti wrote: > Is my understanding correct enough to go through with an official > clearance request? Or shall I hunt for older copies? Potter books are > not rare, but finding the older ones is more difficult. In this case I would say you even have a duty to your readers to use the newer reproductions. :-) The deciding court case in the US is Bridgeman v. Corel. A lot has been written about it on the web. The court's decision hinged on the concept of originality, IIRC. The idea being that the reproduction was made in such a way as to convey the intent of the original author as good as possible. So yes, I would send this in for clearance. The reason why PG might reject it is if you cannot show that these are indeed mere reproductions. -- branko collin collin@xs4all.nl From shimmin at uiuc.edu Sun Jun 19 17:34:16 2005 From: shimmin at uiuc.edu (shimmin@uiuc.edu) Date: Sun Jun 19 17:34:40 2005 Subject: [gutvol-d] Derivative works, or, what is copyrightable? Message-ID: <70a85e35.f1d820e.8198d00@expms5.cites.uiuc.edu> As another poster pointed out, in the U.S., Bridgeman v. Corel says that some mechanical reproductions are not "original works" for the purpose of copyrightability; indeed, the point of creating these works is to be unoriginal. Whether PGLAF wants to stand on Bridgeman in this particular case is their own decision; as always, the only sure test as to whether something is clearable is to try and clear it. 
That said, if it's not the illustrations you're interested in, but merely need to consult another edition to repair lacunae in the text you're dealing with, then just consult whatever editions you have easily at hand, and repair the text accordingly. From grythumn at gmail.com Sun Jun 19 18:43:08 2005 From: grythumn at gmail.com (Robert Cicconetti) Date: Sun Jun 19 18:43:23 2005 Subject: [gutvol-d] Derivative works, or, what is copyrightable? In-Reply-To: <70a85e35.f1d820e.8198d00@expms5.cites.uiuc.edu> References: <70a85e35.f1d820e.8198d00@expms5.cites.uiuc.edu> Message-ID: <15cfa2a505061918434a7da1a1@mail.gmail.com> On 6/19/05, shimmin@uiuc.edu wrote: > As another poster pointed out, in the U.S., Bridgeman v. Corel > says that some mechanical reproductions are not "original > works" for the purpose of copyrightability; indeed, the point > of creating these works is to be unoriginal. Whether PGLAF > wants to stand on Bridgeman in this particular case is their > own decision; as always, the only sure test as to whether > something is clearable is to try and clear it. Okay. I figured it'd be "Less Work For Greg" if I asked the list in general first. :) I'll fill out the clearances tonight. To be honest, the differences are fairly small; they've cleared out some of the screening artifacts and the colors are a little more vivid; how much of that is because they are less than 20 years old I cannot say. :) > That said, if it's not the illustrations you're interested in, > but merely need to consult another edition to repair lacunae > in the text you're dealing with, then just consult whatever > editions you have easily at hand, and repair the text accordingly. Unfortunately, I need both the images and the text. 
We're fairly close to having a complete set; once I finish up the extant books (One will have to be DP-EU only; it's from 1930) I plan to go back and produce some cleaner scans for my first few books and possibly those from the other PMs (assuming I get permission; I don't want to step on toes.) R C From gbnewby at pglaf.org Sun Jun 19 19:06:14 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Jun 19 19:06:15 2005 Subject: [gutvol-d] Derivative works, or, what is copyrightable? In-Reply-To: <15cfa2a505061918434a7da1a1@mail.gmail.com> References: <70a85e35.f1d820e.8198d00@expms5.cites.uiuc.edu> <15cfa2a505061918434a7da1a1@mail.gmail.com> Message-ID: <20050620020614.GA24974@pglaf.org> On Sun, Jun 19, 2005 at 09:43:08PM -0400, Robert Cicconetti wrote: > On 6/19/05, shimmin@uiuc.edu wrote: > > As another poster pointed out, in the U.S., Bridgeman v. Corel > > says that some mechanical reproductions are not "original > > works" for the purpose of copyrightability; indeed, the point > > of creating these works is to be unoriginal. Whether PGLAF > > wants to stand on Bridgeman in this particular case is their > > own decision; as always, the only sure test as to whether > > something is clearable is to try and clear it. > > Okay. I figured it'd be "Less Work For Greg" if I asked the list in > general first. :) (Yes, it was a good discussion!) > I'll fill out the clearances tonight. To be honest, the differences > are fairly small; they've cleared out some of the screening artifacts > and the colors are a little more vivid; how much of that is because > they are less than 20 years old I cannot say. :) As people have said: doing such updates does not qualify for a new copyright, in our view. > > That said, if it's not the illustrations you're interested in, > > but merely need to consult another edition to repair lacunae > > in the text you're dealing with, then just consult whatever > > editions you have easily at hand, and repair the text accordingly. 
>
> Unfortunately, I need both the images and the text.
>
> We're fairly close to having a complete set; once I finish up the
> extant books (One will have to be DP-EU only; it's from 1930) I plan
> to go back and produce some cleaner scans for my first few books and
> possibly those from the other PMs (assuming I get permission; I don't
> want to step on toes.)

I know there were some issues with some Potter illustrations coming later than 1923, but as long as we can clear the images they're fine to include.

For a reminder, here's our policy that relates (at least peripherally) to the issue of cleaned up images.

Thanks!
Greg

PROJECT GUTENBERG'S POSITION ON "SWEAT OF THE BROW" COPYRIGHT CLAIMS

Work performed on a public domain item, known as sweat of the brow, does not result in a new copyright. This is the judgment of Project Gutenberg's copyright lawyers, and is founded in a study of case law in the United States.

This is founded in the notion of authorship, which is a prerequisite for a new copyright. Non-authorship activities do not create a new copyright.

Some organizations erroneously claim a new copyright when they add value to a public domain item, such as to an old printed book. But despite the difficulty of the work involved, none of these activities result in new copyright protection when performed on a public domain item:

- scanning and optical character recognition (OCR)
- proofreading and OCR error correction
- fixing spelling and typography, including substantial updates to spelling such as changing from American to British
- adding markup (HTML, XML, TeX, etc.)
- digitizing, cropping, color-adjusting or other modifications to images
- addition of trivial new content, such as images to indicate page breaks in an HTML file, or pictures of gothic letters for the first letter in a chapter, or adding or removing a few words per chapter.
- substantial reorganization, such as moving footnotes to end-notes, or changing the locations of pictures within the text
- recoding to new character sets, such as Unicode, or new formats, such as PDF

There is some value-added content that DOES get a new copyright, but only for the actual new work (that is, it may be possible to remove the new copyrighted content to go back to a public domain document):

- translation into another human language
- creating a new compilation of existing materials (though the individual items compiled retain their public domain status)
- creating new original art work
- creating an original derivative work, such as an audio performance, a new chapter, or a set of favorite quotations
- adding a new introduction or critical essay

Project Gutenberg is able to utilize any material which is judged to be public domain in the country of use (i.e., the United States). If it is determined that components of a digital item are public domain, but others are not, then the copyrighted components may be removed without the permission of whoever owns the copyright for the new content.

It is Project Gutenberg's practice to seek permission of copyright claimants before harvesting their materials. This is done in order to be polite, and to allow the producer or distributor to request a particular credit be used. But if permission is not given, public domain items can still be used by Project Gutenberg, typically without any attribution.

Because Project Gutenberg receives submissions from many different sources, it is not always clear where an item came from. Volunteers who submit content they did not themselves generate should be diligent about reporting sources, even if the source will not be credited in the item as distributed by Project Gutenberg.
Most recently updated April 6, 2004 From grythumn at gmail.com Sun Jun 19 20:33:24 2005 From: grythumn at gmail.com (Robert Cicconetti) Date: Sun Jun 19 20:33:41 2005 Subject: [gutvol-d] Derivative works, or, what is copyrightable? In-Reply-To: <20050620020614.GA24974@pglaf.org> References: <70a85e35.f1d820e.8198d00@expms5.cites.uiuc.edu> <15cfa2a505061918434a7da1a1@mail.gmail.com> <20050620020614.GA24974@pglaf.org> Message-ID: <15cfa2a505061920334e493726@mail.gmail.com> On 6/19/05, Greg Newby wrote: > On Sun, Jun 19, 2005 at 09:43:08PM -0400, Robert Cicconetti wrote: > > Okay. I figured it'd be "Less Work For Greg" if I asked the list in > > general first. :) > > (Yes, it was a good discussion!) Doing my bit to improve the signal to noise ratio. :) > > I'll fill out the clearances tonight. To be honest, the differences > > are fairly small; they've cleared out some of the screening artifacts > > and the colors are a little more vivid; how much of that is because > > they are less than 20 years old I cannot say. :) > > As people have said: doing such updates does not qualify > for a new copyright, in our view. Great! > > We're fairly close to having a complete set; once I finish up the > > extant books (One will have to be DP-EU only; it's from 1930) I plan > > to go back and produce some cleaner scans for my first few books and > > possibly those from the other PMs (assuming I get permission; I don't > > want to step on toes.) > > I know there were some issues with some Potter illustrations coming > later than 1923, but as long as we can clear the images they're fine > to include. The problem is that the entire book was not published until 1930; apparently most of the images were created in 1906, but the work was not completed and published for a long time. It was renewed (R206616), so it is copyrighted here for a while, and in Life+70 until after 2013. However, it is clearable under Life+50 copyright law. (The Tale of Little Pig Robinson.) 
Aside from that, only The Story of the Fierce Bad Rabbit is unscanned, and I shall correct that now that I can use the newer edition. The other books are in various states of completion; most are waiting on missing pages. I have spent a fair amount of time and effort to get the images looking right; these are more visual works than written ones. R C From joshua at hutchinson.net Tue Jun 21 10:43:24 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Jun 21 10:43:31 2005 Subject: [gutvol-d] Baha'i Faith texts - Terms of Use acceptable for us? Message-ID: <20050621174324.B57D09E9E5@ws6-2.us4.outblaze.com> The Baha'i Faith makes available quite a bit of material in eBook form available on its website. Further, the Terms of Use (http://reference.bahai.org/en/terms.html) seem to make it perfectly acceptable to further distribute this work as long as the copyright and attribution is intact and it is for non-commercial use. Does anyone see a problem with "raiding" their material for inclusion in PG? I realize the texts will need to be reformatted to our standards and formats, but I can do that. Plus, as a side-benefit, they have many of the texts in Persian and Arabic as well as English, so we would be getting multiple languages represented in one swoop. Josh PS FYI, I am a Baha'i, but I don't speak in any official capacity. I'm going by the Terms of Use linked above. From Bowerbird at aol.com Tue Jun 21 12:16:21 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Jun 21 12:16:37 2005 Subject: [gutvol-d] header detection revisited Message-ID: <1f9.c2d11b6.2fe9c185@aol.com> it's been one month since my post about "detecting headers", in response to jon noring's "challenge" in that specific regard. in case you've forgotten, here's a quick recap: big and bold. that's what headers look like. conspicuous. real hard to miss. easy to find. as i said last month, i have developed a 30-item checklist. 
that's how many ways a header can make itself conspicuous. but the main way -- by far -- is simply to be big and/or bold. so it's time now for part 2. but first, any questions? don't be shy, step right up, because headers are the first step toward detecting all types of things. (which is why we need to discuss them in some more detail.) -bowerbird From jon at noring.name Tue Jun 21 13:47:50 2005 From: jon at noring.name (Jon Noring) Date: Tue Jun 21 13:48:06 2005 Subject: [gutvol-d] header detection revisited In-Reply-To: <1f9.c2d11b6.2fe9c185@aol.com> References: <1f9.c2d11b6.2fe9c185@aol.com> Message-ID: <1311771965.20050621144750@noring.name> Bowerbird wrote: > It's been one month since my post about "detecting headers", > in response to jon noring's "challenge" in that specific regard. > > in case you've forgotten, here's a quick recap: > > big and bold. that's what headers look like. > conspicuous. real hard to miss. easy to find. > > as i said last month, i have developed a 30-item checklist. > that's how many ways a header can make itself conspicuous. > but the main way -- by far -- is simply to be big and/or bold. > > so it's time now for part 2. > > but first, any questions? don't be shy, step right up, because > headers are the first step toward detecting all types of things. > (which is why we need to discuss them in some more detail.) There's enough variation in how headers can be formatted in print, as well as some other structures which look like headers but are not, that it is not possible to auto-determine with 100% reliability that something is a header. There are also language/country/time-era differences as well which further confuse matters. And even if one is able to correctly auto-determine that something is a header, there are sometimes difficulties in autodetecting the header level, which is usually important. It is simply not yet possible to reliably auto-determine the structure of books and documents. 
This is the big problem with PDF-to-whatever converters, since (unstructured) PDF does not preserve structural information -- it simply lays out the content according to visual typesetting conventions (which, of course, vary by country, language, time era, and the whims of the author/publisher.) Now, if the goal is to try to auto-determine a document's structure knowing that it won't always get it right, as part of a human proofing process (e.g., Distributed Proofreaders), then that is another matter. But it is hard to read from Bowerbird's comments as to whether he intends his methodology and tools to be part of a human proofing process, or to replace it entirely. I think he will find more acceptance of his methodology and tools by making clear the former. Jon Noring From Bowerbird at aol.com Tue Jun 21 14:27:46 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Jun 21 14:28:03 2005 Subject: [gutvol-d] header detection revisited Message-ID: <42.6ba397c5.2fe9e052@aol.com> as i step you through example after example after example -- all of 'em pre-existing, including many from the library itself, and handled with solid honest-to-goodness working source-code -- you'll come to realize fully how _easy_ it is to detect headers (and even header-level!), and the people who insist on telling you "it's impossible" will seem curiously illogical and out-of-touch... oh yeah, feel free to recommend any e-text from the whole library as a "test-case" that you would like me to consider in detail! :+) (to be fair, test-cases should include scans to resolve any doubts.) thank you! enjoy your first full day of summer! 90 degrees in l.a.! 
-bowerbird From lee at novomail.net Wed Jun 22 15:42:12 2005 From: lee at novomail.net (Lee Passey) Date: Wed Jun 22 15:42:31 2005 Subject: [gutvol-d] header detection revisited In-Reply-To: <20050622190003.C66AD8C837@pglaf.org> References: <20050622190003.C66AD8C837@pglaf.org> Message-ID: <42B9E944.2020900@novomail.net> >it's been one month since my post about "detecting headers", >in response to jon noring's "challenge" in that specific regard. > >in case you've forgotten, here's a quick recap: > > big and bold. that's what headers look like. > conspicuous. real hard to miss. easy to find. > >as i said last month, i have developed a 30-item checklist. >that's how many ways a header can make itself conspicuous. >but the main way -- by far -- is simply to be big and/or bold. > >so it's time now for part 2. > >but first, any questions? don't be shy, step right up, because >headers are the first step toward detecting all types of things. >(which is why we need to discuss them in some more detail.) > >-bowerbird > > The question is a bit ambiguous. What are you trying to detect headers _from_? AFAICT, Gutenberg e-texts don't have big and don't have bold, so neither can be the hallmark of a header in Gutentexts. Presumably, therefore, you are trying to detect headers in some marked-up text that uses some sort of presentational markup. Given your assumption that headers are 1. conspicuous, 2. hard to miss, and 3. easy to find (all variations on a theme), it seems to me that the best way to detect a header is to determine the general characteristics of the majority of all paragraphs in a document (size, indentation, amount of punctuation, location of punctuation, capitalization, etc.) and identify as headers any "paragraphs" which fall way outside the mean. I presume you have a reliable way to identify paragraphs (not always possible when using text derived from PDF files). Consider the shortest verse of the Bible: "Jesus wept." 
Biblical verses are merely numbered paragraphs. Can your algorithm determine that it is a paragraph and not a header?

This is the problem of the false positive: it is as important to identify not-headers as it is to identify headers.

You would be much more likely to increase your list of special cases if you would share the thirty-odd special cases you have already identified.

From nwolcott at dsdial.net Thu Jun 23 10:32:23 2005
From: nwolcott at dsdial.net (N Wolcott)
Date: Thu Jun 23 10:33:40 2005
Subject: [gutvol-d] Volunteers in New Jersey area?
Message-ID: <001201c57819$8abec200$bd9495ce@gw98>

Are there any PG volunteers near Rutgers University in New Jersey? I need someone to scan a few pages of microfilm to disc or email; apparently there is no charge for this.

N Wolcott
nwolcott2@post.harvard.edu

From Bowerbird at aol.com Thu Jun 23 14:58:09 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Jun 23 14:58:27 2005
Subject: [gutvol-d] header detection revisited
Message-ID:

lee said:
> The question is a bit ambiguous.

only if you haven't been following the drama for the last year-and-a-half.

welcome to this listserve, lee. have you dropped the handle for good now? it's been some time since we chatted, especially frontchannel...

> What are you trying to detect headers _from_?
> AFAICT, Gutenberg e-texts don't have big and don't have bold,
> so neither can be the hallmark of a header in Gutentexts.

that's right. so for that i need to call on some of the other items in my 30-item checklist.

the very best way to detect headers in a p.g. e-text is to test for blank-lines above the line in question. three blank lines will grab almost all of the headers, as well as a dose of false-alarms.
the job then is to toss the false-alarms, and to do the best job possible of discerning the missed headers. and actually, in perhaps 25%-30% of project gutenberg's e-texts, pulling lines that start with "chapter" will net most headers. :+) > Presumably, therefore, you are trying to detect headers in some > marked-up text that uses some sort of presentational markup. "markup" doesn't usually enter into the equation. it can, of course, but if something has been marked up, a good way to find the headers is to examine the markup. nonetheless, i _can_ use my system on the _presentation_ of text that has been marked-up; many of my examples will be just that. as such, it can be used in cases where the mark-up is not available, for one reason (print) or another (.pdf), but its presentation is. but of more direct concern to this listserve, however, is its application toward the task that many people here do, which for the most part is to digitize text from scans of paper-books. a routine that recognizes headers in o.c.r. output -- because they are relatively big and/or set in bold -- saves the digitizer from that chore. i haven't discussed the importance revolving around header-recognition, so that might not seem like a big deal. but it is indeed rather important. (any e-book programmer, like yourself, lee, knows why it's important.) and, getting back again to the existing e-texts -- some 16,000+ now -- a routine for determining the headers in them would be quite valuable... if you're looking for a general overview, i focus on 3 distinct arenas: 1. strict z.m.l., where header-structure is defined by certain rules. 2. "fuzzy" mode, where texts are somewhat consistent, but not always. 3. "wild" texts, where all bets are off and you do the best that you can. project gutenberg's e-texts generally fall in the second category. 
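The two plain-text heuristics described above (count the blank lines above a line, and pick up lines that start with "chapter") can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `find_header_candidates`, the `min_blank` threshold parameter, and the tuple layout are hypothetical and are not taken from bowerbird's checklist or code, which was not posted to the list.

```python
import re


def find_header_candidates(text, min_blank=3):
    """Return (line_number, blank_lines_above, line) for likely headers.

    Hypothetical sketch: a line is a candidate if at least `min_blank`
    blank lines precede it, or if it starts with the word "chapter".
    Candidates still need a second pass to discard false alarms.
    """
    candidates = []
    blank_run = 0  # consecutive blank lines seen just before the current line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            blank_run += 1
            continue
        starts_with_chapter = re.match(r"chapter\b", line, re.IGNORECASE) is not None
        if blank_run >= min_blank or starts_with_chapter:
            candidates.append((number, blank_run, line))
        blank_run = 0
    return candidates
```

Keeping the blank-line count in each result leaves room for a later pass that ranks candidates or throws out false alarms, such as an ordinary sentence that happens to begin with the word "Chapter".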
as the examples i give will show, it would be relatively easy for me to make software that inputs text from the second category and then modifies it and outputs a file conforming to the strict first category. but nobody from project gutenberg took me up on my offer to do that... i've done enough work on arena #3 to know that it will be possible, although you can't expect perfect output from the tool on a wild text. i largely abandoned arena #2 when project gutenberg people passed, although there will be wide-ranging applicability of this arena on texts with some kind of regularity in them, such as listserve digests. but my main focus now is on spreading the gospel of arena #1 -- z.m.l. in z.m.l., headers are indicated simply by having blank lines above them. (and the more blank lines, the higher the priority-level of the heading, so it's a cinch to handle even the most complex of heading-structures.) this simplicity means that it's easy to write fast code to find headers in a z.m.l. file, and it's simple for users to understand how to make 'em. there is still a big explosion of self-publishing that will be happening, and i want to spare all those new writers the pains of doing mark-up. i'd much rather have them concentrating on their _content_ instead! once i've got all the tools in place to do what i want with arena #1, i'll return to arena #3. being able to take text "from the wild" and ascertain its underlying structure, and then output it in strict z.m.l., so it can be handled with my tools, will be an awesome achievement. again, this is an arena where markup is impractical, perhaps impossible. consider all the content that is being generated _every_single_day_ on yahoogroups. nobody's going to mark-up all that content, so we need to have a way of pulling it into our e-books and have it be nicely formatted. > Given your assumption that headers are > 1. conspicuous, > 2. hard to miss, and > 3. 
easy to find > (all variations on a theme) thanks for noticing the theme... ;+) but it's not really an _assumption_. (nice try to spin it that way, though.) it's actually an _observation_ on the very _nature_ of _being_ a _header_, one of those things that seems totally obvious once realized and verbalized. and of course, once you have realized that headers are _hard-to-miss_, it becomes very silly to maintain that it is "impossible" to detect them. of course you can detect them -- because they stick out like sore thumbs! > it seems to me that the best way to detect a header is to > determine the general characteristics of the majority of > all paragraphs in a document (size, indentation, amount of > punctuation, location of punctuation, capitalization, etc.) and > identify as headers any "paragraphs" which fall way outside the mean. now you're thinking. looks like you're on your way to replicating my 30-item checklist. > I presume you have a reliable way to identify paragraphs > (not always possible when using text derived from PDF files). well, yes. and the fact that text copied out of a .pdf loses its blank lines -- which then makes paragraph-detection exceedingly more difficult, -- does indeed make the detection of headers more difficult as well. which means you have to solve the paragraph-detection problem first, as best as you can, anyway, with text that you've copied out of a .pdf. restoring the paragraphs is a much bigger task than detecting headers. if you can't perform that hard task for end-users, why do the easy one? but the solution isn't as hard as you might think, although it's not 100%. when i'm done discussing headers, if you want to discuss this, we can... and besides, dealing with text copied out of a .pdf is not a high priority. the best way to deal with _that_ kind of text is to go to the producer and say, "can i instead have the file that you used to produce the .pdf?" 
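The outlier idea quoted earlier in this message (profile the typical paragraph, then flag blocks that fall far from it) might be sketched like this. It is a rough illustration, not anyone's posted method: the names `split_blocks` and `flag_outlier_blocks`, the `ratio` parameter, and the choice of just three features (single line, much shorter than the median block, no closing punctuation) are all assumptions made for the example. It also assumes blank lines between paragraphs have survived, which is exactly what text copied out of a .pdf tends to lose.

```python
import re
import statistics


def split_blocks(text):
    """Split text into blocks separated by one or more blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def flag_outlier_blocks(text, ratio=0.25):
    """Return blocks that look unlike the typical paragraph.

    Hypothetical rule: a block is flagged when it is a single line,
    is shorter than `ratio` times the median block length, and does
    not end in sentence punctuation or a closing quote.
    """
    blocks = split_blocks(text)
    if not blocks:
        return []
    typical = statistics.median(len(b) for b in blocks)
    flagged = []
    for block in blocks:
        one_line = "\n" not in block
        short = len(block) < ratio * typical
        unpunctuated = not block.endswith((".", "!", "?", '"', "'"))
        if one_line and short and unpunctuated:
            flagged.append(block)
    return flagged
```

The punctuation test is what keeps a very short real paragraph from being flagged as a header, at the cost of missing headers that do end in a period.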
but even without having solved this .pdf paragraph-detection problem, -- i.e., with all blank-lines removed -- my checklist does pretty well...

> Consider the shortest verse of the Bible: "Jesus wept."
> Biblical verses are merely numbered paragraphs.
> Can your algorithm determine that it is a paragraph and not a header?

um yeah. "headers" in the bible are "paragraphs" that are not numbered. and -- as you yourself just pointed out -- the actual verses are. voila.

> This is the problem of the false positive:
> it is as important to identify not-headers as it is to identify headers.

yes it is. and much of the 30-item checklist is attuned to that issue. once you've accepted that this is part of the job, it's not all that hard.

> You would be much more likely to
> increase your list of special cases
> if you would share the thirty-odd
> special cases you have already identified.

i haven't identified "thirty-odd special cases". i've abstracted 30 rules that act in combination to answer the question at hand -- is this a header?

and it wasn't that hard. you can probably come up with 10-15 right off the top of your head, without even thinking too much. and if you subjected those to empirical testing on lots of e-texts, as i have over the course of the last 2-3 years, you would probably discover the rest of my 30 items. and then you too would be saying, "it's not impossible, folks, and in fact, it's not even all that difficult."

there's no magic here. just hard work...

-bowerbird

From hart at pglaf.org Sun Jun 26 09:34:16 2005
From: hart at pglaf.org (Michael Hart)
Date: Sun Jun 26 09:34:19 2005
Subject: [gutvol-d] Derivative works, or, what is copyrightable?
In-Reply-To: <15cfa2a505061914061d3cfda9@mail.gmail.com>
References: <15cfa2a505061914061d3cfda9@mail.gmail.com>
Message-ID:

We recently discussed the non-copyrightability concerning new reproductions of old works, as per the recent court case of: Bridgeman Art Library v.
Corel Corp In which it was determined that any reproductions of public domain works that were attempting to accurately reproduce the original works were not copyrightable, and this should be applicable here, as far as I can tell. I am not a lawyer. . .this is NOT a legal opinion or legal advice. IANAL = I am not a lawyer. However, I am sending this to two of our legal advisors for comment. Meanwhile, I will append the previous message concering Bridgeman Art Library v. Corel Corp below this message. Michael On Sun, 19 Jun 2005, Robert Cicconetti wrote: > I've been going through my files trying to close out some partially > finished projects. I have several Beatrix Potter books that had > missing or damaged pages, and I went to the library today to try to > fill in the blanks. > > Unfortunately, they only had the newer editions with a modern > copyright, claiming a copyright because they had made a new transfer > of the old watercolors. As far as I understand copyright law, this > claim is bogus; a derivative work must be different enough from the > original to be considered a new work; a slight technical improvement > on the reproduction is not enough. An original lithograph, sure, but > not making new screens. This may or may not be complicated by the fact > that the publisher operates both out of the UK and the US. > > Is my understanding correct enough to go through with an official > clearance request? Or shall I hunt for older copies? Potter books are > not rare, but finding the older ones is more difficult. > > Thanks, > R C To read the court decision, see Bridgeman Art Library v. Corel Corp, 36 F. Supp. 2d 191 (S.D.N.Y. 1999) This article by the American Association of Museums states in blunt terms that they expect the Bridgeman decision to stand. In fact they never brought a lawsuit like this, and asked Bridgeman to drop their suit, because they knew the decision would go against them. 
I will spare you my opinion about claiming to own something you know belongs to the public domain. Bridgeman Art Library v Corel Corp Many collage artists use reproductions of museum art in their work, assuming that a painting created hundreds of years ago must be in the public domain. To their chagrin, artists who try to publish such work have discovered that even if the original art is public domain, all existing reproductions are under copyright. This renders the original work completely out of reach, regardless of whether it is technically public domain. Museums prevent the viewing public from photographing art in their collections for many reasons, such as the expense and inconvenience of moving their art so it can be photographed. And more importantly, to preserve a monopoly over reproductions. Museums derive substantial income from posters, greeting cards, mouse pads etc. Naturally they want to protect their intellectual property. However, a recent court case may have shed new light on the situation. Bridgeman Art Library is a British company which licenses transparencies of museum art. In 1998, Bridgeman sued Corel, claiming that Corel's CD of fine art reproductions infringed on Bridgeman's copyright. The court determined that museum reproductions, whose purpose is to duplicate the original work as precisely as possible, do not involve enough originality to be copyrighted as a derivative work. In other words, a museum reproduction of fine art in the public domain is itself public domain, and unauthorized duplication of the reproduction is not copyright infringement. High-quality photography involves a great deal of skill and effort. That may make this decision seem unfair. After all, what is the point of going to all that work? A high quality reproduction has no more protection than an amateur snapshot. 
Probably less, since a snapshot will likely include elements (like an odd perspective or someone standing next to the artwork) that would qualify as originality. The court made a distinction between skill and originality. It may require an immense amount of skill to create a photograph that precisely duplicates a work of art. But, the court said, "'sweat of the brow' alone is not the 'creative spark' which is the sine qua non of originality." An exact duplicate deserves no more copyright protection than a photocopy. The decision noted that "There is little doubt that many photographs, probably the overwhelming majority, reflect at least the modest amount of originality required for copyright protection...." However, "Plaintiff by its own admission has labored to create "slavish copies" of public domain works of art. While it may be assumed that this required both skill and effort, there was no spark of originality -- indeed, the point of the exercise was to reproduce the underlying works with absolute fidelity. Copyright is not available in these circumstances." Speaking about this case, an attorney for the American Association of Museums said: "Just about every museum attorney looking at the case objectively thinks it came out the correct way according to U.S. copyright law -- that's why no museum had ever brought such a suit.... It would have been unwise for AAM to be on Bridgeman's side in this case because it would have undermined our credibility." Some important points to note: * Bridgeman v Corel affects only United States law. If you intend to publish your work in other countries besides the US, I would not recommend using this case as a guideline for legal use. * Bridgeman v Corel does not affect the law regarding photographs of three-dimensional works of art. The decision specifically addresses only two-dimensional works, where the goal is to duplicate the original as closely as possible. 
Photographing sculpture involves decisions about position, backdrop, lighting etc., all of which would probably make the photograph pass the "originality" test. However, this case does not discuss it one way or the other. * Bridgeman v Corel does not suggest that all museum reproductions are in the public domain. If the original is still under copyright, then so is the reproduction. * Bridgeman v Corel does not mean that you cannot be sued. Anyone can sue for any reason, whether or not they expect to win. (In fact, sometimes the threat of legal action is used as a bullying tactic, without any concern for who would win in court.) It does mean that you can copy museum reproductions of historical art in good faith. copyright © 2000, 2001 by Sarah Ovenall. All rights reserved. From cannona at fireantproductions.com Sun Jun 26 19:24:21 2005 From: cannona at fireantproductions.com (Aaron Cannon) Date: Sun Jun 26 19:24:55 2005 Subject: [gutvol-d] PG Cookbook Message-ID: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> I was thinking a few days ago about how PG has attracted volunteers from all over the world from different backgrounds, and I got to thinking that it might be kind of fun if PG were to compile a cookbook containing the favorite recipes from our volunteers. Since, in most cases, recipes can't be copyrighted, there shouldn't be any problem in that regard. I'll bet we could get a pretty sizable and diverse collection if we put the word out on this list and at DP. Anyway, it's just an idea. Thoughts? Any interest? Sincerely Aaron Cannon -- E-mail: cannona@fireantproductions.com Skype: cannona MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) From hacker at gnu-designs.com Sun Jun 26 19:30:14 2005 From: hacker at gnu-designs.com (David A.
Desrosiers) Date: Sun Jun 26 19:30:55 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> Message-ID: > I was thinking a few days ago about how PG has attracted volunteers > from all over the world from different backgrounds, and I got to > thinking that it might be kind of fun if PG were to compile a > cookbook containing the favorite recipes from our volunteers. > Since, in most cases, recipes can't be copyrighted, there shouldn't > be any problem in that regard. I'll bet we could get a pretty > sizable and diverse collection if we put the word out on this list > and at DP. I'd be more than happy to compile this into a mobile version using Plucker, to beam/share with anyone who cares to read and distribute it. I've done quite a few for many other projects; you can see some screenshots and samples here: http://code.plkr.org/ Just let me know when it's ready and I'll do the conversion to Plucker format (it's not usually a straight-up conversion; in most cases it requires some reformatting of the contents, adding a TOC, and many other subtle things). David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From JBuck814366460 at aol.com Sun Jun 26 20:36:07 2005 From: JBuck814366460 at aol.com (Jared Buck) Date: Sun Jun 26 20:36:21 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> Message-ID: <42BF7427.4010007@aol.com> Definitely some interest from me, Aaron :) Got quite a few recipes I CAN share, including that for my world-famous (I hope) chocolate chip cookies!
:) Jared Aaron Cannon wrote on 6/26/2005, 7:24 PM: > I was thinking a few days ago about how PG has attracted volunteers from > all over the world from different backgrounds, and I got to thinking that > it might be kind of fun if PG were to compile a cookbook containing the > favorite recipes from our volunteers. Since, in most cases, recipes > can't > be copyrighted, there shouldn't be any problem in that regard. I'll > bet we > could get a pretty sizable and diverse collection if we put the word > out on > this list and at DP. > > Anyway, it's just an idea. Thoughts? Any interest? > > Sincerely > Aaron Cannon > > > > > -- > E-mail: cannona@fireantproductions.com > Skype: cannona > MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail > address.) > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From tb at baechler.net Sun Jun 26 23:34:53 2005 From: tb at baechler.net (Tony Baechler) Date: Sun Jun 26 23:33:07 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions. com> Message-ID: <5.2.0.9.0.20050626233126.03fc5e40@bisinc.us> Hello. Well, while I don't specifically have any favorites, I have over 163,000 recipes I would be willing to donate if that helps. I have no idea of the copyright status of them though. Also, could you please elaborate on why recipes can't be copyrighted? Specifically, could you please tell me in which cases recipes can be protected by copyright? I have thought for many years about making recipes, either individually or in cookbook form available in Braille or similar formats for the blind, but I was always worried about the legal issues. The laws are very specific on how copyrighted works may be put into formats such as Braille and I have no money or means to defend myself in case of suits. You may write off list if you would like. 
From traverso at dm.unipi.it Sun Jun 26 23:57:15 2005 From: traverso at dm.unipi.it (Carlo Traverso) Date: Sun Jun 26 23:51:48 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <5.2.0.9.0.20050626233126.03fc5e40@bisinc.us> (message from Tony Baechler on Sun, 26 Jun 2005 23:34:53 -0700) References: <5.2.0.9.0.20050626233126.03fc5e40@bisinc.us> Message-ID: <200506270657.j5R6vFE13175@pico.dm.unipi.it> IANAL, but with common sense I would say that: 1) a collection of recipes gets a copyright. 2) the exact wording of a recipe can get a copyright; but the recipe itself (as a description of a procedure) does not have a copyright. 3) the recipe itself (i.e. the final product) can be patented or trademarked. Carlo From cannona at fireantproductions.com Mon Jun 27 06:17:55 2005 From: cannona at fireantproductions.com (Aaron Cannon) Date: Mon Jun 27 06:22:57 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <5.2.0.9.0.20050626233126.03fc5e40@bisinc.us> References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions. com> <5.2.0.9.0.20050626233126.03fc5e40@bisinc.us> Message-ID: <6.2.1.2.0.20050627081606.03fe0b78@mail.fireantproductions.com> At 01:34 AM 6/27/2005, you wrote: >Hello. Well, while I don't specifically have any favorites, I have over >163,000 recipes I would be willing to donate if that helps. I have no >idea of the copyright status of them though. I think, for this compilation, we're aiming for quality, rather than quantity. But if you have a few particular favorites... >Also, could you please elaborate on why recipes can't be >copyrighted? Specifically, could you please tell me in which cases >recipes can be protected by copyright? I have thought for many years >about making recipes, either individually or in cookbook form available in >Braille or similar formats for the blind, but I was always worried about >the legal issues. 
The laws are very specific on how copyrighted works may >be put into formats such as Braille and I have no money or means to defend >myself in case of suits. You may write off list if you would like. The relevant web site for the US is here: http://www.copyright.gov/fls/fl122.html Sincerely Aaron Cannon >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d -- E-mail: cannona@fireantproductions.com Skype: cannona MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) From j.hagerson at comcast.net Mon Jun 27 06:34:13 2005 From: j.hagerson at comcast.net (John Hagerson) Date: Mon Jun 27 06:34:20 2005 Subject: [gutvol-d] Amazon offers 1082 volume Penguin Classics for $7,989 Message-ID: <002001c57b1c$e8508920$0200a8c0@sarek> http://slashdot.org/article.pl?sid=05/06/27/0632258&from=rss From jon at noring.name Mon Jun 27 09:15:02 2005 From: jon at noring.name (Jon Noring) Date: Mon Jun 27 09:14:52 2005 Subject: [gutvol-d] Amazon offering of the complete "Penguin Classics Library" Message-ID: <1382534655.20050627101502@noring.name> Refer to: http://online.wsj.com/public/article/0,,SB111921715006463546-S0zI_EVookezthz8VC7m_WXjOAo_20060627,00.html?mod=blogs Fair Use snippet from above article: "We get a lot fewer random Amazon.com links sent to us since the great Henry Raddick stopped writing book reviews, something we're still mourning. But this one was jaw-dropping: The Penguin Classics Library Complete Collection, consisting of 1,082 books. List price: $13,317.74. Discount price: $7,989.99. Never has a 40% discount seemed quite so weighty." I think the interest to PG and DP is obvious. 
:^) Jon Noring From collin at xs4all.nl Mon Jun 27 14:03:44 2005 From: collin at xs4all.nl (Branko Collin) Date: Mon Jun 27 13:49:44 2005 Subject: [gutvol-d] Amazon offers 1082 volume Penguin Classics for $7,989 In-Reply-To: <002001c57b1c$e8508920$0200a8c0@sarek> Message-ID: <42C085D0.26479.29B00D4@localhost> On 27 Jun 2005, at 8:34, John Hagerson wrote: > http://slashdot.org/article.pl?sid=05/06/27/0632258&from=rss I noticed the server was a little slow this afternoon (CET). :-) -- branko collin collin@xs4all.nl From collin at xs4all.nl Mon Jun 27 14:20:55 2005 From: collin at xs4all.nl (Branko Collin) Date: Mon Jun 27 14:06:54 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> Message-ID: <42C089D7.14465.2AABCF4@localhost> ??? wrote: > > I was thinking a few days ago about how PG has attracted volunteers > > from all over the world from different backgrounds, and I got to > > thinking that it might be kind of fun if PG were to compile a > > cookbook containing the favorite recipes from our volunteers. > > Since, in most cases, recipes can't be copyrighted, there shouldn't > > be any problem in that regard. I'll bet we could get a pretty > > sizable and diverse collection if we put the word out on this list > > and at DP. Sounds like a fun idea. However, I thought PG policy was to not publish previously unpublished works? Do I remember that correctly, and, if so, how would that influence this project? Perhaps PG needs to have a sister project with almost exactly the same goals, except that it will publish Vanity Press. Or were you talking about volunteers taking their recipes from the cookbooks in PG? 
-- branko collin collin@xs4all.nl From marcello at perathoner.de Mon Jun 27 15:06:39 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Jun 27 15:06:50 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <42C089D7.14465.2AABCF4@localhost> References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> <42C089D7.14465.2AABCF4@localhost> Message-ID: <42C0786F.6000105@perathoner.de> Branko Collin wrote: >>>I was thinking a few days ago about how PG has attracted volunteers >>>from all over the world from different backgrounds, and I got to >>>thinking that it might be kind of fun if PG were to compile a >>>cookbook containing the favorite recipes from our volunteers. > However, I thought PG policy was to not publish previously > unpublished works? Do I remember that correctly, and, if so, how > would that influence this project? > > Perhaps PG needs to have a sister project with almost exactly the > same goals, except that it will publish Vanity Press. This has already been done: http://en.wikibooks.org/wiki/Cookbook Of course, I could come up with better Italian recipes than they :-) -- Marcello Perathoner webmaster@gutenberg.org From cannona at fireantproductions.com Mon Jun 27 16:11:58 2005 From: cannona at fireantproductions.com (Aaron Cannon) Date: Mon Jun 27 16:12:53 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <42C0786F.6000105@perathoner.de> References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> <42C089D7.14465.2AABCF4@localhost> <42C0786F.6000105@perathoner.de> Message-ID: <6.2.1.2.0.20050627180746.041defb0@mail.fireantproductions.com> The main idea of the project would be to recognize the diverse variety of volunteers, and not just put together a collection of recipes. Still, the point about the vanity publishing is a good one. 
Sincerely Aaron Cannon At 05:06 PM 6/27/2005, you wrote: >Branko Collin wrote: > >>>>I was thinking a few days ago about how PG has attracted volunteers >>>>from all over the world from different backgrounds, and I got to >>>>thinking that it might be kind of fun if PG were to compile a >>>>cookbook containing the favorite recipes from our volunteers. > >>However, I thought PG policy was to not publish previously unpublished >>works? Do I remember that correctly, and, if so, how would that influence >>this project? >>Perhaps PG needs to have a sister project with almost exactly the same >>goals, except that it will publish Vanity Press. > >This has already been done: > > http://en.wikibooks.org/wiki/Cookbook > > >Of course, I could come up with better Italian recipes than they :-) > > >-- >Marcello Perathoner >webmaster@gutenberg.org > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d -- E-mail: cannona@fireantproductions.com Skype: cannona MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) From brad at chenla.org Tue Jun 28 20:26:16 2005 From: brad at chenla.org (Brad Collins) Date: Tue Jun 28 20:26:48 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> (Aaron Cannon's message of "Sun, 26 Jun 2005 21:24:21 -0500") References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> Message-ID: <7jgddesn.fsf@chenla.org> Aaron Cannon writes: > I was thinking a few days ago about how PG has attracted volunteers > from all over the world from different backgrounds, and I got to > thinking that it might be kind of fun if PG were to compile a cookbook > containing the favorite recipes from our volunteers. Since, in most > cases, recipes can't be copyrighted, there shouldn't be any problem in > that regard. 
I'll bet we could get a pretty sizable and diverse > collection if we put the word out on this list and at DP. > > Anyway, it's just an idea. Thoughts? Any interest? This reminds me of a story. Back in the 90's a friend of mine was doing an environmental study for China Light & Power or the Hong Kong Gov. They were trying to put together an inventory of species of fish in Hong Kong waters. Finding the Latin and English names for the fish was easy, but they were supposed to do everything in both English and Chinese so they went around the office asking people for the Chinese names for the different types of fish. No one could seem to remember the names for any of the fish but all of them could think of a recipe for each fish..... It turned out that there was no agreement on names and that each little fishing village and southern dialect had their own names for each type of fish. In the end they gave up and proposed making a cookbook of the recipes everyone had offered. I never heard if anything came of the cookbook.... b/ -- Brad Collins , Bangkok, Thailand From holden.mcgroin at dsl.pipex.com Tue Jun 28 23:16:28 2005 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Tue Jun 28 23:16:13 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <7jgddesn.fsf@chenla.org> References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> <7jgddesn.fsf@chenla.org> Message-ID: <42C23CBC.5020405@dsl.pipex.com> Brad Collins wrote: > Back in the 90's a friend of mine was doing an environmental study for > China Light & Power or the Hong Kong Gov. They were trying to put > together an inventory of species of fish in Hong Kong waters. Finding > the Latin and English names for the fish was easy, but they were > supposed to do everything in both English and Chinese so they went > around the office asking people for the Chinese names for the > different types of fish.
> > No one could seem to remember the names for any of the fish but all of > them could think of a recipes for each fish..... It turned out that > there was no agreement on names and that each little fishing village > and southern dialect had their own names for each type of fish. > > In the end they gave up and proposed making a cookbook of the recipes > everyone had offered. I never heard if anything came of the > cookbook.... Which reminds me of one of my favourite quotes by HRH the Duke of Edinburgh: "If it has four legs and is not a chair, has wings and is not an aeroplane, or swims and is not a submarine, the Cantonese will eat it." Cheers, Holden From Bowerbird at aol.com Wed Jun 29 10:37:20 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 29 10:37:38 2005 Subject: [gutvol-d] Greetings ebook makers ;) Message-ID: jeffrey said: > Hello fellow ebook creators, hello jeffrey. did anyone respond? -bowerbird From jefferydouglaswaddell at gmail.com Wed Jun 29 11:38:38 2005 From: jefferydouglaswaddell at gmail.com (Jeff Waddell) Date: Wed Jun 29 11:38:48 2005 Subject: [gutvol-d] Greetings ebook makers ;) In-Reply-To: References: Message-ID: <8a44f71c05062911383891d02c@mail.gmail.com> I have had some positive response from other forums, venues, and individuals. None of which has lead to anything resembling a "job". This would be the first sign of a response from the gutenberg community. Do you have any comments or suggestions? Jeff On 6/29/05, Bowerbird@aol.com wrote: > > jeffrey said: > > Hello fellow ebook creators, > > hello jeffrey. did anyone respond? > > -bowerbird > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050629/1798f98f/attachment.html From Bowerbird at aol.com Wed Jun 29 12:09:59 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Jun 29 12:10:14 2005 Subject: [gutvol-d] Greetings ebook makers ;) Message-ID: <1e.485ac604.2ff44c07@aol.com> jeffery said: > This would be the first sign of a response > from the gutenberg community. well, "the gutenberg community" doesn't consider me to be a part of it, so i guess you are still waiting for them. :+) in fact, since you've now soiled your trousers by even speaking to me, they will probably tell you that they are ignoring you for _that_ reason. > Do you have any comments or suggestions? motivate yourself, because they won't give you any help. :+) -bowerbird p.s. sorry i spelled your name wrong before... From Gutenberg9443 at aol.com Wed Jun 29 15:03:36 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Jun 29 15:03:53 2005 Subject: [gutvol-d] PG Cookbook Message-ID: <199.422dd536.2ff474b8@aol.com> Announcement: I am not going to edit that cookbook. If I ever edit a cookbook, it will be one I wrote. However, for anybody who actually makes a decision to make a cookbook, here is my recipe. I call it I-am-worn-out-and-it-is-hot-as-h***-high-protein high-fiber salad: Chill one can of pork'n'beans. Chill one can of whole-kernel corn. Drain corn and empty into large bowl. Add undrained pork'n'beans. Chop up however many tomatoes and fresh onions you want to put in it. Add black olives and/or green stuffed olives and/or whatever else you want. Add celery or lettuce or whatever else you want. Toss in mayonnaise or Russian dressing or Italian dressing or whatever else you want. Eat with corn chips or potato chips or no chips at all. I am the only person I know who eats this. Everybody gives me that "are you out of your mind" look if I tell them about it. But the combination of beans and corn creates complete protein. 
Whatever veggies you decide to put in it are, of course, veggies. So it's a reasonably high-nutrition meal. You might drink milk with it or add diced cheese, as it is low in calcium. Or you might get your calcium by eating Tums after it, if your digestion isn't as fond of fiber as mine is. Anne Anne Do you like to breathe? Then save the trees! Begin a personal relationship with an ebook TODAY! From brad at chenla.org Wed Jun 29 17:54:42 2005 From: brad at chenla.org (Brad Collins) Date: Wed Jun 29 17:55:13 2005 Subject: [gutvol-d] PG Cookbook In-Reply-To: <42C23CBC.5020405@dsl.pipex.com> (Holden McGroin's message of "Wed, 29 Jun 2005 07:16:28 +0100") References: <6.2.1.2.0.20050626211707.01cb3d68@mail.fireantproductions.com> <7jgddesn.fsf@chenla.org> <42C23CBC.5020405@dsl.pipex.com> Message-ID: <3br0d5pp.fsf@chenla.org> Holden McGroin writes: > Brad Collins wrote: > Which reminds me of one of my favourite quotes by HRH the Duke of Edinburgh: > > "If it has four legs and is not a chair, has wings and is not an > aeroplane, or swims and is not a submarine, the Cantonese will eat > it." > Interesting. When I was working in Beijing I heard the entire quote a number of times in Mandarin. In Hong Kong it's usually shortened to something like "the Chinese will eat anything with four legs except a chair", with some pride, I might add. I wonder if the quote is originally Chinese and perhaps heard by the Duke from someone like Governor Wilson (who was known to have somewhat of a clue about local culture) as opposed to Chris Patten who was only made gov to piss off the mainland. Sorry, I know this is getting even further OT..... b/ -- Brad Collins , Bangkok, Thailand