From nwolcott at dsdial.net Sun Dec 4 21:29:10 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Sun Dec 4 21:41:39 2005 Subject: [gutvol-d] Google print Message-ID: <005601c5f95e$61761da0$da9495ce@gw98> Now that the dust has cleared, can us proles have the final info-- can one download scans from google print, what is the best way, are they holding back, what is the resolution, can one search (other than for the dejavu image), etc etc. Are there P2P networks to share stuff cribbed from google? Talking about PD images of course. Norm Wolcott nwolcott2@post.harvard.edu From Bowerbird at aol.com Mon Dec 5 00:20:44 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Dec 5 00:21:10 2005 Subject: [gutvol-d] Google print Message-ID: <12a.6b35e5a8.30c5525c@aol.com> the dust has not cleared, norm. what is necessary at this time is a clearinghouse, where efforts to download every set of scans that google provides are coordinated, so they can be made available at a convenient place in cyberspace. an overlap of such downloadings will unnecessarily duplicate work, and only piss off google, who has already blacklisted certain people for excessive pings. and an uncoordinated effort will almost certainly result in many of the books being neglected... project gutenberg's high profile and history make it a natural choice to coordinate all of these efforts... -bowerbird
From jon.ingram at gmail.com Mon Dec 5 01:30:26 2005 From: jon.ingram at gmail.com (Jon Ingram) Date: Mon Dec 5 01:30:47 2005 Subject: [gutvol-d] Google print In-Reply-To: <005601c5f95e$61761da0$da9495ce@gw98> References: <005601c5f95e$61761da0$da9495ce@gw98> Message-ID: <4baf53720512050130t1090c657i33a763f2be357e21@mail.gmail.com> On 12/5/05, N Wolcott wrote: > Now that the dust has cleared, can us proles have the final info-- can one > download scans from google print, what is the best way, are they holding > back, what is the resolution, can one search (other than for the dejavu > image), etc etc. Are there P2P networks to share stuff cribbed from google? > Talking about PD images of course. Google does not use dejavu -- that's the Internet Archive. Google presents a fairly small jpeg image for each page of the book. There's no fixed resolution, but instead a fixed width. This means that the images generated from small books are relatively easy to OCR (and equate to around 100 dpi), while the images from books with large pages are hard for even humans to read. Those of you who can get access to Google Print should be able to download these 'web resolution' images from them just by right-clicking and saving. As far as I know there's no way to access the higher resolution images they must have made when they originally scanned the material; nor is there any way to access the OCRed text they use for searching purposes. Google provides no mechanism to download all the images for a book. You'll have to roll your own download script, or use one of the scripts written by others, such as the Perl script gharvest, available from http://www.zuhause.org/dp/gharvest Google also provides no index to the material they have scanned. Several people have generated one by the crude means of searching for many different phrases, and storing the results.
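The crude indexing approach Jon describes, running many phrase searches and keeping whatever comes back, amounts to taking a union of result sets keyed by book. A minimal Python sketch, assuming each search has already been reduced to (book id, title) pairs; the ids, titles and queries below are made-up placeholders, not Google Print's actual data or URL scheme:

```python
from collections import defaultdict


def build_index(search_results):
    """Merge per-query hits into one index.

    search_results maps a query string to a list of (book_id, title) pairs.
    Returns {book_id: (title, sorted list of queries that found it)}.
    """
    titles = {}
    found_by = defaultdict(set)
    for query, hits in search_results.items():
        for book_id, title in hits:
            titles.setdefault(book_id, title)  # keep the first title seen
            found_by[book_id].add(query)
    return {
        book_id: (titles[book_id], sorted(found_by[book_id]))
        for book_id in sorted(titles)
    }


# Hypothetical results of three phrase searches (placeholder ids and titles).
results = {
    "verne": [("book-001", "Mathias Sandorf"), ("book-002", "Example Title B")],
    "sandorf": [("book-001", "Mathias Sandorf")],
    "example phrase": [("book-002", "Example Title B"), ("book-003", "Example Title C")],
}

for book_id, (title, queries) in build_index(results).items():
    print(book_id, title, queries)
```

Keeping track of which queries found each book gives a rough sense of how many new titles each extra search phrase contributes, and so of when the crude approach stops paying off.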
The most extensive list is probably also Bruce's, available from http://www.zuhause.org/dp/gfound1.html I've used this as a basis for a page showing the DP harvesting status of the material: http://homepage.ntlworld.com/jenjonliz/jon/tia/google.html -- Jon Ingram From nwolcott at dsdial.net Mon Dec 5 14:26:50 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Mon Dec 5 14:32:50 2005 Subject: [gutvol-d] Google print References: <005601c5f95e$61761da0$da9495ce@gw98> <4baf53720512050130t1090c657i33a763f2be357e21@mail.gmail.com> Message-ID: <004a01c5f9eb$a1866f20$ee9495ce@gw98> I found that I could search for an author (Verne) using the Advanced Search on books.google.com and choosing dates 1850-1920. Got 5 hits including Mathias Sandorf. Images seemed to be 72 dpi png, but they were in color, so I imagine suitable smoothing etc. could convert them to 150 dpi b/w. I believe Capio does such things. Still lots of work. I'll stick with Brewster for a while. I don't know if ABBYY OCR takes advantage of the color information. I'll try one of the scripts for a test. Thanks. Norm Wolcott nwolcott2@post.harvard.edu From nwolcott at dsdial.net Mon Dec 5 21:24:56 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Mon Dec 5 21:26:47 2005 Subject: [gutvol-d] Canadian ISBN's Message-ID: <000a01c5fa25$6c958780$559495ce@gw98> Does anyone know if books with Canadian ISBNs can be ordered from US bookstores like Borders or Barnes and Noble? Do they appear on Amazon? Or do those stores limit themselves to Bowker's Books in Print? The question is: if a self-published book is given a Canadian ISBN, will anyone in the US be able to order it?
I understand that even books with US ISBNs are not necessarily listed on Amazon, which limits their distribution. Canadian ISBNs are free, while in the US they are $30 a pop. I know that my public library, at least, does not get Canadian books. Fredonia Books gets an NL ISBN, but its books can be ordered here and are on Amazon. Maybe a special deal? Norm Wolcott nwolcott2@post.harvard.edu From gbnewby at pglaf.org Tue Dec 6 22:45:16 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Tue Dec 6 22:45:19 2005 Subject: site design additions (Re: [gutvol-d] Free Beer !) In-Reply-To: <4360C55D.90401@perathoner.de> References: <434459D6.8070502@perathoner.de> <20051024174417.GA416@pglaf.org> <435E43CA.7010806@perathoner.de> <20051026015545.GA2346@pglaf.org> <435F9AD7.9080601@perathoner.de> <4360C55D.90401@perathoner.de> Message-ID: <20051207064516.GA23877@pglaf.org> On Thu, Oct 27, 2005 at 02:17:33PM +0200, Marcello Perathoner wrote: > Marcello Perathoner wrote: > > >>(C) I see no reason why we can't remove or rotate out Free as in Beer > >> immediately, rather than waiting for any of the above. > > > >I'm writing up a new page about "free" right now. > > That's done. > > Now I'd like to expand on that with some good strong advocacy of shorter > copyright terms. Unless I somehow missed some discussion, I didn't see any further commentary on this. The site at http://www.gutenberg.org has a somewhat revised look, no more "free beer," and a nice section on "No Cost of Freedom?" A minor fix I didn't see earlier to /freedom.php: " You can tell by reading the license inside the book.
You may download a copyrighted book and give copies away, but might be limited in commercial uses and derivative works." Anyway, I like this, and hope others do too. Thanks!!! Also, the more prominent placing of the DONATE link *has* resulted in noticeably more donations via PayPal. Most donations are smallish ($5-10), but sprinkled with $50 and $100 donations. This really helps to support our DVD/CD giveaways, book-buying, and other programs. -- Greg From sly at victoria.tc.ca Tue Dec 6 23:26:18 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Dec 6 23:26:37 2005 Subject: [gutvol-d] Donations to Project Gutenberg In-Reply-To: <20051207064516.GA23877@pglaf.org> References: <434459D6.8070502@perathoner.de> <20051024174417.GA416@pglaf.org> <435E43CA.7010806@perathoner.de> <20051026015545.GA2346@pglaf.org> <435F9AD7.9080601@perathoner.de> <4360C55D.90401@perathoner.de> <20051207064516.GA23877@pglaf.org> Message-ID: On Tue, 6 Dec 2005, Greg Newby wrote: > Also, the more prominent placing of the DONATE link *has* resulted > in noticeably more donations via PayPal. Most donations are smallish > ($5-10), but sprinkled with $50 and $100 donations. This really helps > to support our DVD/CD giveaways, book-buying, and other programs. That is very good news. Thanks for sharing. Andrew From marcello at perathoner.de Wed Dec 7 10:21:11 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Dec 7 10:21:10 2005 Subject: site design additions (Re: [gutvol-d] Free Beer !) In-Reply-To: <20051207064516.GA23877@pglaf.org> References: <434459D6.8070502@perathoner.de> <20051024174417.GA416@pglaf.org> <435E43CA.7010806@perathoner.de> <20051026015545.GA2346@pglaf.org> <435F9AD7.9080601@perathoner.de> <4360C55D.90401@perathoner.de> <20051207064516.GA23877@pglaf.org> Message-ID: <43972817.6090308@perathoner.de> Greg Newby wrote: > " You can tell by reading the license inside the book. You may > download a copyrighted book, but you are not allowed to give copies > away."
> > should be more like: > > " You can tell by reading the license inside the book. You may > download a copyrighted book and give copies away, but might be limited > in commercial uses and derivative works." Done. > Also, the more prominent placing of the DONATE link *has* resulted > in noticeably more donations via PayPal. Most donations are smallish > ($5-10), but sprinkled with $50 and $100 donations. This really helps > to support our DVD/CD giveaways, book-buying, and other programs. What about a small writeup for the donations page that tells people what we do with their money? -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Wed Dec 7 10:39:16 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 7 10:39:29 2005 Subject: [gutvol-d] broadband penetration Message-ID: <291.1761657.30c88654@aol.com> what is the current penetration of broadband? i seem to remember from the newsletter that a strong majority of people say they have it... which seems not to jibe with my reality, but most of my friends are poets, so we're poor. heck, three of them are living in a tent in my backyard right now. (not really, just kidding.) still, it seems to me that many people might have broadband at _work_, so they say "yes", when really, i would be more curious about the percentage who have it at _home_... does anyone have the figures? -bowerbird From joey at joeysmith.com Wed Dec 7 21:57:55 2005 From: joey at joeysmith.com (joey) Date: Wed Dec 7 22:05:12 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <291.1761657.30c88654@aol.com> References: <291.1761657.30c88654@aol.com> Message-ID: <20051208055755.GB13855@joeysmith.com> On Wed, Dec 07, 2005 at 01:39:16PM -0500, Bowerbird@aol.com wrote: > what is the current penetration of broadband?
> > i seem to remember from the newsletter that > a strong majority of people say they have it... > > which seems not to jibe with my reality, but > most of my friends are poets, so we're poor. > heck, three of them are living in a tent in my > backyard right now. (not really, just kidding.) > > still, it seems to me that many people might > have broadband at _work_, so they say "yes", > when really, i would be more curious about > the percentage who have it at _home_... > > does anyone have the figures? > > -bowerbird According to a study [1] published by the Pew Internet & American Life Project, "[f]ully 48 million adult Americans have broadband connections at home", which they peg at 39% of adult Internet users. These numbers are for March 1, 2004. [1] www.pewinternet.org/pdfs/PIP_Broadband04.DataMemo.pdf From Bowerbird at aol.com Thu Dec 8 01:53:41 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 8 01:54:07 2005 Subject: [gutvol-d] broadband penetration Message-ID: joey said: > According to a study [1] published by the Pew Internet & American Life Project, > "[f]ully 48 million adult Americans have broadband connections at home", which > they peg at 39% of adult Internet users. These numbers are for March 1, 2004. thanks joey. i'd guess that in the last 21 months, even that number has increased significantly, to the point where it might now be 50%... and it's been quite a ride for those of us who were baptized online at 300 baud... -bowerbird
From hart at pglaf.org Thu Dec 8 10:09:30 2005 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 8 10:09:32 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: References: Message-ID: On Thu, 8 Dec 2005 Bowerbird@aol.com wrote: > joey said: >> According to a study [1] published by the Pew Internet & American Life Project, >> "[f]ully 48 million adult Americans have broadband connections at home", which >> they peg at 39% of adult Internet users. These numbers are for March 1, 2004. > > thanks joey. > > i'd guess that in the last 21 months, even > that number has increased significantly, > to the point where it might now be 50%... > > and it's been quite a ride for those of us > who were baptized online at 300 baud... Let's not forget that 50% of the world doesn't even have phone service of any kind at all, much less narrow or broadband Internet. I'm still targeting the world from the bottom up, much like the $100 laptop that we all hope for, inexpensive cellphone service, etc. For the majority of people broadband is totally out of reach, even out of sight. Happy Holidays! Give eBooks!!! Michael S. Hart Founder Project Gutenberg PS I started out at 13 baud. . . . From Bowerbird at aol.com Thu Dec 8 10:22:46 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 8 10:23:13 2005 Subject: [gutvol-d] broadband penetration Message-ID: <1dc.4a719241.30c9d3f6@aol.com> michael said: > Let's not forget that 50% of the world doesn't > even have phone service of any kind at all, > much less narrow or broadband Internet. i haven't "forgotten" that, michael. > I'm still targeting the world from the bottom up, as am i. > For the majority of people broadband is totally > out of reach, even out of sight. you're preaching to a fellow preacher, man. -bowerbird
From jeroen.mailinglist at bohol.ph Thu Dec 8 11:46:30 2005 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Thu Dec 8 11:45:55 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: References: Message-ID: <43988D96.5040307@bohol.ph> Michael, In five years' time, I've seen telephone coverage on my wife's home island in the Philippines rise from almost nothing (the nearest phone was a six-kilometer walk from her home; at the beach you had to give them one day's notice if you wanted to pay by credit card, so that an employee could go to town to phone the CC company) to a much higher level -- everybody seems to be carrying cell phones now, and the entire province is dotted with telephone poles. You can have pre-paid cards starting as low as 35 cents, and you can actually send even smaller amounts of load by text message. A single text message is 2 cents, and far more popular than actual phone calls, which are very expensive at 15 cents a minute. I've posted an item on the 100 dollar laptop on my website, http://www.bohol.ph/, but $100 is more than a month's income for most families in Bohol. On the other hand, sending refurbished computers from here to schools in the Philippines is more expensive -- I sent about 10, with mixed success. Some schools didn't have light fixtures, so they could only use them in the daytime. ..., and ants get everywhere... Jeroen. Michael Hart wrote: > > Let's not forget that 50% of the world doesn't > even have phone service of any kind at all, > much less narrow or broadband Internet. > > I'm still targeting the world from the bottom up, > much like the $100 laptop that we all hope for, > inexpensive cellphone service, etc.
> From Bowerbird at aol.com Thu Dec 8 12:00:01 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 8 12:00:15 2005 Subject: [gutvol-d] broadband penetration Message-ID: <1c3.36498e39.30c9eac1@aol.com> but michael, as one preacher to another, i'll remind you of something you already know. as the u.s. has hit the tipping point on broadband, there will be no turning back now, just a bigger rush. and increased speed is one of those things that computer users take for granted _very_quickly_, to the point we _forget_entirely_ the old and slow. so it will be increasingly difficult to talk with people if we don't consider their newly-changed mindset... to my mind, that means that an "either/or" approach will come to be rejected by them without consideration. far better, in my opinion, to recognize the new reality, and combine accommodation to the high-speed present enjoyed by _some_ people with an "awareness campaign" constantly reminding them that some people aren't so lucky. in other words, replace "either/or" with "both", and then tailor the sermon to the matter of the "choice" between 'em. you and i both know that, when push comes to shove, the "choice" is crystal-clear, so there's no need to artificially constrain the options that people entertain. argue against your enemies, not with your friends... -bowerbird From hart at pglaf.org Thu Dec 8 13:50:51 2005 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 8 13:50:53 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <1c3.36498e39.30c9eac1@aol.com> References: <1c3.36498e39.30c9eac1@aol.com> Message-ID: On Thu, 8 Dec 2005 Bowerbird@aol.com wrote: > but michael, as one preacher to another, > i'll remind you of something you already know. > > as the u.s.
has hit the tipping point on broadband, > there will be no turning back now, just a bigger rush. I may live in the US, but I also live in the world. I may work in the US, but I also work for the world. > and increased speed is one of those things that > computer users take for granted _very_quickly_, > to the point we _forget_entirely_ the old and slow. > > so it will be increasingly difficult to talk with people > if we don't consider their newly-changed mindset... > > to my mind, that means that an "either/or" approach > will come to be rejected by them without consideration. Let's not pretend that I am saying "either/or" to eyebooks, I'm just saying they aren't applicable to most of the world. What I don't like is when OTHERS say ONLY image eBooks. > far better, in my opinion, to recognize the new reality, > and combine accommodation to the high-speed present > enjoyed by _some_ people with an "awareness campaign" > constantly reminding them that some people aren't so lucky. Sorry, I'm not the sort to give better service to the "haves" that cannot be enjoyed by as many "have nots" as possible. I want my work to be accessible to everyone possible, is that such a terrible thing to you? > in other words, replace "either/or" with "both", and then > tailor the sermon to the matter of the "choice" between 'em. I'm not forcing the "either/or" choice, are you? > you and i both know that, when push comes to shove, > the "choice" is crystal-clear, so there's no need to > artificially constrain the options that people entertain. I am addressing the "real world" constraints, what is it that you are addressing? > argue against your enemies, not with your friends... Look in the mirror when you say that. . . . 
> > -bowerbird > From hart at pglaf.org Thu Dec 8 14:13:10 2005 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 8 14:13:11 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <43988D96.5040307@bohol.ph> References: <43988D96.5040307@bohol.ph> Message-ID: On Thu, 8 Dec 2005, Jeroen Hellingman (Mailing List Account) wrote: > > Michael, > > In five years' time, I've seen telephone coverage on my wife's home island in > the Philippines rise from almost nothing (the nearest phone was a six- > kilometer walk from her home; at the beach you had to give them one day's > notice if you wanted to pay by credit card, so that an employee could go to > town to phone the CC company) to a much higher level -- everybody seems to be > carrying cell phones now, and the entire province is dotted with telephone > poles. So, what you are telling me is that the Philippines has moved from the half of the world without telephone service to the half that now has phone service. Right? Still, half the people in the world have the kind of phone service you describe as being there 5 years ago, or worse. . . . I would have to guess that half the people in the world did not use a phone at all in the last year. > You can have pre-paid cards starting as low as 35 cents, and you can > actually send even smaller amounts of load by text message. A single text > message is 2 cents, and far more popular than actual phone calls, which are > very expensive at 15 cents a minute. However, you can probably get more communication done in that minute than in one text message and four replies, unless they let you send huge emails like some we've just been through. > I've posted an item on the 100 dollar laptop on my website, > http://www.bohol.ph/, but $100 is more than a month's income for most families > in Bohol.
Yes, I just mentioned something similar in a separate note, we will probably have to pay for those $100 laptops and give away so many of them that they become so ubiquitous that people don't steal them. Michael PS > On the other hand, sending refurbished computers from here to > schools in the Philippines is more expensive -- I sent about 10, with mixed > success. Some schools didn't have light fixtures, so they could only use them in the > daytime. ..., and ants get everywhere... Sorry, I must have missed something, couldn't you use computers just from the light from the monitors? > > Jeroen. > > > Michael Hart wrote: > >> >> Let's not forget that 50% of the world doesn't >> even have phone service of any kind at all, >> much less narrow or broadband Internet. >> >> I'm still targeting the world from the bottom up, >> much like the $100 laptop that we all hope for, >> inexpensive cellphone service, etc. >> > From Bowerbird at aol.com Thu Dec 8 14:13:45 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 8 14:13:59 2005 Subject: [gutvol-d] broadband penetration Message-ID: <193.4d672cf0.30ca0a19@aol.com> well, if you get the feeling that people aren't listening to you as closely as they once did, maybe you'll think of this. otherwise, suit yourself... meanwhile, i will be busy trying to figure out how to quickly and easily convert 10,000 scanned books into zen markup language text, so everyone has lemonade... -bowerbird
From hart at pglaf.org Thu Dec 8 14:17:25 2005 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 8 14:17:27 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <193.4d672cf0.30ca0a19@aol.com> References: <193.4d672cf0.30ca0a19@aol.com> Message-ID: On Thu, 8 Dec 2005 Bowerbird@aol.com wrote: > well, if you get the feeling that > people aren't listening to you > as closely as they once did, > maybe you'll think of this. Think of what? > > otherwise, suit yourself... > > meanwhile, i will be busy > trying to figure out how to > quickly and easily convert > 10,000 scanned books into > zen markup language text, > so everyone has lemonade... > > -bowerbird > From Bowerbird at aol.com Thu Dec 8 14:20:42 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 8 14:20:58 2005 Subject: [gutvol-d] broadband penetration Message-ID: <231.33a275d.30ca0bba@aol.com> michael said: > Think of what? this thread, right here, right now. :+) -bowerbird p.s. i expected you to say "people never _did_ listen to me." From jeroen.mailinglist at bohol.ph Thu Dec 8 14:38:12 2005 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Thu Dec 8 14:37:57 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: References: <43988D96.5040307@bohol.ph> Message-ID: <4398B5D4.7060505@bohol.ph> Michael Hart wrote: > > So, what you are telling me is that the Philippines has moved from the half > of the world without telephone service to the half that now has phone > service. > Not exactly, as I don't have the figures. Just saying that phone coverage has come within reach of many more people, and has done so very quickly, and I believe the same trend is true in many countries.
Cellphones are a booming business all over Africa as well. Your 50% may hopefully soon be history. >> You can have pre-paid cards starting as low as 35 cents, and you can >> actually send even smaller amounts of load by text message. A single >> text message is 2 cents, and far more popular than actual phone >> calls, which are very expensive at 15 cents a minute. > > > However, you can probably get more communication done in that minute than > in one text message and four replies, unless they let you send huge > emails like some we've just been through. > Well, you and I can agree on that, but scarcity of money sometimes leads to uneconomical decisions, as anybody who has been in a developing country can tell from the tiny sachets of shampoo, soap, and other care products on sale on every corner. Never ask what they cost by the liter. In the meantime, the Filipinos send on average 1.5 messages per day each; that is over 100 million messages a day. > > Yes, I just mentioned something similar in a separate note, we will > probably have to pay for those $100 laptops and give away so many of them that > they become so ubiquitous that people don't steal them. > Well, just like cell phones, such gadgets will remain a target for thieves. Personalize them, give them unique numbers that cannot be changed, and let them advertise where they are when switched on via WiFi, and you could fairly easily trace them back, just like cell-phones, if producers and providers would cooperate. > > Sorry, I must have missed something, couldn't you use computers just from > the light from the monitors? > Ever tried to find the on switch in pitch dark? I love my wife's village in the middle of nowhere, as there is no light whatsoever to veil the stars. Jeroen.
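Jeroen's tracing idea boils down to a lookup table: a fixed serial number per machine, a list of serials reported stolen, and a log of sightings whenever a machine announces itself to an access point. A minimal Python sketch under those assumptions; the class, method and serial names are hypothetical and do not describe any real laptop, carrier or WiFi protocol:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DeviceRegistry:
    """Hypothetical registry keyed by each laptop's unchangeable serial number."""

    stolen: Set[str] = field(default_factory=set)
    sightings: Dict[str, List[str]] = field(default_factory=dict)

    def report_stolen(self, serial: str) -> None:
        self.stolen.add(serial)

    def record_sighting(self, serial: str, access_point: str) -> bool:
        """Log where a device announced itself; return True if it was reported stolen."""
        self.sightings.setdefault(serial, []).append(access_point)
        return serial in self.stolen

    def last_seen(self, serial: str) -> Optional[str]:
        """Most recent access point that reported this serial, or None if never seen."""
        seen = self.sightings.get(serial)
        return seen[-1] if seen else None


registry = DeviceRegistry()
registry.report_stolen("XO-0001")
print(registry.record_sighting("XO-0002", "school-ap"))  # False
print(registry.record_sighting("XO-0001", "market-ap"))  # True
print(registry.last_seen("XO-0001"))                     # market-ap
```

The table only holds anything useful under the condition Jeroen attaches: producers and providers have to cooperate, so that access points actually report the serials they see.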
From hart at pglaf.org Thu Dec 8 15:01:55 2005 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 8 15:01:57 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <4398B5D4.7060505@bohol.ph> References: <43988D96.5040307@bohol.ph> <4398B5D4.7060505@bohol.ph> Message-ID: On Thu, 8 Dec 2005, Jeroen Hellingman (Mailing List Account) wrote: > Michael Hart wrote: > >> >> So, what you are telling me is that the Philippines has moved from the half >> of the world without telephone service to the half that now has phone >> service. >> > Not exactly, as I don't have the figures. Just saying that phone coverage has > come within reach of many more people, and has done so very quickly, and I > believe the same trend is true in many countries. Cellphones are a booming > business all over Africa as well. Your 50% may hopefully soon be history. Yes, I received an email, which I may have mentioned earlier, from a cellphone user in the middle of the Serengeti Plain who used PG eBooks on that phone, but I would still have to say that most Africans have never made a phone call. I was HOPING the 50% thing might have changed by now, but the latest figures still show half the world would have to go out of their way to make a phone call, sad to say. >>> You can have pre-paid cards starting as low as 35 cents, and you can >>> actually send even smaller amounts of load by text message. A single >>> text message is 2 cents, and far more popular than actual phone calls, >>> which are very expensive at 15 cents a minute. >> >> >> However, you can probably get more communication done in that minute than >> in one text message and four replies, unless they let you send huge emails >> like some we've just been through. >> > Well, you and I can agree on that, but scarcity of money sometimes leads to > uneconomical decisions, as anybody who has been in a developing country can > tell from the tiny sachets of shampoo, soap, and other care products on sale on > every corner.
Never ask what they cost by the liter. In the meantime, the > Filipinos send on average 1.5 messages per day each; that is over 100 million > messages a day. Yes, I see a similar smallness of toothpaste tubes in Eastern Europe when I go there, so I always pack a few family-size tubes when I go, same with soap, and everything else. . .but food is not expensive, as they have to have food. "Never ask what they cost by the liter". . .cute, I'll save that one! >> Yes, I just mentioned something similar in a separate note, we will probably >> have to pay for those $100 laptops and give away so many of them that they >> become so ubiquitous that people don't steal them. >> > Well, just like cell phones, such gadgets will remain a target for thieves. > Personalize them, give them unique numbers that cannot be changed, and let > them advertise where they are when switched on via WiFi, and you could fairly > easily trace them back, just like cell-phones, if producers and providers would > cooperate. Of course, even if you traced them down, they might not give them back easily. "This food belongs to" [local warlord's name]. . .as seen on various reports from sites where the UN, etc., dropped food for famine relief. > >> >> Sorry, I must have missed something, couldn't you use computers just from >> the light from the monitors? >> > Ever tried to find the on switch in pitch dark? Never had any trouble doing that. . .harder to find the stranger keys, even in the light from the monitor. I used to work all night sometimes when others were sleeping. > I love my wife's village in the middle of nowhere, as there is no light > whatsoever to veil the stars. At least you can SEE the stars! Light pollution is SO rampant even in my podunk location that, except on the clearest nights, you can't see much of anything beyond perhaps the dozen or two brightest objects. I can remember standing out in this same street right here and watching Sputniks, Explorer, etc., go overhead, but I couldn't do that now.
I even saw Sputnik III come down, and I wasn't even looking for it, just happened to be out. It's a different world now. . .I can only see the Big Dipper on the clearest nights, and Cassiopeia isn't recognizable. > Jeroen. Michael From JBuck814366460 at aol.com Thu Dec 8 21:24:19 2005 From: JBuck814366460 at aol.com (Jared Buck) Date: Thu Dec 8 21:24:40 2005 Subject: site design additions (Re: [gutvol-d] Free Beer !) In-Reply-To: <43972817.6090308@perathoner.de> References: <434459D6.8070502@perathoner.de> <20051024174417.GA416@pglaf.org> <435E43CA.7010806@perathoner.de> <20051026015545.GA2346@pglaf.org> <435F9AD7.9080601@perathoner.de> <4360C55D.90401@perathoner.de> <20051207064516.GA23877@pglaf.org> <43972817.6090308@perathoner.de> Message-ID: <43991503.2010301@aol.com> Marcello Perathoner wrote on 07/12/2005, 10:21 AM: > What about a small writeup for the donations page that tells people what > we do with their money? That's not a bad idea, Marcello. People do like to know where their donations go to, and for what purpose. It's great that more people are donating, even if they only donate a small amount. Happy Holidays to everyone! Jared From Bowerbird at aol.com Fri Dec 9 04:25:14 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Dec 9 04:25:49 2005 Subject: [gutvol-d] developments over at distributed proofreaders Message-ID: <46.768cddd5.30cad1aa@aol.com> i'm seeing some encouraging developments over at d.p., the most notable being an admission that the system has not always used the time and energy of volunteers wisely. i also see that the false veneer of "we're all one happy family" has a few cracks in it now, which i take as a very healthy sign, because it always struck me scarily as a little bit too cult-like. but there's enough truth in it that volunteers will emerge from the currently choppy seas with commitment intact, i believe...
even though there are several suggestions that i could make, vis a vis volunteer energy, the big issue these days seems to be the number of rounds, and how those rounds are construed... rather than go over and post on the forums there, because hey, you guys don't really want _me_ over there right now, do you?, i'll just post a couple quick thoughts here instead... the p1.5 experiment and garweyne's diff tool are interesting, but um, you're really complicating the process unnecessarily. and there is a very simple solution. i've suggested it before, and another person has suggested a variant of it just recently... (sorry, can't remember exactly who, i think it was jhellingman.) garweyne's research on the diff tool in part seeks to determine whether a change that is made was a "significant" one. he has made some remarkable progress on capturing that information, really remarkable, but rock bottom, it's still a difficult judgment. more to the point, it's an unnecessary one. as long as _changes_ are still being made to a page, any changes, even "insignificant" ones, the page should be considered "in flux". it is only when a number of people (that number unspecified here) have looked at a page and determined "there's nothing here that needs to be changed" that we can really consider that page "done". (even then, it might still have errors. but that'll always be the case.) so, the question does _not_ boil down to "what changes were made, and are they significant?", a question that will be difficult to answer, it boils down to "were any changes made to the page by this person?". and _that_ question is dirt-simple to answer. if the text is the same, then no changes were made. no complex analysis is required at all. as long as changes are being made, not done. no changes, done. 
sure, sometimes you might temporarily cycle through a loop where one person does something, the next undoes it, the next re-does it, the next undoes it, but sooner or later, that pattern will be broken. (and you could jump out of it sooner, too, just by checking to see if the current version of the page matches the one that is _two_ back.) the beauty of this system is its simplicity. no complex analyses and no "rating" of the proofers (since that seems to be damping morale). just a simple method, with a foundation that is extremely intuitive... this has direct relevance to the "backlog" problems d.p. is facing too. that's because, if you keep sending the page back through p1 until two (let us say) proofers both view it without making any changes, and only _then_ is it sent to p2 (which is considered the "thin line". because responsibility is charged for any errors that survive _it_), then the _quantity_ of output from p1 will be curtailed somewhat, but the _quality_ of that p1 output will be considerably improved, to a point where you actually have great confidence in its accuracy. with less quantity, and better quality, pages will breeze through p2. for some pages -- easy ones, where the first proofer caught all -- that would mean only 3 p1 proofers would have to view the page. for difficult pages, it might take 6 or 10 proofers before it's solid. but so what? if that's what it takes, that's what it takes, and you'll feel when it finally gets out that you did exactly how much it took to get it _right_, no more, no less. and that's what you really want. all pages are not equally difficult; you need a variable methodology. any fixed-number-of-rounds system will result in too few rounds for the difficult pages, and too many rounds for the simple pages... and as far as i can tell from comments made, this type of system would _not_ be hard to implement in the actual code of the site... 
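The "no changes, done" rule above can be sketched in a few lines of Python. This is a minimal illustration, not DP code: it assumes each page keeps every saved version of its text in order (OCR first, then one entry per proofer), and it assumes two consecutive unchanged passes mean "done", since the message leaves that number open. The function name page_status and the "oscillating" label are hypothetical; the oscillation test is the "matches the one that is two back" check from the message.

```python
from typing import List


def page_status(versions: List[str], clean_passes_needed: int = 2) -> str:
    """Classify a page by its history of saved texts.

    versions[0] is the OCR text; versions[i] is the text after proofer i.
    clean_passes_needed = 2 is an assumed threshold, not a DP setting.
    """
    # Count how many of the most recent proofers saved the page unchanged.
    clean = 0
    i = len(versions) - 1
    while i > 0 and versions[i] == versions[i - 1]:
        clean += 1
        i -= 1
    if clean >= clean_passes_needed:
        return "done"

    # Undo/redo loop: the latest text equals the one two versions back,
    # i.e. this proofer reverted the previous proofer's change.
    if len(versions) >= 3 and versions[-1] == versions[-3] and versions[-1] != versions[-2]:
        return "oscillating"  # hypothetical label: flag it for a human to settle

    return "in flux"
```

For the easy case described above, ["ocr", "fixed", "fixed", "fixed"] (three proofers, the last two changing nothing) comes back "done", while ["ocr", "a", "ocr"] comes back "oscillating". The only comparison needed is plain string equality, which is the simplicity the message argues for.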
there are all kinds of things you could add to this that would make it even more useful -- like automated diff feedback as learning guide for proofers whenever their page was subjected to a further change, or ranking of proofers based on the percentage of pages they did that were touched/untouched by later proofers -- but since the important point here is _simplicity_, i'll not bother to discuss those in any detail. try it! you will quickly see that it works, and works well! anyway, as always, that's my advice, take it or leave it, as you see fit. but i get the impression that y'all really want to fix things now, and that you are getting bogged in the difficulty of a certain approach, when -- from my removed perspective -- i can see that a way that is much simpler will actually _better_ solve your various problems... best of luck. thanks for proofing! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051209/95490982/attachment.html From walter.van.holst at xs4all.nl Fri Dec 9 05:22:27 2005 From: walter.van.holst at xs4all.nl (Walter van Holst) Date: Fri Dec 9 05:56:46 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <43988D96.5040307@bohol.ph> References: <43988D96.5040307@bohol.ph> Message-ID: <43998513.8090002@xs4all.nl> Jeroen Hellingman (Mailing List Account) wrote: > > In five years time, I've seen telephone coverage in my wife's home > island in the Philippines raise from almost nothing (the nearest phone > was a six kilometer walk from her home; at the beach you had to give > them one day notice if you wanted to pay by creditcard, It is the same in Africa. My sister in Nigeria used to need a four-hour trip by motorbike to get to the nearest phone (about five years ago). Now she has a satellite phone and GSM coverage starts less than 15 km from where she lives. Which means that she probably will have GSM coverage somewhere next year. 
Oh, and by the way, she is in need of affordable science and math textbooks for teaching purposes. Having Shakespeare is all nice and dandy, but literature is not the foremost education priority over there. Last time I checked there weren't any available through Gutenberg at the pretty basic level they are needed. Regards, Walter From joshua at hutchinson.net Fri Dec 9 06:07:07 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Dec 9 06:07:10 2005 Subject: [gutvol-d] broadband penetration Message-ID: <20051209140707.C50B4EE2DC@ws6-1.us4.outblaze.com> ----- Original Message ----- From: "Walter van Holst" > Oh, and by the way, she is in need of affordable science and math textbooks > for teaching purposes. Having Shakespeare is all nice and dandy, but > literature is not the foremost education priority over there. Last time I > checked there weren't any available through Gutenberg at the pretty basic > level they are needed. > The problem is that these textbooks need to be fairly recent. Science has changed too much since 1922 for any science textbook to be worth the effort to use in a classroom. Honestly, I'd point your sister to Wikibooks (http://en.wikibooks.org/wiki/Main_Page). I don't know that any of the textbooks there would cover what she's looking for, but it has a better chance than anything we'd be able to do in PG. Josh From jon.ingram at gmail.com Fri Dec 9 08:08:36 2005 From: jon.ingram at gmail.com (Jon Ingram) Date: Fri Dec 9 08:08:46 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <20051209140707.C50B4EE2DC@ws6-1.us4.outblaze.com> References: <20051209140707.C50B4EE2DC@ws6-1.us4.outblaze.com> Message-ID: <4baf53720512090808i31907063w56f6428afb0a5c3@mail.gmail.com> On 12/9/05, Joshua Hutchinson wrote: > > ----- Original Message ----- > From: "Walter van Holst" > > > Oh, and by the way, she is in need of affordable science and math textbooks > > for teaching purposes. 
Having Shakespeare is all nice and dandy, but > > literature is not the foremost education priority over there. Last time I > > checked there weren't any available through Gutenberg at the pretty basic > > level they are needed. > > > > The problem is that these textbooks need to be fairly recent. Science has changed too much since 1922 for any science textbook to be worth the effort to use in a classroom. On the other hand, there are many pre-1923 maths texts, particularly at a basic level, which would be very useful. Away from pure pedagogy, we do already have a very interesting and very popular maths text in PG already -- http://www.gutenberg.org/etext/16713 Amusements in Mathematics by Henry Dudeney. It's consistently in the top 20 books downloaded from the main PG site, and has been wonderfully processed by DP (bias note: I scanned it :) ). The thing that continually surprises me is that we don't have an edition of Euclid's Elements in PG. I believe DP is working on Latex-ing a student edition of the first six books, but that's quite slow work. -- Jon Ingram From Bowerbird at aol.com Fri Dec 9 09:33:10 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Dec 9 09:33:23 2005 Subject: [gutvol-d] broadband penetration Message-ID: <13d.20dc262f.30cb19d6@aol.com> walter said: > Last time I checked there weren't any > available through Gutenberg at the > pretty basic level they are needed. open-source textbooks are an idea whose time has come. there are now various entities gearing up to pursue them. i'd suggest that you take a good look around cyberspace. even if they aren't here yet -- and i don't think they are -- they will be soon, perhaps very soon. best of luck to you! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051209/ba47d928/attachment.html From lee at novomail.net Fri Dec 9 09:43:48 2005 From: lee at novomail.net (Lee Passey) Date: Fri Dec 9 09:43:52 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <43998513.8090002@xs4all.nl> References: <43988D96.5040307@bohol.ph> <43998513.8090002@xs4all.nl> Message-ID: <4399C254.6050205@novomail.net> Walter van Holst wrote: > Oh, and by the way, she is in need of affordable science and math > textbooks for teaching purposes. Having Shakespeare is all nice and > dandy, but literature is not the foremost education priority over > there. Last time I checked there weren't any available through > Gutenberg at the pretty basic level they are needed. > > Regards, > > Walter You might want to look at http://textbookrevolution.org/ From hart at pglaf.org Fri Dec 9 09:51:24 2005 From: hart at pglaf.org (Michael Hart) Date: Fri Dec 9 09:51:25 2005 Subject: [gutvol-d] broadband penetration In-Reply-To: <4399C254.6050205@novomail.net> References: <43988D96.5040307@bohol.ph> <43998513.8090002@xs4all.nl> <4399C254.6050205@novomail.net> Message-ID: > Walter van Holst wrote: > >> Oh, and by the way, she is in need of affordable science and math >> textbooks for teaching purposes. Having Shakespeare is all nice and dandy, >> but literature is not the foremost education priority over there. Last >> time I checked there weren't any available through Gutenberg at the pretty >> basic level they are needed. >> >> Regards, >> >> Walter Most math and science books for teaching purposes are new enough to fall under what now appears to be perpetual copyright, but obvious exceptions would be things such as Einstein's earlier works, Leonardo da Vinci, and the ancient Greeks etc. mh From Gutenberg9443 at aol.com Sun Dec 11 12:45:20 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Dec 11 12:45:36 2005 Subject: site design additions (Re: [gutvol-d] Free Beer !) 
Message-ID: <126.6b68f785.30cde9e0@aol.com> In a message dated 12/8/2005 10:24:38 P.M. Mountain Standard Time, JBuck814366460@aol.com writes: Marcello Perathoner wrote on 07/12/2005, 10:21 AM: > What about a small writeup for the donations page that tells people what > we do with their money? That's not a bad idea, Marcello. People do like to know where their donations go to, and for what purpose. It's great that more people are donating, even if they only donate a small amount. T has mentioned this several times. He's asleep right now (Nyquil), so I've emailed him this in case he didn't spot it himself. He can probably turn it out tmrw. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051211/213bb3e1/attachment.html From Gutenberg9443 at aol.com Sun Dec 11 12:54:35 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Dec 11 12:54:50 2005 Subject: [gutvol-d] Donations to Project Gutenberg Message-ID: <8c.33a61f9b.30cdec0b@aol.com> In a message dated 12/7/2005 12:26:46 A.M. Mountain Standard Time, sly@victoria.tc.ca writes: > Also, the more prominent placing of the DONATE link *has* resulted > in noticeably more donations via PayPal. Most donations are smallish > ($5-10), but sprinkled with $50 and $100 donations. This really helps > to support our DVD/CD giveaways, book-buying, and other programs. That is very good news. Thanks for sharing. That's what we're seeing here--small checks (one woman sent $4 in cash) sprinkled with larger donations--one a few weeks ago for $1000. We're also receiving letters containing both a donation and a request that we not sell their name to mailing lists or send the donors endless donation requests. We assure people that we do not do this.
It might be a very good idea to make that promise again in the write-up, because we might be missing donations from people who would donate if they knew they would not be put on anybody's sucker list. We do keep careful records, because "how much from which state" is required in some states and occasionally by the IRS, but we definitely do not share names and addresses with ANYBODY. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051211/5dc55795/attachment.html From Gutenberg9443 at aol.com Sun Dec 11 13:05:11 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Dec 11 13:05:28 2005 Subject: [gutvol-d] broadband penetration Message-ID: <1c5.364f11f8.30cdee87@aol.com> In a message dated 12/8/2005 3:38:17 P.M. Mountain Standard Time, jeroen.mailinglist@bohol.ph writes: > > Sorry, I must have missed something, couldn't you use computers just from > the light from the monitors? > Ever tried to find the on switch in pitch dark? I love my wife's village in the middle of nowhere, as there is no light whatsoever to veil the stars. Are we talking about running on replaceable batteries, or about solar power recharging the batteries? Does anybody have any idea whether solar-powered computer or ebook batteries are on the horizon? I also must have missed something. Is someone saying that the light from the monitor could somehow power the entire computer? That sounds like perpetual motion machines to me, and so far nobody has had any luck inventing one. Anne -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051211/56ed11fa/attachment.html From jon at noring.name Sun Dec 11 13:21:25 2005 From: jon at noring.name (Jon Noring) Date: Sun Dec 11 13:21:41 2005 Subject: [gutvol-d] Wall Street Journal Online Interview with Michael Hart Message-ID: <84845268.20051211142125@noring.name> [Not sure if this was mentioned on gutvol-d. If so, forgive the duplication.] The Wall Street Journal Online has an interview with Michael Hart: http://online.wsj.com/public/article/SB113415403113218620-U_OqLOmApoaSvNpy5SjNwvhpW5w_20061209.html?mod=tff_main_tff_top Jon Noring From gbnewby at pglaf.org Mon Dec 12 00:27:57 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Dec 12 00:28:02 2005 Subject: site design additions (Re: [gutvol-d] Free Beer !) In-Reply-To: <43991503.2010301@aol.com> References: <434459D6.8070502@perathoner.de> <20051024174417.GA416@pglaf.org> <435E43CA.7010806@perathoner.de> <20051026015545.GA2346@pglaf.org> <435F9AD7.9080601@perathoner.de> <4360C55D.90401@perathoner.de> <20051207064516.GA23877@pglaf.org> <43972817.6090308@perathoner.de> <43991503.2010301@aol.com> Message-ID: <20051212082757.GD12598@pglaf.org> On Thu, Dec 08, 2005 at 09:24:19PM -0800, Jared Buck wrote: > Marcello Perathoner wrote on 07/12/2005, 10:21 AM: > > > What about a small writeup for the donations page that tells people what > > we do with their money? > > That's not a bad idea, Marcello. People do like to know where their > donations go to, and for what purpose. It's great that more people are > donating, even if they only donate a small amount. Sure.... we do an annual balance sheet, but it's not that inspiring or informative. 
I'd list these items as "typical expenses:" - paying our 1/4-time office staff - sustaining our CD & DVD giveaways, by reimbursing volunteers for supplies and postal expenses - colocation and occasional expansion of our servers (pglaf.org and pgdp.net) - costs of doing business, including an annual accountant's fiscal review (we don't have enough money to require a real audit), not-for-profit registration in states that require it, and a few other federal and state bureaucratic necessities - reimbursing individuals for some books, nearly all of which go to Distributed Proofreaders. Our main rule is to not pay more than about $1 per printed book. Many individuals donate books without seeking reimbursement, too Some of the expenses we don't incur are: - legal expenses, other than our accountant. Our lawyers are volunteers - travel expenses, except rarely. Instead, volunteers work wherever they are, and we communicate via email - fundraising expenses, including purchasing mailing lists, sending giveaways, and so forth. We hope to have some sort of t-shirt or coffee mug that people can buy, someday, but keep devoting most of our efforts to making new eBooks - copyright research. We do all our own, using volunteer copyright experts - software, other than occasionally helping an active volunteer with OCR software. Our Web sites and content rely on home-grown and free open source software - office space and equipment. 
We're decentralized, and own very little equipment: a few page-fed scanners, a few servers, and not much else - board of directors' salaries: they're all volunteers, too From nwolcott at dsdial.net Sun Dec 11 21:33:02 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Mon Dec 12 18:13:28 2005 Subject: [gutvol-d] Google print References: <005601c5f95e$61761da0$da9495ce@gw98><4baf53720512050130t1090c657i33a763f2be357e21@mail.gmail.com> <004a01c5f9eb$a1866f20$ee9495ce@gw98> Message-ID: <012901c5ff8a$9ac504e0$d79495ce@gw98> A little more fiddling around with google print. Although most of the images seem to be png files, one book had very dark jpeg files. So there is not perfect consistency. If you find a bad page there is a drop-down box to check off the salient errors. Whether they do anything with that is of course another question. Another feature is the "search in this document" box. If you clear the box and put in, say, "243" and search, page 243 will come up. Google has OCR'ed the book to the point of this text finding. The OCR version is not visible, however. Similarly, IX would find chapter IX, etc. Other than this you can only go forward and back a page at a time. Norm Wolcott nwolcott2@post.harvard.edu ----- Original Message ----- From: "N Wolcott" To: "Project Gutenberg Volunteer Discussion" Sent: Monday, December 05, 2005 5:26 PM Subject: Re: [gutvol-d] Google print > I found that I could search for an author (Verne) using the Advanced Search > on books.google.com and choosing dates 1850-1920. Got 5 hits including > Mathias Sandorf. Images seemed to be 72 dpi png, but they were in color, so > I imagine suitable smoothing etc could modify them to 150 b/w. I believe > Capio does such things. Still lots of work. I'll stick with Brewster for a > while. I don't know if ABBYY OCR takes advantage of the color information. > I'll try one of the scripts for a test. Thanks.
> Norm Wolcott nwolcott2@post.harvard.edu > ----- Original Message ----- > From: "Jon Ingram" > To: "Project Gutenberg Volunteer Discussion" > Sent: Monday, December 05, 2005 4:30 AM > Subject: Re: [gutvol-d] Google print > > > > On 12/5/05, N Wolcott wrote: > > > Now that the dust has cleared, can us proles have the final info-- can > one > > > download scans from google print, what is the best way, are they holding > > > back, what is the resoloution, can one search (other than for the dejavu > > > image), etc etc. Are there P2P networks to share stuff cribbed from > google. > > > Talking about PD images of course. > > > > Google does not use dejavu -- that's the Internet Archive. Google > > presents a fairly small jpeg image for each page of the book. There's > > no fixed resolution, but instead a fixed width. This means that the > > images generated from small books are relatively easy to OCR (and > > equate to around 100 dpi), while the images from books with large > > pages are hard for even humans to read. Those of you who can get > > access to Google Print should be able to download these 'web > > resolution' images from them just by right-clicking and saving. As far > > as I know there's no way to access the higher resolution images they > > must have made when they originally scanned the material; nor is there > > any way to access the OCRed text they use for searching purposes. > > > > Google provided no mechanism to download all the images for a book. > > You'll have to roll your own download script, or use one of the > > scripts written by others, such as the perl script gharvest, available > > from > > http://www.zuhause.org/dp/gharvest > > Google also provides no index to the material they have scanned. > > Several people have generated one by the crude means of searching for > > many different phrases, and storing the results. 
> > The most extensive > > list is probably also Bruce's, available from > > http://www.zuhause.org/dp/gfound1.html > > I've used this as a basis for a page showing the DP harvesting status > > of the material: > > http://homepage.ntlworld.com/jenjonliz/jon/tia/google.html > > > > -- > > Jon Ingram > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Mon Dec 12 20:08:23 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Dec 12 20:08:49 2005 Subject: [gutvol-d] Google print Message-ID: <274.221f971.30cfa337@aol.com> norm said: > Other than this you can only > go forward and back a page at a time. take a closer look at each u.r.l., norm, and you'll see how to jump to any page. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051212/d3171238/attachment.html From bruce at zuhause.org Mon Dec 12 21:03:30 2005 From: bruce at zuhause.org (Bruce Albrecht) Date: Mon Dec 12 21:03:48 2005 Subject: [gutvol-d] Google print In-Reply-To: <012901c5ff8a$9ac504e0$d79495ce@gw98> References: <005601c5f95e$61761da0$da9495ce@gw98> <4baf53720512050130t1090c657i33a763f2be357e21@mail.gmail.com> <004a01c5f9eb$a1866f20$ee9495ce@gw98> <012901c5ff8a$9ac504e0$d79495ce@gw98> Message-ID: <17310.22050.537065.271069@celery.zuhause.org> N Wolcott writes: > A little more fiddling around with google print. Although most of the > images seem to be png files, one book had very dark jpeg files. I've looked at images from 10-20 books, and as far as I can tell, they're all jpeg images. I've never seen png files from Google Books.
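Whether a saved page is really PNG or JPEG can be settled from the file's first few bytes, regardless of its extension or how the browser named it when saving. A minimal Python sketch, assuming the page images have already been saved to disk; the function names are hypothetical, and the signatures checked are the standard PNG header (89 50 4E 47 0D 0A 1A 0A) and the JPEG start-of-image marker (FF D8) followed by the FF of the next marker.

```python
def sniff_image_type(header: bytes) -> str:
    """Guess an image format from its leading signature ("magic") bytes."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):  # 8-byte PNG signature
        return "png"
    if header.startswith(b"\xff\xd8\xff"):  # JPEG SOI marker, then the next marker's FF
        return "jpeg"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a saved page image to identify its format."""
    with open(path, "rb") as f:
        return sniff_image_type(f.read(8))
```

Running sniff_file over every page saved from one book would show directly whether that book's images are all one format or a mix.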
From gbnewby at pglaf.org Wed Dec 14 13:04:10 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Dec 14 13:04:13 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: <20051214210410.GD2015@pglaf.org> Has anyone checked these ones out before? -- Greg -------------- next part -------------- An embedded message was scrubbed... From: Armando Villena Morera Subject: Cervantes Books Date: Mon, 12 Dec 2005 12:25:43 +0100 (CET) Size: 3509 Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051214/9dab5b88/attachment.mht From grythumn at gmail.com Wed Dec 14 13:19:01 2005 From: grythumn at gmail.com (Robert Cicconetti) Date: Wed Dec 14 13:25:22 2005 Subject: [gutvol-d] Fwd: Cervantes Books In-Reply-To: <20051214210410.GD2015@pglaf.org> References: <20051214210410.GD2015@pglaf.org> Message-ID: <15cfa2a50512141319g6fcdded1n59179318c43ec8b2@mail.gmail.com> I spent four or five hours (about a week ago) looking for page images of Spanish fiction, focusing on Cervantes.. aside from several copies of Don Quixote, I didn't find anything. Granted, my Spanish is very poor, but I did check all of the usual archives.. some hefty non-fiction, some French translations, and a number of text-only copies, but very little in the way of usable page images online. Lots of books on various mineral baths, religious tomes, and similar works, but little in the way of lighter works. If anyone can point us towards some page images, there was a thread on DP concerning a shortage of lighter Spanish works... I believe they moved the single waiting fiction work out of the queues and into the rounds. I'm particularly interested in getting the original Spanish version of Novelas Ejemplares, and perhaps Comedias y Entremeses*. *(Some of which are available locally, but not in circulation.) If someone else has easier access, by all means feel free.. that's perhaps 8 volumes right there. :) AFAIK, we don't simply repackage existing text-only copies available on the web.
R C ---------- Forwarded message ---------- From: Greg Newby Date: Dec 14, 2005 4:04 PM Subject: [gutvol-d] Fwd: Cervantes Books To: gutvol-d@lists.pglaf.org Has anyone checked these ones out before? -- Greg ---------- Forwarded message ---------- From: Armando Villena Morera To: news@pglaf.org Date: Mon, 12 Dec 2005 12:25:43 +0100 (CET) Subject: Cervantes Books Hello. It's a great pleasure finding someone like you, trying to give everybody the opportunity to reach great literature for free. I don't know whether this is the right mail address to send you this information or not but I think you'll find it useful. There's a LEGAL Spanish site where you can find all of Cervantes's books in TXT format for free. It's http://cervantes.uah.es/obras.htm Maybe you can use it for your library. Cheers -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051214/94141eee/attachment.html From joshua at hutchinson.net Wed Dec 14 13:46:57 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Dec 14 13:47:07 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: <20051214214657.42003109B33@ws6-4.us4.outblaze.com> ----- Original Message ----- From: "Robert Cicconetti" > AFAIK, we don't simply repackage existing text-only copies available on the > web. > > R C Actually we do and have, Robert. Ok, DP doesn't, but PG volunteers do. We even have a name for it. Proofraiding. ;) The hard part is making sure it is well proofed and to our formatting standards and that it is clearable by our standards.
Josh (JHutch at DP) From Bowerbird at aol.com Wed Dec 14 14:57:10 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 14 14:57:30 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: <28d.24b295a.30d1fd46@aol.com> robert said: > AFAIK, we don't simply repackage > existing text-only copies > available on the web. i'm sure robert means "distributed proofreaders" rather than "project gutenberg" when he says "we". :+) that's why he's looking for page-scans of cervantes' books. because i would think that someone at project gutenberg would be quite happy to "repackage" offered .txt books... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051214/831b6e0b/attachment.html From grythumn at gmail.com Wed Dec 14 15:00:46 2005 From: grythumn at gmail.com (Robert Cicconetti) Date: Wed Dec 14 15:07:46 2005 Subject: [gutvol-d] Fwd: Cervantes Books In-Reply-To: <20051214214657.42003109B33@ws6-4.us4.outblaze.com> References: <20051214214657.42003109B33@ws6-4.us4.outblaze.com> Message-ID: <15cfa2a50512141500y6a8374b1w57814401a10fb1cc@mail.gmail.com> Generally, proofraiding (or the new PC term, harvesting) refers to grabbing page images (and optionally text, but it's usually mediocre-to-poor raw OCR). Grabbing only text is seldom worth it.. nothing to compare it against. Blind format conversions are discouraged unless you have access to the original book. And as you say, clearing a text-only is more difficult. IIRC, it requires access to a paper copy and doing a fairly lengthy comparison.. only worthwhile IMO if the text is very clean and/or OCRs particularly poorly. (To be honest, I had forgotten this option when I wrote the previous post). Back to the gist of my question.. Does anyone know of image archives of spanish fiction? 
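The "fairly lengthy comparison" that makes a text-only copy usable can be partly mechanized: line the harvested text up against an independent transcription of the same edition and list only the places where the two disagree. A minimal Python sketch using the standard library's difflib. It compares word by word, so differences in line wrapping or spacing don't register as changes. The function name disagreements and the sample strings are hypothetical illustrations, and a human still has to decide which side is right at each flagged spot.

```python
import difflib
from typing import List, Tuple


def disagreements(text_a: str, text_b: str) -> List[Tuple[str, str]]:
    """List the (a, b) word runs where two versions of a text differ."""
    # Splitting on whitespace means line breaks and spacing are ignored.
    a_words = text_a.split()
    b_words = text_b.split()
    # autojunk=False: don't let difflib's popularity heuristic ignore
    # very frequent words (like "de" or "la") in long texts.
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            result.append((" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2])))
    return result


# Hypothetical example: a harvested line vs. a fresh proofed transcription
# that wraps differently and contains one OCR slip.
harvested = "En un lugar de la Mancha, de cuyo nombre no quiero acordarme"
proofed = "En un lugar de la Mancba, de cuyo\nnombre no quiero acordarme"
for a, b in disagreements(harvested, proofed):
    print(f"{a!r} vs {b!r}")
```

Here the only disagreement reported is the pair 'Mancha,' and 'Mancba,'. The line break in the second version is ignored. Spots where both versions agree are never shown, which is what keeps the manual part of the check short.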
On 12/14/05, Joshua Hutchinson wrote: > > > ----- Original Message ----- > From: "Robert Cicconetti" < grythumn@gmail.com> > > > AFAIK, we don't simply repackage existing text-only copies available on > the > > web. > > > > R C > > > Actually we do and have, Robert. Ok, DP doesn't, but PG volunteers > do. We even have a name for it. Proofraiding. ;) > > The hard part is making sure it is well proofed and to our formatting > standards and that it is clearable by our standards. > > Josh > (JHutch at DP) > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Bowerbird at aol.com Wed Dec 14 15:17:25 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 14 15:17:43 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: <212.fb95186.30d20205@aol.com> robert said: > Grabbing only text is seldom worth it.. > nothing to compare it against. um, i would disagree. it is your job to create something to compare it against. if you can't find page-scans, then do the scans yourself, and run those scans through distributed proofreaders. then compare the d.p. output against that other text. (because this is the goal, you'll want to make sure the p-book you scan is the same edition as the other text.) this is absolutely the best way to obtain accurate text, by comparing two independently-produced versions... and yes, i know y'all know that. as we speak, even, a version of "the secret garden" is going through this. it just seemed like this little reminder was being called for. -bowerbird
From sly at victoria.tc.ca Wed Dec 14 18:47:48 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Dec 14 18:48:05 2005 Subject: [gutvol-d] Fwd: Cervantes Books In-Reply-To: <28d.24b295a.30d1fd46@aol.com> References: <28d.24b295a.30d1fd46@aol.com> Message-ID: On Wed, 14 Dec 2005 Bowerbird@aol.com wrote: > robert said: > > AFAIK, we don't simply repackage > > existing text-only copies > > available on the web. > > because i would think that someone at project gutenberg > would be quite happy to "repackage" offered .txt books... > Interesting conversation. Yes, there certainly are texts that are "harvested" from other sites in plain text, or html, etc. format and then proofed and reformatted for PG. I know this as most of the texts I have contributed to PG have been of this type. Altogether, in time and effort, it takes less than creating a new ebook from scratch. For more information about this process see: "I've found an eligible text elsewhere on the Net, but it's not in the PG archives. Can I just submit it to PG?" http://www.gutenberg.org/faq/V-62 As Josh said: "The hard part is making sure it is well proofed and to our formatting standards and that it is clearable by our standards." I have access to a decent-sized university library, so I am able to get clearances for many titles which might otherwise be quite difficult to track down. Currently I have "in process" a couple of lesser-known shorter works by the French-Canadian Laure Conan. These are being adapted from a site that has a collection of about 20-30 French-Canadian texts. As for bb's comment that: "i would think that someone at project gutenberg would be quite happy to "repackage" offered .txt books." that is not quite accurate.
I have an ever increasing list of web sites with full texts of books that could be cleared for PG, but nothing like the time and enthusiasm that would be needed to work through them. (which does need to be done by someone familiar with PG formatting conventions) Doing picky reformatting and proofing of this type is not as "glamorous" as the distributed proofing process, and understandably will not attract as many people. Most recently on this topic, I received an email saying this: "I have been making Mary E. Wilkins (Freeman) stories available on my website, and have completed proof-reading her novels. At one point, you had an interest in making them available on Project Gutenberg. I am still amenable to the thought." http://home.comcast.net/~jkaylin/jeff/Book/Book.htm Although it may take quite a while before I am able to get around to it... Anyone else interested in this? Andrew From joshua at hutchinson.net Thu Dec 15 04:25:12 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 15 04:40:03 2005 Subject: [gutvol-d] Fwd: Cervantes Books In-Reply-To: <15cfa2a50512141500y6a8374b1w57814401a10fb1cc@mail.gmail.com> References: <20051214214657.42003109B33@ws6-4.us4.outblaze.com> <15cfa2a50512141500y6a8374b1w57814401a10fb1cc@mail.gmail.com> Message-ID: <43A160A8.50202@hutchinson.net> Robert Cicconetti wrote: > Generally, proofraiding (or the new PC term, harvesting) refers to > grabbing page images (and optionally text, but it's usually > mediocre-to-poor raw OCR). Grabbing only text is seldom worth it.. > nothing to compare it against. > > Blind format conversions are discouraged unless you have access to the > original book. And as you say, clearing a text-only is more difficult. > IIRC, it requires access to a paper copy and doing a fairly lengthy > comparison.. only worthwhile IMO if the text is very clean and/or OCRs > particularly poorly. (To be honest, I had forgotten this option when I > wrote the previous post). 
Sorry to contradict you again, Robert, but not only do we do proofraiding (and proofraiding refers to harvesting pre-existing text, imageraiding or harvesting traditionally refers to grabbing pre-existing images ... I've done lots and lots of both) ... not only do we do it, I'm in the middle of a proofraid right now. In the last 2 months I've posted about 10 books so far from the Baha'i Reference Library. Other than format conversion and running GutCheck, I haven't done much else with them. Josh From grythumn at gmail.com Thu Dec 15 08:29:38 2005 From: grythumn at gmail.com (Robert Cicconetti) Date: Thu Dec 15 08:29:45 2005 Subject: [gutvol-d] Fwd: Cervantes Books In-Reply-To: <43A160A8.50202@hutchinson.net> References: <20051214214657.42003109B33@ws6-4.us4.outblaze.com> <15cfa2a50512141500y6a8374b1w57814401a10fb1cc@mail.gmail.com> <43A160A8.50202@hutchinson.net> Message-ID: <15cfa2a50512150829j7c6a3c67na68c35b875466d3b@mail.gmail.com> On 12/15/05, Joshua Hutchinson wrote: > > Robert Cicconetti wrote: > > > Generally, proofraiding (or the new PC term, harvesting) refers to > > grabbing page images (and optionally text, but it's usually > > mediocre-to-poor raw OCR). Grabbing only text is seldom worth it.. > > nothing to compare it against. > > > > Blind format conversions are discouraged unless you have access to the > > original book. And as you say, clearing a text-only is more difficult. > > IIRC, it requires access to a paper copy and doing a fairly lengthy > > comparison.. only worthwhile IMO if the text is very clean and/or OCRs > > particularly poorly. (To be honest, I had forgotten this option when I > > wrote the previous post). > > > Sorry to contradict you again, Robert, but not only do we do > proofraiding (and proofraiding refers to harvesting pre-existing text, > imageraiding or harvesting traditionally refers to grabbing pre-existing > images ... I've done lots and lots of both) ... not only do we do it, > I'm in the middle of a proofraid right now.
In the last 2 months I've > posted about 10 books so far from the Baha'i Reference Library. Other > than format conversion and running GutCheck, I haven't done much else > with them. > Shrug. Okay, so I'm wrong again. I was told the term 'proofraiding' was discouraged because it is not PC.. not because it refers to a different form of using preexisting resources. It certainly was used to describe grabbing previously scanned images, not text. On another note, how do you clear these texts without a physical copy (or page images) and the relatively lengthy comparison described here: http://www.gutenberg.org/howto/cconfirm-howto R C From joshua at hutchinson.net Thu Dec 15 09:23:21 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 15 09:23:27 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: <20051215172321.B42849E896@ws6-2.us4.outblaze.com> ----- Original Message ----- From: "Robert Cicconetti" > On another note, how do you clear these texts without a physical copy (or > page images) and the relatively lengthy comparison described here: > > http://www.gutenberg.org/howto/cconfirm-howto > In my case, I'm pulling texts from the actual copyright owners who gave us permission to do so. In other cases, sometimes the original site made the original scans for the title page available. In the hardest cases, someone had to do the legwork described in your link.
Josh From hart at pglaf.org Thu Dec 15 12:05:24 2005 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 15 12:05:26 2005 Subject: [gutvol-d] Fwd: Cervantes Books In-Reply-To: <15cfa2a50512141500y6a8374b1w57814401a10fb1cc@mail.gmail.com> References: <20051214214657.42003109B33@ws6-4.us4.outblaze.com> <15cfa2a50512141500y6a8374b1w57814401a10fb1cc@mail.gmail.com> Message-ID: On Wed, 14 Dec 2005, Robert Cicconetti wrote: > Generally, proofraiding (or the new PC term, harvesting) Sorry, PG has been using the term "harvesting" for at least as long as the world has been becoming PC. _I_ might have been somewhat responsible for the term raiding, as I used to call the harvesters "The Raiders of the Lost Art" ;-) Happy Holidays! Give eBooks!!! Michael S. Hart Founder Project Gutenberg From joshua at hutchinson.net Thu Dec 15 12:07:31 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 15 12:07:43 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: <20051215200731.60C9E9E9A1@ws6-2.us4.outblaze.com> Ok, it's official ... We need a PG/DP jargon file! ----- Original Message ----- From: "Michael Hart" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] Fwd: Cervantes Books Date: Thu, 15 Dec 2005 12:05:24 -0800 (PST) > > > > On Wed, 14 Dec 2005, Robert Cicconetti wrote: > > > Generally, proofraiding (or the new PC term, harvesting) > > Sorry, PG has been using the term "harvesting" for at least as long > as the world has been becoming PC. > > _I_ might have been somewhat responsible for the term raiding, > as I used to call the harvesters "The Raiders of the Lost Art" > > ;-) > > Happy Holidays! > > Give eBooks!!! > > Michael S.
Hart > Founder > Project Gutenberg > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Gutenberg9443 at aol.com Thu Dec 15 12:28:18 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Thu Dec 15 12:28:36 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: <1f2.17a47b44.30d32be2@aol.com> In a message dated 12/14/2005 7:48:03 P.M. Mountain Standard Time, sly@victoria.tc.ca writes: "I have been making Mary E. Wilkins (Freeman) stories available on my website, and have completed proof-reading her novels. At one point, you had an interest in making them available on Project Gutenberg. I am still amenable to the thought." http://home.comcast.net/~jkaylin/jeff/Book/Book.htm Although it may take quite a while before I am able to get around to it... Anyone else interested in this? I went and looked at it. I find it extremely interesting, though like you, I don't know when I could get to it. As to the rest of the thread, concerning Cervantes and so forth, I'll probably bring down the wrath of the mailing list on my head, but may I point out that we specifically say that we do not depend on a single edition or whatever. In reality, where a lot of scholarly interest is likely, we DO have to depend on a single edition, although the MLA rules on documentation of Web sources make it unnecessary to retain page numbers from the specific edition. (Am I the only one who keeps track of changes in MLA documentation requirements?) What drives me to wanting to spit and bite is that our front matter usually fails to tell us when the book was written and/or published. I go to the LoC, and if I don't find it there I scream and bang my head against the wall for a while. (Not really. But I feel like it. Several times I really have cried from sheer frustration after looking in a few more places.) 
But as far as proofing goes, unless there is scholarly interest, just proofing so that the text makes sense should be adequate, as long as the proofreader is conscious of grammar, mechanics, and the changes that time and geography bring to correctness. In other words, I flatter myself that as I am a good writer and a good grammarian, and I have read and enjoyed texts from the last 600 years of writing in the English language, my version is re-edited rather than proofread, and should therefore be acceptable for all purposes other than scholarly interest. (I am aware that this paragraph does not make much sense. I am ill today and would neither write nor proofread anything for publication.) When I proofed MADAME DUBARRY I was aware that there might be scholarly interest, and therefore I followed the text exactly except for page numbers, inserting text notes where they might be needed for clarity. I already knew that the MLA does not need page number documentation when a person is quoting from a Web source, and I don't know enough about any other documentation system to know what it needs. But when I created the PG version of SWISS FAMILY ROBINSON I used five different public domain translations, and my own brain, to create the best reading text I could make. Someone asked me, in effect, why I had not created a variorum. I didn't create a variorum because (a) I was creating a reading text, not a scholarly text; and (b) because I have not the mental, physical, or financial resources to create a variorum, particularly considering that the book's length was tripled by its first translator into French; and (c) because I didn't think the variorum would be worth doing anyway. (This does not mean I disapprove of variorum editions. I was fortunate enough to have repeatedly read variorum editions of Emily Dickinson and the Rubaiyat in late childhood, with the result that I often memorized the "wrong" version of a poem.)
But we must decide, for any book, what we are creating: a scholarly edition or a reading edition. I am aware that a good many useful people prefer scholarly editions, and have scanned and proofed them. I too am a scholar. But I consider reading editions very useful, and far more important for me to do, because as a writer of popular fiction I grok the needs of the genre. None of us would be likely to have reached the status of scholar without first having read a whole lot of reading editions. In fact there IS some scholarly interest in Mary E. Wilkins, so we would have to do one of two things: (1) Either re-proof everything according to the text, which is probably not even possible without spending several years on it; or (2) re-edit it and post in the front matter a warning that this is a reading edition ONLY. Now I will pull in my soap box and go away. Anne From Gutenberg9443 at aol.com Thu Dec 15 12:30:38 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Thu Dec 15 12:30:49 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: <1ad.45406f47.30d32c6e@aol.com> In a message dated 12/15/2005 1:07:57 P.M. Mountain Standard Time, joshua@hutchinson.net writes: Ok, it's official ... We need a PG/DP jargon file! Thank you for volunteering. I LIKE "Raiders of the Lost Art." Anne From sly at victoria.tc.ca Thu Dec 15 19:26:14 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Dec 15 19:26:30 2005 Subject: [gutvol-d] Fwd: Cervantes Books In-Reply-To: <1f2.17a47b44.30d32be2@aol.com> References: <1f2.17a47b44.30d32be2@aol.com> Message-ID: Anne, Thanks for sharing your ideas.
This makes me think of an issue from the cataloging point of view. It is altogether too easy for information about the original item (usually in our case, a monograph) and the digital transcription to get mixed up. Personally, I have no problem with keeping information about the source a pg text was derived from, as long as it is clear that is information about the _source text_ not the Project Gutenberg digital resource. (For instance, there are places where we call a book "Third edition"--when it is our first; or mention a lccn which applies to the source, but not the PG text. From a library sciences point of view this could be said to make as much sense as saying that a PG text has 416 pages, because that is what the source had.) It is perhaps interesting to note that the TEI header contains two distinct areas for bibliographic data: one about the digital text itself and one about the source(s) it was derived from. This is one reason that it can appear overly repetitive at first glance. Andrew On Thu, 15 Dec 2005 Gutenberg9443@aol.com wrote: > What drives me to wanting to spit and bite is that our front matter usually > fails to tell us when the book was written and/or published. I go to the LoC, > and if I don't find it there I scream and bang my head against the wall for a > while. (Not really. But I feel like it. Several times I really have cried > from sheer frustration after looking in a few more places.) From Gutenberg9443 at aol.com Fri Dec 16 08:51:09 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Dec 16 08:51:18 2005 Subject: [gutvol-d] Fwd: Cervantes Books Message-ID: In a message dated 12/15/2005 8:26:29 P.M. Mountain Standard Time, sly@victoria.tc.ca writes: Personally, I have no problem with keeping information about the source a pg text was derived from, as long as it is clear that is information about the _source text_ not the Project Gutenberg digital resource.
(For instance, there are places where we call a book "Third edition"--when it is our first; or mention a lccn which applies to the source, but not the PG text. From a library sciences point of view this could be said to make as much sense as saying that a PG text has 416 pages, because that is what the source had.) --and that would be impossible to say accurately, because how many "pages" a PG text has depends on what size type the reader is using. I agree with all you have said here. Anne From jon.ingram at gmail.com Sat Dec 17 23:14:35 2005 From: jon.ingram at gmail.com (Jon Ingram) Date: Sat Dec 17 23:14:53 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? Message-ID: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> I've been doing some volunteering for Librivox recently[1], and would like to know the best way to submit the end product (human-read versions of PG texts) to PG. [1] http://www.librivox.org/ -- Jon Ingram From gbnewby at pglaf.org Sun Dec 18 01:26:26 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Dec 18 01:26:29 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? In-Reply-To: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> Message-ID: <20051218092626.GA10170@pglaf.org> On Sun, Dec 18, 2005 at 07:14:35AM +0000, Jon Ingram wrote: > I've been doing some volunteering for Librivox recently[1], and would > like to know the best way to submit the end product (human-read > versions of PG texts) to PG. > > [1] http://www.librivox.org/ So far, it has been just me posting any form of audio eBook.
This is a problem, because I have a serious backlog -- between new audio eBooks, fixing broken computer-generated audio eBooks, and contemporary copyrighted text works along with their fixes/updates, I have a pretty substantial collection of stuff waiting for my personal attention. Probably 50-75 new items, plus fixes/updates for another 30-50. Posting this type of content is a bit of a hassle compared to .txt and .htm files, because some sort of readme.txt or readme.htm needs to be written to accompany the files. For multi-part files, an index.htm would be a better alternative. Then, the ww's regular "makehead10k" program can "wrap" the readme.txt (or whatever) with our header/footer, and all the .mp3 files need to be renamed per our standard convention (such as, eBook # 17555 will have 17555-m/17555-m.mp3 , with variations for multi-part files). Jon, if you're willing to do the preparation by hand (perhaps following some of our earlier models, linked from http://gutenberg.org/audio), I could pre-allocate some eBook #s to use. If you could get me just one .zip with everything pre-configured, it would be very easy for me to upload. Oh, and a standard "posted" email message, too. At least a dozen or so items from literalsystems.org are also ready to be brought in, if anyone is inspired to do these. These audio eBooks are actually very popular, and the human readings often get complimented. (The computer-generated readings are not nearly so popular, but still surprisingly popular.) I just wish there were enough hours in the day for me to work through all of these pending items, or someone else interested enough to take the lead (hint, hint). -- Greg From jon.ingram at gmail.com Sun Dec 18 03:08:51 2005 From: jon.ingram at gmail.com (Jon Ingram) Date: Sun Dec 18 03:09:13 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? 
In-Reply-To: <20051218092626.GA10170@pglaf.org> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> Message-ID: <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> On 12/18/05, Greg Newby wrote: > Jon, if you're willing to do the preparation by hand (perhaps following > some of our earlier models, linked from http://gutenberg.org/audio), I > could pre-allocate some eBook #s to use. If you could get me just one > .zip with everything pre-configured, it would be very easy for me to > upload. Oh, and a standard "posted" email message, too. > > At least a dozen or so items from literalsystems.org are also ready to > be brought in, if anyone is inspired to do these. These audio eBooks > are actually very popular, and the human readings often get > complimented. (The computer-generated readings are not nearly so > popular, but still surprisingly popular.) I just wish there were enough > hours in the day for me to work through all of these pending items, > or someone else interested enough to take the lead (hint, hint). Thanks for the reply. I'll look into the preparation you require. The ultimate aim is that all Librivox material would be uploaded to Project Gutenberg, in addition to the Internet Archive, which is the current destination. Ideally this would be done in a fairly automated way. Emailing the files is probably out, as they're quite large (128kbit MP3, so approximately 1 meg per minute of audio). Is there any mechanism for uploading them to a server in the same way that normal texts are uploaded to PG? -- Jon Ingram From hiddengreen at gmail.com Sun Dec 18 03:34:03 2005 From: hiddengreen at gmail.com (Cori) Date: Sun Dec 18 03:41:18 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? 
In-Reply-To: <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> Message-ID: <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> While we're on the subject ... On 12/18/05, Jon Ingram wrote: > On 12/18/05, Greg Newby wrote: > > Jon, if you're willing to do the preparation by hand (perhaps following > > some of our earlier models, linked from http://gutenberg.org/audio), I > > could pre-allocate some eBook #s to use. If you could get me just one > > .zip with everything pre-configured, it would be very easy for me to > > upload. Oh, and a standard "posted" email message, too. Is there a link between items allocated new eBook numbers, and the existing texts they're being read from..? I've only uploaded audio for one book so far, but because it was added at the same time as the plaintext and HTML, everything is now sitting under the same eText number, with just a separate folder for the audio files (which are also linked nicely from the HTML.) But if, for example, one read "A Christmas Carol" and it was given a new number - would / could there also be a link from eText #46 ..? Cori From jon.ingram at gmail.com Sun Dec 18 03:44:56 2005 From: jon.ingram at gmail.com (Jon Ingram) Date: Sun Dec 18 03:45:19 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? In-Reply-To: <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> Message-ID: <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> On 12/18/05, Cori wrote: > Is there a link between items allocated new eBook numbers, and the > existing texts they're being read from..? 
I've only uploaded audio > for one book so far, but because it was added at the same time as the > plaintext and HTML, everything is now sitting under the same eText > number, with just a separate folder for the audio files (which are > also linked nicely from the HTML.) > > But if, for example, one read "A Christmas Carol" and it was given a > new number - would / could there also be a link from eText #46 ..? It would make sense to use the number of the source text -- the audio version is after all just a new edition of the same text. -- Jon Ingram From joshua at hutchinson.net Sun Dec 18 08:23:14 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sun Dec 18 08:18:27 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? In-Reply-To: <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> Message-ID: <43A58CF2.9090405@hutchinson.net> Jon Ingram wrote: >On 12/18/05, Cori wrote: > > >>Is there a link between items allocated new eBook numbers, and the >>existing texts they're being read from..? I've only uploaded audio >>for one book so far, but because it was added at the same time as the >>plaintext and HTML, everything is now sitting under the same eText >>number, with just a separate folder for the audio files (which are >>also linked nicely from the HTML.) >> >>But if, for example, one read "A Christmas Carol" and it was given a >>new number - would / could there also be a link from eText #46 ..? >> >> > >It would make sense to use the number of the source text -- the audio >version is after all just a new edition of the same text. 
> >-- >Jon Ingram >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > If the audio file is read from our text, it should go into the etext number it was read from. If it derived from someone else's edition, then it would get a new number. At least, that's how I understand it. Josh From sly at victoria.tc.ca Sun Dec 18 11:09:08 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Dec 18 11:09:18 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? In-Reply-To: <43A58CF2.9090405@hutchinson.net> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> <43A58CF2.9090405@hutchinson.net> Message-ID: On Sun, 18 Dec 2005, Joshua Hutchinson wrote: > >On 12/18/05, Cori wrote: > > > >>But if, for example, one read "A Christmas Carol" and it was given a > >>new number - would / could there also be a link from eText #46 ..? > >> > > > >It would make sense to use the number of the source text -- the audio > >version is after all just a new edition of the same text. > > > >-- > >Jon Ingram > > > If the audio file is read from our text, it should go into the etext > number it was read from. If it derived from someone else's edition, > then it would get a new number. > > At least, that's how I understand it. > > Josh Yes, that sounds like a good idea. If there is any question, I'd say that posting the audio book under a new PG number would be the best choice. It's worth keeping in mind that, particularly for well known books, there can be multiple forms out there. A few examples: We have more than one edition of Darwin's "Origin of Species". We have a "young Folk's edition" of "Black Beauty" as well as "regular" edition. 
I understand that much of our P.G. Wodehouse is from American editions which sometimes vary greatly from the original British editions. So I would not be too hasty to put audiobooks under the same number unless we are certain that they were prepared from that same text. We can make "See also" links in the bibrec pages if needed. Andrew From jtinsley at pobox.com Sun Dec 18 11:28:40 2005 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Dec 18 11:28:49 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? In-Reply-To: References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> <43A58CF2.9090405@hutchinson.net> Message-ID: <20051218192840.GA25910@panix.com> On Sun, Dec 18, 2005 at 11:09:08AM -0800, Andrew Sly wrote: > > >On Sun, 18 Dec 2005, Joshua Hutchinson wrote: > >> >On 12/18/05, Cori wrote: >> > >> >>But if, for example, one read "A Christmas Carol" and it was given a >> >>new number - would / could there also be a link from eText #46 ..? >> >> >> > >> >It would make sense to use the number of the source text -- the audio >> >version is after all just a new edition of the same text. >> > >> >-- >> >Jon Ingram >> > >> If the audio file is read from our text, it should go into the etext >> number it was read from. If it derived from someone else's edition, >> then it would get a new number. >> >> At least, that's how I understand it. >> >> Josh > >Yes, that sounds like a good idea. If there is any question, >I'd say that posting the audio book under a new PG number >would be the best choice. It's worth keeping in mind that, >particularly for well known books, there can be multiple >forms out there. A few examples: >We have more than one edition of Darwin's "Origin of Species".
>We have a "young Folk's edition" of "Black Beauty" as well as >"regular" edition. I understand that much of our P.G. Wodehouse >is from American editions which sometimes vary greatly from >the original British editions. > >So I would not be too hasty to put audiobooks under the >same number unless we are certain that they were prepared from >that same text. I'm with Josh and Andrew. _If_ the audio is made from the same edition as ours, it's just a new format of that number. In some previous cases, the producer of the computer-read file claimed copyright, and that's why Greg gave them new numbers, because we couldn't post both copyrighted and PD content under the same number, even though it was made from the same source text. We also said we'd replace them when the technology improved. If we don't know that the audio is from the same edition, then it should get a new number. jim From jon at noring.name Sun Dec 18 13:12:34 2005 From: jon at noring.name (Jon Noring) Date: Sun Dec 18 13:19:10 2005 Subject: [gutvol-d] Announcing "Distributed Scanners" discussion group (Scanning public domain texts) Message-ID: <1957558326.20051218141234@noring.name> [Feel free to redistribute this announcement to other forums where on-topic, such as scanning, graphics, books and publishing, library and archives, etc. Thanks.] Everyone, Several people privately expressed interest in the "Distributed Scanners" (DistScan) idea I recently outlined to the Book People forum. So, I've taken the next step and created a Yahoo discussion group to further explore this idea -- to see if it has any legs. You are invited to join -- refer to the info at the end of this message. To summarize the idea: Is there interest and need for a volunteer-driven, large-scale distributed scanning project of public domain books and other documents modeled after (where applicable) Distributed Proofreaders?
The full group description, and the current expanded summary of the idea (which will undoubtedly change and improve over time as we better understand the various issues) is given at: http://groups.yahoo.com/group/distscan/ To be clear, this group does not actually launch the project, but rather serves only to bring together sharp, like-minded people to explore the idea -- to see if there is a "working formula" that makes sense, and if we can assemble a core group of people with the needed skill sets and interest to be able to successfully launch the project. The goal, of course, is to accelerate the high-quality scanning of public domain texts. It is not intended to be competitive with other projects to scan the public domain, such as those managed by the Internet Archive (e.g. OCA), but rather to augment and possibly even work in cooperative fashion with those projects (including Distributed Proofreaders.) Please read carefully the group description at the above URL. If you wish to comment on this message, I encourage you to join the group and post your comment there. Or, email it to me in private and I may post it to the group (with your identity removed unless requested otherwise). Anyone interested in scanning the public domain (whether a private individual or representing an institution) is invited to participate. We definitely need people with expertise in a very wide range of areas. Since DistScan will likely have many components, you are probably expert in one of them! Do join and contribute to the discussion. Thanks! Jon Noring (p.s., there are three ways to subscribe to the DistScan group: 1) Use your YahooID and click on the "Join This Group!" button at the above URL. 2) Send a blank email to: distscan-subscribe@yahoogroups.com (No need to get a YahooID to subscribe this way.) 3) Ask me to subscribe you with the email address you want to use. No need to get a YahooID to subscribe this way.) 
From ciesiels at bigpond.net.au Sun Dec 18 12:25:46 2005 From: ciesiels at bigpond.net.au (Michael Ciesielski) Date: Sun Dec 18 14:16:29 2005 Subject: [gutvol-d] www.gutenberg.org In-Reply-To: <414C727B.6080406@perathoner.de> References: <414C727B.6080406@perathoner.de> Message-ID: <43A5C5CA.7020105@bigpond.net.au> It's been over a year since Marcello announced the change from gutenberg.net to gutenberg.org. Is there any particular reason why the PG header and footer inserted in newly posted texts refer to gutenberg.net in several places? Thanks! Mike Marcello Perathoner, on 19/09/2004 03:38, wrote: > At my request ibiblio has changed our apache virtual host > ServerName from gutenberg.net to www.gutenberg.org. > > What? > > Nothing changes in web site operation except one little particular (if > you hit a non-existing url you get redirected to www.gutenberg.org > instead of gutenberg.net). > > > Why? > > The .org top-level domain is meant for non-profit organizations while > the .net domain is meant for network infrastructure such as providers, > backbones etc. > > We own both gutenberg.net and gutenberg.org, but using the latter one > is just more standard-compliant. > > Although both urls www.gutenberg.net and www.gutenberg.org give > exactly the same results, you should start using www.gutenberg.org in > all publications, papers etc. > > > From jon.ingram at gmail.com Sun Dec 18 14:33:22 2005 From: jon.ingram at gmail.com (Jon Ingram) Date: Sun Dec 18 14:33:32 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts?
In-Reply-To: <20051218192840.GA25910@panix.com> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> <43A58CF2.9090405@hutchinson.net> <20051218192840.GA25910@panix.com> Message-ID: <4baf53720512181433s379d3440i22b53bf3f87da29@mail.gmail.com> On 12/18/05, Jim Tinsley wrote: > I'm with Josh and Andrew. _If_ the audio is made from the same > edition as ours, it's just a new format of that number. In > some previous cases, the producer of the computer-read file > claimed copyright, and that's why Greg gave them new numbers, > because we couldn't post both copyrighted and PD content under > the same number, even though it was made from the same source > text. We also said we'd replace them when the technology improved. > > If we don't know that the audio is from the same edition, then > it should get a new number. Sounds very sensible. In the case of all the Librivox projects I've been involved with, they've definitely been based on specific PG texts -- see http://librivox.org/forum/viewtopic.php?t=376 for an example. Librivox also explicitly releases all the material it records back into the public domain, rather than retaining any rights over it. -- Jon Ingram From gbnewby at pglaf.org Sun Dec 18 17:43:02 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Dec 18 17:43:04 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts?
In-Reply-To: <4baf53720512181433s379d3440i22b53bf3f87da29@mail.gmail.com> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> <43A58CF2.9090405@hutchinson.net> <20051218192840.GA25910@panix.com> <4baf53720512181433s379d3440i22b53bf3f87da29@mail.gmail.com> Message-ID: <20051219014302.GA10851@pglaf.org> On Sun, Dec 18, 2005 at 10:33:22PM +0000, Jon Ingram wrote: > On 12/18/05, Jim Tinsley wrote: > > I'm with Josh and Andrew. _If_ the audio is made from the same > > edition as ours, it's just a new format of that number. In > > some previous cases, the producer of the computer-read file > > claimed copyright, and that's why Greg gave them new numbers, > > because we couldn't post both copyrighted and PD content under > > the same number, even though it was made from the same source > > text. We also said we'd replace them when the technology improved. > > > > If we don't know that the audio is from the same edition, then > > it should get a new number. > > Sounds very sensible. In the case of all the Librivox projects I've > been involved with, they've definitely been based on specific PG texts > -- see > http://librivox.org/forum/viewtopic.php?t=376 > for an example. > > Librivox also explicitly releases all the material it records back into > the public domain, rather than retaining any rights over it. I didn't know this -- I thought they kept a copyright with a Creative Commons-style license. Thanks for setting me straight. Mixing public domain with copyrighted items in the same eBook # is something I'd (still) like to avoid. So: Yes, we can just put these public domain audio performances in with the other formats.
That makes the job slightly more difficult: for pre-10K items, it would be best to go ahead and update them for post-10K. (Not 100% required, but it would be nice -- otherwise, it's leaving a mess for someone to clean up in the future. Remember that the etext?? directories do *not* have subdirectories, making the post-10K structure much more suitable for multi-file audio eBooks.) For post-10K items it's a little simpler...just adding in the files. -- Greg From gbnewby at pglaf.org Sun Dec 18 17:46:21 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Dec 18 17:46:22 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? In-Reply-To: <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> Message-ID: <20051219014621.GC10851@pglaf.org> On Sun, Dec 18, 2005 at 11:08:51AM +0000, Jon Ingram wrote: > On 12/18/05, Greg Newby wrote: > > Jon, if you're willing to do the preparation by hand (perhaps following > > some of our earlier models, linked from http://gutenberg.org/audio), I > > could pre-allocate some eBook #s to use. If you could get me just one > > .zip with everything pre-configured, it would be very easy for me to > > upload. Oh, and a standard "posted" email message, too. > > > > At least a dozen or so items from literalsystems.org are also ready to > > be brought in, if anyone is inspired to do these. These audio eBooks > > are actually very popular, and the human readings often get > > complimented. (The computer-generated readings are not nearly so > > popular, but still surprisingly popular.) I just wish there were enough > > hours in the day for me to work through all of these pending items, > > or someone else interested enough to take the lead (hint, hint). > > Thanks for the reply. I'll look into the preparation you require. 
The > ultimate aim is that all Librivox material would be uploaded to > Project Gutenberg, in addition to the Internet Archive, which is the > current destination. Ideally this would be done in a fairly automated > way. Emailing the files is probably out, as they're quite large > (128kbit MP3, so approximately 1 meg per minute of audio). Is there > any mechanism for uploading them to a server in the same way that > normal texts are uploaded to PG? Yes -- there is a non-anonymous ftp server on pglaf.org. Or, you can just put 'em where I can find 'em on ibiblio.org. If we can figure out how to make this "fit" with our existing processes, there is also http://upload.pglaf.org . So far, as I described earlier, it's not been a great match... -- Greg From tb at baechler.net Mon Dec 19 07:37:49 2005 From: tb at baechler.net (Tony Baechler) Date: Mon Dec 19 07:37:30 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? In-Reply-To: <20051219014302.GA10851@pglaf.org> References: <4baf53720512181433s379d3440i22b53bf3f87da29@mail.gmail.com> <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> <43A58CF2.9090405@hutchinson.net> <20051218192840.GA25910@panix.com> <4baf53720512181433s379d3440i22b53bf3f87da29@mail.gmail.com> Message-ID: <5.2.0.9.0.20051219073344.02a8e6c0@127.0.0.1> At 05:43 PM 12/18/2005 -0800, you wrote: >So: Yes, we can just put these public domain audio performances in with >the other formats. That makes the job slightly more difficult: for >pre-10K items, it would be best to go ahead and update them for >post-10K. (Not 100% required, but it would be nice -- otherwise, it's >leaving a mess for someone to clean up in the future. Remember that the >etext??
directories do *not* have subdirectories, making the post-10K >structure much more suitable for multi-file audio eBooks.) Hello. Why not just add an extra zip file with the individual mp3 files? Not neat, but it would work. Just take the base etext name and add -mp3.zip to the end. That only adds one extra file. The problem of course is that you're stuck downloading everything. Is there any reason why there can't be subdirectories in the etext dirs? That way you could have one big zip file with mp3 files and one subdirectory with them individually available for download. Yes, it's still a mess but would get us by until all the ebooks are reposted. From gbnewby at pglaf.org Mon Dec 19 10:03:14 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Dec 19 10:03:15 2005 Subject: [gutvol-d] How do we submit audio versions of PG texts? In-Reply-To: <5.2.0.9.0.20051219073344.02a8e6c0@127.0.0.1> References: <4baf53720512172314q738c42a3g84ca6cf2f28d68ec@mail.gmail.com> <20051218092626.GA10170@pglaf.org> <4baf53720512180308g7ef6ea36lda1e7b982d3da4da@mail.gmail.com> <910fee4a0512180334x4de97e2bw75dc6cd48fccd269@mail.gmail.com> <4baf53720512180344w249362b3m10337b782cd98f81@mail.gmail.com> <43A58CF2.9090405@hutchinson.net> <20051218192840.GA25910@panix.com> <4baf53720512181433s379d3440i22b53bf3f87da29@mail.gmail.com> <5.2.0.9.0.20051219073344.02a8e6c0@127.0.0.1> Message-ID: <20051219180314.GA26645@pglaf.org> On Mon, Dec 19, 2005 at 07:37:49AM -0800, Tony Baechler wrote: > At 05:43 PM 12/18/2005 -0800, you wrote: > > >So: Yes, we can just put these public domain audio performances in with > >the other formats. That makes the job slightly more difficult: for > >pre-10K items, it would be best to go ahead and update them for > >post-10K. (Not 100% required, but it would be nice -- otherwise, it's > >leaving a mess for someone to clean up in the future. Remember that the > >etext?? 
directories do *not* have subdirectories, making the post-10K > >structure much more suitable for multi-file audio eBooks.) > > > Hello. Why not just add an extra zip file with the individual mp3 > files? Not neat, but it would work. Just take the base etext name and add > -mp3.zip to the end. That only adds one extra file. The problem of course > is that you're stuck downloading everything. Is there any reason why there > can't be subdirectories in the etext dirs? That way you could have one big > zip file with mp3 files and one subdirectory with them individually > available for download. Yes, it's still a mess but would get us by until > all the ebooks are reposted. These things are possible, of course, but would be inconsistent with our other holdings. I'd rather be consistent, for many reasons. -- Greg From wally.thompson at gmail.com Tue Dec 20 18:05:31 2005 From: wally.thompson at gmail.com (Wally Thompson) Date: Tue Dec 20 18:12:07 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after poetry Message-ID: I'm working on a book that has poetry within the text. Sometimes a poem ends the paragraph and sometimes it does not. So I'm wondering how to handle this in the text file without leaving it ambiguous. Wally -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051220/1cc23db9/attachment.html From Gutenberg9443 at aol.com Tue Dec 20 21:11:36 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Tue Dec 20 21:12:02 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <237.410b1c1.30da3e08@aol.com> In a message dated 12/20/2005 7:12:21 P.M. Mountain Standard Time, wally.thompson@gmail.com writes: I'm working on a book that has poetry within the text. Sometimes a poem ends the paragraph and sometimes it does not. 
So I'm wondering how to handle this in the text file without leaving it ambiguous. Wally, could you provide a brief example? I don't do much with proofreading on PG, but I am a book publisher. If I saw an example in which the poem ends the paragraph and the poem does not end the paragraph, it would help. Are you saying that in some cases the poem is within the paragraph and the paragraph continues after the poem, or that the paragraph ends but the poem carries over to the next paragraph? I've seen both, and of course they're handled differently. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051221/4ccb4da9/attachment.html From wally.thompson at gmail.com Tue Dec 20 22:07:09 2005 From: wally.thompson at gmail.com (Wally Thompson) Date: Tue Dec 20 22:07:28 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <237.410b1c1.30da3e08@aol.com> References: <237.410b1c1.30da3e08@aol.com> Message-ID: On 12/20/05, Gutenberg9443@aol.com wrote: > > In a message dated 12/20/2005 7:12:21 P.M. Mountain Standard Time, > wally.thompson@gmail.com writes: > > I'm working on a book that has poetry within the text. Sometimes a poem > ends the paragraph and sometimes it does not. So I'm wondering how to > handle this in the text file without leaving it ambiguous. > > Wally, could you provide a brief example? I don't do much with > proofreading on PG, but I am a book publisher. If I saw an example in which > the poem ends the paragraph and the poem does not end the paragraph, it > would help. Are you saying that in some cases the poem is within the > paragraph and the paragraph continues after the poem, or that the paragraph > ends but the poem carries over to the next paragraph? I've seen both, and of > course they're handled differently. 
> > Anne > Here is example 1, where a new paragraph begins directly after the poem: http://cdl.library.cornell.edu/cgi-bin/moa/pageviewer?frames=1&coll=moa&view=50&root=%2Fmoa%2Fatla%2Fatla0018%2F&tif=00706.TIF&cite=http%3A%2F%2Fcdl.library.cornell.edu%2Fcgi-bin%2Fmoa%2Fmoa-cgi%3Fnotisid%3DABK2934-0018-102 Here is example 2, where the paragraph continues after the poem: http://cdl.library.cornell.edu/cgi-bin/moa/pageviewer?frames=1&coll=moa&view=50&root=%2Fmoa%2Fatla%2Fatla0018%2F&tif=00739.TIF&cite=http%3A%2F%2Fcdl.library.cornell.edu%2Fcgi-bin%2Fmoa%2Fmoa-cgi%3Fnotisid%3DABK2934-0018-105 These examples come from the Atlantic Monthly, December 1866. The Gutenberg Ebook is located at http://www.gutenberg.org/files/17217/17217-8.txt. The Gutenberg Ebook formats both examples in the same way. So the reader of the ebook faces an ambiguity as to whether or not a new paragraph begins after the poem. In the book that I'm working on, I'm facing the same issue. I would like to make it clear to the reader whether or not a new paragraph begins after a poem. But I would also like to be consistent with other Gutenberg Ebooks. So I'm wondering how others have dealt with this problem. Wally -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051220/bbdbb57c/attachment.html From jtinsley at pobox.com Tue Dec 20 22:35:10 2005 From: jtinsley at pobox.com (Jim Tinsley) Date: Tue Dec 20 22:38:30 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p...
In-Reply-To: References: <237.410b1c1.30da3e08@aol.com> Message-ID: <20051221063510.GB4683@panix.com> On Tue, Dec 20, 2005 at 11:07:09PM -0700, Wally Thompson wrote: >> > >Here is example 1, where a new paragraph begins directly after the poem: > >http://cdl.library.cornell.edu/cgi-bin/moa/pageviewer?frames=1&coll=moa&view=50&root=%2Fmoa%2Fatla%2Fatla0018%2F&tif=00706.TIF&cite=http%3A%2F%2Fcdl.library.cornell.edu%2Fcgi-bin%2Fmoa%2Fmoa-cgi%3Fnotisid%3DABK2934-0018-102 > >Here is example 2, where the paragraph continues after the poem: > >http://cdl.library.cornell.edu/cgi-bin/moa/pageviewer?frames=1&coll=moa&view=50&root=%2Fmoa%2Fatla%2Fatla0018%2F&tif=00739.TIF&cite=http%3A%2F%2Fcdl.library.cornell.edu%2Fcgi-bin%2Fmoa%2Fmoa-cgi%3Fnotisid%3DABK2934-0018-105 > >These examples come from the Atlantic Monthly, December 1866. The Gutenberg >Ebook is located at http://www.gutenberg.org/files/17217/17217-8.txt. > >The Gutenberg Ebook formats both examples in the same way. So the reader of >the ebook faces an ambiguity as to whether or not a new paragraph begins >after the poem. In the book that I'm working on, I'm facing the same issue. >I would like to make it clear to the reader whether or not a new paragraph >begins after a poem. But I would also like to be consistent with other >Gutenberg Ebooks. Thanks for the question, Wally. It's a good one. You can't do both. In the same circumstances, I would change the normal conventions either a) to indent the first line of each actual paragraph or b) to introduce two blank lines, rather than one, after a poem where a new para begins after it. and clarify what I was doing, and why I was doing it, by means of a Transcriber's Note at the top of the file. There are probably other reasonable ways of indicating the necessary distinction, but the general formula of "choose one, and leave a Transcriber's Note that applies to the whole text" will work for any of them.
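Option (b) can be checked mechanically. The following is a minimal Python sketch, not a PG or DP tool, and it assumes a simplified model: verse lines are indented four spaces, prose lines are not, and a poem never opens a paragraph. With the usual single blank line after a poem, a paragraph that continues past the poem and one that ends with it serialize to exactly the same text; with two blank lines after a paragraph-ending poem, a parser can tell them apart again. The names `Paragraph`, `serialize` and `parse` are illustrative only.

```python
import re
from dataclasses import dataclass, field

INDENT = "    "  # assumption: verse lines are indented four spaces, prose is not


@dataclass
class Paragraph:
    # Each part is a prose string or a poem given as a list of verse lines.
    parts: list = field(default_factory=list)


def serialize(paras, option_b=True):
    """Render paragraphs as plain text with one blank line between blocks.

    With option_b=True, a poem that ends its paragraph is followed by two
    blank lines instead of one.
    """
    out = []
    for p_index, para in enumerate(paras):
        for part_index, part in enumerate(para.parts):
            is_verse = not isinstance(part, str)
            out.append("\n".join(INDENT + line for line in part) if is_verse else part)
            ends_para = part_index == len(para.parts) - 1
            if ends_para and p_index == len(paras) - 1:
                continue  # nothing follows the final block
            out.append("\n\n\n" if (option_b and is_verse and ends_para) else "\n\n")
    return "".join(out)


def parse(text):
    """Rebuild paragraphs from text written with the option (b) convention."""
    pieces = re.split(r"(\n{2,})", text)  # keep the separators
    blocks, seps = pieces[0::2], pieces[1::2]
    paras, prev_is_verse = [], False
    for i, block in enumerate(blocks):
        lines = block.split("\n")
        is_verse = all(line.startswith(INDENT) for line in lines)
        if i == 0:
            new_para = True
        elif is_verse:
            new_para = False  # assumption: a poem never opens a paragraph
        elif prev_is_verse:
            new_para = len(seps[i - 1]) >= 3  # two or more blank lines after a poem
        else:
            new_para = True  # prose directly after prose is always a new paragraph
        if new_para:
            paras.append(Paragraph())
        paras[-1].parts.append([line[len(INDENT):] for line in lines] if is_verse else block)
        prev_is_verse = is_verse
    return paras


poem = ["First verse line,", "second verse line."]
continues = [Paragraph(["Prose that leads into a poem:", poem, "Prose that follows it."])]
new_para = [Paragraph(["Prose that leads into a poem:", poem]),
            Paragraph(["Prose that follows it."])]

print(serialize(continues, option_b=False) == serialize(new_para, option_b=False))  # True: ambiguous
print(parse(serialize(new_para)) == new_para)  # True: option (b) round-trips
```

Whichever variant is chosen, the Transcriber's Note is what tells a human reader (or a conversion script) which rule the file follows.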
Any general set of rules needs to be bent for some texts, and the Transcriber's Note, by which you communicate to the reader how and why you rendered _this_ one differently, is a time-honored way of regularizing exceptions, so don't be too worried about complete consistency with the most common cases. jim From traverso at dm.unipi.it Wed Dec 21 00:57:51 2005 From: traverso at dm.unipi.it (Carlo Traverso) Date: Wed Dec 21 00:48:45 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <237.410b1c1.30da3e08@aol.com> (Gutenberg9443@aol.com) References: <237.410b1c1.30da3e08@aol.com> Message-ID: <200512210857.jBL8vpC29786@pico.dm.unipi.it> >>>>> "Anne" == Gutenberg9443 writes: Anne> In a message dated 12/20/2005 7:12:21 P.M. Mountain Standard Anne> Time, wally.thompson@gmail.com writes: Wally> I'm working on a book that has poetry within the Wally> text. Sometimes a poem ends the paragraph and sometimes it Wally> does not. So I'm wondering how to handle this in the text Wally> file without leaving it ambiguous. Anne> Wally, could you provide a brief example? I don't do much Anne> with proofreading on PG, but I am a book publisher. If I saw Anne> an example in which the poem ends the paragraph and the poem Anne> does not end the paragraph, it would help. Are you saying Anne> that in some cases the poem is within the paragraph and the Anne> paragraph continues after the poem, or that the paragraph Anne> ends but the poem carries over to the next paragraph? I've Anne> seen both, and of course they're handled differently. Anne> Anne Practically: in a printed book, you recognize that a new paragraph starts after the poem from the indent. PG books do not indent paragraphs, but mark them with a blank line. So there is always a blank line after a poem, since one is left there anyway.
The problem is that if you want to reformat a book with indented paragraphs, the information is gone, and you cannot decide whether there is a new paragraph or not (unless, e.g., the new text starts with a lowercase letter, since a new paragraph starts a new sentence. But of course there is the possibility that there is a new sentence but not a new paragraph...) This shows once more that presentational markup with blank lines only is insufficient to capture all the information. The problem is not limited to poetry; it also applies to quotes, with the added complication that a quote can be subdivided into paragraphs (or a poem into stanzas) and nevertheless the quotation can be contained in the body of a paragraph: He said (first paragraph of a quotation) (second paragraph of the quotation) and even added (more quotation) After a long pause he added (final quotation) This is probably two paragraphs, but maybe only one. Carlo From Bowerbird at aol.com Wed Dec 21 04:08:43 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 21 04:09:13 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <276.1e9e111.30da9fcb@aol.com> carlo said: > This shows once more that > presentational markup with blank lines only > is insufficient to capture all the information. um, do you mind if we let a person who has actually _developed_ a zen markup language that makes use of blank lines answer this question, please? because i certainly _have_ an answer... but let's tackle first things first, ok? wally, you should receive some _prize_ -- and i will give you one, i will indeed -- for doing the work that led you to ask this question, so thank you sir! :+) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL:
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051221/4dff1372/attachment.html From hyphen at hyphenologist.co.uk Wed Dec 21 06:04:59 2005 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Wed Dec 21 06:12:32 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after poetry In-Reply-To: References: Message-ID: On Tue, 20 Dec 2005 19:05:31 -0700, Wally Thompson wrote: | I'm working on a book that has poetry within the text. Sometimes a poem | ends the paragraph and sometimes it does not. So I'm wondering how to | handle this in the text file without leaving it ambiguous. When producing a .txt file it is easy. I insert spaces at the beginning of a line, just as I do when programming. My paragraphs are just two consecutive new lines. IME tabs can be anything from 2 to 8 spaces and so often display wrongly. -- Dave Fawthrop "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". Sig (C) Copyright Public Domain From Bowerbird at aol.com Wed Dec 21 10:28:06 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 21 10:28:20 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after poetry Message-ID: <2a2.293e026.30daf8b6@aol.com> dave said: > When producing a .txt file it is easy. > > I insert spaces at the beginning of a line, > just as I do when programming. > > My paragraphs are just two consecutive new lines. > > IME tabs can be anything from 2 to 8 spaces > and so often display wrongly. bingo. there's the "right" answer. thanks dave. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051221/bbb77fcd/attachment-0001.html From lee at novomail.net Wed Dec 21 12:32:52 2005 From: lee at novomail.net (Lee Passey) Date: Wed Dec 21 12:32:57 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p...
In-Reply-To: <20051221063510.GB4683@panix.com> References: <237.410b1c1.30da3e08@aol.com> <20051221063510.GB4683@panix.com> Message-ID: <43A9BBF4.6030605@novomail.net> Jim Tinsley wrote: > On Tue, Dec 20, 2005 at 11:07:09PM -0700, Wally Thompson wrote: [snip] > > I would like to make it clear to the reader whether or not a new > > paragraph begins after a poem. But I would also like to be > > consistent with other Gutenberg Ebooks. > > Thanks for the question, Wally. It's a good one. > > You can't do both. I think it's obvious, Wally, that what you need is a markup language. The problem is that historically Project Gutenberg has been considered a NMA (No Markup Allowed) zone. In recent years, with the addition of HTML-formatted works, this "standard" has been relaxed, but it is still required that works submitted to PG be reduced, in at least one instantiation, to a non-marked-up format. The way to get around this requirement is to create a markup language (perhaps only powerful enough to deal with the one problem you have encountered) that doesn't _look_ like a markup language, and thus might slip through unnoticed. These types of markup languages have been variously referred to as "unobtrusive markup languages" or "smart ASCII." Examples include reStructuredText (http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html) and Bowerbird's Zen Markup Language (unpublished). > In the same circumstances, I would change the normal conventions > either > > a) to indent the first line of each actual paragraph > > or > > b) to introduce two blank lines, rather than one, after a poem where > a new para begins after it. > > and clarify what I was doing, and why I was doing it, by means of a > Transcriber's Note at the top of the file. This was my first inclination also. Essentially, the Gutenberg Markup Language defines the end of a text block (which may be a paragraph or may be something else) as text which ends with two consecutive newline sequences.
Thus, to indicate that the poem is contained by the paragraph, it should be a simple matter of including only a single newline sequence before the beginning (and after the end) of the poem. Unfortunately, there is some uncertainty about whether single newline sequences have significance in GML. One of the biggest complaints that users of handheld devices have about GutenTexts is that with displays that have widths of less than 80 characters (and usually not a convenient multiple thereof) if newlines are significant you will get texts that have three full lines of text followed by a line of just one or two words, followed by three full lines of text, followed by one or two words, and so on. Thus, many user agents which are designed to display GutenTexts consider a single newline sequence as insignificant, and treat it simply as a space, as do many utilities which have been written to convert GutenTexts to more consumer-friendly formats. In these cases, in addition to losing the "lininess" of the poem, you will lose its "blockiness" as well. So what you need is a way to indicate "this is a block which does not end the previous block, but is encapsulated within it" as well as a way to indicate "this is a significant line ending which must not be removed." Solving the second problem may also solve the first. Let's assume that for mandatory line endings we use the ASCII sequence "<br />." You could encode your second example as:
satisfy the best soodra society,--<br />
"With the yellow torches gleaming,<br />
And the scarlet mantles streaming,<br />
And the canopy above<br />
Swaying as they slowly move."<br />
Karlee has assured me that neither his
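Lee's point about reflowing readers is easy to demonstrate. Below is a minimal Python sketch, assuming a hypothetical marker string `{end line here}` (one of the candidates Lee suggests in this message) and using the excerpt above as sample text. A naive reader that treats every single newline inside a block as a space flattens the verse into one line; a reader that honors the marker keeps each verse line while still rewrapping ordinary prose. The function names are illustrative, not those of any existing reader.

```python
import re

# Hypothetical mandatory line-end marker; any agreed-upon string would do.
MARKER = "{end line here}"
BLANK_LINES = re.compile(r"\n(?:[ \t]*\n)+")  # one or more blank lines end a block


def naive_unwrap(text):
    """What many handheld readers do: blank lines end a block, and every
    single newline inside a block is treated as a space."""
    return [" ".join(line.strip() for line in block.split("\n"))
            for block in BLANK_LINES.split(text.strip())]


def marker_aware_unwrap(text, marker=MARKER):
    """Same as naive_unwrap, except that a line ending in `marker` keeps a
    hard line break, so verse survives reflowing on a narrow screen."""
    result = []
    for block in BLANK_LINES.split(text.strip()):
        out = []
        for line in block.split("\n"):
            line = line.strip()
            if line.endswith(marker):
                out.append(line[: -len(marker)].rstrip() + "\n")  # hard break
            else:
                out.append(line + " ")  # soft break: rewrap as a space
        result.append("".join(out).rstrip())
    return result


verse_lines = [
    "satisfy the best soodra society,--",
    '"With the yellow torches gleaming,',
    "And the scarlet mantles streaming,",
    "And the canopy above",
    'Swaying as they slowly move."',
]
sample = "\n".join(line + MARKER for line in verse_lines) + "\nKarlee has assured me that neither his"

print(naive_unwrap(sample.replace(MARKER, "")))  # the verse collapses into one line
print(marker_aware_unwrap(sample)[0])            # the verse line breaks are kept
```

Because the marker only protects the line it ends, the same block can still hold wrapped prose before and after the poem, which is what lets the poem stay inside its paragraph.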
In this case, the mandatory newline sequences convey the meaning you desire, both as to the stanzas of the verse and as to the fact that it is part and parcel of the enveloping paragraph. Of course, I wouldn't use the "<br />" sequence as the mandatory newline indicator, as it smacks too much of XHTML, and may draw the attention of the markup police. Instead you could use something less intrusive, such as the Unix newline code ("\n"), some unusual sequence that has no semantic overloading (such as "^!"), or something more descriptive (and thus less "codey") such as "{end line here}." Now these proposals all sacrifice the Gutenberg consistency for textual expressiveness. It is also possible to sacrifice the textual expressiveness for the sake of simplicity of text--and this may be the better choice. Almost all texts coming from Distributed Proofreaders these days are submitted (and available) in XHTML format, which clearly has the expressive power to satisfy this type of construct. Perhaps the right thing to do is to consider the XHTML version as the canonical version, and simply place a Transcriber's Note at the beginning of the simplified text version to the effect that, "Some structures in this book are not expressible in a simple text format, and have therefore been omitted here. Readers interested in a more accurate representation of this publication should refer to the file '17217.html'." > There are probably other reasonable ways of indicating the necessary > distinction, but the general formula of "choose one, and leave a > Transcriber's Note that applies to the whole text" will work for any > of them. This is good advice. In the end, it probably doesn't matter as much what you do, as it does that you simply inform people what you have done, and why you did it. From Bowerbird at aol.com Wed Dec 21 12:48:03 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 21 12:48:22 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <6.52f67788.30db1983@aol.com> lee said: > The problem is that historically Project Gutenberg > has been considered a NMA (No Markup Allowed) zone.
just because _you_ can't "see" the markup, lee, doesn't mean that it isn't there... one or more leading-spaces in a line is a signal that the line should not be wrapped to the line above it... as you are someone who worked on "tidy", i'd have expected that you'd be familiar with this convention, since -- to the best of my knowledge -- tidy uses it... there _are_ difficult cases. but the ones wally gave are straightforward applications of dirt-simple rules... > Perhaps the right thing to do is to consider > the XHTML version as the canonical version um, no thanks... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20051221/1e48c3b4/attachment.html From jon.ingram at gmail.com Wed Dec 21 13:29:44 2005 From: jon.ingram at gmail.com (Jon Ingram) Date: Wed Dec 21 13:29:55 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <6.52f67788.30db1983@aol.com> References: <6.52f67788.30db1983@aol.com> Message-ID: <4baf53720512211329w705c997lb1b5b216c72a6ace@mail.gmail.com> On 12/21/05, Bowerbird@aol.com wrote: > lee said: > > Perhaps the right thing to do is to consider > > the XHTML version as the canonical version > > um, no thanks... Indeed. It would be more sensible to mark the poem (indeed the whole work) in TEI, and use that as the canonical version. Several DPers are working quite hard in preparing the way for a gradual transition to a PG-themed TEI. Hopefully within a couple of years almost all the work DP produces will be in a format which will enable us to produce multiple document versions (HTML, text, PDF, LaTeX) with very little human intervention required. Until that's ready, however, using something like XHTML sensibly can work very well -- this means marking structure, not presentation. Don't mark up a poem like this:
satisfy the best soodra society,--
"With the yellow torches gleaming,
And the scarlet mantles streaming,
And the canopy above
Swaying as they slowly move."
Karlee has assured me that neither his
but instead enclose each line within a start/end tag, each stanza within a start/end tag, etc., using CSS to fine-tune the presentational details. -- Jon Ingram From jon at noring.name Wed Dec 21 13:43:22 2005 From: jon at noring.name (Jon Noring) Date: Wed Dec 21 13:43:38 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <6.52f67788.30db1983@aol.com> References: <6.52f67788.30db1983@aol.com> Message-ID: <1834658482.20051221144322@noring.name> Bowerbird wrote: > lee said: >> The problem is that historically Project Gutenberg >> has been considered a NMA (No Markup Allowed) zone. > just because _you_ can't "see" the markup, > lee, doesn't mean that it isn't there... > > one or more leading-spaces in a line is a signal that > the line should not be wrapped to the line above it... > > as you are someone who worked on "tidy", i'd have > expected that you'd be familiar with this convention, > since -- to the best of my knowledge -- tidy uses it... > > there _are_ difficult cases. but the ones wally gave > are straightforward applications of dirt-simple rules... I'm under the impression that PG did not establish any rules or guidelines early on in the game as to how to format plain text to unambiguously express various document structures. ZML is intended to be such a uniform ruleset for regularizing plain etexts. In fact, the need for Bowerbird to even invent ZML indicates to me that he saw a need for ZML after he studied PG etexts and saw variations in conventions. Bowerbird is the expert on this, so I defer to him to discuss if indeed he saw variation in how PG plain texts communicated structure. No matter, I believe that eventually all the older PG texts, including a lot of the classics, will be remastered (probably by DP) into TEI or XHTML. They will definitely NOT be mastered in ZML. If ZML is used at all, it will be as a derivative format so the plain text enthusiasts have something uniform to use.
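A minimal sketch of the leading-space convention quoted above: a line that begins with one or more spaces is not wrapped onto the line above it, and (as Bowerbird adds elsewhere in this thread) the following line is not wrapped onto it either. This is illustrative Python only, not code from ZML, Tidy, or any Project Gutenberg tool; the function name `rewrap`, the treatment of blank lines as paragraph breaks, and the sample prose lines are assumptions.

```python
def rewrap(text: str) -> list[str]:
    """Rewrap hard-wrapped plain text into logical lines (illustrative sketch).

    Assumed rules:
      * a flush-left line is joined, with one space, to the flush-left
        line before it;
      * a line starting with one or more spaces is kept with its indent,
        is not joined to the line above, and the next line is not
        joined to it;
      * a blank line ends the current paragraph and is emitted as "".
    """
    out: list[str] = []
    joinable = False  # True if the next flush-left line may be appended to out[-1]
    for line in text.splitlines():
        if not line.strip():
            # Blank line: record one paragraph break (runs of blanks collapse).
            if out and out[-1] != "":
                out.append("")
            joinable = False
        elif line.startswith(" "):
            # Indented line (e.g. verse): keep it on its own line, indent intact.
            out.append(line.rstrip())
            joinable = False
        elif joinable:
            # Continuation of the current prose line.
            out[-1] += " " + line.strip()
        else:
            # First flush-left line after a blank line or an indented line.
            out.append(line.strip())
            joinable = True
    while out and out[-1] == "":
        out.pop()  # drop any trailing paragraph break
    return out


if __name__ == "__main__":
    # Hypothetical sample: the two verse lines come from the example above,
    # the surrounding prose lines are invented for illustration.
    sample = (
        "The lines ran thus:\n"
        "  With the yellow torches gleaming,\n"
        "  And the scarlet mantles streaming,\n"
        "and so the procession\n"
        "moved on.\n"
        "\n"
        "Next paragraph."
    )
    for logical_line in rewrap(sample):
        print(repr(logical_line))
```

Under these rules the verse lines keep their own lines while the flush-left prose after them is rejoined into one line. Whether that prose continues the earlier paragraph or starts a new one still has to come from some other cue, such as a blank line, which is exactly the ambiguity Wally raised.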
Jon From lee at novomail.net Wed Dec 21 13:53:45 2005 From: lee at novomail.net (Lee Passey) Date: Wed Dec 21 13:53:54 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <6.52f67788.30db1983@aol.com> References: <6.52f67788.30db1983@aol.com> Message-ID: <43A9CEE9.9000006@novomail.net> Bowerbird@aol.com wrote: > as you are someone who worked on "tidy", i'd have > expected that you'd be familiar with this convention, > since -- to the best of my knowledge -- tidy uses it... Absolutely not. In essence, Tidy parses an HTML document into an internal DOM tree, fixing some egregious errors as it goes. Because newline characters are just whitespace in HTML, and multiple runs of whitespace are not significant, newlines are converted to spaces, and runs of whitespace are collapsed, at parse time. A few other fixes are then made to convert the well-formed XML DOM into valid HTML, and then it gets spit back out. Whether any whitespace gets added to the beginning of lines is the user's choice. Personally, I usually turn both indenting and word-wrap off; default values are to wrap at column 68 and use 2 space indentation. From sly at victoria.tc.ca Wed Dec 21 14:19:04 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Dec 21 14:19:15 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <43A9BBF4.6030605@novomail.net> References: <237.410b1c1.30da3e08@aol.com> <20051221063510.GB4683@panix.com> <43A9BBF4.6030605@novomail.net> Message-ID: Thanks for your message Lee. On Wed, 21 Dec 2005, Lee Passey wrote: > This is good advice. In the end, it probably doesn't matter as much what > you do, as it does that you simply inform people what you have done, and > why you did it. > Interestingly enough, the last place that I read something similar to that was in some TEI markup guidelines. 
Andrew From Bowerbird at aol.com Wed Dec 21 14:21:17 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 21 14:21:35 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <22c.3a2f0c7.30db2f5d@aol.com> lee said: > Absolutely not. In essence, ok. i stand corrected. on tidy. but tidy was primarily an aside. (however, you are now officially _exonerated_ for not knowing of the leading-space convention.) as for z.m.l., the rules stand. any line of poetry must have one or more leading spaces. (the number of spaces should indicate the desired indent.) this prevents the line from wrapping to the line above it, in a rewrap situation. it also prevents the line below it from wrapping to it, by the way. because of this simple rule, a poem (or a block quote, or any of many other structures) that is embedded in a paragraph is always quite easy to recognize, whether it ends the paragraph or not. i've written another message that covers wally's specific examples in detail, so any questions should be deferred until i've posted that... -bowerbird From Bowerbird at aol.com Wed Dec 21 14:24:25 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 21 14:24:39 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <25b.410f849.30db3019@aol.com> jon ingram said: > It would be more sensible to mark the poem > (indeed the whole work) in TEI, > and use that as the canonical version. um, no thanks... > Hopefully within a couple of years > almost all the work DP produces > will be in a format which will enable us to > produce multiple document versions > (HTML, text, PDF, LaTeX) with > very little human intervention required.
it's interesting that "the magical time" is always _a_couple_of_years_, isn't it? when i came to this list two years ago, it was "a couple years" away. and it still is. like a mirage, it's always within sight, but somehow never gets any closer... nonetheless, it looks like roger frank has made some good progress lately. keep up the good work, people... > Hopefully within a couple of years > almost all the work DP produces > will be in a format which will enable us to > produce multiple document versions > (HTML, text, PDF, LaTeX) with > very little human intervention required. i can produce text, html, and pdf with a .zml file today. so big whoop. -bowerbird From Bowerbird at aol.com Wed Dec 21 14:31:49 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Dec 21 14:32:01 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <1a5.44ee8bbf.30db31d5@aol.com> jon noring said: > Bowerbird is the expert on this jon, i know what they teach you in the dale carnegie courses about "how to win friends and influence people" is to flatter people's egos until they will do whatever you want them to... and i grant it is astoundingly effective. but that crap doesn't work with me -- precisely because i'm hip to it -- so you shouldn't waste your breath... and please stop making yourself look silly by trying to tell people all about z.m.l., because you don't know jack about it... as for the p.g. e-texts, most of them are already _in_ z.m.l. format, or close to it, so "remastering them" in z.m.l. will be a task so simple a 4th-grader can do it, mostly verifying my tool's auto-formatting. and from that, multiple goodies will follow. but i'm still waiting to do that "remastering" until you markup fools throw away a lot more of your precious time and energy.
a _smart_ poker player doesn't show his hand too soon when he knows he's got the other players beat, he gets them to put their money in the pot first. ok, now back to not talking to jon directly, since he moderates posts by michael hart over on his listserve, and i firmly believe that a pioneer like michael deserves more respect. -bowerbird From lee at novomail.net Wed Dec 21 14:49:38 2005 From: lee at novomail.net (Lee Passey) Date: Wed Dec 21 14:49:44 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <4baf53720512211329w705c997lb1b5b216c72a6ace@mail.gmail.com> References: <6.52f67788.30db1983@aol.com> <4baf53720512211329w705c997lb1b5b216c72a6ace@mail.gmail.com> Message-ID: <43A9DC02.8020207@novomail.net> Jon Ingram wrote: >Indeed. It would be more sensible to mark the poem (indeed the whole >work) in TEI, and use that as the canonical version. > I can definitely agree with this; I think TEI, even minimal TEI, is probably a better canonical form than XHTML. But TEI-encoded text has to exist, and it has to be made available to the public so they can get the canonical form if they desire. If TEI exists, use it, if not use the best encoding available. I don't think Mr. Thompson is prepared to create a canonical TEI version of the work just to solve this one problem (although it would be cool if he did). The Transcriber's Notes should simply refer to whatever canonical form is available. From jon at noring.name Wed Dec 21 15:54:25 2005 From: jon at noring.name (Jon Noring) Date: Wed Dec 21 15:54:40 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p...
In-Reply-To: <1a5.44ee8bbf.30db31d5@aol.com> References: <1a5.44ee8bbf.30db31d5@aol.com> Message-ID: <1907931133.20051221165425@noring.name> Bowerbird wrote: > jon noring said: >> Bowerbird is the expert on this [zml] > jon, i know what they teach you in the dale carnegie courses about > "how to win friends and influence people" is to flatter people's > egos until they will do whatever you want them to... > > and i grant it is astoundingly effective. > > but that crap doesn't work with me -- precisely because i'm hip > to it -- so you shouldn't waste your breath... Why not? You *are* the most knowledgeable expert here (as far as I know) on regularized plain text. You've done a lot of study on the structure of PG plain texts, and used that study to design the uniform ZML set of guidelines. This is a statement of truth. Anyway, why would I try to win you over? I'm only showing respect to the talents and contributions you've made to the ebook universe. You can accept or reject that respect as you see fit. I guess you have rejected it. So be it. I'm not hurt by your rejection. > and please stop making yourself look silly by trying to tell > people all about z.m.l., because you don't know jack about it... Oh? Well, what you have published on ZML, I understand quite well. It is possible to map ZML to a particular XHTML vocabulary, and back to a canonically equivalent ZML. (It will also be trivial to convert ZML to OpenReader/OEBPS and back again.) So from that perspective I know it all too well. Anyway, as you've noted, even a 4th grader can understand ZML. I guess you believe that I never graduated from the third grade. > but i'm still waiting to do that "remastering" until you markup > fools throw away a lot more of your precious time and energy. > a _smart_ poker player doesn't show his hand too soon when he > knows he's got the other players beat, he gets them to put their > money in the pot first.
You know, if you instead worked *with* people (rather than viewing everyone else as enemies that you must outwit), started a SourceForge project, and assembled like-minded people, you'd be much further along, and would probably have won a lot more hearts and minds. You believe that your "go it alone" approach will win out, but it won't. I know ZML and all the "appz" you are writing will mean diddly-squat. First, ZML is not sufficient for mastering texts for multiple digital publication purposes. For example, you've not bothered to address how in ZML you will enable standardized intra- and inter-publication deep linking, a powerful ebook function (see, for example, OSoft's Thout Reader -- what they do *cannot* be done with regularized plain text.) Second, you've turned off so many people on so many ebook-related forums that even if you are able to demonstrate ZML to be the greatest thing since sliced bread (or chocolate pudding), it will be ignored by the world. I see a place for ZML in the ebook universe, but not as you do. The final arbiter of our different viewpoints will be the future. Again, I believe you should open source your "appz" as part of a SourceForge project. I really do believe that's the best way to promote your system, build it the quickest, and increase the chances it will get embraced at some level in ebookdom. If you believe I am saying this in order to sabotage your effort, then you truly do live in a world I don't understand. > ok, now back to not talking to jon directly, since he moderates > posts by michael hart over on his listserve, and i firmly believe > that a pioneer like michael deserves more respect. How many times have you replied to me when you said you'd no longer reply to me? I've rather enjoyed the (mostly) quiet times. Regarding the moderation issue, I posted about that last July (the communications are appended below for those interested).
Anyway, John Mark Ockerbloom also moderates Michael (he moderates everyone) on his excellent Book People list, and I don't see Bowerbird complaining about that. Strange. Jon ********************************************************************* (here's Bowerbird's message after "discovering" I was, heaven forbid, moderating Michael Hart on The eBook Community. I follow it with my reply. Both dated 24 July 2005, and found in the gutvol-d archives, http://lists.pglaf.org/private.cgi/gutvol-d/2005-July/ Enjoy! >From Bowerbird at aol.com Sun Jul 24 12:09:32 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Jul 24 12:09:43 2005 Subject: [gutvol-d] why is michael hart being censored? Message-ID: <1aa.3b127c5c.3015416c@aol.com> a rather alarming thing has come to my attention. it seems that michael hart is being censored by jon noring over on jon's listserve. messages from michael appear to have to be "approved" before they are posted to the list. the evidence is sitting right there in the "headers" to see. jon, is this actually the case? i demand you confirm or deny. if so, i find it quite preposterous. love him or hate him, michael hart _invented_ e-books. when a lot of people were just flapping their mouths, michael sat down and started _typing_ various books into the computer and making them available globally, for _free_. and that's as in beer _and_ as in speech. if any person has earned the right to speak freely about electronic-books, michael has. (and let us remember that no one needs to "earn" this right in the first place; the right to speak freely is an inalienable human right -- one recognized in the charter of the united nations.) unless michael is allowed to post freely on jon's listserve, without having to go through the "mother may i" routine of having posts "approved" by jon or any other moderator, i -- for one -- will cease to converse with noring in any way. you've got until noon on monday to fix this egregious error, jon. 
-bowerbird ********************************************************************* (now for my reply) >From jon at noring.name Sun Jul 24 14:32:43 2005 From: jon at noring.name (Jon Noring) Date: Sun Jul 24 14:32:51 2005 Subject: [gutvol-d] why is michael hart being censored? In-Reply-To: <1aa.3b127c5c.3015416c@aol.com> References: <1aa.3b127c5c.3015416c@aol.com> Message-ID: <928867381.20050724153243@noring.name> Bowerbird wrote: > a rather alarming thing has come to my attention. > > it seems that michael hart is being censored by jon noring > over on jon's listserve. messages from michael appear to > have to be "approved" before they are posted to the list. > the evidence is sitting right there in the "headers" to see. > > jon, is this actually the case? > i demand you confirm or deny. > > if so, i find it quite preposterous. What an odd message to gutvol-d! The timing is very strange, too. Answer below... > if any person has earned the right to speak freely about > electronic-books, michael has. (and let us remember > that no one needs to "earn" this right in the first place; > the right to speak freely is an inalienable human right > -- one recognized in the charter of the united nations.) > > unless michael is allowed to post freely on jon's listserve, > without having to go through the "mother may i" routine > of having posts "approved" by jon or any other moderator, > i -- for one -- will cease to converse with noring in any way. > > you've got until noon on monday to fix this egregious error, jon. First point: The eBook Community is slowly being moved over to full moderation, mainly because it has to -- to fight spam and email spoofing. Every day I'm deleting a score or more of spam. The reason why TeBC is mostly clean (vis-a-vis spam) is because of my efforts to keep it clean. With over 2800 subscribers and growing, with links to it all over the Internet, TeBC is a big target for spammers, spoofers, and virus/trojan distributors.
TeBC home page: http://groups.yahoo.com/group/ebook-community/ All new subscribers are automatically being put into moderation. Only some of the very old-timers are still on "grandfather clause", but that will change, too, eventually. (It's only fair for everyone that all are treated equally.) Other plans call for forming an advisory committee to oversee the operation of TeBC, including group moderation and maintaining/revising message requirements. Note that posts from *anyone* are disallowed only if they blatantly disparage others rather than focusing on rational and polite discourse of the topic at hand (reminds me of someone else I know -- yes, I know, a disparaging remark), or are wholly off-topic (announcements of particular ebook titles are usually off-topic, for example.) Second point: There is a history behind why Michael is on moderation, but I am not at liberty to discuss it. If Michael brings up his side of the story, then I'll feel more free to discuss it from my perspective (which I may or may not.) But at this time it is between Michael and me. It is nothing really serious, actually -- at the time moderation was started, I had decided to slowly move TeBC over to full moderation over a period of a few years. When it comes to moderation, I don't treat Michael any differently than I treat any of the others who are on moderation. His many messages and his perspectives of things are much welcome. Although I disagree with a lot of his views and philosophical approach to digitizing the Public Domain and am pretty vocal about it (I also agree with him on many things), I do respect him and his accomplishments. He is a welcome and honored participant in the eBook Community. Let's look at some statistics. Of the last 1354 messages to TeBC (all the posts since 01 Feb 2005), guess who posted more to TeBC than anyone else? You guessed right: Michael Hart with 160 messages, about 12% of the total.
The next most prolific posters had 102, 63 (yours truly), 56 and 51 messages, and then a few dozen more people posted fewer messages in that time frame. (I'm sure Michael will appreciate the statistics showing him to be #1 on TeBC.) Of course, the best person to answer this is Michael. He may not like being moderated -- who does? A few of the many messages he has posted over time were disallowed, but the total percentage is very low -- I won't go into the specifics but will say moderation was done mostly to prevent certain always-emotional topics, like copyright, from spiraling out of control, so I temporarily closed discussion on those topics. Interestingly, the copyright discussion is now heavy again on TeBC, and I'm keeping a close watch on it to halt it if it spirals out of control (which may filter out some of Michael's messages since copyright is a big concern of his.) Third point (a couple of questions actually): Btw, should John Mark Ockerbloom also not moderate Michael on John's "Book People" list? He moderates everyone, including Michael. Should Michael be given a pass because he is *the* Michael Hart? Note that I am moderated along with everyone else on BP, and I have no difficulty with that. I admire John's dedication in keeping discussion on Book People focused and civilized. Bowerbird, do you believe that all public forums should have no moderation nor kick anyone off for repeated and uncontrolled egregious behavior towards others (i.e., ad hominem attacks and focusing on the person's motives rather than on what they said), thereby disrupting the forum discussion? ********************************************************************* Anyway, to summarize and focus on the topic of this gutvol-d thread, I take fair moderation of TeBC seriously. And Michael is moderated no differently than everyone else. If anything, I cut him a little more slack than the others. Jon Noring [of course, the threatened Monday deadline passed, and I am still here.
] From wally.thompson at gmail.com Wed Dec 21 16:03:58 2005 From: wally.thompson at gmail.com (Wally Thompson) Date: Wed Dec 21 16:04:12 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <43A9DC02.8020207@novomail.net> References: <6.52f67788.30db1983@aol.com> <4baf53720512211329w705c997lb1b5b216c72a6ace@mail.gmail.com> <43A9DC02.8020207@novomail.net> Message-ID: On 12/21/05, Lee Passey wrote: > Jon Ingram wrote: > > >Indeed. It would be more sensible to mark the poem (indeed the whole > >work) in TEI, and use that as the canonical version. > > > I can definitely agree with this; I think TEI, even minimal TEI, is > probably a better canonical form than XHTML. But TEI-encoded text has to > exist, and it has to be made available to the public so they can get the > canonical form if they desire. If TEI exists, use it, if not use the > best encoding available. I don't think Mr. Thompson is prepared to > create a canonical TEI version of the work just to solve this one > problem (although it would be cool if he did). The Transcriber's Notes > should simply refer to whatever canonical form is available. Actually, I've intended to use TEI or something more capable to create a canonical version from the start. My original questions merely had to do with the plain text version, which I'm trying to prepare without losing too much information. Wally From jon.ingram at gmail.com Wed Dec 21 17:18:45 2005 From: jon.ingram at gmail.com (Jon Ingram) Date: Wed Dec 21 17:19:00 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: References: <6.52f67788.30db1983@aol.com> <4baf53720512211329w705c997lb1b5b216c72a6ace@mail.gmail.com> <43A9DC02.8020207@novomail.net> Message-ID: <4baf53720512211718p2144c36cob64e836b7f77f34b@mail.gmail.com> On 12/22/05, Wally Thompson wrote: > Actually, I've intended to use TEI or something more capable to create > a canonical version from the start. 
My original questions merely had > to do with the plain text version, which I'm trying to prepare without > losing too much information. Well, it's impossible to create a pure text edition of a complex document without losing information. Personally I think we spend far too much time at PG worrying about the look of the plain text edition -- for many complex documents the plain text edition is a very poor relation indeed. That said, in this specific case you'll find life easier if you don't try to change the indentations in the poetry to blocks -- we used to interpret indentations in poetry as indicating new stanzas at DP, and hence changed them to blank lines, but as you've noticed there are quite a few works that use both indentations and blank lines, so it's easiest just to replicate the indentation and gaps used in the original work. -- Jon Ingram From joshua at hutchinson.net Thu Dec 22 06:18:22 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Dec 22 06:18:25 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <20051222141822.7CFCF2F91B@ws6-3.us4.outblaze.com> ----- Original Message ----- From: "Wally Thompson" > > Actually, I've intended to use TEI or something more capable to create > a canonical version from the start. My original questions merely had > to do with the plain text version, which I'm trying to prepare without > losing too much information. > Ifyou do create a TEI master, and you follow the PGTEI v0.4 DTD ... you don't have to worry about the txt and html. (or pdf) They will get generated automatically from the TEI master. If you want some help or a second set of eyes for this, give me a holler. I'd be happy to help out. Josh From marcello at perathoner.de Thu Dec 22 09:41:05 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Dec 22 09:41:01 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... 
In-Reply-To: <25b.410f849.30db3019@aol.com> References: <25b.410f849.30db3019@aol.com> Message-ID: <43AAE531.8050009@perathoner.de> Bowerbird@aol.com wrote: > -- for immediate release -- > > date: 14 february 2003 > dateline: los angeles, california > contact: bowerbird intelligentleman > bowerbird@aol.com 310.980.9202 > > bowerbird intelligentleman announces > an open-source project geared toward > creating an o.e.b. "presentation system", > i.e., a cross-platform reader-program > that will allow users to read o.e.b files. > > [...] > > bowerbird further indicated that he is fully confident that > the effort would bear fruit quickly, since he has previously > programmed a wide variety of electronic-book applications. > it's interesting that "the magical time" > is always _a_couple_of_years_, isn't it? > when i came to this list two years ago, > it was "a couple years" away. and it still is. > like a mirage, it's always within sight, > but somehow never gets any closer... May I ask how many copies of this "presentation system" you have distributed so far and how many people are actively using it ? > i can produce text, html, and pdf > with a .zml file today. so big whoop. As long as anybody isn't that bigoted as to want *validating* html. In that case he'll have to wait ... hmmmm ... a couple of years more ? -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Thu Dec 22 09:41:15 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Dec 22 09:41:08 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <1a5.44ee8bbf.30db31d5@aol.com> References: <1a5.44ee8bbf.30db31d5@aol.com> Message-ID: <43AAE53B.9070901@perathoner.de> Bowerbird@aol.com wrote: on Nov 6, 2003: > since my program uses project gutenberg's > already-existing plain-ascii e-text files, > nobody has to waste any time marking them up. > just drop them on the app and you've got instant e-book. 
on Dec 21, 2005: > as for the p.g. e-texts, most of them are > already _in_ z.m.l. format, or close to it, > so "remastering them" in z.m.l. will be > a task so simple a 4th-grader can do it, I see that you are making big progress. In less than 2 years of development time you realized that the "instant e-book" feature was a chimera. I wonder, does your program come with the 4th-grader included? > but i'm still waiting to do that "remastering" > until you markup fools throw away a lot more > of your precious time and energy. Do that and while you are waiting we markup fools will conquer the world. Thank you for your patience. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Dec 22 09:44:13 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 22 09:44:23 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <211.fb98f49.30dc3fed@aol.com> if you create a .zml master -- which is a heckuva lot faster and easier than creating a .tei master -- you don't have to worry about the html or pdf, as they can be generated automatically... -bowerbird From Bowerbird at aol.com Thu Dec 22 09:58:39 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 22 09:58:50 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <46.7774cf02.30dc434f@aol.com> ah, marcello, where ya been, buddy? i've missed your constant commentary! the o.e.b. announcement was a joke, designed to show the noring/rothman spin machine would even promote _me_ if i only said those magic three letters... sure enough, they did. i laughed out loud. but hey, i've explained all that before, so i'll step off the marcello merry-go-round. have a happy holiday season! -bowerbird p.s.
hey marcello, can you give me the u.r.l. to that "fansite" you put up "honoring" me? i'm trying to convince wikipedia to let me put up my bio of me, which states that i am "an asshole with far too many opinions"... but they consider it to be "not n.p.o.v." -- "neutral point of view" for newbies -- (which is nonsense of course, because why would i call myself an "asshole" if it wasn't true, seems to me this would be a classic case _illustrating_ neutral p.o.v.) -- so your webpage will go a long way to help make my case with them. thanks! From jon at noring.name Thu Dec 22 10:15:43 2005 From: jon at noring.name (Jon Noring) Date: Thu Dec 22 10:15:54 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <46.7774cf02.30dc434f@aol.com> References: <46.7774cf02.30dc434f@aol.com> Message-ID: <442718147.20051222111543@noring.name> Bowerbird wrote: > p.s. hey marcello, can you give me the u.r.l. > to that "fansite" you put up "honoring" me? > i'm trying to convince wikipedia to let me > put up my bio of me, which states that i am > "an asshole with far too many opinions"... http://www.gnutenberg.de/bowerbird/ Hope this helps you get your Wikipedia auto-bio! Jon From hart at pglaf.org Thu Dec 22 10:26:44 2005 From: hart at pglaf.org (Michael Hart) Date: Thu Dec 22 10:26:46 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: References: <237.410b1c1.30da3e08@aol.com> Message-ID: On Tue, 20 Dec 2005, Wally Thompson wrote: > On 12/20/05, Gutenberg9443@aol.com wrote: >> >> In a message dated 12/20/2005 7:12:21 P.M. Mountain Standard Time, >> wally.thompson@gmail.com writes: >> >> I'm working on a book that has poetry within the text. Sometimes a poem >> ends the paragraph and sometimes it does not. So I'm wondering how to
So I'm wondering how to >> handle this in the text file without leaving it ambiguous. >> >> Wally, could you provide a brief example? I don't do much with >> proofreading on PG, but I am a book publisher. If I saw an example in which >> the poem ends the paragraph and the poem does not end the paragraph, it >> would help. Are you saying that in some cases the poem is within the >> paragraph and the paragraph continues after the poem, or that the paragraph >> ends but the poem carries over to the next paragraph? I've seen both, and of >> course they're handled differently. >> >> Anne >> > > Here is example 1, where a new paragraph begins directly after the poem: > > http://cdl.library.cornell.edu/cgi-bin/moa/pageviewer?frames=1&coll=moa&view=50&root=%2Fmoa%2Fatla%2Fatla0018%2F&tif=00706.TIF&cite=http%3A%2F%2Fcdl.library.cornell.edu%2Fcgi-bin%2Fmoa%2Fmoa-cgi%3Fnotisid%3DABK2934-0018-102 > > Here is example 2, where the paragraph continues after the poem: > > http://cdl.library.cornell.edu/cgi-bin/moa/pageviewer?frames=1&coll=moa&view=50&root=%2Fmoa%2Fatla%2Fatla0018%2F&tif=00739.TIF&cite=http%3A%2F%2Fcdl.library.cornell.edu%2Fcgi-bin%2Fmoa%2Fmoa-cgi%3Fnotisid%3DABK2934-0018-105 > > These examples come from the Atlantic Monthly, December 1866. The Gutenberg > Ebook is located at http://www.gutenberg.org/files/17217/17217-8.txt. > > The Gutenberg Ebook formats both examples in the same way. So the reader of > the ebook faces an ambiguity as to whether or not a new paragraph begins > after the poem. In the book that I'm working on, I'm facing the same issue. > I would like to make it clear to the reader weather or not a new paragraph > begins after a poem. But I would also like to be consistent with other > Gutenberg Ebooks. So I'm wondering how others have dealt with this problem. > > Wally We used to just put in an extra blank line to make things like this obvious to our readers. 
Some would even put in some kind of marker, such as an * Michael From marcello at perathoner.de Thu Dec 22 10:43:33 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Dec 22 10:43:30 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... In-Reply-To: <46.7774cf02.30dc434f@aol.com> References: <46.7774cf02.30dc434f@aol.com> Message-ID: <43AAF3D5.4060007@perathoner.de> Bowerbird@aol.com wrote: > the o.e.b. announcement was a joke, > designed to show the noring/rothman > spin machine would even promote _me_ > if i only said those magic three letters... ... the grapes are sour. > p.s. hey marcello, can you give me the u.r.l. > to that "fansite" you put up "honoring" me? > i'm trying to convince wikipedia to let me > put up my bio of me, which states that i am > "an asshole with far too many opinions"... http://www.google.com/local?hl=en&lr=&q=psychiatrists&near=Los+Angeles,+CA&sa=X&oi=localr -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Dec 22 10:53:25 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 22 10:53:34 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: jon said: > Hope this helps you get your Wikipedia auto-bio! thanks jon! i hope it does too! not that i care about having my "bio" included; but given all the hullabaloo lately on the topic, i thought the wiki world could use a little levity. i'm hoping that somebody will edit the page to say i was involved in the kennedy assassinations. (hint: now you know what i want for christmas!) ;+) -bowerbird
From Bowerbird at aol.com Thu Dec 22 11:11:10 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 22 11:11:29 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: <215.105a3747.30dc544e@aol.com> everyone who has followed my posts at _any_ time during the last 7 years would know that i would _never_ create an o.e.b. viewer-program. nor would i ever call such a program "an end-user presentation agent", or whatever technoidese that i wrote. little did i realize how long i would be laughing about that little in-joke. years later, it's still working its magic. but hey, you know what, maybe i _will_ start a sourceforge page for that thing. like so many other o.e.b. open-source efforts, it would just die on the vine, but when that's the _object_ of the exercise, it wouldn't seem like a miserable failure, now, would it? i bet it's easier to put up a sourceforge page than a wikipedia bio, with none of those pesky "fact-checkers" snooping around to ruin all of the fun... -bowerbird p.s. spellchecker, on encountering "o.e.b.", suggests "m.b.a." as replacement. how fitting! From Bowerbird at aol.com Thu Dec 22 11:50:48 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Dec 22 11:51:06 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after p... Message-ID: jon said: > I guess you have rejected it. nah. your "praise" had no effect on me. what would it mean to be an "expert" on a 4th-grade topic anyway? :+) i just wanted you to realize that i was immune to the dale carnegie approach, so you wouldn't get frustrated it didn't work.
> You know, if you instead worked *with* people > (rather than viewing everyone else as enemies > that you must outwit), and started a SourceForge project, > assembled like-minded people, you'd be much further along, > and would probably have won a lot more hearts and minds. i have no desire to "win" hearts or minds. or to "win" friends or "influence" people... i am merely a messenger. for those who will listen... > First, ZML is not sufficient for mastering texts for > multiple digital publication purposes. For example, > you've not bothered to address how in ZML you will > enable standardized intra- and inter-publication > deep linking, a powerful ebook function and there you go again. and this is why i say you don't know jack about z.m.l. when you see how finely-grained i can "deep-link", and realize what you would have to do to get that functionality with your heavy markup, you'll get a knot in your gut. and i'll be able to give people a program that will do it _right_now_, today, even on trailing-edge machinery, while your methodology will necessitate telling them they must use browser x, after buying a new machine. deep-linking is ridiculously easy with permanent url's. and, coincidentally, absolutely impossible without them. > it will be ignored by the world. and you still don't see that i don't care. i really don't. i'm doing the world a favor, showing it a better way of doing something. if it is too stupid, or resentful, or stubborn, or whatever, to realize that, it will be no skin off my nose. i'll still be as fat and sassy as ever. > open-source as i've said before, any people who want to work on a z.m.l.-like open-source project should join with john gruber's "markdown". he's over at http://www.daringfireball.net. > moderation ockerbloom moderates every poster, while you make a special exception of moderating michael. > How many times have you replied to me > when you said you'd no longer reply to me? > I've rather enjoyed the (mostly) quiet times. 
i've enjoyed the (mostly) quiet times too, jon. one of my few resolutions for 2005 was to get off the noring merry-go-round. i did well, i think, wasting far less of my time there this year than i have in previous years. as you point out, i wasn't perfect, mostly since you seemed to want to provoke me by routinely misrepresenting z.m.l., but on the whole, i still did well. and i expect to waste even less time in your circles in the next year. (by the way, my big resolution for 2006 is to get off the michael hart anti-google merry-go-round, which wasted far too much of my time in 2005. i've already started practicing that one, and it is going well too.) and jon, i do believe that even you will get off of your own merry-go-round in 2006, because you will finally have some rubber-meets-road reality when thoutreader releases an openreader viewer. i've long said that your format-centric approach would become much stronger if it was tempered with input from a programmer on its weaknesses. (such input teaches format's relative irrelevance; what really matters is how you manipulate content, not so much the container wherein it is packaged.) and with the improvement in your confidence that an actual app will give you, you probably will not need to use the shrill argumentation of your past. because the proof really is in the pudding... at least we can hope you'll leave the preaching behind. and i too will be dealing more in pudding next year. it's been fun, and all, seeing just how long y'all would continue to insist that z.m.l. "cannot do the job", but it's time for me to stop toying with you, and give proof. at least, once you throw just a _little_ bit more money in the poker pot, in the form of your time and energy... :+) so here it is, another post on the noring merry-go-round, to commemorate 2005 quickly becoming a year of the past... -bowerbird
From joey at joeysmith.com Thu Dec 22 14:31:50 2005 From: joey at joeysmith.com (joey) Date: Thu Dec 22 14:33:53 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after poetry In-Reply-To: References: Message-ID: <20051222223150.GA7352@joeysmith.com> On Wed, Dec 21, 2005 at 02:04:59PM +0000, Dave Fawthrop wrote: > On Tue, 20 Dec 2005 19:05:31 -0700, Wally Thompson > wrote: > > | I'm working on a book that has poetry within the text. Sometimes a poem > | ends the paragraph and sometimes it does not. So I'm wondering how to > | handle this in the text file without leaving it ambiguous. > > When producing a .txt file it is easy. > > I insert spaces at the beginning of a line, just as I do when > programming. > > My paragraphs are just two consecutive new lines. > > IME tabs can be anything from 2 to 8 spaces and so often display > wrongly. > -- As somewhat of a poet, I'm curious: How do you distinguish between leading spaces that are part of the content (something I use regularly in my own poetry) and spaces that are meant to indicate line continuation (or lack thereof)? From hyphen at hyphenologist.co.uk Fri Dec 23 00:08:38 2005 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Dec 23 00:09:07 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after poetry In-Reply-To: <20051222223150.GA7352@joeysmith.com> References: <20051222223150.GA7352@joeysmith.com> Message-ID: <8pbnq1ps389sj9534r94rin9c7vr53frjo@4ax.com> On Thu, 22 Dec 2005 15:31:50 -0700, joey wrote: | On Wed, Dec 21, 2005 at 02:04:59PM +0000, Dave Fawthrop wrote: | > On Tue, 20 Dec 2005 19:05:31 -0700, Wally Thompson | > wrote: | > | > | I'm working on a book that has poetry within the text. Sometimes a poem | > | ends the paragraph and sometimes it does not.
So I'm wondering how to | > | handle this in the text file without leaving it ambiguous. | > | > When producing a .txt file it is easy. | > | > I insert spaces at the beginning of a line, just as I do when | > programming. | > | > My paragraphs are just two consecutive new lines. | > | > IME tabs can be anything from 2 to 8 spaces and so often display | > wrongly. | > -- | | As somewhat of a poet, I'm curious: How do you distinguish between leading spaces | that are part of the content (something I use regularly in my own poetry) and | spaces that are meant to indicate line continuation (or lack thereof)? I had forgotten to mention that :-( I put several extra spaces before the line continuation parts. The number of spaces depends on the layout of the poem, but I have not failed to visually disambiguate the various uses of spaces. -- Dave Fawthrop "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". Sig (C) Copyright Public Domain From joey at joeysmith.com Fri Dec 23 02:13:42 2005 From: joey at joeysmith.com (joey) Date: Fri Dec 23 02:15:59 2005 Subject: [gutvol-d] differentiating between indentation and no indentation after poetry In-Reply-To: <8pbnq1ps389sj9534r94rin9c7vr53frjo@4ax.com> References: <20051222223150.GA7352@joeysmith.com> <8pbnq1ps389sj9534r94rin9c7vr53frjo@4ax.com> Message-ID: <20051223101342.GA32132@joeysmith.com> On Fri, Dec 23, 2005 at 08:08:38AM +0000, Dave Fawthrop wrote: > On Thu, 22 Dec 2005 15:31:50 -0700, joey wrote: > > | > | As somewhat of a poet, I'm curious: How do you distinguish between leading spaces > | that are part of the content (something I use regularly in my own poetry) and > | spaces that are meant to indicate line continuation (or lack thereof)? > > I had forgotten to mention that :-( I put several extra spaces > before the line continuation parts. 
The number of spaces depends on > the layout of the poem, but I have not failed to visually disambiguate > the various uses of spaces. > Thanks for replying! Can you explain in more detail? I'd be willing to send you one of my poems to have you markup if that would be helpful for demonstration purposes. If we need to take this off-list, that's fine, because I'm not sure it involves PG as a whole. From gbnewby at pglaf.org Sat Dec 24 20:18:15 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Dec 24 20:18:17 2005 Subject: [gutvol-d] www.gutenberg.org In-Reply-To: <43A5C5CA.7020105@bigpond.net.au> References: <414C727B.6080406@perathoner.de> <43A5C5CA.7020105@bigpond.net.au> Message-ID: <20051225041815.GC29640@pglaf.org> On Mon, Dec 19, 2005 at 07:25:46AM +1100, Michael Ciesielski wrote: > It's been over a year since Marcello announced the change from > gutenberg.net to gutenberg.org. Is there any particular reason why the > PG header and footer inserted in newly posted texts refers to > gutenberg.net in several places? > > Thanks! > Mike I thought we had changed this, too.... I've just updated the header template files on pglaf.org to specify gutenberg.org rather than gutenberg.net (there are a few that leave out the 'www.', but I didn't try to fix them). Some of the whitewashers use addhd on their home PCs, rather than makehead10k (on the pglaf.org server). So, they'll also need to update their headers, perhaps with help from Jim Tinsley when he's back from vacation. Thanks for pointing this out. -- Greg > Marcello Perathoner, on 19/09/2004 03:38, wrote: > > >As of my request ibiblio has changed our apache virtual host > >ServerName from gutenberg.net to www.gutenberg.org. > > > >What? > > > >Nothing changes in web site operation except one little particular (if > >you hit a non-existing url you get redirected to www.gutenberg.org > >instead of gutenberg.net). > > > > > >Why?
> > > >The .org top-level domain is meant for non-profit organizations while > >the .net domain is meant for network infrastructure such as providers, > >backbones etc. > > > >We own both gutenberg.net and gutenberg.org, but using the latter one > >is just more standard-compliant. > > > >Although both urls www.gutenberg.net and www.gutenberg.org give > >exactly the same results, you should start using www.gutenberg.org in > >all publications, papers etc. > > > > > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Sun Dec 25 09:08:17 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Dec 25 09:08:27 2005 Subject: [gutvol-d] merry christmas 2005 -- the good, the bad, and the surprising Message-ID: <20a.fd89379.30e02c01@aol.com> merry christmas 2005! the bad news is that distributed proofreaders got no present in their stocking this year... :+( the good news is that they didn't get a lump of coal like they did last year... :+) the surprising news is that michael hart -- for the first year in anyone's memory -- got no present. some people speculate that google stole michael's present, but perhaps he just wasn't as good this year as he has been in previous years? maybe saint nicholas just skipped over the p.g./d.p. houses this year... but i'm sure he'll be back next year! happy 2006! -bowerbird
From nwolcott at dsdial.net Mon Dec 26 09:16:58 2005 From: nwolcott at dsdial.net (N Wolcott) Date: Mon Dec 26 09:20:51 2005 Subject: [gutvol-d] copyright vs creative commons Message-ID: <000a01c60a40$57bc9800$ce9495ce@gw98> Are there any opinions about the suitability of creative commons copyright vs regular copyright when the intent is to allow unlimited personal and non-profit non-commercial use of the works. That is one of the cc options, but I have read that the use of a cc copyright might bring an unwanted third party into any copyright disputes, and might further limit the author's rights. The suggestion was made that making a statement on the copyright page that copying for non commercial etc would be approved on request would do the same thing with fewer potential complications. comments? N Wolcott nwolcott2@post.harvard.edu From Bowerbird at aol.com Mon Dec 26 13:37:33 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Dec 26 13:37:52 2005 Subject: [gutvol-d] the secret garden, revisited Message-ID: <15.528c2f95.30e1bc9d@aol.com> i see "the secret garden" was posted as #17396. this is a re-digitization of a very early e-text -- #113. as i've said before, different digitizations can serve as a check on each other, and therefore boost accuracy _considerably_. i did a check of a preliminary d.p. version against the project gutenberg #113, and have posted my z.m.l. distillation here: > http://snowy.arsc.alaska.edu/bowerbird/2005cleanup/tsg4.zml (in-progress versions tsg1 through tsg3 are also there.) i will check this final posted p.g./d.p. version against my distillation and see what pops up... -bowerbird
From Bowerbird at aol.com Mon Dec 26 13:47:34 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Dec 26 13:47:46 2005 Subject: [gutvol-d] the secret garden, revisited Message-ID: <192.4ec68928.30e1bef6@aol.com> i said: > i will check this final posted p.g./d.p. version > against my distillation and see what pops up... well, i will when it shows up, anyway... :+) (i guess the "posted" digest is a bit premature; takes a little time for the e-text page to go live.) -bowerbird From gbnewby at pglaf.org Mon Dec 26 18:21:18 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Dec 26 18:21:20 2005 Subject: [gutvol-d] the secret garden, revisited In-Reply-To: <192.4ec68928.30e1bef6@aol.com> References: <192.4ec68928.30e1bef6@aol.com> Message-ID: <20051227022118.GB27935@pglaf.org> On Mon, Dec 26, 2005 at 04:47:34PM -0500, Bowerbird@aol.com wrote: > i said: > > i will check this final posted p.g./d.p. version > > against my distillation and see what pops up... > > well, i will when it shows up, anyway... :+) > > (i guess the "posted" digest is a bit premature; > takes a little time for the e-text page to go live.) - The emails to posted@lists.pglaf.org (http://lists.pglaf.org) go out when the file is formatted and 'staged' for pushing to the servers - Every hour at :20, staged files are uploaded to ibiblio.org (aka gutenberg.org/gutenberg.net) and archive.org At that point, the files are accessible if you know how to navigate to them.
For example, #17392 could be accessed as http://www.gutenberg.org/dirs/1/7/3/9/17392 - Every morning (I think at 0600 EST) the catalog is updated with any changed files or newly added files At that point, you can find the item in the catalog, and get to it via the "canonical" path: http://www.gutenberg.org/etext/17392 - Mirrors have their own schedule, but most will pick up new files within 24 hours of uploading -- Greg From darrenburnhill at hotmail.com Tue Dec 27 03:43:55 2005 From: darrenburnhill at hotmail.com (Darren Burnhill) Date: Tue Dec 27 04:01:19 2005 Subject: [gutvol-d] copyright clearances In-Reply-To: <20051226200003.D08878C15F@pglaf.org> Message-ID: Hi, Does PG have any future plans to ascertain the status of books for countries other than the U.S.? Whilst I think that clarifying the status on the download page is good; “Not copyrighted in the United States. If you live elsewhere check the laws of your country before downloading this ebook.” A link to some relevant information would be better. BTW: The “submit the item” link on the ‘copyright-howto’ page needs updating; http://beryl.ils.unc.edu/copy.html From sly at victoria.tc.ca Tue Dec 27 10:17:41 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Dec 27 10:17:49 2005 Subject: [gutvol-d] copyright clearances In-Reply-To: References: Message-ID: I would say the answer to your first question is no. See Faq C4, Why does Project Gutenberg advise only on U.S. copyright issues? http://www.gutenberg.org/faq/C-4 I have found that copyright is a set of issues that continually gets more complex the closer you look into it. Copyright is not actually one "right", but a bundle of separate rights. Every country has its own laws which often differ in the small details, and laws can be updated, modified, etc. at any time. It could be a liability for PG to make claims about the copyright status of items in different countries, if the information that claim was based on turns out to be wrong, or outdated. 
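Greg's description of where a newly posted file lives (#17392 at http://www.gutenberg.org/dirs/1/7/3/9/17392 after the hourly upload, then at http://www.gutenberg.org/etext/17392 after the daily catalog update) implies a simple directory rule: every digit of the ebook number except the last becomes one nested directory, and the last level is the full number. The Python sketch below only illustrates that rule as inferred from this single example; the helper names `dirs_path` and `canonical_path` are hypothetical, and the sketch refuses one-digit numbers because the example does not show how those are stored.

```python
# Hypothetical helpers sketching the Project Gutenberg URL layout described
# by Greg (the directory rule is inferred from the single #17392 example).

BASE = "http://www.gutenberg.org"


def dirs_path(ebook_number: int) -> str:
    """Return the /dirs/ URL where a freshly uploaded ebook can be reached.

    Every digit except the last becomes one directory level, and the final
    level is the full ebook number: 17392 -> /dirs/1/7/3/9/17392.
    """
    if ebook_number < 10:
        # The example does not show how one-digit numbers are stored.
        raise ValueError("rule only inferred for ebook numbers >= 10")
    digits = str(ebook_number)
    levels = list(digits[:-1]) + [digits]
    return f"{BASE}/dirs/" + "/".join(levels)


def canonical_path(ebook_number: int) -> str:
    """Return the catalog ("canonical") URL, live after the daily catalog update."""
    return f"{BASE}/etext/{ebook_number}"


if __name__ == "__main__":
    print(dirs_path(17392))       # http://www.gutenberg.org/dirs/1/7/3/9/17392
    print(canonical_path(17392))  # http://www.gutenberg.org/etext/17392
```

Per Greg's schedule, the first URL should work once the hourly upload at :20 has run, while the second appears only after the morning catalog update; mirrors may lag by up to a day.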
If it helps to realise it, I suspect that one thing that led to the recent change of wording that you noticed was a challenge that I received through the catalog error email. (Which I forwarded on to others.) The gist of the message was "This text is under copyright in my country, you should remove it now or you will risk legal action." Andrew On Tue, 27 Dec 2005, Darren Burnhill wrote: > Hi, > > Does PG have any future plans to ascertain the status of books for countries > other than the U.S.? > > Whilst I think that clarifying the status on the download page is good; > > “Not copyrighted in the United States. If you live elsewhere check the laws > of your country before downloading this ebook.” > > A link to some relevant information would be better. > > BTW: The “submit the item” link on the ‘copyright-howto’ page needs > updating; > > http://beryl.ils.unc.edu/copy.html From scott_bulkmail at productarchitect.com Tue Dec 27 11:19:03 2005 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Tue Dec 27 11:32:18 2005 Subject: [gutvol-d] copyright vs creative commons In-Reply-To: <000a01c60a40$57bc9800$ce9495ce@gw98> References: <000a01c60a40$57bc9800$ce9495ce@gw98> Message-ID: >Are there any opinions about the suitability of creative commons copyright vs regular copyright when the intent is to allow unlimited personal and non-profit non-commercial use of the works. IANAL and may be completely wrong, but I think the two are somewhat different. You can retain full *ownership* of the copyright and choose to make works available under a CC *license*. > That is one of the cc options, but I have read that the use of a cc copyright might bring an unwanted third party into any copyright disputes, and might further limit the author's rights.
I don't know whether a CC license is considered "revokable" -- so that's one thing you might give up vs. just granting a license on a case-by-case basis. AND I haven't read (or looked for) any of the critiques of CC that you cite. I think the advantage of CC is that it's simple and well-defined; that's hard to beat with a hand-crafted license. There is legal muscle behind CC, and I personally would recommend it. They clearly learned from the various "open source" software licenses and encapsulated the options without getting dragged into the fight between the various factions. -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From Bowerbird at aol.com Tue Dec 27 11:34:53 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Dec 27 11:35:04 2005 Subject: [gutvol-d] copyright clearances Message-ID: <231.45a676c.30e2f15d@aol.com> darren said: > A link to some relevant information would be better. such as a referral service to lawyers for consultation? because that's what the world seems to be coming to... -bowerbird From Bowerbird at aol.com Tue Dec 27 11:58:28 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Dec 27 11:58:39 2005 Subject: [gutvol-d] the secret garden, revisited Message-ID: <192.4ed27ac5.30e2f6e4@aol.com> greg said: > At that point, the files are accessible > if you know how to navigate to them. > For example, #17392 could be accessed as > http://www.gutenberg.org/dirs/1/7/3/9/17392 oh right, crap, i should have remembered that... so i looked at the version of "the secret garden" that was just created by distributed proofreaders, comparing it to the distillation copy i had prepared.
(which itself had resulted from the original e-text of this book that is already in p.g., as e-text #113, being compared with the smoothreading version, so now i was comparing a comparison to a comparison. just so you don't get confused...) ;+) and d.p. did a _very_ good job on it. congratulations... however, i still show 5 outright errors in your version, the details of which i have appended to this message... (the top line in each pair is your line, with the error being followed by the large gap before the next word.) plus there are also 5 special cases outlined below them, where i chose to make edits. your mileage might vary... (but you _did_ make highly similar edits, for consistency.) as a point of reference for bystanders, there were approximately 250-300 points of difference between the d.p.-processed text and the original p.g. e-text, so 5-10 errors in boiling them down isn't _that_ many. 5-10 errors on a book would be _excellent_ on a first-pass. it's respectable even on a comparison project like this one. although perfection _is_ within the realm of obtainability, you've attained a very high accuracy-mark here, especially since your comparison tools probably aren't up to snuff yet. but the one thing that _does_ give me pause on some of the things is that they _should_ have been caught by your tools. any missing end-paragraph terminating-punctuation (page 31) should be caught. ditto a continuation-quote (page 108, but perhaps you were "matching the scan" and passed on that). and a good tool will detect inconsistent and/or irregular usage, of the type that is represented in the additional edits i made... so a comparison tool should eventually find all of your errors here (mine did), but even your _regular_ tools should've caught _some_! so i congratulate you for a job well-done, but recommend you bring your tools up to speed, and then you'll realize perfection, at least on these re-do projects, and maybe first-passes too... 
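Bowerbird makes two claims here: comparing independent digitizations exposes errors, and routine checks should catch a paragraph missing its closing punctuation (the p. 31 case). Both can be illustrated with Python's standard `difflib` and `re` modules. This is a minimal sketch, not the tool bowerbird or Distributed Proofreaders actually used. `word_differences` and `unterminated_paragraphs` are hypothetical names, paragraphs are assumed to be separated by blank lines as in PG plain-text files, and the set of acceptable final characters is a simplified guess.

```python
import difflib
import re

# Final characters accepted as ending a paragraph. This is a simplified,
# assumed set; a plain-text "--" dash ends in "-".
TERMINAL = ".!?:-"


def word_differences(version_a: str, version_b: str) -> list[tuple[str, str]]:
    """Compare two digitizations word by word.

    Returns one (text_in_a, text_in_b) pair for every run of words that
    differs; an empty string means the run is missing from that version.
    """
    a_words = version_a.split()
    b_words = version_b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    return [
        (" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def unterminated_paragraphs(text: str) -> list[tuple[int, str]]:
    """List (index, paragraph) for blank-line-separated paragraphs lacking final punctuation."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    flagged = []
    for index, para in enumerate(paragraphs):
        # Look past closing quotes, italics underscores and parentheses.
        core = para.rstrip("\"'_)")
        if core and core[-1] not in TERMINAL:
            flagged.append((index, para))
    return flagged


if __name__ == "__main__":
    dp_line = 'told her anything," said Colin, "She heard'
    scan_line = 'told her anything," said Colin. "She heard'
    print(word_differences(dp_line, scan_line))  # [('Colin,', 'Colin.')]

    sample = '"That there?" she said\n\n"Yes."'
    print(unterminated_paragraphs(sample))  # [(0, '"That there?" she said')]
```

Comparing word by word rather than line by line means differences in line wrapping between the two digitizations are ignored, so only changes to the words and punctuation themselves are reported.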
anyway, perhaps for 2006, one of your resolutions should be not to alienate any future tool-makers... and in sum, this double-digitization shows nicely how this approach can drive an e-text to perfection, probably better than any other... just in passing, i will note that the postprocessor on this book (miller) must have applied a lot of elbow-grease doing the comparison, because the version that went out for smooth-reading had a _lot_ more errors. there has been a lot of upheaval over at d.p. as a result of the june move to four rounds, but still only two of those rounds are for actual proofing. (the other two are for formatting.) so the primary difference is that the second proofing round is done by a proofer tested to be "more-qualified". i don't know if this text went through the new system or not, but if it did, it does _not_ speak well for it. i highly recommend that d.p. use a _third_ round of proofing. (or even better, adopt the recommendation that i made a while back for a "roundless" system that uses _consensus_ as the factor that promotes a page from the proofing rounds into the formatting rounds.) also, it would be nice if -- when d.p. does a "re-do" project like this one -- d.p. would package up the various iterations of the process and make 'em available to researchers like myself so that we could examine the output at the different stages and do work on improving the overall results... so how 'bout it, d.p. people, will you furnish that material on this book? (indeed, if you could package up the results of each round into a .zip file, for _every_ book you do, that would be a great resource for researchers.) -bowerbird ------------------------------------------------------------------------------ ------------------------------ these are your errors, by the scans and/or common sense: p. 31 also. "That there?" she said "Yes." "That's also. "That there?" she said. "Yes." "That's p. 129 blades. "There's lots o' 'dead wood as blades.
"There's lots o' dead wood as p. 187 told her anything," said Colin, "She heard told her anything," said Colin. "She heard p. 222 out between two sobs: "Sh--show her! She--she'll see then!" out between two sobs: "Sh-show her! She-she'll see then!" p. 287 "It is my garden now, I am "It is my garden now. I am ------------------------------------------------------------------------------ ------------------------------ and here are 5 edits, contra-scans, that i made for consistency: p. 91 an' I was in practice." Mary got an' I was in practise." Mary got p. 108 to her: "_My Dear Dickon:_ This comes hoping to her: _"My Dear Dickon:_ "This comes hoping p. 109 bit o' mother's hot oat cake, an' butter, bit o' mother's hot oat-cake, an' butter, p. 254 said Mary quite seriously. "An tha' munnot said Mary quite seriously. "An' tha' munnot p. 326 got into my throat." "But" she said got into my throat." "But," she said ------------------------------------------------------------------------------ ------------------------------ there was also this edit, from the earlier p.g. version, that i liked, so i kept it in my version as a tribute to the original type-in e-text. p. 108 him?" asked Martha suddenly, she had looked him?" asked Martha suddenly, for Mary had looked ------------------------------------------------------------------------------ ------------------------------ From hart at pglaf.org Tue Dec 27 11:59:42 2005 From: hart at pglaf.org (Michael Hart) Date: Tue Dec 27 11:59:44 2005 Subject: [gutvol-d] copyright clearances In-Reply-To: <231.45a676c.30e2f15d@aol.com> References: <231.45a676c.30e2f15d@aol.com> Message-ID: On Tue, 27 Dec 2005 Bowerbird@aol.com wrote: > darren said: >> A link to some relevant information would be better. > > such as a referral service to lawyers for consultation?
> > because that's what the world seems to be coming to... > > -bowerbird > A few words from one of our copyright advisors should serve to settle things down: US Copyright *** Under Section 506(a)(1) of the 1976 Act and 18. . .2319 anyone "who infringes a copyright willfully and for purpose of commercial advantage. . .or private financial gain" is subject to felony or misdemeanor punishment. In addition, following U.S. v. La Macchia (quashing prosecution of a computer bulletin board operator who distributed free unauthorized copies of commercial software to subscribers) the 1997 "No Electronic Theft" legislation made willful infringement a crime even if undertaken without a profit motive. It has been held that only the copyright statutes cited above may be used to prosecute infringers, so fanciful interpretations of other laws on theft need not be feared. Given the precautions Gutenberg takes to be sure it copies only public domain works it is highly unlikely that willfulness could ever be established. From hart at pglaf.org Tue Dec 27 12:03:07 2005 From: hart at pglaf.org (Michael Hart) Date: Tue Dec 27 12:03:09 2005 Subject: [gutvol-d] copyright vs creative commons In-Reply-To: References: <000a01c60a40$57bc9800$ce9495ce@gw98> Message-ID: On Tue, 27 Dec 2005, Scott Lawton wrote: >> Are there any opinions about the suitability of creative commons copyright >> vs regular copyright when the intent is to allow unlimited personal and >> non-profit non-commercial use of the works. > > IANAL and may be completely wrong, but I think the two are somewhat > different. You can retain full *ownership* of the copyright and choose to > make works available under a CC *license*. Same with the PG license. >> That is one of the cc options, but I have read that the use of a cc >> copyright might bring an unwanted third party into any copyright disputes, >> and might further limit the author's rights.
>
> I don't know whether a CC license is considered "revokable" -- so that's one
> thing you might give up vs. just granting a license on a case-by-case basis.

The PG license specifically states "irrevocable"

> AND I haven't read (or looked for) any of the critiques of CC that you cite.
> I think the advantage of CC is that it's simple and well-defined; that's hard
> to beat with a hand-crafted license. There is legal muscle behind CC, and I
> personally would recommend it. They clearly learned from the various "open
> source" software licenses and encapsulated the options without getting
> dragged into the fight between the various factions.
--

The PG license is specifically tailored to eBooks, and has many more years of history behind it. . .precedent counts a lot in our laws.

mh

From prosfilaes at gmail.com Tue Dec 27 12:06:49 2005
From: prosfilaes at gmail.com (David Starner)
Date: Tue Dec 27 12:06:58 2005
Subject: [gutvol-d] copyright clearances
In-Reply-To:
References: <20051226200003.D08878C15F@pglaf.org>
Message-ID: <6d99d1fd0512271206s5b7fbc6fqdb652a4499487690@mail.gmail.com>

> "Not copyrighted in the United States. If you live elsewhere check the laws
> of your country before downloading this ebook."

I think it would be better not to use the imperative here. IMO, it would be better to say something like "Project Gutenberg is not concerned with and does not know the copyright status in other nations." I don't know the exact wording, but I think we should just state the facts.

From Bowerbird at aol.com Tue Dec 27 12:16:33 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Dec 27 12:16:45 2005
Subject: [gutvol-d] copyright clearances
Message-ID: <291.30c0bdf.30e2fb21@aol.com>

michael said:
> A few words from one of our copyright advisors
> should serve to settle things down:
> US Copyright

except the original poster was talking about _outside_ the u.s.

i think everyone grants that _inside_ the u.s., project gutenberg's research carries the day.
after all, you still haven't been sued, not even once, have you?

-bowerbird

From gbnewby at pglaf.org Tue Dec 27 12:26:33 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue Dec 27 12:26:34 2005
Subject: [gutvol-d] copyright clearances
In-Reply-To:
References:
Message-ID: <20051227202633.GB10739@pglaf.org>

> On Tue, 27 Dec 2005, Darren Burnhill wrote:
>
> > Hi,
> >
> > Does PG have any future plans to ascertain the status of books for countries
> > other than the U.S.?

It's not really possible for us to do this. We've interacted with world *experts* in copyright, and for many items there is no simple answer. Many countries don't even have a concept of "fair use," making it hard to offer guidance on what items are "safe" to view even if they're copyrighted.

> > Whilst I think that clarifying the status on the download page is good;
> >
> > "Not copyrighted in the United States. If you live elsewhere check the laws
> > of your country before downloading this ebook."
> >
> > A link to some relevant information would be better.

We decided against this because it would make it look like we're trying to give copyright advice outside of our expertise.

> > BTW: The "submit the item" link on the "copyright-howto" page needs
> > updating;
> >
> > http://beryl.ils.unc.edu/copy.html

Fixed! I also added two links to further information in the copyright HOWTO:
http://www.gutenberg.org/howto/copyright-howto

One link to John Ockerbloom's copyright FAQ (it lists countries & their terms); another to the kingkong New General Catalog of birth/death dates -- both are quite useful for determining likely copyright status for items outside of the US.

But I don't want to add those same links to every download page, lest it appear that we're trying to give guidance.
(Also, we already get way too many questions about foreign copyright, that we cannot answer.)
-- Greg

From darrenburnhill at hotmail.com Wed Dec 28 01:06:51 2005
From: darrenburnhill at hotmail.com (Darren Burnhill)
Date: Wed Dec 28 01:07:11 2005
Subject: [gutvol-d] copyright clearances
In-Reply-To: <20051227200005.A814A8C832@pglaf.org>
Message-ID:

Hi,

Thanks for the (helpful) replies. I appreciate that it is a complicated issue and should have therefore stated that what I meant was just a link to some general information/pointers to set people on the right track.

Is there anyone looking after the interests of the Brits?, as I have (potentially) quite a few books to submit and 24/7/365 as free time.

From joshua at hutchinson.net Wed Dec 28 05:22:16 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Dec 28 05:22:19 2005
Subject: [gutvol-d] copyright clearances
Message-ID: <20051228132216.E25371099A7@ws6-4.us4.outblaze.com>

Distributed Proofreaders Europe is set up to handle the different copyright terms of countries in Europe. You might want to give them a try. I'm sure they could use plenty of volunteer help!

http://dp.rastko.net/

Josh

----- Original Message -----
From: "Darren Burnhill"
To: gutvol-d@lists.pglaf.org
Subject: [gutvol-d] copyright clearances
Date: Wed, 28 Dec 2005 09:06:51 +0000

>
> Hi,
>
> Thanks for the (helpful) replies. I appreciate that it is a complicated issue
> and should have therefore stated that what I meant was just a link to some
> general information/pointers to set people on the right track.
>
> Is there anyone looking after the interests of the Brits?, as I have
> (potentially) quite a few books to submit and 24/7/365 as free time.
From hyphen at hyphenologist.co.uk Wed Dec 28 06:08:31 2005
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Wed Dec 28 06:08:45 2005
Subject: [gutvol-d] copyright clearances
In-Reply-To: <20051228132216.E25371099A7@ws6-4.us4.outblaze.com>
References: <20051228132216.E25371099A7@ws6-4.us4.outblaze.com>
Message-ID: <3r65r1lo6pe7gnjo8ieart53i8rsmv1qi2@4ax.com>

On Wed, 28 Dec 2005 08:22:16 -0500, "Joshua Hutchinson"
wrote:

| Distributed Proofreaders Europe is set up to handle
| the different copyright terms of countries in Europe.
| You might want to give them a try.
| I'm sure they could use plenty of volunteer help!

I have just thought. The books I do were published in England and
are out of copyright to both US and UK/EU rules.

So which organisation should I submit them to?
--
Dave Fawthrop Some of my Hobbies: VDU Glasses
http://tinyurl.com/c3lh, Wordlists http://tinyurl.com/c3lj, Celtic fonts
http://tinyurl.com/c3ll, Killfile&Anti Troll FAQs http://tinyurl.com/c3lo
Tyke Dialect http://tinyurl.com/c3ls Curry Project, http://tinyurl.com/1q6

From jon.ingram at gmail.com Wed Dec 28 06:50:13 2005
From: jon.ingram at gmail.com (Jon Ingram)
Date: Wed Dec 28 06:50:16 2005
Subject: [gutvol-d] copyright clearances
In-Reply-To: <3r65r1lo6pe7gnjo8ieart53i8rsmv1qi2@4ax.com>
References: <20051228132216.E25371099A7@ws6-4.us4.outblaze.com> <3r65r1lo6pe7gnjo8ieart53i8rsmv1qi2@4ax.com>
Message-ID: <4baf53720512280650r551773b3x26443fc9187cabd6@mail.gmail.com>

On 12/28/05, Dave Fawthrop wrote:
> On Wed, 28 Dec 2005 08:22:16 -0500, "Joshua Hutchinson"
> wrote:
>
> | Distributed Proofreaders Europe is set up to handle
> | the different copyright terms of countries in Europe.
> | You might want to give them a try.
> | I'm sure they could use plenty of volunteer help!
>
> I have just thought.
> The books I do were published in England and
> are out of copyright to both US and UK/EU rules.
>
> So which organisation should I submit them to?

I submit books which are out of copyright in the US to the main DP, which filters into the US Project Gutenberg. All the non-US PG siblings are very small, new operations, and have no set procedures for actually posting books.

--
Jon Ingram

From joshua at hutchinson.net Wed Dec 28 10:59:09 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Dec 28 10:59:17 2005
Subject: [gutvol-d] copyright clearances
Message-ID: <20051228185910.1BAF6109AB7@ws6-4.us4.outblaze.com>

If they are in English, I'd go with PGDP (they have more volunteer power). If in another language, DP-Europe is better equipped.

Josh

----- Original Message -----
From: "Dave Fawthrop"
>
> On Wed, 28 Dec 2005 08:22:16 -0500, "Joshua Hutchinson"
> wrote:
>
> | Distributed Proofreaders Europe is set up to handle
> | the different copyright terms of countries in Europe.
> | You might want to give them a try.
> | I'm sure they could use plenty of volunteer help!
>
> I have just thought. The books I do were published in England and
> are out of copyright to both US and UK/EU rules.
>
> So which organisation should I submit them to?
From prosfilaes at gmail.com Wed Dec 28 12:11:38 2005
From: prosfilaes at gmail.com (David Starner)
Date: Wed Dec 28 12:11:47 2005
Subject: [gutvol-d] copyright clearances
In-Reply-To: <20051228185910.1BAF6109AB7@ws6-4.us4.outblaze.com>
References: <20051228185910.1BAF6109AB7@ws6-4.us4.outblaze.com>
Message-ID: <6d99d1fd0512281211v554cc5besf9885026899e69f1@mail.gmail.com>

On 12/28/05, Joshua Hutchinson wrote:
> If they are in English, I'd go with PGDP (they have more volunteer power). If in another language, DP-Europe is better equipped.

I don't think that's universally true. Most Western European languages have a much larger body of users at PGDP than DP-Europe. The only exception off the top of my head is Icelandic. Even non-Latin-1 languages, like Esperanto or Middle English, do better at PGDP, because of an established body of proofers and it's easier for many people to transliterate instead of typing the Unicode characters. This goes double for the Native American languages that use a huge, odd, collection of accents and idiosyncratic characters.
From gbnewby at pglaf.org Wed Dec 28 12:35:40 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Dec 28 12:35:41 2005
Subject: [gutvol-d] copyright clearances
In-Reply-To: <3r65r1lo6pe7gnjo8ieart53i8rsmv1qi2@4ax.com>
References: <20051228132216.E25371099A7@ws6-4.us4.outblaze.com> <3r65r1lo6pe7gnjo8ieart53i8rsmv1qi2@4ax.com>
Message-ID: <20051228203540.GC13369@pglaf.org>

On Wed, Dec 28, 2005 at 02:08:31PM +0000, Dave Fawthrop wrote:
> On Wed, 28 Dec 2005 08:22:16 -0500, "Joshua Hutchinson"
> wrote:
>
> | Distributed Proofreaders Europe is set up to handle
> | the different copyright terms of countries in Europe.
> | You might want to give them a try.
> | I'm sure they could use plenty of volunteer help!
>
> I have just thought. The books I do were published in England and
> are out of copyright to both US and UK/EU rules.
>
> So which organisation should I submit them to?

Just a quick note that between PG-EU and PG-US, we are both happy to have duplicate content. So, you could submit to both places. There is some duplication (or planned duplication, anyway) with PG-AU as well.
-- Greg

From nwolcott at dsdial.net Wed Dec 28 17:25:04 2005
From: nwolcott at dsdial.net (N Wolcott)
Date: Wed Dec 28 17:39:14 2005
Subject: [gutvol-d] copyright clearances
References: <20051228185910.1BAF6109AB7@ws6-4.us4.outblaze.com>
Message-ID: <003001c60c18$73716ec0$e39495ce@gw98>

I understood that PGEurope was not accepting unsolicited books yet. ??

N Wolcott
nwolcott2@post.harvard.edu

----- Original Message -----
From: "Joshua Hutchinson"
To: ; "Project Gutenberg Volunteer Discussion"
Sent: Wednesday, December 28, 2005 1:59 PM
Subject: Re: [gutvol-d] copyright clearances

> If they are in English, I'd go with PGDP (they have more volunteer power). If in another language, DP-Europe is better equipped.
>
> Josh
>
> ----- Original Message -----
> From: "Dave Fawthrop"
> >
> > On Wed, 28 Dec 2005 08:22:16 -0500, "Joshua Hutchinson"
> > wrote:
> >
> > | Distributed Proofreaders Europe is set up to handle
> > | the different copyright terms of countries in Europe.
> > | You might want to give them a try.
> > | I'm sure they could use plenty of volunteer help!
> >
> > I have just thought. The books I do were published in England and
> > are out of copyright to both US and UK/EU rules.
> >
> > So which organisation should I submit them to?

From sly at victoria.tc.ca Wed Dec 28 19:00:55 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Dec 28 19:01:10 2005
Subject: [gutvol-d] copyright clearances
In-Reply-To: <6d99d1fd0512281211v554cc5besf9885026899e69f1@mail.gmail.com>
References: <20051228185910.1BAF6109AB7@ws6-4.us4.outblaze.com> <6d99d1fd0512281211v554cc5besf9885026899e69f1@mail.gmail.com>
Message-ID:

On Wed, 28 Dec 2005, David Starner wrote:

> goes double for the Native American languages that use a huge, odd,
> collection of accents and idiosyncratic characters.

Or for another way to look at it, the people who worked on transcribing these languages developed their own individual (sometimes idiosyncratic) systems for doing so.
Andrew

From darrenburnhill at hotmail.com Fri Dec 30 08:17:26 2005
From: darrenburnhill at hotmail.com (Darren Burnhill)
Date: Fri Dec 30 08:17:31 2005
Subject: [gutvol-d] Dictionaries
In-Reply-To: <20051229200002.E95578C83C@pglaf.org>
Message-ID:

Hi,

Has anyone ever developed dictionaries to suit the checking of texts from different ages?

From tb at baechler.net Fri Dec 30 10:02:50 2005
From: tb at baechler.net (Tony Baechler)
Date: Fri Dec 30 10:08:42 2005
Subject: [gutvol-d] Internet Archive PG collection
Message-ID: <5.2.0.9.0.20051230100013.009d1300@127.0.0.1>

Hello all. I was browsing the Internet Archive texts and saw that they have a link to their own Project Gutenberg collection. When I followed it, they only listed 7,389 items. Why is this? There should be over 17,000, right? They are all available via ftp.archive.org I believe, so why aren't they listed on their web site?

To see this, go to http://www.archive.org/ and go to texts. Select the Project Gutenberg link. Are things not being mirrored properly from the PG catalog?

From jeroen.mailinglist at bohol.ph Fri Dec 30 11:34:50 2005
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Fri Dec 30 11:33:22 2005
Subject: [gutvol-d] Dictionaries
In-Reply-To:
References:
Message-ID: <43B58BDA.8090000@bohol.ph>

I wanted to do this for Dutch in the pre-1947 orthography, but it is a tremendous lot of work, so have not yet completed this work. I've collected about 100 megabytes of text in this orthography, and made an initial word list out of it, discarding anything that appears less than five times. Then have to match this to a modern word list, to fill in the gaps, then have to go through the entire list again to add all regular (grammatical) variants of each word, and filter out unwanted words, such as common misspellings and scarce words matching with common scannos or typos.

I think I would like to set up a distributed word-list creator website to distribute this work.
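The first two steps Jeroen describes (count every word in a period corpus, discard anything seen fewer than five times, then compare the survivors against a modern word list to find the gaps) can be sketched in a few lines of Python. This is only a minimal illustration, not his actual tooling: the tokenising regular expression, the function names `build_word_list` and `compare_with_modern`, and the toy Dutch sample are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokeniser: runs of letters (including accented letters such
# as 'ë'), allowing internal apostrophes or hyphens, e.g. "oat-cake".
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")

def build_word_list(texts, min_count=5):
    """Count words across a corpus; keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return {word for word, n in counts.items() if n >= min_count}

def compare_with_modern(old_words, modern_words):
    """Split the period word list into words shared with a modern list
    and words that occur only in the old orthography."""
    shared = old_words & modern_words
    old_only = old_words - modern_words
    return shared, old_only

if __name__ == "__main__":
    # Toy corpus standing in for the ~100 MB of pre-1947 Dutch text.
    corpus = ["De mensch zag den visch. " * 5, "Een enkele tikfout: vsich."]
    old = build_word_list(corpus, min_count=5)
    modern = {"de", "mens", "zag", "vis", "een"}
    shared, old_only = compare_with_modern(old, modern)
    print(sorted(shared))    # ['de', 'zag']
    print(sorted(old_only))  # ['den', 'mensch', 'visch']
```

The one-off typo "vsich" never reaches the list, which is the point of the frequency cut-off. The later steps (adding inflected forms, weeding out scannos) need linguistic judgment and are not attempted here.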
From a complete word-list to a spelling checker is easier, especially with open-office.

After that, I have to do the exercise again for pre-1865 Dutch orthography.

Jeroen.

Darren Burnhill wrote:

> Hi,
>
> Has anyone ever developed dictionaries to suit the checking of texts
> from different ages?

From walter.van.holst at xs4all.nl Fri Dec 30 11:41:26 2005
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Fri Dec 30 11:41:35 2005
Subject: [gutvol-d] Dictionaries
In-Reply-To: <43B58BDA.8090000@bohol.ph>
References: <43B58BDA.8090000@bohol.ph>
Message-ID: <43B58D66.4080206@xs4all.nl>

Jeroen Hellingman (Mailing List Account) wrote:

>
> I wanted to do this for Dutch in the pre-1947 orthography, but it is a
> tremendous lot of work, so have not yet completed this work. I've
> collected about 100 megabytes of text in this orthography, and made an
> initial word list out of it, discarding anything that appears less
> than five times. Then have to match this to a modern word list, to
> fill in the gaps, then have to go through the entire list again to add
> all regular (grammatical) variants of each word, and filter out
> unwanted words, such as common misspellings and scarce words matching
> with common scannos or typos.
>

It would be awfully nice if it would be possible to have lexical data included. There are plenty of dictionary files available, but none of them include lexical data.

Regards,

Walter

From gbnewby at pglaf.org Fri Dec 30 11:53:41 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Dec 30 11:53:41 2005
Subject: [gutvol-d] Internet Archive PG collection
In-Reply-To: <5.2.0.9.0.20051230100013.009d1300@127.0.0.1>
References: <5.2.0.9.0.20051230100013.009d1300@127.0.0.1>
Message-ID: <20051230195341.GC23138@pglaf.org>

On Fri, Dec 30, 2005 at 10:02:50AM -0800, Tony Baechler wrote:
> Hello all. I was browsing the Internet Archive texts and saw that they
> have a link to their own Project Gutenberg collection.
> When I followed it,
> they only listed 7,389 items. Why is this? There should be over 17,000,
> right? They are all available via ftp.archive.org I believe, so why aren't
> they listed on their web site?
>
> To see this, go to http://www.archive.org/ and go to texts. Select the
> Project Gutenberg link. Are things not being mirrored properly from the PG
> catalog?

I've bothered them about this repeatedly. I think there might be a note or link somewhere that mentions they only have a subset of the titles.

What happened was, they had some summer interns who set up their site. It's modeled after an Alexa/Amazon-style referrer system, which is kind of nice. But it was all done "by hand," and doesn't integrate new titles.

Feel free to send them feedback directly with suggestions about how they can make it clearer that this is just a subset. Or, volunteer to do an internship at the Internet Archive next summer or spring, to fix it to auto-update!!!
-- Greg