From bruce at zuhause.org Wed Mar 1 07:57:11 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Wed Mar 1 08:21:14 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <2de.3175a87.31364202@aol.com> References: <2de.3175a87.31364202@aol.com> Message-ID: <17413.50263.444498.622315@celery.zuhause.org> Bowerbird@aol.com writes: > after all, they "own" that public-domain > material just as much as you or i "own" it. Yes, but PG doesn't claim a copyright on the PD material, unlike Kessinger and the others. It's a shame that they are able to hide a PD book behind a copyrighted cover. > no, i think the blame falls on _our_ shoulders, > because as the people dedicated to providing > full and free access to the public domain, we are > failing in our mission by not ensuring that google > has a no-pages-restricted entity in its book-search > for each and every public-domain book that they have. How many PD books have you found in Google Book Search that were not visible? Did you report them to Google? If not, some of the blame falls on _your_ shoulders. From greg at durendal.org Wed Mar 1 08:24:30 2006 From: greg at durendal.org (Greg Weeks) Date: Wed Mar 1 09:00:08 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <17413.50263.444498.622315@celery.zuhause.org> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> Message-ID: <Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org> On Wed, 1 Mar 2006, Bruce Albrecht wrote: > How many PD books have you found in Google Book Search that were not > visible? Did you report them to Google? If not, some of the blame > falls on _your_ shoulders. I don't think it works this way. If the books are in there because the publisher added them and the publisher claims they are under copyright there is nothing you can do to change it.
-- Greg Weeks http://durendal.org:8080/greg/ From hyphen at hyphenologist.co.uk Wed Mar 1 09:54:36 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Wed Mar 1 09:54:59 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> <Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org> Message-ID: <gonb02pgfj67ftl4vdqcndm86s20dof57j@4ax.com> On Wed, 1 Mar 2006 11:24:30 -0500 (EST), Greg Weeks <greg@durendal.org> wrote: |On Wed, 1 Mar 2006, Bruce Albrecht wrote: | |> How many PD books have you found in Google Book Search that were not |> visible? Did you report them to Google? If not, some of the blame |> falls on _your_ shoulders. | |I don't think it works this way. If the books are in there because the |publisher added them and the publisher claims they are under copyright |there is nothing you can do to change it. AFAIK the copyright notice is valid, but only applies to the page and line layout plus cover layout of the new paper edition, not the PG text. Not that they would tell you that. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. 
From gbnewby at pglaf.org Wed Mar 1 10:55:17 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Mar 1 10:55:18 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <gonb02pgfj67ftl4vdqcndm86s20dof57j@4ax.com> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> <Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org> <gonb02pgfj67ftl4vdqcndm86s20dof57j@4ax.com> Message-ID: <20060301185517.GA29172@pglaf.org> On Wed, Mar 01, 2006 at 05:54:36PM +0000, Dave Fawthrop wrote: > On Wed, 1 Mar 2006 11:24:30 -0500 (EST), Greg Weeks <greg@durendal.org> > wrote: > > |On Wed, 1 Mar 2006, Bruce Albrecht wrote: > | > |> How many PD books have you found in Google Book Search that were not > |> visible? Did you report them to Google? If not, some of the blame > |> falls on _your_ shoulders. (There are *many*, but in many cases the print publisher claimed a copyright inappropriately or imprecisely. > |I don't think it works this way. If the books are in there because the > |publisher added them and the publisher claims they are under copyright > |there is nothing you can do to change it. > > AFAIK the copyright notice is valid, but only applies to the page and line > layout plus cover layout of the new paper edition, not the PG text. Not > that they would tell you that. Not in our opinion (which has been vetted by several expert copyright lawyers): No Sweat of the Brow Copyright http://www.gutenberg.org/howto/sweat-no-c -- Greg From marcello at perathoner.de Wed Mar 1 10:55:17 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Mar 1 10:55:22 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <2de.3175a87.31364202@aol.com> References: <2de.3175a87.31364202@aol.com> Message-ID: <4405EE15.5030601@perathoner.de> Bowerbird@aol.com wrote: > indeed, in the sense that they > offer customers the option of a > hard-copy printing of an e-text, > i think they're providing a service. 
Those people know they can take a PG text, format it, print a hardcopy and sell it. That's done in good faith. Nobody has any problem with that. They also know that formatting a text does not give them any copyright whatsoever. But still they stick a copyright notice on a public domain text. That's done in bad faith. They didn't even proof-read the text, or they would have noticed those errors. > we are > failing in our mission by not ensuring that google > has a no-pages-restricted entity in its book-search > for each and every public-domain book that they have. You are contradicting yourself. Google is a commercial enterprise just like those two-bit publishers. Why do you require ethical behaviour from Google and not from those other publishers? -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Wed Mar 1 11:00:08 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Mar 1 11:00:10 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <17413.50263.444498.622315@celery.zuhause.org> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> Message-ID: <20060301190008.GB29172@pglaf.org> On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote: > ... > How many PD books have you found in Google Book Search that were not > visible? Did you report them to Google? If not, some of the blame > falls on _your_ shoulders. How is such notification done? To easily find some examples, look for Jane Austen's works, H.G. Wells, and other well-known long-dead authors. Note that they seem to use a ridiculous date for "world wide" public domain...something in the 1800s, rather than a "US-Safe" cutoff of 1923. I think they (Google) are creating ambiguity when there is none, at least in the US. 
-- Greg From Bowerbird at aol.com Wed Mar 1 11:06:37 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Mar 1 11:06:45 2006 Subject: [gutvol-d] Commercial paper editions of PG texts Message-ID: <1a4.4927bbe9.31374abd@aol.com> bruce said: > It's a shame that they are able to hide a PD > book behind a copyrighted cover. major publishing houses do it all the time with p-books. and there's nothing wrong with it legally. or even "morally". most especially if they're delivering hardcopy. paper doesn't grow on trees. (ha ha, i funny!) (i would agree that it would be kind of sleazy to charge people for an electronic-copy, especially if all they got was a download, not an actual c.d. then again, if p.g. wasn't getting free hosting from ibiblio, and they had to pay for all the bandwidth...) *** again, there is nothing to "report". what they're doing is (a) perfectly legal, and (b) a service to their customers. if we want to give people an unrestricted view of the pages, we need to submit the book to the program and specify that. i'll get around to doing it myself sooner or later, providing google doesn't charge publishers to get their titles listed... but it's really something i think p.g. should do, systematically. not that anyone gives a flying burrito what i think p.g. should do. *** greg said: > If the books are in there because the publisher added them > and the publisher claims they are under copyright > there is nothing you can do to change it. i don't think you have to claim the text is under copyright to restrict viewing access on any or even all of your pages. like i said, our job is to submit books to the program that have _no_ restrictions on their viewing. that will serve to drive some of the parasites from the scene, but have little negative effect on the people who are offering a _service_ to end-users by giving them a hard-copy option. (indeed, visibility might have a positive effect on those businesses.) 
another question is whether project gutenberg wants to get in the hard-copy business itself. p.g. could probably make a little bit of money doing it, or maybe lose a little, because it ain't always easy to satisfy the general public, but either way, i don't see any volunteers for the task... (as branko will tell you, it can take a few hours of work to get an e-text into the shape where you could feel good offering it for sale, and that means a heckuva lot of work for a library that stands at 18,000+ e-texts and growing. of course, if the e-texts had been formatted consistently, it'd be a piece of cake to get them into publication shape. that's one reason i've hollered so long about consistency.) you wouldn't _have_to_ get into the hard-copy business in order to _list_ the titles so that people could view your pages without restriction. you could just specify a cost of $8,000 per title, and probably drive away any customers. (and if you did get a customer, it would be worth it, not?) but meanwhile, the pages would be sitting there, viewable. -bowerbird From Bowerbird at aol.com Wed Mar 1 11:38:01 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Mar 1 11:38:18 2006 Subject: [gutvol-d] Commercial paper editions of PG texts Message-ID: <2f3.220a96.31375219@aol.com> marcello said: > Why do you require ethical behaviour from > Google and not from those other publishers? it's folly to try to "require ethical behaviour" from _anyone_ in the publishing industry... but nobody's doing any "unethical" anyway... as i pointed out just now, the copyright issue has absolutely no bearing on viewability of any pages. if you think google should display the pages of a public-domain title without any restrictions, then all you need to do is submit it to their program...
heck, the first book i would submit would be "books and culture", the public-domain title that google made available as its first example. you can see my reworking of this title right now: > http://snowy.arsc.alaska.edu/bowerbird/mabie/mabie.html > http://snowy.arsc.alaska.edu/bowerbird/mabie/mabiep001.html and when i submitted my .pdf to their program, i would even leave in the "google print" stamp that they put on each page, just as a little joke... these scans actually came from "google library", now known as "google book search", and not the "google print" program, which is the one geared towards commercial publishers, but google was a little casual with their project-names early on. and now i don't think anyone is very clear about where one program ends and another begins, not even google... but i just looked, and the program is indeed free: > http://books.google.com/intl/en/googlebooks/publisher.html so if you want unfettered titles in the program, the only thing you need to do is take action... by the way, here's my mabie example for displaying a scan-book: > http://snowy.arsc.alaska.edu/bowerbird/mabie/mabied002003.html readers will find my interface more pleasant than the google one: > http://books.google.com/books?id=yGZZXIrbUKQC&pg=PA5 > They didn't even proof-read the text, > or they would have noticed those errors. this is so funny it's not even funny... :+) -bowerbird p.s. yes, "monday morning quarterback" _is_ late, but it's coming...
From hyphen at hyphenologist.co.uk Wed Mar 1 12:18:22 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Wed Mar 1 12:18:42 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <20060301185517.GA29172@pglaf.org> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> <Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org> <gonb02pgfj67ftl4vdqcndm86s20dof57j@4ax.com> <20060301185517.GA29172@pglaf.org> Message-ID: <960c02pahh1fm8cqlipuiq4okkl9o8fut2@4ax.com> On Wed, 1 Mar 2006 10:55:17 -0800, Greg Newby <gbnewby@pglaf.org> wrote: |On Wed, Mar 01, 2006 at 05:54:36PM +0000, Dave Fawthrop wrote: |> On Wed, 1 Mar 2006 11:24:30 -0500 (EST), Greg Weeks <greg@durendal.org> |> wrote: |> |> |On Wed, 1 Mar 2006, Bruce Albrecht wrote: |> | |> |> How many PD books have you found in Google Book Search that were not |> |> visible? Did you report them to Google? If not, some of the blame |> |> falls on _your_ shoulders. | |(There are *many*, but in many cases the print publisher claimed |a copyright inappropriately or imprecisely. | |> |I don't think it works this way. If the books are in there because the |> |publisher added them and the publisher claims they are under copyright |> |there is nothing you can do to change it. |> |> AFAIK the copyright notice is valid, but only applies to the page and line ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |> layout plus cover layout of the new paper edition, not the PG text. Not ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |> that they would tell you that.
| |Not in our opinion (which has been vetted by several expert |copyright lawyers): | | No Sweat of the Brow Copyright | http://www.gutenberg.org/howto/sweat-no-c Which is what I said :-( -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. From joshua at hutchinson.net Wed Mar 1 13:34:25 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Mar 1 13:34:31 2006 Subject: [gutvol-d] Commercial paper editions of PG texts Message-ID: <20060301213425.405654F532@ws6-5.us4.outblaze.com> > ----- Original Message ----- > From: "Dave Fawthrop" <hyphen@hyphenologist.co.uk> > To: gbnewby@pglaf.org, "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org> > Subject: Re: [gutvol-d] Commercial paper editions of PG texts > Date: Wed, 01 Mar 2006 20:18:22 +0000 > > > |> > |> AFAIK the copyright notice is valid, but only applies to the page and line > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > |> layout plus cover layout of the new paper edition, not the PG text. Not > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > |> that they would tell you that. > | > |Not in our opinion (which has been vetted by several expert > |copyright lawyers): > | > | No Sweat of the Brow Copyright > | http://www.gutenberg.org/howto/sweat-no-c > > Which is what I said :-( If I understand it right, there is no separate layout copyright in the US (which is the law PG operates under). So, no, the copyright notice is not valid because there is no copyright available for layout. Josh From ag737 at freenet.carleton.ca Wed Mar 1 12:36:38 2006 From: ag737 at freenet.carleton.ca (Wallace J.McLean) Date: Wed Mar 1 13:36:40 2006 Subject: [gutvol-d] Sweat of brow Message-ID: <d9515ad961bc.d961bcd9515a@ncf.ca> >> AFAIK the copyright notice is valid, but only applies to the page and line >> layout plus cover layout of the new paper edition, not the PG text. 
Not >> that they would tell you that. > Not in our opinion (which has been vetted by several expert > copyright lawyers): > No Sweat of the Brow Copyright > http://www.gutenberg.org/howto/sweat-no-c Not sure about in the US, but the case law, at least when I researched it about 7 or 8 years ago, was still unsettled in Canada and other commonwealth countries. I don't know off hand that that situation has changed. In the UK, most if not all EU countries, and much of the Commonwealth -- though not in Canada -- there is also an express provision protecting "editions" or "typographical arrangements". The term is generally shorter than full copyright, and of course they aren't exclusive. You can prepare another edition or typographical arrangement of Shakespeare in the UK, you just can't republish an exact copy of someone else's in the first 25 years after publication. Quaere; does "edition" or "typographical arrangement" copyright subsist in PG e-texts, in countries which recognize those types of copyright? From bruce at zuhause.org Wed Mar 1 18:23:47 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Wed Mar 1 18:23:50 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <20060301190008.GB29172@pglaf.org> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> <20060301190008.GB29172@pglaf.org> Message-ID: <17414.22323.407940.157838@celery.zuhause.org> Greg Newby writes: > On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote: > > ... > > How many PD books have you found in Google Book Search that were not > > visible? Did you report them to Google? If not, some of the blame > > falls on _your_ shoulders. > > How is such notification done?
Well, when I've been doing book searches at Google, and it comes up with a book that doesn't say that it was provided by a publisher, and the book information claims it was copyrighted before 1923, or I can find the copyright in a snippet, I use Google's feedback link to report that the book is incorrectly flagged as being in copyright so that they will fix the status. In one case, they fixed it after a 4-5 email exchange. In other cases, they simply told me that they were aware that some books were incorrectly identified as in copyright. From gbnewby at pglaf.org Sat Mar 4 13:36:10 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Mar 4 13:36:11 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <17414.22323.407940.157838@celery.zuhause.org> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> <20060301190008.GB29172@pglaf.org> <17414.22323.407940.157838@celery.zuhause.org> Message-ID: <20060304213610.GJ6307@pglaf.org> On Wed, Mar 01, 2006 at 08:23:47PM -0600, Bruce Albrecht wrote: > Greg Newby writes: > > On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote: > > > ... > > > How many PD books have you found in Google Book Search that were not > > > visible? Did you report them to Google? If not, some of the blame > > > falls on _your_ shoulders. > > > > How is such notification done? > > Well, when I've been doing book searches at Google, and it comes up > with a book that doesn't say that it was provided by a publisher, and > the book information claims it was copyrighted before 1923, or I can > find the copyright in a snippet, I use Google's feedback link to report > that the book is incorrectly flagged as being in copyright so that > they will fix the status. In one case, they fixed it after a 4-5 > email exchange. In other cases, they simply told me that they were > aware that some books were incorrectly identified as in copyright. Do they consider 1923 as a cutoff date (per US law)?
Or do they look to 1868 or something similar as a cutoff, as an attempt to only say "public domain" if it's defensibly for the entire world? -- Greg From gbnewby at pglaf.org Sat Mar 4 13:45:33 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Mar 4 13:45:34 2006 Subject: [gutvol-d] Sweat of brow In-Reply-To: <d9515ad961bc.d961bcd9515a@ncf.ca> References: <d9515ad961bc.d961bcd9515a@ncf.ca> Message-ID: <20060304214533.GK6307@pglaf.org> On Wed, Mar 01, 2006 at 03:36:38PM -0500, Wallace J.McLean wrote: > > >> AFAIK the copyright notice is valid, but only applies to the page and line > >> layout plus cover layout of the new paper edition, not the PG text. Not > >> that they would tell you that. > > > Not in our opinion (which has been vetted by several expert > > copyright lawyers): > > > No Sweat of the Brow Copyright > > http://www.gutenberg.org/howto/sweat-no-c > > > Not sure about in the US, but the case law, at least when I researched > it about 7 or 8 years ago, was still unsettled in Canada and other > commonwealth countries. I don't know off hand that that situation has > changed. PG's volunteer lawyers are not aware of any case law for this specific question, either. The sweat-no-c document is based on their research on the US Title 17, and surrounding/related case law like Feist v. Rural Telephone Co. BTW, there is at least one grey area: when display involves code (say, some Javascript or even CSS, or something more complex like a complete viewer). In that type of situation, the code itself will likely be copyrighted, even if the content it displays is not. The challenge is when the copyrighted code is embedded / intermixed with the public domain content -- like with CSS or Javascript & HTML. PG tends to avoid this by doing our own markup, & asserting it's public domain, but this grey area might limit some of our harvesting activities.
> In the UK, most if not all EU countries, and much of the Commonwealth -- though not in Canada -- there is also an express provision > protecting "editions" or "typographical arrangements". The term is > generally shorter than full copyright, and of course they aren't > exclusive. You can prepare another edition or typographical > arrangement of Shakespeare in the UK, you just can't republish an > exact copy of someone else's in the first 25 years after publication. > > Quaere; does "edition" or "typographical arrangement" copyright > subsist in PG e-texts, in countries which recognize those types of > copyright? Short answer: we only try to follow US laws, and I'm unaware of anything like this provision in the US. The closest is copyrights on specific fonts, which doesn't really matter much for PG since we seldom include scans of the raw pages from a print book with our eBooks, and our sources tend to be pretty old anyway. It might be this type of copyright (or whatever it might be called) would play a role in the EU and elsewhere...though in practice, if a physical book is old enough to be public domain based on author's death date, probably any typography copyright has also expired. From ag737 at freenet.carleton.ca Sun Mar 5 12:17:33 2006 From: ag737 at freenet.carleton.ca (Wallace J.McLean) Date: Sun Mar 5 12:17:36 2006 Subject: [gutvol-d] Sweat of brow Message-ID: <e0bf77e06c1b.e06c1be0bf77@ncf.ca> ----- Original Message ----- >From Greg Newby <gbnewby@pglaf.org> Date Sat, 4 Mar 2006 13:45:33 -0800 To Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org> Subject Re: [gutvol-d] Sweat of brow >> Quaere; does "edition" or "typographical arrangement" copyright >> subsist in PG e-texts, in countries which recognize those types of >> copyright? > > Short answer: we only try to follow US laws, and I'm unaware > of anything like this provision in the US.
Yes, but even if there were, ***the copyright status of a PG work in another country*** is determined by the national law of that country, not of the US. > It might be this type of copyright (or whatever it might be called) > would play a role in the EU and elsewhere...though in practice, if a > physical book is old enough to be public domain based on author's death > date, probably any typography copyright has also expired. Yes, but that's missing the point of typographical or edition copyright: a new typographical arrangement or edition of the work, in that form, has a copyright attached to it. Not the work, the typographical arrangement or edition of it. PG is infringing neither the copyright in the work nor the copyright in the typographical arrangement; but the PG edition may have copyright status in a country that DOES recognize typographical arrangements or editions, subject to national treatment and shorter-term provisions in that country's national law. From hart at pglaf.org Sun Mar 5 12:58:07 2006 From: hart at pglaf.org (Michael Hart) Date: Sun Mar 5 12:58:10 2006 Subject: [gutvol-d] Sweat of brow In-Reply-To: <e0bf77e06c1b.e06c1be0bf77@ncf.ca> References: <e0bf77e06c1b.e06c1be0bf77@ncf.ca> Message-ID: <Pine.LNX.4.60.0603051251440.10668@pglaf.org> On Sun, 5 Mar 2006, Wallace J.McLean wrote: > ----- Original Message ----- >> From Greg Newby <gbnewby@pglaf.org> > Date Sat, 4 Mar 2006 13:45:33 -0800 > To Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org> > Subject Re: [gutvol-d] Sweat of brow > >>> Quaere; does "edition" or "typographical arrangement" copyright >>> subsist in PG e-texts, in countries which recognize those types of >>> copyright? >> >> Short answer: we only try to follow US laws, and I'm unaware >> of anything like this provision in the US. > > Yes, but even if there were, ***the copyright status of a PG work in > another country*** is determined by the national law of that country, > not of the US. 
> >> It might be this type of copyright (or whatever it might be called) >> would play a role in the EU and elsewhere...though in practice, if a >> physical book is old enough to be public domain based on author's death >> date, probably any typography copyright has also expired. > > Yes, but that's missing the point of typographical or edition > copyright: a new typographical arrangement or edition of the work, in > that form, has a copyright attached to it. Not the work, the > typographical arrangement or edition of it. PG is infringing neither > the copyright in the work nor the copyright in the typographical > arrangement; but the PG edition may have copyright status in a country > that DOES recognize typographical arrangements or editions, subject to > national treatment and shorter-term provisions in that country's > national law. > If anyone really wants to argue that point, we can insert some more "legal fine print" to the effect that PG places all such possibly copyrightable material into the public domain in all countries. We should put all such potential legal claims to rest before any can even get started. Thanks!!! Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg From Bowerbird at aol.com Sun Mar 5 13:23:45 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Mar 5 13:23:54 2006 Subject: [gutvol-d] let's sweat -- i.s.b.n. anyone? Message-ID: <28b.6b580a2.313cb0e1@aol.com> i.s.b.n. are relatively cheap in lots of 10,000... howsabout y'all buy enough for your library? and has there been any progress on putting your free versions of your e-texts into google? or would you just rather wring your hands and shake your fists at the people reselling them? if you want your distribution to be "unlimited", these are some simple steps that you can take. -bowerbird p.s. issue 1 of "monday morning quarterback" will go up tomorrow, in case you were wondering.
From tb at baechler.net Mon Mar 6 00:22:46 2006 From: tb at baechler.net (Tony Baechler) Date: Mon Mar 6 00:22:28 2006 Subject: [gutvol-d] Sweat of brow In-Reply-To: <e0bf77e06c1b.e06c1be0bf77@ncf.ca> References: <e0bf77e06c1b.e06c1be0bf77@ncf.ca> Message-ID: <7.0.1.0.2.20060306001820.02af50d0@baechler.net> Hi. sorry to nitpick here, and I admit this is out of my league, but wouldn't a plain text edition remove any and all fonts or typography anyway? Let's say that you harvest an html or pdf from a country where typography is still under copyright. It is converted to plain text to comply with PG standards, plus valid html etc. Now it gets imported back into the original country. Wouldn't it be legal because the fonts have been removed? Am I missing something obvious? I've followed the thread and understand it relates to google's idea of public domain, but it would seem to me that the copyrighted portion was removed (the typography) so it wouldn't matter. At 12:17 PM 3/5/2006, you wrote: >Yes, but that's missing the point of typographical or edition >copyright: a new typographical arrangement or edition of the work, in >that form, has a copyright attached to it. Not the work, the >typographical arrangement or edition of it. PG is infringing neither >the copyright in the work nor the copyright in the typographical >arrangement; but the PG edition may have copyright status in a country >that DOES recognize typographical arrangements or editions, subject to >national treatment and shorter-term provisions in that country's >national law.
From sly at victoria.tc.ca Mon Mar 6 00:44:17 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon Mar 6 00:44:19 2006 Subject: [gutvol-d] Sweat of brow In-Reply-To: <7.0.1.0.2.20060306001820.02af50d0@baechler.net> References: <e0bf77e06c1b.e06c1be0bf77@ncf.ca> <7.0.1.0.2.20060306001820.02af50d0@baechler.net> Message-ID: <Pine.GSO.4.58.0603060023270.15295@vtn1.victoria.tc.ca> Hi Tony. I believe that the matter under discussion was not a matter of the rights surrounding the source texts used to produce PG materials, but rather rights of the PG texts themselves. The argument (if I understood correctly) was that in some jurisdictions legal usage of PG texts might be restricted because the "typesetting" done in preparing the PG texts would qualify for certain protections on its own. This whole discussion reminds me of a few points I like to make when people start making overly broad, generalized statements about copyright laws: 1) Copyright is not just one right, but a bundle of rights. 2) It is treated on a national basis--that is every country has its own copyright laws and quirks about how those laws are applied. (and to what types of material, and what aspects of a given work, and with what definition of certain terms, etc., etc.) 3) The state of likely copyright status for an item in a given country at a given time can be affected by laws that were passed many years previously; various amendments that have taken place over time, under pressure from various sources; international conventions the country may be a member of; precedents set by certain legal decisions, etc. Andrew On Mon, 6 Mar 2006, Tony Baechler wrote: > Hi. sorry to nitpick here, and I admit this is out of my league, but > wouldn't a plain text edition remove any and all fonts or typography > anyway? Let's say that you harvest an html or pdf from a country > where typography is still under copyright. It is converted to plain > text to comply with PG standards, plus valid html etc.
Now it gets > imported back into the original country. Wouldn't it be legal > because the fonts have been removed? Am I missing something > obvious? I've followed the thread and understand it relates to > google's idea of public domain, but it would seem to me that the > copyrighted portion was removed (the typography) so it wouldn't matter. > From hyphen at hyphenologist.co.uk Mon Mar 6 01:51:10 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Mon Mar 6 01:52:09 2006 Subject: [gutvol-d] Sweat of brow In-Reply-To: <Pine.GSO.4.58.0603060023270.15295@vtn1.victoria.tc.ca> References: <e0bf77e06c1b.e06c1be0bf77@ncf.ca> <7.0.1.0.2.20060306001820.02af50d0@baechler.net> <Pine.GSO.4.58.0603060023270.15295@vtn1.victoria.tc.ca> Message-ID: <8v0o02t5tre1m32e6chvv2fm7tonfji52s@4ax.com> On Mon, 6 Mar 2006 00:44:17 -0800 (PST), Andrew Sly <sly@victoria.tc.ca> wrote: |Hi Tony. | |I believe that the matter under discussion was not a matter |of the rights surrounding the source texts used to produce PG |materials, but rather rights of the PG texts themselves. | |The argument (if I understood correctly) was that in some |jurisdictions legal usage of PG texts might be restricted |because the "typesetting" done in preparing the PG texts |would qualify for certain protections on its own. In which case IMO PG should put all its work explicitly into the public domain as MH suggested up thread. This is what I did with my Yorkshire Dialect work on my Web Site. It has been copied widely (I can tell it is my work from the line breaks of poetry), including a Print on Demand outfit. As I work Pro Bono Publico I am happy to see my work reproduced anywhere. -- Dave Fawthrop <dave hyphenologist co uk> "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design".
Sig (C) Copyright Public Domain From Bowerbird at aol.com Mon Mar 6 10:37:00 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 6 10:37:07 2006 Subject: [gutvol-d] monday morning quarterback #01 is now up Message-ID: <65.568a702f.313ddb4c@aol.com> "monday morning quarterback", #01, is up... "m.m.q." is my new weekly "series" on best-practices for people digitizing e-books from paper-books... issue #1 is now available at the "bpsuper" listserve: > http://groups.yahoo.com/group/bpsuper/message/3 you don't have to be a member of the "bpsuper" yahoogroup in order to read the messages. but if yahoogroups gives you an allergic reaction nonetheless, the issue is also posted here: > http://snowy.arsc.alaska.edu/bowerbird/mmq/mmq01.txt learn what this "monday morning quarterback" says you are doing wrong, and what you're doing right... -bowerbird From ag737 at freenet.carleton.ca Mon Mar 6 12:07:26 2006 From: ag737 at freenet.carleton.ca (Wallace J.McLean) Date: Mon Mar 6 12:07:28 2006 Subject: [gutvol-d] Sweat of brow Message-ID: <e2cfefe2fec4.e2fec4e2cfef@ncf.ca> > ----- Original Message ----- > From Tony Baechler <tb@baechler.net> > Date Mon, 06 Mar 2006 00:22:46 -0800 > To Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org > Subject Re: [gutvol-d] Sweat of brow > > Hi. sorry to nitpick here, and I admit this is out of my league, but > wouldn't a plain text edition remove any and all fonts or typography > anyway? There are still line lengths and possibly some editorial decisions that have been made along the way.
It's a remote chance, but there may be room in the legal fine print for PD texts that would disclaim or renounce the copyright that may subsist in the e-text, where and to the extent that such a copyright is recognized in national copyright law, and where and to the extent that such a disclaimer or renunciation would be recognized by national law. There is no express provision in Canadian law for a disclaimer of copyright, but there is some case law that says that such a disclaimer would estop any subsequent claim for infringement. From hiddengreen at gmail.com Mon Mar 6 12:35:55 2006 From: hiddengreen at gmail.com (Cori) Date: Mon Mar 6 12:36:01 2006 Subject: [gutvol-d] Re: [BP] monday morning quarterback -- #01 In-Reply-To: <200603061928.OAA27822@digital.lib.upenn.edu> References: <200603061928.OAA27822@digital.lib.upenn.edu> Message-ID: <910fee4a0603061235m7335eb21y1e43bcff4b9473c1@mail.gmail.com> On 3/6/06, Bowerbird@aol.com wrote: > i've started a new weekly "series" on best-practices > for people digitizing e-books from paper-books... > you don't have to be a member of the "bpsuper" yahoogroup > in order to read the messages. but if yahoogroups gives you > an allergic reaction nonetheless, the issue is also posted here: If, however, it's Bowerbird's writing that gives you an allergic reaction, but you have some reading time free, you'd be welcome to drop by the Distributed Proofreaders' Smoothreading Pool, where a wide variety of complete books can be read prior to their final Project Gutenberg upload & archiving. http://www.pgdp.net/c/tools/post_proofers/smooth_reading.php You don't have to be a member of DP to visit and download books to read! If you'd like to add comments (by editing the text and leeving [*typo: leaving?] notes), you would need to create a new account in order to upload your edited copy.
As a post-processor, I greatly appreciate the people who have found time to smooth-read my books - they pick out things I just couldn't find any other way, and every little query raised and checked is one less thing for errata @ PG to worry about in 42 years time :) Have fun, Cori From Bowerbird at aol.com Mon Mar 6 13:51:10 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 6 13:51:30 2006 Subject: [gutvol-d] Re: [BP] monday morning quarterback -- #01 Message-ID: <1c4.3b09dfb1.313e08ce@aol.com> cori said: > If however, it's Bowerbird's writing > that gives you an allergic reaction think of it as "fiber", it's good for the diet... :+) > you have some reading time free, > you'd be welcome to drop by the > Distributed Proofreaders' Smoothreading Pool, > where a wide variety of complete books can be read, > prior to their final Project Gutenberg upload & archiving. smoothreading is definitely cool. for anybody who wants to volunteer in user-creation of our global cyberlibrary, but doesn't have the time and energy to digitize a whole book yourself, know you can do your fair share by smoothreading... it's fun and easy, yeah, but it's also _necessary_. -bowerbird From hart at pglaf.org Mon Mar 6 19:50:48 2006 From: hart at pglaf.org (Michael Hart) Date: Mon Mar 6 19:50:50 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <20060301190008.GB29172@pglaf.org> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> <20060301190008.GB29172@pglaf.org> Message-ID: <Pine.LNX.4.60.0603061949050.13638@pglaf.org> On Wed, 1 Mar 2006, Greg Newby wrote: > On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote: >> ... >> How many PD books have you found in Google Book Search that were not >> visible? Did you report them to Google?
If not, some of the blame >> falls on _your_ shoulders. > > How is such notification done? > > To easily find some examples, look for Jane Austen's > works, H.G. Wells, and other well-known long-dead authors. > > Note that they seem to use a ridiculous date for > "world wide" public domain...something in the 1800s, > rather than a "US-Safe" cutoff of 1923. > > I think they (Google) are creating ambiguity when there > is none, at least in the US. > -- Greg They made a huge mistake doing too many copyrighted books with the original Google Print Library; now they are hard at work trying to reverse that public relations fiasco, i.e. going overboard in the other direction. mh From Bowerbird at aol.com Mon Mar 6 22:30:11 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 6 22:30:18 2006 Subject: [gutvol-d] an example of a .pdf with excellent e-book design Message-ID: <1e8.4c7a62d5.313e8273@aol.com> with so many examples of bad .pdf design out there, it gives me great pleasure to be able to point to one where the design is well-done, on wonderful poetry: > http://www.poetrysuperhighway.com/ToHellWithRL.pdf -bowerbird From hyphen at hyphenologist.co.uk Tue Mar 7 00:04:13 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Mar 7 00:04:24 2006 Subject: [gutvol-d] Sweat of brow In-Reply-To: <e2cfefe2fec4.e2fec4e2cfef@ncf.ca> References: <e2cfefe2fec4.e2fec4e2cfef@ncf.ca> Message-ID: <t9fq02d0c280deqrh2fdbknjgg9t245e70@4ax.com> On Mon, 06 Mar 2006 15:07:26 -0500, "Wallace J.McLean" <ag737@freenet.carleton.ca> wrote: | |> ----- Original Message ----- |> From Tony Baechler <tb@baechler.net> |> Date Mon, 06 Mar 2006 00:22:46 -0800 |> To Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org |> Subject Re: [gutvol-d] Sweat of brow |> |> Hi.
sorry to nitpick here, and I admit this is out of my league, |but |> wouldn't a plain text edition remove any and all fonts or typography |> anyway? | |There are still line lengths and possibly some editorial decisions |that have been made along the way. IME there are always such decisions, especially in poetry where a line would extend beyond 72 or even 80 characters. Also moving mdashes away from line ends, etc. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. From mkengel at gmail.com Tue Mar 7 19:05:35 2006 From: mkengel at gmail.com (Michael Engel) Date: Tue Mar 7 19:12:55 2006 Subject: [gutvol-d] Grimms Maerchen Message-ID: <aaa8c4580603071905q680758fdsc2625de2a0014f54@mail.gmail.com> There is a Grimm database (i.e. text files) of the following books on the internet: * Brüder Grimm: „Kinder- und Hausmärchen (7. Ausgabe, 1857)“ * Brüder Grimm: „Kinder- und Hausmärchen (2. Ausgabe, 1819)“ * Jacob Grimm: „Kleinere Schriften 1 (2. Auflage, 1879)“ http://www.lg.fukuoka-u.ac.jp/~ynagata/grimm_database.html They have downloadable LaTeX files. Is that of interest for Project Gutenberg? regards Michael Engel From sly at victoria.tc.ca Tue Mar 7 23:53:56 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Mar 7 23:54:07 2006 Subject: [gutvol-d] Grimms Maerchen In-Reply-To: <aaa8c4580603071905q680758fdsc2625de2a0014f54@mail.gmail.com> References: <aaa8c4580603071905q680758fdsc2625de2a0014f54@mail.gmail.com> Message-ID: <Pine.GSO.4.58.0603072348040.18199@vtn1.victoria.tc.ca> Hi Michael. There is a truly vast amount of material on many different websites which could conceivably be added to Project Gutenberg. (Often as I'm checking names in the catalog, I keep finding more.) However, it does take some work to get copyright clearance and reformat the files.
If you would like to work on these texts, I would be happy to help with any questions you have... Andrew On Wed, 8 Mar 2006, Michael Engel wrote: > There is a Grimm database (i.e. text files) of the following books on the internet * Brüder Grimm: „Kinder- und Hausmärchen (7. Ausgabe, 1857)“ * Brüder Grimm: „Kinder- und Hausmärchen (2. Ausgabe, 1819)“ * Jacob Grimm: „Kleinere Schriften 1 (2. Auflage, 1879)“ http://www.lg.fukuoka-u.ac.jp/~ynagata/grimm_database.html They have downloadable Latex files Is that of interest for project Gutenberg ? regards Michael Engel From Bowerbird at aol.com Thu Mar 9 00:41:22 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Mar 9 00:41:30 2006 Subject: [gutvol-d] google and the translation thing Message-ID: <260.82f14a6.31414432@aol.com> http://www.dancohen.org/blog/posts/no_computer_left_behind said: > Google researchers have demonstrated > (but not yet released to the general public) > a powerful method for creating 'good enough' > translations, not by understanding the grammar > of each passage, but by rapidly scanning and > comparing similar phrases on countless electronic > documents in the original and second languages. > Given large enough volumes of words in a variety > of languages, machine processing can find parallel phrases > and reduce any document into a series of word swaps. > Where once it seemed necessary to have a human being > aid in a computer's translating skills, or to teach that > machine the basics of language, swift algorithms functioning > on unimaginably large amounts of text suffice. Are such new > computer translations as good as a skilled, bilingual human being? > Of course not. Are they good enough to get the gist of a text? Absolutely. > So good the National Security Agency and the Central Intelligence Agency > increasingly rely on that kind of technology to scan, sort, and mine > gargantuan amounts of text and communications > (whether or not the rest of us like it).
sounds like something you might find interesting, michael. of course, a "good enough" translation probably wouldn't be, not for literature, where the realm of creativity is instantiated, but could it work as a "first pass" that would do the bulk of the "heavy lifting", so a person knowledgeable in both languages could come in and spend relatively little time smoothing it out? well, it's certainly possible, i would think. and maybe probable. especially if progress on the technique proves to be forthcoming... -bowerbird From schultzk at uni-trier.de Thu Mar 9 02:24:40 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Thu Mar 9 03:37:09 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <260.82f14a6.31414432@aol.com> References: <260.82f14a6.31414432@aol.com> Message-ID: <D574473C-8C0B-421F-9693-1E65A0B1B0F1@uni-trier.de> Hi There, Let me chime in here. Yes, you can use these tools as a start and for casual use, but otherwise you can forget them as a professional tool. - Due to the statistical model you get only 80-90 % accuracy. - I see a lot of sites on which the content for different languages is different, so no one-to-one comparison is possible. - I have worked on and helped develop such tools and know that they give interesting results, in the range above. Yet these methods are only good as an analysis tool. - A system with fairly decent grammar models and lexicons gives better results using fewer resources. The problem here is that such systems are not publicly available. The actual problem with translation is the so-called extralinguistic part!! Culture-related facts, context, register, etc. To get the last 5 % for a decent translation, the effort and resources rise exponentially.
As proof: in the 80s the Japanese said they would have real-time translation for telephones on the market in 5 years. That was, and still is, vaporware. The method is not new. It was already used successfully for weather reports in the 80s. The method works only for small areas of knowledge/language. In the 70s, word-for-word translation used to be good enough. Now they have something they say is "good enough". Two euro cents' worth. Keith. On 09.03.2006 at 09:41, Bowerbird@aol.com wrote: > http://www.dancohen.org/blog/posts/no_computer_left_behind said: > > Google researchers have demonstrated > > (but not yet released to the general public) > > a powerful method for creating 'good enough' > > translations, not by understanding the grammar > > of each passage, but by rapidly scanning and > > comparing similar phrases on countless electronic > > documents in the original and second languages. > > Given large enough volumes of words in a variety > > of languages, machine processing can find parallel phrases > > and reduce any document into a series of word swaps. > > Where once it seemed necessary to have a human being > > aid in a computer's translating skills, or to teach that > > machine the basics of language, swift algorithms functioning > > on unimaginably large amounts of text suffice. Are such new > > computer translations as good as a skilled, bilingual human being? > > Of course not. Are they good enough to get the gist of a text? > Absolutely. > > So good the National Security Agency and the Central > Intelligence Agency > > increasingly rely on that kind of technology to scan, sort, and > mine > > gargantuan amounts of text and communications > > (whether or not the rest of us like it). > > sounds like something you might find interesting, michael.
> of course, a "good enough" translation probably wouldn't be, > not for literature, where the realm of creativity is instantiated, > > but could it work as a "first pass" that would do the bulk of the > "heavy lifting", so a person knowledgeable in both languages > could come in and spend relatively little time smoothing it out? > well, it's certainly possible, i would think. and maybe probable. > especially if progress on the technique proves to be forthcoming... > > -bowerbird > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Thu Mar 9 08:05:59 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Mar 9 08:06:16 2006 Subject: [gutvol-d] google and the translation thing Message-ID: <272.71795cd.3141ac67@aol.com> keith said: > The method is not new. It was used > successfully for weather reports already the "method" might not be new. but what _is_ different, and undeniably so, is that google has a _huge_ corpus of text with which to implement the method now, possibly the "secret sauce" to make it work. this asset, and its bearing on the problem, should not be underestimated. and indeed, that huge corpus could exert all manner of effects on a wide variety of knowledge tasks. the information about the world represented by _billions_ of web-pages out in cyberspace could lead to the gleaning of vast knowledge. (so much so that it could become very scary.) -bowerbird
From donovan at abs.net Thu Mar 9 15:43:18 2006 From: donovan at abs.net (D Garcia) Date: Thu Mar 9 16:02:18 2006 Subject: [dp-pg] re: [gutvol-d] google and the translation thing In-Reply-To: <272.71795cd.3141ac67@aol.com> References: <272.71795cd.3141ac67@aol.com> Message-ID: <200603091843.19084.donovan@abs.net> On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote: > but what _is_ different, and undeniably so, > is that google has a _huge_ corpus of text <snip> > the information about the world represented > by _billions_ of web-pages out in cyberspace > could lead to the gleaning of vast knowledge. > (so much so that it could become very scary.) I can see this revealing (or at least quantifying) the disturbingly high rate of spelling and grammatical errors. Billions and billions of them, to paraphrase Sagan, or more likely (with sincere apologies to Kubrick) ... "My God ... it's full of shit." Speaking of the web, of course. :) From Bowerbird at aol.com Thu Mar 9 16:16:23 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Mar 9 16:16:29 2006 Subject: [dp-pg] re: [gutvol-d] google and the translation thing Message-ID: <221.96121d3.31421f57@aol.com> donovan said: > I can see this revealing > (or at least quantifying) > the disturbingly high rate of > spelling and grammatical errors. actually, hit-differentials are an excellent method to _detect_ spelling and grammatical errors, so it shouldn't be that difficult to clean the corpus quite thoroughly. but that's not the "knowledge" that google might glean that is so scary to me. that involves putting together disparate pieces of information that were never intended to be connected, but nonetheless exist out in cyberspace and _can_ be joined with enough "smarts".
especially if you can dip into a few "private" databases, like the ones with credit-card info, you could build quite a dossier on any person (or place or thing) out there... -bowerbird p.s. in the news yesterday were reports that yet another credit-card database was hacked. does it strike anyone else as odd that "security" can be so lax on this personal and private data at the same time that the corporations are wanting to "lock up" all their content? i'm beginning to think we should just all _pretend_ that d.r.m. works great to put them at ease, knowing that it'll all be cracked a few years down the line, and we'll be done with it... From sly at victoria.tc.ca Thu Mar 9 20:00:34 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Mar 9 20:00:40 2006 Subject: [gutvol-d] Editing jpg images Message-ID: <Pine.GSO.4.58.0603091956220.12502@vtn1.victoria.tc.ca> The latest of a series of Mary E. Wilkins Freeman books I've been adapting for PG from another website is "The Portion of Labor". The person who produced this has also made some illustrations available. I've prepared an html file and put the unchanged images here: http://www.victoria.tc.ca/~sly/portion.htm Would anyone like to work these images into usable shape for PG purposes, or suggest how I might want to deal with them? Andrew From hart at pglaf.org Thu Mar 9 20:58:14 2006 From: hart at pglaf.org (Michael Hart) Date: Thu Mar 9 20:58:15 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <260.82f14a6.31414432@aol.com> References: <260.82f14a6.31414432@aol.com> Message-ID: <Pine.LNX.4.60.0603092055030.32091@pglaf.org> Yes, this is part of what I have been talking about for a few years.
Once OCR gets to an acceptable level for all, the next big thing, the killer app, so to speak, will be MT [Machine Translation], which will convert the 10 million eBooks that will be available into 100 different languages, for a billion free online eBooks. mh On Thu, 9 Mar 2006 Bowerbird@aol.com wrote: > http://www.dancohen.org/blog/posts/no_computer_left_behind said: >> Google researchers have demonstrated >> (but not yet released to the general public) >> a powerful method for creating 'good enough' >> translations, not by understanding the grammar >> of each passage, but by rapidly scanning and >> comparing similar phrases on countless electronic >> documents in the original and second languages. >> Given large enough volumes of words in a variety >> of languages, machine processing can find parallel phrases >> and reduce any document into a series of word swaps. >> Where once it seemed necessary to have a human being >> aid in a computer's translating skills, or to teach that >> machine the basics of language, swift algorithms functioning >> on unimaginably large amounts of text suffice. Are such new >> computer translations as good as a skilled, bilingual human being? >> Of course not. Are they good enough to get the gist of a text? > Absolutely. >> So good the National Security Agency and the Central Intelligence Agency > >> increasingly rely on that kind of technology to scan, sort, and mine >> gargantuan amounts of text and communications >> (whether or not the rest of us like it). > > sounds like something you might find interesting, michael. > of course, a "good enough" translation probably wouldn't be, > not for literature, where the realm of creativity is instantiated, > > but could it work as a "first pass" that would do the bulk of the > "heavy lifting", so a person knowledgeable in both languages > could come in and spend relatively little time smoothing it out? > well, it's certainly possible, i would think. and maybe probable.
> especially if progress on the technique proves to be forthcoming... > > -bowerbird > From hyphen at hyphenologist.co.uk Thu Mar 9 23:37:36 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Mar 9 23:38:02 2006 Subject: [dp-pg] re: [gutvol-d] google and the translation thing In-Reply-To: <200603091843.19084.donovan@abs.net> References: <272.71795cd.3141ac67@aol.com> <200603091843.19084.donovan@abs.net> Message-ID: <0sa212ptd3n413ak7i37ngh3607lmg9us2@4ax.com> On Thu, 9 Mar 2006 18:43:18 -0500, D Garcia <donovan@abs.net> wrote: |On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote: |> but what _is_ different, and undeniably so, |> is that google has a _huge_ corpus of text |<snip> |> the information about the world represented |> by _billions_ of web-pages out in cyberspace |> could lead to the gleaning of vast knowledge. |> (so much so that it could become very scary.) | |I can see this revealing (or at least quantifying) the disturbingly high rate |of spelling and grammatical errors. Billions and billions of them, to |paraphrase Sagan, or more likely (with sincere apologies to Kubrick) ... |"My God ... it's full of shit." | |Speaking of the web, of course. :) Clearly we are "progressing" back to the days of Shakespeare when spelling was much more varied, and he spelled his name in several different ways. Not having a dictionary of "correct" spelling available did his work no harm. Discuss. ;-) -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights.
From hyphen at hyphenologist.co.uk Thu Mar 9 23:46:40 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Mar 9 23:46:55 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <Pine.LNX.4.60.0603092055030.32091@pglaf.org> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> Message-ID: <57b212lmtm5sh2kl1jmiah0gh2u1htjr37@4ax.com> On Thu, 9 Mar 2006 20:58:14 -0800 (PST), Michael Hart <hart@pglaf.org> wrote: | |Yes, this is part of what I have been talking about for a few years. And will be *talked about* for decades/centuries to come. |Once OCR gets to an acceptable level for all, the next big thing, |the killer app, so to speak, will be MT [Machine Translation] which |will convert the 10 million eBooks that will be available into 100 |different languages, for a billion free online eBooks. In your dreams! Only a *tiny* fraction of the *human* population can produce acceptable translations ATM. Machines will have to become more "intelligent" than 99% of the population before MT becomes a reality. Machines can now compete on equal terms with an earwig. -- Dave Fawthrop <dave hyphenologist co uk> "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". Sig (C) Copyright Public Domain From schultzk at uni-trier.de Fri Mar 10 01:16:34 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Fri Mar 10 01:16:40 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <272.71795cd.3141ac67@aol.com> References: <272.71795cd.3141ac67@aol.com> Message-ID: <EE5F31D2-2B4A-4ED7-8B9F-2DE2136C6FD8@uni-trier.de> Hi There, On 09.03.2006 at 17:05, Bowerbird@aol.com wrote: > keith said: > > The method is not new. It was used > > successfully for weather reports already > > the "method" might not be new.
> > but what _is_ different, and undeniably so, > is that google has a _huge_ corpus of text > with which to implement the method now, > possibly the "secret sauce" to make it work. Just the opposite is the case. Believe me, as a computational linguist. For decades it has been said that with faster computers and bigger corpora MT would have its breakthrough. What has happened? Vaporware and results. It simply does not work. Language cannot be successfully modeled. Languages are neither regularly formed nor well formed. > > this asset, and its bearing on the problem, > should not be underestimated. and indeed, > that huge corpus could exert all manner of > effects on a wide variety of knowledge tasks. > > the information about the world represented > by _billions_ of web-pages out in cyberspace > could lead to the gleaning of vast knowledge. > (so much so that it could become very scary.) So far, all AI projects built on the idea that knowledge can be extracted from corpora have failed, and the failure has been admitted. Language does not constitute meaning or knowledge. It just transports it. That is why a good deal of NLP work is done in the field of knowledge representation. Do you realize that violets were originally the color BROWN and not blue!!! (see Goethe). A translator today would translate Goethe's braun (brown) as blue, since that is what people would expect!!! Keith. > > -bowerbird From schultzk at uni-trier.de Fri Mar 10 01:32:29 2006 From: schultzk at uni-trier.de (Keith J.
Schultz) Date: Fri Mar 10 01:32:36 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <Pine.LNX.4.60.0603092055030.32091@pglaf.org> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> Message-ID: <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> On 10.03.2006 at 05:58, Michael Hart wrote: > > Yes, this is part of what I have been talking about for a few years. > > Once OCR gets to an acceptable level for all, the next big thing, > the killer app, so to speak, will be MT [Machine Translation] which > will convert the 10 million eBooks that will be available into 100 > different languages, for a billion free online eBooks. Not in the next 100 or so years. In the 80s there were OCR systems that could (and had to) be trained. They would give you 95 to 99% accuracy. But in order to get these results you had to train the system for a long time, and the training could basically be used on just one text. Today, dictionaries are used to guess which words are to be recognised. That is why today's OCR systems give us better results if the original is of DECENT quality!!! The pattern recognition systems have not gotten better, and the dictionary trick takes away the motivation to develop better OCR algorithms. Also interesting: I had an Apple Newton and it recognized my handwriting with 98-99% accuracy. Yet most OCR systems today will fail!! They cannot be trained! I have yet to find a system today with similar performance. So much for technological breakthroughs. Keith.
> > mh > > On Thu, 9 Mar 2006 Bowerbird@aol.com wrote: > >> http://www.dancohen.org/blog/posts/no_computer_left_behind said: >>> Google researchers have demonstrated >>> (but not yet released to the general public) >>> a powerful method for creating 'good enough' >>> translations, not by understanding the grammar >>> of each passage, but by rapidly scanning and >>> comparing similar phrases on countless electronic >>> documents in the original and second languages. >>> Given large enough volumes of words in a variety >>> of languages, machine processing can find parallel phrases >>> and reduce any document into a series of word swaps. >>> Where once it seemed necessary to have a human being >>> aid in a computer's translating skills, or to teach that >>> machine the basics of language, swift algorithms functioning >>> on unimaginably large amounts of text suffice. Are such new >>> computer translations as good as a skilled, bilingual human >>> being? >>> Of course not. Are they good enough to get the gist of a text? >> Absolutely. >>> So good the National Security Agency and the Central >>> Intelligence Agency >> >>> increasingly rely on that kind of technology to scan, sort, >>> and mine >>> gargantuan amounts of text and communications >>> (whether or not the rest of us like it). >> >> sounds like something you might find interesting, michael. >> of course, a "good enough" translation probably wouldn't be, >> not for literature, where the realm of creativity is instantiated, >> >> but could it work as a "first pass" that would do the bulk of the >> "heavy lifting", so a person knowledgeable in both languages >> could come in and spend relatively little time smoothing it out? >> well, it's certainly possible, i would think. and maybe probable. >> especially if progress on the technique proves to be forthcoming...
>> >> -bowerbird From schultzk at uni-trier.de Fri Mar 10 01:43:16 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Fri Mar 10 01:43:21 2006 Subject: [dp-pg] re: [gutvol-d] google and the translation thing In-Reply-To: <0sa212ptd3n413ak7i37ngh3607lmg9us2@4ax.com> References: <272.71795cd.3141ac67@aol.com> <200603091843.19084.donovan@abs.net> <0sa212ptd3n413ak7i37ngh3607lmg9us2@4ax.com> Message-ID: <A9A9A706-5415-4C9A-A8EE-4EDE665370FD@uni-trier.de> Hi, On 10.03.2006 at 08:37, Dave Fawthrop wrote: > On Thu, 9 Mar 2006 18:43:18 -0500, D Garcia <donovan@abs.net> wrote: > > |On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote: > |> but what _is_ different, and undeniably so, > |> is that google has a _huge_ corpus of text > |<snip> > |> the information about the world represented > |> by _billions_ of web-pages out in cyberspace > |> could lead to the gleaning of vast knowledge. > |> (so much so that it could become very scary.) > | > |I can see this revealing (or at least quantifying) the > disturbingly high rate > |of spelling and grammatical errors. Billions and billions of them, to > |paraphrase Sagan, or more likely (with sincere apologies to > Kubrick) ... > |"My God ... it's full of shit." > | > |Speaking of the web, of course. :) > > Clearly we are "progressing" back to the days of Shakespeare when > spelling > was much more varied, and he spelled his name in several different > ways. > Not having a dictionary of "correct" spelling available did his > work no > harm. Discuss. ;-) It did him no harm and humans no harm. But machines are knowledgeless!! They need a dictionary. Humans, through their experience and knowledge, can recognize all this. A machine has to be given this knowledge. This is not a trivial task.
The Cobuild dictionary was the first dictionary that was completely corpus-based, but there was a lot of human manpower used, also. Btw. All of Shakespeare works were not written down by himself, but were transcripted during the plays. Therefore the varied portfolios and spellings. Keith. > -- > Dave Fawthrop <dave hyphenologist co uk> > Freedom of Speech, Expression, Religion, and Democracy are > the keys to Civilization, together with legal acceptance of > Fundamental Human rights. From hyphen at hyphenologist.co.uk Fri Mar 10 02:37:34 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Mar 10 02:37:42 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> Message-ID: <2el2129vfruqn211o6n8v98t9srlnpjdb3@4ax.com> On Fri, 10 Mar 2006 10:32:29 +0100, "Keith J. Schultz" <schultzk@uni-trier.de> wrote: | text. Today, dictionaries are used to guess which words are | to be recognised. That is why the OCR systems today give us | better results if the original has DECENT quality!!! And get it *wrong* <mumblehategrumble> very often. For my Yorkshire Dialect stuff which include "wor" many times, this gets changed into "war", most of the time. To the extent that I use an initial edit to put it right. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights.
From holden.mcgroin at dsl.pipex.com Fri Mar 10 02:24:15 2006 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Fri Mar 10 02:50:13 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> Message-ID: <1141986255.20173.15.camel@steve-mcqueen> On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote: > text. Today, dictionaries are used to guess which words are > to be recognised. That is why the OCR systems today give us > better results if the original has DECENT quality!!! > The pattern recognition systems have not gotten better and > the dictionary trick takes the motivation away to > develop better OCR algorithms. I'm going to have to call bullshit here. As a researcher working in the field of document recognition, I've noticed tremendous improvements in OCR quality even just in the past five years. The fact is, OCR and document recognition as a whole is a field of tremendous ongoing research. It's no secret that the problem of OCR is not "solved" yet but for some types of document (particularly clean ones using Latin characters), results are already damn good. In other areas, particularly regarding degraded documents, results aren't as good but are steadily improving. You state that the so-called "dictionary trick" takes away all motivation to research in the field. This is not what I observe going on in the research community. Dictionary-based lookups are one tool in the arsenal but that's something that's well understood. Some of my colleagues are currently researching novel image processing and feature extraction techniques with the goal of improving raw OCR results. OCR is improving. We're working on it. Cheers, Holden From schultzk at uni-trier.de Fri Mar 10 03:33:59 2006 From: schultzk at uni-trier.de (Keith J.
Schultz) Date: Fri Mar 10 03:34:05 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <1141986255.20173.15.camel@steve-mcqueen> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> <1141986255.20173.15.camel@steve-mcqueen> Message-ID: <264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de> Hello, On 10.03.2006, at 11:24, Holden McGroin wrote: > On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote: >> text. Today, dictionaries are used to guess which words are >> to be recognised. That is why the OCR systems today give us >> better results if the original has DECENT quality!!! > >> The pattern recognition systems have not gotten better and >> the dictionary trick takes the motivation away to >> develop better OCR algorithms. > > I'm going to have to call bullshit here. As a researcher working in > the > field of document recognition, I've noticed tremendous improvements in > OCR quality even just in the past five years. Before you start to swear, read and understand! Maybe in the development labs, but not for the non-high end user!!!! > > The fact is, OCR and document recognition as a whole is a field of > tremendous ongoing research. It's no secret that the problem of OCR is > not "solved" yet but for some types of document (particularly clean > ones > using Latin characters), results are already damn good. In other > areas, > particularly regarding degraded documents, results aren't as good but > are steadily improving. > > You state that the so-called "dictionary trick" takes away all > motivation to research in the field. This is not what I observe > going on > in the research community. Dictionary-based lookups are one tool in > the > arsenal but that's something that's well understood. Some of my > colleagues are currently researching novel image processing and > feature > extraction techniques with the goal of improving raw OCR results.
We have not seen any improvements in the field for the past five years!!! The improvements are mainly due to the use of dictionaries!! Not the improvement of character recognition!! Most systems in the field get their performance out of word recognition !!! > > OCR is improving. We're working on it. I did not mean to say there is no improvement in Optical Character Recognition, but the improvement over the past 10 years is minimal at most. When I see a OCR system that just uses raw results, then I will bow my head in recognition of true achieve meant. Furthermore, when the image processing gets that far it will open up new possibilities in all kinds of sciences. From schultzk at uni-trier.de Fri Mar 10 03:36:09 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Fri Mar 10 03:36:12 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <2el2129vfruqn211o6n8v98t9srlnpjdb3@4ax.com> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> <2el2129vfruqn211o6n8v98t9srlnpjdb3@4ax.com> Message-ID: <7599AF7D-1B28-495E-964B-55F4A5998386@uni-trier.de> Hi There, On 10.03.2006, at 11:37, Dave Fawthrop wrote: > On Fri, 10 Mar 2006 10:32:29 +0100, "Keith J. Schultz" > <schultzk@uni-trier.de> wrote: > > | text. Today, dictionaries are used to guess which words are > | to be recognised. That is why the OCR systems today give us > | better results if the original has DECENT quality!!! > > And get it *wrong* <mumblehategrumble> very often. exactly my point. > For my Yorkshire Dialect stuff which include "wor" many times, this > gets > changed into "war", most of the time. To the extent that I use an > initial edit to put it right. A small tip. Try using a custom dictionary. Or get a system that you can train! Keith.
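Keith's custom-dictionary tip, and Dave's "wor" being turned into "war", can be illustrated with a small sketch. It assumes a simplified, hypothetical post-correction pass in which any OCR token missing from a word list is snapped to its closest entry using Python's standard `difflib.get_close_matches`. Commercial engines such as FineReader or Readiris do not necessarily work this way, and the `BASE_DICTIONARY` word list and `correct_tokens` helper below are invented for illustration only.

```python
import difflib
import re

# Hypothetical stand-in for an OCR engine's built-in word list.
BASE_DICTIONARY = {"we", "went", "home", "and", "it", "was", "war", "late"}


def correct_tokens(text, dictionary, cutoff=0.6):
    """Snap each unknown word to its closest dictionary entry, if one is close enough.

    A simplified model of dictionary-based OCR post-correction;
    the case of corrected words is not preserved.
    """
    def fix(match):
        word = match.group(0)
        if word.lower() in dictionary:
            return word  # known word: leave it untouched
        # sorted() gives get_close_matches a deterministic candidate order
        candidates = difflib.get_close_matches(
            word.lower(), sorted(dictionary), n=1, cutoff=cutoff
        )
        return candidates[0] if candidates else word

    # Only alphabetic runs are treated as words; punctuation passes through.
    return re.sub(r"[A-Za-z]+", fix, text)


line = "we went home and it wor late"
print(correct_tokens(line, BASE_DICTIONARY))            # we went home and it war late
print(correct_tokens(line, BASE_DICTIONARY | {"wor"}))  # we went home and it wor late
```

With the base list, "wor" is "corrected" to "war"; adding "wor" to a custom dictionary leaves the dialect word alone. The trade-off is that a genuine misreading of "war" as "wor" would then go uncaught, so a dialect-specific list is best reserved for the texts that need it.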
From joshua at hutchinson.net Fri Mar 10 06:31:18 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Mar 10 06:31:19 2006 Subject: [gutvol-d] google and the translation thing Message-ID: <20060310143118.ADEC62F8DE@ws6-3.us4.outblaze.com> > ----- Original Message ----- > From: "Keith J. Schultz" <schultzk@uni-trier.de> > Interesting is also, I had a Apple Newton and it recognized > my handwriting with 98-99% accuracy. Yet, most OCR systems > today will fail!! They can not be trained! I still have to find > a system today with similar performance. So much for technological > break throughs. > Not to disagree with your main points, which I agree with, but I thought I'd point out that most major OCR packages still allow training (I'm most familiar with FineReader), but they do tend to bury it deep so that you have to hunt to find out how to do it (we have people who do it regularly at DP for some of the more ... creative ... fonts we find sometimes in old texts). Josh From aotg20 at dsl.pipex.com Fri Mar 10 06:45:40 2006 From: aotg20 at dsl.pipex.com (Richard Poynder) Date: Fri Mar 10 06:45:49 2006 Subject: [gutvol-d] Interview with Michael Hart In-Reply-To: <Pine.LNX.4.60.0602261037450.10447@pglaf.org> References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name> <20060110002256.GA27181@pglaf.org> <7.0.1.0.2.20060226175302.00adc9e0@dsl.pipex.com> <Pine.LNX.4.60.0602261037450.10447@pglaf.org> Message-ID: <7.0.1.0.2.20060310144359.021b2678@dsl.pipex.com> Thank you. The interview is now published at: http://poynder.blogspot.com/2006/03/interview-with-michael-hart.html Best wishes, Richard Poynder Richard Poynder Freelance Journalist www.richardpoynder.com http://poynder.blogspot.com At 18:53 26/02/2006, you wrote: >Hopefully any of the three color pics here will fill the bill, >or the right hand b/w.
> > >lynx http://pglaf.org/~hart/ > > >"If what you did yesterday >Still seems great today, >Then your goals for tomorrow >Are not big enough." > >Ling Yu Fu, circa 600 BC > > >Break Down The Bars Of Ignorance And Illiteracy > >Michael S. Hart (hart@pobox.com) > > >On Sun, 26 Feb 2006, Richard Poynder wrote: > >>Dear All, >> >>I shall shortly be publishing an interview I did with Michael Hart, >>and am very keen to include a recent color photo of him. >> >>Does anyone happen to have such a photo that they could send to me >>by e-mail? If so, I would be most grateful. >> >>Best wishes, >> >> >>Richard Poynder >> >> >>Richard Poynder >>Freelance Journalist >>www.richardpoynder.com >>http://poynder.blogspot.com From hyphen at hyphenologist.co.uk Fri Mar 10 06:54:25 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Mar 10 06:54:37 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <20060310143118.ADEC62F8DE@ws6-3.us4.outblaze.com> References: <20060310143118.ADEC62F8DE@ws6-3.us4.outblaze.com> Message-ID: <9j4312tesp0mnmneg8gqf9mg269vn479os@4ax.com> On Fri, 10 Mar 2006 09:31:18 -0500, "Joshua Hutchinson" <joshua@hutchinson.net> wrote: | |> ----- Original Message ----- |> From: "Keith J. Schultz" <schultzk@uni-trier.de> | |> Interesting is also, I had a Apple Newton and it recognized |> my handwriting with 98-99% accuracy. Yet, most OCR systems |> today will fail!! They can not be trained! I still have to find |> a system today with similar performance. So much for technological |> break throughs. |> | |Not to disagree with your main points, which I agree with, but I thought I'd point out that most major OCR packages still allowing training (I'm most familiar with FineReader), but they do tend to bury deep so that you have to hunt to find out how to do it (we have people who do it regular at DP for some of the more ... creative ... fonts we find sometimes in old texts). 
I use Readiris because finereader would not mount when I tried it *long* ago. The problem there is that it does not "see" the thin strokes in "w" and "r" so no amount of training will work for those two characters. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. From Bowerbird at aol.com Fri Mar 10 09:59:26 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Mar 10 09:59:42 2006 Subject: [gutvol-d] google and the translation thing Message-ID: <111.5cbf1cc9.3143187e@aol.com> keith said: > Just the opposite is the case. > Believe me as a computer linguist. i believe that the computer linguists have not been able to solve the problem. i also believe that google's research lab _will_ be able to solve it. i doubt they have "solved" it yet, and i'm sure when they do, their "solution" won't be "perfect enough" for the computer linguists, but nonetheless... > What has happened. Vaporware and results. > It simply does not work. Language can not be > sucessfully model. Languages are regularly formed, > nor well formed. and here's a great example of why it won't be "perfect". just in the sentences quoted above: there should be a question-mark after "happened"; there seems to be a missing adjective before "results"; "successfully" is not spelled correctly. and there seems to be a missing word between "regularly" and "formed"; yet despite all these shortcomings, i know exactly what you meant to say... (and i don't mean to be picking on you if english is not your first language. i only speak one language, so i am the last person to criticize anyone else on that dimension. the point is that human beings are very good at resolving the ambiguity that results from incomplete information, and we probably can't reasonably expect that of machines.
but it is simply not the case that ambiguity permeates _every_aspect_ of language; clarity is not impossible.) > All AI projects so far have failed > and failure has been admitted. yes it has been, yet deep blue can still beat all but the best of the world's grandmasters... if you give up on teaching a machine "meaning", and concentrate on giving it enough rules that give the correct results most of the time, you can get very close to finishing the job you want done. of course, this approach is considered "a trick" by the artificial-intelligence people, whose aim was to "teach meaning" rather than solve a task, but that's why those artificial-intelligence people have been such a failure themselves... > When I see a OCR system that just uses raw results, > then I will bow my head in recognition of true achieve meant. a perfect example of what i just said: the objective is to get accurate o.c.r., by whatever means necessary, and _not_ to limit yourself to "raw results". if doing some voodoo gave better o.c.r., we would do it. this isn't some kind of "intellectual challenge" where we find it necessary to tie our hands behind our back; it is a practical job that needs to be done... -bowerbird From gbnewby at pglaf.org Fri Mar 10 16:07:02 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Fri Mar 10 16:07:03 2006 Subject: [gutvol-d] eBooks on slashdot today Message-ID: <20060311000702.GA22305@pglaf.org> "It seems that the readers of Slashdot are the most likely early adopters of electronic books, but from posts I've seen here, it doesn't appear that many on Slashdot are e-book fans. In the hopes of sparking a discussion, I'd like to ask what keeps you personally from reading e-books?" Here are some of my guesses as to why people haven't taken up e-Books: 1.
Form factor: They just prefer the feel and 'interface' of a paper book. 2. Lack of a compelling device (or perhaps lack of convergence): They don't own a reader (other than a PC or notebook) and can't take them with them. 3. Lack of content: Books they are interested in aren't available in electronic format 4. Distribution model: They don't like the DRM scheme their favorite publisher offers, or are otherwise unhappy with current offerings. Maybe lively discussion from a prospective set of customers might spur the creator of the next generation of electronic book devices. Too bad the name 'iBook' is already taken." What reason do you have for not taking up e-Books? Are they listed above or are there other reasons that you would like to add?" http://ask.slashdot.org/article.pl?sid=06/03/10/1555203 From bruce at zuhause.org Fri Mar 10 18:40:33 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Fri Mar 10 18:40:40 2006 Subject: [gutvol-d] Commercial paper editions of PG texts In-Reply-To: <20060304213610.GJ6307@pglaf.org> References: <2de.3175a87.31364202@aol.com> <17413.50263.444498.622315@celery.zuhause.org> <20060301190008.GB29172@pglaf.org> <17414.22323.407940.157838@celery.zuhause.org> <20060304213610.GJ6307@pglaf.org> Message-ID: <17426.14497.873203.598737@celery.zuhause.org> Greg Newby writes: > On Wed, Mar 01, 2006 at 08:23:47PM -0600, Bruce Albrecht wrote: > > Greg Newby writes: > > > On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote: > > > > ... > > > > How many PD books have you found in Google Book Search that were not > > > > visible? Did you report them to Google? If not, some of the blame > > > > falls on _your_ shoulders. > > > > > > How is such notification done? 
> > > > Well, when I've been doing book searches at Google, and it comes up > > with a book that doesn't say that it was provided by a publisher, and > > the book information claims it was copyrighted before 1923, or I can > > find the copyright in a snippet, I use Google's feedback link to report > > that the book is incorrectly flagged as being in copyright so that > > they will fix the status. In one case, they fixed it after a 4-5 > > email exchange. In other cases, they simply told me that they were > > aware that some books were incorrectly identified as in copyright. > > Do they consider 1923 as a cutoff date (per US law)? Or > do they look to 1868 or something similar as a cutoff, > as an attempt to only say "public domain" if it's defensibly > for the entire world? I don't know about other countries, as I am in the US. This week, I tried to follow up on a book published in 1914 in England (but probably missing an explicit copyright), but it's hard to tell because Google doesn't display full-sized title and title-verso pages. The British Library didn't indicate any additional editions. Basically, Google's response this time was "We're not sure if it's in copyright so you're not going to see anything more than snippets." From phil at thalasson.com Fri Mar 10 18:11:16 2006 From: phil at thalasson.com (Philip Baker) Date: Fri Mar 10 19:15:01 2006 Subject: [dp-pg] re: [gutvol-d] google and the translation thing In-Reply-To: <A9A9A706-5415-4C9A-A8EE-4EDE665370FD@uni-trier.de> Message-ID: <IGVKjOAEHjEEFwbv@thalasson.com> In article <A9A9A706-5415-4C9A-A8EE-4EDE665370FD@uni-trier.de>, "Keith J. Schultz" <schultzk@uni-trier.de> writes > Btw. All of Shakespeare works were not written down by himself, > but were transcripted during the plays. Therefore the varied > portfolios and spellings. You mean the various quartos.
Some may have been bootleg copies for the use of rival theatre companies, but the First Folio was produced from working copies of the plays owned by Shakespeare's theatre company. -- Philip Baker From tb at baechler.net Sat Mar 11 00:43:00 2006 From: tb at baechler.net (Tony Baechler) Date: Sat Mar 11 03:26:10 2006 Subject: [gutvol-d] eBooks on slashdot today In-Reply-To: <20060311000702.GA22305@pglaf.org> References: <20060311000702.GA22305@pglaf.org> Message-ID: <7.0.1.0.2.20060311003844.03737d60@baechler.net> My reasons are 3 and 4 below. I'm blind so I'm very interested in electronic books. I read them almost exclusively. However, as much as I like PG, I get tired of only reading pre-1923 books so I look elsewhere. I am very grateful to the people who are getting 1950s science fiction cleared. I assume this falls under rule 6? I'm grateful; those books aren't generally available except from PG. I don't like DRM anyway, but especially since it usually locks out screen readers from reading the text. Often PDF files are encrypted or have passwords preventing copying text. MS Reader is generally not accessible at all. Even if reading aloud is turned on, you don't get a choice of what voice you want and are stuck with horrible software speech. Even the new DAISY format for the blind has restrictions but I can convert it to plain text so I'm happy. Post this to slashdot if you want. At 04:07 PM 3/10/2006, you wrote: >3. Lack of content: Books they are interested in aren't available in >electronic format > >4. Distribution model: They don't like the DRM scheme their favorite >publisher offers, or are otherwise unhappy with current offerings.
From hart at ibiblio.org Fri Mar 10 13:42:49 2006 From: hart at ibiblio.org (Michael Hart) Date: Sat Mar 11 08:09:45 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <111.5cbf1cc9.3143187e@aol.com> References: <111.5cbf1cc9.3143187e@aol.com> Message-ID: <Pine.LNX.4.61.0603101640560.14959@tribal.metalab.unc.edu> This is all just nitpicking and progress is being made, that's all that counts, not now many think ENUF progress. "Those doing the impossible should not be interrupted by those who say impossible." Ancient Chinese Proverb I am currently running from an emergency backup mail system, so please reply to hart@pobox.com as usual, but cc: me at: hart@metalab.unc.edu until I let you know I am back @pglaf. Please also add hart@pglaf.org to your email alias for me. Thanks!!!!!!! Michael On Fri, 10 Mar 2006 Bowerbird@aol.com wrote: > keith said: >> Just the opposite is the case. >> Believe me as a computer linguist. > > i believe that the computer linguists > have not been able to solve the problem. > > i also believe that google's research lab > _will_ be able to solve it. i doubt they have > "solved" it yet, and i'm sure when they do, > their "solution" won't be "perfect enough" > for the computer linguists, but nonetheless... > > >> What has happened. Vaporware and results. >> It simply does not work. Language can not be >> sucessfully model. Languages are regularly formed, >> nor well formed. > > and here's a great example of why it won't be "perfect". > just in the sentences quoted above: there should be a > question-mark after "happened"; there seems to be a > missing adjective before "results"; "successfully" is not > spelled correctly. and there seems to be a missing word > between "regularly" and "formed"; yet despite all these > shortcomings, i know exactly what you meant to say... > > (and i don't mean to be picking on you if english is not > your first language.
i only speak one language, so i am > the last person to criticize anyone else on that dimension. > the point is that human beings are very good at resolving > the ambiguity that results from incomplete information, > and we probably can't reasonably expect that of machines. > but it is simply not that case that ambiguity permeates > _every_aspect_ of language; clarity is not impossible.) > > >> All AI projects so far have failed >> and failure has been admitted. > > yes it has been, yet deep blue can still beat > all but the best of the world's grandmasters... > > if you give up on teaching a machine "meaning", > and concentrate on giving it enough rules that > give the correct results most of the time, you can > get very close to finishing the job you want done. > > of course, this approach is considered "a trick" > by the artificial-intelligence people, whose aim > was to "teach meaning" rather than solve a task, > but that's why those artificial-intelligence people > have been such a failure themselves... > > >> When I see a OCR system that just uses raw results, >> then I will bow my head in recognition of true achieve meant. > > a perfect example of what i just said: > the objective is to get accurate o.c.r., > by whatever means necessary, and > _not_ to limit yourself to "raw results". > > if doing some voodoo gave better o.c.r., > we would do it. this isn't some kind of > "intellectual challenge" where we find it > necessary to tie our hands behind our back; > it is a practical job that needs to be done... > > -bowerbird > From Bowerbird at aol.com Sat Mar 11 11:13:05 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Mar 11 11:13:13 2006 Subject: [gutvol-d] google and the translation thing Message-ID: <244.8584560.31447b41@aol.com> michael said: > This is all just nitpicking and progress is being made, > that's all that counts, not now many think ENUF progress. um, it's certainly not "nitpicking". 
perhaps the fact that "progress is being made" might be "all that counts" to _you_, michael, but maybe something else counts to someone else... i certainly think the _methods_ that people are using to "make progress" is an interesting topic. moreover, i think it's quite fascinating that those people whose methods failed to make progress localize the cause of that failure in some inherent "difficulty of the task" rather than in their methods. they then go on to lambast anyone else who thinks that progress could be made with another method. and yes, there is much precedent for this on this list. for the past few years, a number of people have been telling me that "a plain-text format cannot represent the range of features in paper-books" simply because they could not imagine one that could. but i _can_... and even when i told you, repeatedly, that i could do it, they insisted -- just as vehemently -- that i could not... well, in case you haven't noticed, people, i have begun the process of giving you unequivocal proof that i can. and, just as i've known and predicted all along, you will suddenly become silent with your "that can't be done" song and dance, and will pretend you never said it at all. > "Those doing the impossible > should not be interrupted > by those who say impossible." > Ancient Chinese Proverb but the ones who say it is impossible will keep on trying to interrupt them... because otherwise, their smug picture of themselves as "invited experts" will vanish in a puff of their own vapor... -bowerbird
From hart at pglaf.org Sat Mar 11 19:18:39 2006 From: hart at pglaf.org (Michael Hart) Date: Sat Mar 11 19:18:41 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <244.8584560.31447b41@aol.com> References: <244.8584560.31447b41@aol.com> Message-ID: <Pine.LNX.4.60.0603111858540.13127@pglaf.org> On Sat, 11 Mar 2006 Bowerbird@aol.com wrote: > michael said: >> This is all just nitpicking and progress is being made, >> that's all that counts, not now many think ENUF progress. > > um, it's certainly not "nitpicking". > > perhaps the fact that "progress is being made" > might be "all that counts" to _you_, michael, but > maybe something else counts to someone else... > > i certainly think the _methods_ that people are > using to "make progress" is an interesting topic. It's not the means to the end that count, it's arriving at the end that counts. Methodologies are continually being upset by those who find some other way to do an expensive [time or money] function for an infinitesimal amount of the original. Just try going coast to coast without it. Of course, if the Wright Brothers were in the current copyright scenario, we should only just now have their blueprints in an ancient and mummified public domain. > moreover, i think it's quite fascinating that those > people whose methods failed to make progress > localize the cause of that failure in some inherent > "difficulty of the task" rather than in their methods. This is true enough to all failures to be meaningless in this particular specification, don't you have some remarkable insight for THIS specific application? If not, then why talk in such generalities that these non-specifics apply to everything in general and thus to nothing in specific. > they then go on to lambast anyone else who thinks > that progress could be made with another method. Time to look in the mirror, my friend.
Try everything, go with what succeeds. "Nothing succeeds like success." > and yes, there is much precedent for this on this list. Speak for yourself, John. > for the past few years, a number of people have been > telling me that "a plain-text format cannot represent > the range of features in paper-books" simply because > they could not imagine one that could. but i _can_... It only matters when you get to the point of ending the debating and actually doing something the outside world can see and work with. Until then, as S. I. Hayakawa told me was the best thing he could teach me, it remains in your asylum with you. Get out into the real world! Until you do, it makes no difference to anyone else. > and even when i told you, repeatedly, that i could do it, > they insisted -- just as vehemently -- that i could not... There is only one way to prove them wrong. > well, in case you haven't noticed, people, i have begun > the process of giving you unequivocal proof that i can. "The proof of the pudding is in the eating." "Alice, Pudding. Pudding, Alice." Until your product is introduced to the public, it's just a Mad Hatter's Tea Party. > and, just as i've known and predicted all along, you will > suddenly become silent with your "that can't be done" > song and dance, and will pretend you never said it at all. "You" who? Yoohoo! "Is there anybody OUT there?" THAT is the ONLY question that matters OUTSIDE the laboratory. This is why Doug Engelbart will never be credited with eBooks: they were never released into the wild, as ours are. Until yours make it in the wild, we'll just never know. . . . >> "Those doing the impossible >> should not be interrupted >> by those who say impossible." >> Ancient Chinese Proverb > > but the ones who say it is impossible > will keep on trying to interrupt them... "PAY NO ATTENTION TO THE MAN BEHIND THE CURTAIN!" You have had the power all along.
"There's no place like home, "There's no place like home, "There's no place like home, "There's no place like home." Until your work finds a home, it may as well be in Oz. > because otherwise, their smug picture > of themselves as "invited experts" will > vanish in a puff of their own vapor... Only if reality becomes part of the equation. > > -bowerbird > Thanks!!! Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg From gbnewby at pglaf.org Sat Mar 11 22:37:50 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Mar 11 22:37:52 2006 Subject: [gutvol-d] Fwd: Reading software for PG users Message-ID: <20060312063750.GB16884@pglaf.org> Some info about speed reading software, below, for anyone interested: ----- Forwarded message from John Burgess <ad123@ix.netcom.com> ----- From: "John Burgess" <ad123@ix.netcom.com> To: <gbnewby@pglaf.org> Subject: Reading software for PG users Date: Sat, 11 Mar 2006 14:54:49 -0800 Delivered-To: gbnewby@pglaf.org Hello Gregory: I represent a reading software company named Rocket Reader that produces a reading tool many users of Project Gutenberg will be very interested in using. Rocket Reader not only improves reading speed and comprehension, but readers will find it very useful in pacing their reading as well. There are two modules in the software that I use for this purpose: Speed Training and Grouping Training. Speed Training flashes groups of words from the imported text on a single line. It brings the words to your eyes. The rate can be set at the desired pace and ramped up as one reads through the text. Grouping Training begins with the text covered, then reveals it in groups of words, line-by-line. The speed and groups size can be set by the reader. I would like to discuss some arrangement to make Rocket Reader available to Project Gutenberg users. Perhaps, we could donate money to the cause as people begin using Rocket Reader. 
I also believe that more people would use Project Gutenberg if a reading tool like Rocket Reader were available to them. You can download a free 45-day trial at https://www.rocketreader.com/school/trial_aplus.html. I am more than happy to answer any of your questions. John Burgess Ed Tech Consultant 561-889-6585 ----- End forwarded message ----- From gbnewby at pglaf.org Sat Mar 11 22:42:05 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Mar 11 22:42:06 2006 Subject: [gutvol-d] eBooks on slashdot today In-Reply-To: <7.0.1.0.2.20060311003844.03737d60@baechler.net> References: <20060311000702.GA22305@pglaf.org> <7.0.1.0.2.20060311003844.03737d60@baechler.net> Message-ID: <20060312064205.GC16884@pglaf.org> On Sat, Mar 11, 2006 at 12:43:00AM -0800, Tony Baechler wrote: > My reasons are 3 and 4 below. I'm blind so I'm very interested in > electronic books. I read them almost exclusively. However, as much > as I like PG, I get tired of only reading pre-1923 books so I look > elsewhere. I am very grateful to the people who are getting 1950s > science fiction cleared. I assume this falls under rule 6? I'm > grateful; those books aren't generally available except from PG. Yes, the Sci Fi from 1923-1963 falls under our Rule 6. The new HOWTO is under testing, at http://copy.pglaf.org. Thanks to Greg Weeks for being one of the pioneers with Rule 6! There are at least 1 million books published from 1923-1963 that were not renewed, and are therefore public domain in the US. This is a huge number, and I hope PG can make a dent in it. However, the risk of erroneously claiming public domain on a copyrighted item is higher, so we're trying to start cautiously.
-- Greg From hyphen at hyphenologist.co.uk Sat Mar 11 23:27:54 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Mar 11 23:28:07 2006 Subject: [gutvol-d] Fwd: Reading software for PG users In-Reply-To: <20060312063750.GB16884@pglaf.org> References: <20060312063750.GB16884@pglaf.org> Message-ID: <p3j712dhu88v3fijjnppnj5c93vj9t6v76@4ax.com> On Sat, 11 Mar 2006 22:37:50 -0800, Greg Newby <gbnewby@pglaf.org> wrote: |Some info about speed reading software, below, for anyone |interested: | |----- Forwarded message from John Burgess <ad123@ix.netcom.com> ----- | |From: "John Burgess" <ad123@ix.netcom.com> |To: <gbnewby@pglaf.org> |Subject: Reading software for PG users |Date: Sat, 11 Mar 2006 14:54:49 -0800 |Delivered-To: gbnewby@pglaf.org | |Hello Gregory: | |I represent a reading software company named Rocket Reader that produces a |reading tool many users of Project Gutenberg will be very interested in |using. | |Rocket Reader not only improves reading speed and comprehension, but readers |will find it very useful in pacing their reading as well. There are two |modules in the software that I use for this purpose: Speed Training and |Grouping Training. Speed Training flashes groups of words from the imported |text on a single line. It brings the words to your eyes. The rate can be |set at the desired pace and ramped up as one reads through the text. |Grouping Training begins with the text covered, then reveals it in groups of |words, line-by-line. The speed and group size can be set by the reader. | |I would like to discuss some arrangement to make Rocket Reader available to |Project Gutenberg users. Perhaps we could donate money to the cause as |people begin using Rocket Reader. I also believe that more people would use |Project Gutenberg if a reading tool like Rocket Reader were available to |them. | |You can download a free 45-day trial at |https://www.rocketreader.com/school/trial_aplus.html.
I bought something like that made in plastic <mumble> years ago. Never found it much use :-( Gave up using it without being able to prove any increase in reading speed. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. From Bowerbird at aol.com Sun Mar 12 01:45:44 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Mar 12 01:45:57 2006 Subject: [gutvol-d] google and the translation thing Message-ID: <c0.38e4a5a5.314547c8@aol.com> michael said: > Methodologies are continually being upset > by those who find some other way to do an > expensive [time or money] function for an > infinitesimal amount of the original. well, i think we're both on the same page... except you're reading it and i'm writing it... :+) (in other words, if one is interested in the actual upsetting of methodologies, then one pays some attention to them. otherwise, one waits until they play out.) google is upsetting the methodologies here. and you are counting that machine translation will become up to snuff sooner or later, and you're not interested in the interim period. both positions are equally fine to hold... > If not, then why talk in such generalities i don't think i'm talking about "generalities" at all. in the current case, google is the entity doing it -- or so it has been reported, whether true or not -- and keith is the entity saying "it can't be done"... (well, he said "not in the next 100 or so years".) and in the other case i've mentioned -- the "dispute" between me and my "detractors" on this listserve -- there are no "generalities" either. we spent 2 years going back and forth at each other, so the positions are well-staked-out in the archives if you're curious. > It only matters when you get to the point of > ending the debating and actually doing something > the outside world can see and work with. right.
except when you're playing poker, the object is to win as much money as possible with the hands you win, and to lose as little as possible with the ones that you lose. and that means you don't always show all of your cards right away... google ain't showing all their cards. and i ain't showing all mine either... but i'm starting to show _some_. so we're past the point where any of this matters any more, in the matter of me vs. this list, since pudding is being served... -bowerbird From hart at pglaf.org Sun Mar 12 14:14:31 2006 From: hart at pglaf.org (Michael Hart) Date: Sun Mar 12 14:14:32 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <c0.38e4a5a5.314547c8@aol.com> References: <c0.38e4a5a5.314547c8@aol.com> Message-ID: <Pine.LNX.4.60.0603121355070.12337@pglaf.org> On Sun, 12 Mar 2006 Bowerbird@aol.com wrote: > michael said: >> Methodologies are continually being upset >> by those who find some other way to do an >> expensive [time or money] function for an >> infinitesimal amount of the original. > > well, i think we're both on the same page... > except you're reading it and i'm writing it... :+) If you only wrote as much and as well as you think, we might be in a much better place. > (in other words, if one is interested in > the actual upsetting of methodologies, > then one pays some attention to them. > otherwise, one waits until they play out.) No, one need not know the methodologies in use to come up with better ones. Only the reverse engineering types rely on this sort of logic. Innovation, when it comes, usually comes from a source well outside current methodologies; the rest is just incrementalism. > google is upsetting the methodologies here. I guess you weren't aware of Golden Bow and the others who preceded them.
I've been talking about Machine Translation since before there even was a Google. Don't you remember? One of the most difficult things about speaking with you is your apparent lack of memory/attn. This is what is most likely to get you placed on the spam list. > and you are counting that machine translation > will become up to snuff sooner or later, and > you're not interested in the interim period. > both positions are equally fine to hold... Again you seem to have not paid attention. . . . _I_ have been promoting the interim phases as good enough for people to work with, while YOU have been saying that only perfection is enough, or close to it. Again, you are just asking to be ignored by any of the people who actually TRY to follow your words. >> If not, then why talk in such generalities > > i don't think i'm talking about "generalities" at all. > > in the current case, google is the entity doing it -- > or so it has been reported, whether true or not -- > and keith is the entity saying "it can't be done"... > (well, he said "not in the next 100 or so years".) I'm sticking with my original prediction: By the time we have put a significant dent in public domain books that are available, 10-20 million eBooks, the next big thing after OCR will be MT. . .AND. . .this will all start to take place in the public eye by 2020. Just in case you try to misinterpret that. . .14 years. > and in the other case i've mentioned -- the "dispute" > between me and my "detractors" on this listserve -- > there are no "generalities" either. we spent 2 years > going back and forth at each other, so the positions > are well-staked-out in the archives if you're curious. Sadly to say, I have read the vast majority of your archived messages, and have no desire to again. Obviously not even YOU think they are worth quoting, or you would have. >> It only matters when you get to the point of >> ending the debating and actually doing something >> the outside world can see and work with.
> > right. If only you said what you meant, and meant what you said. Back to Alice. > except when you're playing poker, > the object is to win as much money > as possible with the hands you win, > and to lose as little as possible with > the ones that you lose. This is NOT a GAME, and MONEY is NOT the OBJECT. Again I refer you to Jon Noring, you have more in common than you would like to think. > and that means you don't always > show all of your cards right away... As above, stop PLAYING, start WORKING. Remember your physics lessons? It's not WORK if you don't MOVE something and then KEEP IT THERE. > google ain't showing all their cards. Sadly to say, I expected more of you than of Google. > and i ain't showing all mine either... Sadly to say. . . . No one can see or build on your work. You might have been a giant for someone to stand on. "If I have seen further, it is because I have stood on the shoulders of giants." Newton > but i'm starting to show _some_. Sorry, strip-tease is not acceptable. > so we're past the point where > any of this matters any more, > in the matter of me vs. this list, > since pudding is being served... And that is why you will likely continue to be ignored. > -bowerbird > mh From schultzk at uni-trier.de Mon Mar 13 00:32:11 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Mon Mar 13 00:32:18 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <111.5cbf1cc9.3143187e@aol.com> References: <111.5cbf1cc9.3143187e@aol.com> Message-ID: <17512C8C-F012-4025-96FB-561B3A4576A4@uni-trier.de> Hi, On 10.03.2006 at 18:59, Bowerbird@aol.com wrote: > keith said: > > Just the opposite is the case. > > Believe me as a computer linguist. > > i believe that the computer linguists > have not been able to solve the problem. Exactly. > > i also believe that google's research lab > _will_ be able to solve it.
i doubt they have > "solved" it yet, and i'm sure when they do, > their "solution" won't be "perfect enough" > for the computer linguists, but nonetheless... If computer linguists have not solved the problems in 20 years, google probably will not either ;-)) They might, but it is very unlikely. > > > > What has happened. Vaporware and results. > > It simply does not work. Language can not be > > sucessfully model. Languages are regularly formed, > > nor well formed. > > and here's a great example of why it won't be "perfect". > just in the sentences quoted above: there should be a > question-mark after "happened"; there seems to be a > missing adjective before "results"; "successfully" is not > spelled correctly. and there seems to be a missing word > between "regularly" and "formed"; yet despite all these > shortcomings, i know exactly what you meant to say... > > (and i don't mean to be picking on you if english is not > your first language. i only speak one language, so i am > the last person to criticize anyone else on that dimension. Ouch! I am very sorry. Please excuse me. I had a lot of work to do long hours last week(,) and people in and out of the office. I knew I had a lot of booboos in my post. > the point is that human beings are very good at resolving > the ambiguity that results from incomplete information, > and we probably can't reasonably expect that of machines. > but it is simply not the case that ambiguity permeates > _every_aspect_ of language; clarity is not impossible.) > > > > All AI projects so far have failed > > and failure has been admitted. > > yes it has been, yet deep blue can still beat > all but the best of the world's grandmasters... Gotcha ,-)))) Deep Blue is not AI; it is brute force. I would be glad to discuss this one. Directly with you, if you care to. This would be OT.
> > if you give up on teaching a machine "meaning", > and concentrate on giving it enough rules that > give the correct results most of the time, you can > get very close to finishing the job you want done. That has been tried in some AI projects, and failed! > > of course, this approach is considered "a trick" > by the artificial-intelligence people, whose aim > was to "teach meaning" rather than solve a task, > but that's why those artificial-intelligence people > have been such a failure themselves... This is getting OT, too. But the reason they are failing is the paradigm that language is meaning. Humans, when resolving language (understanding it) and even more so when translating it, use more than their knowledge of the language to solve these tasks. > > > > When I see an OCR system that just uses raw results, > > then I will bow my head in recognition of true achievement. > > a perfect example of what i just said: > the objective is to get accurate o.c.r., > by whatever means necessary, and > _not_ to limit yourself to "raw results". Yet, in order to get better overall results, we need better "raw results". OCR has come a long way since it started using dictionaries. Adding a DB with phrasal information will bring another 2%, but the cost on the other side would be about 50% in resources. Sure, cheaper computers, memory, and the availability of Google will help. Yet it is not the holy grail. Also, as an OT example: how long have we been waiting for the 3-liter car (3 liters per 100 km)? Well, it has been here since the 80s. An engineer modified a VW Rabbit (just the shape of the pistons) and it needed only 1 gallon per 62 miles!! Money rules. (O.K., very off topic) > > if doing some voodoo gave better o.c.r., > we would do it. this isn't some kind of > "intellectual challenge" where we find it > necessary to tie our hands behind our back; > it is a practical job that needs to be done... Exactly my point. Things work for most simple everyday tasks, but .... Keith.
From schultzk at uni-trier.de Mon Mar 13 00:39:09 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Mon Mar 13 00:39:13 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <244.8584560.31447b41@aol.com> References: <244.8584560.31447b41@aol.com> Message-ID: <C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de> Hi There, On 11.03.2006 at 20:13, Bowerbird@aol.com wrote: > michael said: > > This is all just nitpicking and progress is being made, > > that's all that counts, not how many think ENUF progress. > > um, it's certainly not "nitpicking". I do not feel he is nitpicking > > perhaps the fact that "progress is being made" > might be "all that counts" to _you_, michael, but > maybe something else counts to someone else... I agree that we must talk about methods. I had jumped in because the method had already been tested. I have worked with it myself and also some modifications thereof. [snip, snip] > > > "Those doing the impossible > > should not be interrupted > > by those who say impossible." > > Ancient Chinese Proverb > > but the ones who say it is impossible > will keep on trying to interrupt them... I do not say impossible. I say highly improbable. I agree, though, with Michael that we ought to take this somewhere else!! It is growing very OT for PG. Keith. From schultzk at uni-trier.de Mon Mar 13 01:03:04 2006 From: schultzk at uni-trier.de (Keith J.
Schultz) Date: Mon Mar 13 01:03:09 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <c0.38e4a5a5.314547c8@aol.com> References: <c0.38e4a5a5.314547c8@aol.com> Message-ID: <8FB6C99B-2708-4DF1-81DA-54B7C791B0C9@uni-trier.de> Hi Again, I will come back in. On 12.03.2006 at 10:45, Bowerbird@aol.com wrote: > michael said: > > Methodologies are continually being upset > > by those who find some other way to do an > > expensive [time or money] function for an > > infinitesimal amount of the original. My point was that it IS NOT A NEW METHODOLOGY OR NEW METHOD!! > > well, i think we're both on the same page... > except you're reading it and i'm writing it... :+) > > (in other words, if one is interested in > the actual upsetting of methodologies, > then one pays some attention to them. > otherwise, one waits until they play out.) > > google is upsetting the methodologies here. > and you are counting that machine translation > will become up to snuff sooner or later, and > you're not interested in the interim period. > both positions are equally fine to hold... Actually, there is an excellent translation system out there already: SYSTRAN. But what is available to the public, you can forget. It uses grammar models, lexica and a lot more voodoo. They claim 95-99% out of the box. What is its drawback? It needs a hell of a lot of computing power. It even works with voice. Do not ask me what it costs, either. > > > > If not, then why talk in such generalities > > i don't think i'm talking about "generalities" at all. > > in the current case, google is the entity doing it -- > or so it has been reported, whether true or not -- > and keith is the entity saying "it can't be done"... > (well, he said "not in the next 100 or so years".) As I have mentioned before, the method is not new. It will give you acceptable results for the average Joe. It will not work for PG.
> > and in the other case i've mentioned -- the "dispute" > between me and my "detractors" on this listserve -- > there are no "generalities" either. we spent 2 years > going back and forth at each other, so the positions > are well-staked-out in the archives if you're curious. > > > > It only matters when you get to the point of > > ending the debating and actually doing something > > the outside world can see and work with. > > right. > > except when you're playing poker, > the object is to win as much money > as possible with the hands you win, > and to lose as little as possible with > the ones that you lose. > > and that means you don't always > show all of your cards right away... > > google ain't showing all their cards. > > and i ain't showing all mine either... This reminds me of my first semester in CL, where the great innovators said: my method is better than yours. I can do this, you cannot. You can do that, but I can do this. Na na nah nah! But, as Michael said, we shall see if google will revolutionize the world of MT. I doubt it very much. Of course I could ask for my money back from the university. From schultzk at uni-trier.de Mon Mar 13 01:20:14 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Mon Mar 13 01:20:20 2006 Subject: [gutvol-d] google and the translation thing Message-ID: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de> Maybe one last word here. The EU uses MT technologies to translate the bulk. Yet, the produced texts are still manually processed by humans to get them right. If the google method were so good, the EU would not need translators, since their written texts are basically similar in all languages. They are basically formal debates and legislative in form. Keith.
From hyphen at hyphenologist.co.uk Mon Mar 13 02:23:24 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Mon Mar 13 02:23:36 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de> References: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de> Message-ID: <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com> On Mon, 13 Mar 2006 10:20:14 +0100, "Keith J. Schultz" <schultzk@uni-trier.de> wrote: |Maybe one last word here. | | | The EU uses MT technologies to translate the bulk. Yet, the produced | texts are still manually processed by humans to get them right. | If the google method were so good, the EU would not need translators, | since their written texts are basically similar in all languages. |They are | basically formal debates and legislative in form. | | Keith. The EU MT technologies are specifically adjusted to work with the specialised language/subjects used by the EU for laws and political debates. Google's proposals are for general text, and therefore *much* *much* more demanding. IMO The Google proposals will never get better than a first pass, before a human does the job properly. I use Systran, the market leader, on occasion, and its translations are at best understandable. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. From schultzk at uni-trier.de Mon Mar 13 03:26:13 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Mon Mar 13 03:26:19 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com> References: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de> <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com> Message-ID: <E40A0F1F-360B-4202-9908-E50F369F2279@uni-trier.de> Hi There, This debate is becoming very tedious.
On 13.03.2006 at 11:23, Dave Fawthrop wrote: > On Mon, 13 Mar 2006 10:20:14 +0100, "Keith J. Schultz" > <schultzk@uni-trier.de> wrote: > > |Maybe one last word here. > | > | > | The EU uses MT technologies to translate the bulk. Yet, the produced > | texts are still manually processed by humans to get them right. > | If the google method were so good, the EU would not need translators, > | since their written texts are basically similar in all languages. > |They are > | basically formal debates and legislative in form. > | > | Keith. > > The EU MT technologies are specifically adjusted to work with the > specialised language/subjects used by the EU for laws and political > debates. > exactly. > Google's proposals are for general text, and therefore *much* *much* > more > demanding. More demanding, definitely. If it does not even work for a specialized field, how do you expect it to work in a general text!?? I have been here, there and back again. > > IMO The Google proposals will never get better than a first pass, > before a > human does the job properly. Just what I am saying. > > I use Systran, the market leader, on occasion, and its translations > are at > best understandable. Which product? I already mentioned that the better products are not available to the general public. Keith. From Bowerbird at aol.com Mon Mar 13 03:58:23 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 13 03:58:28 2006 Subject: [gutvol-d] monday morning quarterback -- #02 Message-ID: <2b5.63d87a7.3146b85f@aol.com> week #2 of "m.m.q." - "monday morning quarterback" -- is up: > http://snowy.arsc.alaska.edu/bowerbird/mmq/mmq02.txt > http://groups.yahoo.com/group/bpsuper/message/4 this week's topic is "what do we want our final text to look like?"
*** 3 examples of "continuous proofreading": > http://www.greatamericannovel.com/mabie/mabiep001 > http://www.greatamericannovel.com/myant/myantc001 > http://www.greatamericannovel.com/tolbk/tolbkp001 and their underlying text-file masters: > http://www.greatamericannovel.com/mabie/mabie.zml > http://www.greatamericannovel.com/myant/myant.zml > http://www.greatamericannovel.com/tolbk/tolbk.zml comparison of "straight-out-of-ocr" and "final version" e-texts: > http://snowy.arsc.alaska.edu/bowerbird/myant/myant-ocr.txt > http://www.greatamericannovel.com/myant/myant.zml the error-report form now provides cross-links from each specific page to an overall "error-report page" for each book: > http://www.greatamericannovel.com/mabie/mabie-er.html > http://www.greatamericannovel.com/myant/myant-er.html > http://www.greatamericannovel.com/tolbk/tolbk-er.html -bowerbird From Gutenberg9443 at aol.com Mon Mar 13 06:17:56 2006 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Mar 13 06:18:16 2006 Subject: [gutvol-d] need volunteers in Dallas Message-ID: <241.885df10.3146d914@aol.com> Do we have two (or more) volunteers in Dallas who are tactful and fast on the uptake and totally free the week of July 1 through July 7 and can find the Hilton Anatole without getting lost? Reply directly to me--Gutenberg9443@aol.com.
From sly at victoria.tc.ca Mon Mar 13 09:05:21 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon Mar 13 09:05:25 2006 Subject: [gutvol-d] PG#17945 Mark Twain: Tri Noveloj Message-ID: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca> One recently posted PG text (#17945 Mark Twain: Tri Noveloj) might cause some confusion to users regarding its copyright status. This is a contemporary Esperanto translation of three Mark Twain stories. I believe the translator, Edwin Grobe, has recently explicitly released these, and his other translations of American literature, into the public domain. However, this text contains the prominent statement "Copyright 1999", copied from the original printed book. Am I right in thinking this could lead to confusion? Andrew From greg at durendal.org Mon Mar 13 10:58:31 2006 From: greg at durendal.org (Greg Weeks) Date: Mon Mar 13 11:30:05 2006 Subject: [gutvol-d] PG#17945 Mark Twain: Tri Noveloj In-Reply-To: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca> References: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca> Message-ID: <Pine.LNX.4.63.0603131357260.7770@durendal.durendal.org> On Mon, 13 Mar 2006, Andrew Sly wrote: > However, this text contains the prominent statement "Copyright 1999", > copied from the original printed book. Am I right in thinking this > could lead to confusion? This seems to be pretty common in the rule 6 stuff I've been clearing. I think it's confusing to have an incorrect copyright statement in the PG version even when the original book had the statement. -- Greg Weeks http://durendal.org:8080/greg/ From Bowerbird at aol.com Mon Mar 13 12:14:28 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 13 12:14:39 2006 Subject: [gutvol-d] periodic update Message-ID: <21d.9c6beac.31472ca4@aol.com> for what it's worth...
neither "the secret garden" nor "swiss family robinson" have been updated with corrections to the errors that i listed in posts here... -bowerbird From hart at pglaf.org Mon Mar 13 12:21:16 2006 From: hart at pglaf.org (Michael Hart) Date: Mon Mar 13 12:21:18 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <E40A0F1F-360B-4202-9908-E50F369F2279@uni-trier.de> References: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de> <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com> <E40A0F1F-360B-4202-9908-E50F369F2279@uni-trier.de> Message-ID: <Pine.LNX.4.60.0603131220310.3961@pglaf.org> On Mon, 13 Mar 2006, Keith J. Schultz wrote: > Hi There, > > This debate is becoming very tedious. Machine Translation is such an important issue that discussion should not be limited, especially here. Thanks!!! Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg > > On 13.03.2006 at 11:23, Dave Fawthrop wrote: > >> On Mon, 13 Mar 2006 10:20:14 +0100, "Keith J. Schultz" >> <schultzk@uni-trier.de> wrote: >> >> |Maybe one last word here. >> | >> | >> | The EU uses MT technologies to translate the bulk. Yet, the produced >> | texts are still manually processed by humans to get them right. >> | If the google method were so good, the EU would not need translators, >> | since their written texts are basically similar in all languages. >> |They are >> | basically formal debates and legislative in form. >> | >> | Keith. >> >> The EU MT technologies are specifically adjusted to work with the >> specialised language/subjects used by the EU for laws and political >> debates. >> > exactly. > >> Google's proposals are for general text, and therefore *much* *much* more >> demanding. > More demanding, definitely.
If it does not even work for a > specialized field, how do you expect it to work in a > general text!?? I have been here, there and back again. > >> >> IMO The Google proposals will never get better than a first pass, before a >> human does the job properly. > Just what I am saying. >> >> I use Systran, the market leader, on occasion, and its translations are at >> best understandable. > Which product? I already mentioned that the better products are not > available to the general public. > > Keith. From hart at pglaf.org Mon Mar 13 12:24:16 2006 From: hart at pglaf.org (Michael Hart) Date: Mon Mar 13 12:24:17 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de> References: <244.8584560.31447b41@aol.com> <C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de> Message-ID: <Pine.LNX.4.60.0603131222570.3961@pglaf.org> On Mon, 13 Mar 2006, Keith J. Schultz wrote: > Hi There, > > On 11.03.2006 at 20:13, Bowerbird@aol.com wrote: > >> michael said: >> > This is all just nitpicking and progress is being made, >> > that's all that counts, not how many think ENUF progress. >> >> um, it's certainly not "nitpicking". > I do not feel he is nitpicking > >> >> perhaps the fact that "progress is being made" >> might be "all that counts" to _you_, michael, but >> maybe something else counts to someone else... > I agree that we must talk about methods. I had jumped in > because the method had already been tested. I have > worked with it myself and also some modifications thereof. > [snip, snip] >> >> > "Those doing the impossible >> > should not be interrupted >> > by those who say impossible." >> > Ancient Chinese Proverb >> >> but the ones who say it is impossible >> will keep on trying to interrupt them... > I do not say impossible. I say highly improbable.
> I agree, though, with Michael that we ought to take > this somewhere else!! It is growing very OT for PG. > > Keith. Sorry, you must be agreeing with someone else, not me. _I_ think MT will become one of the most MAJOR topics of PG, and that we should do all we can to stay on top of MT items. Michael From marcello at perathoner.de Mon Mar 13 13:06:10 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Mar 13 13:06:14 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <Pine.LNX.4.60.0603131222570.3961@pglaf.org> References: <244.8584560.31447b41@aol.com> <C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de> <Pine.LNX.4.60.0603131222570.3961@pglaf.org> Message-ID: <4415DEC2.7040201@perathoner.de> Michael Hart wrote: > _I_ think MT will become one of the most MAJOR topics of PG, > and that we should do all we can to stay on top of MT items. I think robots will become the major producers of ebooks for PG and thus we should stay ahead of robot technology. Being a visionary is easy if you have generic enough visions and you don't commit to a timeline. Give the world visions in 2006!!! -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Mon Mar 13 13:06:05 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Mar 13 13:06:15 2006 Subject: [gutvol-d] periodic update In-Reply-To: <21d.9c6beac.31472ca4@aol.com> References: <21d.9c6beac.31472ca4@aol.com> Message-ID: <4415DEBD.6060809@perathoner.de> Bowerbird@aol.com wrote: > neither "the secret garden" > nor "swiss family robinson" > have been updated with > corrections to the errors > that i listed in posts here... I mailed a juicy list of errata to "Pride and Prejudice" to errata@pglaf.org and they got applied in less than 48 hours. Ergo: the errata team works fine. If your corrections still don't work, you have to look elsewhere to find the culprit.
-- Marcello Perathoner webmaster@gutenberg.org From walter.van.holst at xs4all.nl Mon Mar 13 14:43:16 2006 From: walter.van.holst at xs4all.nl (Walter H. van Holst) Date: Mon Mar 13 14:56:09 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com> References: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de> <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com> Message-ID: <1142289796.3636.11.camel@God> On Mon, 2006-03-13 at 10:23 +0000, Dave Fawthrop wrote: > The EU MT technologies are specifically adjusted to work with the > specialised language/subjects used by the EU for laws and political > debates. So if Google manages to find a way to classify the domain of a text, it could use domain-specific MT to achieve the same results. I wouldn't be surprised if classifying the domain of a text turned out to be trivial using some fairly basic statistical methods. Regards, Walter From Bowerbird at aol.com Mon Mar 13 16:03:07 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 13 16:03:13 2006 Subject: [gutvol-d] periodic update Message-ID: <76.662d55b9.3147623b@aol.com> carlo said: > This is not the proper place to notify errors part of the experiment is to see how long it takes someone to practice the advice they give me, and send my error-reports to "the proper place"... -bowerbird From joshua at hutchinson.net Mon Mar 13 20:12:49 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Mar 13 20:11:13 2006 Subject: [gutvol-d] periodic update Message-ID: <20060314041249.2D3219EE8E@ws6-2.us4.outblaze.com> When you've been told repeatedly where the proper place is and you refuse to send them there ... why should anyone else do your work for you?
> ----- Original Message ----- > From: Bowerbird@aol.com > > carlo said: > > This is not the proper place to notify errors > > part of the experiment is to see > how long it takes someone to > practice the advice they give me, > and send my error-reports to > "the proper place"... From Bowerbird at aol.com Mon Mar 13 20:25:39 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 13 20:25:47 2006 Subject: [gutvol-d] periodic update Message-ID: <21e.9cf2f13.31479fc3@aol.com> joshua said: > why should anyone else do your work for you? it's not my job to send error-reports to a specific place. or indeed, even to prepare the things in the first place... and neither is it your job to forward error-reports on. or anyone else's job either, for that matter. granted... so i don't think anyone can be assessed "blame" here. i'm just curious about how long an error-report will be left laying around before _someone_ acts on it, even though it's _not_ their "job" to do so... -bowerbird From ben.crowder at gmail.com Mon Mar 13 20:53:44 2006 From: ben.crowder at gmail.com (Ben Crowder) Date: Mon Mar 13 20:59:45 2006 Subject: [gutvol-d] periodic update In-Reply-To: <21e.9cf2f13.31479fc3@aol.com> References: <21e.9cf2f13.31479fc3@aol.com> Message-ID: <a6270b5d6e44da24e8f28bee646850c7@gmail.com> bowerbird said: > i'm just curious about how long an error-report > will be left laying around before _someone_ acts > on it, even though it's _not_ their "job" to do so... To what purpose? Waste time? We're not here to babysit. There's a place for error reports to be sent, and it's completely beside the point to leave them "laying around" waiting for someone else to pick them up. Do you leave your dirty laundry on the ground waiting for someone else to pick it up and put it away? Come on, let's be reasonable here.
Better to use that time to further the cause and get more eBooks made (or send the error reports to the proper place so we can fix them and move on). I suspect these are wasted words, though, judging from your e-mails in the archive. ~sigh~ Ben -- Ben Crowder <ben.crowder@gmail.com> MSN: ben.crowder@gmail.com Website: http://www.blankslate.net/ Blog: http://www.topofthemountains.net/ From gbnewby at pglaf.org Mon Mar 13 22:14:41 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Mar 13 22:14:43 2006 Subject: [gutvol-d] periodic update In-Reply-To: <21d.9c6beac.31472ca4@aol.com> References: <21d.9c6beac.31472ca4@aol.com> Message-ID: <20060314061441.GD19944@pglaf.org> On Mon, Mar 13, 2006 at 03:14:28PM -0500, Bowerbird@aol.com wrote: > for what it's worth... > > neither "the secret garden" > nor "swiss family robinson" > have been updated with > corrections to the errors > that i listed in posts here... > > -bowerbird None of the errata team subscribes to gutvol-d, so they don't see your posts. I don't see your posts either, but saw your thread had been responded to. To report errors, see our procedure here: http://www.gutenberg.org/faq/R-26 Short version: email to errata@pglaf.org I don't know what error reports you're talking about, or from when, and am too lazy to go hunting. Please report 'em, and they'll be fixed.
-- Greg From gbnewby at pglaf.org Mon Mar 13 22:19:29 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Mar 13 22:19:31 2006 Subject: [gutvol-d] PG#17945 Mark Twain: Tri Noveloj In-Reply-To: <Pine.LNX.4.63.0603131357260.7770@durendal.durendal.org> References: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca> <Pine.LNX.4.63.0603131357260.7770@durendal.durendal.org> Message-ID: <20060314061929.GE19944@pglaf.org> On Mon, Mar 13, 2006 at 01:58:31PM -0500, Greg Weeks wrote: > On Mon, 13 Mar 2006, Andrew Sly wrote: > > >However, this text contains the prominent statement "Copyright 1999", > >copied from the original printed book. Am I right in thinking this > >could lead to confusion? > > This seems to be pretty common in the rule 6 stuff I've been clearing. I > think it's confusing to have an incorrect copyright statement in the PG > version even when the original book had the statement. I agree with you, Greg. We had some discussion about this among the whitewashers team, and I know it's come up in some DP forum discussions. The PG policy is that it's the producer's choice whether to leave such info in. (We used to have a policy against including a transcription of the full title/verso page due to concerns about trademarked publishers' names, but we've since received legal advice this is not a major concern.) Personally, I'd most likely opt to add a note somewhere where an outdated copyright statement appears, reaffirming the public domain status of the PG eBook. 
-- Greg From sly at victoria.tc.ca Mon Mar 13 22:42:32 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon Mar 13 22:42:35 2006 Subject: [gutvol-d] PG#17945 Mark Twain: Tri Noveloj In-Reply-To: <20060314061929.GE19944@pglaf.org> References: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca> <Pine.LNX.4.63.0603131357260.7770@durendal.durendal.org> <20060314061929.GE19944@pglaf.org> Message-ID: <Pine.GSO.4.58.0603132236450.12957@vtn1.victoria.tc.ca> On Mon, 13 Mar 2006, Greg Newby wrote: > The PG policy is that it's the producer's choice whether > to leave such info in. (We used to have a policy against > including a transcription of the full title/verso page due > to concerns about trademarked publishers' names, but we've > since received legal advice this is not a major concern.) > > Personally, I'd most likely opt to add a note somewhere > where an outdated copyright statement appears, reaffirming > the public domain status of the PG eBook. > -- Greg > That seems like decent reasoning. This is the kind of information where some people will complain if you put it in, and some will complain if you leave it out. I don't have a problem with it being left in, but I think it is a good idea to have it identified in some way as information from the source text, which is not applicable to the PG transcription. Andrew From jeroen.mailinglist at bohol.ph Tue Mar 14 13:28:02 2006 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Tue Mar 14 13:47:58 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <20060314061441.GD19944@pglaf.org> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> Message-ID: <44173562.4060705@bohol.ph> Hi All, I am studying the options for preparing ebooks for text-to-speech. Does anybody have experience with that and is willing to share it? I am looking at things like SSML, aural-CSS, and text-to-speech software. Any software that can support this?
My intention is to add the relevant tags to my TEI master, generate SSML from that, and feed that to TTS software to obtain audio files (ideally, I would only post the SSML and let people regenerate the speech when needed). Can anyone recommend tools for this? Things to consider are additional tags to disambiguate words with identical spelling (read and read; record and record, for example), and to help pronounce dates, currency amounts, measures, abbreviations, etc. Issues I have found are the lack of support for things like aural CSS, expensive software, etc. Jeroen. From sly at victoria.tc.ca Tue Mar 14 13:55:58 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Mar 14 13:56:03 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <44173562.4060705@bohol.ph> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> Message-ID: <Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca> In case you have not seen it yet, I'd suggest taking a look at DAISY: http://www.daisy.org/ Andrew On Tue, 14 Mar 2006, Jeroen Hellingman (Mailing List Account) wrote: > Hi All, > > I am studying the options for preparing ebooks for text-to-speech. Does > anybody have experience with that and willing to share experience. > > I am looking at things like SSML, aural-CSS, and text-to-speech > software. Any software that can support this? My intention is to add the > relevant tags to my TEI master, and generate SSML from that, feed that > to TTS software to obtain audio files (Ideally, I would only post the > SSML, and let people regenerate the speech when needed). Any tools that > can be advised? > > Things to consider are additional tags to disambiguate words with > identical spelling (read and read; record and record, for example), and > to help pronouncing dates, currency amounts, measures, abbreviations, etc. > > Issues I found is lack of support for things like aural CSS, expensive > software, etc. > > Jeroen.
> From hart at pglaf.org Tue Mar 14 14:19:22 2006 From: hart at pglaf.org (Michael Hart) Date: Tue Mar 14 14:19:23 2006 Subject: [gutvol-d] periodic update In-Reply-To: <21e.9cf2f13.31479fc3@aol.com> References: <21e.9cf2f13.31479fc3@aol.com> Message-ID: <Pine.LNX.4.60.0603141416151.4132@pglaf.org> The buck stops here. Send error messages to me, and then keep after me to make sure they are followed up on. You can also report errors directly to: bugs@pglaf.org or send updated files to errata@pglaf.org but please continue to also send directly to me. Please resend to me if you don't hear from them within a few days. This is from an FAQ that I include in replies to error messages. Thanks! Michael S. Hart <hart@pobox.com> Project Gutenberg "*Ask Dr. Internet*" Executive Coordinator "*Internet User ~#100*" On Mon, 13 Mar 2006 Bowerbird@aol.com wrote: > joshua said: >> why should anyone else do your work for you? > > it's not my job to send error-reports to a specific place. > or indeed, even to prepare the things in the first place... > > and neither is it your job to forward error-reports on. > or anyone else's job either, for that matter. granted... > > so i don't think anyone can be assessed "blame" here. > > i'm just curious about how long an error-report > will be left laying around before _someone_ acts > on it, even though it's _not_ their "job" to do so... 
> > -bowerbird > From hart at pglaf.org Tue Mar 14 14:36:43 2006 From: hart at pglaf.org (Michael Hart) Date: Tue Mar 14 14:36:44 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <4415DEC2.7040201@perathoner.de> References: <244.8584560.31447b41@aol.com> <C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de> <Pine.LNX.4.60.0603131222570.3961@pglaf.org> <4415DEC2.7040201@perathoner.de> Message-ID: <Pine.LNX.4.60.0603141432270.4132@pglaf.org> On Mon, 13 Mar 2006, Marcello Perathoner wrote: > Michael Hart wrote: > >> _I_ think MT will become one of the most MAJOR topics of PG, >> and that we should do all we can to stay on top of MT items. > > I think robots will become the major producers of ebooks for PG and thus we > should stay ahead of robot technology. So far this is a bit too generic a comment to be taken seriously. Then again, I am not sure you MEANT it to be taken seriously. However, send it every decade, and you'll probably be taken more seriously each time. > Being a visionary is easy if you have generic enough visions and you don't > commit to a timeline. Only for the visionaries who do not insist on accomplishing their goals. I wonder how long it will be before we see other people out there committed to a lifetime career dedicated to the advancement of eBooks? > > > > Give the world visions in 2006!!! > > > -- > Marcello Perathoner > webmaster@gutenberg.org > From Bowerbird at aol.com Tue Mar 14 14:46:32 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Mar 14 14:46:39 2006 Subject: [gutvol-d] re: google and the translation thing Message-ID: <77.5751cfea.3148a1c8@aol.com> From hart at pglaf.org Tue Mar 14 14:58:56 2006 From: hart at pglaf.org (Michael Hart) Date: Tue Mar 14 14:58:57 2006 Subject: [gutvol-d] re: google and the translation thing In-Reply-To: <77.5751cfea.3148a1c8@aol.com> References: <77.5751cfea.3148a1c8@aol.com> Message-ID: <Pine.LNX.4.60.0603141455030.4132@pglaf.org> The truth is that I do NOT think it will be all that long. However, I have been rather disappointed to see many major players, at least they SAID they were major players, in the eBook world simply vanish. What happened to: Lou Burnard, Oxford Text Archive Bob Hollander, Princeton/Rutgers? Michael Seaman?, U Virginia Not to mention all those people posing for the cameras when the big Google announcement hit? However, I think it is more likely, not less, that someone else should come along to replace me in time. ;) On Tue, 14 Mar 2006 Bowerbird@aol.com wrote: > michael said: >> I wonder how long it will be before we see >> other people out there committed to a lifetime >> career dedicated to the advancement of eBooks? > > i think you're unique, michael, and always will be. > > that's not to negate the _serious_ commitment of > time and energy and love that _many_ people are > donating to the cause, such as juliet at d.p. among > others there, or nicholas hodson, or david harrada, > or the whitewashers, and a whole slew of us others. > > but none have been as instrumental as you... > > -bowerbird > From jeroen.mailinglist at bohol.ph Tue Mar 14 15:07:37 2006 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Tue Mar 14 15:02:46 2006 Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> <Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca> Message-ID: <44174CB9.7040807@bohol.ph> That is certainly interesting, but it seems not to be oriented towards computerized text-to-speech. What I am primarily looking at are methods to read books aloud automatically, and markup that assists in doing so. Daisy appears to work from audio files, probably spoken by humans. Note that the tagging I have in mind may also help human readers produce a spoken book. Jeroen. Andrew Sly wrote: >In case you have not seen it yet, I'd suggest taking a >look at DAISY: http://www.daisy.org/ > >Andrew > > > From grythumn at gmail.com Tue Mar 14 16:35:16 2006 From: grythumn at gmail.com (Robert Cicconetti) Date: Tue Mar 14 16:41:47 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <44173562.4060705@bohol.ph> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> Message-ID: <15cfa2a50603141635k610ab610s950b4c67b3467a1@mail.gmail.com> I use the AT&T Natural Voice engine for most of my general fiction* conversion: fairly resource intensive, but one of the better sounding voices. I keep a list of standard substitutions as I notice them. The engine does poorly on abbreviations and foreign loan words, and of course on heteronyms (lead, axes, alternate, etc.). You can specify alternate pronunciations in a phonetic language. Concatenative engines like Natural Voices, Cepstral, Neospeech and RealSpeak are limited in how much you can alter speed and timbre before they become unusable; NV tends to clip syllables at anything above roughly +1 or +2. Most of these engines are available via Nextup and other online retailers.
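The SSML route Jeroen describes, with heteronyms and dates tagged so the engine can pick the right reading, might look roughly like the minimal Python sketch below. It is an editorial illustration, not a tool anyone in this thread uses: the (text, hint) token format, the `PRONUNCIATIONS` table with its IPA strings, and `build_ssml` are all hypothetical, and it assumes an engine that honours the W3C SSML 1.0 `<phoneme>` and `<say-as>` elements (which `interpret-as`/`format` values an engine accepts varies).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical lookup table: (word, sense) -> IPA transcription of the intended reading.
# In a real pipeline the sense would come from a tag in the TEI master.
PRONUNCIATIONS = {
    ("read", "present"): "riːd",
    ("read", "past"): "rɛd",
    ("lead", "verb"): "liːd",
    ("lead", "metal"): "lɛd",
}


def build_ssml(tokens, lang="en-GB"):
    """Turn (text, hint) tokens into an SSML 1.0 <speak> document.

    hint is None for plain text, ("sense", name) to force a pronunciation
    from PRONUNCIATIONS, or ("date", fmt) to wrap the text in <say-as>.
    """
    parts = []
    for text, hint in tokens:
        if hint is None:
            parts.append(escape(text))
        elif hint[0] == "sense":
            ipa = PRONUNCIATIONS[(text.lower(), hint[1])]
            parts.append(f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>')
        elif hint[0] == "date":
            parts.append(f'<say-as interpret-as="date" format={quoteattr(hint[1])}>{escape(text)}</say-as>')
        else:
            raise ValueError(f"unknown hint: {hint!r}")
    document = (f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
                + "".join(parts) + "</speak>")
    ET.fromstring(document)  # raises ParseError if the generated markup is not well-formed
    return document


EXAMPLE_TOKENS = [
    ("She had ", None),
    ("read", ("sense", "past")),
    (" the letter on ", None),
    ("14/03/2006", ("date", "dmy")),
    (".", None),
]

if __name__ == "__main__":
    print(build_ssml(EXAMPLE_TOKENS))
```

Keeping the pronunciation choice in the master and generating SSML from it fits the idea of posting only the SSML and letting each reader regenerate the audio with whatever engine they have.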
Freeware engines such as Festival tend to have somewhat lower out-of-the-box quality, but are more flexible (at least if you can tolerate LISP). In particular, with a synthesized TTS engine you can turn up the speech speed much further before it becomes unintelligible, though understanding it sometimes requires practice. Synthesized speech compresses quite well with voice codecs; if I'm not using an external MP3 player, I'll compress it with Speex at quality 4 or 5. R C *(I generate audiobooks from Webscriptions and Gutenberg for my commute and other relative downtime.) On 3/14/06, Jeroen Hellingman (Mailing List Account) <jeroen.mailinglist@bohol.ph> wrote: > > Hi All, > > I am studying the options for preparing ebooks for text-to-speech. Does > anybody have experience with that and willing to share experience. > > I am looking at things like SSML, aural-CSS, and text-to-speech > software. Any software that can support this? My intention is to add the > relevant tags to my TEI master, and generate SSML from that, feed that > to TTS software to obtain audio files (Ideally, I would only post the > SSML, and let people regenerate the speech when needed). Any tools that > can be advised? > > Things to consider are additional tags to disambiguate words with > identical spelling (read and read; record and record, for example), and > to help pronouncing dates, currency amounts, measures, abbreviations, etc. > > Issues I found is lack of support for things like aural CSS, expensive > software, etc. > > Jeroen. From Bowerbird at aol.com Tue Mar 14 17:07:32 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Mar 14 17:07:39 2006 Subject: [gutvol-d] re: google and the translation thing Message-ID: <284.7380636.3148c2d4@aol.com> michael said: > However, I think it is more likely, not less, that > someone else should come along to replace me in time. at this point, with cyberspace well-established, and the sheer logic of electronic books (and all manner of other types of digital content) recognized by all, any people who "come along" are stepping into a _completely_ different river. so, um, no, they won't be "replacing" you... good thing, too, because i think one of you is quite enough, thank you very much... :+) -bowerbird From hyphen at hyphenologist.co.uk Tue Mar 14 23:44:24 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Mar 14 23:44:37 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <44173562.4060705@bohol.ph> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> Message-ID: <g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com> On Tue, 14 Mar 2006 22:28:02 +0100, "Jeroen Hellingman (Mailing List Account)" <jeroen.mailinglist@bohol.ph> wrote: |Hi All, | |I am studying the options for preparing ebooks for text-to-speech. Does |anybody have experience with that and willing to share experience. | |I am looking at things like SSML, aural-CSS, and text-to-speech |software. Any software that can support this?
My intention is to add the |relevant tags to my TEI master, and generate SSML from that, feed that |to TTS software to obtain audio files (Ideally, I would only post the |SSML, and let people regenerate the speech when needed). Any tools that |can be advised? | |Things to consider are additional tags to disambiguate words with |identical spelling (read and read; record and record, for example), and |to help pronouncing dates, currency amounts, measures, abbreviations, etc. Not to mention the different forms of "English": American, Queen's English, Indian English, Strine, to name but a few. An American voice would sound terrible after a whole book. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. From tb at baechler.net Tue Mar 14 23:46:54 2006 From: tb at baechler.net (Tony Baechler) Date: Tue Mar 14 23:46:22 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> <Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca> Message-ID: <7.0.1.0.2.20060314234320.02b24960@baechler.net> Hi, That format is primarily for the blind. The format doesn't really do what is wanted. You can have either XML files, which can be read by a screen reader and special software, or MP3 recordings, or both, but the speech isn't generated any differently than for plain text files. In fact, I download about 100 DAISY books per month and I always convert to plain text. The DAISY software I've used is better about handling new pages and navigation than plain text, but the speech output is still the same. There is no way to do custom pronunciations or anything similar that I'm aware of.
Also, that format is specifically designed for the blind and I doubt there is much mainstream support for it. Tools to convert tend to be expensive from what I've seen. At 01:55 PM 3/14/2006, you wrote: >In case you have not seen it yet, I'd suggest taking a >look at DAISY: http://www.daisy.org/ > >Andrew > >On Tue, 14 Mar 2006, Jeroen Hellingman (Mailing List Account) wrote: > > > Hi All, > > > > I am studying the options for preparing ebooks for text-to-speech. Does > > anybody have experience with that and willing to share experience. > > > > I am looking at things like SSML, aural-CSS, and text-to-speech > > software. Any software that can support this? My intention is to add the > > relevant tags to my TEI master, and generate SSML from that, feed that > > to TTS software to obtain audio files (Ideally, I would only post the > > SSML, and let people regenerate the speech when needed). Any tools that > > can be advised? > > > > Things to consider are additional tags to disambiguate words with > > identical spelling (read and read; record and record, for example), and > > to help pronouncing dates, currency amounts, measures, abbreviations, etc. > > > > Issues I found is lack of support for things like aural CSS, expensive > > software, etc. From Bowerbird at aol.com Wed Mar 15 16:52:30 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Mar 15 16:52:40 2006 Subject: [gutvol-d] well, this is interesting Message-ID: <2ea.34d74f4.314a10ce@aol.com> evidently, distributed proofreaders is "branching out" from project gutenberg: > http://www.solutiongrove.com/kmw/ctn/files/view/LINCTbiz_planKK_1.23.06.html it says: > Distributed Proofreaders, a well-regarded group of volunteers, will provide > public domain books. LibraryCity will contribute resources for DP to expand. sounds cozy... librarycity, one of the main organizations involved in this plan, has as its director david rothman, and i'm sure that jon noring is involved with it somehow as well... 
they're looking for "sponsors", suggesting "an annual fee of $1000", or even all the way up to $350,000, which buys you a "thank you" from within the browser of the one million of their clients you've sponsored... and this: > LibraryCity's revenue will come from several sources. > It will partner with an Internet bookstore to > obtain large numbers of e-books from publishers > and to offer electronic books to libraries. > The model will be a mix of purchase, short-term rentals > and subscription fees. When libraries do not carry books, > patrons will have an opportunity to rent or purchase them > through the existing store and through the retail arm of > LibraryCity called BookTry.com. it continues: > LibraryCity and BookTry.com will help popularize the > OpenReader format and interactive software that OSoft will offer. > In return and also out of public-spiritedness, OSoft will agree > to donate a certain percentage of its earnings and/or revenue > to the Epie Institute, the 501(c)(3) pass-through, > for use with LibraryCity and other partners within LINCT. and then: > The above efficiencies and close relationship with OSoft, > a provider of interactive software that allows comments and > even blogs to be embedded within specific locations in books, > will enable LibraryCity to be more competitive against > such library-related companies as OverDrive.com. like the subject says, "interesting..." -bowerbird
From hyphen at hyphenologist.co.uk Thu Mar 16 00:04:24 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Mar 16 00:04:37 2006 Subject: [gutvol-d] well, this is interesting In-Reply-To: <2ea.34d74f4.314a10ce@aol.com> References: <2ea.34d74f4.314a10ce@aol.com> Message-ID: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com> On Wed, 15 Mar 2006 19:52:30 EST, Bowerbird@aol.com wrote: |evidently, distributed proofreaders is "branching out" from project |gutenberg: |> |http://www.solutiongrove.com/kmw/ctn/files/view/LINCTbiz_planKK_1.23.06.html | |it says: |> Distributed Proofreaders, a well-regarded group of volunteers, will |provide |> public domain books. LibraryCity will contribute resources for DP to |expand. | |sounds cozy... | |librarycity, one of the main organizations involved in this plan, has as its |director |david rothman, and i'm sure that jon noring is involved with it somehow as |well... | |they're looking for "sponsors", suggesting "an annual fee of $1000", |or even all the way up to $350,000, which buys you a "thank you" from |within the browser of the one million of their clients you've sponsored... | |and this: |> LibraryCity's revenue will come from several sources. |> It will partner with an Internet bookstore to |> obtain large numbers of e-books from publishers |> and to offer electronic books to libraries. |> The model will be a mix of purchase, short-term rentals |> and subscription fees. When libraries do not carry books, |> patrons will have an opportunity to rent or purchase them |> through the existing store and through the retail arm of |> LibraryCity called BookTry.com. | |it continues: |> LibraryCity and BookTry.com will help popularize the |> OpenReader format and interactive software that OSoft will offer.
|> In return and also out of public-spiritedness, OSoft will agree |> to donate a certain percentage of its earnings and/or revenue |> to the Epie Institute, the 501(c)(3) pass-through, |> for use with LibraryCity and other partners within LINCT. | |and then: |> The above efficiencies and close relationship with OSoft, |> a provider of interactive software that allows comments and |> even blogs to be embedded within specific locations in books, |> will enable LibraryCity to be more competitive against |> such library-related companies as OverDrive.com. | |like the subject says, "interesting..." *If* this happens, I wonder how many volunteers DP will lose. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. From davedoty at hotmail.com Thu Mar 16 00:18:55 2006 From: davedoty at hotmail.com (Dave Doty) Date: Thu Mar 16 00:36:00 2006 Subject: [gutvol-d] well, this is interesting In-Reply-To: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com> Message-ID: <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl> From: Dave Fawthrop <hyphen@hyphenologist.co.uk> >*If* this happens I wonder how many volunteers DP will lose. Why would they lose any? They give DP resources to expand, and use the books. Since they are already free to use the books, the only thing that would change is more financial resources for DP. It didn't say anything about exclusive use, and even if they tried, they admitted right there on the webpage that the books are public domain, so they wouldn't be able to keep PG or anyone else from using them.
Dave Doty From hyphen at hyphenologist.co.uk Thu Mar 16 00:40:23 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Mar 16 00:40:34 2006 Subject: [gutvol-d] well, this is interesting In-Reply-To: <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl> References: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com> <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl> Message-ID: <us8i12le0oenfvkdmgga78qr9j4rm32qhv@4ax.com> On Thu, 16 Mar 2006 08:18:55 +0000, "Dave Doty" <davedoty@hotmail.com> wrote: |From: Dave Fawthrop <hyphen@hyphenologist.co.uk> | |>*If* this happens I wonder how many volunteers DP will lose. | |Why would they lose any? Because I do books for PG, Pro Bono Publico. Any dilution of this principle by association with commercial organisations would concern me. -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights. From schultzk at uni-trier.de Thu Mar 16 04:38:43 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Thu Mar 16 04:38:54 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> <g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com> Message-ID: <95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de> Hi There, If you have a Mac, it will read it for you. You can also customize the dictionary. There is also a programming interface if you really want high-quality output; you can even create your own voices. I personally have not played with it. It has been around for a long time. Keith. On 15.03.2006 at 08:44, Dave Fawthrop wrote: > On Tue, 14 Mar 2006 22:28:02 +0100, "Jeroen Hellingman (Mailing List > Account)" <jeroen.mailinglist@bohol.ph> wrote: > > |Hi All, > | > |I am studying the options for preparing ebooks for text-to-speech.
> Does > |anybody have experience with that and willing to share experience. > | > |I am looking at things like SSML, aural-CSS, and text-to-speech > |software. Any software that can support this? My intention is to > add the > |relevant tags to my TEI master, and generate SSML from that, feed > that > |to TTS software to obtain audio files (Ideally, I would only post the > |SSML, and let people regenerate the speech when needed). Any tools > that > |can be advised? > | > |Things to consider are additional tags to disambiguate words with > |identical spelling (read and read; record and record, for > example), and > |to help pronouncing dates, currency amounts, measures, > abbreviations, etc. > > Not to mention the different forms of "English": American, Queens > English, > Indian English, Strine, to mention but a few. An American voice would > sound terrible after a whole book. > -- > Dave Fawthrop <dave hyphenologist co uk> > Freedom of Speech, Expression, Religion, and Democracy are > the keys to Civilization, together with legal acceptance of > Fundamental Human rights. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From joshua at hutchinson.net Thu Mar 16 05:59:56 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Mar 16 05:59:23 2006 Subject: [gutvol-d] well, this is interesting Message-ID: <20060316135956.D0C8F2F998@ws6-3.us4.outblaze.com> Not to be a smart-ass ... but you better stop now, Dave. Commercial publishers snarf PG stuff all the time. I bought a lovely two volume set of all the OZ books for my son last year. As we were reading it, I noticed some typos and such. On a hunch, I compared the typos to our files. They are snarfed PG text (and didn't even proof it again) and stripped the PG notices and printing a book. Personally, I don't have a problem with commercial interests using PG/DP stuff.
As long as they don't try to claim an additional copyright (which they sometimes do) or leave the PG trademark in place and not pay us (which I've never actually seen). Josh > ----- Original Message ----- > From: "Dave Fawthrop" <hyphen@hyphenologist.co.uk> > To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org> > Subject: Re: [gutvol-d] well, this is interesting > Date: Thu, 16 Mar 2006 08:40:23 +0000 > > > On Thu, 16 Mar 2006 08:18:55 +0000, "Dave Doty" <davedoty@hotmail.com> > wrote: > > |From: Dave Fawthrop <hyphen@hyphenologist.co.uk> > | > |>*If* this happens I wonder how many volunteers DP will lose. > | > |Why would they lose any? > > Because I do books for PG, Pro Bono Publico. > Any dilution of this principle by association with commercial organisations > would concern me. > -- > Dave Fawthrop <dave hyphenologist co uk> > Freedom of Speech, Expression, Religion, and Democracy are > the keys to Civilization, together with legal acceptance of > Fundamental Human rights. From holden.mcgroin at dsl.pipex.com Thu Mar 16 07:57:11 2006 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Thu Mar 16 07:57:16 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> <1141986255.20173.15.camel@steve-mcqueen> <264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de> Message-ID: <1142524632.14007.34.camel@steve-mcqueen> Hi! On Fri, 2006-03-10 at 12:33 +0100, Keith J. Schultz wrote: > Hello, > > On 10.03.2006 at 11:24, Holden McGroin wrote: > > > On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote: > >> text. Today, dictionaries are used to guess which words are > >> to be recognised.
That is why the OCR systems today give us > >> better results if the original has DECENT quality!!! > > > >> The pattern recognition systems have not gotten better and > >> the dictionary trick takes the motivation away to > >> develop better OCR algorithms. > > > > I'm going to have to call bullshit here. As a researcher working in > > the > > field of document recognition, I've noticed tremendous improvements in > > OCR quality even just in the past five years. > Before you start to swear, read and understand! Maybe in the > development labs, but not for the non-high end user!!!! OCR results are improving across the board. One only has to compare Finereader 8, a mainstream OCR product, with version 5 or so to see the improvement in standard OCR packages over the last 5 years. Recognition quality improves (where there is room for improvement) and so does the range of documents which can be recognised. Each passing year brings improvements in quality for older, noisy and lower quality documents. Again, I stress that this is *real-world* improvement in mainstream OCR products. In your initial post, you stated that the "dictionary trick" takes away the motivation to develop better OCR algorithms. Yet, it is still an extremely active research subject. Perhaps you're not familiar with the research community around OCR but there are many major conferences, workshops and journals devoted entirely or mainly to the task of digitising documents. And of course, where do you think the improvements in mainstream OCR applications come from? Yesterday's innovation in the research lab forms the basis of new features in today's commercial OCR packages. Likewise, the work that's going on now in the lab will improve tomorrow's OCR packages. > We have not seen any improvements in the field for the past five > years!!! The improvements are mainly due to the use of dictionaries!! > Not the improvement of character recognition!! 
Most systems in the > field get their performance out of word recognition !!! Well, that's a nice statement to make since the vast majority of systems in the field are black-box commercial systems. How do you know where the performance comes from? I'm a researcher in the field. I attend conferences and read journals and I don't know much about the internals of ABBYY. Unsurprisingly, it's something they keep under close wraps. So all you really have is the fact that commercial (and research) OCR systems are improving and your unfounded assertion that the improvements are mainly due to dictionaries. > I did mean to say not there is no improvement in Optical > Character Recognition, but the improvment over the past > 10 years is minimal at most. When I see a OCR system that > just uses raw results, then I will bow my head in recognition > of true achieve meant. Furthermore, when the image processing > gets that far it will open up new possiblities in all kinds > of sciences. There are countless tools which can be used to improve OCR performance. Using dictionary lookups is just one tool in the box. OCR is improving using many different techniques. I've been observing improvements in many different areas over the last few years (as long as I've been in the area), including: - Improvements in low-level Image processing techniques - Improvements in feature extraction from characters - Improvements in character recognition based on those features If you don't like dictionary lookups, don't use them. Raw OCR performance is improving in the lab and in the marketplace and is already great for a large proportion of documents. I must apologise on behalf of the research community if you find the rate of progress to be inadequate. That said, if you don't like it, muck in. There are many research labs around the world working on improving OCR and related techniques and I'm sure they'd be glad to have someone as knowledgeable as yourself join. 
There are even a few Free Software / Open Source OCR systems which would gladly welcome any interested developers: Ocrad: http://www.gnu.org/software/ocrad/ocrad.html GOCR/JOCR: http://jocr.sourceforge.net/ ClaraOCR: http://www.geocities.com/claraocr/ Cheers, Holden From Bowerbird at aol.com Thu Mar 16 11:15:24 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Mar 16 11:15:36 2006 Subject: [gutvol-d] well, this is interesting Message-ID: <2da.47e050a.314b134c@aol.com> dave said: > *If* this happens I wonder how many volunteers DP will lose. well, i would guess an agreement has _already_ been forged, or else the d.p. name wouldn't have been used. but i don't see that it should cause the loss of many volunteers, because the aim remains to get e-texts out there. after all, anyone can take the e-texts from project gutenberg and do _whatever_they_want_ with them, just as long as they don't use the name of project gutenberg, right? that's what the public domain is all about. and yes, it is kind of sleazy to use public-domain content as the "free samples" to bring people through the door for your commercial content, but what would we do about that? hell, some of these guys sell the public-domain stuff as well! (and if they can sell what we give away, more power to 'em.) i think i've made it perfectly clear that i'm not a fan of either rothman or noring, and i have long observed they have been trying to get their mitts on project gutenberg, so this doesn't surprise me from their end, but it's curious d.p. played along. -bowerbird
From marcello at perathoner.de Thu Mar 16 11:57:50 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Mar 16 11:57:57 2006 Subject: [gutvol-d] well, this is interesting In-Reply-To: <2ea.34d74f4.314a10ce@aol.com> References: <2ea.34d74f4.314a10ce@aol.com> Message-ID: <4419C33E.6020107@perathoner.de> Bowerbird@aol.com wrote: >> Distributed Proofreaders, a well-regarded group of volunteers, will >> provide public domain books. LibraryCity will contribute resources >> for DP to expand. LibraryCity / OSoft / openreader.org is just a bunch of self-referential entities trying to give some credibility to each other. Like google-spammers build link-farms these people are building organisation-farms. Nothing much to bother about. Noring's results so far are failure-to-failure comparable to yours. > they're looking for "sponsors", suggesting "an annual fee of $1000", > or even all the way up to $350,000, which buys you a "thank you" from > within the browser of the one million of their clients you've > sponsored... This is worth a new thread ... -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Thu Mar 16 12:40:23 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Mar 16 12:40:30 2006 Subject: [gutvol-d] Sponsors Message-ID: <4419CD37.2000505@perathoner.de> From a different thread: >> they're looking for "sponsors", suggesting "an annual fee of $1000", >> or even all the way up to $350,000, which buys you a "thank you" from >> within the browser of the one million of their clients you've >> sponsored... I don't know about their millions of clients but the PG website is now ranked top 3000 at alexa.com and serving ~250K pages to ~50K hosts a day. We have a Google page-rank of 8. To get that spammers would feed their mothers to the Ravenous Bugblatter Beast of Traal.
I'm thinking of text-only ads, no distracting images. We could cycle ads like this: Did you know that you can help producing ebooks investing just ten minutes a day? www.pgdp.net Sponsor PG and get your web site mentioned here. See: www.gutenberg.org/fundraising/sponsoring We thank the Curl Up and Dye hair parlor for their kind gift of $1000. www.curl-up-and-dye.com Do we want to do this? And what rules should we put in place? Is selling ads compatible with the non-for-profit status? Anybody out there with an internet marketing background to figure out what we could "charge" for this ad space? -- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Thu Mar 16 12:53:23 2006 From: hart at pglaf.org (Michael Hart) Date: Thu Mar 16 12:53:25 2006 Subject: [gutvol-d] OCR Trends, and Not: was Google Translation In-Reply-To: <1142524632.14007.34.camel@steve-mcqueen> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> <1141986255.20173.15.camel@steve-mcqueen> <264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de> <1142524632.14007.34.camel@steve-mcqueen> Message-ID: <Pine.LNX.4.60.0603161231580.20273@pglaf.org> The following messages give widely opposing points of view. The reason could be, as one stated, that the bottom of the line scanner and OCR combinations are not yet good enough, at least for that person's particular needs. My own observation is that it might simply be the wrong tool for the wrong application. We all see more and more features in calculators that are under $100, and even under $10, to the point where no one is really going to say the TI-84 has no improvements over the previous versions, even if you get it for $60, as the price was where I saw it last. 
If your applications are simple four function arithmetic, there isn't much point in comparing any new calculators-- they will all do what you want, and the hardware may be a more important aspect than the software. . . how long the calculator and/or the batteries will last, etc. To those who really need a supercomputer, no difference. The same is most likely true of scanners and OCR combos-- some improvements may not apply to what YOU are doing and others may be totally beyond any applications you have a mind to be using. The same is true for all those different kinds of cheaper calculators out there. It sounds a little as if one person in this conversation, I didn't keep track of various portions and names, was an example of the person who says it does not matter at all, because none of them create perfect results. To this kind of person it doesn't matter how full a glass is getting, until that very last drop is added, then that glass becomes full, otherwise it is empty. The exact same thing has been said here and there via the error rate for eBooks. If a certain element of perfection is missing, then ebook value remains zero even though the paper book has errors. By the way, I saw what appeared to be a perfect scan/OCR, at least 10 years ago, perhaps 15, on the original Apple Flatbed scanner. I forget the model and the OCR, but the demonstration certainly made me wake up to OCR more and I eventually talked Apple into giving me a Mac and scanner. Thanks Apple!!! Thanks Steve Cisler!!! More to the point about the current topic is what a user wants out of the hardware/software combination. If you don't do your homework when buying these, you are not likely to get what you want. However, and I stress this, the people in these messages are VERY likely, given their positions, to find salesmen and saleswomen who would be MORE than happy to show your people their products and answer questions. Just contact them. .
.your report of their demonstration will multiply the effect of their work! This would probably be of great interest to us all. I wonder if the next time we have some kind of meeting-- should we invite some demonstrations??? Michael PS On the topic of calculator, I heard that even if it is not your thing to use something like Encarta, that the current version includes a calculator program that may be worth more than the cost of the entire Encarta. Anyone seen it? On Thu, 16 Mar 2006, Holden McGroin wrote: > Hi! > > On Fri, 2006-03-10 at 12:33 +0100, Keith J. Schultz wrote: >> Hello, >> >> On 10.03.2006 at 11:24, Holden McGroin wrote: >> >>> On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote: >>>> text. Today, dictionaries are used to guess which words are >>>> to be recognised. That is why the OCR systems today give us >>>> better results if the original has DECENT quality!!! >>> >>>> The pattern recognition systems have not gotten better and >>>> the dictionary trick takes the motivation away to >>>> develop better OCR algorithms. >>> >>> I'm going to have to call bullshit here. As a researcher working in >>> the >>> field of document recognition, I've noticed tremendous improvements in >>> OCR quality even just in the past five years. >> Before you start to swear, read and understand! Maybe in the >> development labs, but not for the non-high end user!!!! > > OCR results are improving across the board. One only has to compare > Finereader 8, a mainstream OCR product, with version 5 or so to see the > improvement in standard OCR packages over the last 5 years. Recognition > quality improves (where there is room for improvement) and so does the > range of documents which can be recognised. Each passing year brings > improvements in quality for older, noisy and lower quality documents. > Again, I stress that this is *real-world* improvement in mainstream OCR > products.
> > In your initial post, you stated that the "dictionary trick" takes away > the motivation to develop better OCR algorithms. Yet, it is still an > extremely active research subject. Perhaps you're not familiar with the > research community around OCR but there are many major conferences, > workshops and journals devoted entirely or mainly to the task of > digitising documents. > > And of course, where do you think the improvements in mainstream OCR > applications come from? Yesterday's innovation in the research lab forms > the basis of new features in today's commercial OCR packages. Likewise, > the work that's going on now in the lab will improve tomorrow's OCR > packages. > >> We have not seen any improvements in the field for the past five >> years!!! The improvements are mainly due to the use of dictionaries!! >> Not the improvement of character recognition!! Most systems in the >> field get their performance out of word recognition !!! > > Well, that's a nice statement to make since the vast majority of systems > in the field are black-box commercial systems. How do you know where the > performance comes from? I'm a researcher in the field. I attend > conferences and read journals and I don't know much about the internals > of ABBYY. Unsurprisingly, it's something they keep under close wraps. > > So all you really have is the fact that commercial (and research) OCR > systems are improving and your unfounded assertion that the improvements > are mainly due to dictionaries. > >> I did mean to say not there is no improvement in Optical >> Character Recognition, but the improvment over the past >> 10 years is minimal at most. When I see a OCR system that >> just uses raw results, then I will bow my head in recognition >> of true achieve meant. Furthermore, when the image processing >> gets that far it will open up new possiblities in all kinds >> of sciences. > > There are countless tools which can be used to improve OCR performance. 
> Using dictionary lookups is just one tool in the box. OCR is improving > using many different techniques. I've been observing improvements in > many different areas over the last few years (as long as I've been in > the area), including: > > - Improvements in low-level Image processing techniques > - Improvements in feature extraction from characters > - Improvements in character recognition based on those features > > If you don't like dictionary lookups, don't use them. Raw OCR > performance is improving in the lab and in the marketplace and is > already great for a large proportion of documents. I must apologise on > behalf of the research community if you find the rate of progress to be > inadequate. > > That said, if you don't like it, muck in. There are many research labs > around the world working on improving OCR and related techniques and I'm > sure they'd be glad to have someone as knowledgeable as yourself join. > There are even a few Free Software / Open Source OCR systems which would > gladly welcome any interested developers: > > Ocrad: http://www.gnu.org/software/ocrad/ocrad.html > GOCR/JOCR: http://jocr.sourceforge.net/ > ClaraOCR: http://www.geocities.com/claraocr/ > > > Cheers, > Holden From brandon.galbraith at gmail.com Thu Mar 16 13:06:21 2006 From: brandon.galbraith at gmail.com (Brandon Galbraith) Date: Thu Mar 16 13:06:23 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <4419CD37.2000505@perathoner.de> References: <4419CD37.2000505@perathoner.de> Message-ID: <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com> In most cases, you charge based on "impressions", as in how many people see the ad. There are plenty of open source banner ad systems out there that could be modified to fit the need (including a price scale for the amount of ads purchased).
Then again, Google AdSense could just be used =) *ducks* -brandon On 3/16/06, Marcello Perathoner <marcello@perathoner.de> wrote: > > From a different thread: > > >> they're looking for "sponsors", suggesting "an annual fee of $1000", > >> or even all the way up to $350,000, which buys you a "thank you" from > >> within the browser of the one million of their clients you've > >> sponsored... > > I don't know about their millions of clients but the PG website is now > ranked top 3000 at alexa.com and serving ~250K pages to ~50K hosts a > day. We have a Google page-rank of 8. To get that spammers would feed > their mothers to the Ravenous Bugblatter Beast of Traal. > > > We could put an ad space at the top of every page. I'm thinking of > text-only ads, no distracting images. We could cycle ads like this: > > Did you know that you can help producing ebooks investing > just ten minutes a day? www.pgdp.net > > Sponsor PG and get your web site mentioned here. > See: www.gutenberg.org/fundraising/sponsoring > > We thank the Curl Up and Dye hair parlor for their > kind gift of $1000. www.curl-up-and-dye.com > > > Do we want to do this? And what rules should we put in place? > > Is selling ads compatible with the non-for-profit status? > > Anybody out there with an internet marketing background to figure out > what we could "charge" for this ad space? > > > -- > Marcello Perathoner > webmaster@gutenberg.org > -- Brandon Galbraith Email: brandon.galbraith@gmail.com AIM: brandong00 Voice: 630.400.6992
From hart at pglaf.org Thu Mar 16 13:19:26 2006 From: hart at pglaf.org (Michael Hart) Date: Thu Mar 16 13:19:28 2006 Subject: [gutvol-d] well, this is interesting In-Reply-To: <20060316135956.D0C8F2F998@ws6-3.us4.outblaze.com> References: <20060316135956.D0C8F2F998@ws6-3.us4.outblaze.com> Message-ID: <Pine.LNX.4.60.0603161319010.20273@pglaf.org> John, wanna tell us the brand name of the Oz set, how much it was, etc? mh On Thu, 16 Mar 2006, Joshua Hutchinson wrote: > Not to be a smart-ass ... but you better stop now, Dave. Commercial publishers snarf PG stuff all the time. > > I bought a lovely two volume set of all the OZ books for my son last year. As we were reading it, I noticed some typos and such. On a hunch, I compared the typos to our files. They are snarfed PG text (and didn't even proof it again) and stripped the PG notices and printing a book. > > Personally, I don't have a problem with commercial interests using PG/DP stuff. As long as they don't try to claim an additional copyright (which they sometimes do) or leave the PG trademark in place and not pay us (which I've never actually seen). > > Josh > >> ----- Original Message ----- >> From: "Dave Fawthrop" <hyphen@hyphenologist.co.uk> >> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org> >> Subject: Re: [gutvol-d] well, this is interesting >> Date: Thu, 16 Mar 2006 08:40:23 +0000 >> >> >> On Thu, 16 Mar 2006 08:18:55 +0000, "Dave Doty" <davedoty@hotmail.com> >> wrote: >> >> |From: Dave Fawthrop <hyphen@hyphenologist.co.uk> >> | >> |>*If* this happens I wonder how many volunteers DP will lose. >> | >> |Why would they lose any? >> >> Because I do books for PG, Pro Bono Publico. >> Any dilution of this principle by association with commercial organisations >> would concern me.
>> -- >> Dave Fawthrop <dave hyphenologist co uk> >> Freedom of Speech, Expression, Religion, and Democracy are >> the keys to Civilization, together with legal acceptance of >> Fundamental Human rights. From hyphen at hyphenologist.co.uk Thu Mar 16 14:08:19 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Mar 16 14:08:31 2006 Subject: [gutvol-d] well, this is interesting In-Reply-To: <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl> References: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com> <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl> Message-ID: <78oj129bgqglu075e12lg9c3e5fbim9ufv@4ax.com> On Thu, 16 Mar 2006 08:18:55 +0000, "Dave Doty" <davedoty@hotmail.com> wrote: |From: Dave Fawthrop <hyphen@hyphenologist.co.uk> | |>*If* this happens I wonder how many volunteers DP will lose. | |Why would they lose any? They give DP resources to expand, and use the |books. Since they are already free to use the books, the only thing that |would change is more financial resources for DP. It didn't say anything |about exclusive use and even if they tried, well they admitted right there |on the webpage that the books are public domain, so they wouldn't be able to |keep PG or anyone else from using them. In the UK "He who pays the Piper calls the tune." Does this not happen in the USA? -- Dave Fawthrop <dave hyphenologist co uk> Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights.
From Bowerbird at aol.com Thu Mar 16 14:10:45 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Mar 16 14:11:02 2006 Subject: [gutvol-d] Sponsors Message-ID: <302.c71630.314b3c65@aol.com> marcello said: > Do we want to do this? no! -bowerbird From donovan at abs.net Thu Mar 16 15:38:56 2006 From: donovan at abs.net (D Garcia) Date: Thu Mar 16 15:39:08 2006 Subject: [gutvol-d] OCR Trends, and Not: was Google Translation In-Reply-To: <Pine.LNX.4.60.0603161231580.20273@pglaf.org> References: <260.82f14a6.31414432@aol.com> <1142524632.14007.34.camel@steve-mcqueen> <Pine.LNX.4.60.0603161231580.20273@pglaf.org> Message-ID: <200603161838.57005.donovan@abs.net> On Thursday 16 March 2006 03:53 pm, Michael Hart wrote: (a lot of things, but I wanted to keep the thread separated. Hi, Michael!) > On Thu, 16 Mar 2006, Holden wrote: > > There are even a few Free Software / Open Source OCR systems which would > > gladly welcome any interested developers: > > > > Ocrad: http://www.gnu.org/software/ocrad/ocrad.html > > GOCR/JOCR: http://jocr.sourceforge.net/ > > ClaraOCR: http://www.geocities.com/claraocr/ I'm not a researcher in the field, but I have mucked in on ocrad (which is a single developer project), and managed to get two minor patches accepted. Frankly, though, each of those packages uses very different approaches and native internal formats, and mostly rely on simpler models to recognize characters. ocrad almost exclusively depends on feature recognition in b/w and has a very simplistic confidence model. The others I don't recall details of off the top of my head, but I believe one of them was trying to use feature recognition plus same-page similarity modeling. I don't believe any of them use the "dictionary trick" and they all pretty much fail on merged and broken characters.
From black-box observation, FR seems to start with feature recognition, and uses similarity, curve reconstruction, adaptive thresholding, and even outline tracing for comparison/similarity against ttf font curves. I suspect they may also be using digraph and trigraph frequencies (at least for English) to improve their confidence scorings. Probably they also compare same-page word shapes to resolve cases where a character in a bounded word has low confidence value. At any rate, you'd have to be pretty damned dedicated and/or already fairly knowledgeable in several disciplines to contribute significantly to these projects. IMO, the single biggest improvement anyone could offer one of these open source projects is a better way to bound broken and merged characters. Feature recognition does a fairly good job up to that point. From hart at pglaf.org Thu Mar 16 21:12:25 2006 From: hart at pglaf.org (Michael Hart) Date: Thu Mar 16 21:12:27 2006 Subject: [gutvol-d] well, this is interesting In-Reply-To: <78oj129bgqglu075e12lg9c3e5fbim9ufv@4ax.com> References: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com> <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl> <78oj129bgqglu075e12lg9c3e5fbim9ufv@4ax.com> Message-ID: <Pine.LNX.4.60.0603162108420.31395@pglaf.org> On Thu, 16 Mar 2006, Dave Fawthrop wrote: > On Thu, 16 Mar 2006 08:18:55 +0000, "Dave Doty" <davedoty@hotmail.com> > wrote: > > |From: Dave Fawthrop <hyphen@hyphenologist.co.uk> > | > |>*If* this happens I wonder how many volunteers DP will lose. > | > |Why would they lose any? They give DP resources to expand, and use the > |books. Since they are already free to use the books, the only thing that > |would change is more financial resources for DP. It didn't say anything > |about exclusive use and even if they tried, well they admitted right there > |on the webpage that the books are public domain, so they wouldn't be able to > |keep PG or anyone else from using them. > > In the UK "He who pays the Piper calls the tune." 
> Does this not happen in the USA? This is WHY Project Gutenberg has remained independent. This is HOW Project Gutenberg has remained independent. This is why I have been willing to work the last three years without any salary, so PG remains independent. Michael S. Hart Founder Project Gutenberg From schultzk at uni-trier.de Fri Mar 17 00:16:56 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Fri Mar 17 00:17:03 2006 Subject: [gutvol-d] google and the translation thing In-Reply-To: <1142524632.14007.34.camel@steve-mcqueen> References: <260.82f14a6.31414432@aol.com> <Pine.LNX.4.60.0603092055030.32091@pglaf.org> <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de> <1141986255.20173.15.camel@steve-mcqueen> <264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de> <1142524632.14007.34.camel@steve-mcqueen> Message-ID: <759D61C7-5021-4A57-98B2-FB958E55B88E@uni-trier.de> Hi Holden, Thank you for your kind and sober reply. I did not intend to offend the OCR developers or say that there is no improvement. Basically, all commercial products use some kind of "voodoo" for better results. That is their perfect right. As researchers know, money is the motor of efficient progress. Companies want the results yesterday and do not care if the improvements in their product are due to "voodoo" or to improvement in the fundamental technology. I have had to study the technology and decide whether to use it or not. I generally do not, as the results I require in my field take up too many resources for most of my goals. There are cheaper ways of getting things done, resource-wise. OCR would be just one tool that I use and is just the beginning of what I want and need to do. It took me 20 years to own my own scanner, and believe me I did not get it for OCR. Still waiting and willing to wait for the quality I consider adequate. Believe me. I would finance OCR researchers to get 99 % recognition out of the box if I could. I do know how hard it is to get money for research. On a side track here:
Humans do not recognize characters, but words and phrases. That is how we learn to read!!! regards Keith. On 16.03.2006 at 16:57, Holden McGroin wrote: > Hi! > > On Fri, 2006-03-10 at 12:33 +0100, Keith J. Schultz wrote: >> Hello, >> >> On 10.03.2006 at 11:24, Holden McGroin wrote: >> >>> On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote: >>>> text. Today, dictionaries are used to guess which words are >>>> to be recognised. That is why the OCR systems today give us >>>> better results if the original has DECENT quality!!! >>> >>>> The pattern recognition systems have not gotten better and >>>> the dictionary trick takes the motivation away to >>>> develop better OCR algorithms. >>> >>> I'm going to have to call bullshit here. As a researcher working in the >>> field of document recognition, I've noticed tremendous improvements in >>> OCR quality even just in the past five years. >> Before you start to swear, read and understand! Maybe in the >> development labs, but not for the non-high end user!!!! > > OCR results are improving across the board. One only has to compare > Finereader 8, a mainstream OCR product, with version 5 or so to see the > improvement in standard OCR packages over the last 5 years. Recognition > quality improves (where there is room for improvement) and so does the > range of documents which can be recognised. Each passing year brings > improvements in quality for older, noisy and lower quality documents. > Again, I stress that this is *real-world* improvement in mainstream OCR > products. > > In your initial post, you stated that the "dictionary trick" takes away > the motivation to develop better OCR algorithms. Yet, it is still an > extremely active research subject. Perhaps you're not familiar with the > research community around OCR but there are many major conferences, > workshops and journals devoted entirely or mainly to the task of > digitising documents.
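The "dictionary trick" argued over in this thread can be made concrete with a toy example. The Python sketch below is a minimal illustration under stated assumptions, not how FineReader or any other commercial engine works: it uses a tiny hypothetical word list, plain Levenshtein edit distance, and a hypothetical one-edit threshold. It also shows why the broken and merged characters mentioned earlier are hard to repair by lookup alone: misreading "m" as "rn" costs two edits, so "rnodern" is left as-is at the default threshold.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_token(token: str, dictionary: set, max_distance: int = 1) -> str:
    """Keep a token that is already a dictionary word; otherwise replace it
    with the closest word within max_distance edits (hypothetical threshold).
    Corrections come back in lowercase; with no close match, the raw token is kept."""
    word = token.lower()
    if word in dictionary:
        return token
    best, best_dist = None, max_distance + 1
    for candidate in sorted(dictionary):  # sorted: deterministic tie-breaking
        d = edit_distance(word, candidate)
        if d < best_dist:
            best, best_dist = candidate, d
    return best if best is not None else token
```

With words = {"the", "modern", "book", "scanner"}, correct_token("tbe", words) gives "the", while "rnodern" becomes "modern" only with max_distance=2. A looser threshold catches more errors but also risks "correcting" legitimate rare words into common ones.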
> > And of course, where do you think the improvements in mainstream OCR > applications come from? Yesterday's innovation in the research lab forms > the basis of new features in today's commercial OCR packages. Likewise, > the work that's going on now in the lab will improve tomorrow's OCR > packages. > >> We have not seen any improvements in the field for the past five >> years!!! The improvements are mainly due to the use of dictionaries!! >> Not the improvement of character recognition!! Most systems in the >> field get their performance out of word recognition!!! > > Well, that's a nice statement to make since the vast majority of systems > in the field are black-box commercial systems. How do you know where the > performance comes from? I'm a researcher in the field. I attend > conferences and read journals and I don't know much about the internals > of ABBYY. Unsurprisingly, it's something they keep under close wraps. > > So all you really have is the fact that commercial (and research) OCR > systems are improving and your unfounded assertion that the improvements > are mainly due to dictionaries. > >> I did not mean to say there is no improvement in Optical >> Character Recognition, but the improvement over the past >> 10 years is minimal at most. When I see an OCR system that >> just uses raw results, then I will bow my head in recognition >> of true achievement. Furthermore, when the image processing >> gets that far it will open up new possibilities in all kinds >> of sciences. > > There are countless tools which can be used to improve OCR performance. > Using dictionary lookups is just one tool in the box. OCR is improving > using many different techniques.
I've been observing improvements in > many different areas over the last few years (as long as I've been in > the area), including: > > - Improvements in low-level image processing techniques > - Improvements in feature extraction from characters > - Improvements in character recognition based on those features > > If you don't like dictionary lookups, don't use them. Raw OCR > performance is improving in the lab and in the marketplace and is > already great for a large proportion of documents. I must apologise on > behalf of the research community if you find the rate of progress to be > inadequate. > > That said, if you don't like it, muck in. There are many research labs > around the world working on improving OCR and related techniques and I'm > sure they'd be glad to have someone as knowledgeable as yourself join. > There are even a few Free Software / Open Source OCR systems which would > gladly welcome any interested developers: > > Ocrad: http://www.gnu.org/software/ocrad/ocrad.html > GOCR/JOCR: http://jocr.sourceforge.net/ > ClaraOCR: http://www.geocities.com/claraocr/ > > > Cheers, > Holden > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From tb at baechler.net Fri Mar 17 01:53:56 2006 From: tb at baechler.net (Tony Baechler) Date: Fri Mar 17 01:53:20 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> <g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com> <95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de> Message-ID: <7.0.1.0.2.20060317015139.033f83c0@baechler.net> Yes, and it isn't open source, cross-platform, or especially useful to anyone without a Mac.
I for one can't use it because I don't have a Mac, and even if I did the built-in screen reader isn't perfect so I'm not sure how accessible it is. Also he didn't necessarily want actual audio output, only a means by which files could be created so users could make their own output if they wish. See my previous discussion about DAISY. At 04:38 AM 3/16/2006, you wrote: > If you have a Mac it will read it for you. > You can also customize the dictionary. > There is also a programming interface if > you really want high quality output, you can even > create your own voices. > > I personally have not played with it. It has been > around for a long time. From schultzk at uni-trier.de Fri Mar 17 02:14:58 2006 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Fri Mar 17 02:15:05 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <7.0.1.0.2.20060317015139.033f83c0@baechler.net> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> <g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com> <95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de> <7.0.1.0.2.20060317015139.033f83c0@baechler.net> Message-ID: <9A4E28A5-CA29-4429-A756-D2CE259F5C2E@uni-trier.de> Hi There, He wanted a system for markup. The Mac system can do this. All needed information is available free of charge and can be used publicly. The system is customizable. I do admit it is not cross-platform, but it can be used as a starting place if one has access to a Mac. Furthermore, any encoding/markup he chooses will be bound to one program or another. Also, it should be his decision whether what I suggest will fit his needs or not. To my knowledge DAISY is not what he wants either!! Flame someone else!!! Keith. On 17.03.2006 at 10:53, Tony Baechler wrote: > Yes, and it isn't open source, cross-platform, or especially useful > to anyone without a Mac.
I for one can't use it because I don't > have a Mac, and even if I did the built-in screen reader isn't > perfect so I'm not sure how accessible it is. Also he didn't > necessarily want actual audio output, only a means by which files > could be created so users could make their own output if they > wish. See my previous discussion about DAISY. > > At 04:38 AM 3/16/2006, you wrote: > >> If you have a Mac it will read it for you. >> You can also customize the dictionary. >> There is also a programming interface if >> you really want high quality output, you can even >> create your own voices. >> >> I personally have not played with it. It has been >> around for a long time. From ajhaines at shaw.ca Fri Mar 17 12:11:05 2006 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Fri Mar 17 12:11:09 2006 Subject: [gutvol-d] Scanner recommendations? Message-ID: <000301c649fe$ecab54d0$6401a8c0@ahainesp2600> I'm planning to purchase a new scanner sometime in the next few weeks, and am looking for comments and recommendations. My current scanner is an HP Scanjet 5P, flatbed-type, SCSI-connected. It works OK, but it's difficult to scan books that won't lie flat when opened (no binding "valley"). It isn't supported under Windows XP, so I have to keep switching my system back and forth between Windows 2000 and WinXP (drive racks are so cool for this - no dual-boot fussing). My current scanning/OCR software is ABBYY Finereader Sprint 4.0, which uses HP's Deskscan V2.9 software to actually acquire the image to be OCR'ed. For my purposes this combo works just fine and gives excellent results. (I've played with ABBYY Professional 6.0, but found that it kept "getting in the way," so I've stuck with Sprint.)
Having said all that, the problem is that the scanner, plus the Deskscan software, is SLOW, taking about 45 seconds or so to go from the start of the "Preview" scan to the finished "Final" scan of a page pair. After that, the actual OCR and saving of the resulting text file takes only a few seconds. Including turning the page, a single scan takes about a minute, which I've decided is too slow to keep on with. (In fact, it's the time investment that's keeping me from doing some of the thicker books I have. Scanning is BORING, and I can't face an 800-page book with my current equipment.) So, I'm looking for a scanner that's considerably faster than my current one, will handle stiffly bound books without having to force them flat, is USB-connected, and works under Windows XP. I've Googled "book scanner", but most of the hits have been for those big, professional scanners with lights, an overhead camera, automatic page turning, etc., that seem to cost in the $20K-$40K range - definite overkill for my needs. This search also pointed me to the Plustek Opticbook 3600, which I found mentioned in this forum's March 2005 archives, in the "Scanning/OCR Tips" thread. Comments/recommendations on this or other candidate scanners? Thanks, Al From jeroen.mailinglist at bohol.ph Fri Mar 17 14:23:55 2006 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Fri Mar 17 15:18:06 2006 Subject: [gutvol-d] Producing texts for text-to-speech. In-Reply-To: <9A4E28A5-CA29-4429-A756-D2CE259F5C2E@uni-trier.de> References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org> <44173562.4060705@bohol.ph> <g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com> <95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de> <7.0.1.0.2.20060317015139.033f83c0@baechler.net> <9A4E28A5-CA29-4429-A756-D2CE259F5C2E@uni-trier.de> Message-ID: <441B36FB.5020201@bohol.ph> Hi People, What I wanted is a system of mark-up that has some value as a standard.
It should be future-proof, well documented, and vendor-neutral, such that I won't be forced to stay with one platform. I am currently looking at SSML, which is an XML-based W3C standard, in combination with aural CSS stylesheets. I know there are very few tools for this, but for the time being I would rather transform to temporary formats than compromise by using non-standard formats for masters. I will actually integrate the information that machine-spoken books require into my master TEI documents, with a small number of extensions, which, again, I will document. For Project Gutenberg, we have to plan for the long term. Best regards, Jeroen. Keith J. Schultz wrote: > > He wanted a system for markup. The Mac system can do this. All > needed information is available > free of charge and can be used publicly. The system is > customizable. I do admit it is not > cross-platform, but it can be used as a starting place if one has > access to a Mac. From Bowerbird at aol.com Fri Mar 17 15:53:19 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Mar 17 15:53:24 2006 Subject: [gutvol-d] 18,000 Message-ID: <65.5731ec2b.314ca5ef@aol.com> congratulations to the project gutenberg volunteers for crossing the 18,000 marker on the mothership... -bowerbird From bruce at zuhause.org Sat Mar 18 07:11:05 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Sat Mar 18 07:11:10 2006 Subject: [gutvol-d] Scanner recommendations? In-Reply-To: <000301c649fe$ecab54d0$6401a8c0@ahainesp2600> References: <000301c649fe$ecab54d0$6401a8c0@ahainesp2600> Message-ID: <17436.8969.181849.80589@celery.zuhause.org> Al Haines (shaw) writes: > I'm planning to purchase a new scanner sometime in the next few weeks, and > am looking for comments and recommendations. I have an Opticbook 3600, and I strongly recommend it.
It has three built-in scan modes (black & white, greyscale, and color) tied to scanner buttons, so you can change scan modes in the middle of a batch scan process. It comes with ABBYY FineReader Sprint 5.0, but I've never used it, since I had FR6 and upgraded to FR8. Since I'm often scanning more fragile works, I usually scan a page at a time, and I get about 6 pages per minute in 300 DPI black & white, 5 PPM in 300 DPI greyscale, and 1 or 2 PPM in 300 DPI color. I usually batch scan using the Opticbook's book pilot, do some post-processing, and then run it through FR. I bought mine for about $250 USD. Since I use the Opticbook's book pilot to batch scan, I set up a preview image once, and then I'm usually turning the page and repositioning during the time it sends the data to the computer. With the book pilot, you can set it up so that it automatically rotates the image to compensate, either when you scan a page at a time (rotating the book 180 degrees for each page) or, by 90 degrees CCW, for double-page scans. Of course, if you're scanning an oversize book, you'll want to scan a page at a time anyway. If you're using FR, I think you're better off using the FR TWAIN interface (which I don't use), because you can set it to scan the margins of your book (no preview mode, so you need to know or guess the size) and then scan multiple pages, with background OCRing. With the FR TWAIN interface, you can't automatically switch between scan modes, though. Right now I'm scanning a book with a lot of color illustrations, so it's really nice to press the grey button for my normal greyscale scan, and hit the color button when I reach a page with a color illustration. You can also use it for double-page scans, like a normal flatbed scanner. I usually don't, because I think I get better results (albeit at half the scan speed) with single-page mode. Downsides: It's not SANE compliant, so you have to use Windows (not a problem for you, but it's a showstopper for others).
The usable scan area starts about 3 mm from the edge of the scanner, so if you have really narrow gutters on the book, you will still have problems. The depth of field is only so-so, so the curvature of a thick book with narrow gutters will make the edge very dark, and sometimes unusable. Greyscale scanning is better than B&W scanning for this. From sly at victoria.tc.ca Sun Mar 19 00:09:13 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Mar 19 00:09:19 2006 Subject: [gutvol-d] eGranary library Message-ID: <Pine.GSO.4.58.0603190005570.13771@vtn1.victoria.tc.ca> From the description, this sounds like a very worthwhile project which distributes digital resources to places where they might otherwise not be accessible. Perhaps somewhere that would be good to have PG texts? http://www.egranary.org/ A good summary here: http://en.wikipedia.org/wiki/EGranary_Digital_Library Andrew From Bowerbird at aol.com Sun Mar 19 01:33:50 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Mar 19 01:33:58 2006 Subject: [gutvol-d] eGranary library Message-ID: <2fa.eb65f6.314e7f7e@aol.com> i would've been surprised if they weren't already familiar with p.g. ("amazed" would be a better term.) but of course they are quite familiar, as indicated on this page on their site: > http://www.widernet.org/digitallibrary/DigitalLibraries.htm so i'm quite certain _some_ of the library has long been included in their program. not sure how often they update, though, especially since the link on that webpage still points to the promo.net site... when michael's got his 35th anniversary d.v.d. ready, he should send it to them... -bowerbird From sly at victoria.tc.ca Sun Mar 19 09:22:12 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Mar 19 09:22:15 2006 Subject: [gutvol-d] eGranary library In-Reply-To: <2fa.eb65f6.314e7f7e@aol.com> References: <2fa.eb65f6.314e7f7e@aol.com> Message-ID: <Pine.GSO.4.58.0603190919140.3359@vtn1.victoria.tc.ca> Thanks for finding that page. At the top, the list is prefaced with: In addition to the eGranary, we have compiled a list of scholastic journals and electronic resources that are available online via the worldwide web. From this information, I can't tell if they distribute PG texts or not. Andrew On Sun, 19 Mar 2006 Bowerbird@aol.com wrote: > i would've been surprised if they > weren't already familiar with p.g. > ("amazed" would be a better term.) > > but of course they are quite familiar, > as indicated on this page on their site: > > http://www.widernet.org/digitallibrary/DigitalLibraries.htm > > so i'm quite certain _some_ of the library > has long been included in their program. > > not sure how often they update, though, > especially since the link on that webpage > still points to the promo.net site... > > when michael's got his 35th anniversary > d.v.d. ready, he should send it to them... > > -bowerbird > From Bowerbird at aol.com Sun Mar 19 10:03:42 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Mar 19 10:03:49 2006 Subject: [gutvol-d] eGranary library Message-ID: <2d3.51387e1.314ef6fe@aol.com> andrew said: > From this information, > I can't tell if they distribute PG texts or not. understandable, as they don't say so explicitly. nonetheless, since their mission is to provide electronic texts to places that have a hard time accessing the internet proper, with a server-box that holds a number of e-texts, i'd assume so... if you're filling a granary, you would definitely harvest one of the biggest fields around, not?
but they might have harvested it once, and then not returned later as the field increased in size... so it would definitely be a good idea to send 'em the yield from the newest crop, once it's threshed. i'd assume michael plans on making a big noise when that d.v.d. is ready... -bowerbird From Bowerbird at aol.com Sun Mar 19 11:50:18 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Mar 19 11:50:22 2006 Subject: [gutvol-d] the secret garden demo Message-ID: <302.fdfa0d.314f0ffa@aol.com> i have put up my newest demo, "the secret garden": > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html these demos are aimed at "continuous proofreading", but with this latest example i've also begun doing the _formatting_ expected for the purpose of pure reading. for example, the chapter-headers are now _displayed_ as headers (i.e., big and bold), and they are hotlinked back to the "hot table of contents" for easy navigation. in addition, the "table of contents" pages are hotlinked to the items listed. (these hotlinks are in addition to the ones on the specialized "table of contents" pages which are auto-generated and were always hotlinked.) i've also changed from internet-style block-paragraphs (with a blank line between paragraphs) to book-style indented paragraphs (with no blank line between 'em)... with this formatting, the auto-generated .html display is starting to look _highly_similar_ to the original pages... page-numbers are also colorized, to make 'em stand out. i've also included "chapter-jump" links, so the reader can jump from any chapter to the one before or the one after. finally, i included links on each page that allow the reader to conveniently switch from the 1-page display to 2-up...
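bowerbird's generator is not shown in the thread, so the following is only a minimal Python sketch, under assumed conventions, of how a plain-text master can be turned into the kind of HTML he describes: chapter headers linked back to a contents page, book-style indented paragraphs, and chapter-jump links. The chapter-line pattern, the file names (c001.html, toc.html) and the inline styling are hypothetical choices for illustration, not his actual format.

```python
import html
import re

# Hypothetical convention: a chapter starts on a line beginning "CHAPTER";
# paragraphs inside a chapter are separated by blank lines.
CHAPTER_RE = re.compile(r"^CHAPTER\b.*$", re.MULTILINE)


def split_chapters(text):
    """Split a plain-text master into (title, body) pairs."""
    matches = list(CHAPTER_RE.finditer(text))
    chapters = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((m.group(0).strip(), text[m.end():end].strip()))
    return chapters


def page_name(index):
    """Hypothetical file naming: chapter 0 -> c001.html, chapter 1 -> c002.html."""
    return f"c{index + 1:03d}.html"


def chapter_to_html(chapters, index):
    """Render one chapter: a header linked back to the contents page,
    book-style indented paragraphs, and previous/next chapter-jump links."""
    title, body = chapters[index]
    parts = [f'<h2><a href="toc.html">{html.escape(title)}</a></h2>']
    for para in re.split(r"\n\s*\n", body):
        if para.strip():
            # Rejoin hard-wrapped lines; indent the first line, no blank line between.
            text = " ".join(para.split())
            parts.append(f'<p style="text-indent:2em;margin:0">{html.escape(text)}</p>')
    nav = []
    if index > 0:
        nav.append(f'<a href="{page_name(index - 1)}">previous chapter</a>')
    if index + 1 < len(chapters):
        nav.append(f'<a href="{page_name(index + 1)}">next chapter</a>')
    parts.append("<p>" + " | ".join(nav) + "</p>")
    return "\n".join(parts)


def toc_html(chapters):
    """Contents page whose entries link to each chapter page."""
    items = [f'<li><a href="{page_name(i)}">{html.escape(t)}</a></li>'
             for i, (t, _) in enumerate(chapters)]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Because every page is derived from the same plain-text master, changing the presentation (for instance, switching back to block paragraphs) means editing one renderer rather than touching the text itself, which is the property the "plain text as master" argument in this thread relies on.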
*** for years now, many people here wanted to refuse to accept my position that a plain-text format could serve as "master", so they "challenged" me to "prove it" with some "examples"... now that i am doing so, they have grown strangely silent. just as i knew they would. at any rate, i welcome any constructive criticism of my work. -bowerbird From gbnewby at pglaf.org Sun Mar 19 18:39:58 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Mar 19 18:40:00 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com> References: <4419CD37.2000505@perathoner.de> <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com> Message-ID: <20060320023958.GG8882@pglaf.org> > On 3/16/06, Marcello Perathoner <marcello@perathoner.de> wrote: > > > > From a different thread: > > > > >> they're looking for "sponsors", suggesting "an annual fee of $1000", > > >> or even all the way up to $350,000, which buys you a "thank you" from > > >> within the browser of the one million of their clients you've > > >> sponsored... > > > > I don't know about their millions of clients but the PG website is now > > ranked top 3000 at alexa.com and serving ~250K pages to ~50K hosts a > > day. We have a Google page-rank of 8. To get that, spammers would feed > > their mothers to the Ravenous Bugblatter Beast of Traal. > > > > > > We could put an ad space at the top of every page. I'm thinking of > > text-only ads, no distracting images. We could cycle ads like this: > > > > Did you know that you can help produce ebooks by investing > > just ten minutes a day? www.pgdp.net > > > > Sponsor PG and get your web site mentioned here. > > See: www.gutenberg.org/fundraising/sponsoring > > > > We thank the Curl Up and Dye hair parlor for their > > kind gift of $1000.
www.curl-up-and-dye.com > > > > > > Do we want to do this? And what rules should we put in place? I really like the idea of having rotating ads for DP, the various PG affiliates, ibiblio, and our other clearly-defined "partners" (at one level or another, see http://www.gutenberg.org/links). Maybe you could work on this, Marcello? No need to delay, and we already have a few banner graphics for both DP & PG. In fact, I remember we used to lead with an occasionally-rotating graphic. I'd be in favor of some clear criteria for other organizations. Including Wikipedia would be nice. Places like the Linux Fund. But it's hard to draw a line. For such organizations to submit artwork and request being added to our rotating banner would be a wonderful service that PG could provide. But those criteria are a little sticky... > > Is selling ads compatible with the not-for-profit status? No, we can't sell ad space at all. Neither PGLAF, nor ibiblio. This would need to just be free, and just for non-commercial messages. (They could be for commercial entities... but not "buy our stuff" messages.) But for not-for-profit, it's a wonderful idea, and on-mission for PG. In the US, we have "public service announcements." That's the type of model we could easily pursue. -- Greg From scott_bulkmail at productarchitect.com Sun Mar 19 19:07:42 2006 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Sun Mar 19 19:13:39 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <20060320023958.GG8882@pglaf.org> References: <4419CD37.2000505@perathoner.de> <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com> <20060320023958.GG8882@pglaf.org> Message-ID: <p06110416c043cc858530@[192.168.0.52]> >>Is selling ads compatible with the not-for-profit status? > >No, we can't sell ad space at all. Neither PGLAF, nor ibiblio. This >would need to just be free, and just for non-commercial messages. (They >could be for commercial entities... but not "buy our stuff" messages.)
>But for not-for-profit, it's a wonderful idea, and on-mission for PG. Just so it's clear: although IANAL, I'm pretty sure that there's nothing to stop a non-profit IN GENERAL from selling ads (or products). So, the above must be restrictions specific to PGLAF and/or ibiblio. -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books From jon at noring.name Sun Mar 19 19:16:41 2006 From: jon at noring.name (Jon Noring) Date: Sun Mar 19 19:23:54 2006 Subject: [gutvol-d] the secret garden demo In-Reply-To: <302.fdfa0d.314f0ffa@aol.com> References: <302.fdfa0d.314f0ffa@aol.com> Message-ID: <1706161355.20060319201641@noring.name> Bowerbird wrote: > for years now, many people here wanted to refuse to accept > my position that a plain-text format could serve as "master", > so they "challenged" me to "prove it" with some "examples"... > > now that i am doing so, they have grown strangely silent. > > just as i knew they would. I doubt many people are even reading your messages, let alone visiting your demos. And if they look at your examples, they are probably yawning. It's not worth their time to even write a reply. So your explanation of the "silence" is probably a little off the mark. Jon From Bowerbird at aol.com Sun Mar 19 20:44:45 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Mar 19 20:45:03 2006 Subject: [gutvol-d] the secret garden demo Message-ID: <35b.14920a.314f8d3d@aol.com> ah, humor. humor is good. -bowerbird
From sly at victoria.tc.ca Sun Mar 19 21:45:28 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Mar 19 21:45:30 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <20060320023958.GG8882@pglaf.org> References: <4419CD37.2000505@perathoner.de> <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com> <20060320023958.GG8882@pglaf.org> Message-ID: <Pine.GSO.4.58.0603192135001.12712@vtn1.victoria.tc.ca> Is anyone here interested in helping to work out details of what we consider an "affiliated project"? I would like to do this as long as I have someone else to bounce ideas off of. I think it would be ideal to have a loose affiliation of projects with similar goals, particularly as there are an increasing number of websites out there focusing on individual languages that would be worth being centrally linked in one place. In the English-language Wikipedia article on Project Gutenberg, there was some disagreement recently about what should be included in the list of "affiliated projects", partly because I think we don't really have a clear definition here. Although, as Greg mentions, the gray areas are where the challenge is. Andrew On Sun, 19 Mar 2006, Greg Newby wrote: > > I really like the idea of having rotating ads for DP, the various PG > affiliates, ibiblio, and our other clearly-defined "partners" (at one > level or another, see http://www.gutenberg.org/links). > > Maybe you could work on this, Marcello? No need to delay, and > we already have a few banner graphics for both DP & PG. In fact, > I remember we used to lead with an occasionally-rotating graphic. > > I'd be in favor of some clear criteria for other organizations. > Including Wikipedia would be nice. Places like the Linux Fund. But > it's hard to draw a line. For such organizations to submit artwork and > request being added to our rotating banner would be a wonderful service > that PG could provide.
But those criteria are a little sticky... > From holden.mcgroin at dsl.pipex.com Mon Mar 20 06:04:20 2006 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Mon Mar 20 06:04:25 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <20060320023958.GG8882@pglaf.org> References: <4419CD37.2000505@perathoner.de> <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com> <20060320023958.GG8882@pglaf.org> Message-ID: <1142863461.26355.11.camel@steve-mcqueen> On Sun, 2006-03-19 at 18:39 -0800, Greg Newby wrote: > I'd be in favor of some clear criteria for other organizations. > Including Wikipedia would be nice. Places like the Linux Fund. But > it's hard to draw a line. For such organizations to submit artwork and > request being added to our rotating banner would be a wonderful service > that PG could provide. But those criteria are a little sticky... Hi! If I might, I'd like to suggest the fine folk at Ubuntu Linux (http://www.ubuntu.com/). For those of you who are unfamiliar with Ubuntu, it's a project which is less than two years old. Their aim is to produce a refined operating system (based on Linux) which is not only free of cost but also free as in freedom and which is also usable by everybody in their native language. As I said, they're just two years old but due largely to the quality of their "product", they've become literally overnight one of the largest Linux distributions. On DistroWatch.com, they've been the number 1 Linux distribution for over a year, and by a considerable margin too. Personally, I've been running Ubuntu Linux for over a year as my main desktop and server operating system. It's truly a worthy replacement for Windows and -- best of all -- it's free in both senses. So, I think it really is a worthy project and one which has very similar goals to Project Gutenberg.
Cheers, Holden From marcello at perathoner.de Mon Mar 20 09:34:39 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Mar 20 09:34:44 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <20060320023958.GG8882@pglaf.org> References: <4419CD37.2000505@perathoner.de> <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com> <20060320023958.GG8882@pglaf.org> Message-ID: <441EE7AF.7070500@perathoner.de> Greg Newby wrote: > No, we can't sell ad space at all. Neither PGLAF, nor ibiblio. I'm not sure this is true: "501 (b) Tax on unrelated business income and certain other activities An organization exempt from taxation under subsection (a) shall be subject to tax to the extent provided in parts II, III, and VI of this subchapter, but (notwithstanding parts II, III, and VI of this subchapter) shall be considered an organization exempt from income taxes for the purpose of any law which refers to organizations exempt from income taxes." http://www.law.cornell.edu/uscode/html/uscode26/usc_sec_26_00000501----000-.html IANAL but this means to me that we would have to pay taxes on the ad revenues, but selling ads would not endanger our non-profit status. But the more interesting question is: if we just display standard "thank you" notices for donations received, without letting the donor choose the text, would this be considered "selling ads" or just being nice to our donors? 
-- Marcello Perathoner webmaster@gutenberg.org From scott_bulkmail at productarchitect.com Mon Mar 20 10:06:00 2006 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Mon Mar 20 10:07:27 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <441EE7AF.7070500@perathoner.de> References: <4419CD37.2000505@perathoner.de> <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com> <20060320023958.GG8882@pglaf.org> <441EE7AF.7070500@perathoner.de> Message-ID: <p0611041dc0449e7edeba@[192.168.0.52]> >IANAL but this means to me that we would have to pay taxes on the ad revenues, but selling ads would not endanger our non-profit status. Do public radio stations pay tax on their ad revenue? Do the Girl Scouts pay tax on their cookie sales? I doubt it (but I could well be wrong). PG operates a Web site; showing ads strikes me as more "related" than "unrelated". And, even if income tax is involved, that's hardly a show stopper. I would guess the ad revenue would easily cover the tax rate and an accountant. -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books From Bowerbird at aol.com Mon Mar 20 11:40:03 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 20 11:40:14 2006 Subject: [gutvol-d] Sponsors Message-ID: <90.70f24a7b.31505f13@aol.com> scott said: > And, even if income tax is involved, > that's hardly a show stopper. > I would guess the ad revenue > would easily cover the tax rate > and an accountant. why would p.g. want to go into the ad-selling business? isn't the internet permeated enough with sales pitches? even "thank you" notes for donors smell "fishy" to me... where _precisely_ is it you expect the proceeds would go? i think michael hart should be the #1 recipient, but i suspect he'd rather go without pay to have his project remain pure... indeed, isn't that pretty much what he just said?... -bowerbird
From marcello at perathoner.de Mon Mar 20 11:54:14 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Mar 20 11:54:22 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <90.70f24a7b.31505f13@aol.com> References: <90.70f24a7b.31505f13@aol.com> Message-ID: <441F0866.9040100@perathoner.de> Bowerbird@aol.com wrote: > why would p.g. want to go into the ad-selling business? Why would PG want to collect donations? > where _precisely_ is it you expect the proceeds would go? The same places where the other donations go. > he'd rather go without pay to have his project remain pure... How can it be purer to "squeeze" people with little money than corporations with big money? -- Marcello Perathoner webmaster@gutenberg.org From brandon.galbraith at gmail.com Mon Mar 20 11:53:05 2006 From: brandon.galbraith at gmail.com (Brandon Galbraith) Date: Mon Mar 20 11:59:36 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <90.70f24a7b.31505f13@aol.com> References: <90.70f24a7b.31505f13@aol.com> Message-ID: <366100670603201153yad99acdq967de231a4b772fb@mail.gmail.com> Because even though we have a huge number of volunteers, you still need money to pay the bills? Some parts of the world run on hopes and dreams, but the rest of it runs on cold, hard cash. -brandon On 3/20/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote: > > scott said: > > And, even if income tax is involved, > > that's hardly a show stopper. > > I would guess the ad revenue > > would easily cover the tax rate > > and an accountant. > > why would p.g. want to go into the ad-selling business? > isn't the internet permeated enough with sales pitches? > even "thank you" notes for donors smell "fishy" to me... > > where _precisely_ is it you expect the proceeds would go? > > i think michael hart should be the #1 recipient, but i suspect > he'd rather go without pay to have his project remain pure...
> > indeed, isn't that pretty much what he just said?... > > -bowerbird > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > -- Brandon Galbraith Email: brandon.galbraith@gmail.com AIM: brandong00 Voice: 630.400.6992 From creeva at gmail.com Mon Mar 20 12:07:53 2006 From: creeva at gmail.com (Brent Gueth) Date: Mon Mar 20 12:13:35 2006 Subject: [gutvol-d] the secret garden demo In-Reply-To: <302.fdfa0d.314f0ffa@aol.com> Message-ID: <003201c64c59$f96fc590$6738a8c0@Corp.Symantec.Com> This is the argument that I agreed with you on months ago. I think that plain text should always be the master as it is easier to format for new devices and to interoperate with old ones. Of course, now that I'm saying this and siding with you on it, I'm going to get flamed to death with the old XML debate. But like before, that's my .02. A Twi a day keeps the wookiee away. www.creeva.com _____ From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Bowerbird@aol.com Sent: Sunday, March 19, 2006 2:50 PM To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com Subject: [gutvol-d] the secret garden demo i have put up my newest demo, "the secret garden": > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html these demos are aimed at "continuous proofreading", but with this latest example i've also begun doing the _formatting_ expected for the purpose of pure reading. for example, the chapter-headers are now _displayed_ as headers (i.e., big and bold), and they are hotlinked back to the "hot table of contents" for easy navigation. in addition, the "table of contents" pages are hotlinked to the items listed.
(these hotlinks are in addition to the ones on the specialized "table of contents" pages which are auto-generated and were always hotlinked.) i've also changed from internet-style block-paragraphs (with a blank line between paragraphs) to book-style indented paragraphs (with no blank line between 'em)... with this formatting, the auto-generated .html display is starting to look _highly_similar_ to the original pages... page-numbers are also colorized, to make 'em stand out. i've also included "chapter-jump" links, so the reader can jump from any chapter to the one before or the one after. finally, i included links on each page that allow the reader to conveniently switch from the 1-page display to 2-up... *** for years now, many people here wanted to refuse to accept my position that a plain-text format could serve as "master", so they "challenged" me to "prove it" with some "examples"... now that i am doing so, they have grown strangely silent. just as i knew they would. at any rate, i welcome any constructive criticism of my work. -bowerbird From prosfilaes at gmail.com Mon Mar 20 12:23:42 2006 From: prosfilaes at gmail.com (David Starner) Date: Mon Mar 20 12:30:36 2006 Subject: [gutvol-d] the secret garden demo In-Reply-To: <003201c64c59$f96fc590$6738a8c0@Corp.Symantec.Com> References: <302.fdfa0d.314f0ffa@aol.com> <003201c64c59$f96fc590$6738a8c0@Corp.Symantec.Com> Message-ID: <6d99d1fd0603201223i3f5986d1x334d610a8b33d8b6@mail.gmail.com> On 3/20/06, Brent Gueth <creeva@gmail.com> wrote: > This is the argument that I agreed with you on months ago. I think that > plain text should always be the master as it is easier to format for new > devices and to interoperate with old ones. As you say in a rich-text email. All a "plain text" markup format makes easy is a half-assed conversion.
If you want to actually convert it, you've got to break it down properly, and a standard XML reader makes that as easy as, if not easier than, a custom-designed program to read an arbitrary "plain text" format. From Bowerbird at aol.com Mon Mar 20 17:49:58 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 20 17:50:04 2006 Subject: [gutvol-d] so much to chat about, i'll have to go outline Message-ID: <2be.7172d97.3150b5c6@aol.com> so much to chat about, i'll have to go outline: 1. agility=ability for zen markup viewer-apps 2. sloppy thinking lumps donations with ads 3. #3 of "monday morning quarterback" is up 4. thanks for the support, brent, but stay low 5. i've just posted another demonstration-book *** 1. because the format of the z.m.l. format is so very transparent, it's easy for apps to grok it. and to manipulate it. and to slap it into output. thus the programs will become especially _agile_. 100k of script, 90k boilerplate, so you write 10k to do the specific job you want done right now, and you're debugging, with the whole wide world (if you want) to help you find and fix that mistake. the job gets done, and you're off to the next one... your xml programs, however, will be bloatware, with that heavy-markup requiring complex code that'll be a nightmare to try to modify on the fly. and thus improvements will be slow in coming... and, as usual, the race goes to the swift. the truth of the matter will be that the abundance of the _zml_ viewer-programs, not the _xml_ ones, plus their _agility_, will be the defining features that tip things _to_ me (and not _away_ as you suggest). *** 2. ok, i will explain how donations differ from ads. a donation is a _reward_ giving a stamp of affirmation signing a relationship whose past has _proven_ worth, a firm avowal of underlying root in a _gift_economy_. it says, "job well done, my friend, thank you very much."
an ad-sale is an _exchange_ that sets an expectation that the relationship will _deliver_ worth in the future, and becomes a symbol of foundation based on _barter_. it says, "ok, i'd better get my money's worth out of this." to the greatest extent possible now, the world needs relations built on a cornerstone of _gift_, not _barter_. project gutenberg is one of the leading lights in that move to future that is gift-based, not barter-based... and besides, in an organization that runs so completely on volunteer labor, a little money would be a bad thing. a terrible thing. the only person who can reasonably expect _anything_ out of project gutenberg is michael, and even for him, it's only due to all the years he spent in the wilderness, not for being on a now-heavily-populated bandwagon. *** 3. issue #3 of "monday morning quarterback" is out. this one is short and sweet, focusing on just one point: ================================== each scan you make should have, in its filename, the _page-number_ of the page which it pictures. ================================== > http://groups.yahoo.com/group/bpsuper/message/7 > http://snowy.arsc.alaska.edu/bowerbird/mmq/mmq03.txt *** 4. while i appreciate your agreement with me, brent, there's no real need to speak. you'll only draw flames, and it's better just to let these lying dogs die peacefully. we're past the debate stage, anyway, and eating pudding. hey, i'm gonna use something like that as my slogan -- we _would_ eat our own dogfood, but we don't _make_ dogfood; we make _pudding_, and we _love_ to eat it! it's _good_! *** 5. another demo-book went up today, this one titled "the hacker manifesto", by mckenzie wark. wark is writing a new book in public, via a blog, a test of the institute for the future of the book: > http://www.futureofthebook.org/blog/ the u.r.l. 
of my demo is: > http://www.greatamericannovel.com/ahmmw/ahmmwc001.html i think i've forgotten to remind you so far that it's a better overall reading experience if you go into full-screen mode to do your reading. (or hide all the toolbars if that's all you can do.) not only does it remove unnecessary distraction, allowing you to immerse yourself in the material, but it also means the type and scans can be bigger. (no one should complain e-books are hard to read, because they can easily make e-books _easier_ to read than paper, just by making the type _bigger_.) also, if you hadn't noticed yet, clicking the image "turns the page" to the next page. (clicking the left image on the 2-up interface "turns back to" the preceding facing-pages spread in the book.) as i said before, on a fast pipe, where scans take less than a second or two to download, you can speed through a book fairly quickly. anyway, that's 5 demo-books up now: > http://www.greatamericannovel.com/mabie/mabiep001.html > http://www.greatamericannovel.com/myant/myantc001.html > http://www.greatamericannovel.com/tolbk/tolbkp001.html > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html > http://www.greatamericannovel.com/ahmmw/ahmmwc001.html pudding is served. it's the beginning of the end for heavy markup... :+) -bowerbird
From hyphen at hyphenologist.co.uk Tue Mar 21 00:34:59 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Mar 21 00:35:13 2006 Subject: [gutvol-d] Lili Marlene In-Reply-To: <82.35f5d394.30eeecb0@aol.com> References: <82.35f5d394.30eeecb0@aol.com> Message-ID: <3odv12pag3e46nivas302gc8iu2p5feuec@4ax.com> I note that the original German words of Lili Marlene are now out of copyright in the USA, but the English and French words, and the music, written in the 1940s, are unfortunately still in copyright in life plus 70 countries. I have not investigated the situation in life plus 50 countries. http://history.sandiego.edu/gen/snd/lilymarlene.html >>>Written by German soldier Hans Leip in 1915, set to music by Norbert Schultze in 1938 as The Girl under the Lantern , recorded by Lale Andersen, broadcast by German Forces Radio but was quickly banned in Germany, broadcast daily by Radio Belgrade from Yugoslavia to the Afrika Korps in 1941 when Rommel indicated he liked it, adopted by the British Eighth Army as one of the favorite songs of World War II, sung on radio by Marlene Dietrich, recorded in English by Anne Sheldon in 1944. <<< As my German is almost nonexistent, I am perhaps the last person to make this into etext. Perhaps a German-speaking volunteer would run with this. -- Dave Fawthrop <dave hyphenologist co uk> "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design".
Sig (C) Copyright Public Domain From sly at victoria.tc.ca Tue Mar 21 09:29:51 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Mar 21 09:29:53 2006 Subject: [gutvol-d] Lili Marlene In-Reply-To: <3odv12pag3e46nivas302gc8iu2p5feuec@4ax.com> References: <82.35f5d394.30eeecb0@aol.com> <3odv12pag3e46nivas302gc8iu2p5feuec@4ax.com> Message-ID: <Pine.GSO.4.58.0603210924370.8480@vtn1.victoria.tc.ca> In any event, the text of a single song is almost certainly too short for a Project Gutenberg text. See: http://www.gutenberg.org/faq/V-17 Perhaps this project might be a better home for it: http://www.recmusic.org/lieder/ Andrew On Tue, 21 Mar 2006, Dave Fawthrop wrote: > > I note that the original German words of Lili Marlene are now out of > copyright in the USA, but the English and French words, and the music, written > in the 1940s, are unfortunately still in copyright in life plus 70 > countries. I have not investigated the situation in life plus 50 countries. > > http://history.sandiego.edu/gen/snd/lilymarlene.html > >>>Written by German soldier Hans Leip in 1915, set to music by Norbert > Schultze in 1938 as The Girl under the Lantern , recorded by Lale Andersen, > broadcast by German Forces Radio but was quickly banned in Germany, > broadcast daily by Radio Belgrade from Yugoslavia to the Afrika Korps in > 1941 when Rommel indicated he liked it, adopted by the British Eighth Army > as one of the favorite songs of World War II, sung on radio by Marlene > Dietrich, recorded in English by Anne Sheldon in 1944. <<< > > As my German is almost nonexistent, I am perhaps the last person to make > this into etext. Perhaps a German-speaking volunteer would run with this.
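Dave's note turns on the "life plus 70" and "life plus 50" terms. As an illustration only, here is a minimal sketch of that arithmetic. It assumes the common convention that the term runs to the end of the calendar year, so a work becomes free on 1 January of the year after death_year + N. The function names are hypothetical. The sketch ignores wartime extensions, works with several authors (such as separate lyricists and composers), and the US publication-date rules that apply to the German lyrics.

```python
def first_pd_year(death_year: int, term_years: int) -> int:
    """First calendar year in which a work is free under a plain
    life-plus-N rule, assuming the term runs to the end of the year."""
    return death_year + term_years + 1


def is_public_domain(death_year: int, term_years: int, year: int) -> bool:
    """True if `year` is on or after the first public-domain year."""
    return year >= first_pd_year(death_year, term_years)


if __name__ == "__main__":
    # Hypothetical author who died in 1950, checked in 2006.
    for term in (50, 70):
        print(term, first_pd_year(1950, term), is_public_domain(1950, term, 2006))
```

For that hypothetical 1950 death, the loop prints `50 2001 True` and `70 2021 False`. The same author can therefore be free in a life-plus-50 country and still protected in a life-plus-70 one.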
From ian at babcockbrown.com Tue Mar 21 09:45:09 2006 From: ian at babcockbrown.com (Ian Stoba) Date: Tue Mar 21 10:21:02 2006 Subject: [gutvol-d] iRex ebook reader Message-ID: <1C09A92C-A1AE-497D-8D87-006F01B05DBC@babcockbrown.com> I came across this link today and did not remember seeing it discussed on the list: http://www.irextechnologies.com/shop/products/iliad.htm iRex (a Philips spinoff) is preparing to launch an e-book reader. Technically it seems most similar to the Sony one, but apparently without the onerous DRM. Engadget is saying it may retail for €650, or about the same as a station wagon full of used paperbacks from my local "Friends of the Library" sale. http://www.engadget.com/2006/03/19/irex-reveals-deets-on-its-iliad-ebook-reader/ From Bowerbird at aol.com Tue Mar 21 17:13:04 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Mar 21 17:13:11 2006 Subject: [gutvol-d] perhaps someone should tell david rothman Message-ID: <2f2.1ad4467.3151fea0@aol.com> um, perhaps someone should tell david rothman that if he keeps talking about me over on the d.p. forums, i'll have to go over there and start posting again to clear up the record... and i don't think you d.p. people want me to do that, do you? no, of course you don't. so maybe someone should tell him...
and for the record here, regarding "the relationship" between distributed proofreaders and librarycity, i quoted extensively in my post from the _webpage_ whose u.r.l. i listed at the top of my message, so it wasn't _me_ saying all of those things... and people who compounded those quotes with their own misinterpretations should answer for their own mistakes... *** i informed you about one of my newest demo-books: > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html but i forgot to say that i used scans i got from d.p. so as to show people the quality of some of the scans that d.p. does. these scans were good enough for some acceptable o.c.r., presumably, because the final e-text as posted was good, but the scans are not good enough for reading purposes... this is not a criticism -- because that wasn't their intent -- but it does have bearing on those people who try to tell us the d.p. scans can be productively used for those purposes. many (if not most) of the scans that d.p. has in storage are simply not good enough for reading, even if they're cleaned. they're good enough to do "continuous proofreading", yes, but that's about all. if we _really_ want to put their scans in some kind of "archive" for reading by end-users, then d.p. needs to set a new standard of quality for people scanning... anyway, since my other scan-sets are of very high quality, it was good to have a demo with a lesser-quality scan-set. but in general, i would not consider this level of quality to be above the minimal level required for public posting... (and i remind people again that this was not its objective.) *** meanwhile, here's a morsel from carlo on the d.p. forums: > Perhaps here we need a new idea. > We need to be sure that the proofreading > is OK before applying the formatting. > This means that we have to check > the proofreading quality before F1, > and in case of need repeat a P round.
hey, carlo, perhaps you need an even newer idea -- forgo the formatting rounds entirely for zen markup! (alright, it's the same old idea i suggested long ago...) > Then the project goes to the F rounds > or to an off-line formatting. when i suggested offline formatting "long ago", some d.p. people wanted to tar-and-feather me, suggesting i didn't know what "distributed" meant. well, um, yes, i certainly do, but pushing a whole book's worth of scans out at an array of people so they can say "nope, no formatting on that page either..." is retarded... and doing it twice is _doubly_ retarded... -bowerbird From gbnewby at pglaf.org Wed Mar 22 08:51:07 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Mar 22 08:51:08 2006 Subject: [gutvol-d] Sponsors In-Reply-To: <441F0866.9040100@perathoner.de> References: <90.70f24a7b.31505f13@aol.com> <441F0866.9040100@perathoner.de> Message-ID: <20060322165107.GA6734@pglaf.org> On Mon, Mar 20, 2006 at 08:54:14PM +0100, Marcello Perathoner wrote: > Bowerbird@aol.com wrote: > > >why would p.g. want to go into the ad-selling business? > > Why would PG want to collect donations? We spend our money on just a few things. Our fiscal year runs July - June, and we get an annual audit (which also costs money!) 1. CD/DVD giveaways. This was about $10,000 in the prior fiscal year. We reimburse volunteers for media & mailing costs; many recipients choose to make a donation after getting their CD/DVD, so this project is largely self-sustaining. 2. States compliance. This will be reduced, since we no longer have enough planned income to justify it - but since 2001, we've tried to follow the often-onerous (sometimes easy) not-for-profit fundraising guidelines from all fifty US states. 3. Office management and related compliance & activities.
This is the Wingates' shared 1/4-time salary to open the mail, deal with our bank, occasionally field phone calls, and work on #2. 4. Buy books. We reimburse a few bookbuyers who channel into DP, at an average cost of < $1/book. 5. Support DP & PGLAF systems & hosting, occasional scanners & supplies. For example, hosting for DP's colocated server is about $1,100/year. 6. Pay Michael. This hasn't happened in a few years, because his target salary is a lot more than all of the above... in order to pay Michael, we'd need him (or someone else) to do some successful fundraising. So, this is a theoretical budget item. We *always* encourage people to seek fundraising opportunities, because there are always some projects we'd like to grow (like the giveaways) or start. But such people should check with me before getting started, since there are some guidelines to follow (partially for our not-for-profit status, partially to keep "on mission"). There's no way we could ever pay for the hugely valuable volunteer labor, so the $50,000/year or so we've lived on for the past few years has been plenty to sustain core production activities. -- Greg From Bowerbird at aol.com Fri Mar 24 10:45:09 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Mar 24 10:45:20 2006 Subject: [gutvol-d] when they act surprised Message-ID: <361.812eaf.31559835@aol.com> it was a mere 3-6 months ago that i was informing people here that their scraping of the google books would cause google to become too conservative in displaying scans. it's annoying when people tune out warnings. but it is even _more_ annoying when they act _surprised_ when the consequences show up! > http://groups.yahoo.com/group/ebook-community/message/25166 um, yes, bruce, google is being overly cautious. your scan-scraper script is one main reason why. and your catalog is _another_ main reason why... of not quite the magnitude of publisher suits, granted, but big enough to be "main" reasons.
so take a look in the mirror, buddy. and we cannot ignore the fact that _scrapers_ are the ones currently spooking publisher fear. their "hackers-will-just-grab-the-whole-book" nightmare is tinged in reality when they see you. so you can act all surprised if you want, upon discovering that your actions have consequences. but all the people who have been reading this list know that _i_told_you_so_. and you didn't listen... -bowerbird From jon.ingram at gmail.com Fri Mar 24 12:16:53 2006 From: jon.ingram at gmail.com (Jon Ingram) Date: Fri Mar 24 12:24:03 2006 Subject: [gutvol-d] when they act surprised In-Reply-To: <361.812eaf.31559835@aol.com> References: <361.812eaf.31559835@aol.com> Message-ID: <4baf53720603241216m5965f200w5af1a23c856f871a@mail.gmail.com> On 3/24/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote: > it was a mere 3-6 months ago that i was > informing people here that their scraping > of the google books would cause google to > become too conservative in displaying scans. > > it's annoying when people tune out warnings. > > but it is even _more_ annoying when they act > _surprised_ when the consequences show up! I don't believe anyone has been mass-downloading books from Google's archive based on Bruce's index of their content, using Bruce's scraper or otherwise.
Around a dozen DPers have claimed books to download, according to the list available at http://homepage.ntlworld.com/jenjonliz/jon/tia/google.html (note that the list isn't currently being maintained, because there's little demand at DP for new content scraped from any site at the moment, and several of us are in the initial stages of working on a more general database-driven system for claiming books from image providers). The number of claimed books is in the low hundreds, and most of these have not been downloaded, either because research has indicated that the books are already in PG, or because there was no need for them on DP until now, due to the current glut of content working its way through the DP system. I'd be very surprised if DPers have been responsible for scraping even a hundred complete texts from Google's archive -- a tiny number compared to the more than 35,000 texts listed in Bruce's current index. As far as I can tell, Google is allowing me to view all the works it has allowed me to view ever since their site was set up, so I don't see any evidence that they have become more conservative, at least in content displayed to people in the UK. On the other hand, their policy of restricting access based on the publication date being earlier than 1864 *does* exclude a lot of books which are public domain in the UK from being viewed in the UK -- and, oddly, they aren't moving the barrier forward each year, as they should (unlike the US, the public domain isn't frozen here, so new material is entering every year). It is just another example of US-based companies only dealing with non-US issues as a poorly considered afterthought, so it's not all that surprising :).
-- Jon Ingram From bruce at zuhause.org Fri Mar 24 16:37:28 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Fri Mar 24 16:37:31 2006 Subject: [gutvol-d] when they act surprised In-Reply-To: <361.812eaf.31559835@aol.com> References: <361.812eaf.31559835@aol.com> Message-ID: <17444.37064.48800.463423@celery.zuhause.org> Bowerbird@aol.com writes: > but it is even _more_ annoying when they act > _surprised_ when the consequences show up! > > > http://groups.yahoo.com/group/ebook-community/message/25166 > > um, yes, bruce, google is being overly cautious. > your scan-scraper script is one main reason why. > and your catalog is _another_ main reason why... I don't think I ever expressed surprise. Annoyance, perhaps. I don't believe that Google's decision to classify books as PD only when there's an explicit copyright, as opposed to including books with an explicit publishing date, has anything to do with my (or other people's) scan scraper, or my catalog. Furthermore, if they were so concerned about them, do you think they would have put about another 25,000 books online in the PD status, and added a select box to search for PD-only books? The PD-only search seems to miss some things, but I'm quibbling. BTW, Google is aware of my catalog, and the Google Books program manager mentioned my catalog as "Bruce Albrecht's catalog" (or something close to it) at a conference. They've never attempted to contact me. Interpret that as you will. From hyphen at hyphenologist.co.uk Sat Mar 25 02:46:03 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Mar 25 02:46:18 2006 Subject: [gutvol-d] Four F W Moorman books bound together & Any Tykes?
In-Reply-To: <000301c613f2$cbe0ab70$6401a8c0@ahainesp2600> References: <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com> <82.35f5d394.30eeecb0@aol.com> <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com> <3.0.5.32.20060107164958.0254d100@mail.chattanooga.net> <000301c613f2$cbe0ab70$6401a8c0@ahainesp2600> Message-ID: <516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com> I have just bought "Tales, Songs, and Plays of the Ridings", which contains "Tales of the Ridings", "More Tales of the Ridings", "Plays of the Ridings", and "Songs of the Ridings". This last, "Songs of the Ridings", I have already done for PG, Etext No 3232. On looking at what I have bought, these four books are simply a complete reprinting of the latest edition of the four, complete with title pages and even adverts, bound together as a single book. All are pre-1923. Would PG be happy if I were to submit the three not already done as three single books? The advantage would be to split the work into sections to use as a break from John Hartley books. Any other Yorkshire Tykes out there? Alison is helping me with proofreading; are there any more Tykes out there who would be willing to help with Yorkshire Dialect books? The DP route seems to have fallen at the first hurdle :-( -- Dave Fawthrop <dave hyphenologist co uk> "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". Sig (C) Copyright Public Domain From greg at durendal.org Sat Mar 25 04:28:50 2006 From: greg at durendal.org (Greg Weeks) Date: Sat Mar 25 05:00:06 2006 Subject: [gutvol-d] Four F W Moorman books bound together & Any Tykes?
In-Reply-To: <516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com> References: <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com> <82.35f5d394.30eeecb0@aol.com> <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com> <3.0.5.32.20060107164958.0254d100@mail.chattanooga.net> <000301c613f2$cbe0ab70$6401a8c0@ahainesp2600> <516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com> Message-ID: <Pine.LNX.4.63.0603250725210.6940@durendal.durendal.org> On Sat, 25 Mar 2006, Dave Fawthrop wrote: > Would PG be happy if I were to submit the three not already done as three > single books? The advantage would be to split the work into sections to > use as a break from John Hartley books. I've routinely split books apart that are reprinted this way. Sometimes it was necessary for rights reasons where I couldn't clear one or the other. -- Greg Weeks http://durendal.org:8080/greg/ From sly at victoria.tc.ca Sat Mar 25 08:00:30 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat Mar 25 08:00:34 2006 Subject: [gutvol-d] Four F W Moorman books bound together & Any Tykes? In-Reply-To: <516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com> References: <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com> <82.35f5d394.30eeecb0@aol.com> <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com> <3.0.5.32.20060107164958.0254d100@mail.chattanooga.net> <000301c613f2$cbe0ab70$6401a8c0@ahainesp2600> <516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com> Message-ID: <Pine.GSO.4.58.0603250753340.24838@vtn1.victoria.tc.ca> Traditionally, it's been up to the volunteer to decide what they would like to have done with this. Myself, I like to have shorter texts by the same author together if they have some subject resemblance, and are likely to be downloaded together by someone interested anyway.
For a recent example, see this text of Violet Jacob's poetry: http://www.gutenberg.org/etext/17933 Andrew On Sat, 25 Mar 2006, Dave Fawthrop wrote: > > > I have just bought "Tales, Songs, and Plays of the Ridings", which contains > "Tales of the Ridings", "More Tales of the Ridings", "Plays of the Ridings", > and "Songs of the Ridings". This last, "Songs of the Ridings", I have > already done for PG, Etext No 3232. > > On looking at what I have bought, these four books are simply a complete > reprinting of the latest edition of the four, complete with title pages and > even adverts, bound together as a single book. All are pre-1923. > > Would PG be happy if I were to submit the three not already done as three > single books? The advantage would be to split the work into sections to > use as a break from John Hartley books. > > Any other Yorkshire Tykes out there? Alison is helping me with > proofreading; are there any more Tykes out there who would be willing to > help with Yorkshire Dialect books? The DP route seems to have fallen at > the first hurdle :-( > > From sly at victoria.tc.ca Mon Mar 27 09:36:13 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon Mar 27 09:36:19 2006 Subject: [gutvol-d] Dutch texts downloaded Message-ID: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca> Taking a look at the top-100 list, http://www.gutenberg.org/browse/scores/top For the statistics of the last seven days, there seem to be more texts in Dutch than I have noticed before. It's good to see more interest in a wider range of languages...
Andrew From Bowerbird at aol.com Mon Mar 27 10:44:48 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Mar 27 10:44:53 2006 Subject: [gutvol-d] welcome to another week Message-ID: <2a7.a48791.31598ca0@aol.com> d-lib has an article on language-translation by machine: > http://www.dlib.org/dlib/march06/smith/03smith.html *** they also have an article on automatic "document recognition", the ability to have a computer ascertain the underlying structure of a document, what i've been talking about here for many years, which my detractors here have repeatedly termed "impossible": > http://www.dlib.org/dlib/march06/choudhury/03choudhury.html in time, people will laugh at how ridiculously stupid my detractors were. -bowerbird From jeroen.mailinglist at bohol.ph Mon Mar 27 14:27:13 2006 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Mon Mar 27 14:22:27 2006 Subject: [gutvol-d] Dutch texts downloaded In-Reply-To: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca> References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca> Message-ID: <442866C1.2050706@bohol.ph> Hi All, I noticed this as well, and it is getting less already. I don't know what is causing this, but thousands of copies have been downloaded. This almost starts to match the number of English downloads... Can I have referrer logs, to find out where they come from? A per language top 100 would be much appreciated. Jeroen. Andrew Sly wrote: >Taking a look at the top-100 list, >http://www.gutenberg.org/browse/scores/top >For the statistics of the last seven days, >there seem to be more texts in Dutch than >I have noticed before. > >It's good to see more interest in a wider >range of languages...
> > >Andrew From prosfilaes at gmail.com Mon Mar 27 20:35:09 2006 From: prosfilaes at gmail.com (David Starner) Date: Mon Mar 27 20:41:43 2006 Subject: [gutvol-d] Dutch texts downloaded In-Reply-To: <442866C1.2050706@bohol.ph> References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca> <442866C1.2050706@bohol.ph> Message-ID: <6d99d1fd0603272035w4998cfe2nc1c5d140924de47a@mail.gmail.com> On 3/27/06, Jeroen Hellingman (Mailing List Account) <jeroen.mailinglist@bohol.ph> wrote: > > Hi All, > > I noticed this as well, and it is getting less already. I don't know > what is causing this, but thousands of copies have been downloaded. This > almost starts to match the number of English downloads... > > Can I have referrer logs, to find out where they come from? A per > language top 100 would be much appreciated. Team Esperanto on DP was wondering about the most downloaded Esperanto books. It would be interesting at least as a one-time thing. From marcello at perathoner.de Tue Mar 28 07:29:03 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Mar 28 07:29:07 2006 Subject: [gutvol-d] Esperanto texts downloaded In-Reply-To: <6d99d1fd0603272035w4998cfe2nc1c5d140924de47a@mail.gmail.com> References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca> <442866C1.2050706@bohol.ph> <6d99d1fd0603272035w4998cfe2nc1c5d140924de47a@mail.gmail.com> Message-ID: <4429563F.9050400@perathoner.de> David Starner wrote: > Team Esperanto on DP was wondering about the most downloaded Esperanto > books. It would be interesting at least as a one-time thing. We keep records for the last 30 days only.
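The query that follows joins the download counts to the book-to-language table, keeps one language code, and sums the counts per book. As a rough, runnable illustration of that shape (not PG's actual schema, which is not shown in the thread), here is the same kind of query in Python's built-in sqlite3 module. The two tables are simplified and hypothetical, and the sketch assumes one row per book per day. The rows are invented toy data.

```python
import sqlite3


def downloads_by_language(conn, lang):
    """Return (book_id, total_downloads) pairs for one language, highest first.

    Same join / GROUP BY / ORDER BY shape as the psql session in this thread,
    but run against a simplified, hypothetical two-table layout.
    """
    query = """
        SELECT d.fk_books, SUM(d.downloads) AS total
        FROM book_downloads AS d
        JOIN mn_books_langs AS l ON l.fk_books = d.fk_books
        WHERE l.fk_langs = ?
        GROUP BY d.fk_books
        ORDER BY total DESC
    """
    return conn.execute(query, (lang,)).fetchall()


# Toy rows invented for illustration (assumed layout: one row per book per day).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_downloads (fk_books INTEGER, day TEXT, downloads INTEGER)")
conn.execute("CREATE TABLE mn_books_langs (fk_books INTEGER, fk_langs TEXT)")
conn.executemany("INSERT INTO mn_books_langs VALUES (?, ?)",
                 [(101, "eo"), (102, "eo"), (103, "nl")])
conn.executemany("INSERT INTO book_downloads VALUES (?, ?, ?)",
                 [(101, "day-1", 3), (101, "day-2", 4),
                  (102, "day-1", 5), (103, "day-1", 50)])

print(downloads_by_language(conn, "eo"))  # [(101, 7), (102, 5)]
```

Swapping the language code gives the per-language top list Jeroen asked for. A book catalogued under two languages is counted in each list.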
gutenberg=> SELECT scores.book_downloads.fk_books, SUM (downloads) AS downloads
gutenberg-> FROM scores.book_downloads, mn_books_langs
gutenberg-> WHERE mn_books_langs.fk_langs = 'eo'
gutenberg-> AND mn_books_langs.fk_books = scores.book_downloads.fk_books
gutenberg-> GROUP BY scores.book_downloads.fk_books
gutenberg-> ORDER BY downloads DESC;
 fk_books | downloads
----------+-----------
     8177 |       405
    16967 |       347
     7787 |       303
    17482 |       214
    11511 |       183
    17945 |       148
     8224 |       145
    17425 |       126
    17665 |        98
    11307 |        66
(10 rows)

gutenberg=>
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Tue Mar 28 08:35:46 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Mar 28 08:35:51 2006 Subject: [gutvol-d] Dutch texts downloaded In-Reply-To: <442866C1.2050706@bohol.ph> References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca> <442866C1.2050706@bohol.ph> Message-ID: <442965E2.2000600@perathoner.de> Jeroen Hellingman (Mailing List Account) wrote: > I noticed this as well, and it is getting less already. I don't know > what is causing this, but thousands of copies have been downloaded. This > almost starts to match the number of English downloads... > > Can I have referrer logs, to find out where they come from? A per > language top 100 would be much appreciated. Spam attack courtesy of New Horizons. See: http://www.spews.org/html/S2507.html They are collecting some texts to get past spam filters. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Tue Mar 28 09:50:30 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Mar 28 09:50:41 2006 Subject: [gutvol-d] another dose of reality Message-ID: <248.97317aa.315ad166@aol.com> x.m.l. fans here should read yet another dose of harsh reality, this time surprisingly from one of your leaders, simon st. laurent: > http://www.xml.com/lpt/a/2006/03/15/next-web-xhtml2-ajax.html it's interesting how simon uses the past tense for many of your buzzwords...
-bowerbird From sly at victoria.tc.ca Tue Mar 28 10:11:04 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Mar 28 10:11:09 2006 Subject: [gutvol-d] Esperanto texts downloaded In-Reply-To: <4429563F.9050400@perathoner.de> References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca> <442866C1.2050706@bohol.ph> <6d99d1fd0603272035w4998cfe2nc1c5d140924de47a@mail.gmail.com> <4429563F.9050400@perathoner.de> Message-ID: <Pine.GSO.4.58.0603281009140.5042@vtn1.victoria.tc.ca> In case anyone is interested besides David, here are the titles to go with the list of numbers Marcello has provided. Notice that the instructional books are quite clearly the most often downloaded.
PG number | Downloads | Title
     8177 |       405 | The Esperanto Teacher
    16967 |       347 | English-Esperanto Dictionary
     7787 |       303 | A Complete Grammar of Esperanto
    17482 |       214 | La Aventuroj de Alicio en Mirlando
    11511 |       183 | Robinsono Kruso
    17945 |       148 | Mark Twain: Tri Noveloj
     8224 |       145 | Fundamenta Krestomatio
    17425 |       126 | La Falo de Usxero-Domo
    17665 |        98 | Mia Kontrabandulo
    11307 |        66 | El la Biblio
On Tue, 28 Mar 2006, Marcello Perathoner wrote: > David Starner wrote: > > > Team Esperanto on DP was wondering about the most downloaded Esperanto > > books. It would be interesting at least as a one-time thing. > > We keep records for the last 30 days only.
From holden.mcgroin at dsl.pipex.com Tue Mar 28 11:27:55 2006 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Tue Mar 28 11:51:46 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <248.97317aa.315ad166@aol.com> References: <248.97317aa.315ad166@aol.com> Message-ID: <1143574076.4196.9.camel@steve-mcqueen> On Tue, 2006-03-28 at 12:50 -0500, Bowerbird@aol.com wrote: > x.m.l. fans here should read yet another dose of harsh reality, > this time surprisingly from one of your leaders, simon st. laurent: > > http://www.xml.com/lpt/a/2006/03/15/next-web-xhtml2-ajax.html > > it's interesting how simon uses the past tense for many of your > buzzwords... It seems like you're trying to pick a fight but I'll not bite. The article you post is irrelevant to current XML-based plans for PG. The whole point of the article is that direct delivery of XML to users with stylesheets has not yet happened. However, from what I've heard about current PG uses of XML, they tend towards using XML as a master format for storage on the server. Content is then converted to the user's desired format (plain text, HTML, XML, PDF, or random format X). It should be quite obvious that these two concepts are entirely different. The planned PG approach does not even need the user to have software capable of rendering XML with stylesheets.
All it requires of the user is a web browser for accessing PG's site in the first place and a viewer for whichever download format he/she chooses. ---- On a slight side note, I don't see the point of your aggressive posts to the list. Everybody here should be (is?) aiming towards the furthering of PG's goals. If whatever format you choose happens to be preferred in the long run, that's not a reason for gloating. The other people on this list are merely trying to help PG as much as we all hope you are. Regards, Holden From Bowerbird at aol.com Tue Mar 28 13:36:28 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Mar 28 13:36:39 2006 Subject: [gutvol-d] another dose of reality Message-ID: <29b.8221559.315b065c@aol.com> holden said: > It seems like you're trying to pick a fight nope. just sharing information. got any? > The article you post is irrelevant to current XML-based plans for PG. well, i guess that's a matter of opinion. i think it is relevant, in the sense that it talks about a bunch of much-hyped "solutions" which have not materialized, and now might _never_ come to fruition, including some that are counted on here. > The whole point of the article is that direct delivery of XML to users > with stylesheets has not yet happened. that's one of those "solutions", yes, but just one... > However, from what I've heard about current PG uses of XML, > they tend towards using XML as a master format for storage well yes, and as simon points out, that is a "fallback" position from the one that was originally staked out, which was the serving of x.m.l. files. but even this fallback rings hollow here, since the position that x.m.l. is needed for _conversion_to_multiple_formats_ was always a tenuous one, in the sense that many entities out there are already doing mass conversion of p.g. e-texts, even though they aren't in x.m.l. format. further, the x.s.l.t. methodology that has always been the crucial linchpin in the "strategy" of x.m.l.
advocates here is one of the ones that simon relegates to the past-tense. i find that interesting. and surely the examples of it that we have seen so far have shown it is badly lacking. add to the equation now that i'm showing, with real example-books, that z.m.l. can convert to multiple formats quite easily, on the user's desktop, via button-clicks, and the question becomes hard to avoid: what's the reason to apply heavy markup? don't get me wrong, i'm sure the hypesters will be able to invent one, they are creative that way, it's just that i think history should warn us to take these with a grain of salt... > On a slight side note, I don't > see the point of your aggressive posts first, my posts are not "aggressive". but if you _choose_ to interpret them that way, then i don't see _your_ point. wouldn't it be better just to skip them? why even bother reading them, holden? (let alone replying to them?) i mean, seriously, i could just as easily interpret _your_ posts as ad hominem, since you've said straight out that i am "trying to pick a fight" with "aggressive posts". but going down that road wouldn't be too productive, so i consciously choose not to; instead i have responded to your post with rebuttals that are on-topic and on-point, without diverting to attack your character. if you want a mud-fight or a flame-war, well i've shown i can do those things too; but why not friendly conversation instead? and don't get me wrong, i don't mean to be disingenuous here, because i fully understand that it's not pleasant to be on the losing side of a "you were wrong" comment. but that's the risk you take when you take a stand and you're wrong. but when someone is wrong, and you say they are wrong, that doesn't mean it's an "aggressive" post. > Everybody here should be (is?) aiming > towards the furthering of PG's goals. well yes, i believe that we all agree on that. the next issue is, "how do we obtain that?" 
on _that_ question, there is disagreement, which has been longstanding, and ugly too. and as much as some people might like to sweep this disagreement under the carpet, and have people forget what they said since things aren't looking too swell for their side, the disagreement still runs deep, and wide... meanwhile, little progress is being made on "the goals of p.g." that we all agree on... how long does t.e.i./x.m.l./whatever remain on the table as "the official plan" before it's required to show some action and results? how are "the goals of p.g." being served? those are questions i think you all should be asking yourselves. as for me, i'll just keep on plugging away with my little experiments, and maybe someday you'll realize that z.m.l. is best. > If whatever format you choose > happens to preferred in the long run, > that's not a reason for gloating. well, i certainly will not be "gloating", because i don't see much point in that. however, it _is_ important to keep in mind whose predictions were wrong, and right, and whose credibility was badly shredded, for future reference... you know, fool me once, your fault, but fool me twice, my fault, right? so surely you can't mind if we evaluate those matters quite closely, can you? besides, in retrospect, my methodologies will be so _obvious_ that no one will even consider their "invention" to be _special_; that it was "controversial" will be laughable. luckily, i'll be able to point to lots and lots of messages that people posted to _this_ listserve as solid evidence that some people didn't get it. (which is why i spent so much time discussing it! y'all would have been a lot smarter to _fold_ your losing hand much earlier in this poker-game than you did, instead of constantly raising your bets...) > The other people on this list are merely > trying to help PG as much as we all hope you are. and i give them full credit on the variable of "trying". 
that doesn't mean i'm gonna start paying attention to what they say they see in their crystal ball though, because we've come to learn that it's badly cracked... -bowerbird From jon at noring.name Tue Mar 28 13:57:05 2006 From: jon at noring.name (Jon Noring) Date: Tue Mar 28 13:57:12 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <1143574076.4196.9.camel@steve-mcqueen> References: <248.97317aa.315ad166@aol.com> <1143574076.4196.9.camel@steve-mcqueen> Message-ID: <1152284438.20060328145705@noring.name> Holden wrote: > Bowerbird wrote: >> x.m.l. fans here should read yet another dose of harsh reality, >> this time surprisingly from one of your leaders, simon st. laurent: >> http://www.xml.com/lpt/a/2006/03/15/next-web-xhtml2-ajax.html >> >> it's interesting how simon uses the past tense for many of your >> buzzwords... > The article you post is irrelevant to current XML-based plans for PG. > The whole point of the article is that direct delivery of XML to users > with stylesheets has not yet happened. The difficulty of adapting other-than-HTML XML documents to web browsers is that web browsers are largely limited, by historical development (and the resultant inertia of the installed base of web browsers), to the HTML "paradigm." CSS partially helps with presentational "interpretation" of many XML elements using the CSS 'display' property. But CSS 'display' does not include values (nor should it for reasons best explained another time) for the following important "bread and butter" web features: 1) hypertext linking, and 2) image and multimedia embedding. There's also the issue that table markup can be rendered through CSS 'display' only if it follows the HTML table model.
In addition, the HTML paradigm does not natively handle the TEI <note> and DocBook <Note> elements (and similar constructs found in *many* established markup vocabularies *except* HTML), which essentially place annotative content in the main flow of the text at the point of reference. Such annotative content is not intended to be displayed as part of the main flow of the document, but to be extracted and somehow rendered outside the main flow. (CSS2.1 may be used to float and move such inline annotative content, but there's no native recognition in HTML of such inline annotative content -- it's one of the bigger mistakes, and understandable given the time I suppose, that the original HTML folk made when they invented HTML and built rudimentary browsers which essentially locked in how web browsers are to work. They probably thought that since the annotative content can be placed elsewhere and linked to using <a>, why support inline annotative content? It was probably programming expediency more than anything.) XLink was primarily designed to be a vocabulary-independent way to add hypertext linking and image/multimedia embedding to XML documents (I think it could also be used for inline annotative content, but this is less appealing for various reasons.) Unfortunately, since everyone has been using HTML for so long, and HTML already provides the <a>, <img> and <object> elements, there's been little incentive to add XLink support to browsers. A sort of Catch-22 situation. Mozilla/Firefox support a limited subset of XLink sufficient to enable hypertext linking as the following demo will show (of course, the link only works using Mozilla-based browsers): http://www.windspun.com/demoxml/demolink.xml Unfortunately, the full XLink spec is quite complex (because it was designed to do everything except maybe toast bread) and this has acted as a further impediment to its adoption.
There are quite simple subsets that could be implemented though, and implementation is fairly straightforward, even of the full XLink. XLink is implemented in several XML-based applications, but except for the limited Mozilla support as described above, XLink has not yet reached the web browser world. (Note that CSS2 may be used to embed images within XML documents, but this is a kludge since this violates the general principle of separation of content from presentation (markup should be used to embed the images, which are content, and not use CSS since documents need to stand alone without CSS -- that is, no content references should be placed into CSS -- lose the CSS, lose content.) For an example using CSS for this purpose, which works only in Opera and Mozilla/Firefox: http://www.windspun.com/demoxml/embedimage.xml ) Handling non-HTML table models is a more complex issue, and one which has no easy answer except either conforming all table markup in XML documents to the HTML model (where CSS 'display' may then be used for visual presentation), or adding multi-table-model support to web browsers. Inline annotative content is also a pretty sticky area. The XHTML 2.0 folk appeared to be close to adding something like the TEI <note> tag to XHTML 2.0, but it appears they backed away from that, I suppose because of the inertia of the current installed base of web browsers. (This can be handled by CSS2-aware web browsers by simply setting certain CSS properties for default handling.) At one time I had a demo illustrating how to get inline annotative content to be displayed to the side (such as in a sidebar), but I can't find that demo. :^( (Second note: XML Namespaces is a mechanism by which HTML markup may be embedded within XML documents. So one could add something like <xhtml:a>, <xhtml:img>, and <xhtml:object>, but then these are not vocabulary-independent ways to add such functionality. XLink is the better long-term solution because it is XML-generic.) 
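Jon's point that XLink makes links recognizable independently of the element vocabulary can be illustrated with a short Python sketch using the standard library's ElementTree. The element names (book, para, ref, figure) and the file names are hypothetical. Only the XLink namespace URI and the xlink:type, xlink:href and xlink:show attribute names come from the XLink specification. The sketch accepts elements that carry xlink:href with xlink:type either absent or set to "simple". Whether an absent type should count depends on which XLink version you target.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C XLink specification.
XLINK_NS = "http://www.w3.org/1999/xlink"


def xlink_simple_links(xml_text):
    """Find simple XLinks in any XML vocabulary.

    Returns (element_tag, href, show) tuples. The element name is irrelevant:
    a link is recognized purely by its namespaced xlink:* attributes.
    """
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        href = elem.get(f"{{{XLINK_NS}}}href")
        # Assumption of this sketch: a missing xlink:type is treated as "simple".
        link_type = elem.get(f"{{{XLINK_NS}}}type", "simple")
        if href is not None and link_type == "simple":
            # xlink:show="embed" asks for inclusion (e.g. an image);
            # "replace" or "new" describe ordinary navigation.
            found.append((elem.tag, href, elem.get(f"{{{XLINK_NS}}}show")))
    return found


# Hypothetical non-HTML document: <ref> and <figure> mean nothing to a browser.
DOC = """<book xmlns:xlink="http://www.w3.org/1999/xlink">
  <para>See <ref xlink:type="simple" xlink:href="#ch2">chapter two</ref>.</para>
  <figure xlink:type="simple" xlink:href="plate1.png" xlink:show="embed"/>
</book>"""

for tag, href, show in xlink_simple_links(DOC):
    print(tag, href, show)
```

A browser with XLink support would perform the equivalent lookup and then act on xlink:show, navigating for the link and embedding for the figure. The sketch only finds the links.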
Anyway, the bottom line is that the biggest impediment to visual presentation of "arbitrary" XML markup in web browsers is not XML, but the inertia of web browser developers, who have not implemented a few small things, such as XLink support (even a subset sufficient for hypertext linking and image/multimedia embedding). There's nothing inherently bad about XLink that makes it difficult to enable -- it's simply one of those Catch-22 things, combined with some "political" aspects I won't go into here. > However, from what I've heard about current PG uses of XML, they tend > towards using XML as a master format for storage on the server. Content > is then converted to the user's desired format (plain text, HTML, XML, > PDF, or random format X). Yes. I observe that the main thrust these days in both PG and DP is to develop a well-defined (constrained) subset of TEI for markup. XSLT can then be used to generate web-browser-friendly XHTML markup, as well as other formats, such as plain text. Except for the issues of hypertext linking, image/multimedia embedding, table models, and inline annotative content as discussed above, TEI itself may be natively rendered in advanced CSS2-aware web browsers (such as Opera and Mozilla/Firefox). And as noted above, one has some limited ability to use CSS to float inline annotative content from the main flow to somewhere outside the main flow. (Damn, wish I could find that example illustrating this.) Here are a couple of CSS stylesheets already developed for rendering TEI: http://xml.web.cern.ch/XML/www.tei-c.org/Stylesheets/ (Been looking for examples of using these style sheets in action, but haven't found any yet. Anyone?) > On a slight side note, I don't see the point of your aggressive posts to > the list. Everybody here should be (is?) aiming towards the furthering > of PG's goals. If whatever format you choose happens to preferred in the > long run, that's not a reason for gloating.
The other people on this > list are merely trying to help PG as much as we all hope you are. Well, Bowerbird believes mastering PG/DP texts in XML is a bad idea, so he's trying to convince PG/DP to instead embrace regularized plain text, notably his ZML system. He's using all the weapons he can muster. I don't see ZML being used for mastering, but I do see ZML rules being used for regularizing the plain text output from a transformation of the XML master. Jon From holden.mcgroin at dsl.pipex.com Tue Mar 28 15:00:55 2006 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Tue Mar 28 15:00:58 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <29b.8221559.315b065c@aol.com> References: <29b.8221559.315b065c@aol.com> Message-ID: <1143586855.4196.57.camel@steve-mcqueen> Bowerbird, I hope you don't mind if I snip your long post. I was unable to find specific sections which would make for a suitable reply, so I am replying to your post as a whole. The flaw in your argument, I find, is your suggestion that since PG's XML teams have so far failed to produce any output, XML must be a bad strategy for PG. Your ZML-based strategy has brought forth results and so is better. What is XML? XML is merely a method for devising markup languages. No more, no less. If one project uses XML and, coincidentally, happens to be slow, while another does not and happens to be faster, that does not mean we can attribute such a speed difference to the different languages used, particularly when other factors differ between the two projects. Why has your ZML-based project brought forth results so quickly while others have not? Simply because you are doing a different task to them. Your project is essentially taking standard PG texts and automatically formatting them. From what I've read of the XML team's efforts, they are going back to the original books to ensure they get the formatting from the original books. There is, of course, a trade-off.
Going back to the original books to get missing formatting information is extremely time-consuming but guarantees accurate results. Your approach does not guarantee accurate results, but settles for well-formatted results in most cases, the advantage of which is that it is fast. Blaming a slow start on XML is to miss the point. The XML team could just as well use automatic transformations to convert from PG texts to their XML format, if they chose to. However, their opinion (they are correct) is that such automatic translations as you are using cannot capture the full complexity of each book and will not work on every book. Moreover, choosing/defining a suitable format which is capable of retaining every formatting nuance of any given text is not an enviable task. So, please take this as my plea for calm. You and the XML team are working towards different goals. This is not a zero-sum game. If either team produces a great product, it does not come at the expense of the other team. If anything, why not just stop trying to disparage the other team and just get on with producing texts in the best way that you know? Regards, Holden From jon at noring.name Tue Mar 28 15:04:52 2006 From: jon at noring.name (Jon Noring) Date: Tue Mar 28 15:04:56 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <29b.8221559.315b065c@aol.com> References: <29b.8221559.315b065c@aol.com> Message-ID: <83037338.20060328160452@noring.name> Bowerbird wrote: > further, the x.s.l.t. methodology that has always been the crucial > linchpin in the "strategy" of x.m.l. advocates here is one of the > ones that simon relegates to the past-tense. i find that > interesting. Do you think Simon would agree with your assessment of what he is saying? I think you are putting words into his mouth that he did not say or mean. XSLT is being *massively* used in quite a few XML applications, and successfully so.
No doubt XSLT has its share of problems, as all human-made systems do, but such problems have not stopped it from being used in real-world systems. XSLT is not a theoretical spec -- it is definitely not "vapour." DocBook is one notable success story. O'Reilly uses DocBook in much of its publishing workflow (it was interesting to hear Tim O'Reilly speak at Reading 2.0 -- he's a super-pragmatic person -- they use DocBook and XSLT/XSL-FO because it *makes sense to*.) Rosetta Solutions and other document conversion houses are moving fast to mastering in XML (Rosetta Solutions is using DocBook) and using XSLT (and the related XSL-FO) for outputting in various formats. It's been eye-opening to talk with the several conversion houses (as we have been doing for both OpenReader and LibraryCity.) I also recall seeing a couple of online book projects in academia which master in TEI and use XSLT to generate XHTML and other formats. A Google search for "TEI XSLT" turned up 168,000 pages. Have fun. If PG/DP has failed so far to move to TEI-based (or other XML vocabulary) mastering, it has little to do with XML, XSLT, etc. It's simply the limited time of the volunteers. I notice that things in PG-Land tend to move slowly anyway in most areas, particularly when it comes to change. Look at the problems you've had in getting text errors corrected! (Although maybe that's due to not submitting error reports to the right place.) DP is where most of the action is taking place these days, but even there, DP's long-planned move to a next-gen system (which includes a uniform XML-based mastering) appears to have been put on hold as well (or they're doing it in smaller increments.) They're too busy producing texts. It is the tyranny found in every underfunded volunteer organization (and even well-funded orgs): change tends to take a long time unless some bright light steps forward to make something happen.
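The "master in XML, convert on demand" workflow Jon and Holden describe is usually done with XSLT. Python's standard library has no XSLT processor, so this sketch swaps in plain ElementTree tree-walking to derive both plain text and XHTML from a single TEI-like master. The element set is a simplified, namespace-free subset chosen for illustration. Real TEI P5 documents use the http://www.tei-c.org/ns/1.0 namespace. Upper-casing chapter headings in the text output is an arbitrary choice, not a PG convention.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, namespace-free TEI-like master (illustration only).
MASTER = """<TEI><text><body>
  <div type="chapter">
    <head>Chapter I</head>
    <p>It was a dark night.</p>
    <p>The rain   fell.</p>
  </div>
</body></text></TEI>"""


def norm(elem):
    """All text inside an element, with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def to_plain_text(root):
    """One output format: headings in capitals, blank line between blocks."""
    lines = []
    for div in root.iter("div"):
        head = div.find("head")
        if head is not None:
            lines += [norm(head).upper(), ""]
        for p in div.findall("p"):
            lines += [norm(p), ""]
    return "\n".join(lines).rstrip() + "\n"


def to_xhtml(root):
    """Another output format from the same master: XHTML fragments."""
    parts = []
    for div in root.iter("div"):
        parts.append(f'<div class="{escape(div.get("type", ""))}">')
        head = div.find("head")
        if head is not None:
            parts.append(f"<h2>{escape(norm(head))}</h2>")
        for p in div.findall("p"):
            parts.append(f"<p>{escape(norm(p))}</p>")
        parts.append("</div>")
    return "\n".join(parts)


root = ET.fromstring(MASTER)
print(to_plain_text(root))
print(to_xhtml(root))
```

The point is the shape, not the tooling: both outputs are derived from the same structural markup, so a correction made once in the master appears in every format. An XSLT stylesheet per output format would play the role of each function here.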
You will no doubt argue, and there is merit in your argument, that your ZML system (which is essentially regularized plain text) is the answer to all PG's and DP's woes, but then you have to *show* that for all the things they'd like to do with their texts in the long-term future, ZML has sufficient structural resolution. But your approach to convincing others so far reminds me a lot of the famous advertising slogan of "Ralph's Pretty Good Grocery" (in Garrison Keillor's mythical small town of Lake Wobegon): "If you can't find it at Ralph's, you can probably get along without it." What's needed on both sides of the debate is a clear-cut requirements list of exactly what the "master" format is to accomplish/fulfill. That list will then determine whether the simpler regularized text approach is sufficient, or if an XML-based approach is called for. From my study of related systems over the last few years, the XML-based approach is worth the extra work to get there, provided the XML vocabulary is properly chosen and consistently applied. So your saying "trust me, ZML is sufficient" is itself an insufficient statement. It's like George Bush saying "trust me, the invasion of Iraq is justified." You even come across like George Bush who "knows" what's good for us but doesn't bother to explain why. Just "trust me, I know what's good for you." Of course, as just noted, there's no agreed-to requirements list on which to base any important decision, so this debate is sort of being conducted in the dark. Nevertheless, several of the main players in DP and PG have a pretty good intuitive feeling that regularized text is not sufficient. Since XML, properly done, will always surpass ZML in document structure resolution, the conservative position is XML (better to have more machine-readable document structure than less -- one can always later scale back on the markup if found unnecessary.
But if there are a million texts with insufficient structural resolution, then that's a BIG problem.) Also, developing a killer "viewer-app" system for ZML is not sufficient to prove the merit of ZML, either, since visual presentation is simply one use of digital texts. There are other uses, such as non-visual presentation, inter-publication linking, annotation, searching/data-mining, machine translation, etc. There are no doubt uses not yet recognized which may require more, not less, document structural identification. Each use adds its own set of requirements. Jon Noring From Bowerbird at aol.com Tue Mar 28 15:19:18 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Mar 28 15:19:24 2006 Subject: [gutvol-d] another dose of reality Message-ID: <1ad.495c14b8.315b1e76@aol.com> holden said: > So, please take this as my plea for calm. i'm _perfectly_ calm, holden... :+) sitting back and eating pudding, as a matter of fact. tastes good! ;+) but i don't even mind if people want to spend their volunteer time doing x.m.l. there will be a time and place for it too, and i wish them the best of luck with it. -bowerbird From joshua at hutchinson.net Tue Mar 28 18:02:04 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Mar 28 17:57:09 2006 Subject: [gutvol-d] another dose of reality Message-ID: <20060329020204.641E1DA59F@ws6-6.us4.outblaze.com> > ----- Original Message ----- > From: "Holden McGroin" <holden.mcgroin@dsl.pipex.com> > > The flaw in your argument, I find, is your suggestion that since PG's > XML teams have so far failed to produce any output, XML must be a bad > strategy for PG. Your ZML-based strategy has brought forth results and > so is better. > PGDP *has* produced some XML-based books.
Here is a short list of some of them (there are at least a few more, but this is a list I was able to put together quickly). I'd also like to point out that this *partial* list is still far more than bowerbird's zml efforts.
http://www.gutenberg.org/etext/16697 - Epistle to the Son of the Wolf by Bahá'u'lláh
http://www.gutenberg.org/etext/16939 - Gems of Divine Mysteries by Bahá'u'lláh
http://www.gutenberg.org/etext/16940 - Gleanings from the Writings of Bahá'u'lláh by Bahá'u'lláh
http://www.gutenberg.org/etext/16941 - The Hidden Words of Bahá'u'lláh by Bahá'u'lláh
http://www.gutenberg.org/etext/16983 - The Kitáb-i-Íqán by Bahá'u'lláh
http://www.gutenberg.org/etext/16523 - The Kitáb-i-Aqdas by Bahá'u'lláh
http://www.gutenberg.org/etext/16984 - Prayers and Meditations by Bahá'u'lláh
http://www.gutenberg.org/etext/16985 - The Proclamation of Bahá'u'lláh by Bahá'u'lláh
http://www.gutenberg.org/etext/16986 - The Seven Valleys and the Four Valleys by Bahá'u'lláh
http://www.gutenberg.org/etext/17309 - The Summons of the Lord of Hosts by Bahá'u'lláh
http://www.gutenberg.org/etext/17310 - Tablets of Bahá'u'lláh Revealed after the Kitab-i-Aqdas by Bahá'u'lláh
http://www.gutenberg.org/etext/15697 - True Stories of History and Biography by Nathaniel Hawthorne
JHutch From marcello at perathoner.de Wed Mar 29 07:56:03 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Mar 29 07:56:07 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <29b.8221559.315b065c@aol.com> References: <29b.8221559.315b065c@aol.com> Message-ID: <442AAE13.6050401@perathoner.de> Bowerbird@aol.com wrote: > add to the equation now that i'm showing, > with real example-books, that z.m.l. can > convert to multiple formats quite easily, > on the user's desktop, via button-clicks, You forgot to say that the "user's desktop" has to be the computer of your alleged girlfriend, a 1991 Mac running System 7.5.
Because on everybody else's desktop it just prints the splash screen and crashes. > how long does t.e.i./x.m.l./whatever remain > on the table as "the official plan" before it's > required to show some action and results? There is no official plan. If you can contribute a working technology and convince the people at DP to adopt it, you win. If not, you lose. That's that. But wait! All you did in the last three years was to steal the time and to insult everybody who was trying to do some real work. You'll have a hard time getting people to adopt your gadgets because everybody hates you. And that's a dose of reality for you. What about starting a book distribution site yourself? Servers are cheap. A genius like you will convert the PG library into all kinds of formats in no time at all. Go ahead instead of wasting your precious time with blockheads like us! -- Marcello Perathoner webmaster@gutenberg.org

From creeva at gmail.com Wed Mar 29 08:16:11 2006 From: creeva at gmail.com (Brent Gueth) Date: Wed Mar 29 08:21:59 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <442AAE13.6050401@perathoner.de> Message-ID: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> Now I'm not going to get in the middle of this ongoing feud, but what readers do we have for each format? The TEI formats are listed there, but what do you use to read them so you do not see all the markup encoding? Firefox shows the markup when you open the file. Notepad shows the markup (to be expected). MS Word just crashes upon opening because it says that there are problems with the contents. Going to the Gutenberg help page and looking under the formats listed, there is no information on it (this should have been the first thing done before placing a new format on the site). I understand that behind the scenes there are tools to get these things handled, but why put them on the public site if the tools are not available and a description is not available on how to use them?
From joshua at hutchinson.net Wed Mar 29 08:42:06 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Mar 29 08:36:19 2006 Subject: [gutvol-d] another dose of reality Message-ID: <20060329164206.9C3394F533@ws6-5.us4.outblaze.com> > ----- Original Message ----- > From: "Brent Gueth" <creeva@gmail.com> > > Now I'm not going to get in the middle of this on going feud, but what > readers do we have for each format? The TEI formats are listed there but > what do you use to read them so you do not see all the markup encoding. > Firefox shows the markup when you open the file. Notepad shows the markup > (to be expected). MS Word just crashes upon opening because it says that > there are problems with the contents. > Good question, Brent. The answer is that TEI is not really meant to be read by humans (just as HTML source code is not really meant to be read by humans). The great thing about those TEI docs is that I only produced 1 file (the TEI document) and then a server automatically generated the ASCII, HTML and PDF formats. I didn't have to manually fiddle with multiple files. The TEI file is a text file, so technically you CAN read it in Notepad (or any text editor of your choice), but it won't be all that pretty. Hopefully, the resulting files *from* it will be pretty, though. The reason TEI is nice for this is that it provides a consistent markup method that is fairly easy for a computer to transform into "output" formats. If you have any further questions, fire away. I'll be happy to try to answer them.
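
The single-source workflow Josh describes, one marked-up master file from which the plain-text and HTML editions are generated automatically, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual PG conversion tools: the master file is a hypothetical, heavily simplified fragment loosely modelled on TEI P4 (root element TEI.2), and only the head, p and hi elements are handled.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI-like master file (loosely modelled on
# TEI P4, whose root element is TEI.2). Real PG TEI files are far richer.
SOURCE = """<TEI.2>
  <text>
    <body>
      <div>
        <head>Chapter I</head>
        <p>It was a <hi>dark</hi> and stormy night.</p>
        <p>The rain fell in torrents.</p>
      </div>
    </body>
  </text>
</TEI.2>"""


def inline_text(elem, esc, wrap_hi):
    """Flatten mixed content: escape text with esc, wrap <hi> via wrap_hi."""
    parts = [esc(elem.text or "")]
    for child in elem:
        inner = inline_text(child, esc, wrap_hi)
        parts.append(wrap_hi(inner) if child.tag == "hi" else inner)
        parts.append(esc(child.tail or ""))
    return "".join(parts)


def to_plain_text(root):
    """Plain-text edition: upper-case headings, _underscored_ emphasis."""
    lines = []
    for elem in root.iter():
        if elem.tag == "head":
            lines += [inline_text(elem, str, lambda s: s).upper(), ""]
        elif elem.tag == "p":
            lines += [inline_text(elem, str, lambda s: "_" + s + "_"), ""]
    return "\n".join(lines).rstrip() + "\n"


def to_html(root):
    """HTML edition: <h2> headings, <em> emphasis, escaped text."""
    out = []
    for elem in root.iter():
        if elem.tag == "head":
            out.append("<h2>" + inline_text(elem, html.escape, lambda s: s) + "</h2>")
        elif elem.tag == "p":
            body = inline_text(elem, html.escape, lambda s: "<em>" + s + "</em>")
            out.append("<p>" + body + "</p>")
    return "\n".join(out)


root = ET.fromstring(SOURCE)  # parse the single master file once
print(to_plain_text(root))
print(to_html(root))
```

Because both renderers walk the same parsed tree, a correction made once in the master file appears in every generated edition, which is the advantage Josh points to.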
Josh

From creeva at gmail.com Wed Mar 29 08:45:38 2006 From: creeva at gmail.com (Brent Gueth) Date: Wed Mar 29 08:44:59 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <20060329164206.9C3394F533@ws6-5.us4.outblaze.com> Message-ID: <008601c65350$38bf5470$6755fea9@Corp.Symantec.Com> If it is supposed to be machine readable and the tools are not released with a method of quickly editing them or a description of how they are usable to the community, why are they posted on the web site? I would think this would lead to confusion for people that are privy to this mailing list and the inside discussion. This isn't an argument for or against any format, but to have something customer-facing (you don't sell anything, but eyeballs are your customers and proof of the work you are doing) you should have a customer-facing reason or explanation to have them there. Whether it be the TEI format or the ZML format or an XYZ format, there should be clear instructions of what the format is and how it is handled if it is launched on the site. HTML, TXT, and PDF are considered ubiquitous in the internet age, but you have the Plucker format and an explanation of what it is. I would also have assumed when the first Plucker texts were launched there was an explanation of what the format is, and hopefully the little html link next to the download explaining what it is was there at launch. To lessen doubt that consumers are missing something to view these texts, there should be a link explaining what TEI is.

From joshua at hutchinson.net Wed Mar 29 09:12:21 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Mar 29 09:07:10 2006 Subject: [gutvol-d] another dose of reality Message-ID: <20060329171222.405E42F94D@ws6-3.us4.outblaze.com> We do have tools and documentation on the website, but I'll admit there is no explanation link on the download page. Here is a link to the TEI tools and documentation.
http://pgtei.pglaf.org/marcello/0.4/ Josh

From marcello at perathoner.de Wed Mar 29 09:08:30 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Mar 29 09:08:39 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> Message-ID: <442ABF0E.6000607@perathoner.de> Brent Gueth wrote: > Going to the gutenberg help page and looking under formats listed there is > no information on it (this should have been the first thing done before > placing a new format on the site). Most of the formats are not described on the help page. Reason: I don't have the time to do it. If you want to help, here is a list of all formats offered on PG. Provide a suitable description (short and concise) of any and I'll post it on the site.

gutenberg=> SELECT * from filetypes order by pk;
     pk     |           filetype           | sortorder |       mediatype
------------+------------------------------+-----------+------------------------
 ?          | Unspecified                  |       100 |
 avi        | MS Video                     |        10 |
 css        | CSS Stylesheet               |        10 |
 doc        | MS Word Document             |        10 |
 dvi        | TeX Device Independent       |        10 |
 eps        | Encapsulated PostScript      |        10 | application/postscript
 gif        | GIF Picture                  |        10 | image/gif
 html       | HTML                         |         5 | text/html
 index      | Index                        |         3 |
 iso        | ISO CD/DVD Image             |         7 |
 jpg        | JPEG Picture                 |        10 | image/jpeg
 license    | License                      |         2 |
 lit        | MS Lit for PocketPC          |        10 |
 ly         | LilyPond                     |        10 |
 md5        | MD5 Checksum                 |         8 |
 mid        | MIDI                         |        10 |
 mp3        | MP3 Audio                    |        20 | audio/mpeg
 mpg        | MPEG Video                   |        10 | video/mpeg
 mus        | Finale                       |        10 |
 nfo        | Proprietary `Folio' format   |        50 |
 pageimages | Raw Page Images              |        50 |
 pdb        | Palm Database                |        10 | application/vnd.palm
 pdf        | Adobe PDF                    |        10 | application/pdf
 png        | PNG Picture                  |        10 | image/png
 prc        | Palm Database                |        10 | application/vnd.palm
 ps         | PostScript                   |        10 | application/postscript
 ps2        | PostScript Level 2           |        10 | application/postscript
 qt         | Quicktime Video              |        10 | video/quicktime
 readme     | Readme                       |         1 |
 rtf        | MS Rich Text Format          |        10 | text/rtf
 sib        | Sibelius                     |        10 |
 svg        | SVG                          |        10 |
 tei        | TEI Text Encoding Initiative |        50 |
 tex        | TeX                          |        10 |
 tiff       | TIFF Picture                 |        10 | image/tiff
 tr         | Tome Raider                  |        10 |
 txt        | Plain text                   |         7 | text/plain
 wav        | MS Wave Audio                |        10 |
 xml        | XML                          |        50 | text/xml
 xsl        | XSLT Stylesheet              |        10 |
(40 rows)

-- Marcello Perathoner webmaster@gutenberg.org

From Bowerbird at aol.com Wed Mar 29 09:29:28 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Mar 29 09:29:44 2006 Subject: [gutvol-d] another dose of reality Message-ID: <141.584d3e4a.315c1df8@aol.com> joshua said: > PGDP *has* produced some XML based books. > Here is a short list of some of them > (there are at least a few more, but this > is a list I was able to put together quickly). well yes, and you are to be commended, josh, for your work in actually doing some markup.
as should marcello for the groundwork he has done, and jim tinsley for setting a sensible adoption policy (which is very important and not as easy as it looks). without you three, the little progress that _has_ been made would not have been accomplished. and i wish you the best in your future efforts. > I'd also like to point out that this *partial* list > is still far more than bowerbird's zml efforts. well, you might want to enjoy that "lead" as much as possible while you still can... :+) *** as long-time subscribers know, we've already done the discussion thing on this topic for long enough, it's time for pudding now, so i'll limit my replies to:
1) jon, that was a lot of verbiage, but i don't think you said anything of substance, so no reply for you.
2) holden, i think you mischaracterized my aims, since i'm not merely "reworking" old e-texts but laying out a workflow to handle new ones as well, both digitized paper-books and "born-digital" ones. (my next few examples will be in the latter category.)
3) marcello, i don't know if it's your italian "nature" or your german "nurture", but you certainly seem to have a penchant for disinformation and "the big lie". i always feel slimed after i read one of your posts. ick!
anyway, folks, no need to have this discussion again. anyone who wants to see it can go read the archives. it's pudding time now -- show proof, or go home... -bowerbird

From joey at joeysmith.com Wed Mar 29 10:44:22 2006 From: joey at joeysmith.com (joey) Date: Wed Mar 29 10:59:09 2006 Subject: File formats and the website (was Re: [gutvol-d] another dose of reality) In-Reply-To: <442ABF0E.6000607@perathoner.de> References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> <442ABF0E.6000607@perathoner.de> Message-ID: <20060329184422.GA30671@joeysmith.com> Marcello: How about linking to Wikipedia for those we don't have descriptions for? If that's something we're interested in, I'd even be willing to take a stab at creating new Wikipedia entries for those that don't exist in either location. Also, have we thought about making a PG wiki? It might take some of this load off of you, even if we do it as a "staged to the wiki, rolled out to 'production' on a given cycle after all changes in the current cycle have been examined by one of the following trusted people..." to prevent unsavory types from polluting the production site. I'd be more than willing to help set up, maintain, and cross-populate such, if it's something PG is interested in. Here are some Wikipedia links for you to use if you so choose. If you want me to make the additional entries, just let me know.
avi: http://en.wikipedia.org/wiki/.avi
css: http://en.wikipedia.org/wiki/Cascading_Style_Sheets
dvi: http://en.wikipedia.org/wiki/DVI_file_format
eps: http://en.wikipedia.org/wiki/Encapsulated_PostScript
gif: http://en.wikipedia.org/wiki/GIF
html: http://en.wikipedia.org/wiki/HTML
iso: http://en.wikipedia.org/wiki/ISO_image
lit: http://en.wikipedia.org/wiki/Microsoft_Reader
ly: http://en.wikipedia.org/wiki/GNU_LilyPond
md5: http://en.wikipedia.org/wiki/MD5
mp3: http://en.wikipedia.org/wiki/MP3
mus: http://en.wikipedia.org/wiki/Finale_notation_program
pdb: http://en.wikipedia.org/wiki/Palm_OS
pdf: http://en.wikipedia.org/wiki/Portable_Document_Format
prc: http://en.wikipedia.org/wiki/Palm_OS
ps: http://en.wikipedia.org/wiki/PostScript
qt: http://en.wikipedia.org/wiki/QuickTime
rtf: http://en.wikipedia.org/wiki/RTF
sib: http://en.wikipedia.org/wiki/Sibelius_notation_program
svg: http://en.wikipedia.org/wiki/SVG
tei: http://en.wikipedia.org/wiki/Text_Encoding_Initiative
tex: http://en.wikipedia.org/wiki/TeX
tiff: http://en.wikipedia.org/wiki/TIFF
tr: http://en.wikipedia.org/wiki/TomeRaider
xml: http://en.wikipedia.org/wiki/XML
xsl: http://en.wikipedia.org/wiki/XSL

Also, you may or may not want to apply any of the following to the database. These are brought over from the /etc/mime.types on my debian box, so apply salt to taste. [See, I can make food analogies too! ;)]

update filetypes set mediatype = 'video/x-msvideo' where pk = 'avi';
update filetypes set mediatype = 'text/css' where pk = 'css';
update filetypes set mediatype = 'application/msword' where pk = 'doc';
update filetypes set mediatype = 'application/x-dvi' where pk = 'dvi';
update filetypes set mediatype = 'application/x-iso9660-image' where pk = 'iso';
update filetypes set mediatype = 'audio/midi' where pk = 'mid';
update filetypes set mediatype = 'text/plain' where pk = 'readme';
update filetypes set mediatype = 'image/svg+xml' where pk = 'svg';
update filetypes set mediatype = 'audio/x-wav' where pk = 'wav';
update filetypes set mediatype = 'application/xml' where pk = 'xsl';

From marcello at perathoner.de Wed Mar 29 12:39:41 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Mar 29 12:39:46 2006 Subject: File formats and the website (was Re: [gutvol-d] another dose of reality) In-Reply-To: <20060329184422.GA30671@joeysmith.com> References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> <442ABF0E.6000607@perathoner.de> <20060329184422.GA30671@joeysmith.com> Message-ID: <442AF08D.5000702@perathoner.de> joey wrote: > How about linking to wikipedia for those we don't have descriptions for? > If that's something we're interested in, I'd even be willing to take a > stab at creating new Wikipedia entries for those that don't exist in either > location. What we need is a short description like those you find here: http://www.gutenberg.org/help/bibrec#format The average Wikipedia entry is too complex for people who just want to know which format to download. > Also, have we thought about making a PG wiki? There already is a wiki for the newsletter editors ... not used very often.
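
As a minimal sketch of the media-type cleanup discussed in this thread, the following Python uses an in-memory SQLite database as a stand-in for the PostgreSQL server, reproduces only five of the forty filetypes rows, and assumes the blank mediatype cells in the psql listing are NULLs (psql prints NULL and the empty string identically by default). It applies two of joey's proposed updates as parameterized statements, guarded so they only fill blanks, and then lists the formats that still lack a media type.

```python
import sqlite3

# In-memory stand-in for PG's filetypes table (assumptions: SQLite instead
# of PostgreSQL, and blank mediatype cells stored as NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE filetypes ("
    "pk TEXT PRIMARY KEY, filetype TEXT, sortorder INTEGER, mediatype TEXT)"
)
conn.executemany(
    "INSERT INTO filetypes VALUES (?, ?, ?, ?)",
    [
        ("css", "CSS Stylesheet", 10, None),
        ("html", "HTML", 5, "text/html"),
        ("svg", "SVG", 10, None),
        ("tei", "TEI Text Encoding Initiative", 50, None),
        ("txt", "Plain text", 7, "text/plain"),
    ],
)

# Two of the proposed mime.types mappings; the IS NULL guard means an
# existing, hand-checked value is never overwritten.
proposed = {"css": "text/css", "svg": "image/svg+xml"}
for pk, mediatype in proposed.items():
    conn.execute(
        "UPDATE filetypes SET mediatype = ? WHERE pk = ? AND mediatype IS NULL",
        (mediatype, pk),
    )
conn.commit()

# Formats that still have no media type after the update.
missing = [pk for (pk,) in conn.execute(
    "SELECT pk FROM filetypes WHERE mediatype IS NULL ORDER BY pk")]
print(missing)  # ['tei']
```

The guard is a design choice, not something proposed in the thread: it lets a batch of mime.types mappings be re-run safely without clobbering values someone has already corrected by hand.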
-- Marcello Perathoner webmaster@gutenberg.org

From jon at noring.name Wed Mar 29 12:48:20 2006 From: jon at noring.name (Jon Noring) Date: Wed Mar 29 12:48:27 2006 Subject: File formats and the website (was Re: [gutvol-d] another dose of reality) In-Reply-To: <442AF08D.5000702@perathoner.de> References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> <442ABF0E.6000607@perathoner.de> <20060329184422.GA30671@joeysmith.com> <442AF08D.5000702@perathoner.de> Message-ID: <1628983256.20060329134820@noring.name> Marcello wrote: > joey wrote: >> How about linking to wikipedia for those we don't have descriptions for? >> If that's something we're interested in, I'd even be willing to take a >> stab at creating new Wikipedia entries for those that don't exist in either >> location. > What we need is a short description like those you find here: > > http://www.gutenberg.org/help/bibrec#format > > The average Wikipedia entry is too complex for people who just want to > know which format to download. Well, whoever writes the short descriptions can link to Wikipedia articles describing the media types in more detail, and, where there's no Wikipedia article yet, write one. But I agree with Marcello that the first step is to write the short descriptions. Jon

From joey at joeysmith.com Wed Mar 29 13:04:17 2006 From: joey at joeysmith.com (joey) Date: Wed Mar 29 13:04:57 2006 Subject: File formats and the website (was Re: [gutvol-d] another dose of reality) In-Reply-To: <442AF08D.5000702@perathoner.de> References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> <442ABF0E.6000607@perathoner.de> <20060329184422.GA30671@joeysmith.com> <442AF08D.5000702@perathoner.de> Message-ID: <20060329210417.GB30671@joeysmith.com> On Wed, Mar 29, 2006 at 10:39:41PM +0200, Marcello Perathoner wrote: > joey wrote: > > >How about linking to wikipedia for those we don't have descriptions for?
> >If that's something we're interested in, I'd even be willing to take a > >stab at creating new Wikipedia entries for those that don't exist in either > >location. > > What we need is a short description like those you find here: > > http://www.gutenberg.org/help/bibrec#format > > The average Wikipedia entry is too complex for people who just want to > know which format to download. I can do that. I hadn't ever been to this page, so I didn't know what the expectation was. > >Also, have we thought about making a PG wiki? > > There already is a wiki for the newsletter editors ... not used very often. That's a "not interested"? As I've previously mentioned, I'd be glad to help maintain stuff, but it's hard when so much of it exists only in your head. Or is there some documentation or a mailing list I'm not aware of? For my part, I don't use the newsletter editor wiki because I'm not a newsletter editor. :)

From joey at joeysmith.com Wed Mar 29 13:20:51 2006 From: joey at joeysmith.com (joey) Date: Wed Mar 29 13:21:31 2006 Subject: File formats and the website (was Re: [gutvol-d] another dose of reality) In-Reply-To: <442AF08D.5000702@perathoner.de> References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> <442ABF0E.6000607@perathoner.de> <20060329184422.GA30671@joeysmith.com> <442AF08D.5000702@perathoner.de> Message-ID: <20060329212051.GC30671@joeysmith.com> So, before I get *too* far down this path, here's what I've come up with so far. Is this usable to you?

AVI: AVI files can contain both audio and video. They can generally be played with media players such as Windows Media Player, Winamp, or MPlayer. See <a href="http://en.wikipedia.org/wiki/.avi">here</a> for more information.

CSS: CSS (Cascading Style Sheets) files are generally used to make HTML pages look nice, and are not intended for direct viewing. Your web browser will find these files as referenced by the HTML files that use them. See <a href="http://en.wikipedia.org/wiki/Cascading_Style_Sheets">here</a> for more information.

DVI: The output format of a typesetting system called TeX. Generally more common on Unix-like platforms. Can be viewed using xdvi or Evince. See <a href="http://en.wikipedia.org/wiki/DVI_file_format">here</a> for more information.

EPS: Short for "Encapsulated PostScript", it can generally be viewed with any PostScript viewer. A free PostScript viewer is available <a href="http://www.cs.wisc.edu/~ghost/doc/AFPL/index.htm">here</a>. See <a href="http://en.wikipedia.org/wiki/Encapsulated_PostScript">here</a> for more information.

GIF: An image format generally viewable by any web browser. See <a href="http://en.wikipedia.org/wiki/GIF">here</a> for more information.

ISO: A logical copy of a CD-ROM or other optical media. Most CD/DVD authoring utilities can deal with ISO images. A free tool for mounting these images on a Windows machine as though they were inserted into a CD-ROM drive is available <a href="http://www.daemon-tools.cc/dtcc/download.php">here</a>. A tool for burning ISOs to a physical CD-R or CD-RW on Windows is available <a href="http://isorecorder.alexfeinman.com/isorecorder.htm">here</a>.

From Bowerbird at aol.com Wed Mar 29 13:49:14 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Mar 29 13:49:26 2006 Subject: [gutvol-d] any graphic designers out there? Message-ID: <2e8.47b10ad.315c5ada@aol.com> please help a design-challenged e-book programmer... check this out: > http://www.greatamericannovel.com/meyer/shot.html at the upper-left is a design for the title-page/cover. at the upper-right is the same thing, with coordinates, so you can tell how you'd suggest anything be moved... the upper-left design is repeated at lower-left, with an alternate design at lower-right. do you prefer it? this isn't just for this one cover, or i wouldn't bother...
it's concerning how i will write the all-purpose routine for formatting covers, so i'd like to do a good job of it, since it will be for thousands and thousands of books... i noticed josh uses left-justified headers in his .tei books; do people think that looks nice? or is the old-fashioned centering still the best way to go? (i think so, but i don't want to be too inflexible, so i'm willing to consider it all.) any other suggestions -- a splash of color or what-not? -- would be welcome as well to spruce up the look of this and move it into the digital world of the 21st-century e-book... while i'm at it, here's one of the backgrounds i've been using. > http://www.greatamericannovel.com/meyer/goodbook.jpg any feedback on that would be greatly appreciated, as would a reworking of your own design. (credit granted, naturally...) and here's a nice "page" background from brewster kahle: > http://www.greatamericannovel.com/meyer/leftblank.jpg combining my gutter with the page from brewster gives us: > http://www.greatamericannovel.com/meyer/blank.html (the colors don't match up, but you get the idea here; breaking the overall image up into pieces like these might be necessary.) anyway, if this is fun for anyone out there, have at it... -bowerbird

From prosfilaes at gmail.com Wed Mar 29 22:33:01 2006 From: prosfilaes at gmail.com (David Starner) Date: Wed Mar 29 22:33:04 2006 Subject: [gutvol-d] another dose of reality In-Reply-To: <008601c65350$38bf5470$6755fea9@Corp.Symantec.Com> References: <20060329164206.9C3394F533@ws6-5.us4.outblaze.com> <008601c65350$38bf5470$6755fea9@Corp.Symantec.Com> Message-ID: <6d99d1fd0603292233j7aa17276mefb93098c0da4b30@mail.gmail.com> On 3/29/06, Brent Gueth <creeva@gmail.com> wrote: > If it is supposed to be machine readable and the tools are not released with > a method of quickly editing them or a description on how they are usable to > the community, why are they posted on the web site. Because it is the format of choice of certain people, and if it is to be the master format for that etext, it should be available to everyone. It would be against the policy of PG to find the master format and only let certain people make changes to the master format and use it to regenerate all the different forms.

From c.shepard at yahoo.com Thu Mar 30 07:20:21 2006 From: c.shepard at yahoo.com (Chris Shepard) Date: Thu Mar 30 08:13:21 2006 Subject: [gutvol-d] rdfterms Message-ID: <20060330152021.45462.qmail@web38004.mail.mud.yahoo.com> Hi, The catalog.rdf file shows the pgterms namespace as living at http://www.gutenberg.org/rdfterms, but it's not there. (i.e., 404) Could someone send me the proper URL? Many thanks. "Computer Science is no more about computers than astronomy is about telescopes." -- E. W. Dijkstra

From marcello at perathoner.de Thu Mar 30 08:55:57 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Mar 30 08:56:02 2006 Subject: [gutvol-d] rdfterms In-Reply-To: <20060330152021.45462.qmail@web38004.mail.mud.yahoo.com> References: <20060330152021.45462.qmail@web38004.mail.mud.yahoo.com> Message-ID: <442C0D9D.1010501@perathoner.de> Chris Shepard wrote: > The catalog.rdf file shows the pgterms namespace as living at > http://www.gutenberg.org/rdfterms, but it's not there. (i.e., 404) > > Could someone send me the proper URL? A namespace is not a URL (though it very much looks like one). "The attribute's normalized value MUST be either an IRI reference -- the namespace name identifying the namespace -- or an empty string. The namespace name, to serve its intended purpose, SHOULD have the characteristics of uniqueness and persistence. It is not a goal that it be directly usable for retrieval of a schema (if any exists). Uniform Resource Names [RFC2141] is an example of a syntax that is designed with these goals in mind. However, it should be noted that ordinary URLs can be managed in such a way as to achieve these same goals." http://www.w3.org/TR/xml-names11/#ns-decl -- Marcello Perathoner webmaster@gutenberg.org

From gbuchana at rogers.com Thu Mar 30 16:31:12 2006 From: gbuchana at rogers.com (Gardner Buchanan) Date: Thu Mar 30 16:37:13 2006 Subject: [gutvol-d] any graphic designers out there? In-Reply-To: <2e8.47b10ad.315c5ada@aol.com> References: <2e8.47b10ad.315c5ada@aol.com> Message-ID: <442C7850.7030007@rogers.com> Hi there, Bowerbird@aol.com wrote: > the upper-left design is repeated at lower-left, with > an alternate design at lower-right. do you prefer it? > Yes, I like the lower-right version better. I like "--" rendered as an m-dash. I'm not fussy about turning "Copyright" into the "(C)" symbol, but suit yourself.
I think if you wanted that symbol, you would use the (C) markup. ============================================================ Gardner Buchanan <gbuchana@rogers.com> Ottawa, ON FreeBSD: Where you want to go. Today.

From bzg at altern.org Fri Mar 31 04:22:05 2006 From: bzg at altern.org (Bastien) Date: Fri Mar 31 05:39:05 2006 Subject: [gutvol-d] Re: File formats and the website In-Reply-To: <20060329212051.GC30671@joeysmith.com> (joey@joeysmith.com's message of "Wed, 29 Mar 2006 14:20:51 -0700") References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com> <442ABF0E.6000607@perathoner.de> <20060329184422.GA30671@joeysmith.com> <442AF08D.5000702@perathoner.de> <20060329212051.GC30671@joeysmith.com> Message-ID: <87psk2olci.fsf@tallis.ilo.ucl.ac.uk> joey <joey@joeysmith.com> writes: > So, before I get *too* far down this path, here's what I've come up with > so far. Is this usable to you? I think it's a pretty good start. May I suggest you have a look at this: http://www.openformats.org/ It may often be too didactic/normative for PG's purposes, but I think you can grab some useful content. Some excerpts:

* Plain text: http://www.openformats.org/en60 Plain text (ASCII) Whenever possible, just avoid using formatted text: using plain text (either ASCII or .txt format) guarantees complete access for everyone, regardless of their software, their operating system or the computer they are using. In your emails, if what is important to you is the content and not the formatting, send the text directly in the body of your message instead of sending it as an attachment. Plain text can carry no virus, it is extremely light, and it can easily be used to create tables (with tabs or commas) which any software is able to read.

* HTML: http://www.openformats.org/en61 Hyper Text Markup Language (HTML) HTML format is the standard language for the web, and it was defined by a standardizing international organization (the W3 Consortium).
HTML is a flexible universal format, rich and compact. Native HTML (with no JavaScript) can carry no virus and can be read on any platform. Note: The HTML code produced by Word is semi-proprietary, and it is prone to include information which cannot be displayed on all platforms. * TeX, LaTeX, DVI: http://www.openformats.org/en62 TeX, LaTeX and Device Independent Format (DVI) TeX is both a language to typeset documents and a programming language. Originally written to typeset mathematical documents in a professional manner, it is now used in many other areas. LaTeX is also a typesetting and programming language. It is actually a set of macros built on top of TeX that provides higher-level commands, somewhat as HTML is an application of SGML. DVI: a TeX or LaTeX source file must be compiled. The result of this compilation is in DVI format, readable on any platform. Most of the time, the result of the compilation will, in turn, be converted to PDF or PS. * OpenDocument: http://www.openformats.org/en62x1 OpenDocument is: a. An open, XML-based file format. b. An open standard, supported by the OASIS and ISO standards groups. c. The default file format for OpenOffice.org 2.0 and KOffice 1.4. d. A top prospect for an official format for the European Commission. e. Our best chance to fight vendor lock-in associated with proprietary formats. * RTF: http://www.openformats.org/en63 Rich Text Format (RTF) RTF was introduced by Microsoft to create a standard format for text formatting. It offers the same formatting variety as DOC, while being (at least in its native version) a format with public specifications. Most word-processing programs are capable of reading and writing this format, but because certain programs tend to use proprietary extensions of it, its compatibility remains uncertain.
* PS: http://www.openformats.org/en64 PostScript (PS) PostScript is a page description language, developed by Adobe in 1985, created for printing and widely used in typography. One of its advantages is that it is universal (it is independent from the format of the original file) and it cannot carry viruses. Unlike PDF, PostScript does not let you copy text viewed on screen and paste it into another application. It can be generated with compatible printer drivers (option: 'print to file') and with the Ghostscript program. * PDF: http://www.openformats.org/en65 Portable Document Format (PDF) PDF (Portable Document Format), developed by Adobe, is a document presentation format whose specifications are available on the web. It is a universal format (regardless of which platform and software are used to generate it), compatible with any printer, flexible (you can substitute fonts, add links, bookmarks, notes) and legible onscreen with the appropriate plugins. It can be generated with Adobe Acrobat, with the open source software Ghostscript, or created on the fly in a Unix environment. * JPEG: http://www.openformats.org/en66 Joint Photographic Experts Group (JPEG) JPEG is one of the most efficient picture compression formats currently available. This open format is very light and lets you choose the rate of data compression: the higher the compression rate, the lower the quality of the picture. JPEG compression is lossy and cumulative: the image is visibly degraded if you open it and save it again with a new compression rate. A variant of this format, progressive JPEG, allows you to optimise the time it takes to display the picture on the internet. The new JPEG 2000 standard, currently being defined, will allow for a better quality/compression ratio as well as the indexing of pictures with keywords.
* PNG: http://www.openformats.org/en67 Portable Network Graphics (PNG) PNG-8 and PNG-24 are two open formats which are also license-free. They represent the principal alternative to the GIF format, and were specially created to optimise the display of images on the internet. They allow data compression without loss of information and are supported by most browsers. A PNG file usually remains significantly larger than its JPEG equivalent; however, PNG is an advantageous replacement for GIF for images of 8 bits or less. * ... -- Bastien
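The plain-text excerpt says tables can be written with tabs or commas and read by any software. As a minimal sketch of that idea (using Python's standard `csv` module is an assumption, not something openformats.org prescribes; the helper names and sample rows are made up for illustration), a tab-separated table can be written and read back like this:

```python
import csv
import io

def write_tsv(rows: list[list[str]]) -> str:
    """Serialize rows as tab-separated plain text, one row per line."""
    buf = io.StringIO()
    # csv quotes any field containing the delimiter or the quote character,
    # so the output stays parseable; "\n" replaces the default "\r\n".
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def read_tsv(text: str) -> list[list[str]]:
    """Parse tab-separated plain text back into rows of strings."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))

if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    table = [["etext", "title"], ["1", "Example Title"]]
    text = write_tsv(table)
    print(text, end="")  # two lines, fields separated by a tab
    assert read_tsv(text) == table
```

The result is ordinary text that opens in any editor and imports into any spreadsheet, which is the point the excerpt makes.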
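Regarding Marcello's point in the rdfterms thread that a namespace name is an identifier rather than an address to fetch: here is a minimal Python sketch, assuming a made-up catalog fragment. The `etext`/`friendlytitle` element names and the `make_sample` and `find_titles` helpers are hypothetical, not PG's documented schema. It shows that an XML parser only compares the namespace name as a string and never retrieves it. The prefix can therefore change freely, and a 404 at that address does not matter.

```python
import xml.etree.ElementTree as ET

# Namespace names as quoted in the thread and the RDF spec. To the parser
# these are only identifier strings; nothing is ever downloaded.
PGTERMS = "http://www.gutenberg.org/rdfterms"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def make_sample(prefix: str) -> str:
    """Build a hypothetical catalog fragment, binding PGTERMS to `prefix`.

    The element names are illustrative, not PG's documented schema.
    """
    return (
        f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:{prefix}="{PGTERMS}">'
        f'<{prefix}:etext rdf:ID="etext1">'
        f"<{prefix}:friendlytitle>Example Title</{prefix}:friendlytitle>"
        f"</{prefix}:etext>"
        "</rdf:RDF>"
    )

def find_titles(xml_text: str) -> list[str]:
    """Return the text of every friendlytitle element in the PGTERMS namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree expands "prefix:local" to "{namespace-name}local", so the
    # match depends on the namespace name string, not on the prefix.
    return [el.text for el in root.iter(f"{{{PGTERMS}}}friendlytitle")]

if __name__ == "__main__":
    for prefix in ("pgterms", "pg"):
        print(prefix, find_titles(make_sample(prefix)))
    print(ET.fromstring(make_sample("pg"))[0].tag)
    # A different string (even one extra trailing slash) is a different
    # namespace, so nothing matches.
    print(find_titles(make_sample("pg").replace(PGTERMS, PGTERMS + "/")))
```

Run as a script, this prints `pgterms ['Example Title']`, `pg ['Example Title']`, `{http://www.gutenberg.org/rdfterms}etext` and `[]`. The parser makes no network request at any point.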