From bruce at zuhause.org  Wed Mar  1 07:57:11 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Wed Mar  1 08:21:14 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <2de.3175a87.31364202@aol.com>
References: <2de.3175a87.31364202@aol.com>
Message-ID: <17413.50263.444498.622315@celery.zuhause.org>

Bowerbird@aol.com writes:
 > after all, they "own" that public-domain
 > material just as much as you or i "own" it.

Yes, but PG doesn't claim a copyright on the PD material, unlike
Kessenger and the others. It's a shame that they are able to hide a PD
book behind a copyrighted cover.

 > no, i think the blame falls on _our_ shoulders,
 > because as the people dedicated to providing
 > full and free access to the public domain, we are
 > failing in our mission by not ensuring that google
 > has a no-pages-restricted entity in its book-search
 > for each and every public-domain book that they have.

How many PD books have you found in Google Book Search that were not
visible?  Did you report them to Google?  If not, some of the blame
falls on _your_ shoulders.
From greg at durendal.org  Wed Mar  1 08:24:30 2006
From: greg at durendal.org (Greg Weeks)
Date: Wed Mar  1 09:00:08 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <17413.50263.444498.622315@celery.zuhause.org>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
Message-ID: <Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org>

On Wed, 1 Mar 2006, Bruce Albrecht wrote:

> How many PD books have you found in Google Book Search that were not
> visible?  Did you report them to Google?  If not, some of the blame
> falls on _your_ shoulders.

I don't think it works this way. If the books are in there because the 
publisher added them and the publisher claims they are under copyright 
there is nothing you can do to change it.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From hyphen at hyphenologist.co.uk  Wed Mar  1 09:54:36 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Wed Mar  1 09:54:59 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
	<Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org>
Message-ID: <gonb02pgfj67ftl4vdqcndm86s20dof57j@4ax.com>

On Wed, 1 Mar 2006 11:24:30 -0500 (EST),  Greg Weeks <greg@durendal.org>
wrote:

|On Wed, 1 Mar 2006, Bruce Albrecht wrote:
|
|> How many PD books have you found in Google Book Search that were not
|> visible?  Did you report them to Google?  If not, some of the blame
|> falls on _your_ shoulders.
|
|I don't think it works this way. If the books are in there because the 
|publisher added them and the publisher claims they are under copyright 
|there is nothing you can do to change it.

AFAIK the copyright notice is valid, but only applies to the page and line
layout plus cover layout of the new paper edition, not the PG text.    Not
that they would tell you that.    
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From gbnewby at pglaf.org  Wed Mar  1 10:55:17 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Mar  1 10:55:18 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <gonb02pgfj67ftl4vdqcndm86s20dof57j@4ax.com>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
	<Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org>
	<gonb02pgfj67ftl4vdqcndm86s20dof57j@4ax.com>
Message-ID: <20060301185517.GA29172@pglaf.org>

On Wed, Mar 01, 2006 at 05:54:36PM +0000, Dave Fawthrop wrote:
> On Wed, 1 Mar 2006 11:24:30 -0500 (EST),  Greg Weeks <greg@durendal.org>
> wrote:
> 
> |On Wed, 1 Mar 2006, Bruce Albrecht wrote:
> |
> |> How many PD books have you found in Google Book Search that were not
> |> visible?  Did you report them to Google?  If not, some of the blame
> |> falls on _your_ shoulders.

(There are *many*, but in many cases the print publisher claimed
a copyright inappropriately or imprecisely.

> |I don't think it works this way. If the books are in there because the 
> |publisher added them and the publisher claims they are under copyright 
> |there is nothing you can do to change it.
> 
> AFAIK the copyright notice is valid, but only applies to the page and line
> layout plus cover layout of the new paper edition, not the PG text.    Not
> that they would tell you that.    

Not in our opinion (which has been vetted by several expert
copyright lawyers):

	No Sweat of the Brow Copyright
	http://www.gutenberg.org/howto/sweat-no-c

  -- Greg
From marcello at perathoner.de  Wed Mar  1 10:55:17 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Mar  1 10:55:22 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <2de.3175a87.31364202@aol.com>
References: <2de.3175a87.31364202@aol.com>
Message-ID: <4405EE15.5030601@perathoner.de>

Bowerbird@aol.com wrote:

> indeed, in the sense that they
> offer customers the option of a
> hard-copy printing of an e-text,
> i think they're providing a service.

Those people know they can take a PG text, format it, print a hardcopy 
and sell it. That's done in good faith. Nobody has any problem with that.

They also know that formatting a text does not give them any copyright 
whatsoever. But still they stick a copyright notice on a public domain 
text. That's done in bad faith. They didn't even proof-read the text, or 
they would have noticed those errors.


> we are
> failing in our mission by not ensuring that google
> has a no-pages-restricted entity in its book-search
> for each and every public-domain book that they have.

You are contradicting yourself. Google is a commercial enterprise just 
like those two-bit publishers. Why do you require ethical behaviour from 
Google and not from those other publishers?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org  Wed Mar  1 11:00:08 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Mar  1 11:00:10 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <17413.50263.444498.622315@celery.zuhause.org>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
Message-ID: <20060301190008.GB29172@pglaf.org>

On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote:
> ...
> How many PD books have you found in Google Book Search that were not
> visible?  Did you report them to Google?  If not, some of the blame
> falls on _your_ shoulders.

How is such notification done?  

To easily find some examples, look for Jane Austen's
works, H.G. Wells, and other well-known long-dead authors.

Note that they seem to use a ridiculous date for
"world wide" public domain...something in the 1800s,
rather than a "US-Safe" cutoff of 1923.  

I think they (Google) are creating ambiguity when there
is none, at least in the US.
  -- Greg

From Bowerbird at aol.com  Wed Mar  1 11:06:37 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Mar  1 11:06:45 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
Message-ID: <1a4.4927bbe9.31374abd@aol.com>

bruce said:
>   It's a shame that they are able to hide a PD
>    book behind a copyrighted cover.

major publishing houses do it all the time with p-books.

and there's nothing wrong with it legally.   or even "morally".

most especially if they're delivering hardcopy.
paper doesn't grow on trees.   (ha ha, i funny!)

(i would agree that it would be kind of sleazy to
charge people for an electronic-copy, especially
if all they got was a download, not an actual c.d.
then again, if p.g. wasn't getting free hosting from
ibiblio, and they had to pay for all the bandwidth...)

***


again, there is nothing to "report".   what they're doing
is (a) perfectly legal, and (b) a service to their customers.

if we want to give people an unrestricted view of the pages,
we need to submit the book to the program and specify that.

i'll get around to doing it myself sooner or later, providing
google doesn't charge publishers to get their titles listed...

but it's really something i think p.g. should do, systematically.

not that anyone gives a flying burrito what i think p.g. should do.

***

greg said:
>    If the books are in there because the publisher added them 
>    and the publisher claims they are under copyright
>    there is nothing you can do to change it.

i don't think you have to claim the text is under copyright
to restrict viewing access on any or even all of your pages.

like i said, our job is to submit books to the program that
have _no_ restrictions on their viewing.   that will serve to
drive some of the parasites from the scene, but have little
negative effect on the people who are offering a _service_
to end-users by giving them a hard-copy option.   (indeed,
visibility might have a positive effect on those businesses.)

another question is whether project gutenberg wants to
get in the hard-copy business itself.   p.g. could probably
make a little bit of money doing it, or maybe lose a little,
because it ain't always easy to satisfy the general public,
but either way, i don't see any volunteers for the task...

(as branko will tell you, it can take a few hours of work to
get an e-text into the shape where you could feel good
offering it for sale, and that means a heckuva lot of work
for a library that stands at 18,000+ e-texts and growing.
of course, if the e-texts had been formatted consistently,
it'd be a piece of cake to get them into publication shape.
that's one reason i've hollered so long about consistency.)

you wouldn't _have_to_ get into the hard-copy business
in order to _list_ the titles so that people could view your
pages without restriction.   you could just specify a cost of
$8,000 per title, and probably drive away any customers.
(and if you did get a customer, it would be worth it, not?)
but meanwhile, the pages would be sitting there, viewable.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060301/49fab548/attachment.html
From Bowerbird at aol.com  Wed Mar  1 11:38:01 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Mar  1 11:38:18 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
Message-ID: <2f3.220a96.31375219@aol.com>

marcello said:
>    Why do you require ethical behaviour from
>    Google and not from those other publishers?

it's folly to try to "require ethical behaviour"
from _anyone_ in the publishing industry...

but nobody's doing any "unethical" anyway...

as i pointed out just now, the copyright issue has
absolutely no bearing on viewability of any pages.

if you think google should display the pages of a
public-domain title without any restrictions, then
all you need to do is submit it to their program...

heck, the first book i would submit would be
"books and culture", the public-domain title
that google made available as its first example.

you can see my reworking of this title right now:
>    http://snowy.arsc.alaska.edu/bowerbird/mabie/mabie.html
>    http://snowy.arsc.alaska.edu/bowerbird/mabie/mabiep001.html

and when i submitted my .pdf to their program,
i would even leave in the "google print" stamp
that they put on each page, just as a little joke...

these scans actually came from "google library",
now known as "google book search", and not the
"google print" program, which is the one geared
towards commercial publishers, but google was
a little casual with their project-names early on.
and now i don't think anyone is very clear about
where one program ends and another begins,
not even google...

but i just looked, and the program is indeed free:
>    http://books.google.com/intl/en/googlebooks/publisher.html

so if you want unfettered titles in the program,
the only thing you need to do is take action...

by the way, here's my mabie example for displaying a scan-book:
>    http://snowy.arsc.alaska.edu/bowerbird/mabie/mabied002003.html

readers will find my interface more pleasant than the google one:
>    http://books.google.com/books?id=yGZZXIrbUKQC&pg=PA5


>   They didn't even proof-read the text, 
>    or they would have noticed those errors.

this is so funny it's not even funny...         :+)

-bowerbird

p.s.   yes, "monday morning quarterback" _is_ late, but it's coming...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060301/41532ef6/attachment-0001.html
From hyphen at hyphenologist.co.uk  Wed Mar  1 12:18:22 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Wed Mar  1 12:18:42 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <20060301185517.GA29172@pglaf.org>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
	<Pine.LNX.4.63.0603011123300.10029@durendal.durendal.org>
	<gonb02pgfj67ftl4vdqcndm86s20dof57j@4ax.com>
	<20060301185517.GA29172@pglaf.org>
Message-ID: <960c02pahh1fm8cqlipuiq4okkl9o8fut2@4ax.com>

On Wed, 1 Mar 2006 10:55:17 -0800,  Greg Newby <gbnewby@pglaf.org> wrote:

|On Wed, Mar 01, 2006 at 05:54:36PM +0000, Dave Fawthrop wrote:
|> On Wed, 1 Mar 2006 11:24:30 -0500 (EST),  Greg Weeks <greg@durendal.org>
|> wrote:
|> 
|> |On Wed, 1 Mar 2006, Bruce Albrecht wrote:
|> |
|> |> How many PD books have you found in Google Book Search that were not
|> |> visible?  Did you report them to Google?  If not, some of the blame
|> |> falls on _your_ shoulders.
|
|(There are *many*, but in many cases the print publisher claimed
|a copyright inappropriately or imprecisely.
|
|> |I don't think it works this way. If the books are in there because the 
|> |publisher added them and the publisher claims they are under copyright 
|> |there is nothing you can do to change it.
|> 
|> AFAIK the copyright notice is valid, but only applies to the page and line
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|> layout plus cover layout of the new paper edition, not the PG text.    Not
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                 
|> that they would tell you that.    
|
|Not in our opinion (which has been vetted by several expert
|copyright lawyers):
|
|	No Sweat of the Brow Copyright
|	http://www.gutenberg.org/howto/sweat-no-c

Which is what I said :-(
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From joshua at hutchinson.net  Wed Mar  1 13:34:25 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Mar  1 13:34:31 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
Message-ID: <20060301213425.405654F532@ws6-5.us4.outblaze.com>


> ----- Original Message -----
> From: "Dave Fawthrop" <hyphen@hyphenologist.co.uk>
> To: gbnewby@pglaf.org, "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
> Subject: Re: [gutvol-d] Commercial paper editions of PG texts
> Date: Wed, 01 Mar 2006 20:18:22 +0000
> 
> 
> |>
> |> AFAIK the copyright notice is valid, but only applies to the page and line
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> |> layout plus cover layout of the new paper edition, not the PG text.    Not
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> |> that they would tell you that.
> |
> |Not in our opinion (which has been vetted by several expert
> |copyright lawyers):
> |
> |	No Sweat of the Brow Copyright
> |	http://www.gutenberg.org/howto/sweat-no-c
> 
> Which is what I said :-(

If I understand it right, there is no separate layout copyright in the US (which is the law PG operates under).  So, no, the copyright notice is not valid because there is no copyright available for layout.

Josh
From ag737 at freenet.carleton.ca  Wed Mar  1 12:36:38 2006
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Wed Mar  1 13:36:40 2006
Subject: [gutvol-d] Sweat of brow
Message-ID: <d9515ad961bc.d961bcd9515a@ncf.ca>


>> AFAIK the copyright notice is valid, but only applies to the page 
and line
>> layout plus cover layout of the new paper edition, not the PG 
text.    Not
>> that they would tell you that.    

> Not in our opinion (which has been vetted by several expert
> copyright lawyers):

> No Sweat of the Brow Copyright
> http://www.gutenberg.org/howto/sweat-no-c


Not sure about in the US, but the case law, at least when I researched 
it about 7 or 8 years ago, was still unsettled in Canada and other 
commonwealth countries. I don't know off hand that that situation has 
changed.

In the UK, most if not all EU countries, and much of the Commonwealth -
- though not in Canada -- there is also an express provision 
protecting "editions" or "typographical arrangements". The term is 
generally shorter than full copyright, and of course they aren't 
exclusive. You can prepare another edition or typographical 
arrangement of Shakespeare in the UK, you just can't republish an 
exact copy of someone else's in the first 25 years after publication.

Quaere; does "edition" or "typographical arrangement" copyright 
subsist in PG e-texts, in countries which recognize those types of 
copyright?


From bruce at zuhause.org  Wed Mar  1 18:23:47 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Wed Mar  1 18:23:50 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <20060301190008.GB29172@pglaf.org>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
	<20060301190008.GB29172@pglaf.org>
Message-ID: <17414.22323.407940.157838@celery.zuhause.org>

Greg Newby writes:
 > On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote:
 > > ...
 > > How many PD books have you found in Google Book Search that were not
 > > visible?  Did you report them to Google?  If not, some of the blame
 > > falls on _your_ shoulders.
 > 
 > How is such notification done?  

Well, when I've been doing book searches at Google, and it comes up
with a book that doesn't say that it was provided by a publisher, and
the book information claims it was copyrighted before 1923, or I can
find the copyright in a snippet, I use Google's feeback link to report
that the book is incorrectly flagged as being in copyright so that
they will fix the status.  In one case, they fixed it after a 4-5
email exchange.  In other cases, they simply told me that they were
aware that some books were incorrectly identified as in copyright.
From gbnewby at pglaf.org  Sat Mar  4 13:36:10 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Mar  4 13:36:11 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <17414.22323.407940.157838@celery.zuhause.org>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
	<20060301190008.GB29172@pglaf.org>
	<17414.22323.407940.157838@celery.zuhause.org>
Message-ID: <20060304213610.GJ6307@pglaf.org>

On Wed, Mar 01, 2006 at 08:23:47PM -0600, Bruce Albrecht wrote:
> Greg Newby writes:
>  > On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote:
>  > > ...
>  > > How many PD books have you found in Google Book Search that were not
>  > > visible?  Did you report them to Google?  If not, some of the blame
>  > > falls on _your_ shoulders.
>  > 
>  > How is such notification done?  
> 
> Well, when I've been doing book searches at Google, and it comes up
> with a book that doesn't say that it was provided by a publisher, and
> the book information claims it was copyrighted before 1923, or I can
> find the copyright in a snippet, I use Google's feeback link to report
> that the book is incorrectly flagged as being in copyright so that
> they will fix the status.  In one case, they fixed it after a 4-5
> email exchange.  In other cases, they simply told me that they were
> aware that some books were incorrectly identified as in copyright.

Do they consider 1923 as a cutoff date (per US law)?  Or
do they look to 1868 or something similar as a cutoff,
as an attempt to only say "public domain" if it's defensibly
for the entire world?
  -- Greg
From gbnewby at pglaf.org  Sat Mar  4 13:45:33 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Mar  4 13:45:34 2006
Subject: [gutvol-d] Sweat of brow
In-Reply-To: <d9515ad961bc.d961bcd9515a@ncf.ca>
References: <d9515ad961bc.d961bcd9515a@ncf.ca>
Message-ID: <20060304214533.GK6307@pglaf.org>

On Wed, Mar 01, 2006 at 03:36:38PM -0500, Wallace J.McLean wrote:
> 
> >> AFAIK the copyright notice is valid, but only applies to the page 
> and line
> >> layout plus cover layout of the new paper edition, not the PG 
> text.    Not
> >> that they would tell you that.    
> 
> > Not in our opinion (which has been vetted by several expert
> > copyright lawyers):
> 
> > No Sweat of the Brow Copyright
> > http://www.gutenberg.org/howto/sweat-no-c
> 
> 
> Not sure about in the US, but the case law, at least when I researched 
> it about 7 or 8 years ago, was still unsettled in Canada and other 
> commonwealth countries. I don't know off hand that that situation has 
> changed.

PG's volunteer lawyers are not aware of any case law for this specific
question, either.  The sweat-no-c document is based on their research on
the US Title 17, and surrounding/related case law like Feist v. 
Rural Telephone Co. 

BTW, there is at least one grey area: when display involves code (say,
some Javascript or even CSS, or something more complex like a complete
viewer).  In that type of situation, the code itself will likely be
copyrighted, even if the content it displays is not.  The challenge is
when the copyrighted code is embedded / intermixed with the public
domain content -- like with CSS or Javascript & HTML.  PG tends to
avoid this by doing our own markup, & asserting it's public domain,
but this grey area might limit some of our harvesting activities.

> In the UK, most if not all EU countries, and much of the Commonwealth -
> - though not in Canada -- there is also an express provision 
> protecting "editions" or "typographical arrangements". The term is 
> generally shorter than full copyright, and of course they aren't 
> exclusive. You can prepare another edition or typographical 
> arrangement of Shakespeare in the UK, you just can't republish an 
> exact copy of someone else's in the first 25 years after publication.
> 
> Quaere; does "edition" or "typographical arrangement" copyright 
> subsist in PG e-texts, in countries which recognize those types of 
> copyright?

Short answer: we only try to follow US laws, and I'm unaware
of anything like this provision in the US.  The closest is copyrights
on specific fonts, which doesn't really matter much for PG since
we seldom include scans of the raw pages from a print book with
our eBooks, and our sources tend to be pretty old anyway.

It might be this type of copyright (or whatever it might be called)
would play a role in the EU and elsewhere...though in practice, if a
physical book is old enough to be public domain based on author's death
date, probably any typography copyright has also expired.


From ag737 at freenet.carleton.ca  Sun Mar  5 12:17:33 2006
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Sun Mar  5 12:17:36 2006
Subject: [gutvol-d] Sweat of brow
Message-ID: <e0bf77e06c1b.e06c1be0bf77@ncf.ca>

----- Original Message ----- 
>From  Greg Newby <gbnewby@pglaf.org> 
Date  Sat, 4 Mar 2006 13:45:33 -0800 
To  Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org> 
Subject  Re: [gutvol-d] Sweat of brow 

>> Quaere; does "edition" or "typographical arrangement" copyright 
>> subsist in PG e-texts, in countries which recognize those types of 
>> copyright?
>
> Short answer: we only try to follow US laws, and I'm unaware
> of anything like this provision in the US.

Yes, but even if there were, ***the copyright status of a PG work in 
another country*** is determined by the national law of that country, 
not of the US.

> It might be this type of copyright (or whatever it might be called)
> would play a role in the EU and elsewhere...though in practice, if a
> physical book is old enough to be public domain based on author's 
death
> date, probably any typography copyright has also expired.

Yes, but that's missing the point of typographical or edition 
copyright: a new typographical arrangement or edition of the work, in 
that form, has a copyright attached to it. Not the work, the 
typographical arrangement or edition of it. PG is infringing neither 
the copyright in the work nor the copyright in the typographical 
arrangement; but the PG edition may have copyright status in a country 
that DOES recognize typographical arrangements or editions, subject to 
national treatment and shorter-term provisions in that country's 
national law.






From hart at pglaf.org  Sun Mar  5 12:58:07 2006
From: hart at pglaf.org (Michael Hart)
Date: Sun Mar  5 12:58:10 2006
Subject: [gutvol-d] Sweat of brow
In-Reply-To: <e0bf77e06c1b.e06c1be0bf77@ncf.ca>
References: <e0bf77e06c1b.e06c1be0bf77@ncf.ca>
Message-ID: <Pine.LNX.4.60.0603051251440.10668@pglaf.org>


On Sun, 5 Mar 2006, Wallace J.McLean wrote:

> ----- Original Message -----
>> From  Greg Newby <gbnewby@pglaf.org>
> Date  Sat, 4 Mar 2006 13:45:33 -0800
> To  Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
> Subject  Re: [gutvol-d] Sweat of brow
>
>>> Quaere; does "edition" or "typographical arrangement" copyright
>>> subsist in PG e-texts, in countries which recognize those types of
>>> copyright?
>>
>> Short answer: we only try to follow US laws, and I'm unaware
>> of anything like this provision in the US.
>
> Yes, but even if there were, ***the copyright status of a PG work in
> another country*** is determined by the national law of that country,
> not of the US.
>
>> It might be this type of copyright (or whatever it might be called)
>> would play a role in the EU and elsewhere...though in practice, if a
>> physical book is old enough to be public domain based on author's
> death
>> date, probably any typography copyright has also expired.
>
> Yes, but that's missing the point of typographical or edition
> copyright: a new typographical arrangement or edition of the work, in
> that form, has a copyright attached to it. Not the work, the
> typographical arrangement or edition of it. PG is infringing neither
> the copyright in the work nor the copyright in the typographical
> arrangement; but the PG edition may have copyright status in a country
> that DOES recognize typographical arrangements or editions, subject to
> national treatment and shorter-term provisions in that country's
> national law.
>

If anyone really wants to argue that point, we can insert some more
"legal fine print" to the effect that PG places all such possibly
copyrightable material into the public domain in all counries.

We should put all such potential legal claims to rest before any
can even get started.


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

From Bowerbird at aol.com  Sun Mar  5 13:23:45 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Mar  5 13:23:54 2006
Subject: [gutvol-d] let's sweat -- i.s.b.n. anyone?
Message-ID: <28b.6b580a2.313cb0e1@aol.com>

i.s.b.n. are relatively cheap in lots of 10,000...
howsabout y'all buy enough for your library?

and has there been any progress on putting
your free versions of your e-texts into google?
or would you just rather wring your hands and
shake your fists at the people reselling them?

if you want your distribution to be "unlimited",
these are some simple steps that you can take.

-bowerbird

p.s.   issue 1 of "monday morning quarterback"
will go up tomorrow, in case you were wondering.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060305/3dd6d631/attachment.html
From tb at baechler.net  Mon Mar  6 00:22:46 2006
From: tb at baechler.net (Tony Baechler)
Date: Mon Mar  6 00:22:28 2006
Subject: [gutvol-d] Sweat of brow
In-Reply-To: <e0bf77e06c1b.e06c1be0bf77@ncf.ca>
References: <e0bf77e06c1b.e06c1be0bf77@ncf.ca>
Message-ID: <7.0.1.0.2.20060306001820.02af50d0@baechler.net>

Hi.  sorry to nitpick here, and I admit this is out of my league, but 
wouldn't a plain text edition remove any and all fonts or typography 
anyway?  Let's say that you harvest an html or pdf from a country 
where typography is still under copyright.  It is converted to plain 
text to comply with PG standards, plus valid html etc.  Now it gets 
imported back into the original country.  Wouldn't it be legal 
because the fonts have been removed?  Am I missing something 
obvious?  I've followed the thread and understand it relates to 
google's idea of public domain, but it would seem to me that the 
copyrighted portion was removed (the typography) so it wouldn't matter.

At 12:17 PM 3/5/2006, you wrote:
>Yes, but that's missing the point of typographical or edition
>copyright: a new typographical arrangement or edition of the work, in
>that form, has a copyright attached to it. Not the work, the
>typographical arrangement or edition of it. PG is infringing neither
>the copyright in the work nor the copyright in the typographical
>arrangement; but the PG edition may have copyright status in a country
>that DOES recognize typographical arrangements or editions, subject to
>national treatment and shorter-term provisions in that country's
>national law.

From sly at victoria.tc.ca  Mon Mar  6 00:44:17 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Mar  6 00:44:19 2006
Subject: [gutvol-d] Sweat of brow
In-Reply-To: <7.0.1.0.2.20060306001820.02af50d0@baechler.net>
References: <e0bf77e06c1b.e06c1be0bf77@ncf.ca>
	<7.0.1.0.2.20060306001820.02af50d0@baechler.net>
Message-ID: <Pine.GSO.4.58.0603060023270.15295@vtn1.victoria.tc.ca>

Hi Tony.

I believe that the matter under discussion was not a matter
of the rights surrounding the source texts used to produce PG
materials, but rather rights of the PG texts themselves.

The argument (if I understood correctly) was that in some
juristictions legal usage of PG texts might be restricted
because the "typesetting" done in preparing the PG texts
would qualify for certain protections on its own.

This whole discussion reminds me of a few points I like to
make when people start making overly broad, generalized
statements about copyright laws:

1) Copyright is not just one right, but a bundle of
rights.
2) It is treated on a national basis--that is every
country has its own copyright laws and quirks about how
those laws are applied. (and to what types of material,
and what aspects of a given work, and with what definition
of certain terms, etc., etc.)
3) The state of likely copyright status for an item
in a given country at a given time can be affected by
laws that were passed many previously; various amendments
that have taken place over time, under pressure from
various sources; international conventions the country
may be a member of; precedence set by certain legal
decisions, etc.


Andrew

On Mon, 6 Mar 2006, Tony Baechler wrote:

> Hi.  sorry to nitpick here, and I admit this is out of my league, but
> wouldn't a plain text edition remove any and all fonts or typography
> anyway?  Let's say that you harvest an html or pdf from a country
> where typography is still under copyright.  It is converted to plain
> text to comply with PG standards, plus valid html etc.  Now it gets
> imported back into the original country.  Wouldn't it be legal
> because the fonts have been removed?  Am I missing something
> obvious?  I've followed the thread and understand it relates to
> google's idea of public domain, but it would seem to me that the
> copyrighted portion was removed (the typography) so it wouldn't matter.
>
From hyphen at hyphenologist.co.uk  Mon Mar  6 01:51:10 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Mon Mar  6 01:52:09 2006
Subject: [gutvol-d] Sweat of brow
In-Reply-To: <Pine.GSO.4.58.0603060023270.15295@vtn1.victoria.tc.ca>
References: <e0bf77e06c1b.e06c1be0bf77@ncf.ca>
	<7.0.1.0.2.20060306001820.02af50d0@baechler.net>
	<Pine.GSO.4.58.0603060023270.15295@vtn1.victoria.tc.ca>
Message-ID: <8v0o02t5tre1m32e6chvv2fm7tonfji52s@4ax.com>

On Mon, 6 Mar 2006 00:44:17 -0800 (PST),  Andrew Sly <sly@victoria.tc.ca>
wrote:

|Hi Tony.
|
|I believe that the matter under discussion was not a matter
|of the rights surrounding the source texts used to produce PG
|materials, but rather rights of the PG texts themselves.
|
|The argument (if I understood correctly) was that in some
|juristictions legal usage of PG texts might be restricted
|because the "typesetting" done in preparing the PG texts
|would qualify for certain protections on its own.

In which case IMO PG should put all its work explicitly into the public
domain  as MH suggested up thread.   

This is what I did with my Yorkshire Dialect work on my Web Site.    It has
been copied widely, (I can tell it is my work from the line breaks of
poetry)  including a Print on Demand outfit.   As I work Pro Bono Publico I
am happy to see my work reproduced anywhere.
-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From Bowerbird at aol.com  Mon Mar  6 10:37:00 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar  6 10:37:07 2006
Subject: [gutvol-d] monday morning quarterback #01 is now up
Message-ID: <65.568a702f.313ddb4c@aol.com>

"monday morning quarterback", #01, is up...

"m.m.q." is my new weekly "series" on best-practices
for people digitizing e-books from paper-books...

issue #1 is now available at the "bpsuper" listserve:
>?? http://groups.yahoo.com/group/bpsuper/message/3

you don't have to be a member of the "bpsuper" yahoogroup
in order to read the messages.? but if yahoogroups gives you
an allergic reaction nonetheless, the issue is also posted here:
>?? http://snowy.arsc.alaska.edu/bowerbird/mmq/mmq01.txt

learn what this "monday morning quarterback" says
you are doing wrong, and what you're doing right...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060306/2e1d36a5/attachment.html
From ag737 at freenet.carleton.ca  Mon Mar  6 12:07:26 2006
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Mon Mar  6 12:07:28 2006
Subject: [gutvol-d] Sweat of brow
Message-ID: <e2cfefe2fec4.e2fec4e2cfef@ncf.ca>


> ----- Original Message ----- 
> From  Tony Baechler <tb@baechler.net> 
> Date  Mon, 06 Mar 2006 00:22:46 -0800 
> To  Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org
> Subject  Re: [gutvol-d] Sweat of brow 
> 
> Hi.  sorry to nitpick here, and I admit this is out of my league, 
but 
> wouldn't a plain text edition remove any and all fonts or typography 
> anyway?

There are still line lengths and possibly some editorial decisions 
that have been made along the way. It's a remote chance, but there may 
be room in the legal fine print for PD texts that would disclaim or 
renounce the copyright that may subsist in the e-text, where, and to 
the extent that such a copyright is recognized in national copyright 
law, and where, and to the extent that such a disclaimer or 
renunciation would be recognized by national law.

There is no express provision in Canadian law for a disclaim of 
copyright, but there is some case law that says that such a disclaimer 
would estop any subsequent claim for infringement.




From hiddengreen at gmail.com  Mon Mar  6 12:35:55 2006
From: hiddengreen at gmail.com (Cori)
Date: Mon Mar  6 12:36:01 2006
Subject: [gutvol-d] Re: [BP] monday morning quarterback -- #01
In-Reply-To: <200603061928.OAA27822@digital.lib.upenn.edu>
References: <200603061928.OAA27822@digital.lib.upenn.edu>
Message-ID: <910fee4a0603061235m7335eb21y1e43bcff4b9473c1@mail.gmail.com>

On 3/6/06, Bowerbird@aol.com wrote:
> i've started a new weekly "series" on best-practices
> for people digitizing e-books from paper-books...

> you don't have to be a member of the "bpsuper" yahoogroup
> in order to read the messages.   but if yahoogroups gives you
> an allergic reaction nonetheless, the issue is also posted here:

If however, it's Bowerbird's writing that gives you an allergic
reaction, but you have some reading time free, you'd be welcome to
drop by the Distributed Proofreaders' Smoothreading Pool, where a wide
variety of complete books can be read, prior to their final Project
Gutenberg upload & archiving.

http://www.pgdp.net/c/tools/post_proofers/smooth_reading.php

You don't have to be a member of DP to visit and download books to
read!  If you'd like to add comments (by editing the text and leeving
[*typo: leaving?] notes,) you would need to create a new account in
order to upload your edited copy.

As a post-processor, I greatly appreciate the people who have found
time to smooth-read my books - they pick out things I just couldn't
find any other way, and every little query raised and checked is one
less thing for errata @ PG to worry about in 42 years time :)

Have fun,

Cori
From Bowerbird at aol.com  Mon Mar  6 13:51:10 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar  6 13:51:30 2006
Subject: [gutvol-d] Re: [BP] monday morning quarterback -- #01
Message-ID: <1c4.3b09dfb1.313e08ce@aol.com>

cori said:
>    If however, it's Bowerbird's writing 
>    that gives you an allergic reaction

think of it as "fiber", it's good for the diet...        :+)

>   you have some reading time free, 
>    you'd be welcome to drop by the 
>    Distributed Proofreaders' Smoothreading Pool, 
>    where a wide variety of complete books can be read, 
>    prior to their final Project Gutenberg upload & archiving.

smoothreading is definitely cool.

for anybody who wants to volunteer in
user-creation of our global cyberlibrary,
but doesn't have the time and energy to
digitize a whole book yourself, know you
can do your fair share by smoothreading...

it's fun and easy, yeah, but it's also _necessary_.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060306/00adfdbe/attachment.html
From hart at pglaf.org  Mon Mar  6 19:50:48 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Mar  6 19:50:50 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <20060301190008.GB29172@pglaf.org>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
	<20060301190008.GB29172@pglaf.org>
Message-ID: <Pine.LNX.4.60.0603061949050.13638@pglaf.org>

On Wed, 1 Mar 2006, Greg Newby wrote:

> On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote:
>> ...
>> How many PD books have you found in Google Book Search that were not
>> visible?  Did you report them to Google?  If not, some of the blame
>> falls on _your_ shoulders.
>
> How is such notification done?
>
> To easily find some examples, look for Jane Austen's
> works, H.G. Wells, and other well-known long-dead authors.
>
> Note that they seem to use a ridiculous date for
> "world wide" public domain...something in the 1800s,
> rather than a "US-Safe" cutoff of 1923.
>
> I think they (Google) are creating ambiguity when there
> is none, at least in the US.
>  -- Greg

They made a huge mistake doing too many copyrighted books
with the original Google Print Library, now they are hard
at work trying to reverse that public relations fiasco.

i.e. going overboard in the other direction.

mh
From Bowerbird at aol.com  Mon Mar  6 22:30:11 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar  6 22:30:18 2006
Subject: [gutvol-d] an example of a .pdf with excellent e-book design
Message-ID: <1e8.4c7a62d5.313e8273@aol.com>


with so many examples of bad .pdf design out there,
it gives me great pleasure to be able to point to one
where the design is well-done, on wonderful poetry:
>    http://www.poetrysuperhighway.com/ToHellWithRL.pdf

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060307/310088e5/attachment.html
From hyphen at hyphenologist.co.uk  Tue Mar  7 00:04:13 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Mar  7 00:04:24 2006
Subject: [gutvol-d] Sweat of brow
In-Reply-To: <e2cfefe2fec4.e2fec4e2cfef@ncf.ca>
References: <e2cfefe2fec4.e2fec4e2cfef@ncf.ca>
Message-ID: <t9fq02d0c280deqrh2fdbknjgg9t245e70@4ax.com>

On Mon, 06 Mar 2006 15:07:26 -0500,  "Wallace J.McLean"
<ag737@freenet.carleton.ca> wrote:

|
|> ----- Original Message ----- 
|> From  Tony Baechler <tb@baechler.net> 
|> Date  Mon, 06 Mar 2006 00:22:46 -0800 
|> To  Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org
|> Subject  Re: [gutvol-d] Sweat of brow 
|> 
|> Hi.  sorry to nitpick here, and I admit this is out of my league, 
|but 
|> wouldn't a plain text edition remove any and all fonts or typography 
|> anyway?
|
|There are still line lengths and possibly some editorial decisions 
|that have been made along the way. 

IME there are always such decisions, especially in poetry where a line
would extend beyond 72 or even 80 characters.
Also moving mdashes away from line ends etc.
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From mkengel at gmail.com  Tue Mar  7 19:05:35 2006
From: mkengel at gmail.com (Michael Engel)
Date: Tue Mar  7 19:12:55 2006
Subject: [gutvol-d] Grimms Maerchen
Message-ID: <aaa8c4580603071905q680758fdsc2625de2a0014f54@mail.gmail.com>

There is a Grimm database (i.e. text files) of the following books on
the internet

    * Br?der Grimm: ?Kinder- und Hausm?rchen (7. Ausgabe, 1857)?
    * Br?der Grimm: ?Kinder- und Hausm?rchen (2. Ausgabe, 1819)?
    * Jacob Grimm: ?Kleinere Schriften 1 (2. Auflage, 1879)?

http://www.lg.fukuoka-u.ac.jp/~ynagata/grimm_database.html

They have downloadable Latex files

Is that of interest for project Gutenberg ?

regards
Michael Engel
From sly at victoria.tc.ca  Tue Mar  7 23:53:56 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Mar  7 23:54:07 2006
Subject: [gutvol-d] Grimms Maerchen
In-Reply-To: <aaa8c4580603071905q680758fdsc2625de2a0014f54@mail.gmail.com>
References: <aaa8c4580603071905q680758fdsc2625de2a0014f54@mail.gmail.com>
Message-ID: <Pine.GSO.4.58.0603072348040.18199@vtn1.victoria.tc.ca>

Hi Michael.

There is a truly vast amount of material on many different
websites which could concievably be added to Project Gutenberg.
(Often as I'm checking names in the catalog, I keep finding
more.) However, it does take some work to get copyright
clearance and reformat the files. If you would like to
work on these texts, I would be happy to help with any
questions you have...

Andrew

On Wed, 8 Mar 2006, Michael Engel wrote:

> There is a Grimm database (i.e. text files) of the following books on
the internet

    * Br??der Grimm: ??Kinder- und Hausm??rchen (7. Ausgabe, 1857)??
    * Br??der Grimm: ??Kinder- und Hausm??rchen (2. Ausgabe, 1819)??
    * Jacob Grimm: ??Kleinere Schriften 1 (2. Auflage, 1879)??

http://www.lg.fukuoka-u.ac.jp/~ynagata/grimm_database.html

They have downloadable Latex files

Is that of interest for project Gutenberg ?

regards
Michael Engel
From Bowerbird at aol.com  Thu Mar  9 00:41:22 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Mar  9 00:41:30 2006
Subject: [gutvol-d] google and the translation thing
Message-ID: <260.82f14a6.31414432@aol.com>

http://www.dancohen.org/blog/posts/no_computer_left_behind said:
>    Google researchers have demonstrated 
>    (but not yet released to the general public) 
>    a powerful method for creating 'good enough' 
>    translations?not by understanding the grammar 
>    of each passage, but by rapidly scanning and 
>    comparing similar phrases on countless electronic 
>    documents in the original and second languages. 
>    Given large enough volumes of words in a variety 
>    of languages, machine processing can find parallel phrases 
>    and reduce any document into a series of word swaps. 
>    Where once it seemed necessary to have a human being 
>    aid in a computer's translating skills, or to teach that 
>    machine the basics of language, swift algorithms functioning 
>    on unimaginably large amounts of text suffice. Are such new 
>    computer translations as good as a skilled, bilingual human being? 
>    Of course not. Are they good enough to get the gist of a text? 
Absolutely. 
>    So good the National Security Agency and the Central Intelligence Agency 

>    increasingly rely on that kind of technology to scan, sort, and mine 
>    gargantuan amounts of text and communications 
>    (whether or not the rest of us like it).

sounds like something you might find interesting, michael.
of course, a "good enough" translation probably wouldn't be,
not for literature, where the realm of creativity is instantiated,

but could it work as a "first pass" that would do the bulk of the
"heavy lifting", so a person knowledgeable in both languages
could come in and spend relatively little time smoothing it out?
well, it's certainly possible, i would think.   and maybe probable.
especially if progress on the technique proves to be forthcoming...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060309/fe3b10c2/attachment.html
From schultzk at uni-trier.de  Thu Mar  9 02:24:40 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Thu Mar  9 03:37:09 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <260.82f14a6.31414432@aol.com>
References: <260.82f14a6.31414432@aol.com>
Message-ID: <D574473C-8C0B-421F-9693-1E65A0B1B0F1@uni-trier.de>

Hi There,

	Let me chime in here. Yes, you can use these tools as a start and
	for casual use, but otherwise you can forget them as a professional
	tool.
		- due to the statistacal modell you get only 80-90 % accuracy
		- I see a lot of sites on which the content for different languages  
is diferent
                   no one to comparison possible
		- I have work and help develope such tools and know that they give
	          interresting results and are in the range above. Yet,  
these methods are
	          only good as a analyse tool.
		- a system with fairly decent grammar models and lexicons give better
	          results using less resources give better results. The  
Problem here is
		   is that then are not publically availble.

	The actual problem with translation is the so-called extra- 
linguistical part!!
	Culture related facts, context, register etc. to get the last 5 %  
for a decent
	translation the effort and resources rises exponentially.

	As proof the japanese in the 80s said they would have a real-time  
translation
	for telephones on the market in 5 years. This was is vaporware.

	The method is not new. It was used successfully for wheather reports  
already in the
	80s. The method works only for small areas of knowledge/language.

	In the 70s word for word used to be good enough. Now they have  
something
	they say is "good enough".

		Two Euro cents worth
			Keith.


	
Am 09.03.2006 um 09:41 schrieb Bowerbird@aol.com:

> http://www.dancohen.org/blog/posts/no_computer_left_behind said:
> >   Google researchers have demonstrated
> >   (but not yet released to the general public)
> >   a powerful method for creating 'good enough'
> >   translations?not by understanding the grammar
> >   of each passage, but by rapidly scanning and
> >   comparing similar phrases on countless electronic
> >   documents in the original and second languages.
> >   Given large enough volumes of words in a variety
> >   of languages, machine processing can find parallel phrases
> >   and reduce any document into a series of word swaps.
> >   Where once it seemed necessary to have a human being
> >   aid in a computer's translating skills, or to teach that
> >   machine the basics of language, swift algorithms functioning
> >   on unimaginably large amounts of text suffice. Are such new
> >   computer translations as good as a skilled, bilingual human being?
> >   Of course not. Are they good enough to get the gist of a text?  
> Absolutely.
> >   So good the National Security Agency and the Central  
> Intelligence Agency
> >   increasingly rely on that kind of technology to scan, sort, and  
> mine
> >   gargantuan amounts of text and communications
> >   (whether or not the rest of us like it).
>
> sounds like something you might find interesting, michael.
> of course, a "good enough" translation probably wouldn't be,
> not for literature, where the realm of creativity is instantiated,
>
> but could it work as a "first pass" that would do the bulk of the
> "heavy lifting", so a person knowledgeable in both languages
> could come in and spend relatively little time smoothing it out?
> well, it's certainly possible, i would think.  and maybe probable.
> especially if progress on the technique proves to be forthcoming...
>
> -bowerbird
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060309/90567b21/attachment.html
From Bowerbird at aol.com  Thu Mar  9 08:05:59 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Mar  9 08:06:16 2006
Subject: [gutvol-d] google and the translation thing
Message-ID: <272.71795cd.3141ac67@aol.com>

keith said:
>    The method is not new. It was used 
>    successfully for wheather reports already

the "method" might not be new.

but what _is_ different, and undeniably so,
is that google has a _huge_ corpus of text
with which to implement the method now,
possibly the "secret sauce" to make it work.

this asset, and its bearing on the problem,
should not be underestimated.   and indeed,
that huge corpus could exert all manner of
effects on a wide variety of knowledge tasks.

the information about the world represented
by _billions_ of web-pages out in cyberspace
could lead to the gleaning of vast knowledge.
(so much so that it could become very scary.)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060309/9c13eedd/attachment.html
From donovan at abs.net  Thu Mar  9 15:43:18 2006
From: donovan at abs.net (D Garcia)
Date: Thu Mar  9 16:02:18 2006
Subject: [dp-pg] re: [gutvol-d] google and the translation thing
In-Reply-To: <272.71795cd.3141ac67@aol.com>
References: <272.71795cd.3141ac67@aol.com>
Message-ID: <200603091843.19084.donovan@abs.net>

On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote:
> but what _is_ different, and undeniably so,
> is that google has a _huge_ corpus of text
<snip>
> the information about the world represented
> by _billions_ of web-pages out in cyberspace
> could lead to the gleaning of vast knowledge.
> (so much so that it could become very scary.)

I can see this revealing (or at least quantifying) the disturbingly high rate 
of spelling and grammatical errors. Billions and billions of them, to 
paraphrase Sagan, or more likely (with sincere apologies to Kubrick) ... 
"My God ... it's full of shit."

Speaking of the web, of course. :)
From Bowerbird at aol.com  Thu Mar  9 16:16:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Mar  9 16:16:29 2006
Subject: [dp-pg] re: [gutvol-d] google and the translation thing
Message-ID: <221.96121d3.31421f57@aol.com>

donovan said:
>    I can see this revealing 
>    (or at least quantifying) 
>    the disturbingly high rate of 
>    spelling and grammatical errors.

actually, hit-differentials are
an excellent method to _detect_
spelling and grammatical errors,
so it shouldn't be that difficult to
clean the corpus quite thoroughly.

but that's not the "knowledge" that
google might glean that is so scary
to me.   that involves putting together
disparate pieces of information that
were never intended to be connected,
but nonetheless exist out in cyberspace
and _can_ be joined with enough "smarts".

especially if you can dip into a few "private"
databases, like the ones with credit-card info,
you could build quite a dossier on any person
(or place or thing) out there...

-bowerbird

p.s.   in the news yesterday were reports that
yet another credit-card database was hacked.
does it strike anyone else as odd that "security"
can be so lapse on this personal and private data
at the same time that the corporations are wanting
to "lock up" all their content?   i'm beginning to think
we should just all _pretend_ that d.r.m. works great
to put them at ease, knowing that it'll all be cracked
a few years down the line, and we'll be done with it...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060309/ae1ea6f3/attachment.html
From sly at victoria.tc.ca  Thu Mar  9 20:00:34 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Mar  9 20:00:40 2006
Subject: [gutvol-d] Editing jpg images
Message-ID: <Pine.GSO.4.58.0603091956220.12502@vtn1.victoria.tc.ca>



The latest of a series of Mary E. Wilkins Freeman
books I've been adapting for PG from another website
is "The Portion of Labor". The person who produced
this has also made some illustrations of availible.

I've prepared an html file, and the unchanged images here:
http://www.victoria.tc.ca/~sly/portion.htm

Would anyone like to work these images into usable
shape for PG purposes, or suggest how I might want
to deal with them?

Andrew
From hart at pglaf.org  Thu Mar  9 20:58:14 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Mar  9 20:58:15 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <260.82f14a6.31414432@aol.com>
References: <260.82f14a6.31414432@aol.com>
Message-ID: <Pine.LNX.4.60.0603092055030.32091@pglaf.org>


Yes, this is part of what I have been talking about for a few years.

Once OCR gets to an acceptable level for all, the next big thing,
the killer ap, so to speak, will be MT [Machine Translation] which
will convert the 10 million eBooks that will be available into 100
different languages, for a billion free online eBooks.

mh

On Thu, 9 Mar 2006 Bowerbird@aol.com wrote:

> http://www.dancohen.org/blog/posts/no_computer_left_behind said:
>>    Google researchers have demonstrated
>>    (but not yet released to the general public)
>>    a powerful method for creating 'good enough'
>>    translations??not by understanding the grammar
>>    of each passage, but by rapidly scanning and
>>    comparing similar phrases on countless electronic
>>    documents in the original and second languages.
>>    Given large enough volumes of words in a variety
>>    of languages, machine processing can find parallel phrases
>>    and reduce any document into a series of word swaps.
>>    Where once it seemed necessary to have a human being
>>    aid in a computer's translating skills, or to teach that
>>    machine the basics of language, swift algorithms functioning
>>    on unimaginably large amounts of text suffice. Are such new
>>    computer translations as good as a skilled, bilingual human being?
>>    Of course not. Are they good enough to get the gist of a text?
> Absolutely.
>>    So good the National Security Agency and the Central Intelligence Agency
>
>>    increasingly rely on that kind of technology to scan, sort, and mine
>>    gargantuan amounts of text and communications
>>    (whether or not the rest of us like it).
>
> sounds like something you might find interesting, michael.
> of course, a "good enough" translation probably wouldn't be,
> not for literature, where the realm of creativity is instantiated,
>
> but could it work as a "first pass" that would do the bulk of the
> "heavy lifting", so a person knowledgeable in both languages
> could come in and spend relatively little time smoothing it out?
> well, it's certainly possible, i would think.   and maybe probable.
> especially if progress on the technique proves to be forthcoming...
>
> -bowerbird
>
From hyphen at hyphenologist.co.uk  Thu Mar  9 23:37:36 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu Mar  9 23:38:02 2006
Subject: [dp-pg] re: [gutvol-d] google and the translation thing
In-Reply-To: <200603091843.19084.donovan@abs.net>
References: <272.71795cd.3141ac67@aol.com> <200603091843.19084.donovan@abs.net>
Message-ID: <0sa212ptd3n413ak7i37ngh3607lmg9us2@4ax.com>

On Thu, 9 Mar 2006 18:43:18 -0500,  D Garcia <donovan@abs.net> wrote:

|On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote:
|> but what _is_ different, and undeniably so,
|> is that google has a _huge_ corpus of text
|<snip>
|> the information about the world represented
|> by _billions_ of web-pages out in cyberspace
|> could lead to the gleaning of vast knowledge.
|> (so much so that it could become very scary.)
|
|I can see this revealing (or at least quantifying) the disturbingly high rate 
|of spelling and grammatical errors. Billions and billions of them, to 
|paraphrase Sagan, or more likely (with sincere apologies to Kubrick) ... 
|"My God ... it's full of shit."
|
|Speaking of the web, of course. :)

Clearly we are ?progressing? back to the days of Shakespeare when spelling
was much more varied, and he spelled his name in several different ways.
Not having a dictionary of ?correct? spelling available did his work no
harm.  Discuss. ;-)
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From hyphen at hyphenologist.co.uk  Thu Mar  9 23:46:40 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu Mar  9 23:46:55 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <Pine.LNX.4.60.0603092055030.32091@pglaf.org>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
Message-ID: <57b212lmtm5sh2kl1jmiah0gh2u1htjr37@4ax.com>

On Thu, 9 Mar 2006 20:58:14 -0800 (PST),  Michael Hart <hart@pglaf.org>
wrote:

|
|Yes, this is part of what I have been talking about for a few years.

And will be *talked about* for decades/centuries to come.

|Once OCR gets to an acceptable level for all, the next big thing,
|the killer ap, so to speak, will be MT [Machine Translation] which
|will convert the 10 million eBooks that will be available into 100
|different languages, for a billion free online eBooks.

In your dreams!  Only a *tiny* fraction of the *human* population can
produce acceptable translations ATM.   Machines will have to become more
?intelligent? than 99% of the population before MT becomes a reality.
Machines can now compete on equal terms with an earwig.
-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From schultzk at uni-trier.de  Fri Mar 10 01:16:34 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri Mar 10 01:16:40 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <272.71795cd.3141ac67@aol.com>
References: <272.71795cd.3141ac67@aol.com>
Message-ID: <EE5F31D2-2B4A-4ED7-8B9F-2DE2136C6FD8@uni-trier.de>

Hi There,

Am 09.03.2006 um 17:05 schrieb Bowerbird@aol.com:

> keith said:
> >   The method is not new. It was used
> >   successfully for wheather reports already
>
> the "method" might not be new.
>
> but what _is_ different, and undeniably so,
> is that google has a _huge_ corpus of text
> with which to implement the method now,
> possibly the "secret sauce" to make it work.
	Just the opposite is the case. Believe me as a computer
	linguist.  For decades it said that with faster computers
	bigger corpora MT would have its break through.
	What has happened. Vaporware and results.
	It simply does not work. Language can not be
	sucessfully model. Languages are regularly formed,
	nor well formed.

>
> this asset, and its bearing on the problem,
> should not be underestimated.  and indeed,
> that huge corpus could exert all manner of
> effects on a wide variety of knowledge tasks.
>
> the information about the world represented
> by _billions_ of web-pages out in cyberspace
> could lead to the gleaning of vast knowledge.
> (so much so that it could become very scary.)
	All AI projects so far have failed and failure has
	been admitted. That knowlege can be extracted
	from corpora. Language does not constitute
	meaning or knowledge. It just transport it.
	That is why a good deal in NLP is done in the field
	of knowledge representation.

	Do you realize that voilets where originally the
	color BROWN and not blue !!! (see Goethe).
	A translator today would translate Goethe braun(brown)
	to blue since it is what people would expect!!!

		Keith.
	
>
> -bowerbird
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060310/5d3be654/attachment.html
From schultzk at uni-trier.de  Fri Mar 10 01:32:29 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri Mar 10 01:32:36 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <Pine.LNX.4.60.0603092055030.32091@pglaf.org>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
Message-ID: <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>


Am 10.03.2006 um 05:58 schrieb Michael Hart:

>
> Yes, this is part of what I have been talking about for a few years.
>
> Once OCR gets to an acceptable level for all, the next big thing,
> the killer ap, so to speak, will be MT [Machine Translation] which
> will convert the 10 million eBooks that will be available into 100
> different languages, for a billion free online eBooks.
	Not in the next 100 or so years. In the 80s there where OCR
	systems you could/(had to) be trained. They would give you 95 to 99%
	accuracy. But, inorder to get these results you would train the
	system a long time and this training could basically be used just on  
one
	text. Today, dictionaries are used to guess which words are
	to be recognised. That is why the OCR systems today give us
	better results if the original has DECENT quality!!!

	The pattern recognition systems have not gotten better and
	the dictionary trick takes the motivation away to
	develop better OCR algorithms.

	Interesting is also, I had a Apple Newton and it recognized
	my handwriting with 98-99% accuracy. Yet, most OCR systems
	today will fail!! They can not be trained! I still have to find
	a system today with similar performance. So much for technological
	break throughs.

		Keith.

>
> mh
>
> On Thu, 9 Mar 2006 Bowerbird@aol.com wrote:
>
>> http://www.dancohen.org/blog/posts/no_computer_left_behind said:
>>>    Google researchers have demonstrated
>>>    (but not yet released to the general public)
>>>    a powerful method for creating 'good enough'
>>>    translations??not by understanding the grammar
>>>    of each passage, but by rapidly scanning and
>>>    comparing similar phrases on countless electronic
>>>    documents in the original and second languages.
>>>    Given large enough volumes of words in a variety
>>>    of languages, machine processing can find parallel phrases
>>>    and reduce any document into a series of word swaps.
>>>    Where once it seemed necessary to have a human being
>>>    aid in a computer's translating skills, or to teach that
>>>    machine the basics of language, swift algorithms functioning
>>>    on unimaginably large amounts of text suffice. Are such new
>>>    computer translations as good as a skilled, bilingual human  
>>> being?
>>>    Of course not. Are they good enough to get the gist of a text?
>> Absolutely.
>>>    So good the National Security Agency and the Central  
>>> Intelligence Agency
>>
>>>    increasingly rely on that kind of technology to scan, sort,  
>>> and mine
>>>    gargantuan amounts of text and communications
>>>    (whether or not the rest of us like it).
>>
>> sounds like something you might find interesting, michael.
>> of course, a "good enough" translation probably wouldn't be,
>> not for literature, where the realm of creativity is instantiated,
>>
>> but could it work as a "first pass" that would do the bulk of the
>> "heavy lifting", so a person knowledgeable in both languages
>> could come in and spend relatively little time smoothing it out?
>> well, it's certainly possible, i would think.   and maybe probable.
>> especially if progress on the technique proves to be forthcoming...
>>
>> -bowerbird
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From schultzk at uni-trier.de  Fri Mar 10 01:43:16 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri Mar 10 01:43:21 2006
Subject: [dp-pg] re: [gutvol-d] google and the translation thing
In-Reply-To: <0sa212ptd3n413ak7i37ngh3607lmg9us2@4ax.com>
References: <272.71795cd.3141ac67@aol.com> <200603091843.19084.donovan@abs.net>
	<0sa212ptd3n413ak7i37ngh3607lmg9us2@4ax.com>
Message-ID: <A9A9A706-5415-4C9A-A8EE-4EDE665370FD@uni-trier.de>

Hi,

Am 10.03.2006 um 08:37 schrieb Dave Fawthrop:

> On Thu, 9 Mar 2006 18:43:18 -0500,  D Garcia <donovan@abs.net> wrote:
>
> |On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote:
> |> but what _is_ different, and undeniably so,
> |> is that google has a _huge_ corpus of text
> |<snip>
> |> the information about the world represented
> |> by _billions_ of web-pages out in cyberspace
> |> could lead to the gleaning of vast knowledge.
> |> (so much so that it could become very scary.)
> |
> |I can see this revealing (or at least quantifying) the  
> disturbingly high rate
> |of spelling and grammatical errors. Billions and billions of them, to
> |paraphrase Sagan, or more likely (with sincere apologies to  
> Kubrick) ...
> |"My God ... it's full of shit."
> |
> |Speaking of the web, of course. :)
>
> Clearly we are ?progressing? back to the days of Shakespeare when  
> spelling
> was much more varied, and he spelled his name in several different  
> ways.
> Not having a dictionary of ?correct? spelling available did his  
> work no
> harm.  Discuss. ;-)
	It did him no harm and humans no harm.But, machines are
	knowledgeless !! They need a dictionary. Humans through
	their experience and knowledge can recognize all this.
	A machine has to be given this knowledge. This is not a
	trival task.

	The Cobuild dictionary was the first Dictionary that was completly
	corpus based, but there was a lot of human man power used,
	also.

	Btw. All of Shakespeare works were not written down by himself,
	     but were transcripted during the plays. Therefore the varied
              portfolios and spellings.

		Keith.
		


> -- 
> Dave Fawthrop <dave hyphenologist co uk>
> Freedom of Speech, Expression, Religion, and Democracy are
> the keys to Civilization, together with legal acceptance of
> Fundamental Human rights.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From hyphen at hyphenologist.co.uk  Fri Mar 10 02:37:34 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Fri Mar 10 02:37:42 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
	<C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
Message-ID: <2el2129vfruqn211o6n8v98t9srlnpjdb3@4ax.com>

On Fri, 10 Mar 2006 10:32:29 +0100,  "Keith J. Schultz"
<schultzk@uni-trier.de> wrote:

|	text. Today, dictionaries are used to guess which words are
|	to be recognised. That is why the OCR systems today give us
|	better results if the original has DECENT quality!!!

And get it *wrong* <mumblehategrumble> very often. 
For my Yorkshire Dialect stuff which include "wor" many times, this gets
changed into "war", most of the time.    To the extent that I use an
initial edit to put it right.
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From holden.mcgroin at dsl.pipex.com  Fri Mar 10 02:24:15 2006
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Fri Mar 10 02:50:13 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
	<C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
Message-ID: <1141986255.20173.15.camel@steve-mcqueen>

On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote:
> text. Today, dictionaries are used to guess which words are
> to be recognised. That is why the OCR systems today give us
> better results if the original has DECENT quality!!!

> The pattern recognition systems have not gotten better and
> the dictionary trick takes the motivation away to
> develop better OCR algorithms.

I'm going to have to call bullshit here. As a researcher working in the
field of document recognition, I've noticed tremendous improvements in
OCR quality even just in the past five years.

The fact is, OCR and document recognition as a whole is a field of
tremendous ongoing research. It's no secret that the problem of OCR is
not "solved" yet but for some types of document (particularly clean ones
using lating characters), results are already damn good. In other areas,
particularly regarding degraded documents, results aren't as good but
are steadily improving.

You state that the so-called "dictionary trick" takes away all
motivation to research in the field. This is not what I observe going on
in the research community. Dictionary-based lookups are one tool in the
arsenal but that's something that's well understood. Some of my
colleagues are currently researching novel image processing and feature
extraction techniques with the goal of improving raw OCR results.

OCR is improving. We're working on it.

Cheers,
Holden

From schultzk at uni-trier.de  Fri Mar 10 03:33:59 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri Mar 10 03:34:05 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <1141986255.20173.15.camel@steve-mcqueen>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
	<C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
	<1141986255.20173.15.camel@steve-mcqueen>
Message-ID: <264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de>

Hello,

Am 10.03.2006 um 11:24 schrieb Holden McGroin:

> On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote:
>> text. Today, dictionaries are used to guess which words are
>> to be recognised. That is why the OCR systems today give us
>> better results if the original has DECENT quality!!!
>
>> The pattern recognition systems have not gotten better and
>> the dictionary trick takes the motivation away to
>> develop better OCR algorithms.
>
> I'm going to have to call bullshit here. As a researcher working in  
> the
> field of document recognition, I've noticed tremendous improvements in
> OCR quality even just in the past five years.
	Before you start to swear, read and understand! Maybe in the
	development labs, but not for the non-high end user!!!!

>
> The fact is, OCR and document recognition as a whole is a field of
> tremendous ongoing research. It's no secret that the problem of OCR is
> not "solved" yet but for some types of document (particularly clean  
> ones
> using lating characters), results are already damn good. In other  
> areas,
> particularly regarding degraded documents, results aren't as good but
> are steadily improving.
>
> You state that the so-called "dictionary trick" takes away all
> motivation to research in the field. This is not what I observe  
> going on
> in the research community. Dictionary-based lookups are one tool in  
> the
> arsenal but that's something that's well understood. Some of my
> colleagues are currently researching novel image processing and  
> feature
> extraction techniques with the goal of improving raw OCR results.
	
	We have not seen any improvements in the field for the past five
	years!!! The improvements are mainly due to the use of dictionaries!!
	Not the improvement of character recognition!! Most systems in the
	field get their performance out of word recognition !!!
>
> OCR is improving. We're working on it.

	I did mean to say not there is no improvement in Optical
	Character Recognition, but the improvment over the past
	10 years is minimal at most. When I see a OCR system that
	just uses raw results, then I will bow my head in recognition
	of true achieve meant. Furthermore, when the image processing
	gets that far it will open up new possiblities in all kinds
	of sciences.


	

From schultzk at uni-trier.de  Fri Mar 10 03:36:09 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri Mar 10 03:36:12 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <2el2129vfruqn211o6n8v98t9srlnpjdb3@4ax.com>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
	<C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
	<2el2129vfruqn211o6n8v98t9srlnpjdb3@4ax.com>
Message-ID: <7599AF7D-1B28-495E-964B-55F4A5998386@uni-trier.de>

Hi There,

Am 10.03.2006 um 11:37 schrieb Dave Fawthrop:

> On Fri, 10 Mar 2006 10:32:29 +0100,  "Keith J. Schultz"
> <schultzk@uni-trier.de> wrote:
>
> |	text. Today, dictionaries are used to guess which words are
> |	to be recognised. That is why the OCR systems today give us
> |	better results if the original has DECENT quality!!!
>
> And get it *wrong* <mumblehategrumble> very often.
	exactly my point.

> For my Yorkshire Dialect stuff which include "wor" many times, this  
> gets
> changed into "war", most of the time.    To the extent that I use an
> initial edit to put it right.
	A small tip. Try using a custom dictionary. Or get a system
	that you can train!

		Keith.

From joshua at hutchinson.net  Fri Mar 10 06:31:18 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Mar 10 06:31:19 2006
Subject: [gutvol-d] google and the translation thing
Message-ID: <20060310143118.ADEC62F8DE@ws6-3.us4.outblaze.com>


> ----- Original Message -----
> From: "Keith J. Schultz" <schultzk@uni-trier.de>

> 	Interesting is also, I had a Apple Newton and it recognized
> 	my handwriting with 98-99% accuracy. Yet, most OCR systems
> 	today will fail!! They can not be trained! I still have to find
> 	a system today with similar performance. So much for technological
> 	break throughs.
> 

Not to disagree with your main points, which I agree with, but I thought I'd point out that most major OCR packages still allowing training (I'm most familiar with FineReader), but they do tend to bury deep so that you have to hunt to find out how to do it (we have people who do it regular at DP for some of the more ... creative ... fonts we find sometimes in old texts).

Josh
From aotg20 at dsl.pipex.com  Fri Mar 10 06:45:40 2006
From: aotg20 at dsl.pipex.com (Richard Poynder)
Date: Fri Mar 10 06:45:49 2006
Subject: [gutvol-d] Interview with Michael Hart
In-Reply-To: <Pine.LNX.4.60.0602261037450.10447@pglaf.org>
References: <1675677963.20060109133820@noring.name>
	<20060109233815.GB21426@pglaf.org>
	<80308709.20060109164835@noring.name>
	<20060110002256.GA27181@pglaf.org>
	<7.0.1.0.2.20060226175302.00adc9e0@dsl.pipex.com>
	<Pine.LNX.4.60.0602261037450.10447@pglaf.org>
Message-ID: <7.0.1.0.2.20060310144359.021b2678@dsl.pipex.com>

Thank you. The interview is now published at:

http://poynder.blogspot.com/2006/03/interview-with-michael-hart.html

Best wishes,


Richard Poynder



Richard Poynder
Freelance Journalist
www.richardpoynder.com
http://poynder.blogspot.com



At 18:53 26/02/2006, you wrote:

>Hopefully any of the three color pics here will fill the bill,
>or the right hand b/w.
>
>
>lynx http://pglaf.org/~hart/
>
>
>"If what you did yesterday
>Still seems great today,
>Then your goals for tomorrow
>Are not big enough."
>
>Ling Yu Fu, circa 600 BC
>
>
>Break Down The Bars Of Ignorance And Illiteracy
>
>Michael S. Hart (hart@pobox.com)
>
>
>On Sun, 26 Feb 2006, Richard Poynder wrote:
>
>>Dear All,
>>
>>I shall shortly be publishing an interview I did with Michael Hart, 
>>and am very keen to include a recent color photo of him.
>>
>>Does anyone happen to have such a photo that they could send to me 
>>by e-mail? If so, I would be most grateful.
>>
>>Best wishes,
>>
>>
>>Richard Poynder
>>
>>
>>Richard Poynder
>>Freelance Journalist
>>www.richardpoynder.com
>>http://poynder.blogspot.com




From hyphen at hyphenologist.co.uk  Fri Mar 10 06:54:25 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Fri Mar 10 06:54:37 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <20060310143118.ADEC62F8DE@ws6-3.us4.outblaze.com>
References: <20060310143118.ADEC62F8DE@ws6-3.us4.outblaze.com>
Message-ID: <9j4312tesp0mnmneg8gqf9mg269vn479os@4ax.com>

On Fri, 10 Mar 2006 09:31:18 -0500,  "Joshua Hutchinson"
<joshua@hutchinson.net> wrote:

|
|> ----- Original Message -----
|> From: "Keith J. Schultz" <schultzk@uni-trier.de>
|
|> 	Interesting is also, I had a Apple Newton and it recognized
|> 	my handwriting with 98-99% accuracy. Yet, most OCR systems
|> 	today will fail!! They can not be trained! I still have to find
|> 	a system today with similar performance. So much for technological
|> 	break throughs.
|> 
|
|Not to disagree with your main points, which I agree with, but I thought I'd point out that most major OCR packages still allowing training (I'm most familiar with FineReader), but they do tend to bury deep so that you have to hunt to find out how to do it (we have people who do it regular at DP for some of the more ... creative ... fonts we find sometimes in old texts).

I use Readiris because finereader would not mount when I tried it *long*
ago.   The problem there is that it does not ?see? the thin strokes in "w"
and "r" so no amount of training will work for those two characters.
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From Bowerbird at aol.com  Fri Mar 10 09:59:26 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Mar 10 09:59:42 2006
Subject: [gutvol-d] google and the translation thing
Message-ID: <111.5cbf1cc9.3143187e@aol.com>

keith said:
>    Just the opposite is the case. 
>    Believe me as a computer?linguist.

i believe that the computer linguists
have not been able to solve the problem.

i also believe that google's research lab
_will_ be able to solve it.   i doubt they have
"solved" it yet, and i'm sure when they do,
their "solution" won't be "perfect enough"
for the computer linguists, but nonetheless...


>    What has happened. Vaporware and results.
>    It simply does not work. Language can not be
>    sucessfully model. Languages are regularly formed,
>    nor well formed.?

and here's a great example of why it won't be "perfect".
just in the sentences quoted above: there should be a
question-mark after "happened"; there seems to be a
missing adjective before "results";' "successfully" is not
spelled correctly.   and there seems to be a missing word
between "regularly" and "formed"; yet despite all these
shortcomings, i know exactly what you meant to say...

(and i don't mean to be picking on you if english is not
your first language.   i only speak one language, so i am
the last person to criticize anyone else on that dimension.
the point is that human beings are very good at resolving
the ambiguity that results from incomplete information,
and we probably can't reasonably expect that of machines.
but it is simply not that case that ambiguity permeates
_every_aspect_ of language; clarity is not impossible.)


>    All AI projects so far have failed 
>    and failure has been admitted. 

yes it has been, yet deep blue can still beat
all but the best of the world's grandmasters...

if you give up on teaching a machine "meaning",
and concentrate on giving it enough rules that
give the correct results most of the time, you can
get very close to finishing the job you want done.

of course, this approach is considered "a trick"
by the artificial-intelligence people, whose aim
was to "teach meaning" rather than solve a task,
but that's why those artificial-intelligence people
have been such a failure themselves...


>   When I see a OCR system that just uses raw results, 
>    then I will bow my head in recognition of true achieve meant.

a perfect example of what i just said:
the objective is to get accurate o.c.r.,
by whatever means necessary, and
_not_ to limit yourself to "raw results".

if doing some voodoo gave better o.c.r.,
we would do it.   this isn't some kind of
"intellectual challenge" where we find it
necessary to tie our hands behind our back;
it is a practical job that needs to be done...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060310/94d5a747/attachment-0001.html
From gbnewby at pglaf.org  Fri Mar 10 16:07:02 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Mar 10 16:07:03 2006
Subject: [gutvol-d] eBooks on slashdot today
Message-ID: <20060311000702.GA22305@pglaf.org>


"It seems that the readers of Slashdot are the most likely early
adopters of electronic books, but from posts I've seen here, it doesn't
appear that many on Slashdot are e-book fans. In the hopes of sparking a
discussion, I'd like to ask what keeps you personally from reading
e-books?"  Here are some of my guesses as to why people haven't taken up
e-Books:

1. Form factor: They just prefer the feel and 'interface' of a paper
book.

2. Lack of a compelling device (or perhaps lack of convergence): They
don't own a reader (other than a PC or notebook) and can't take them
with them.

3. Lack of content: Books they are interested in aren't available in
electronic format

4. Distribution model: They don't like the DRM scheme their favorite
publisher offers, or are otherwise unhappy with current offerings.

Maybe lively discussion from a prospective set of customers might spur
the creator of the next generation of electronic book devices. Too bad
the name 'iBook' is already taken."

What reason do you have for not taking up e-Books? Are they listed above
or are there other reasons that you would like to add?"


http://ask.slashdot.org/article.pl?sid=06/03/10/1555203
From bruce at zuhause.org  Fri Mar 10 18:40:33 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Fri Mar 10 18:40:40 2006
Subject: [gutvol-d] Commercial paper editions of PG texts
In-Reply-To: <20060304213610.GJ6307@pglaf.org>
References: <2de.3175a87.31364202@aol.com>
	<17413.50263.444498.622315@celery.zuhause.org>
	<20060301190008.GB29172@pglaf.org>
	<17414.22323.407940.157838@celery.zuhause.org>
	<20060304213610.GJ6307@pglaf.org>
Message-ID: <17426.14497.873203.598737@celery.zuhause.org>

Greg Newby writes:
 > On Wed, Mar 01, 2006 at 08:23:47PM -0600, Bruce Albrecht wrote:
 > > Greg Newby writes:
 > >  > On Wed, Mar 01, 2006 at 09:57:11AM -0600, Bruce Albrecht wrote:
 > >  > > ...
 > >  > > How many PD books have you found in Google Book Search that were not
 > >  > > visible?  Did you report them to Google?  If not, some of the blame
 > >  > > falls on _your_ shoulders.
 > >  > 
 > >  > How is such notification done?  
 > > 
 > > Well, when I've been doing book searches at Google, and it comes up
 > > with a book that doesn't say that it was provided by a publisher, and
 > > the book information claims it was copyrighted before 1923, or I can
 > > find the copyright in a snippet, I use Google's feeback link to report
 > > that the book is incorrectly flagged as being in copyright so that
 > > they will fix the status.  In one case, they fixed it after a 4-5
 > > email exchange.  In other cases, they simply told me that they were
 > > aware that some books were incorrectly identified as in copyright.
 > 
 > Do they consider 1923 as a cutoff date (per US law)?  Or
 > do they look to 1868 or something similar as a cutoff,
 > as an attempt to only say "public domain" if it's defensibly
 > for the entire world?

I don't know about other countries, as I am in the US. This week, I
tried to follow up on a book published in 1914 in England (but
probably missing an explicit copyright), but it's hard to tell because
Google doesn't display full sized title and title-verso page.  The
British Library didn't indicate any additional editions.  Basically,
Google's response this time was "We're not sure if it's in copyright
so you're not going to see anything more than snippets."
From phil at thalasson.com  Fri Mar 10 18:11:16 2006
From: phil at thalasson.com (Philip Baker)
Date: Fri Mar 10 19:15:01 2006
Subject: [dp-pg] re: [gutvol-d] google and the translation thing
In-Reply-To: <A9A9A706-5415-4C9A-A8EE-4EDE665370FD@uni-trier.de>
Message-ID: <IGVKjOAEHjEEFwbv@thalasson.com>

In article <A9A9A706-5415-4C9A-A8EE-4EDE665370FD@uni-trier.de>, "Keith
J. Schultz" <schultzk@uni-trier.de> writes
>       Btw. All of Shakespeare works were not written down by himself,
>            but were transcripted during the plays. Therefore the varied
>              portfolios and spellings.

You mean the various quartos. Some may have been bootleg copies for the
use of rival theatre companies but the First Folio was produced from
working copies of the plays owned by Shakespeare's theatre company.
-- 
Philip Baker
From tb at baechler.net  Sat Mar 11 00:43:00 2006
From: tb at baechler.net (Tony Baechler)
Date: Sat Mar 11 03:26:10 2006
Subject: [gutvol-d] eBooks on slashdot today
In-Reply-To: <20060311000702.GA22305@pglaf.org>
References: <20060311000702.GA22305@pglaf.org>
Message-ID: <7.0.1.0.2.20060311003844.03737d60@baechler.net>

My reasons are 3 and 4 below.  I'm blind so I'm very interested in 
electronic books.  I read them almost exclusively.  However, as much 
as I like PG, I get tired of only reading pre-1923 books so I look 
elsewhere.  I am very greatful to the people who are getting 1950's 
science fiction cleared.  I assume this falls under rule 6?  I'm 
greatful, those books aren't generally available except from PG.

I don't like DRM anyway, but especially since it usually locks out 
screen readers from reading the text.  Often PDf files are encrypted 
or have passwords preventing copying text.  MS Reader is generally 
not accessible at all.  Even if reading aloud is turned on, you don't 
get a choice of what voice you want and are stuck with horrible 
software speech.  Even the new DAISY format for the blind has 
restrictions but I can convert it to plain text so I'm happy.  Post 
this to slashdot if you want.

At 04:07 PM 3/10/2006, you wrote:

>3. Lack of content: Books they are interested in aren't available in
>electronic format
>
>4. Distribution model: They don't like the DRM scheme their favorite
>publisher offers, or are otherwise unhappy with current offerings.

From hart at ibiblio.org  Fri Mar 10 13:42:49 2006
From: hart at ibiblio.org (Michael Hart)
Date: Sat Mar 11 08:09:45 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <111.5cbf1cc9.3143187e@aol.com>
References: <111.5cbf1cc9.3143187e@aol.com>
Message-ID: <Pine.LNX.4.61.0603101640560.14959@tribal.metalab.unc.edu>


This is all just nitpicking and progress is being made,
that's all that counts, not now many think ENUF progress.

"Those doing the impossible
should not be interrupted
by those who say impossible."

Ancient Chinese Proverb



I am currently running from an emergency backup mail system,
so please reply to hart@pobox.com as usual, but cc: me at:
hart@metalab.unc.edu until I let you know I am back @pglaf.
Please also add hart@pglaf.org to your email alias for me.

Thanks!!!!!!!

Michael


On Fri, 10 Mar 2006 Bowerbird@aol.com wrote:

> keith said:
>>    Just the opposite is the case.
>>    Believe me as a computer?linguist.
>
> i believe that the computer linguists
> have not been able to solve the problem.
>
> i also believe that google's research lab
> _will_ be able to solve it.   i doubt they have
> "solved" it yet, and i'm sure when they do,
> their "solution" won't be "perfect enough"
> for the computer linguists, but nonetheless...
>
>
>>    What has happened. Vaporware and results.
>>    It simply does not work. Language can not be
>>    sucessfully model. Languages are regularly formed,
>>    nor well formed.?
>
> and here's a great example of why it won't be "perfect".
> just in the sentences quoted above: there should be a
> question-mark after "happened"; there seems to be a
> missing adjective before "results";' "successfully" is not
> spelled correctly.   and there seems to be a missing word
> between "regularly" and "formed"; yet despite all these
> shortcomings, i know exactly what you meant to say...
>
> (and i don't mean to be picking on you if english is not
> your first language.   i only speak one language, so i am
> the last person to criticize anyone else on that dimension.
> the point is that human beings are very good at resolving
> the ambiguity that results from incomplete information,
> and we probably can't reasonably expect that of machines.
> but it is simply not that case that ambiguity permeates
> _every_aspect_ of language; clarity is not impossible.)
>
>
>>    All AI projects so far have failed
>>    and failure has been admitted.
>
> yes it has been, yet deep blue can still beat
> all but the best of the world's grandmasters...
>
> if you give up on teaching a machine "meaning",
> and concentrate on giving it enough rules that
> give the correct results most of the time, you can
> get very close to finishing the job you want done.
>
> of course, this approach is considered "a trick"
> by the artificial-intelligence people, whose aim
> was to "teach meaning" rather than solve a task,
> but that's why those artificial-intelligence people
> have been such a failure themselves...
>
>
>>   When I see a OCR system that just uses raw results,
>>    then I will bow my head in recognition of true achieve meant.
>
> a perfect example of what i just said:
> the objective is to get accurate o.c.r.,
> by whatever means necessary, and
> _not_ to limit yourself to "raw results".
>
> if doing some voodoo gave better o.c.r.,
> we would do it.   this isn't some kind of
> "intellectual challenge" where we find it
> necessary to tie our hands behind our back;
> it is a practical job that needs to be done...
>
> -bowerbird
>
From Bowerbird at aol.com  Sat Mar 11 11:13:05 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Mar 11 11:13:13 2006
Subject: [gutvol-d] google and the translation thing
Message-ID: <244.8584560.31447b41@aol.com>

michael said:
>    This is all just nitpicking and progress is being made,
>    that's all that counts, not now many think ENUF progress.

um, it's certainly not "nitpicking".

perhaps the fact that "progress is being made"
might be "all that counts" to _you_, michael, but
maybe something else counts to someone else...

i certainly think the _methods_ that people are
using to "make progress" is an interesting topic.

moreover, i think it's quite fascinating that those
people whose methods failed to make progress
localize the cause of that failure in some inherent
"difficulty of the task" rather than in their methods.

they then go on to lambast anyone else who thinks
that progress could be made with another method.

and yes, there is much precedent for this on this list.

for the past few years, a number of people have been
telling me that "a plain-text format cannot represent
the range of features in paper-books" simply because
they could not imagine one that could.   but i _can_...

and even when i told you, repeatedly, that i could do it,
they insisted -- just as vehemently -- that i could not...

well, in case you haven't noticed, people, i have begun
the process of giving you unequivocal proof that i can.

and, just as i've known and predicted all along, you will
suddenly become silent with your "that can't be done"
song and dance, and will pretend you never said it at all.


>   "Those doing the impossible
>    should not be interrupted
>    by those who say impossible."
>    Ancient Chinese Proverb

but the ones who say it is impossible
will keep on trying to interrupt them...

because otherwise, their smug picture
of themselves as "invited experts" will
vanish in a puff of their own vapor...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060311/3fbc93a8/attachment.html
From hart at pglaf.org  Sat Mar 11 19:18:39 2006
From: hart at pglaf.org (Michael Hart)
Date: Sat Mar 11 19:18:41 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <244.8584560.31447b41@aol.com>
References: <244.8584560.31447b41@aol.com>
Message-ID: <Pine.LNX.4.60.0603111858540.13127@pglaf.org>


On Sat, 11 Mar 2006 Bowerbird@aol.com wrote:

> michael said:
>>    This is all just nitpicking and progress is being made,
>>    that's all that counts, not now many think ENUF progress.
>
> um, it's certainly not "nitpicking".
>
> perhaps the fact that "progress is being made"
> might be "all that counts" to _you_, michael, but
> maybe something else counts to someone else...
>
> i certainly think the _methods_ that people are
> using to "make progress" is an interesting topic.

It's not the means to the end that count,
it's arriving at the end that counts.

Methodologies are continually being upset
by those who find some other way to do an
expensive [time or money] function for an
infinitessimal amount of the original.

Just try going coast to coast without it.

Of course, if the Wright Brothers were in
the current copyright scenario, we should
only just now have their blueprints in an
ancient and mummified public domain.


> moreover, i think it's quite fascinating that those
> people whose methods failed to make progress
> localize the cause of that failure in some inherent
> "difficulty of the task" rather than in their methods.

This is true enough to all failures to be meaningless
in this particular specification, don't you have some
remarkable insight for THIS specific application?

If not, then why talk in such generalities that these
non-specifics apply to everything in general and thus
to nothing in specific.


> they then go on to lambast anyone else who thinks
> that progress could be made with another method.

Time to look in the mirror, my friend.

Try everything, go which what succeeds.

"Nothing succeeds like success."

> and yes, there is much precedent for this on this list.

Speak for yourself, John.


> for the past few years, a number of people have been
> telling me that "a plain-text format cannot represent
> the range of features in paper-books" simply because
> they could not imagine one that could.   but i _can_...

It only matters when you get to the point of ending the
debating and actually doing something the outside world
can see and work with.

Until then, as S. I. Hawakawa told me was the best thing
he could teach me, it remains in your asylum with you.

Get out into the real world!

Until, it makes no difference to anyone else.


> and even when i told you, repeatedly, that i could do it,
> they insisted -- just as vehemently -- that i could not...

There is only one way to prove them wrong.


> well, in case you haven't noticed, people, i have begun
> the process of giving you unequivocal proof that i can.

"The proof of the pudding is in the eating."

"Alice, Pudding.  Pudding, Alice."

Until your product is introduced to the public,
it's just a Mad Hatter's Tea Party.


> and, just as i've known and predicted all along, you will
> suddenly become silent with your "that can't be done"
> song and dance, and will pretend you never said it at all.

"You" who?

Yoohoo!

"Is there anybody OUT there?"

THAT is the ONLY question that matters OUTSIDE the laboratory.

This is why Doug Englebart with never be credited with eBooks,
they were never released into the wild, as ours are.

Until yours make it in the wild, we'll just never know. . . .


>>   "Those doing the impossible
>>    should not be interrupted
>>    by those who say impossible."
>>    Ancient Chinese Proverb
>
> but the ones who say it is impossible
> will keep on trying to interrupt them...


"PAY NO ATTENTION TO THE MAN BEHIND THE CURTAIN!"


You have had the power all along.


"There's no place like home,

"There's no place like home,

"There's no place like home,

"There's no place like home."

Until your work finds a home,
it may as well be in Oz.


> because otherwise, their smug picture
> of themselves as "invited experts" will
> vanish in a puff of their own vapor...


Only if reality becomes part of the equation.



>
> -bowerbird
>



Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

From gbnewby at pglaf.org  Sat Mar 11 22:37:50 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Mar 11 22:37:52 2006
Subject: [gutvol-d] Fwd: Reading software for PG users
Message-ID: <20060312063750.GB16884@pglaf.org>

Some info about speed reading software, below, for anyone
interested:

----- Forwarded message from John Burgess <ad123@ix.netcom.com> -----

From: "John Burgess" <ad123@ix.netcom.com>
To: <gbnewby@pglaf.org>
Subject: Reading software for PG users
Date: Sat, 11 Mar 2006 14:54:49 -0800
Delivered-To: gbnewby@pglaf.org

Hello Gregory:

I represent a reading software company named Rocket Reader that produces a
reading tool many users of Project Gutenberg will be very interested in
using.

Rocket Reader not only improves reading speed and comprehension, but readers
will find it very useful in pacing their reading as well.  There are two
modules in the software that I use for this purpose: Speed Training and
Grouping Training.  Speed Training flashes groups of words from the imported
text on a single line.  It brings the words to your eyes.  The rate can be
set at the desired pace and ramped up as one reads through the text.
Grouping Training begins with the text covered, then reveals it in groups of
words, line-by-line.  The speed and groups size can be set by the reader.

I would like to discuss some arrangement to make Rocket Reader available to
Project Gutenberg users.  Perhaps, we could donate money to the cause as
people begin using Rocket Reader.  I also believe that more people would use
Project Gutenberg if a reading tool like Rocket Reader were available to
them.

You can download a free 45-day trial at
https://www.rocketreader.com/school/trial_aplus.html.

I am more than happy to answer any of your questions.

John Burgess
Ed Tech Consultant
561-889-6585

----- End forwarded message -----
From gbnewby at pglaf.org  Sat Mar 11 22:42:05 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Mar 11 22:42:06 2006
Subject: [gutvol-d] eBooks on slashdot today
In-Reply-To: <7.0.1.0.2.20060311003844.03737d60@baechler.net>
References: <20060311000702.GA22305@pglaf.org>
	<7.0.1.0.2.20060311003844.03737d60@baechler.net>
Message-ID: <20060312064205.GC16884@pglaf.org>

On Sat, Mar 11, 2006 at 12:43:00AM -0800, Tony Baechler wrote:
> My reasons are 3 and 4 below.  I'm blind so I'm very interested in 
> electronic books.  I read them almost exclusively.  However, as much 
> as I like PG, I get tired of only reading pre-1923 books so I look 
> elsewhere.  I am very greatful to the people who are getting 1950's 
> science fiction cleared.  I assume this falls under rule 6?  I'm 
> greatful, those books aren't generally available except from PG.

Yes, the Sci Fi from 1923-1963 falls under our Rule 6.  The
new HOWTO is under testing, at http://copy.pglaf.org .  Thanks
to Greg Weeks for being one of the pioneers with Rule 6!

There at least 1 million books published from 1923-1963 that
were not renewed, and are therefore public domain in the US.  This
is a huge number, and I hope PG can make a dent in it.  However,
the risk of erroneously claiming public domain on a copyrighted
item is higher, so we're trying to start cautiously.
  -- Greg
From hyphen at hyphenologist.co.uk  Sat Mar 11 23:27:54 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Mar 11 23:28:07 2006
Subject: [gutvol-d] Fwd: Reading software for PG users
In-Reply-To: <20060312063750.GB16884@pglaf.org>
References: <20060312063750.GB16884@pglaf.org>
Message-ID: <p3j712dhu88v3fijjnppnj5c93vj9t6v76@4ax.com>

On Sat, 11 Mar 2006 22:37:50 -0800,  Greg Newby <gbnewby@pglaf.org> wrote:

|Some info about speed reading software, below, for anyone
|interested:
|
|----- Forwarded message from John Burgess <ad123@ix.netcom.com> -----
|
|From: "John Burgess" <ad123@ix.netcom.com>
|To: <gbnewby@pglaf.org>
|Subject: Reading software for PG users
|Date: Sat, 11 Mar 2006 14:54:49 -0800
|Delivered-To: gbnewby@pglaf.org
|
|Hello Gregory:
|
|I represent a reading software company named Rocket Reader that produces a
|reading tool many users of Project Gutenberg will be very interested in
|using.
|
|Rocket Reader not only improves reading speed and comprehension, but readers
|will find it very useful in pacing their reading as well.  There are two
|modules in the software that I use for this purpose: Speed Training and
|Grouping Training.  Speed Training flashes groups of words from the imported
|text on a single line.  It brings the words to your eyes.  The rate can be
|set at the desired pace and ramped up as one reads through the text.
|Grouping Training begins with the text covered, then reveals it in groups of
|words, line-by-line.  The speed and groups size can be set by the reader.
|
|I would like to discuss some arrangement to make Rocket Reader available to
|Project Gutenberg users.  Perhaps, we could donate money to the cause as
|people begin using Rocket Reader.  I also believe that more people would use
|Project Gutenberg if a reading tool like Rocket Reader were available to
|them.
|
|You can download a free 45-day trial at
|https://www.rocketreader.com/school/trial_aplus.html.

I bought something like that made in plastic <mumble> years ago.   Never
found it much use :-(  Gave up using it without being able to prove any
increase in reading speed.
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From Bowerbird at aol.com  Sun Mar 12 01:45:44 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Mar 12 01:45:57 2006
Subject: [gutvol-d] google and the translation thing
Message-ID: <c0.38e4a5a5.314547c8@aol.com>

michael said:
>    Methodologies are continually being upset
>    by those who find some other way to do an
>    expensive [time or money] function for an
>    infinitessimal amount of the original.

well, i think we're both on the same page...
except you're reading it and i'm writing it...        :+)

(in other words, if one is interested in
the actual upsetting of methodologies,
then one pays some attention to them.
otherwise, one waits until they play out.)

google is upsetting the methodologies here.
and you are counting that machine translation
will become up to snuff sooner or later, and
you're not interesting in the interim period.
both positions are equally fine to hold...


>    If not, then why talk in such generalities

i don't think i'm talking about "generalities" at all.

in the current case, google is the entity doing it --
or so it has been reported, whether true or not --
and keith is the entity saying "it can't be done"...
(well, he said "not in the next 100 or so years".)

and in the other case i've mentioned -- the "dispute"
between me and my "detractors" on this listserve --
there are no "generalities" either.   we spent 2 years
going back and forth at each other, so the positions
are well-staked-out in the archives if you're curious.


>    It only matters when you get to the point of 
>    ending the debating and actually doing something 
>    the outside world can see and work with.

right.

except when you're playing poker,
the object is to win as much money
as possible with the hands you win,
and to lose as little as possible with
the ones that you lose.

and that means you don't always
show all of your cards right away...

google ain't showing all their cards.

and i ain't showing all mine either...

but i'm starting to show _some_.
so we're past the point where
any of this matters any more,
in the matter of me vs. this list,
since pudding is being served...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060312/383b478d/attachment.html
From hart at pglaf.org  Sun Mar 12 14:14:31 2006
From: hart at pglaf.org (Michael Hart)
Date: Sun Mar 12 14:14:32 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <c0.38e4a5a5.314547c8@aol.com>
References: <c0.38e4a5a5.314547c8@aol.com>
Message-ID: <Pine.LNX.4.60.0603121355070.12337@pglaf.org>


On Sun, 12 Mar 2006 Bowerbird@aol.com wrote:

> michael said:
>>    Methodologies are continually being upset
>>    by those who find some other way to do an
>>    expensive [time or money] function for an
>>    infinitessimal amount of the original.
>
> well, i think we're both on the same page...
> except you're reading it and i'm writing it...        :+)

If you only wrote as much and as well as you think,
we might be in a much better place.


> (in other words, if one is interested in
> the actual upsetting of methodologies,
> then one pays some attention to them.
> otherwise, one waits until they play out.)

No, one need not know the methodologies in use
to come up with better ones.  Only the reverse
engineering types rely on this sort of logic.

Innovation, when it comes, usually comes from
a source well outside current methodologies,
the rest is just incrementalism.


> google is upsetting the methodologies here.

I guess you weren't aware of Golden Bow and the
others who preceded them.

I've been talking about Machine Translation since
before there even was a Google.

Don't you remember?

One of the most difficult things about speaking
with you is your apparent lack of memory/attn.

This is what is most likely to get you placed
on the spam list.


> and you are counting that machine translation
> will become up to snuff sooner or later, and
> you're not interesting in the interim period.
> both positions are equally fine to hold...

Again you seem to have not paid attention. . . .

_I_ have been promoting the interim phases as
good enough for people to work with, while YOU
have been saying that only perfection is enough,
or close to it.

Again, you are just asking to be ignored by any
of the people who actually TRY to follow your words.


>>    If not, then why talk in such generalities
>
> i don't think i'm talking about "generalities" at all.
>
> in the current case, google is the entity doing it --
> or so it has been reported, whether true or not --
> and keith is the entity saying "it can't be done"...
> (well, he said "not in the next 100 or so years".)

I'm sticking with my original prediction:

By the time we have put a sigifican dent in public domain
books that are available, 10-20 million eBooks, the next
big thing after OCR will be MT. . .AND. . .this will all
start to take place in the public eye by 2020.

Just in case you try to misinterpret that. . .14 years.


> and in the other case i've mentioned -- the "dispute"
> between me and my "detractors" on this listserve --
> there are no "generalities" either.   we spent 2 years
> going back and forth at each other, so the positions
> are well-staked-out in the archives if you're curious.

Sadly to say, I have read the vast majority of your
archived messages, and have no desire to again.

Obviously not even YOU think they are worth quoting,
or you would have.


>>    It only matters when you get to the point of
>>    ending the debating and actually doing something
>>    the outside world can see and work with.
>
> right.

If only you said what you meant, and meant what you said.

Back to Alice.


> except when you're playing poker,
> the object is to win as much money
> as possible with the hands you win,
> and to lose as little as possible with
> the ones that you lose.

This is NOT a GAME, and MONEY is NOT the OBJECT.

Again I refer you to Jon Noring, you have more
in common than you would like to think.


> and that means you don't always
> show all of your cards right away...

As above, stop PLAYING, start WORKING.

Remember your physics lessons?

It's not WORK if you don't MOVE something
and then KEEP IT THERE.


> google ain't showing all their cards.

Sadly to say, I expected more of you than of Google.


> and i ain't showing all mine either...

Sadly to say. . . .

No one can see or build on your work.

You might have been a giant for someone to stand on.


"If I have seen further, it is because I have
stood on the shoulders of giants."  Newton


> but i'm starting to show _some_.

Sorry, strip-tease is not acceptable.


> so we're past the point where
> any of this matters any more,
> in the matter of me vs. this list,
> since pudding is being served...

And that is why you will likkely
continue to be ignored.


> -bowerbird
>


mh
From schultzk at uni-trier.de  Mon Mar 13 00:32:11 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Mon Mar 13 00:32:18 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <111.5cbf1cc9.3143187e@aol.com>
References: <111.5cbf1cc9.3143187e@aol.com>
Message-ID: <17512C8C-F012-4025-96FB-561B3A4576A4@uni-trier.de>

Hi,

Am 10.03.2006 um 18:59 schrieb Bowerbird@aol.com:

> keith said:
> >   Just the opposite is the case.
> >   Believe me as a computer linguist.
>
> i believe that the computer linguists
> have not been able to solve the problem.
	Exactly.
>
> i also believe that google's research lab
> _will_ be able to solve it.  i doubt they have
> "solved" it yet, and i'm sure when they do,
> their "solution" won't be "perfect enough"
> for the computer linguists, but nonetheless...
	If coputers linguists have not solved the problems
	in 20 years, google probably will not either ;-))
	They might, but very unlikely.
>
>
> >   What has happened. Vaporware and results.
> >   It simply does not work. Language can not be
> >   sucessfully model. Languages are regularly formed,
> >   nor well formed.
>
> and here's a great example of why it won't be "perfect".
> just in the sentences quoted above: there should be a
> question-mark after "happened"; there seems to be a
> missing adjective before "results";' "successfully" is not
> spelled correctly.  and there seems to be a missing word
> between "regularly" and "formed"; yet despite all these
> shortcomings, i know exactly what you meant to say...
>
> (and i don't mean to be picking on you if english is not
> your first language.  i only speak one language, so i am
> the last person to criticize anyone else on that dimension.
	Ouch! I am very sorry. Please excuse me. I had alot of work to do
	long hours last week(,) and people in and out of the office.
	I knew I had alot of booboos in my post.

> the point is that human beings are very good at resolving
> the ambiguity that results from incomplete information,
> and we probably can't reasonably expect that of machines.
> but it is simply not that case that ambiguity permeates
> _every_aspect_ of language; clarity is not impossible.)
>
>
> >   All AI projects so far have failed
> >   and failure has been admitted.
>
> yes it has been, yet deep blue can still beat
> all but the best of the world's grandmasters...
	Gottcha ,-)))) Big blue is not AI it is brute force.
	I ould be glad to dicuss this one. Directly with you,
	 if you care to. This would be OT.


>
> if you give up on teaching a machine "meaning",
> and concentrate on giving it enough rules that
> give the correct results most of the time, you can
> get very close to finishing the job you want done.
	That has been tried in some AI projects, and failed!
>
> of course, this approach is considered "a trick"
> by the artificial-intelligence people, whose aim
> was to "teach meaning" rather than solve a task,
> but that's why those artificial-intelligence people
> have been such a failure themselves...
	This is getting OT, too. But, The reason they are failing
	is due to the pardigm that language is meaning.
	Humans when resolving language(understanding) and
	more so translating it use moren than thier knowledge
	or the language to solve these tasks.

>
>
> >   When I see a OCR system that just uses raw results,
> >   then I will bow my head in recognition of true achieve meant.
>
> a perfect example of what i just said:
> the objective is to get accurate o.c.r.,
> by whatever means necessary, and
> _not_ to limit yourself to "raw results".
	Yet, i order to get better over all result, we need
	better "raw results" OCR has come a long way since
	they use dictionary. Adding a DB with phrasal information
	will bring along another 2 %, but the costs of the other
	side would be about 50% in resources. Sure cheaper
	computers, memory, availibity of google will help. Yet
	it is not the holy grail. Also, as a OT example. How long have
	we been waiting for the 3 liter car(3 liters per 100km).
	Well, it has be here since the 80s. A engineer had modify
	a VW rabbit(just the form of the pistons) and it only need 1 Gallon
	per 62 miles!! Money rules. ( O.K. very off topic)


>
> if doing some voodoo gave better o.c.r.,
> we would do it.  this isn't some kind of
> "intellectual challenge" where we find it
> necessary to tie our hands behind our back;
> it is a practical job that needs to be done...
	Exactly, my point. Things, work for most simple
	every day tasks, but ....

		Keith.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060313/1cfd7c05/attachment.html
From schultzk at uni-trier.de  Mon Mar 13 00:39:09 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Mon Mar 13 00:39:13 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <244.8584560.31447b41@aol.com>
References: <244.8584560.31447b41@aol.com>
Message-ID: <C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de>

Hi There,

Am 11.03.2006 um 20:13 schrieb Bowerbird@aol.com:

> michael said:
> >   This is all just nitpicking and progress is being made,
> >   that's all that counts, not now many think ENUF progress.
>
> um, it's certainly not "nitpicking".
	I do not feel he is nitpicking

>
> perhaps the fact that "progress is being made"
> might be "all that counts" to _you_, michael, but
> maybe something else counts to someone else...
	I agree that we must talk about methods. I had jumped in
	because the method had already been tested. I have
	worked with it myself and also some modifications there of.
[snip, snip]
>
> >   "Those doing the impossible
> >   should not be interrupted
> >   by those who say impossible."
> >   Ancient Chinese Proverb
>
> but the ones who say it is impossible
> will keep on trying to interrupt them...
	I do not say, impossible. I say, highly improbable.
	
	I agree though with Micheal, that we ought to take
	this somewhere else!! As it is growing very OT to PG.

		Keith.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060313/2b594d4a/attachment.html
From schultzk at uni-trier.de  Mon Mar 13 01:03:04 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Mon Mar 13 01:03:09 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <c0.38e4a5a5.314547c8@aol.com>
References: <c0.38e4a5a5.314547c8@aol.com>
Message-ID: <8FB6C99B-2708-4DF1-81DA-54B7C791B0C9@uni-trier.de>

Hi Again,
	I will come back in.
Am 12.03.2006 um 10:45 schrieb Bowerbird@aol.com:

> michael said:
> >   Methodologies are continually being upset
> >   by those who find some other way to do an
> >   expensive [time or money] function for an
> >   infinitessimal amount of the original.
	My point was that it IS NOT A NEW METHOLOGY OR NEW
	METHOD!!

>
> well, i think we're both on the same page...
> except you're reading it and i'm writing it...       :+)
>
> (in other words, if one is interested in
> the actual upsetting of methodologies,
> then one pays some attention to them.
> otherwise, one waits until they play out.)
>
> google is upsetting the methodologies here.
> and you are counting that machine translation
> will become up to snuff sooner or later, and
> you're not interesting in the interim period.
> both positions are equally fine to hold...
	Actually, there is a very excellent transltion system out there
	already. SYSTRANS. But, what is availibable to the public
	you can forget. It uses grammar models, lexica and a lot more
	vodoo. They claim 95-99% out of the box. What is its draw back.
	It needs a hell of a lot of computing power. Even works with
	voice. Do not ask me what it costs either.

>
>
> >   If not, then why talk in such generalities
>
> i don't think i'm talking about "generalities" at all.
>
> in the current case, google is the entity doing it --
> or so it has been reported, whether true or not --
> and keith is the entity saying "it can't be done"...
> (well, he said "not in the next 100 or so years".)
	As I have mentioned before the method is not new.
	It will give you acceptable results for the average joe.
	It will not work for PG.

>
> and in the other case i've mentioned -- the "dispute"
> between me and my "detractors" on this listserve --
> there are no "generalities" either.  we spent 2 years
> going back and forth at each other, so the positions
> are well-staked-out in the archives if you're curious.
>
>
> >   It only matters when you get to the point of
> >   ending the debating and actually doing something
> >   the outside world can see and work with.
>
> right.
>
> except when you're playing poker,
> the object is to win as much money
> as possible with the hands you win,
> and to lose as little as possible with
> the ones that you lose.
>
> and that means you don't always
> show all of your cards right away...
>
> google ain't showing all their cards.
>
> and i ain't showing all mine either...
	This reminds me of my first semester in CL.
	Where the great inovators said my method is
	better than yours. I can do this you can not.
	You can do that, but I can do this. Na na nah nah!

	But, as Micheal said, we shall see if google will
	revelutionize the world of MT. I doubt it very much.
	Of course I could ask for my money back from the
	unversity.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060313/b41007f7/attachment-0001.html
From schultzk at uni-trier.de  Mon Mar 13 01:20:14 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Mon Mar 13 01:20:20 2006
Subject: [gutvol-d] google and the translation thing
Message-ID: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de>

Maybe one last word here.


	The EU use MT technologies to translate the bulk. Yet, the produced
	texts are still manually processed by humans to get it right.
	If the google method was so good the EU would not need translators
	since thier written texts as basically similar in all langauges.  
They are
	basically formal debates and legistative in form.

		Keith.

From hyphen at hyphenologist.co.uk  Mon Mar 13 02:23:24 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Mon Mar 13 02:23:36 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de>
References: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de>
Message-ID: <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com>

On Mon, 13 Mar 2006 10:20:14 +0100,  "Keith J. Schultz"
<schultzk@uni-trier.de> wrote:

|Maybe one last word here.
|
|
|	The EU use MT technologies to translate the bulk. Yet, the produced
|	texts are still manually processed by humans to get it right.
|	If the google method was so good the EU would not need translators
|	since thier written texts as basically similar in all langauges.  
|They are
|	basically formal debates and legistative in form.
|
|		Keith.

The EU MT technologies are specifically adjusted to work with the
specialised language/subjects used by the EU for laws and political
debates.    

Googles proposals are for general text, and therefor *much* *much* more
demanding.

IMO The Google proposals will never get better than a first pass, before a
human does the job properly.    

I use Systran, the market leader, on occasion, and its translations are at
best understandable.   
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From schultzk at uni-trier.de  Mon Mar 13 03:26:13 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Mon Mar 13 03:26:19 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com>
References: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de>
	<u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com>
Message-ID: <E40A0F1F-360B-4202-9908-E50F369F2279@uni-trier.de>

Hi There,

	This debate is becomming very tedious.

Am 13.03.2006 um 11:23 schrieb Dave Fawthrop:

> On Mon, 13 Mar 2006 10:20:14 +0100,  "Keith J. Schultz"
> <schultzk@uni-trier.de> wrote:
>
> |Maybe one last word here.
> |
> |
> |	The EU use MT technologies to translate the bulk. Yet, the produced
> |	texts are still manually processed by humans to get it right.
> |	If the google method was so good the EU would not need translators
> |	since thier written texts as basically similar in all langauges.
> |They are
> |	basically formal debates and legistative in form.
> |
> |		Keith.
>
> The EU MT technologies are specifically adjusted to work with the
> specialised language/subjects used by the EU for laws and political
> debates.
>
	exactly.

> Googles proposals are for general text, and therefor *much* *much*  
> more
> demanding.
	More demanding, definately. If it does not even work for a
	specialized field, how do you expect it to work in a
	general text!?? I have been here, there and back again.
	

>
> IMO The Google proposals will never get better than a first pass,  
> before a
> human does the job properly.
	Just what I saying.
>
> I use Systran, the market leader, on occasion, and its translations  
> are at
> best understandable.
	Which product. Already, mentioned that the better products are not
	availible to the general public.

	Keith.

From Bowerbird at aol.com  Mon Mar 13 03:58:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar 13 03:58:28 2006
Subject: [gutvol-d] monday morning quarterback -- #02
Message-ID: <2b5.63d87a7.3146b85f@aol.com>

week #2 of "m.m.q." - "monday morning quarterback" -- is up:
>    http://snowy.arsc.alaska.edu/bowerbird/mmq/mmq02.txt
>    http://groups.yahoo.com/group/bpsuper/message/4

this weeks topic is "what do we want our final text to look like?"

***

3 examples of "continuous proofreading":
>    http://www.greatamericannovel.com/mabie/mabiep001
>    http://www.greatamericannovel.com/myant/myantc001
>    http://www.greatamericannovel.com/tolbk/tolbkp001

and their underlying text-file masters:
>    http://www.greatamericannovel.com/mabie/mabie.zml
>    http://www.greatamericannovel.com/myant/myant.zml
>    http://www.greatamericannovel.com/tolbk/tolbk.zml

comparison of "straight-out-of-ocr" and "final version" e-texts:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myant-ocr.txt
>    http://www.greatamericannovel.com/myant/myant.zml

the error-report form now provides cross-links from each
specific page to an overall "error-report page" for each book:
>    http://www.greatamericannovel.com/mabie/mabie-er.html
>    http://www.greatamericannovel.com/myant/myant-er.html
>    http://www.greatamericannovel.com/tolbk/tolbk-er.html

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060313/f9056fbf/attachment.html
From Gutenberg9443 at aol.com  Mon Mar 13 06:17:56 2006
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Mon Mar 13 06:18:16 2006
Subject: [gutvol-d] need volunteers in Dallas
Message-ID: <241.885df10.3146d914@aol.com>

Do we have two (or more) volunteers in Dallas who are tactful and fast on  
the uptake and totally free the week of July 1 through July 7 and can find the  
Hilton Anatole without getting lost? Reply directly to  
me--Gutenberg9443@aol.com.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060313/2e104154/attachment.html
From sly at victoria.tc.ca  Mon Mar 13 09:05:21 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Mar 13 09:05:25 2006
Subject: [gutvol-d] PG#17945 Mark Twain: Tri Noveloj
Message-ID: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca>


One recently posted PG text (#17945 Mark Twain: Tri Noveloj) might
cause some confusion to users regarding its copyright status.

This is a contemperary Esperanto translation of three Mark Twain
stories. I believe the translator, Edwin Grobe, has recently
explicitly released these, and his other translations of American
literature, into the public domain.

However, this text contains the prominent statement "Copyright 1999",
copied from the original printed book. Am I right in thinking this
could lead to confusion?


Andrew
From greg at durendal.org  Mon Mar 13 10:58:31 2006
From: greg at durendal.org (Greg Weeks)
Date: Mon Mar 13 11:30:05 2006
Subject: [gutvol-d] PG#17945 Mark Twain: Tri Noveloj
In-Reply-To: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca>
Message-ID: <Pine.LNX.4.63.0603131357260.7770@durendal.durendal.org>

On Mon, 13 Mar 2006, Andrew Sly wrote:

> However, this text contains the prominent statement "Copyright 1999",
> copied from the original printed book. Am I right in thinking this
> could lead to confusion?

This seems to be pretty common in the rule 6 stuff I've been clearing. I 
think it's confusing to have an incorrect copyright statement in the PG 
version even when the original book had the statement.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From Bowerbird at aol.com  Mon Mar 13 12:14:28 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar 13 12:14:39 2006
Subject: [gutvol-d] periodic update
Message-ID: <21d.9c6beac.31472ca4@aol.com>

for what it's worth...

neither "the secret garden"
nor "swiss family robinson"
have been updated with
corrections to the errors
that i listed in posts here...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060313/a8d73df5/attachment.html
From hart at pglaf.org  Mon Mar 13 12:21:16 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Mar 13 12:21:18 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <E40A0F1F-360B-4202-9908-E50F369F2279@uni-trier.de>
References: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de>
	<u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com>
	<E40A0F1F-360B-4202-9908-E50F369F2279@uni-trier.de>
Message-ID: <Pine.LNX.4.60.0603131220310.3961@pglaf.org>


On Mon, 13 Mar 2006, Keith J. Schultz wrote:

> Hi There,
>
> 	This debate is becomming very tedious.

Machine Translation is such an important issue that discussion
should not be limited, especially here.


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg


>
> Am 13.03.2006 um 11:23 schrieb Dave Fawthrop:
>
>> On Mon, 13 Mar 2006 10:20:14 +0100,  "Keith J. Schultz"
>> <schultzk@uni-trier.de> wrote:
>> 
>> |Maybe one last word here.
>> |
>> |
>> |	The EU use MT technologies to translate the bulk. Yet, the produced
>> |	texts are still manually processed by humans to get it right.
>> |	If the google method was so good the EU would not need translators
>> |	since thier written texts as basically similar in all langauges.
>> |They are
>> |	basically formal debates and legistative in form.
>> |
>> |		Keith.
>> 
>> The EU MT technologies are specifically adjusted to work with the
>> specialised language/subjects used by the EU for laws and political
>> debates.
>> 
> 	exactly.
>
>> Googles proposals are for general text, and therefor *much* *much* more
>> demanding.
> 	More demanding, definately. If it does not even work for a
> 	specialized field, how do you expect it to work in a
> 	general text!?? I have been here, there and back again.
> 
>> 
>> IMO The Google proposals will never get better than a first pass, before a
>> human does the job properly.
> 	Just what I saying.
>> 
>> I use Systran, the market leader, on occasion, and its translations are at
>> best understandable.
> 	Which product. Already, mentioned that the better products are not
> 	availible to the general public.
>
> 	Keith.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From hart at pglaf.org  Mon Mar 13 12:24:16 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Mar 13 12:24:17 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de>
References: <244.8584560.31447b41@aol.com>
	<C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de>
Message-ID: <Pine.LNX.4.60.0603131222570.3961@pglaf.org>


On Mon, 13 Mar 2006, Keith J. Schultz wrote:

> Hi There,
>
> Am 11.03.2006 um 20:13 schrieb Bowerbird@aol.com:
>
>> michael said:
>> >   This is all just nitpicking and progress is being made,
>> >   that's all that counts, not now many think ENUF progress.
>> 
>> um, it's certainly not "nitpicking".
> 	I do not feel he is nitpicking
>
>> 
>> perhaps the fact that "progress is being made"
>> might be "all that counts" to _you_, michael, but
>> maybe something else counts to someone else...
> 	I agree that we must talk about methods. I had jumped in
> 	because the method had already been tested. I have
> 	worked with it myself and also some modifications there of.
> [snip, snip]
>> 
>> >   "Those doing the impossible
>> >   should not be interrupted
>> >   by those who say impossible."
>> >   Ancient Chinese Proverb
>> 
>> but the ones who say it is impossible
>> will keep on trying to interrupt them...
> 	I do not say, impossible. I say, highly improbable.
> 		I agree though with Micheal, that we ought to take
> 	this somewhere else!! As it is growing very OT to PG.
>
> 		Keith.


Sorry, you must be agreeing with someone else, not me.

_I_ think MT will become one of the most MAJOR topics of PG,
and that we should do all we can to stay on top of MT items.

Michael
From marcello at perathoner.de  Mon Mar 13 13:06:10 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Mar 13 13:06:14 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <Pine.LNX.4.60.0603131222570.3961@pglaf.org>
References: <244.8584560.31447b41@aol.com>	<C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de>
	<Pine.LNX.4.60.0603131222570.3961@pglaf.org>
Message-ID: <4415DEC2.7040201@perathoner.de>

Michael Hart wrote:

> _I_ think MT will become one of the most MAJOR topics of PG,
> and that we should do all we can to stay on top of MT items.

I think robots will become the major producers of ebooks for PG and thus 
we should stay ahead of robot technology.

Being a visionary is easy if you have generic enough visions and you 
don't commit to a timeline.



Give the world visions in 2006!!!


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Mon Mar 13 13:06:05 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Mar 13 13:06:15 2006
Subject: [gutvol-d] periodic update
In-Reply-To: <21d.9c6beac.31472ca4@aol.com>
References: <21d.9c6beac.31472ca4@aol.com>
Message-ID: <4415DEBD.6060809@perathoner.de>

Bowerbird@aol.com wrote:

> neither "the secret garden"
> nor "swiss family robinson"
> have been updated with
> corrections to the errors
> that i listed in posts here...

I mailed a juicy list of errata to "Pride and Prejudice" to 
errata@pglaf.org and they got applied in less than 48 hours.

Ergo: the errata team works fine.

If your corrections still don't work, you have to look elsewhere to find 
the culprit.



-- 
Marcello Perathoner
webmaster@gutenberg.org

From walter.van.holst at xs4all.nl  Mon Mar 13 14:43:16 2006
From: walter.van.holst at xs4all.nl (Walter H. van Holst)
Date: Mon Mar 13 14:56:09 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com>
References: <6D5B4F9A-E710-4C0F-836E-72D887F9E050@uni-trier.de>
	<u9ha125o3mqt0s372hft0qn46gdi4m5br9@4ax.com>
Message-ID: <1142289796.3636.11.camel@God>

On Mon, 2006-03-13 at 10:23 +0000, Dave Fawthrop wrote:

> The EU MT technologies are specifically adjusted to work with the
> specialised language/subjects used by the EU for laws and political
> debates.    

So if Google manages to find a way to classify the domain of a text, it
could use domain-specific MT to achieve the same results. I wouldn't be
surprised if, using some fairly basic statistic methods, classifying the
domain of a text would be trivial.

Regards,

 Walter

From Bowerbird at aol.com  Mon Mar 13 16:03:07 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar 13 16:03:13 2006
Subject: [gutvol-d] periodic update
Message-ID: <76.662d55b9.3147623b@aol.com>

carlo said:
>    This is not the proper place to notify errors

part of the experiment is to see
how long it takes someone to
practice the advice they give me,
and send my error-reports to
"the proper place"...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060313/0335bebe/attachment.html
From joshua at hutchinson.net  Mon Mar 13 20:12:49 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Mar 13 20:11:13 2006
Subject: [gutvol-d] periodic update
Message-ID: <20060314041249.2D3219EE8E@ws6-2.us4.outblaze.com>

When you've been told repeatedly where the proper place is and you refuse to send them there ... why should anyone else do your work for you?


> ----- Original Message -----
> From: Bowerbird@aol.com
> 
> carlo said:
> >    This is not the proper place to notify errors
> 
> part of the experiment is to see
> how long it takes someone to
> practice the advice they give me,
> and send my error-reports to
> "the proper place"...

From Bowerbird at aol.com  Mon Mar 13 20:25:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar 13 20:25:47 2006
Subject: [gutvol-d] periodic update
Message-ID: <21e.9cf2f13.31479fc3@aol.com>

joshua said:
>    why should anyone else do your work for you?

it's not my job to send error-reports to a specific place.
or indeed, even to prepare the things in the first place...

and neither is it your job to forward error-reports on.
or anyone else's job either, for that matter.   granted...

so i don't think anyone can be assessed "blame" here.

i'm just curious about how long an error-report
will be left laying around before _someone_ acts
on it, even though it's _not_ their "job" to do so...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060313/6f7dbb44/attachment.html
From ben.crowder at gmail.com  Mon Mar 13 20:53:44 2006
From: ben.crowder at gmail.com (Ben Crowder)
Date: Mon Mar 13 20:59:45 2006
Subject: [gutvol-d] periodic update
In-Reply-To: <21e.9cf2f13.31479fc3@aol.com>
References: <21e.9cf2f13.31479fc3@aol.com>
Message-ID: <a6270b5d6e44da24e8f28bee646850c7@gmail.com>

bowerbird said:
>  i'm just curious about how long an error-report
>  will be left laying around before _someone_ acts
>  on it, even though it's _not_ their "job" to do so...

To what purpose?  Waste time?  We're not here to babysit.  There's a 
place for error reports to be sent, and it's completely beside the 
point to leave them "laying around" waiting for someone else to pick 
them up.  Do you leave your dirty laundry on the ground waiting for 
someone else to pick it up and put it away?  Come on, let's be 
reasonable here.  Better to use that time to further the cause and get 
more eBooks made (or send the error reports to the proper place so we 
can fix them and move on).

I suspect these are wasted words, though, judging from your e-mails in 
the archive. ~sigh~

Ben

--
Ben Crowder <ben.crowder@gmail.com>
MSN: ben.crowder@gmail.com
Website: http://www.blankslate.net/
Blog: http://www.topofthemountains.net/

From gbnewby at pglaf.org  Mon Mar 13 22:14:41 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Mar 13 22:14:43 2006
Subject: [gutvol-d] periodic update
In-Reply-To: <21d.9c6beac.31472ca4@aol.com>
References: <21d.9c6beac.31472ca4@aol.com>
Message-ID: <20060314061441.GD19944@pglaf.org>

On Mon, Mar 13, 2006 at 03:14:28PM -0500, Bowerbird@aol.com wrote:
> for what it's worth...
> 
> neither "the secret garden"
> nor "swiss family robinson"
> have been updated with
> corrections to the errors
> that i listed in posts here...
> 
> -bowerbird

> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d


None of the errata team subscribe to gutvol-d, so
don't see your posts.  

I don't see your posts either, but saw your thread had been
responded to.

To report errors, see our procedure here:
	http://www.gutenberg.org/faq/R-26

Short version: email to errata@pglaf.org

I don't know what error reports you're talking about, or
from when, and am too lazy to go hunting.  Please report 'em,
and they'll be fixed.
  -- Greg
From gbnewby at pglaf.org  Mon Mar 13 22:19:29 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Mar 13 22:19:31 2006
Subject: [gutvol-d] PG#17945 Mark Twain: Tri Noveloj
In-Reply-To: <Pine.LNX.4.63.0603131357260.7770@durendal.durendal.org>
References: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca>
	<Pine.LNX.4.63.0603131357260.7770@durendal.durendal.org>
Message-ID: <20060314061929.GE19944@pglaf.org>

On Mon, Mar 13, 2006 at 01:58:31PM -0500, Greg Weeks wrote:
> On Mon, 13 Mar 2006, Andrew Sly wrote:
> 
> >However, this text contains the prominent statement "Copyright 1999",
> >copied from the original printed book. Am I right in thinking this
> >could lead to confusion?
> 
> This seems to be pretty common in the rule 6 stuff I've been clearing. I 
> think it's confusing to have an incorrect copyright statement in the PG 
> version even when the original book had the statement.

I agree with you, Greg.  We had some discussion about this
among the whitewashers team, and I know it's come up in some
DP forum discussions.

The PG policy is that it's the producer's choice whether
to leave such info in.  (We used to have a policy against 
including a transcription of the full title/verso page due
to concerns about trademarked publishers' names, but we've
since received legal advice this is not a major concern.)

Personally, I'd most likely opt to add a note somewhere
where an outdated copyright statement appears, reaffirming
the public domain status of the PG eBook.
  -- Greg
 
From sly at victoria.tc.ca  Mon Mar 13 22:42:32 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Mar 13 22:42:35 2006
Subject: [gutvol-d] PG#17945 Mark Twain: Tri Noveloj
In-Reply-To: <20060314061929.GE19944@pglaf.org>
References: <Pine.GSO.4.58.0603130903440.25232@vtn1.victoria.tc.ca>
	<Pine.LNX.4.63.0603131357260.7770@durendal.durendal.org>
	<20060314061929.GE19944@pglaf.org>
Message-ID: <Pine.GSO.4.58.0603132236450.12957@vtn1.victoria.tc.ca>


On Mon, 13 Mar 2006, Greg Newby wrote:

> The PG policy is that it's the producer's choice whether
> to leave such info in.  (We used to have a policy against
> including a transcription of the full title/verso page due
> to concerns about trademarked publishers' names, but we've
> since received legal advice this is not a major concern.)
>
> Personally, I'd most likely opt to add a note somewhere
> where an outdated copyright statement appears, reaffirming
> the public domain status of the PG eBook.
>   -- Greg
>

That seems like decent reasoning. This is the kind of information
that some people will complain if you put it in, and some will
complain if you leave it out.

I don't have a problem with it being left in, but i think it
is a good idea to have it identified in some way as information
from the source text, which is not applicable to the PG
transcription.

Andrew
From jeroen.mailinglist at bohol.ph  Tue Mar 14 13:28:02 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue Mar 14 13:47:58 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <20060314061441.GD19944@pglaf.org>
References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org>
Message-ID: <44173562.4060705@bohol.ph>

Hi All,

I am studying the options for preparing ebooks for text-to-speech. Does 
anybody have experience with that and willing to share experience.

I am looking at things like SSML, aural-CSS, and text-to-speech 
software. Any software that can support this? My intention is to add the 
relevant tags to my TEI master, and generate SSML from that, feed that 
to TTS software to obtain audio files (Ideally, I would only post the 
SSML, and let people regenerate the speech when needed). Any tools that 
can be advised?

Things to consider are additional tags to disambiguate words with 
identical spelling (read and read; record and record, for example), and 
to help pronouncing dates, currency amounts, measures, abbreviations, etc.

Issues I found is lack of support for things like aural CSS, expensive 
software, etc.

Jeroen.

From sly at victoria.tc.ca  Tue Mar 14 13:55:58 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Mar 14 13:56:03 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <44173562.4060705@bohol.ph>
References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org>
	<44173562.4060705@bohol.ph>
Message-ID: <Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca>


In case you have not seen it yet, I'd suggest taking a
look at DAISY: http://www.daisy.org/

Andrew

On Tue, 14 Mar 2006, Jeroen Hellingman (Mailing List Account) wrote:

> Hi All,
>
> I am studying the options for preparing ebooks for text-to-speech. Does
> anybody have experience with that and willing to share experience.
>
> I am looking at things like SSML, aural-CSS, and text-to-speech
> software. Any software that can support this? My intention is to add the
> relevant tags to my TEI master, and generate SSML from that, feed that
> to TTS software to obtain audio files (Ideally, I would only post the
> SSML, and let people regenerate the speech when needed). Any tools that
> can be advised?
>
> Things to consider are additional tags to disambiguate words with
> identical spelling (read and read; record and record, for example), and
> to help pronouncing dates, currency amounts, measures, abbreviations, etc.
>
> Issues I found is lack of support for things like aural CSS, expensive
> software, etc.
>
> Jeroen.
>
From hart at pglaf.org  Tue Mar 14 14:19:22 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Mar 14 14:19:23 2006
Subject: [gutvol-d] periodic update
In-Reply-To: <21e.9cf2f13.31479fc3@aol.com>
References: <21e.9cf2f13.31479fc3@aol.com>
Message-ID: <Pine.LNX.4.60.0603141416151.4132@pglaf.org>


The buck stops here.

Send error messages to me, and then keep after me
to make sure they are followed up on.

You can also report errors directly to:

bugs@pglaf.org

or send updated files to

errata@pglaf.org

but please continue to also send directly to me.

Please resend to me if you don't hear from them within a few days.

This is from an FAQ that I include in replies to error messages.


Thanks!


Michael S. Hart
<hart@pobox.com>
Project Gutenberg
"*Ask Dr. Internet*"
Executive Coordinator
"*Internet User ~#100*"

On Mon, 13 Mar 2006 Bowerbird@aol.com wrote:

> joshua said:
>>    why should anyone else do your work for you?
>
> it's not my job to send error-reports to a specific place.
> or indeed, even to prepare the things in the first place...
>
> and neither is it your job to forward error-reports on.
> or anyone else's job either, for that matter.   granted...
>
> so i don't think anyone can be assessed "blame" here.
>
> i'm just curious about how long an error-report
> will be left laying around before _someone_ acts
> on it, even though it's _not_ their "job" to do so...
>
> -bowerbird
>
From hart at pglaf.org  Tue Mar 14 14:36:43 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Mar 14 14:36:44 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <4415DEC2.7040201@perathoner.de>
References: <244.8584560.31447b41@aol.com>
	<C04E20CD-EABF-45EB-9330-6EE51C64AE52@uni-trier.de>
	<Pine.LNX.4.60.0603131222570.3961@pglaf.org>
	<4415DEC2.7040201@perathoner.de>
Message-ID: <Pine.LNX.4.60.0603141432270.4132@pglaf.org>

On Mon, 13 Mar 2006, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> _I_ think MT will become one of the most MAJOR topics of PG,
>> and that we should do all we can to stay on top of MT items.
>
> I think robots will become the major producers of ebooks for PG and thus we 
> should stay ahead of robot technology.

So far this is a bit to generic a comment to be taken seriously.

Then again, I am not sure you MEANT it to be taken seriously.

However, send it every decade, and you'll probably be taken more
seriously each time.


> Being a visionary is easy if you have generic enough visions and you don't 
> commit to a timeline.

Only for the visionaries who do not insist on accomplishing their goals.

I wonder how long it will be before we see other people out there committed
to a lifetime career dedicated to the advancement of eBooks?



>
>
>
> Give the world visions in 2006!!!
>
>
> -- 
> Marcello Perathoner
> webmaster@gutenberg.org
>
From Bowerbird at aol.com  Tue Mar 14 14:46:32 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Mar 14 14:46:39 2006
Subject: [gutvol-d] re: google and the translation thing
Message-ID: <77.5751cfea.3148a1c8@aol.com>

Skipped content of type multipart/alternative-------------- next part --------------
An embedded message was scrubbed...
From: Michael Hart <hart@pglaf.org>
Subject: Re: [gutvol-d] google and the translation thing
Date: Tue, 14 Mar 2006 14:36:43 -0800 (PST)
Size: 3421
Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060314/bb5db2c5/attachment.mht
From hart at pglaf.org  Tue Mar 14 14:58:56 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Mar 14 14:58:57 2006
Subject: [gutvol-d] re: google and the translation thing
In-Reply-To: <77.5751cfea.3148a1c8@aol.com>
References: <77.5751cfea.3148a1c8@aol.com>
Message-ID: <Pine.LNX.4.60.0603141455030.4132@pglaf.org>


The truth is that I do NOT think it will be all that long.

However, I have been rather disapppointed to see many major
players, at least they SAID they were major players, in the
eBook world simply vanish.

Whap happened to:

Lou Burnard, Oxford Text Archive
Bob Hollander, Princeton/Rutgers?
Michael Seaman?, U Virginia

Not to mention all those people
posing for the cameras when the
big Google announcement hit?

However, I think it is more likely,
not less, that someone else should
come along to relaceme me in time.

;)


On Tue, 14 Mar 2006 Bowerbird@aol.com wrote:

> michael said:
>>    I wonder how long it will be before we see
>>    other people out there committed to a lifetime
>>    career dedicated to the advancement of eBooks?
>
> i think you're unique, michael, and always will be.
>
> that's not to negate the _serious_ committment of
> time and energy and love that _many_ people are
> donating to the cause, such as juliet at d.p. among
> others there, or nicholas hodson, or david harrada,
> or the whitewashers, and a whole slew of us others.
>
> but none have been as instrumental as you...
>
> -bowerbird
>
From jeroen.mailinglist at bohol.ph  Tue Mar 14 15:07:37 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Tue Mar 14 15:02:46 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca>
References: <21d.9c6beac.31472ca4@aol.com>
	<20060314061441.GD19944@pglaf.org>	<44173562.4060705@bohol.ph>
	<Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca>
Message-ID: <44174CB9.7040807@bohol.ph>


That is certainly interesting, but seems not to be oriented towards 
computerized text-to-speech. What I am primarily looking at is methods 
to automatically read books, and markup that assists in doing so. Daisy 
appears to work from audio files, probably human spoken. Note that the 
tagging I have in mind may also help human readers to produce a human 
read spoken book.

Jeroen.

Andrew Sly wrote:

>In case you have not seen it yet, I'd suggest taking a
>look at DAISY: http://www.daisy.org/
>
>Andrew
>
>  
>
From grythumn at gmail.com  Tue Mar 14 16:35:16 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Tue Mar 14 16:41:47 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <44173562.4060705@bohol.ph>
References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org>
	<44173562.4060705@bohol.ph>
Message-ID: <15cfa2a50603141635k610ab610s950b4c67b3467a1@mail.gmail.com>

I use the AT&T Natural Voice engine for most of my general fiction*
conversion.. fairly resource intensive, but one of the better sounding
voices. I keep a list of standard substitutions as I notice them. The engine
does poorly on abbreviations and foreign loan words, and of course on
heteronyms. Lead, axes, alternate, etc. You can specify alternate
pronunciations in a phonetic language. Concatenated engines like Natural
Voices, Cepstral, Neospeech and RealSpeak are limited in how much you can
alter speed and timber before they get unusable.. NV tends to clip syllables
at anything above roughly +1 or +2. Most of these engines are available via
Nextup and other online retailers.

Freeware engines such Festival tend to have somewhat lower out-of-the-box
quality, but are more flexible (at least if you can tolerate LISP). In
particular, in a synthesized TTS engine, you can turn up the speech speed
much further before it becomes unintelligible, but it sometimes requires
practice to understand.

Synthesized speech compresses quite well with voice codecs.. if I'm not
using an external MP3 player, I'll compress it with Speex at quality 4 or 5.

R C
*(I generate audiobooks from Webscriptions and Gutenberg for commute and
other relative downtimes.)

On 3/14/06, Jeroen Hellingman (Mailing List Account) <
jeroen.mailinglist@bohol.ph> wrote:
>
> Hi All,
>
> I am studying the options for preparing ebooks for text-to-speech. Does
> anybody have experience with that and willing to share experience.
>
> I am looking at things like SSML, aural-CSS, and text-to-speech
> software. Any software that can support this? My intention is to add the
> relevant tags to my TEI master, and generate SSML from that, feed that
> to TTS software to obtain audio files (Ideally, I would only post the
> SSML, and let people regenerate the speech when needed). Any tools that
> can be advised?
>
> Things to consider are additional tags to disambiguate words with
> identical spelling (read and read; record and record, for example), and
> to help pronouncing dates, currency amounts, measures, abbreviations, etc.
>
> Issues I found is lack of support for things like aural CSS, expensive
> software, etc.
>
> Jeroen.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060314/62e3c20f/attachment-0001.html
From Bowerbird at aol.com  Tue Mar 14 17:07:32 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Mar 14 17:07:39 2006
Subject: [gutvol-d] re: google and the translation thing
Message-ID: <284.7380636.3148c2d4@aol.com>

michael said:
>    However, I think it is more likely, not less, that 
>    someone else should come along to replace me in time.

at this point, with cyberspace well-established,
and the sheer logic of electronic books (and
all manner of other types of digital content)
recognized by all, any people who "come along"
are stepping into a _completely_ different river.

so, um, no, they won't be "replacing" you...

good thing, too, because i think one of you
is quite enough, thank you very much...         :+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060314/e9f0408e/attachment.html
From hyphen at hyphenologist.co.uk  Tue Mar 14 23:44:24 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Mar 14 23:44:37 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <44173562.4060705@bohol.ph>
References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org>
	<44173562.4060705@bohol.ph>
Message-ID: <g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com>

On Tue, 14 Mar 2006 22:28:02 +0100,  "Jeroen Hellingman (Mailing List
Account)" <jeroen.mailinglist@bohol.ph> wrote:

|Hi All,
|
|I am studying the options for preparing ebooks for text-to-speech. Does 
|anybody have experience with that and willing to share experience.
|
|I am looking at things like SSML, aural-CSS, and text-to-speech 
|software. Any software that can support this? My intention is to add the 
|relevant tags to my TEI master, and generate SSML from that, feed that 
|to TTS software to obtain audio files (Ideally, I would only post the 
|SSML, and let people regenerate the speech when needed). Any tools that 
|can be advised?
|
|Things to consider are additional tags to disambiguate words with 
|identical spelling (read and read; record and record, for example), and 
|to help pronouncing dates, currency amounts, measures, abbreviations, etc.

Not to mention the different forms of ?English?  American, Queens English,
Indian English, Strine, to mention but a few.  An American voice would
sound terrible after a  whole book.   
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From tb at baechler.net  Tue Mar 14 23:46:54 2006
From: tb at baechler.net (Tony Baechler)
Date: Tue Mar 14 23:46:22 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca>
References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org>
	<44173562.4060705@bohol.ph>
	<Pine.GSO.4.58.0603141354540.22756@vtn1.victoria.tc.ca>
Message-ID: <7.0.1.0.2.20060314234320.02b24960@baechler.net>

Hi,

That format is primarily for the blind.  The format doesn't really do 
what is wanted.  You can either have xml files which can be read by a 
screen reader and special software or mp3 recordings or both, but the 
speech isn't generated any differently than for plain text files.  In 
fact, I download about 100 DAISY books per month and I always convert 
to plain text.  The DAISY software I've used is better about handling 
new pages and navigation than plain text but the speech output is 
still the same.  There is no way to do custom pronounciations or 
anything that I'm aware of.  Also, that format is specifically 
designed for the blind and I doubt there is much mainstream support 
for it.  Tools to convert tend to be expensive from what I've seen.

At 01:55 PM 3/14/2006, you wrote:

>In case you have not seen it yet, I'd suggest taking a
>look at DAISY: http://www.daisy.org/
>
>Andrew
>
>On Tue, 14 Mar 2006, Jeroen Hellingman (Mailing List Account) wrote:
>
> > Hi All,
> >
> > I am studying the options for preparing ebooks for text-to-speech. Does
> > anybody have experience with that and willing to share experience.
> >
> > I am looking at things like SSML, aural-CSS, and text-to-speech
> > software. Any software that can support this? My intention is to add the
> > relevant tags to my TEI master, and generate SSML from that, feed that
> > to TTS software to obtain audio files (Ideally, I would only post the
> > SSML, and let people regenerate the speech when needed). Any tools that
> > can be advised?
> >
> > Things to consider are additional tags to disambiguate words with
> > identical spelling (read and read; record and record, for example), and
> > to help pronouncing dates, currency amounts, measures, abbreviations, etc.
> >
> > Issues I found is lack of support for things like aural CSS, expensive
> > software, etc.

From Bowerbird at aol.com  Wed Mar 15 16:52:30 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Mar 15 16:52:40 2006
Subject: [gutvol-d] well, this is interesting
Message-ID: <2ea.34d74f4.314a10ce@aol.com>

evidently, distributed proofreaders is "branching out" from project 
gutenberg:
>   
http://www.solutiongrove.com/kmw/ctn/files/view/LINCTbiz_planKK_1.23.06.html

it says:
>   Distributed Proofreaders, a well-regarded group of volunteers, will 
provide 
>    public domain books. LibraryCity will contribute resources for DP to 
expand.

sounds cozy...

librarycity, one of the main organizations involved in this plan, has as its 
director
david rothman, and i'm sure that jon noring is involved with it somehow as 
well...

they're looking for "sponsors", suggesting "an annual fee of $1000",
or even all the way up to $350,000, which buys you a "thank you" from
within the browser of the one million of their clients you've sponsored...

and this:
>   LibraryCity?s revenue will come from several sources. 
>    It will partner with an Internet bookstore to 
>    obtain large numbers of e-books from publishers 
>    and to offer electronic books to libraries. 
>    The model will be a mix of purchase, short-term rentals 
>    and subscription fees. When libraries do not carry books, 
>    patrons will have an opportunity to rent or purchase them 
>    through the existing store and through the retail arm of 
>    LibraryCity called BookTry.com.

it continues:
>   LibraryCity and BookTry.com will help popularize the 
>   OpenReader format and interactive software that OSoft will offer. 
>    In return and also out of public-spiritedness, OSoft will agree 
>    to donate a certain percentage of its earnings and/or revenue 
>    to the Epie Institute, the 501(c)(3) pass-through, 
>    for use with LibraryCity and other partners within LINCT.

and then:
>    The above efficiencies and close relationship with OSoft, 
>    a provider of interactive software that allows comments and 
>    even blogs to be embedded within specific locations in books, 
>    will enable LibraryCity to be more competitive against 
>    such library-related companies as OverDrive.com.

like the subject says, "interesting..."

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060315/a7820071/attachment.html
From hyphen at hyphenologist.co.uk  Thu Mar 16 00:04:24 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu Mar 16 00:04:37 2006
Subject: [gutvol-d] well, this is interesting
In-Reply-To: <2ea.34d74f4.314a10ce@aol.com>
References: <2ea.34d74f4.314a10ce@aol.com>
Message-ID: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com>

On Wed, 15 Mar 2006 19:52:30 EST,  Bowerbird@aol.com wrote:

|evidently, distributed proofreaders is "branching out" from project 
|gutenberg:
|>   
|http://www.solutiongrove.com/kmw/ctn/files/view/LINCTbiz_planKK_1.23.06.html
|
|it says:
|>   Distributed Proofreaders, a well-regarded group of volunteers, will 
|provide 
|>    public domain books. LibraryCity will contribute resources for DP to 
|expand.
|
|sounds cozy...
|
|librarycity, one of the main organizations involved in this plan, has as its 
|director
|david rothman, and i'm sure that jon noring is involved with it somehow as 
|well...
|
|they're looking for "sponsors", suggesting "an annual fee of $1000",
|or even all the way up to $350,000, which buys you a "thank you" from
|within the browser of the one million of their clients you've sponsored...
|
|and this:
|>   LibraryCity?s revenue will come from several sources. 
|>    It will partner with an Internet bookstore to 
|>    obtain large numbers of e-books from publishers 
|>    and to offer electronic books to libraries. 
|>    The model will be a mix of purchase, short-term rentals 
|>    and subscription fees. When libraries do not carry books, 
|>    patrons will have an opportunity to rent or purchase them 
|>    through the existing store and through the retail arm of 
|>    LibraryCity called BookTry.com.
|
|it continues:
|>   LibraryCity and BookTry.com will help popularize the 
|>   OpenReader format and interactive software that OSoft will offer. 
|>    In return and also out of public-spiritedness, OSoft will agree 
|>    to donate a certain percentage of its earnings and/or revenue 
|>    to the Epie Institute, the 501(c)(3) pass-through, 
|>    for use with LibraryCity and other partners within LINCT.
|
|and then:
|>    The above efficiencies and close relationship with OSoft, 
|>    a provider of interactive software that allows comments and 
|>    even blogs to be embedded within specific locations in books, 
|>    will enable LibraryCity to be more competitive against 
|>    such library-related companies as OverDrive.com.
|
|like the subject says, "interesting..."

*If* this happens I wonder how many volunteers DP will lose.
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From davedoty at hotmail.com  Thu Mar 16 00:18:55 2006
From: davedoty at hotmail.com (Dave Doty)
Date: Thu Mar 16 00:36:00 2006
Subject: [gutvol-d] well, this is interesting
In-Reply-To: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com>
Message-ID: <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl>

From: Dave Fawthrop <hyphen@hyphenologist.co.uk>

>*If* this happens I wonder how many volunteers DP will lose.

Why would they lose any?  They give DP resources to expand, and use the 
books.  Since they are already free to use the books, the only thing that 
would change is more financial resources for DP.  It didn't say anything 
about exclusive use and even if they tried, well they admitted right there 
on the webpage that the books are public domain, so they wouldn't be able to 
keep PG or anyone else from using them.

Dave Doty


From hyphen at hyphenologist.co.uk  Thu Mar 16 00:40:23 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu Mar 16 00:40:34 2006
Subject: [gutvol-d] well, this is interesting
In-Reply-To: <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl>
References: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com>
	<BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl>
Message-ID: <us8i12le0oenfvkdmgga78qr9j4rm32qhv@4ax.com>

On Thu, 16 Mar 2006 08:18:55 +0000,  "Dave Doty" <davedoty@hotmail.com>
wrote:

|From: Dave Fawthrop <hyphen@hyphenologist.co.uk>
|
|>*If* this happens I wonder how many volunteers DP will lose.
|
|Why would they lose any?  

Because I do books for PG, Pro Bono Publico.
Any dilution of this principle by association with commercial organisations
would concern me. 
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From schultzk at uni-trier.de  Thu Mar 16 04:38:43 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Thu Mar 16 04:38:54 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com>
References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org>
	<44173562.4060705@bohol.ph>
	<g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com>
Message-ID: <95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de>

Hi There,

	If you have a Mac it will read it for you.
	You can also customize the dictionary.
	There is also a programming interface if
	you really want high quality output, you can even
	create your own voices.

	I personally have not played with it. It has been
	around for a long time.

		Keith.

Am 15.03.2006 um 08:44 schrieb Dave Fawthrop:

> On Tue, 14 Mar 2006 22:28:02 +0100,  "Jeroen Hellingman (Mailing List
> Account)" <jeroen.mailinglist@bohol.ph> wrote:
>
> |Hi All,
> |
> |I am studying the options for preparing ebooks for text-to-speech.  
> Does
> |anybody have experience with that and willing to share experience.
> |
> |I am looking at things like SSML, aural-CSS, and text-to-speech
> |software. Any software that can support this? My intention is to  
> add the
> |relevant tags to my TEI master, and generate SSML from that, feed  
> that
> |to TTS software to obtain audio files (Ideally, I would only post the
> |SSML, and let people regenerate the speech when needed). Any tools  
> that
> |can be advised?
> |
> |Things to consider are additional tags to disambiguate words with
> |identical spelling (read and read; record and record, for  
> example), and
> |to help pronouncing dates, currency amounts, measures,  
> abbreviations, etc.
>
> Not to mention the different forms of ?English?  American, Queens  
> English,
> Indian English, Strine, to mention but a few.  An American voice would
> sound terrible after a  whole book.
> -- 
> Dave Fawthrop <dave hyphenologist co uk>
> Freedom of Speech, Expression, Religion, and Democracy are
> the keys to Civilization, together with legal acceptance of
> Fundamental Human rights.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From joshua at hutchinson.net  Thu Mar 16 05:59:56 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Mar 16 05:59:23 2006
Subject: [gutvol-d] well, this is interesting
Message-ID: <20060316135956.D0C8F2F998@ws6-3.us4.outblaze.com>

Not to be a smart-ass ... but you better stop now, Dave.  Commercial publishers snarf PG stuff all the time.

I bought a lovely two volume set of all the OZ books for my son last year.  As we were reading it, I noticed some typos and such.  On a hunch, I compared the typos to our files.  They are snarfed PG text (and didn't even proof it again) and stripped the PG notices and printing a book.

Personally, I don't have a problem with commercial interests using PG/DP stuff.  As long as they don't try to claim an additional copyright (which they sometimes do) or leave the PG trademark in place and not pay us (which I've never actually seen).

Josh

> ----- Original Message -----
> From: "Dave Fawthrop" <hyphen@hyphenologist.co.uk>
> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
> Subject: Re: [gutvol-d] well, this is interesting
> Date: Thu, 16 Mar 2006 08:40:23 +0000
> 
> 
> On Thu, 16 Mar 2006 08:18:55 +0000,  "Dave Doty" <davedoty@hotmail.com>
> wrote:
> 
> |From: Dave Fawthrop <hyphen@hyphenologist.co.uk>
> |
> |>*If* this happens I wonder how many volunteers DP will lose.
> |
> |Why would they lose any?
> 
> Because I do books for PG, Pro Bono Publico.
> Any dilution of this principle by association with commercial organisations
> would concern me.
> --
> Dave Fawthrop <dave hyphenologist co uk>
> Freedom of Speech, Expression, Religion, and Democracy are
> the keys to Civilization, together with legal acceptance of
> Fundamental Human rights.
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

>

From holden.mcgroin at dsl.pipex.com  Thu Mar 16 07:57:11 2006
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Thu Mar 16 07:57:16 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
	<C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
	<1141986255.20173.15.camel@steve-mcqueen>
	<264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de>
Message-ID: <1142524632.14007.34.camel@steve-mcqueen>

Hi!

On Fri, 2006-03-10 at 12:33 +0100, Keith J. Schultz wrote:
> Hello,
> 
> Am 10.03.2006 um 11:24 schrieb Holden McGroin:
> 
> > On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote:
> >> text. Today, dictionaries are used to guess which words are
> >> to be recognised. That is why the OCR systems today give us
> >> better results if the original has DECENT quality!!!
> >
> >> The pattern recognition systems have not gotten better and
> >> the dictionary trick takes the motivation away to
> >> develop better OCR algorithms.
> >
> > I'm going to have to call bullshit here. As a researcher working in  
> > the
> > field of document recognition, I've noticed tremendous improvements in
> > OCR quality even just in the past five years.
> Before you start to swear, read and understand! Maybe in the
> development labs, but not for the non-high end user!!!!

OCR results are improving across the board. One only has to compare
Finereader 8, a mainstream OCR product, with version 5 or so to see the
improvement in standard OCR packages over the last 5 years. Recognition
quality improves (where there is room for improvement) and so does the
range of documents which can be recognised. Each passing year brings
improvements in quality for older, noisy and lower quality documents.
Again, I stress that this is *real-world* improvement in mainstream OCR
products.

In your initial post, you stated that the "dictionary trick" takes away
the motivation to develop better OCR algorithms. Yet, it is still an
extremely active research subject. Perhaps you're not familiar with the
research community around OCR but there are many major conferences,
workshops and journals devoted entirely or mainly to the task of
digitising documents.

And of course, where do you think the improvements in mainstream OCR
applications come from? Yesterday's innovation in the research lab forms
the basis of new features in today's commercial OCR packages. Likewise,
the work that's going on now in the lab will improve tomorrow's OCR
packages.

> We have not seen any improvements in the field for the past five
> years!!! The improvements are mainly due to the use of dictionaries!!
> Not the improvement of character recognition!! Most systems in the
> field get their performance out of word recognition !!!

Well, that's a nice statement to make since the vast majority of systems
in the field are black-box commercial systems. How do you know where the
performance comes from? I'm a researcher in the field. I attend
conferences and read journals and I don't know much about the internals
of ABBYY. Unsurprisingly, it's something they keep under close wraps.

So all you really have is the fact that commercial (and research) OCR
systems are improving and your unfounded assertion that the improvements
are mainly due to dictionaries.

> I did mean to say not there is no improvement in Optical
> Character Recognition, but the improvment over the past
> 10 years is minimal at most. When I see a OCR system that
> just uses raw results, then I will bow my head in recognition
> of true achieve meant. Furthermore, when the image processing
> gets that far it will open up new possiblities in all kinds
> of sciences.

There are countless tools which can be used to improve OCR performance.
Using dictionary lookups is just one tool in the box. OCR is improving
using many different techniques. I've been observing improvements in
many different areas over the last few years (as long as I've been in
the area), including:

 - Improvements in low-level Image processing techniques
 - Improvements in feature extraction from characters
 - Improvements in character recognition based on those features

If you don't like dictionary lookups, don't use them. Raw OCR
performance is improving in the lab and in the marketplace and is
already great for a large proportion of documents. I must apologise on
behalf of the research community if you find the rate of progress to be
inadequate.

That said, if you don't like it, muck in. There are many research labs
around the world working on improving OCR and related techniques and I'm
sure they'd be glad to have someone as knowledgeable as yourself join.
There are even a few Free Software / Open Source OCR systems which would
gladly welcome any interested developers:

Ocrad:     http://www.gnu.org/software/ocrad/ocrad.html
GOCR/JOCR: http://jocr.sourceforge.net/
ClaraOCR:  http://www.geocities.com/claraocr/


Cheers,
Holden

From Bowerbird at aol.com  Thu Mar 16 11:15:24 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Mar 16 11:15:36 2006
Subject: [gutvol-d] well, this is interesting
Message-ID: <2da.47e050a.314b134c@aol.com>

dave said:
>    *If* this happens I wonder how many volunteers DP will lose.

well, i would guess an agreement has _already_ been forged,
or else the d.p. name wouldn't have been used.

but i don't see that it should cause the loss of many volunteers,
because the aim remains to get e-texts out there.

after all, anyone can take the e-texts from project gutenberg
and do _whatever_they_want_ with them, just as long as they
don't use the name of project gutenberg, right?

that's what the public domain is all about.

and yes, it is kind of sleazy to use public-domain content
as the "free samples" to bring people through the door for
your commercial content, but what would we do about that?
hell, some of these guys sell the public-domain stuff as well!
(and if they can sell what we give away, more power to 'em.)

i think i've made it perfectly clear that i'm not a fan of either
rothman or noring, and i have long observed they have been
trying to get their mitts on project gutenberg, so this doesn't
surprise me from their end, but it's curious d.p. played along.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060316/aff298c8/attachment-0001.html
From marcello at perathoner.de  Thu Mar 16 11:57:50 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Mar 16 11:57:57 2006
Subject: [gutvol-d] well, this is interesting
In-Reply-To: <2ea.34d74f4.314a10ce@aol.com>
References: <2ea.34d74f4.314a10ce@aol.com>
Message-ID: <4419C33E.6020107@perathoner.de>

Bowerbird@aol.com wrote:

>> Distributed Proofreaders, a well-regarded group of volunteers, will
>> provide public domain books. LibraryCity will contribute resources
>> for DP to expand.

LibraryCity / OSoft / openreader.org is just a bunch of self-referential 
entites trying to give some credibility to each other. Like 
google-spammers build link-farms these people are building 
organisation-farms.

Nothing much to bother about. Norings results so far are 
failure-to-failure comparable to yours.


> they're looking for "sponsors", suggesting "an annual fee of $1000", 
> or even all the way up to $350,000, which buys you a "thank you" from
>  within the browser of the one million of their clients you've
> sponsored...

This is worth a new thread ...


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Mar 16 12:40:23 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Mar 16 12:40:30 2006
Subject: [gutvol-d] Sponsors
Message-ID: <4419CD37.2000505@perathoner.de>

 From a different thread:

>> they're looking for "sponsors", suggesting "an annual fee of $1000", 
>> or even all the way up to $350,000, which buys you a "thank you" from
>>  within the browser of the one million of their clients you've
>> sponsored...

I don't know about their millions of clients but the PG website is now 
ranked top 3000 at alexa.com and serving ~250K pages to ~50K hosts a 
day. We have a Google page-rank of 8. To get that spammers would feed 
their mothers to the Ravenous Bugblatter Beast of Traal.


We could put an ad space at the top of every page. I'm thinking of 
text-only ads, no distracting images. We could cycle ads like this:

   Did you know that you can help producing ebooks investing
   just ten minutes a day? www.pgdp.net

   Sponsor PG and get your web site mentioned here.
   See: www.gutenberg.org/fundraising/sponsoring

   We thank the Curl Up and Dye hair parlor for their
   kind gift of $1000. www.curl-up-and-dye.com


Do we want to do this? And what rules should we put in place?

Is selling ads compatible with the non-for-profit status?

Anybody out there with an internet marketing background to figure out 
what we could "charge" for this ad space?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From hart at pglaf.org  Thu Mar 16 12:53:23 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Mar 16 12:53:25 2006
Subject: [gutvol-d] OCR Trends, and Not:  was Google Translation
In-Reply-To: <1142524632.14007.34.camel@steve-mcqueen>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
	<C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
	<1141986255.20173.15.camel@steve-mcqueen>
	<264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de>
	<1142524632.14007.34.camel@steve-mcqueen>
Message-ID: <Pine.LNX.4.60.0603161231580.20273@pglaf.org>


The following messages give widely opposing points of view.

The reason could be, as one stated, that the bottom of the
line scanner and OCR combinations are not yet good enough,
at least for that person's particular needs.

My own observation is that it might simply be the wrong
tool for the wrong application.

We all see more and more features in calculators that are
under $100, and even under $10, to the point where no one
is really going to say the TI-84 has no improvements over
the previous versions, even if you get it for $60, as the
price was where I saw it last.

If your applications are simple four function arithmetic,
there isn't much point in comparing any new calculators--
they will all do what you want, and the hardware may be a
more important aspect than the software. . . how long the
calculator and/or the batteries will last, etc.

To those who really need a supercomputer, no difference.


The same is most likely true of scanners and OCR combos--
some improvements may not apply to what YOU are doing and
others may be totally beyond any appplications you have a
mind to be using.

The same is true for all those different kinds of cheaper
calculators out there.

It sounds a little as if one person in this conversation,
I didn't keep track of various portions and names, was an
example of the person who says it does not matter at all,
because none of them create perfect results.

To this kind of person it doesn't matter how full a glass
is getting, until that very last drop is added, then that
glass becomes full, otherwise it is empty.

The exact same thing has been said here and there via the
error rate for eBooks.

If a certain element of perfection is missing, then ebook
value remains zero even though the paper book has errors.

By the way, I saw what appeared to be a perfect scan/OCR,
at least 10 years ago, perhaps 15, on the original Apple-
Flatbed scanner.  I forget the model and the OCR, but the
demonstration certainly made me wake up to OCR more and I
eventually talked Apple into giving me a Mac and scanner.

Thanks Apple!!!

Thanks Steve Cisler!!!

More to the point about the current topic is what a user
wants out of the hardware/software combination.

If you don't do your homework when buying these, you are
not likely to get what you want.

However, and I stress this, the people in these messages
are VERY likely, given their positions, to find salesmen
and saleswomen who would be MORE than happy to show your
people their products and answer questions.

Just contact them. . .your report of their demonstration
will multiply the effect of their work!

This would probably be of great interest to us all.

I wonder if the next time we have some kind of meeting--
should we invite some demonstrations???

Michael

PS
On the topic of calculator, I heard that even if it is
not your thing to use something like Encarta, that the
current version includes a calculator program that may
be worth more than the cost of the entire Encarta.

Anyone seen it?


On Thu, 16 Mar 2006, Holden McGroin wrote:

> Hi!
>
> On Fri, 2006-03-10 at 12:33 +0100, Keith J. Schultz wrote:
>> Hello,
>>
>> Am 10.03.2006 um 11:24 schrieb Holden McGroin:
>>
>>> On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote:
>>>> text. Today, dictionaries are used to guess which words are
>>>> to be recognised. That is why the OCR systems today give us
>>>> better results if the original has DECENT quality!!!
>>>
>>>> The pattern recognition systems have not gotten better and
>>>> the dictionary trick takes the motivation away to
>>>> develop better OCR algorithms.
>>>
>>> I'm going to have to call bullshit here. As a researcher working in
>>> the
>>> field of document recognition, I've noticed tremendous improvements in
>>> OCR quality even just in the past five years.
>> Before you start to swear, read and understand! Maybe in the
>> development labs, but not for the non-high end user!!!!
>
> OCR results are improving across the board. One only has to compare
> Finereader 8, a mainstream OCR product, with version 5 or so to see the
> improvement in standard OCR packages over the last 5 years. Recognition
> quality improves (where there is room for improvement) and so does the
> range of documents which can be recognised. Each passing year brings
> improvements in quality for older, noisy and lower quality documents.
> Again, I stress that this is *real-world* improvement in mainstream OCR
> products.
>
> In your initial post, you stated that the "dictionary trick" takes away
> the motivation to develop better OCR algorithms. Yet, it is still an
> extremely active research subject. Perhaps you're not familiar with the
> research community around OCR but there are many major conferences,
> workshops and journals devoted entirely or mainly to the task of
> digitising documents.
>
> And of course, where do you think the improvements in mainstream OCR
> applications come from? Yesterday's innovation in the research lab forms
> the basis of new features in today's commercial OCR packages. Likewise,
> the work that's going on now in the lab will improve tomorrow's OCR
> packages.
>
>> We have not seen any improvements in the field for the past five
>> years!!! The improvements are mainly due to the use of dictionaries!!
>> Not the improvement of character recognition!! Most systems in the
>> field get their performance out of word recognition !!!
>
> Well, that's a nice statement to make since the vast majority of systems
> in the field are black-box commercial systems. How do you know where the
> performance comes from? I'm a researcher in the field. I attend
> conferences and read journals and I don't know much about the internals
> of ABBYY. Unsurprisingly, it's something they keep under close wraps.
>
> So all you really have is the fact that commercial (and research) OCR
> systems are improving and your unfounded assertion that the improvements
> are mainly due to dictionaries.
>
>> I did mean to say not there is no improvement in Optical
>> Character Recognition, but the improvment over the past
>> 10 years is minimal at most. When I see a OCR system that
>> just uses raw results, then I will bow my head in recognition
>> of true achieve meant. Furthermore, when the image processing
>> gets that far it will open up new possiblities in all kinds
>> of sciences.
>
> There are countless tools which can be used to improve OCR performance.
> Using dictionary lookups is just one tool in the box. OCR is improving
> using many different techniques. I've been observing improvements in
> many different areas over the last few years (as long as I've been in
> the area), including:
>
> - Improvements in low-level Image processing techniques
> - Improvements in feature extraction from characters
> - Improvements in character recognition based on those features
>
> If you don't like dictionary lookups, don't use them. Raw OCR
> performance is improving in the lab and in the marketplace and is
> already great for a large proportion of documents. I must apologise on
> behalf of the research community if you find the rate of progress to be
> inadequate.
>
> That said, if you don't like it, muck in. There are many research labs
> around the world working on improving OCR and related techniques and I'm
> sure they'd be glad to have someone as knowledgeable as yourself join.
> There are even a few Free Software / Open Source OCR systems which would
> gladly welcome any interested developers:
>
> Ocrad:     http://www.gnu.org/software/ocrad/ocrad.html
> GOCR/JOCR: http://jocr.sourceforge.net/
> ClaraOCR:  http://www.geocities.com/claraocr/
>
>
> Cheers,
> Holden
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From brandon.galbraith at gmail.com  Thu Mar 16 13:06:21 2006
From: brandon.galbraith at gmail.com (Brandon Galbraith)
Date: Thu Mar 16 13:06:23 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <4419CD37.2000505@perathoner.de>
References: <4419CD37.2000505@perathoner.de>
Message-ID: <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com>

In most cases, you charged based on "impressions", as in how many people see
the ad. There are plenty of open source banner ad systems out there that
could be modified to fit the need (including a price scale for the amount of
ads purchased). Then again, Google AdSense could just be used =) *ducks*

-brandon

On 3/16/06, Marcello Perathoner <marcello@perathoner.de> wrote:
>
> From a different thread:
>
> >> they're looking for "sponsors", suggesting "an annual fee of $1000",
> >> or even all the way up to $350,000, which buys you a "thank you" from
> >>  within the browser of the one million of their clients you've
> >> sponsored...
>
> I don't know about their millions of clients but the PG website is now
> ranked top 3000 at alexa.com and serving ~250K pages to ~50K hosts a
> day. We have a Google page-rank of 8. To get that spammers would feed
> their mothers to the Ravenous Bugblatter Beast of Traal.
>
>
> We could put an ad space at the top of every page. I'm thinking of
> text-only ads, no distracting images. We could cycle ads like this:
>
>    Did you know that you can help producing ebooks investing
>    just ten minutes a day? www.pgdp.net
>
>    Sponsor PG and get your web site mentioned here.
>    See: www.gutenberg.org/fundraising/sponsoring
>
>    We thank the Curl Up and Dye hair parlor for their
>    kind gift of $1000. www.curl-up-and-dye.com
>
>
> Do we want to do this? And what rules should we put in place?
>
> Is selling ads compatible with the non-for-profit status?
>
> Anybody out there with an internet marketing background to figure out
> what we could "charge" for this ad space?
>
>
> --
> Marcello Perathoner
> webmaster@gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>



--
Brandon Galbraith
Email: brandon.galbraith@gmail.com
AIM: brandong00
Voice: 630.400.6992
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060316/7773b94e/attachment.html
From hart at pglaf.org  Thu Mar 16 13:19:26 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Mar 16 13:19:28 2006
Subject: [gutvol-d] well, this is interesting
In-Reply-To: <20060316135956.D0C8F2F998@ws6-3.us4.outblaze.com>
References: <20060316135956.D0C8F2F998@ws6-3.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0603161319010.20273@pglaf.org>



John, wanna tell us the brand name of the Oz set, how much it was, etc?

mh

On Thu, 16 Mar 2006, Joshua Hutchinson wrote:

> Not to be a smart-ass ... but you better stop now, Dave.  Commercial publishers snarf PG stuff all the time.
>
> I bought a lovely two volume set of all the OZ books for my son last year.  As we were reading it, I noticed some typos and such.  On a hunch, I compared the typos to our files.  They are snarfed PG text (and didn't even proof it again) and stripped the PG notices and printing a book.
>
> Personally, I don't have a problem with commercial interests using PG/DP stuff.  As long as they don't try to claim an additional copyright (which they sometimes do) or leave the PG trademark in place and not pay us (which I've never actually seen).
>
> Josh
>
>> ----- Original Message -----
>> From: "Dave Fawthrop" <hyphen@hyphenologist.co.uk>
>> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
>> Subject: Re: [gutvol-d] well, this is interesting
>> Date: Thu, 16 Mar 2006 08:40:23 +0000
>>
>>
>> On Thu, 16 Mar 2006 08:18:55 +0000,  "Dave Doty" <davedoty@hotmail.com>
>> wrote:
>>
>> |From: Dave Fawthrop <hyphen@hyphenologist.co.uk>
>> |
>> |>*If* this happens I wonder how many volunteers DP will lose.
>> |
>> |Why would they lose any?
>>
>> Because I do books for PG, Pro Bono Publico.
>> Any dilution of this principle by association with commercial organisations
>> would concern me.
>> --
>> Dave Fawthrop <dave hyphenologist co uk>
>> Freedom of Speech, Expression, Religion, and Democracy are
>> the keys to Civilization, together with legal acceptance of
>> Fundamental Human rights.
>>
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From hyphen at hyphenologist.co.uk  Thu Mar 16 14:08:19 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu Mar 16 14:08:31 2006
Subject: [gutvol-d] well, this is interesting
In-Reply-To: <BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl>
References: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com>
	<BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl>
Message-ID: <78oj129bgqglu075e12lg9c3e5fbim9ufv@4ax.com>

On Thu, 16 Mar 2006 08:18:55 +0000,  "Dave Doty" <davedoty@hotmail.com>
wrote:

|From: Dave Fawthrop <hyphen@hyphenologist.co.uk>
|
|>*If* this happens I wonder how many volunteers DP will lose.
|
|Why would they lose any?  They give DP resources to expand, and use the 
|books.  Since they are already free to use the books, the only thing that 
|would change is more financial resources for DP.  It didn't say anything 
|about exclusive use and even if they tried, well they admitted right there 
|on the webpage that the books are public domain, so they wouldn't be able to 
|keep PG or anyone else from using them.

In the UK "He who pays the Piper calls the tune."
Does this not happen in the USA?
-- 
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are 
the keys to Civilization, together with legal acceptance of 
Fundamental Human rights.

From Bowerbird at aol.com  Thu Mar 16 14:10:45 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Mar 16 14:11:02 2006
Subject: [gutvol-d] Sponsors
Message-ID: <302.c71630.314b3c65@aol.com>

marcello said:
>    Do we want to do this?

no!

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060316/c1aad754/attachment.html
From donovan at abs.net  Thu Mar 16 15:38:56 2006
From: donovan at abs.net (D Garcia)
Date: Thu Mar 16 15:39:08 2006
Subject: [gutvol-d] OCR Trends, and Not:  was Google Translation
In-Reply-To: <Pine.LNX.4.60.0603161231580.20273@pglaf.org>
References: <260.82f14a6.31414432@aol.com>
	<1142524632.14007.34.camel@steve-mcqueen>
	<Pine.LNX.4.60.0603161231580.20273@pglaf.org>
Message-ID: <200603161838.57005.donovan@abs.net>

On Thursday 16 March 2006 03:53 pm, Michael Hart wrote:
(a lot of things, but I wanted to keep the thread separated. Hi, Michael!)

> On Thu, 16 Mar 2006, Holden wrote:
> > There are even a few Free Software / Open Source OCR systems which would
> > gladly welcome any interested developers:
> >
> > Ocrad:     http://www.gnu.org/software/ocrad/ocrad.html
> > GOCR/JOCR: http://jocr.sourceforge.net/
> > ClaraOCR:  http://www.geocities.com/claraocr/

I'm not a researcher in the field, but I have mucked in on ocrad (which is a 
single developer project), and managed to get two minor patches accepted.
Frankly, though, each of those packages uses very different approaches and 
native internal formats, and mostly rely on simpler models to recognize 
characters. ocrad almost exclusively depends on feature recognition in b/w 
and has a very simplistic confidence model. The others I don't recall details 
of off the top of my head, but I believe one of them was trying to use 
feature recognition plus same-page similarity modeling. I don't believe any 
of them use the "dictionary trick" and they all pretty much fail on merged 
and broken characters.

From black-box observation, FR seems to start with feature recognition, and 
uses similarity, curve reconstruction, adaptive thresholding, and even 
outline tracing for comparison/similarity against ttf font curves. I suspect 
they may also be using digraph and trigraph frequencies (at least for 
English) to improve their confidence scorings. Probably they also compare 
same-page word shapes to resolve cases where a character in a bounded word 
has low confidence value.

At any rate, you'd have to be pretty damned dedicated and/or already fairly 
knowledgeable in several disciplines to contribute significantly to these 
projects. IMO, the single biggest improvement anyone could offer one of these 
open source projects is a better way to bound broken and merged characters.
Feature recognition does a fairly good job up to that point.
From hart at pglaf.org  Thu Mar 16 21:12:25 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Mar 16 21:12:27 2006
Subject: [gutvol-d] well, this is interesting
In-Reply-To: <78oj129bgqglu075e12lg9c3e5fbim9ufv@4ax.com>
References: <cs6i1254vvs12ree40ho3d1pnv19en39eh@4ax.com>
	<BAY101-F14D77BF15C570E283B3897DFE70@phx.gbl>
	<78oj129bgqglu075e12lg9c3e5fbim9ufv@4ax.com>
Message-ID: <Pine.LNX.4.60.0603162108420.31395@pglaf.org>


On Thu, 16 Mar 2006, Dave Fawthrop wrote:

> On Thu, 16 Mar 2006 08:18:55 +0000,  "Dave Doty" <davedoty@hotmail.com>
> wrote:
>
> |From: Dave Fawthrop <hyphen@hyphenologist.co.uk>
> |
> |>*If* this happens I wonder how many volunteers DP will lose.
> |
> |Why would they lose any?  They give DP resources to expand, and use the
> |books.  Since they are already free to use the books, the only thing that
> |would change is more financial resources for DP.  It didn't say anything
> |about exclusive use and even if they tried, well they admitted right there
> |on the webpage that the books are public domain, so they wouldn't be able to
> |keep PG or anyone else from using them.
>
> In the UK "He who pays the Piper calls the tune."
> Does this not happen in the USA?

This is WHY Project Gutenberg has remained independent.

This is HOW Project Gutenberg has remained independent.

This is why I have been willing to work the last three
years without any salary, so PG remains independent.



Michael S. Hart
Founder
Project Gutenberg
From schultzk at uni-trier.de  Fri Mar 17 00:16:56 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri Mar 17 00:17:03 2006
Subject: [gutvol-d] google and the translation thing
In-Reply-To: <1142524632.14007.34.camel@steve-mcqueen>
References: <260.82f14a6.31414432@aol.com>
	<Pine.LNX.4.60.0603092055030.32091@pglaf.org>
	<C4AA70A8-E0D8-474B-9703-7832E5267228@uni-trier.de>
	<1141986255.20173.15.camel@steve-mcqueen>
	<264BC548-43B0-40FF-900A-CBCFC8914FAF@uni-trier.de>
	<1142524632.14007.34.camel@steve-mcqueen>
Message-ID: <759D61C7-5021-4A57-98B2-FB958E55B88E@uni-trier.de>

Hi Holden,

	Thank you for your kind and sober reply.
	I did not intend to offend the OCR developers
	or say that their is no improvement.
	Basically, all comercial products use
	somekind of "vodoo" for better results.
	That is their perfect right.

	As a reseachers know that money is the motor
	to efficiently progress. Companies want the
	results yesterday and do not care if the improvements
	in their product is due to "vodoo" or improvement
	in the fundemental technology.

	I have had to study the technology and decided to use it
	or not. I generally do not as that results I required in my field
	take up to many resources for most of my goals. There are
	cheaper ways of getteng things done resource wise.

	OCR would be just one tool that I use and is just the beginning of
	what I want and need to do.

  	It took me 20 years to own my own scanner, and believe me I did  
not get it
	for OCR. Still waiting and willing to wait for the quality I consider
	adequate.

	Believe me. I would finance OCR reseacher to get 99 % recognition  
out of the
	box if i could. I do know how hard it is to get money for research.

	One a side track here. Humans do not recognize Characters, but words  
and phrases.
	That is how we learn to read!!!

		regards
			Keith.

Am 16.03.2006 um 16:57 schrieb Holden McGroin:

> Hi!
>
> On Fri, 2006-03-10 at 12:33 +0100, Keith J. Schultz wrote:
>> Hello,
>>
>> Am 10.03.2006 um 11:24 schrieb Holden McGroin:
>>
>>> On Fri, 2006-03-10 at 10:32 +0100, Keith J. Schultz wrote:
>>>> text. Today, dictionaries are used to guess which words are
>>>> to be recognised. That is why the OCR systems today give us
>>>> better results if the original has DECENT quality!!!
>>>
>>>> The pattern recognition systems have not gotten better and
>>>> the dictionary trick takes the motivation away to
>>>> develop better OCR algorithms.
>>>
>>> I'm going to have to call bullshit here. As a researcher working in
>>> the
>>> field of document recognition, I've noticed tremendous  
>>> improvements in
>>> OCR quality even just in the past five years.
>> Before you start to swear, read and understand! Maybe in the
>> development labs, but not for the non-high end user!!!!
>
> OCR results are improving across the board. One only has to compare
> Finereader 8, a mainstream OCR product, with version 5 or so to see  
> the
> improvement in standard OCR packages over the last 5 years.  
> Recognition
> quality improves (where there is room for improvement) and so does the
> range of documents which can be recognised. Each passing year brings
> improvements in quality for older, noisy and lower quality documents.
> Again, I stress that this is *real-world* improvement in mainstream  
> OCR
> products.
>
> In your initial post, you stated that the "dictionary trick" takes  
> away
> the motivation to develop better OCR algorithms. Yet, it is still an
> extremely active research subject. Perhaps you're not familiar with  
> the
> research community around OCR but there are many major conferences,
> workshops and journals devoted entirely or mainly to the task of
> digitising documents.
>
> And of course, where do you think the improvements in mainstream OCR
> applications come from? Yesterday's innovation in the research lab  
> forms
> the basis of new features in today's commercial OCR packages.  
> Likewise,
> the work that's going on now in the lab will improve tomorrow's OCR
> packages.
>
>> We have not seen any improvements in the field for the past five
>> years!!! The improvements are mainly due to the use of dictionaries!!
>> Not the improvement of character recognition!! Most systems in the
>> field get their performance out of word recognition !!!
>
> Well, that's a nice statement to make since the vast majority of  
> systems
> in the field are black-box commercial systems. How do you know  
> where the
> performance comes from? I'm a researcher in the field. I attend
> conferences and read journals and I don't know much about the  
> internals
> of ABBYY. Unsurprisingly, it's something they keep under close wraps.
>
> So all you really have is the fact that commercial (and research) OCR
> systems are improving and your unfounded assertion that the  
> improvements
> are mainly due to dictionaries.
>
>> I did mean to say not there is no improvement in Optical
>> Character Recognition, but the improvment over the past
>> 10 years is minimal at most. When I see a OCR system that
>> just uses raw results, then I will bow my head in recognition
>> of true achieve meant. Furthermore, when the image processing
>> gets that far it will open up new possiblities in all kinds
>> of sciences.
>
> There are countless tools which can be used to improve OCR  
> performance.
> Using dictionary lookups is just one tool in the box. OCR is improving
> using many different techniques. I've been observing improvements in
> many different areas over the last few years (as long as I've been in
> the area), including:
>
>  - Improvements in low-level Image processing techniques
>  - Improvements in feature extraction from characters
>  - Improvements in character recognition based on those features
>
> If you don't like dictionary lookups, don't use them. Raw OCR
> performance is improving in the lab and in the marketplace and is
> already great for a large proportion of documents. I must apologise on
> behalf of the research community if you find the rate of progress  
> to be
> inadequate.
>
> That said, if you don't like it, muck in. There are many research labs
> around the world working on improving OCR and related techniques  
> and I'm
> sure they'd be glad to have someone as knowledgeable as yourself join.
> There are even a few Free Software / Open Source OCR systems which  
> would
> gladly welcome any interested developers:
>
> Ocrad:     http://www.gnu.org/software/ocrad/ocrad.html
> GOCR/JOCR: http://jocr.sourceforge.net/
> ClaraOCR:  http://www.geocities.com/claraocr/
>
>
> Cheers,
> Holden
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From tb at baechler.net  Fri Mar 17 01:53:56 2006
From: tb at baechler.net (Tony Baechler)
Date: Fri Mar 17 01:53:20 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de>
References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org>
	<44173562.4060705@bohol.ph>
	<g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com>
	<95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de>
Message-ID: <7.0.1.0.2.20060317015139.033f83c0@baechler.net>

Yes, and it isn't open source, cross platform, or especially useful 
to anyone without a Mac.  I for one can't use it because I don't have 
a Mac, and even if I did the built-in screen reader isn't perfect so 
I'm not sure how accessible it is.  Also he didn't necessarily want 
actual audio output, only a means of which files could be created so 
users could make their own output if they wish.  See my previous 
discussion about DAISY.

At 04:38 AM 3/16/2006, you wrote:

>         If you have a Mac it will read it for you.
>         You can also customize the dictionary.
>         There is also a programming interface if
>         you really want high quality output, you can even
>         create your own voices.
>
>         I personally have not played with it. It has been
>         around for a long time.

From schultzk at uni-trier.de  Fri Mar 17 02:14:58 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri Mar 17 02:15:05 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <7.0.1.0.2.20060317015139.033f83c0@baechler.net>
References: <21d.9c6beac.31472ca4@aol.com> <20060314061441.GD19944@pglaf.org>
	<44173562.4060705@bohol.ph>
	<g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com>
	<95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de>
	<7.0.1.0.2.20060317015139.033f83c0@baechler.net>
Message-ID: <9A4E28A5-CA29-4429-A756-D2CE259F5C2E@uni-trier.de>

Hi There,

	He wanted a system for markup. The Mac system can do this. All  
needed information is avaible
	free of charge and can be used publically. The system is  
customizable. I do admitt it is not
	crossplatform, but it can be used as a starting place if one has  
access to a Mac.

	Furthermore, any encoding/markup he chosses will be bound to one  
program or the other.
	Also, it should be his decision if want I suggest will fit his needs  
or not.

	To my knowledge DAISY is not what he wants either!!

	flame someone else!!!


		Keith.

Am 17.03.2006 um 10:53 schrieb Tony Baechler:

> Yes, and it isn't open source, cross platform, or especially useful  
> to anyone without a Mac.  I for one can't use it because I don't  
> have a Mac, and even if I did the built-in screen reader isn't  
> perfect so I'm not sure how accessible it is.  Also he didn't  
> necessarily want actual audio output, only a means of which files  
> could be created so users could make their own output if they  
> wish.  See my previous discussion about DAISY.
>
> At 04:38 AM 3/16/2006, you wrote:
>
>>         If you have a Mac it will read it for you.
>>         You can also customize the dictionary.
>>         There is also a programming interface if
>>         you really want high quality output, you can even
>>         create your own voices.
>>
>>         I personally have not played with it. It has been
>>         around for a long time.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From ajhaines at shaw.ca  Fri Mar 17 12:11:05 2006
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Fri Mar 17 12:11:09 2006
Subject: [gutvol-d] Scanner recommendations?
Message-ID: <000301c649fe$ecab54d0$6401a8c0@ahainesp2600>

I'm planning to purchase a new scanner sometime in the next few weeks, and 
am looking for comments and recommendations.

My current scanner is an HP Scanjet 5P, flatbed-type, SCSI-connected.  It 
works OK, but it's difficult to scan books that won't lie flat when opened 
(no binding "valley").  It isn't supported under Windows XP, so I have to 
keep switching my system back and forth between Windows 2000 and WinXP 
(drive racks are so cool for this - no dual-boot fussing).

My current scanning/OCR software is Abby Finereader Sprint 4.0, which uses 
HP's Deskscan V2.9 software to actually acquire the image to be OCR'ed.  For 
my purposes this combo works just fine and gives excellent results.  (I've 
played with Abby Professional 6.0, but found that it kept "getting in the 
way," so I've stuck with Sprint.)

Having said all that, it's the fact that the scanner, plus the Deskscan 
software, is SLOW, taking about 45 seconds or so to go from the start of the 
"Preview" scan to the finished "Final" scan of a page pair.  After that, the 
actual OCR and saving of the resulting text file takes only a few seconds. 
Including turning the page, a single scan takes about a minute, which I've 
decided is too slow to keep on with.  (In fact, it's the time investment 
that's keeping me from doing some of the thicker books I have.  Scanning is 
BORING, and I can't face an 800-page book with my current equipment.)

So, I'm looking for a scanner that's considerably faster than my current 
one, will handle stiffly-bound books without having to force them flat, is 
USB-connected, and works under Windows XP.

I've Googled "book scanner", but most of the hits have been for those big, 
professional scanners with lights, an overhead camera, automatic page 
turning, etc., that seem to cost in the $20K-$40K range - a definite 
overkill for my needs.

This search also pointed me to the Plustek Opticbook 3600, which I found 
mentioned in this forum's March 2005 archives, in the "Scanning/OCR Tips" 
thread.

Comments/recommendations on this or other candidate scanners?

Thanks,
Al 


From jeroen.mailinglist at bohol.ph  Fri Mar 17 14:23:55 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Fri Mar 17 15:18:06 2006
Subject: [gutvol-d] Producing texts for text-to-speech.
In-Reply-To: <9A4E28A5-CA29-4429-A756-D2CE259F5C2E@uni-trier.de>
References: <21d.9c6beac.31472ca4@aol.com>
	<20060314061441.GD19944@pglaf.org>	<44173562.4060705@bohol.ph>	<g7hf12p0m21la5oo5opgd6l632o8d6q97t@4ax.com>	<95474F1D-CBF9-4340-B3C1-0B58D9740B16@uni-trier.de>	<7.0.1.0.2.20060317015139.033f83c0@baechler.net>
	<9A4E28A5-CA29-4429-A756-D2CE259F5C2E@uni-trier.de>
Message-ID: <441B36FB.5020201@bohol.ph>


Hi People,

What I wanted is a system of mark-up that has some value as standard. It 
should be future proof, well documented, and vendor neutral, such that I 
won't be forced to stay with one platform.  I am currently looking at 
SSML, which is an XML based W3C standard, in combination with aural CSS 
stylesheets. I know there are very few tools for this, but will rather, 
for the time being, transform to temporary formats than compromise on 
non-standard formats for masters. I will actually integrate the 
information machine spoken books require in my master TEI documents with 
a small number of extentions, which again, I will document. For Project 
Gutenberg, we have to plan for the long term.

Best regards,

Jeroen.

Keith J. Schultz wrote:

>
>     He wanted a system for markup. The Mac system can do this. All  
> needed information is avaible
>     free of charge and can be used publically. The system is  
> customizable. I do admitt it is not
>     crossplatform, but it can be used as a starting place if one has  
> access to a Mac.


From Bowerbird at aol.com  Fri Mar 17 15:53:19 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Mar 17 15:53:24 2006
Subject: [gutvol-d] 18,000
Message-ID: <65.5731ec2b.314ca5ef@aol.com>


congratulations to the project gutenberg volunteers
for crossing the 18,000 marker on the mothership...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060317/aa371f8e/attachment.html
From bruce at zuhause.org  Sat Mar 18 07:11:05 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Sat Mar 18 07:11:10 2006
Subject: [gutvol-d] Scanner recommendations?
In-Reply-To: <000301c649fe$ecab54d0$6401a8c0@ahainesp2600>
References: <000301c649fe$ecab54d0$6401a8c0@ahainesp2600>
Message-ID: <17436.8969.181849.80589@celery.zuhause.org>

Al Haines (shaw) writes:
 > I'm planning to purchase a new scanner sometime in the next few weeks, and 
 > am looking for comments and recommendations.

I have an Opticbook 3600, and I strongly recommend it.  It has three
built-in scan modes (black&white, greyscale, and color) tied to
scanner buttons, so you can change scan modes in the middle of a batch
scan process.  It has Abbyy Finereader Sprint 5.0, but I've never used
it, since I had FR6 and upgraded to FR8.  Since I'm often scanning
more fragile works, I usually scan a page at a time, and I get about 6
pages per minute in 300 DPI black & white, 5 PPM in 300 DPI greyscale,
and 1 or 2 PPM in 300 DPI color.  I usually batch scan using the
Opticbook's book pilot, do some post-processing, and then run it
through FR.  I bought mine for about $250 USD.

Since I use the Opticbook's book pilot to batch scan, I set up a
preview image once, and then I'm usually turning the page and
repositioning during the time it sends the data to the computer.  With
the book pilot, you can set it up so that it automatically rotates the
image for either scan a page at a time (rotating the book 180 degrees
for each page), or CCW 90 degrees for double page scans. Of course, if
you're scanning an oversize book, you'll want to be using page at a
time anyway.

If you're using FR, I think you're better off using the FR twain
interface (which I don't use), because you can set it to scan the
margins of your book (no preview mode, so you need to know or guess
the size) and then scan multiple pages, with background OCRing.  With
the FR twain interface, you can't automatically switch between scan
modes though.  Right now I'm scanning a book with a lot of color
illustrations, so it's really nice to press the grey button for my
normal greyscale scan, and hit the color button when I hit a page with
a colour illustration.

You can also use it for double page scans, like a normal flat bed
scanner.  I usually don't because I think I get better results (albeit
at half the scan speed) with single page mode.

Downsides:
It's not SANE compliant, so you have to use Windows (not a problem for
you, but it's a show stopper for others).  The usable scan starts
about 3 mm from the edge of the scanner, so that if you have really
narrow gutters on the book, you will still have problems.  The depth
of field is only so-so, so the curvature with thick book with narrow
gutters will make the edge very dark, and sometimes unusable.
Greyscale scanning is better than B&W scanning for this.
From sly at victoria.tc.ca  Sun Mar 19 00:09:13 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Mar 19 00:09:19 2006
Subject: [gutvol-d] eGranary library
Message-ID: <Pine.GSO.4.58.0603190005570.13771@vtn1.victoria.tc.ca>


>From the description, this sounds like a very worth-while
project which distributes digital resources to places they
might otherwise not be accessible. Perhaps
somewhere that would be good to have PG texts?


http://www.egranary.org/

A good summary here:
http://en.wikipedia.org/wiki/EGranary_Digital_Library

Andrew
From Bowerbird at aol.com  Sun Mar 19 01:33:50 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Mar 19 01:33:58 2006
Subject: [gutvol-d] eGranary library
Message-ID: <2fa.eb65f6.314e7f7e@aol.com>

i would've been surprised if they
weren't already familiar with p.g.
("amazed" would be a better term.)

but of course they are quite familiar,
as indicated on this page on their site:
>    http://www.widernet.org/digitallibrary/DigitalLibraries.htm

so i'm quite certain _some_ of the library
has long been included in their program.

not sure how often they update, though,
especially since the link on that webpage
still points to the promo.net site...

when michael's got his 35th anniversary
d.v.d. ready, he should send it to them...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060319/c2a1da24/attachment.html
From sly at victoria.tc.ca  Sun Mar 19 09:22:12 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Mar 19 09:22:15 2006
Subject: [gutvol-d] eGranary library
In-Reply-To: <2fa.eb65f6.314e7f7e@aol.com>
References: <2fa.eb65f6.314e7f7e@aol.com>
Message-ID: <Pine.GSO.4.58.0603190919140.3359@vtn1.victoria.tc.ca>

Thanks for finding that page.

At the top, the list is prefaced with:

   In addition to the eGranary, we have compiled a list of scholastic
   journals and electronic resources that are available online via the
   worldwide web.

>From this information, I can't tell if they distribute
PG texts or not.

Andrew

On Sun, 19 Mar 2006 Bowerbird@aol.com wrote:

> i would've been surprised if they
> weren't already familiar with p.g.
> ("amazed" would be a better term.)
>
> but of course they are quite familiar,
> as indicated on this page on their site:
> >    http://www.widernet.org/digitallibrary/DigitalLibraries.htm
>
> so i'm quite certain _some_ of the library
> has long been included in their program.
>
> not sure how often they update, though,
> especially since the link on that webpage
> still points to the promo.net site...
>
> when michael's got his 35th anniversary
> d.v.d. ready, he should send it to them...
>
> -bowerbird
>
From Bowerbird at aol.com  Sun Mar 19 10:03:42 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Mar 19 10:03:49 2006
Subject: [gutvol-d] eGranary library
Message-ID: <2d3.51387e1.314ef6fe@aol.com>

andrew said:
>    From this information, 
>    I can't tell if they distribute PG texts or not.

understandable, as they don't say so explicitly.

nonetheless, since their mission is to provide
electronic texts to places that have a hard time
accessing the internet proper, with a server-box
that holds a number of e-texts, i'd assume so...

if you're filling a granary, you would definitely
harvest one of the biggest fields around, not?

but they might have harvested it once, and then
not returned later as the field increased in size...

so it would definitely be a good idea to send 'em
the yield from the newest crop, once it's threshed.

i'd assume michael plans on making a big noise
when that d.v.d. is ready...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060319/7a289803/attachment.html
From Bowerbird at aol.com  Sun Mar 19 11:50:18 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Mar 19 11:50:22 2006
Subject: [gutvol-d] the secret garden demo
Message-ID: <302.fdfa0d.314f0ffa@aol.com>

i have put up my newest demo, "the secret garden":
>    http://www.greatamericannovel.com/sgfhb/sgfhbc001.html

these demos are aimed at "continuous proofreading",
but with this latest example i've also begun doing the
_formatting_ expected for the purpose of pure reading.

for example, the chapter-headers are now _displayed_
as headers (i.e., big and bold), and they are hotlinked
back to the "hot table of contents" for easy navigation.

in addition, the "table of contents" pages are hotlinked
to the items listed.   (these hotlinks are in addition to 
the ones on the specialized "table of contents" pages
which are auto-generated and were always hotlinked.)

i've also changed from internet-style block-paragraphs
(with a blank line between paragraphs) to book-style
indented paragraphs (with no blank line between 'em)...

with this formatting, the auto-generated .html display
is starting to look _highly_similar_ to the original pages...

page-numbers are also colorized, to make 'em stand out.

i've also included "chapter-jump" links, so the reader can
jump from any chapter to the one before or the one after.

finally, i included links on each page that allow the reader
to conveniently switch from the 1-page display to 2-up...

***

for years now, many people here wanted to refuse to accept
my position that a plain-text format could serve as "master",
so they "challenged" me to "prove it" with some "examples"...

now that i am doing so, they have grown strangely silent.

just as i knew they would.

at any rate, i welcome any constructive criticism of my work.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060319/d9281e3c/attachment.html
From gbnewby at pglaf.org  Sun Mar 19 18:39:58 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Mar 19 18:40:00 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com>
References: <4419CD37.2000505@perathoner.de>
	<366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com>
Message-ID: <20060320023958.GG8882@pglaf.org>

> On 3/16/06, Marcello Perathoner <marcello@perathoner.de> wrote:
> >
> > From a different thread:
> >
> > >> they're looking for "sponsors", suggesting "an annual fee of $1000",
> > >> or even all the way up to $350,000, which buys you a "thank you" from
> > >>  within the browser of the one million of their clients you've
> > >> sponsored...
> >
> > I don't know about their millions of clients but the PG website is now
> > ranked top 3000 at alexa.com and serving ~250K pages to ~50K hosts a
> > day. We have a Google page-rank of 8. To get that spammers would feed
> > their mothers to the Ravenous Bugblatter Beast of Traal.
> >
> >
> > We could put an ad space at the top of every page. I'm thinking of
> > text-only ads, no distracting images. We could cycle ads like this:
> >
> >    Did you know that you can help producing ebooks investing
> >    just ten minutes a day? www.pgdp.net
> >
> >    Sponsor PG and get your web site mentioned here.
> >    See: www.gutenberg.org/fundraising/sponsoring
> >
> >    We thank the Curl Up and Dye hair parlor for their
> >    kind gift of $1000. www.curl-up-and-dye.com
> >
> >
> > Do we want to do this? And what rules should we put in place?

I really like the idea of having rotating ads for DP, the various PG
affiliates, ibiblio, and our other clearly-defined "partners" (at one
level or another, see http://www.gutenberg.org/links).

Maybe you could work on this, Marcello?  No need to delay, and
we already have a few banner graphics for both DP & PG.  In fact,
I remember we used to lead with an occasionally-rotating graphic.

I'd be in favor of some clear criteria for other organizations.
Including Wikipedia would be nice.  Places like the Linux Fund.  But
it's hard to draw a line.  For such organizations to submit artwork and
request being added to our rotating banner would be a wonderful service
that PG could provide.  But those criteria are a little sticky...

> > Is selling ads compatible with the non-for-profit status?

No, we can't sell ad space at all.  Neither PGLAF, nor ibiblio.  This
would need to just be free, and just for non-commercial messages.  (They
could be for commercial entities...  but not "buy our stuff" messages.)
But for not-for-profit, it's a wonderful idea, and on-mission for PG.

In the US, we have "public service announcements."  That's the type
of model we could easily pursue.

  -- Greg
From scott_bulkmail at productarchitect.com  Sun Mar 19 19:07:42 2006
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sun Mar 19 19:13:39 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <20060320023958.GG8882@pglaf.org>
References: <4419CD37.2000505@perathoner.de>
	<366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com>
	<20060320023958.GG8882@pglaf.org>
Message-ID: <p06110416c043cc858530@[192.168.0.52]>

>>Is selling ads compatible with the non-for-profit status?
>
>No, we can't sell ad space at all.  Neither PGLAF, nor ibiblio.  This
>would need to just be free, and just for non-commercial messages.  (They
>could be for commercial entities...  but not "buy our stuff" messages.)
>But for not-for-profit, it's a wonderful idea, and on-mission for PG.

Just so it's clear: although IANAL, I'm pretty sure that there's nothing to stop a non-profit IN GENERAL from selling ads (or products).  So, the above must be restrictions specific to PGLAF and/or ibiblio.
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
From jon at noring.name  Sun Mar 19 19:16:41 2006
From: jon at noring.name (Jon Noring)
Date: Sun Mar 19 19:23:54 2006
Subject: [gutvol-d] the secret garden demo
In-Reply-To: <302.fdfa0d.314f0ffa@aol.com>
References: <302.fdfa0d.314f0ffa@aol.com>
Message-ID: <1706161355.20060319201641@noring.name>

Bowerbird wrote:

>  for years now, many people here wanted to refuse to accept
>  my position that a plain-text format could serve as "master",
>  so they "challenged" me to "prove it" with some "examples"...
>  
>  now that i am doing so, they have grown strangely silent.
>  
>  just as i knew they would.

I doubt many people are even reading your messages, let alone visiting
your demos. And if they look at your examples, they are probably
yawning. It's not worth their time to even write a reply.

So your explanation of the "silence" is probably a little off the
mark.

Jon



From Bowerbird at aol.com  Sun Mar 19 20:44:45 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Mar 19 20:45:03 2006
Subject: [gutvol-d] the secret garden demo
Message-ID: <35b.14920a.314f8d3d@aol.com>


ah, humor.   humor is good.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060319/fdcede2a/attachment.html
From sly at victoria.tc.ca  Sun Mar 19 21:45:28 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Mar 19 21:45:30 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <20060320023958.GG8882@pglaf.org>
References: <4419CD37.2000505@perathoner.de>
	<366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com>
	<20060320023958.GG8882@pglaf.org>
Message-ID: <Pine.GSO.4.58.0603192135001.12712@vtn1.victoria.tc.ca>

Is anyone here interested in helping to work out details of what
we consider an "affiliated project"? I would like to do this as
long as I have someone else to bounce ideas off of.

I think it would be ideal to have a loose affiliation of
projects with similar goals, particularly as there are an
increasing number of websites out there focusing on individual
languages that would be worth being centrally linked in one
place.

In the English-language wikipedia article on Project Gutenberg, there
was some disagreement recently about what should be included in the
list of "affiliated projects", partly because I think we don't
really have a clear definition here.

Although, as Greg mentions, the gray areas are where the challenge is.

Andrew

On Sun, 19 Mar 2006, Greg Newby wrote:

>
> I really like the idea of having rotating ads for DP, the various PG
> affiliates, ibiblio, and our other clearly-defined "partners" (at one
> level or another, see http://www.gutenberg.org/links).
>
> Maybe you could work on this, Marcello?  No need to delay, and
> we already have a few banner graphics for both DP & PG.  In fact,
> I remember we used to lead with an occasionally-rotating graphic.
>
> I'd be in favor of some clear criteria for other organizations.
> Including Wikipedia would be nice.  Places like the Linux Fund.  But
> it's hard to draw a line.  For such organizations to submit artwork and
> request being added to our rotating banner would be a wonderful service
> that PG could provide.  But those criteria are a little sticky...
>
From holden.mcgroin at dsl.pipex.com  Mon Mar 20 06:04:20 2006
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Mon Mar 20 06:04:25 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <20060320023958.GG8882@pglaf.org>
References: <4419CD37.2000505@perathoner.de>
	<366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com>
	<20060320023958.GG8882@pglaf.org>
Message-ID: <1142863461.26355.11.camel@steve-mcqueen>

On Sun, 2006-03-19 at 18:39 -0800, Greg Newby wrote:
> I'd be in favor of some clear criteria for other organizations.
> Including Wikipedia would be nice.  Places like the Linux Fund.  But
> it's hard to draw a line.  For such organizations to submit artwork and
> request being added to our rotating banner would be a wonderful service
> that PG could provide.  But those criteria are a little sticky...

Hi!

If I might, I'd like to suggest the fine folk at Ubuntu Linux
( http://www.ubuntu.com/ ).

For those of you who are unfamiliar with Ubuntu, it's a project which is
less than two years old. Their aim is to produce a refined operating
system (based on Linux) which is not only free of cost but also free as
in freedom and which is also usable by everybody in their native
language.

As I said, they're just two years old but due largely to the quality of
their "product", they've become literally overnight one of the largest
Linux distributions. On DistroWatch.com, they've been the number 1 Linux
distribution for over a year, and by a considerable margin too.

Personally, I've been running Ubuntu Linux for over a year as my main
desktop and server operating system. It's truly a worthy replacement to
Windows and -- best of all -- it's free in both senses. So, I think it
really is a worthy project and one which has very similar goals to
Project Gutenberg.

Cheers,
Holden

From marcello at perathoner.de  Mon Mar 20 09:34:39 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Mar 20 09:34:44 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <20060320023958.GG8882@pglaf.org>
References: <4419CD37.2000505@perathoner.de>	<366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com>
	<20060320023958.GG8882@pglaf.org>
Message-ID: <441EE7AF.7070500@perathoner.de>

Greg Newby wrote:

> No, we can't sell ad space at all.  Neither PGLAF, nor ibiblio.

I'm not sure this is true:

"501 (b)  Tax on unrelated business income and certain other activities
An organization exempt from taxation under subsection (a) shall be 
subject to tax to the extent provided in parts II, III, and VI of this 
subchapter, but (notwithstanding parts II, III, and VI of this 
subchapter) shall be considered an organization exempt from income taxes 
for the purpose of any law which refers to organizations exempt from 
income taxes."

http://www.law.cornell.edu/uscode/html/uscode26/usc_sec_26_00000501----000-.html

IANAL but this means to me that we would have to pay taxes on the ad 
revenues, but selling ads would not endanger our non-profit status.


But the more interesting question is: if we just display standard "thank 
you" notices for donations received, without letting the donor choose 
the text, would this be considered "selling ads" or just being nice to 
our donors?



-- 
Marcello Perathoner
webmaster@gutenberg.org

From scott_bulkmail at productarchitect.com  Mon Mar 20 10:06:00 2006
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Mon Mar 20 10:07:27 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <441EE7AF.7070500@perathoner.de>
References: <4419CD37.2000505@perathoner.de>
	<366100670603161306m339473f2v5b31fa6a7f61a7b0@mail.gmail.com>
	<20060320023958.GG8882@pglaf.org> <441EE7AF.7070500@perathoner.de>
Message-ID: <p0611041dc0449e7edeba@[192.168.0.52]>

>IANAL but this means to me that we would have to pay taxes on the ad revenues, but selling ads would not endanger our non-profit status.

Do public radio stations pay tax on their ad revenue?  Do the Girl Scouts pay tax on their cookie sales?  I doubt it (but I could well be wrong).

PG operates a Web site; showing ads strikes me as more "related" than "unrelated".

And, even if income tax is involved, that's hardly a show stopper.  I would guess the ad revenue would easily cover the tax rate and an accountant.
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
From Bowerbird at aol.com  Mon Mar 20 11:40:03 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar 20 11:40:14 2006
Subject: [gutvol-d] Sponsors
Message-ID: <90.70f24a7b.31505f13@aol.com>

scott said:
>    And, even if income tax is involved, 
>    that's hardly a show stopper.? 
>    I would guess the ad revenue 
>    would easily cover the tax rate 
>    and an accountant.

why would p.g. want to go into the ad-selling business?
isn't the internet permeated enough with sales pitches?
even "thank you" notes for donors smells "fishy" to me...

where _precisely_ is it you expect the proceeds would go?

i think michael hart should be the #1 recipient, but i suspect
he'd rather go without pay to have his project remain pure...

indeed, isn't that pretty much what he just said?...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060320/aeab9b27/attachment.html
From marcello at perathoner.de  Mon Mar 20 11:54:14 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Mar 20 11:54:22 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <90.70f24a7b.31505f13@aol.com>
References: <90.70f24a7b.31505f13@aol.com>
Message-ID: <441F0866.9040100@perathoner.de>

Bowerbird@aol.com wrote:

> why would p.g. want to go into the ad-selling business?

Why would PG want to collect donations?


> where _precisely_ is it you expect the proceeds would go?

The same places where the other donations go.


> he'd rather go without pay to have his project remain pure...

How can it be purer to "squeeze" people with little money than 
corporations with big money?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From brandon.galbraith at gmail.com  Mon Mar 20 11:53:05 2006
From: brandon.galbraith at gmail.com (Brandon Galbraith)
Date: Mon Mar 20 11:59:36 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <90.70f24a7b.31505f13@aol.com>
References: <90.70f24a7b.31505f13@aol.com>
Message-ID: <366100670603201153yad99acdq967de231a4b772fb@mail.gmail.com>

Because even though we have a huge amount of volunteers, you still need
money to pay the bills?
Some parts of the world runs on hopes and dreams, but the rest of it runs on
cold, hard cash.

-brandon

On 3/20/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
>
> scott said:
> >   And, even if income tax is involved,
> >   that's hardly a show stopper.
> >   I would guess the ad revenue
> >   would easily cover the tax rate
> >   and an accountant.
>
> why would p.g. want to go into the ad-selling business?
> isn't the internet permeated enough with sales pitches?
> even "thank you" notes for donors smells "fishy" to me...
>
> where _precisely_ is it you expect the proceeds would go?
>
> i think michael hart should be the #1 recipient, but i suspect
> he'd rather go without pay to have his project remain pure...
>
> indeed, isn't that pretty much what he just said?...
>
> -bowerbird
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
>


--
Brandon Galbraith
Email: brandon.galbraith@gmail.com
AIM: brandong00
Voice: 630.400.6992
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060320/85940336/attachment-0001.html
From creeva at gmail.com  Mon Mar 20 12:07:53 2006
From: creeva at gmail.com (Brent Gueth)
Date: Mon Mar 20 12:13:35 2006
Subject: [gutvol-d] the secret garden demo
In-Reply-To: <302.fdfa0d.314f0ffa@aol.com>
Message-ID: <003201c64c59$f96fc590$6738a8c0@Corp.Symantec.Com>

This is the argument that I agreed with you with months ago.   I think that
plain text should always be the master as it is easier to format for new
devices with interopt with old.    Of course now saying this and siding with
you on it I'm going to get flamed to death with the old XML debate.   But
like before that's my .02.

 

A Twi a day keeps the wookiee away.

www.creeva.com

  _____  

From: gutvol-d-bounces@lists.pglaf.org
[mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Bowerbird@aol.com
Sent: Sunday, March 19, 2006 2:50 PM
To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com
Subject: [gutvol-d] the secret garden demo

 

i have put up my newest demo, "the secret garden":
>   http://www.greatamericannovel.com/sgfhb/sgfhbc001.html

these demos are aimed at "continuous proofreading",
but with this latest example i've also begun doing the
_formatting_ expected for the purpose of pure reading.

for example, the chapter-headers are now _displayed_
as headers (i.e., big and bold), and they are hotlinked
back to the "hot table of contents" for easy navigation.

in addition, the "table of contents" pages are hotlinked
to the items listed.  (these hotlinks are in addition to 
the ones on the specialized "table of contents" pages
which are auto-generated and were always hotlinked.)

i've also changed from internet-style block-paragraphs
(with a blank line between paragraphs) to book-style
indented paragraphs (with no blank line between 'em)...

with this formatting, the auto-generated .html display
is starting to look _highly_similar_ to the original pages...

page-numbers are also colorized, to make 'em stand out.

i've also included "chapter-jump" links, so the reader can
jump from any chapter to the one before or the one after.

finally, i included links on each page that allow the reader
to conveniently switch from the 1-page display to 2-up...

***

for years now, many people here wanted to refuse to accept
my position that a plain-text format could serve as "master",
so they "challenged" me to "prove it" with some "examples"...

now that i am doing so, they have grown strangely silent.

just as i knew they would.

at any rate, i welcome any constructive criticism of my work.

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060320/87a5b512/attachment.html
From prosfilaes at gmail.com  Mon Mar 20 12:23:42 2006
From: prosfilaes at gmail.com (David Starner)
Date: Mon Mar 20 12:30:36 2006
Subject: [gutvol-d] the secret garden demo
In-Reply-To: <003201c64c59$f96fc590$6738a8c0@Corp.Symantec.Com>
References: <302.fdfa0d.314f0ffa@aol.com>
	<003201c64c59$f96fc590$6738a8c0@Corp.Symantec.Com>
Message-ID: <6d99d1fd0603201223i3f5986d1x334d610a8b33d8b6@mail.gmail.com>

On 3/20/06, Brent Gueth <creeva@gmail.com> wrote:
> This is the argument that I agreed with you with months ago.   I think that
> plain text should always be the master as it is easier to format for new
> devices with interopt with old.

As you say in an rich-text email. All a "plain text" markup format
makes easy is a half-assed conversion. If you want to actually convert
it, you've got to break it down properly, and a standard XML reader
makes that as easy, if not easier, then a custom-designed program to
read an arbitrary "plain text" format.
From Bowerbird at aol.com  Mon Mar 20 17:49:58 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar 20 17:50:04 2006
Subject: [gutvol-d] so much to chat about, i'll have to go outline
Message-ID: <2be.7172d97.3150b5c6@aol.com>

so much to chat about, i'll have to go outline:

1.   agility=ability for zen markup viewer-apps
2.   sloppy thinking lumps donations with ads
3.   #3 of "monday morning quarterback" is up
4.   thanks for the support, brent, but stay low
5.   i've just posted another demonstration-book

***

1.   because the format of the z.m.l. format is
so very transparent, it's easy for apps to grok it.
and to manipulate it.   and to slap it into output.
thus the programs will become especially _agile_.

100k of script, 90k boilerplate, so you write 10k
just do the specific job you want done right now,
and you're debugging, with the whole wide world
(if you want) to help you find and fix that mistake.
the job gets done, and you're off to the next one...

your xml programs, however, will be bloatware,
with that heavy-markup requiring complex code
that'll be a nightmare to try to modify on the fly.
and thus improvements will be slow in coming...

and, as usual, the race goes to the swift.

the truth of the matter will be that the abundance
of the _zml_ viewer-programs, not the _xml_ ones,
plus their _agility_, will be the defining features that
tips things _to_ me (and not _away_ as you suggest).

***

2.   ok, i will explain how donations differ from ads.

a donation is a _reward_ giving a stamp of affirmation
signing a relationship whose past has _proven_ worth,
a firm avowal of underlying root in a _gift_economy_.
it says, "job well done, my friend, thank you very much."

an ad-sale is an _exchange_ that sets an expectation
that the relationship will _deliver_ worth in the future,
and becomes a symbol of foundation based on _barter_.
it says, "ok, i'd better get my money's worth out of this."

to the greatest extent possible now, the world needs
relations built on a cornerstone of _gift_, not _barter_.
project gutenberg is one of the leading lights in that
move to future that is gift-based, not barter-based...

and besides, in an organization that runs so completely
on volunteer labor, a little money would be a bad thing.
a terrible thing.

the only person who can reasonably expect _anything_
out of project gutenberg is michael, and even for him,
it's only due to all the years he spent in the wilderness,
not for being on a now-heavily-populated bandwagon.

***

3.   issue #3 of "monday morning quarterback" is out.
this one is short and sweet, focusing on just one point:
==================================
each scan you make should have, in its filename,
the _page-number_ of the page which it pictures.
==================================

>    http://groups.yahoo.com/group/bpsuper/message/7
>    http://snowy.arsc.alaska.edu/bowerbird/mmq/mmq03.txt

***

4.   while i appreciate your agreement with me, brent,
there's no real need to speak.   you'll only draw flames,
and it's better just to let these lying dogs die peacefully.
we're past the debate stage, anyway, and eating pudding.
hey, i'm gonna use something like that as my slogan --

            we _would_ eat our own dogfood, 
                but we don't _make_ dogfood;
                     we make _pudding_, and
                          we _love_ to eat it!   
                               it's _good_!

***

5.   another demo-book went up today, this one
titled "the hacker manifesto", by mckenzie wark.
wark is writing a new book in public, via a blog,
a test of the institute for the future of the book:
>    http://www.futureofthebook.org/blog/

the u.r.l. of my demo is:
>    http://www.greatamericannovel.com/ahmmw/ahmmwc001.html

i think i've forgotten to remind you so far that
it's a better overall reading experience if you
go into full-screen mode to do your reading.
(or hide all the toolbars if that's all you can do.)

not only does it remove unnecessary distraction,
allowing you to immerse yourself in the material,
but it also means the type and scans can be bigger.

(no one should complain e-books are hard to read,
because they can easily make e-books _easier_ to
read than paper, just by making the type _bigger_.)

also, if you hadn't noticed yet, clicking the image
"turns the page" to the next page.   (clicking the
left image on the 2-up interface "turns back to"
the preceding facing-pages spread in the book.)

as i said before, on a fast pipe, where scans
take less than a second or two to download,
you can speed through a book fairly quickly.

anyway, that's 5 demo-books up now:
>    http://www.greatamericannovel.com/mabie/mabiep001.html
>    http://www.greatamericannovel.com/myant/myantc001.html
>    http://www.greatamericannovel.com/tolbk/tolbkp001.html
>    http://www.greatamericannovel.com/sgfhb/sgfhbc001.html
>    http://www.greatamericannovel.com/ahmmw/ahmmwc001.html

pudding is served.

it's the beginning of the end for heavy markup...              :+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060320/b8093cfa/attachment.html
From hyphen at hyphenologist.co.uk  Tue Mar 21 00:34:59 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Mar 21 00:35:13 2006
Subject: [gutvol-d] Lili Marlene
In-Reply-To: <82.35f5d394.30eeecb0@aol.com>
References: <82.35f5d394.30eeecb0@aol.com>
Message-ID: <3odv12pag3e46nivas302gc8iu2p5feuec@4ax.com>


I note that the original German words of Lili Marlene are now out of
copyright in the USA, the English and French words, and the music, written
in the 1940s, are unfortunately still in copyright in life plus 70
countries, I have not investigated the situation in life plus 50 countries

http://history.sandiego.edu/gen/snd/lilymarlene.html
>>>Written by German soldier Hans Leip in 1915, set to music by Norbert
Schultze in 1938 as The Girl under the Lantern , recorded by Lale Andersen,
broadcast by German Forces Radio but was quickly banned in Germany,
broadcast daily by Radio Belgrade from Yugoslavia to the Afrika Korps in
1941 when Rommel indicated he liked it, adopted by the British Eighth Army
as one of the favorite songs of World War II, sung on radio by Marlene
Dietrich, recorded in English by Anne Sheldon in 1944. <<<

As my German is almost non existant, I am perhaps the last person to make
this into etext.   Perhaps a German speaking volunteer would run with this.

-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From sly at victoria.tc.ca  Tue Mar 21 09:29:51 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Mar 21 09:29:53 2006
Subject: [gutvol-d] Lili Marlene
In-Reply-To: <3odv12pag3e46nivas302gc8iu2p5feuec@4ax.com>
References: <82.35f5d394.30eeecb0@aol.com>
	<3odv12pag3e46nivas302gc8iu2p5feuec@4ax.com>
Message-ID: <Pine.GSO.4.58.0603210924370.8480@vtn1.victoria.tc.ca>


In any event, the text of a single song is almost certainly
too short for a Project Gutenberg text. See:
http://www.gutenberg.org/faq/V-17

Perhaps this project might be a better home for it:
http://www.recmusic.org/lieder/

Andrew

On Tue, 21 Mar 2006, Dave Fawthrop wrote:

>
> I note that the original German words of Lili Marlene are now out of
> copyright in the USA, the English and French words, and the music, written
> in the 1940s, are unfortunately still in copyright in life plus 70
> countries, I have not investigated the situation in life plus 50 countries
>
> http://history.sandiego.edu/gen/snd/lilymarlene.html
> >>>Written by German soldier Hans Leip in 1915, set to music by Norbert
> Schultze in 1938 as The Girl under the Lantern , recorded by Lale Andersen,
> broadcast by German Forces Radio but was quickly banned in Germany,
> broadcast daily by Radio Belgrade from Yugoslavia to the Afrika Korps in
> 1941 when Rommel indicated he liked it, adopted by the British Eighth Army
> as one of the favorite songs of World War II, sung on radio by Marlene
> Dietrich, recorded in English by Anne Sheldon in 1944. <<<
>
> As my German is almost non existant, I am perhaps the last person to make
> this into etext.   Perhaps a German speaking volunteer would run with this.
>
>
From ian at babcockbrown.com  Tue Mar 21 09:45:09 2006
From: ian at babcockbrown.com (Ian Stoba)
Date: Tue Mar 21 10:21:02 2006
Subject: [gutvol-d] iRex ebook reader
Message-ID: <1C09A92C-A1AE-497D-8D87-006F01B05DBC@babcockbrown.com>

I came across this link today and did not remember seeing it  
discussed on the list:

http://www.irextechnologies.com/shop/products/iliad.htm

iRex (a Phillips spinoff) is preparing to launch an e-book reader.  
Technically it seems most similar to the Sony one, but apparently  
without the onerous DRM.

Engadget is saying it may retail for ?650, or about the same as a  
station wagon full of used paperbacks from my local "Friends of the  
Library" sale.

http://www.engadget.com/2006/03/19/irex-reveals-deets-on-its-iliad- 
ebook-reader/


This email message may contain information that is confidential and proprietary to Babcock & Brown or a third party. If you are not the intended recipient, please contact the sender and destroy the original and any copies of the original message. Babcock & Brown takes measures to protect the content of its communications. However, Babcock & Brown cannot guarantee that email messages will not be intercepted by third parties or that email messages will be free of errors or viruses. 

If you do not wish to receive any further e-mail from Babcock & Brown, please send an email to opt-out@babcockbrown.com.
From Bowerbird at aol.com  Tue Mar 21 17:13:04 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Mar 21 17:13:11 2006
Subject: [gutvol-d] perhaps someone should tell david rothman
Message-ID: <2f2.1ad4467.3151fea0@aol.com>

um, perhaps someone should tell david rothman that if he
keeps talking about me over on the d.p. forums, i'll have to
go over there and start posting again to clear up the record...

and i don't think you d.p. people want me to do that, do you?
no, of course you don't.   so maybe someone should tell him...

and for the record here, regarding "the relationship" between
distributed proofreaders and librarycity, i quoted extensively
in my post from the _webpage_ whose u.r.l. i listed at the top
of my message, so it wasn't _me_ saying all of those things...

and people who compounded those quotes with their own
misinterpretations should answer for their own mistakes...

***

i informed you about one of my newest demo-books:
>    http://www.greatamericannovel.com/sgfhb/sgfhbc001.html

but i forgot to say that i used scans i got from d.p. so as to
show people the quality of some of the scans that d.p. does.

these scans were good enough for some acceptable o.c.r.,
presumably, because the final e-text as posted was good,
but the scans are not good enough for reading purposes...

this is not a criticism -- because that wasn't their intent --
but it does have bearing on those people who try to tell us
the d.p. scans can be productively used for those purposes.

many (if not most) of the scans that d.p. has in storage are
simply not good enough for reading, even if they're cleaned.

they're good enough to do "continuous proofreading", yes,
but that's about all.   if we _really_ will want to put their scans
in some kind of "archive" for reading by end-users, then d.p.
needs to set a new standard of quality for people scanning...

anyway, since my other scan-sets are of very high quality,
it was good to have a demo with a lesser-quality scan-set.

but in general, i would not consider this level of quality to
be of above the minimal level required for public posting...
(and i remind people again that this was not its objective.)

***

meanwhile, here's a morsel from carlo on the d.p. forums:
>    Perhaps here we need a new idea. 
>    We need to be sure that the proofreading 
>    is OK before applying the formatting. 
>    This means that we have to check 
>    the proofreading quality before F1, 
>    and in case of need repeat a P round. 

hey, carlo, perhaps you need an even newer idea   --
forgo the formatting rounds entirely for zen markup!
(alright, it's the same old idea i suggested long ago...)


>    Then the project goes to the F rounds 
>    or to an off-line formatting. 

when i suggested offline formatting "long ago",
some d.p. people wanted to tar-and-feather me,
suggesting i didn't know what "distributed" meant.

well, um, yes, i certainly do, but pushing a whole book
worth of scans out at an array of people so they can say
"nope, no formatting on that page either..." is retarded...

and doing it twice is _doubly_ retarded...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060321/c0c03a81/attachment.html
From gbnewby at pglaf.org  Wed Mar 22 08:51:07 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Mar 22 08:51:08 2006
Subject: [gutvol-d] Sponsors
In-Reply-To: <441F0866.9040100@perathoner.de>
References: <90.70f24a7b.31505f13@aol.com> <441F0866.9040100@perathoner.de>
Message-ID: <20060322165107.GA6734@pglaf.org>

On Mon, Mar 20, 2006 at 08:54:14PM +0100, Marcello Perathoner wrote:
> Bowerbird@aol.com wrote:
> 
> >why would p.g. want to go into the ad-selling business?
> 
> Why would PG want to collect donations?

We spend our money on just a few things.  Our fiscal year
runs July - June, and we get an annual audit (which also costs
money!)

1. CD/DVD giveaways.  This was about $10000 in the prior fiscal year.
We reimburse volunteers for media & mailing costs; many recipients
choose to make a donation after getting their CD/DVD, so this
project is largely self-sustaining.

2. States compliance.  This will be reduced, since we no longer
have enough planned income to justify it - but since 2001, we've
tried to follow the often-onerous (sometimes easy) not-for-profit
fundraising guidelines from all fifty US states.

3. Office management and related compliance & activities.  This
is the Wingates shared 1/4 time salary to open the mail, deal with
our bank, occasionally field phone calls, and work on #2.

4. Buy books.  We reimburse a few bookbuyers who channel into DP,
at an average cost of < $1/book.

5. Support DP & PGLAF systems & hosting, occasional scanners
& supplies.  For example, hosting for DP's colocated server is
about $1100/year.

6. Pay Michael.  This hasn't happened in a few years, because his
target salary is a lot more than all of the above...in order to pay
Michael, we'd need him (or someone else) to do some successful
fundraising.  So, this is a theoretical budget item.

We *always* encourage people to seek fundraising opportunities,
because there are always some projects we'd like to grow (like
the giveaways) or start.  But such people should check with me
before getting started, since there are some guidelines to follow
(partially for our not-for-profit status, partially to keep
"on mission").

There's no way we could ever pay for the hugely valuable volunteer
labor, so the $50,000/year or so we've lived on for the past few years
has been plenty to sustain core production activities.

  -- Greg

From Bowerbird at aol.com  Fri Mar 24 10:45:09 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Mar 24 10:45:20 2006
Subject: [gutvol-d] when they act surprised
Message-ID: <361.812eaf.31559835@aol.com>

it was a mere 3-6 months ago that i was
informing people here that their scraping
of the google books would cause google to
become too conservative in displaying scans.

it's annoying when people tune out warnings.

but it is even _more_ annoying when they act
_surprised_ when the consequences show up!

>? ? http://groups.yahoo.com/group/ebook-community/message/25166

um, yes, bruce, google is being overly cautious.
your scan-scraper script is one main reason why.
and your catalog is _another_ main reason why...

of not quite the magnitude of publisher suits,
granted, but big enough to be "main" reasons.

so take a look in the mirror, buddy.

and we cannot ignore the fact that _scrapers_
are the ones currently spooking publisher fear.
their "hackers-will-just-grab the-whole-book"
nightmare is tinged in reality when they see you.

so you can act all surprised if you want,
upon discovering that your
actions have consequences.

but all the people who have been reading this list
know that _i_told_you_so_.?? and you didn't listen...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060324/be558214/attachment.html
From jon.ingram at gmail.com  Fri Mar 24 12:16:53 2006
From: jon.ingram at gmail.com (Jon Ingram)
Date: Fri Mar 24 12:24:03 2006
Subject: [gutvol-d] when they act surprised
In-Reply-To: <361.812eaf.31559835@aol.com>
References: <361.812eaf.31559835@aol.com>
Message-ID: <4baf53720603241216m5965f200w5af1a23c856f871a@mail.gmail.com>

On 3/24/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
> it was a mere 3-6 months ago that i was
> informing people here that their scraping
> of the google books would cause google to
> become too conservative in displaying scans.
>
> it's annoying when people tune out warnings.
>
> but it is even _more_ annoying when they act
> _surprised_ when the consequences show up!

I don't believe anyone has been mass-downloading books from Google's
archive based on Bruce's index of their content, using Bruce's scraper
or otherwise. Around a dozen DPers have claimed books to download,
according to the list available at

  http://homepage.ntlworld.com/jenjonliz/jon/tia/google.html

(note that the list isn't currently being maintained, because there's
little demand at DP for new content scraped from any site at the
moment, and several of us are in the initial stages of working on a
more general database-driven system for claiming books from image
providers)

The number of claimed books is in the low hundreds, and most of these
have not been downloaded, either because research has indicated that
the books are already in PG, or because there was no need for them on
DP until now, due to the current glut of content working its way
through the DP system. I'd be very surprised if DPers have been
responsible for scraping even a hundred complete texts from Google's
archive -- a tiny amount compared to the more than 35000 texts listed
in Bruce's current index.

As far I can tell, Google is allowing me to view all the works it has
allowed me to view ever since their site was set up, so I don't see
any evidence that they have become more conservative, at least in
content displayed to people in the UK. On the other hand, their policy
of restricting access based on the publication date being earlier than
1864 *does* exclude a lot of books which are public domain in the UK
from being viewed in the UK -- and, oddly, they aren't moving the
barrier forward each year, as they should (unlike the US, the public
domain isn't frozen here, so new material is entering every year). It
is just another example of US-based companies only dealing with non-US
issues as a poorly considered afterthought, so it's not all that
surprising :).

--
Jon Ingram
From bruce at zuhause.org  Fri Mar 24 16:37:28 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Fri Mar 24 16:37:31 2006
Subject: [gutvol-d] when they act surprised
In-Reply-To: <361.812eaf.31559835@aol.com>
References: <361.812eaf.31559835@aol.com>
Message-ID: <17444.37064.48800.463423@celery.zuhause.org>

Bowerbird@aol.com writes:
 > but it is even _more_ annoying when they act
 > _surprised_ when the consequences show up!
 > 
 > >? ? http://groups.yahoo.com/group/ebook-community/message/25166
 > 
 > um, yes, bruce, google is being overly cautious.
 > your scan-scraper script is one main reason why.
 > and your catalog is _another_ main reason why...

I don't think I ever expressed surprise.  Annoyance, perhaps.  I don't
believe that Google's decision to only classify books as PD only when
there's an explicit copyright, as opposed to including books with an
explicit publishing date has anything to do with my (or other
people's) scan scraper, or my catalog.  Furthermore, if they were so
concerned about them, do you think they would have put about another
25,000 books online in the PD status, and added a select box to search
for PD-only books?  The PD-only search seems to miss some things, but
I'm quibbling.

BTW, Google is aware of my catalog, and the Google Books program
manager mentioned my catalog as "Bruce Albrecht's catalog" (or
something close to it) at a conference. They've never attempted to
contact me.  Interpret that as you will.
From hyphen at hyphenologist.co.uk  Sat Mar 25 02:46:03 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Mar 25 02:46:18 2006
Subject: [gutvol-d] Four F W Moorman books bound together &  Any Tykes?
In-Reply-To: <000301c613f2$cbe0ab70$6401a8c0@ahainesp2600>
References: <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com>
	<82.35f5d394.30eeecb0@aol.com>
	<KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com>
	<3.0.5.32.20060107164958.0254d100@mail.chattanooga.net>
	<000301c613f2$cbe0ab70$6401a8c0@ahainesp2600>
Message-ID: <516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com>



I have just bought "Tales, Songs, and Plays of the Ridings" which contains
"Tales the Ridings", "More Tales of the Ridings", "Plays of the Ridings",
and "Songs of the Ridings".   This last "Songs of the Ridings I have
already done for PG, Etext No 3232.

On looking at what I have bought, these four books are simply a complete
reprinting of the latest edition of the four complete with title pages, and
even adverts bound together as a single book.  All are pre 1923.

Would PG by happy if I was to submit the three not already done as three
single books?   The advantage would be to split the work into sections to
use as a break from John Hartley books.

Any other Yorkshire Tykes out there?  Alison is helping me with
proofreading, are there any more Tykes out there who would be willing to
help doing Yorkshire Dialect books?  The DP route seems to have fallen at
the first hurdle :-(

-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From greg at durendal.org  Sat Mar 25 04:28:50 2006
From: greg at durendal.org (Greg Weeks)
Date: Sat Mar 25 05:00:06 2006
Subject: [gutvol-d] Four F W Moorman books bound together &  Any Tykes?
In-Reply-To: <516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com>
References: <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com>
	<82.35f5d394.30eeecb0@aol.com>
	<KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com>
	<3.0.5.32.20060107164958.0254d100@mail.chattanooga.net>
	<000301c613f2$cbe0ab70$6401a8c0@ahainesp2600>
	<516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com>
Message-ID: <Pine.LNX.4.63.0603250725210.6940@durendal.durendal.org>

On Sat, 25 Mar 2006, Dave Fawthrop wrote:

> Would PG by happy if I was to submit the three not already done as three
> single books?   The advantage would be to split the work into sections to
> use as a break from John Hartley books.

I've routinely split books apart that are reprinted this way. Sometimes it 
was necessary for rights reasons where I couldn't clear one or the other.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From sly at victoria.tc.ca  Sat Mar 25 08:00:30 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Mar 25 08:00:34 2006
Subject: [gutvol-d] Four F W Moorman books bound together &  Any Tykes?
In-Reply-To: <516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com>
References: <KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com>
	<82.35f5d394.30eeecb0@aol.com>
	<KPEIKILNIGEGKFIHGFBJGEDCGLAA.charlzzf@heritagewifi.com>
	<3.0.5.32.20060107164958.0254d100@mail.chattanooga.net>
	<000301c613f2$cbe0ab70$6401a8c0@ahainesp2600>
	<516a22dmui1b8la2dvadgkld3f6af2hkts@4ax.com>
Message-ID: <Pine.GSO.4.58.0603250753340.24838@vtn1.victoria.tc.ca>


Traditionally, it's been up to the volunteer to decide
what they would like to have done with this.

Myself, I like to have shorter texts by the same author
together if they have some subject resemblance, and are
likely to be downloaded together by someone interested
anyway.

For a recent example, see this text of Violet Jacob's poetry:
http://www.gutenberg.org/etext/17933

Andrew

On Sat, 25 Mar 2006, Dave Fawthrop wrote:

>
>
> I have just bought "Tales, Songs, and Plays of the Ridings" which contains
> "Tales the Ridings", "More Tales of the Ridings", "Plays of the Ridings",
> and "Songs of the Ridings".   This last "Songs of the Ridings I have
> already done for PG, Etext No 3232.
>
> On looking at what I have bought, these four books are simply a complete
> reprinting of the latest edition of the four complete with title pages, and
> even adverts bound together as a single book.  All are pre 1923.
>
> Would PG by happy if I was to submit the three not already done as three
> single books?   The advantage would be to split the work into sections to
> use as a break from John Hartley books.
>
> Any other Yorkshire Tykes out there?  Alison is helping me with
> proofreading, are there any more Tykes out there who would be willing to
> help doing Yorkshire Dialect books?  The DP route seems to have fallen at
> the first hurdle :-(
>
>
From sly at victoria.tc.ca  Mon Mar 27 09:36:13 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon Mar 27 09:36:19 2006
Subject: [gutvol-d] Dutch texts downloaded
Message-ID: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca>

Taking a look at the top-100 list,
http://www.gutenberg.org/browse/scores/top
For the statistics of the last seven days,
there seems to be more texts in Dutch than
I have noticed before.

It's good to see more interest in a wider
range of languages...


Andrew
From Bowerbird at aol.com  Mon Mar 27 10:44:48 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Mar 27 10:44:53 2006
Subject: [gutvol-d] welcome to another week
Message-ID: <2a7.a48791.31598ca0@aol.com>

d-lib has an article on language-translation by machine:
>    http://www.dlib.org/dlib/march06/smith/03smith.html

***

they also have an article on automatic "document recognition",
the ability to have a computer ascertain the underlying structure
of a document, what i've been talking about here for many years,
which my detractors here have repeatedly termed as "impossible":
>    http://www.dlib.org/dlib/march06/choudhury/03choudhury.html

in time, people will laugh at how ridiculously stupid my detractors were.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060327/bf8da816/attachment.html
From jeroen.mailinglist at bohol.ph  Mon Mar 27 14:27:13 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Mon Mar 27 14:22:27 2006
Subject: [gutvol-d] Dutch texts downloaded
In-Reply-To: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca>
Message-ID: <442866C1.2050706@bohol.ph>


Hi All,

I noticed this as well, and it is getting less already. I don't know 
what is causing this, but thousands of copies have been downloaded. This 
almost starts to match the number of English downloads...

Can I have referrer logs, to find out where they come from? A per 
language top 100 would be much appreciated.

Jeroen.



Andrew Sly wrote:

>Taking a look at the top-100 list,
>http://www.gutenberg.org/browse/scores/top
>For the statistics of the last seven days,
>there seems to be more texts in Dutch than
>I have noticed before.
>
>It's good to see more interest in a wider
>range of languages...
>
>
>Andrew
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>  
>

From prosfilaes at gmail.com  Mon Mar 27 20:35:09 2006
From: prosfilaes at gmail.com (David Starner)
Date: Mon Mar 27 20:41:43 2006
Subject: [gutvol-d] Dutch texts downloaded
In-Reply-To: <442866C1.2050706@bohol.ph>
References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca>
	<442866C1.2050706@bohol.ph>
Message-ID: <6d99d1fd0603272035w4998cfe2nc1c5d140924de47a@mail.gmail.com>

On 3/27/06, Jeroen Hellingman (Mailing List Account)
<jeroen.mailinglist@bohol.ph> wrote:
>
> Hi All,
>
> I noticed this as well, and it is getting less already. I don't know
> what is causing this, but thousands of copies have been downloaded. This
> almost starts to match the number of English downloads...
>
> Can I have referrer logs, to find out where they come from? A per
> language top 100 would be much appreciated.

Team Esperanto on DP was wondering about the most downloaded Esperanto
books. It would be interesting at least as a one-time thing.
From marcello at perathoner.de  Tue Mar 28 07:29:03 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Mar 28 07:29:07 2006
Subject: [gutvol-d] Esperanto texts downloaded
In-Reply-To: <6d99d1fd0603272035w4998cfe2nc1c5d140924de47a@mail.gmail.com>
References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca>	<442866C1.2050706@bohol.ph>
	<6d99d1fd0603272035w4998cfe2nc1c5d140924de47a@mail.gmail.com>
Message-ID: <4429563F.9050400@perathoner.de>

David Starner wrote:

> Team Esperanto on DP was wondering about the most downloaded Esperanto
> books. It would be interesting at least as a one-time thing.

We keep records for the last 30 days only.

gutenberg=> SELECT scores.book_downloads.fk_books, SUM (downloads) AS 
downloads
gutenberg-> FROM scores.book_downloads, mn_books_langs
gutenberg-> WHERE mn_books_langs.fk_langs = 'eo'
gutenberg-> AND mn_books_langs.fk_books = scores.book_downloads.fk_books
gutenberg-> GROUP BY scores.book_downloads.fk_books
gutenberg-> ORDER BY downloads DESC;
  fk_books | downloads
----------+-----------
      8177 |       405
     16967 |       347
      7787 |       303
     17482 |       214
     11511 |       183
     17945 |       148
      8224 |       145
     17425 |       126
     17665 |        98
     11307 |        66
(10 rows)

gutenberg=>

-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Tue Mar 28 08:35:46 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Mar 28 08:35:51 2006
Subject: [gutvol-d] Dutch texts downloaded
In-Reply-To: <442866C1.2050706@bohol.ph>
References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca>
	<442866C1.2050706@bohol.ph>
Message-ID: <442965E2.2000600@perathoner.de>

Jeroen Hellingman (Mailing List Account) wrote:

> I noticed this as well, and it is getting less already. I don't know 
> what is causing this, but thousands of copies have been downloaded. This 
> almost starts to match the number of English downloads...
> 
> Can I have referrer logs, to find out where they come from? A per 
> language top 100 would be much appreciated.

Spam attack courtesy of New Horizons. See:

   http://www.spews.org/html/S2507.html

They are collecting some texts to get past spam filters.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Tue Mar 28 09:50:30 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Mar 28 09:50:41 2006
Subject: [gutvol-d] another dose of reality
Message-ID: <248.97317aa.315ad166@aol.com>

x.m.l. fans here should read yet another dose of harsh reality,
this time surprisingly from one of your leaders, simon st. laurent:
>    http://www.xml.com/lpt/a/2006/03/15/next-web-xhtml2-ajax.html

it's interesting how simon uses the past tense for many of your buzzwords...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060328/1447f966/attachment.html
From sly at victoria.tc.ca  Tue Mar 28 10:11:04 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Mar 28 10:11:09 2006
Subject: [gutvol-d] Esperanto texts downloaded
In-Reply-To: <4429563F.9050400@perathoner.de>
References: <Pine.GSO.4.58.0603270935130.16713@vtn1.victoria.tc.ca>
	<442866C1.2050706@bohol.ph>
	<6d99d1fd0603272035w4998cfe2nc1c5d140924de47a@mail.gmail.com>
	<4429563F.9050400@perathoner.de>
Message-ID: <Pine.GSO.4.58.0603281009140.5042@vtn1.victoria.tc.ca>

In case anyone is interested besides David, here are the titles
to go with the list of numbers Marcello has provided. Notice
that the instructional books are quite clearly the most often
downloaded.

   PG number Downloads Title

      8177 |   405 The Esperanto Teacher
     16967 |   347 English-Esperanto Dictionary
      7787 |   303 A Complete Grammar of Esperanto
     17482 |   214 La Aventuroj de Alicio en Mirlando
     11511 |   183 Robinsono Kruso
     17945 |   148 Mark Twain: Tri Noveloj
      8224 |   145 Fundamenta Krestomatio
     17425 |   126 La Falo de Usxero-Domo
     17665 |    98 Mia Kontrabandulo
     11307 |    66 El la Biblio


On Tue, 28 Mar 2006, Marcello Perathoner wrote:

> David Starner wrote:
>
> > Team Esperanto on DP was wondering about the most downloaded Esperanto
> > books. It would be interesting at least as a one-time thing.
>
> We keep records for the last 30 days only.
>
> gutenberg=> SELECT scores.book_downloads.fk_books, SUM (downloads) AS
> downloads
> gutenberg-> FROM scores.book_downloads, mn_books_langs
> gutenberg-> WHERE mn_books_langs.fk_langs = 'eo'
> gutenberg-> AND mn_books_langs.fk_books = scores.book_downloads.fk_books
> gutenberg-> GROUP BY scores.book_downloads.fk_books
> gutenberg-> ORDER BY downloads DESC;
>   fk_books | downloads
> ----------+-----------
>       8177 |       405
>      16967 |       347
>       7787 |       303
>      17482 |       214
>      11511 |       183
>      17945 |       148
>       8224 |       145
>      17425 |       126
>      17665 |        98
>      11307 |        66
> (10 rows)
>
> gutenberg=>
>
>
From holden.mcgroin at dsl.pipex.com  Tue Mar 28 11:27:55 2006
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Tue Mar 28 11:51:46 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <248.97317aa.315ad166@aol.com>
References: <248.97317aa.315ad166@aol.com>
Message-ID: <1143574076.4196.9.camel@steve-mcqueen>

On Tue, 2006-03-28 at 12:50 -0500, Bowerbird@aol.com wrote:
> x.m.l. fans here should read yet another dose of harsh reality,
> this time surprisingly from one of your leaders, simon st. laurent:
> >   http://www.xml.com/lpt/a/2006/03/15/next-web-xhtml2-ajax.html
> 
> it's interesting how simon uses the past tense for many of your
> buzzwords...

It seems like you're trying to pick a fight but I'll not bite.

The article you post is irrelevant to current XML-based plans for PG.
The whole point of the article is that direct delivery of XML to users
with stylesheets has not yet happened.

However, from what I've heard about current PG uses of XML, they tend
towards using XML as a master format for storage on the server. Content
is then converted to the user's desired format (plain text, HTML, XML,
PDF, or random format X).

It should be quite obvious that these two concepts are entirely
different. The planned PG approach does not even need the user to have
software capable of rendering XML with stylesheets. All it requires of
the user is a web browser for accessing PG's site in the first place and
a viewer for whichever download format he/she chooses.

----

On a slight side note, I don't see the point of your aggressive posts to
the list. Everybody here should be (is?) aiming towards the furthering
of PG's goals. If whatever format you choose happens to preferred in the
long run, that's not a reason for gloating. The other people on this
list are merely trying to help PG as much as we all hope you are.

Regards,
Holden

From Bowerbird at aol.com  Tue Mar 28 13:36:28 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Mar 28 13:36:39 2006
Subject: [gutvol-d] another dose of reality
Message-ID: <29b.8221559.315b065c@aol.com>

holden said:
>    It seems like you're trying to pick a fight

nope.   just sharing information.   got any?


>    The article you post is irrelevant to current XML-based plans for PG.

well, i guess that's a matter of opinion.

i think it is relevant, in the sense that
it talks about a bunch of much-hyped
"solutions" which have not materialized,
and now might _never_ come to fruition,
including some that are counted on here.


>    The whole point of the article is that direct delivery of XML to users
>    with stylesheets has not yet happened.

that's one of those "solutions", yes, but just one...


>    However, from what I've heard about current PG uses of XML, 
>    they tend towards using XML as a master format for storage

well yes, and as simon points out,
that is a "fallback" position from
the one that was originally staked,
which was the serving of x.m.l. files.

but even this fallback rings hollow here,
since the position that x.m.l. is needed
for _conversion_to_multiple_formats_
was always a tenuous one, in the sense
that many entities out there are already
doing mass conversion of p.g. e-texts,
even though they aren't in x.m.l. format.

further, the x.s.l.t. methodology that
has always been the crucial linchpin in
the "strategy" of x.m.l. advocates here
is one of the ones that simon relegates
to the past-tense.   i find that interesting.

and surely the examples of it that we have
seen so far have shown it is badly lacking.

add to the equation now that i'm showing,
with real example-books, that z.m.l. can
convert to multiple formats quite easily,
on the user's desktop, via button-clicks,
and the question becomes hard to avoid:
what's the reason to apply heavy markup?

don't get me wrong, i'm sure the hypesters
will be able to invent one, they are creative
that way, it's just that i think history should
warn us to take these with a grain of salt...


>    On a slight side note, I don't 
>    see the point of your aggressive posts 

first, my posts are not "aggressive".

but if you _choose_ to interpret them
that way, then i don't see _your_ point.
wouldn't it be better just to skip them?
why even bother reading them, holden?
(let alone replying to them?)

i mean, seriously, i could just as easily
interpret _your_ posts as ad hominem,
since you've said straight out that i am
"trying to pick a fight" with "aggressive posts".

but going down that road wouldn't be too
productive, so i consciously choose not to;
instead i have responded to your post with
rebuttals that are on-topic and on-point,
without diverting to attack your character.

if you want a mud-fight or a flame-war,
well i've shown i can do those things too;
but why not friendly conversation instead?

and don't get me wrong, i don't mean to be
disingenuous here, because i fully understand
that it's not pleasant to be on the losing side of
a "you were wrong" comment.   but that's the risk
you take when you take a stand and you're wrong.
but when someone is wrong, and you say they are
wrong, that doesn't mean it's an "aggressive" post.


>    Everybody here should be (is?) aiming 
>    towards the furthering of PG's goals. 

well yes, i believe that we all agree on that.
the next issue is, "how do we obtain that?"

on _that_ question, there is disagreement,
which has been longstanding, and ugly too.

and as much as some people might like to
sweep this disagreement under the carpet,
and have people forget what they said since
things aren't looking too swell for their side,
the disagreement still runs deep, and wide...

meanwhile, little progress is being made
on "the goals of p.g." that we all agree on...

how long does t.e.i./x.m.l./whatever remain
on the table as "the official plan" before it's
required to show some action and results?
how are "the goals of p.g." being served?

those are questions i think you all should be
asking yourselves.   as for me, i'll just keep on
plugging away with my little experiments, and
maybe someday you'll realize that z.m.l. is best.


>    If whatever format you choose 
>    happens to preferred in the long run, 
>    that's not a reason for gloating. 

well, i certainly will not be "gloating",
because i don't see much point in that.

however, it _is_ important to keep in mind
whose predictions were wrong, and right,
and whose credibility was badly shredded,
for future reference...

you know, fool me once, your fault,
but fool me twice, my fault, right?

so surely you can't mind if we evaluate
those matters quite closely, can you?

besides, in retrospect, my methodologies
will be so _obvious_ that no one will even 
consider their "invention" to be _special_;
that it was "controversial" will be laughable.

luckily, i'll be able to point to lots and lots of
messages that people posted to _this_ listserve
as solid evidence that some people didn't get it.

(which is why i spent so much time discussing it!
y'all would have been a lot smarter to _fold_ your
losing hand much earlier in this poker-game than
you did, instead of constantly raising your bets...)


>    The other people on this list are merely 
>    trying to help PG as much as we all hope you are.

and i give them full credit on the variable of "trying".

that doesn't mean i'm gonna start paying attention
to what they say they see in their crystal ball though,
because we've come to learn that it's badly cracked...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060328/aa972b31/attachment.html
From jon at noring.name  Tue Mar 28 13:57:05 2006
From: jon at noring.name (Jon Noring)
Date: Tue Mar 28 13:57:12 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <1143574076.4196.9.camel@steve-mcqueen>
References: <248.97317aa.315ad166@aol.com>
	<1143574076.4196.9.camel@steve-mcqueen>
Message-ID: <1152284438.20060328145705@noring.name>

Holden wrote:
> Bowerbird wrote:

>> x.m.l. fans here should read yet another dose of harsh reality,
>> this time surprisingly from one of your leaders, simon st. laurent:
>>    http://www.xml.com/lpt/a/2006/03/15/next-web-xhtml2-ajax.html
>> 
>> it's interesting how simon uses the past tense for many of your
>> buzzwords...

> The article you post is irrelevant to current XML-based plans for PG.
> The whole point of the article is that direct delivery of XML to users
> with stylesheets has not yet happened.

The difficulty of adapting other-than-HTML XML documents to web
browsers is that web browsers are largely limited, by historical
development (and the resultant inertia of the installed-base of
web browsers), to the HTML "paradigm."

CSS partially helps with presentational "interpretation" of many XML
elements using the CSS 'display' property. But CSS 'display' does not
include values (nor should it for reasons best explained another time)
for the following important "bread and butter" web features:

   1) hypertext linking

   2) image and multimedia embedding

There's also the issue that using CSS 'display' to recognize table
markup must be of the HTML table model.

In addition, the HTML paradigm does not natively handle the TEI <note>
and DocBook <Note> elements (and similar constructs found in *many*
established markup vocabularies *except* HTML) which essentially place
annotative content in the main flow of the text at the point of
reference. Such annotative content is not intended to be displayed as
part of the main flow of the document, but to be extracted and somehow
rendered outside the mainflow.

(CSS2.1 may be used to float and move such inline annotative content,
but there's no native recognition in HTML of such inline annotative
content -- it's one of the bigger mistakes, and understandable given
the time I suppose, that the original HTML folk made when they
invented HTML and built rudimentary browsers which essentially
locked-in how web browsers are to work. They probably thought that
since the annotative content can be placed elsewhere and linked to
using <a>, why support inline annotative content? It was probably
programming expediency more than anything.)

XLink was primarily designed to be a vocabulary-independent way to add
hypertext linking and image/multimedia embedding to XML documents (I
think it could also be used for inline annotative content, but this is
less appealing for various reasons.)

Unfortunately, since everyone has been using HTML for so long, and
HTML already provides the <a>, <img> and <object> elements, there's
been little incentive to add XLink support to browsers. A sort of
Catch-22 situation.

Mozilla/Firefox support a limited subset of XLink sufficient to enable
hypertext linking as the following demo will show (of course, the link
only works using Mozilla-based browsers):

   http://www.windspun.com/demoxml/demolink.xml

Unfortunately, the full XLink spec is quite complex (because it was
designed to do everything except maybe toast bread) and this has acted
as a further impediment to its embracement. There are quite simple
subsets that could be implemented though, and implementation is fairly
straightforward, even of the full XLink. XLink is implemented in
several XML-based applications, but except for the limited Mozilla
support as described above, XLink has not yet reached the web browser
world.

(Note that CSS2 may be used to embed images within XML documents, but
this is a kludge since this violates the general principle of
separation of content from presentation (markup should be used to
embed the images, which are content, and not use CSS since documents
need to stand alone without CSS -- that is, no content references
should be placed into CSS -- lose the CSS, lose content.) For an
example using CSS for this purpose, which works only in Opera and
Mozilla/Firefox:

   http://www.windspun.com/demoxml/embedimage.xml  )

Handling non-HTML table models is a more complex issue, and one
which has no easy answer except either conforming all table markup
in XML documents to the HTML model (where CSS 'display' may then
be used for visual presentation), or adding multi-table-model support
to web browsers.

Inline annotative content is also a pretty sticky area. The XHTML
2.0 folk appeared to be close to adding something like the TEI <note>
tag to XHTML 2.0, but it appears they backed away from that, I
suppose because of the inertia of the current installed base of web
browsers. (This can be handled by CSS2-aware web browsers by simply
setting certain CSS properties for default handling.) At one time I
had a demo illustrating how to get inline annotative content to be
displayed to the side (such as in a sidebar), but I can't find that
demo. :^(

(Second note: XML Namespaces is a mechanism by which HTML markup may
be embedded within XML documents. So one could add something like
<xhtml:a>, <xhtml:img>, and <xhtml:object>, but then these are not
vocabulary-independent ways to add such functionality. XLink is the
better long-term solution because it is XML-generic.)


Anyway, the bottom line is that the biggest impediment to visual
presentation of "arbitrary" XML markup in web browsers is not XML, but
of the inertia of web browser developers to implement a few small
things, such as XLink support (even a subset sufficient for hypertext
linking and images/multimedia embedding.) There's nothing inherently
bad about XLink that makes it difficult to enable -- it's simply one
of those Catch-22 things, combined with some "political" aspects I
won't go into here.


> However, from what I've heard about current PG uses of XML, they tend
> towards using XML as a master format for storage on the server. Content
> is then converted to the user's desired format (plain text, HTML, XML,
> PDF, or random format X).

Yes. I observe that the main thrust these days in both PG and DP is to
develop a well-defined (constrained) subset of TEI for markup. XSLT
can then be used to generate web-browser friendly XHTML markup, as
well as other formats, such as plain text.

Except for the issues of hypertext linking, image/multimedia
embedding, table models, and inline annotative content as discussed
above, TEI itself may be natively rendered in advanced CSS2-aware web
browsers (such as Opera and Mozilla/Firefox). And as noted above, one
has some limited ability to use CSS to float inline annotative content
from the main flow to somewhere outside the mainflow. (Damn, wish I
could find that example illustrating this.)

Here's a couple CSS stylesheets already developed for rendering TEI:

   http://xml.web.cern.ch/XML/www.tei-c.org/Stylesheets/

(Been looking for examples of using these style sheets in action, but
haven't found any yet. Anyone?)


> On a slight side note, I don't see the point of your aggressive posts to
> the list. Everybody here should be (is?) aiming towards the furthering
> of PG's goals. If whatever format you choose happens to preferred in the
> long run, that's not a reason for gloating. The other people on this
> list are merely trying to help PG as much as we all hope you are.

Well, Bowerbird believes mastering PG/DP texts in XML is a bad idea,
so he's trying to convince PG/DP to instead embrace regularized plain
text, notably his ZML system. He's using all the weapons he can muster.

I don't see using ZML for mastering, but I do see using ZML rules for
regularizing the plain text output from a transformation of the XML
master.

Jon


From holden.mcgroin at dsl.pipex.com  Tue Mar 28 15:00:55 2006
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Tue Mar 28 15:00:58 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <29b.8221559.315b065c@aol.com>
References: <29b.8221559.315b065c@aol.com>
Message-ID: <1143586855.4196.57.camel@steve-mcqueen>

Bowerbird,

I hope you don't mind if I snip your long post. I was unable to find
specific sections which would make for a suitable reply so I reply here
to your post as a whole.

The flaw in your argument, I find, is your suggestion that since PG's
XML teams have so far failed to produce any output, XML must be a bad
strategy for PG. Your ZML-based strategy has brought forth results and
so is better.

What is XML? XML is merely a method for devising markup languages. No
more, no less. If one project uses XML and, coincidentally, happens to
be slow, while another does not and happens to be faster, that does not
mean we can attribute such a speed difference to the different languages
used, particularly when other factors differ between the two projects.

Why has your ZML-based project brought forth results so quickly while
others have not? Simply because you are doing a different task to them.
Your project is essentially taking standard PG texts and automatically
formatting them. From what I've read of the XML team's efforts, they are
going back to the original books to ensure they get the formatting from
the original books.

There is, of course, a trade-off. Going back to the original books to
get missing formatting information is extremely time consuming but
guarantees accurate results. Your approach does not guarantee accurate
results, but settles for well formatted results in most cases, the
advantage of which is that it is fast.

Blaming a slow start on XML is to miss the point. The XML team could
just as well use automatic transformations to convert from PG texts to
their XML format, if they chose to. However, their opinion (they are
correct) is that such automatic translations as you are using can not
capture the full complexity of each book and will not work on every
book.

Moreover, choosing/defining a suitable format which is capable of
retaining every formatting nuance of any given text is not an enviable
task.

So, please take this as my plea for calm. You and the XML team are
working towards different goals. This is not a zero-sum game. If either
team produces a great product, it does not come at the expense of the
other team. If anything, why not just stop trying to disparage the other
team and just get on with producing texts in the best way that you know?

Regards,
Holden

From jon at noring.name  Tue Mar 28 15:04:52 2006
From: jon at noring.name (Jon Noring)
Date: Tue Mar 28 15:04:56 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <29b.8221559.315b065c@aol.com>
References: <29b.8221559.315b065c@aol.com>
Message-ID: <83037338.20060328160452@noring.name>

Bowerbird wrote:

> further, the x.s.l.t. methodology that has always been the crucial
> linchpin in the "strategy" of x.m.l. advocates here is one of the
> ones that simon relegates to the past-tense.?i find that
> interesting.

Do you think Simon would agree with your assessment of what he is
saying? I think you are putting words into his mouth that he did not
say or mean.

XSLT is being *massively* used in quite a few XML applications, and
successfully so. No doubt XSLT has its share of problems as all
human-made systems have, but such problems have not stopped it from
being used in real-world systems. XSLT is not a theoretical spec --
it is definitely not "vapour."

DocBook is one notable success story. O'Reilly uses DocBook in much
of its publishing workflow (it was interesting to hear Tim O'Reilly
speak at Reading 2.0 -- he's a super-pragmatic person -- they use
DocBook and XSLT/XSL-FO because it *makes sense to*.) Rosetta
Solutions and other document conversion houses are moving fast to
mastering in XML (Rosetta Solutions is using DocBook) and using XSLT
(and the related XSL-FO) for outputting in various formats. It's been
eye-opening to talk with the several conversion houses (as we have
been doing for both OpenReader and LibraryCity.)

I also recall seeing a couple online book projects in academia which
master in TEI and use XSLT to generate XHTML and other formats. Do a
check on Google of "TEI XSLT". 168,000 pages came up. Have fun.

If PG/DP has failed so far to move to TEI-based (or other XML
vocabulary) mastering, it has little to do with XML, XSLT, etc. It's
simply the limited time of the volunteers. I notice that things in
PG-Land tend to move slow anyway in most areas, particularly when it
comes to change. Look at the problems you've had in getting text
errors corrected! (Although maybe that's due to not submitting error
reports to the right place.)

DP is where most of the action is taking place these days, but even
there, DP's long-planned move to a next-gen system (which includes a
uniform XML-based mastering) appears to have been put on hold as well
(or they're doing it in smaller increments.) They're too busy
producing texts. It is the tyranny found in every limited-funded,
volunteer organization (and even well-funded orgs): change tends to
take a long time unless some bright light steps forward to make
something happen.

You will no doubt argue, and there is merit in your argument, that
your ZML system (which is essentially regularized plain text) is the
answer to all PG's and DP's woes, but then you have to *show* that for
all the things they'd like to do with their texts in the long-term
future, ZML has sufficient structural resolution.

But your approach so far to convince others reminds me a lot of the
famous advertising slogan of "Ralph's Pretty Good Grocery" (in
Garrison Keilor's mythical small-town of Lake Wobegon):

   "If you can't find it at Ralph's, you can probably get along
    without it."

What's needed on both sides of the debate is a clear cut requirements
list of exactly what the "master" format is to accomplish/fulfill.
Then this will determine whether the simpler regularized text approach
is sufficient, or if an XML-based approach is called for. From my
study the last few years in related systems, the XML-based approach is
worth the extra work to get there, provided the XML vocabulary is
properly chosen and consistently applied.

So, your saying that "trust me, ZML is sufficient", is itself an
insufficient statement. It's like George Bush saying "trust me, the
invasion of Iraq is justified." You even come across like George
Bush who "knows" what's good for us but doesn't bother to explain why.
Just "trust me, I know what's good for you."

Of course, as just noted, there's no agreed to requirements list on
which to base any important decision upon, so this debate is sort of
being conducted in the dark.

Nevertheless, several of the main players in DP and PG have a pretty
good intuitive feeling that regularized text is not sufficient. Since
XML, properly done, will always surpass ZML in document structure
resolution, then the conservative position is XML (better to have more
machine-readable document structure than less -- one can always later
scale back on the markup if found unnecessary. But if there's a
million texts with insufficient structural resolution, then that's a
BIG problem.)

Also, developing a killer "viewer-app" system for ZML is not
sufficient to prove the merit of ZML, either, since visual
presentation is simply one use of digital texts. There's other uses
such as non-visual presentation, inter-publication linking,
annotation, searching/data-mining, machine translation, etc. There's
no doubt uses not yet recognized which may require more, not less,
document structural identification. Each use adds its own set of
requirements.

Jon Noring


From Bowerbird at aol.com  Tue Mar 28 15:19:18 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Mar 28 15:19:24 2006
Subject: [gutvol-d] another dose of reality
Message-ID: <1ad.495c14b8.315b1e76@aol.com>

holden said:
>    So, please take this as my plea for calm.

i'm _perfectly_ calm, holden...            :+)

sitting back and eating pudding,
as a matter of fact.   tastes good!         ;+)

but i don't even mind if people want to
spend their volunteer time doing x.m.l.
there will be a time and place for it too,
and i wish them the best of luck with it.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060328/19bc6c5b/attachment.html
From joshua at hutchinson.net  Tue Mar 28 18:02:04 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Mar 28 17:57:09 2006
Subject: [gutvol-d] another dose of reality
Message-ID: <20060329020204.641E1DA59F@ws6-6.us4.outblaze.com>


> ----- Original Message -----
> From: "Holden McGroin" <holden.mcgroin@dsl.pipex.com>
> 
> The flaw in your argument, I find, is your suggestion that since PG's
> XML teams have so far failed to produce any output, XML must be a bad
> strategy for PG. Your ZML-based strategy has brought forth results and
> so is better.
> 

PGDP *has* produced some XML based books.  Here is a short list of some of them (there are at least a few more, but this is a list I was able to put together quickly).  I'd also like to point out that this *partial* list is still far more than bowerbird's zml efforts.

http://www.gutenberg.org/etext/16697 - Epistle to the Son of the Wolf by Bah?'u'll?h
http://www.gutenberg.org/etext/16939 - Gems of Divine Mysteries by Bah?'u'll?h
http://www.gutenberg.org/etext/16940 - Gleanings from the Writings of Bah?'u'll?h by Bah?'u'll?h
http://www.gutenberg.org/etext/16941 - The Hidden Words of Bah?'u'll?h by Bah?'u'll?h
http://www.gutenberg.org/etext/16983 - The Kit?b-i-?q?n by Bah?'u'll?h
http://www.gutenberg.org/etext/16523 - The Kit?b-i-Aqdas by Bah?'u'll?h
http://www.gutenberg.org/etext/16984 - Prayers and Meditations by Bah?'u'll?h
http://www.gutenberg.org/etext/16985 - The Proclamation of Bah?'u'll?h by Bah?'u'll?h
http://www.gutenberg.org/etext/16986 - The Seven Valleys and the Four Valleys by Bah?'u'll?h
http://www.gutenberg.org/etext/17309 - The Summons of the Lord of Hosts by Bah?'u'll?h
http://www.gutenberg.org/etext/17310 - Tablets of Bah??u?ll?h Revealed after the Kitab-i-Aqdas by Bah?'u'll?h
http://www.gutenberg.org/etext/15697 - True Stories of History and Biography by Nathaniel Hawthorne


JHutch
From marcello at perathoner.de  Wed Mar 29 07:56:03 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Mar 29 07:56:07 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <29b.8221559.315b065c@aol.com>
References: <29b.8221559.315b065c@aol.com>
Message-ID: <442AAE13.6050401@perathoner.de>

Bowerbird@aol.com wrote:

> add to the equation now that i'm showing,
> with real example-books, that z.m.l. can
> convert to multiple formats quite easily,
> on the user's desktop, via button-clicks,

You forgot to say that the "user's desktop" has to be the computer of 
your alleged girlfriend, a 1991 Mac running System 7.5. Because on 
everybody else's desktop it just prints the splash screen and crashes.


> how long does t.e.i./x.m.l./whatever remain
> on the table as "the official plan" before it's
> required to show some action and results?

There is no official plan.

If you can contribute a working technology and convince the people at DP 
to adopt it, you win. If not, you lose. That's that.

But wait! All you did in the last three years was to steal the time and 
to insult everybody who was trying to do some real work. You'll have a 
hard time getting people to adopt your gadgets because everybody hates 
you. And that's a dose of reality for you.


What about starting a book distribution site yourself? Servers are 
cheap. A genius like you will convert the PG library into all kinds of 
formats in no time at all. Go ahead insted of wasting your precious time 
with blockheads like us!



-- 
Marcello Perathoner
webmaster@gutenberg.org

From creeva at gmail.com  Wed Mar 29 08:16:11 2006
From: creeva at gmail.com (Brent Gueth)
Date: Wed Mar 29 08:21:59 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <442AAE13.6050401@perathoner.de>
Message-ID: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>

Now I'm not going to get in the middle of this on going feud, but what
readers do we have for each format?  The TEI formats are listed there but
what do you use to read them so you do not see all the markup encoding.
Firefox shows the markup when you open the file.   Notepad shows the markup
(to be expected).   MS Word just crashes upon opening because it says that
there are problems with the contents.

Going to the gutenberg help page and looking under formats listed there is
no information on it (this should have been the first thing done before
placing a new format on the site).   I understand that behind the scenes
there are tools to get these things handled, but why put them on the public
site if the tools are not available and a description is not available on
how to use them. 




-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org
[mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Marcello Perathoner
Sent: Wednesday, March 29, 2006 10:56 AM
To: Project Gutenberg Volunteer Discussion
Cc: Bowerbird@aol.com
Subject: Re: [gutvol-d] another dose of reality

Bowerbird@aol.com wrote:

> add to the equation now that i'm showing,
> with real example-books, that z.m.l. can
> convert to multiple formats quite easily,
> on the user's desktop, via button-clicks,

You forgot to say that the "user's desktop" has to be the computer of 
your alleged girlfriend, a 1991 Mac running System 7.5. Because on 
everybody else's desktop it just prints the splash screen and crashes.


> how long does t.e.i./x.m.l./whatever remain
> on the table as "the official plan" before it's
> required to show some action and results?

There is no official plan.

If you can contribute a working technology and convince the people at DP 
to adopt it, you win. If not, you lose. That's that.

But wait! All you did in the last three years was to steal the time and 
to insult everybody who was trying to do some real work. You'll have a 
hard time getting people to adopt your gadgets because everybody hates 
you. And that's a dose of reality for you.


What about starting a book distribution site yourself? Servers are 
cheap. A genius like you will convert the PG library into all kinds of 
formats in no time at all. Go ahead insted of wasting your precious time 
with blockheads like us!



-- 
Marcello Perathoner
webmaster@gutenberg.org

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From joshua at hutchinson.net  Wed Mar 29 08:42:06 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Mar 29 08:36:19 2006
Subject: [gutvol-d] another dose of reality
Message-ID: <20060329164206.9C3394F533@ws6-5.us4.outblaze.com>


> ----- Original Message -----
> From: "Brent Gueth" <creeva@gmail.com>
> 
> Now I'm not going to get in the middle of this on going feud, but what
> readers do we have for each format?  The TEI formats are listed there but
> what do you use to read them so you do not see all the markup encoding.
> Firefox shows the markup when you open the file.   Notepad shows the markup
> (to be expected).   MS Word just crashes upon opening because it says that
> there are problems with the contents.
> 

Good question, Brent.

The answer is that TEI is not really meant to be read by humans (just as HTML source code is not really meant to be read by humans).

The great thing about those TEI docs is that I only produced 1 file (the TEI document) and then a server automatically generated the ASCII, HTML and PDF formats.  I didn't have to manually fiddle with multiple files.

The TEI file is a text file, so technically you CAN read it in Notepad (or any text editor of your choice) but it won't be all that pretty.  Hopefully, the resulting files *from* it will be pretty, though.

The reason TEI is nice for this is that it provides a consistent markup method that is fairly easy for a computer to manipulate to "output" formats.

if you have any further questions, fire away.  I'll be happy to try to answer them.

Josh
From creeva at gmail.com  Wed Mar 29 08:45:38 2006
From: creeva at gmail.com (Brent Gueth)
Date: Wed Mar 29 08:44:59 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <20060329164206.9C3394F533@ws6-5.us4.outblaze.com>
Message-ID: <008601c65350$38bf5470$6755fea9@Corp.Symantec.Com>

If it is supposed to be machine readable and the tools are not released with
a method of quickly editing them or a description on how they are usable to
the community, why are they posted on the web site.   I would think this
would lead to confusion to people that are privy to this mailing list and
the inside discussion.

This isn't an argument for or against any format, but to have something
customer facing (you don't sell anything but eyeballs are your customers and
proof of the work you are doing) you should have a customer facing reason or
explanation to have them there.   Whether it be the TEI format or the ZML
format or an XYZ format; there should be clear instructions of what the
format is and how it is handled if it is launched on the site. 


HTML, TXT, and PDF are considered ubiquitous in the internet age, but you
have the plucker format and an explanation of what it is.   I would also
have assumed when the first plucker texts were launched there was an
explanation of what the format is and hopefully the little html link next to
the download explaining what it is was there at launch.   To lessen doubt
that consumers are missing something to view these texts there should be a
link explaining what TEI is. 

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org
[mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Joshua Hutchinson
Sent: Wednesday, March 29, 2006 11:42 AM
To: Project Gutenberg Volunteer Discussion
Subject: RE: [gutvol-d] another dose of reality


> ----- Original Message -----
> From: "Brent Gueth" <creeva@gmail.com>
> 
> Now I'm not going to get in the middle of this on going feud, but what
> readers do we have for each format?  The TEI formats are listed there but
> what do you use to read them so you do not see all the markup encoding.
> Firefox shows the markup when you open the file.   Notepad shows the
markup
> (to be expected).   MS Word just crashes upon opening because it says that
> there are problems with the contents.
> 

Good question, Brent.

The answer is that TEI is not really meant to be read by humans (just as
HTML source code is not really meant to be read by humans).

The great thing about those TEI docs is that I only produced 1 file (the TEI
document) and then a server automatically generated the ASCII, HTML and PDF
formats.  I didn't have to manually fiddle with multiple files.

The TEI file is a text file, so technically you CAN read it in Notepad (or
any text editor of your choice) but it won't be all that pretty.  Hopefully,
the resulting files *from* it will be pretty, though.

The reason TEI is nice for this is that it provides a consistent markup
method that is fairly easy for a computer to manipulate to "output" formats.

if you have any further questions, fire away.  I'll be happy to try to
answer them.

Josh
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From joshua at hutchinson.net  Wed Mar 29 09:12:21 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Mar 29 09:07:10 2006
Subject: [gutvol-d] another dose of reality
Message-ID: <20060329171222.405E42F94D@ws6-3.us4.outblaze.com>

We do have tools and documentation on the website, but I'll admit there is not explanation link on the download page.

Here is a link to the TEI tools and documentation.

http://pgtei.pglaf.org/marcello/0.4/

Josh

> ----- Original Message -----
> From: "Brent Gueth" <creeva@gmail.com>
> To: "'Project Gutenberg Volunteer Discussion'" <gutvol-d@lists.pglaf.org>
> Subject: RE: [gutvol-d] another dose of reality
> Date: Wed, 29 Mar 2006 11:45:38 -0500
> 
> 
> If it is supposed to be machine readable and the tools are not released with
> a method of quickly editing them or a description on how they are usable to
> the community, why are they posted on the web site.   I would think this
> would lead to confusion to people that are privy to this mailing list and
> the inside discussion.
> 
> This isn't an argument for or against any format, but to have something
> customer facing (you don't sell anything but eyeballs are your customers and
> proof of the work you are doing) you should have a customer facing reason or
> explanation to have them there.   Whether it be the TEI format or the ZML
> format or an XYZ format; there should be clear instructions of what the
> format is and how it is handled if it is launched on the site.
> 
> 
> HTML, TXT, and PDF are considered ubiquitous in the internet age, but you
> have the plucker format and an explanation of what it is.   I would also
> have assumed when the first plucker texts were launched there was an
> explanation of what the format is and hopefully the little html link next to
> the download explaining what it is was there at launch.   To lessen doubt
> that consumers are missing something to view these texts there should be a
> link explaining what TEI is.
> 
> -----Original Message-----
> From: gutvol-d-bounces@lists.pglaf.org
> [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Joshua Hutchinson
> Sent: Wednesday, March 29, 2006 11:42 AM
> To: Project Gutenberg Volunteer Discussion
> Subject: RE: [gutvol-d] another dose of reality
> 
> 
> > ----- Original Message -----
> > From: "Brent Gueth" <creeva@gmail.com>
> >
> > Now I'm not going to get in the middle of this on going feud, but what
> > readers do we have for each format?  The TEI formats are listed there but
> > what do you use to read them so you do not see all the markup encoding.
> > Firefox shows the markup when you open the file.   Notepad shows the
> markup
> > (to be expected).   MS Word just crashes upon opening because it says that
> > there are problems with the contents.
> >
> 
> Good question, Brent.
> 
> The answer is that TEI is not really meant to be read by humans (just as
> HTML source code is not really meant to be read by humans).
> 
> The great thing about those TEI docs is that I only produced 1 file (the TEI
> document) and then a server automatically generated the ASCII, HTML and PDF
> formats.  I didn't have to manually fiddle with multiple files.
> 
> The TEI file is a text file, so technically you CAN read it in Notepad (or
> any text editor of your choice) but it won't be all that pretty.  Hopefully,
> the resulting files *from* it will be pretty, though.
> 
> The reason TEI is nice for this is that it provides a consistent markup
> method that is fairly easy for a computer to manipulate to "output" formats.
> 
> if you have any further questions, fire away.  I'll be happy to try to
> answer them.
> 
> Josh
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

>

From marcello at perathoner.de  Wed Mar 29 09:08:30 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Mar 29 09:08:39 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>
References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>
Message-ID: <442ABF0E.6000607@perathoner.de>

Brent Gueth wrote:

> Going to the gutenberg help page and looking under formats listed there is
> no information on it (this should have been the first thing done before
> placing a new format on the site).

Most of the formats are not described on the help page. Reason: I don't 
have the time to do it.

If you want to help, here is a list of all formats offered on PG. 
Provide a suitable description (short and concise) of any and I'll post 
it on the site.


gutenberg=> SELECT * from filetypes order by pk;
      pk     |           filetype           | sortorder |       mediatype
------------+------------------------------+-----------+------------------------ 
?          | Unspecified                  |       100 |
  avi        | MS Video                     |        10 |
  css        | CSS Stylesheet               |        10 |
  doc        | MS Word Document             |        10 |
  dvi        | TeX Device Independent       |        10 |
  eps        | Encapsulated PostScript      |        10 | 
application/postscript
  gif        | GIF Picture                  |        10 | image/gif
  html       | HTML                         |         5 | text/html
  index      | Index                        |         3 |
  iso        | ISO CD/DVD Image             |         7 |
  jpg        | JPEG Picture                 |        10 | image/jpeg
  license    | License                      |         2 |
  lit        | MS Lit for PocketPC          |        10 |
  ly         | LilyPond                     |        10 |
  md5        | MD5 Checksum                 |         8 |
  mid        | MIDI                         |        10 |
  mp3        | MP3 Audio                    |        20 | audio/mpeg
  mpg        | MPEG Video                   |        10 | video/mpeg
  mus        | Finale                       |        10 |
  nfo        | Proprietary `Folio' format   |        50 |
  pageimages | Raw Page Images              |        50 |
  pdb        | Palm Database                |        10 | 
application/vnd.palm
  pdf        | Adobe PDF                    |        10 | application/pdf
  png        | PNG Picture                  |        10 | image/png
  prc        | Palm Database                |        10 | 
application/vnd.palm
  ps         | PostScript                   |        10 | 
application/postscript
  ps2        | PostScript Level 2           |        10 | 
application/postscript
  qt         | Quicktime Video              |        10 | video/quicktime
  readme     | Readme                       |         1 |
  rtf        | MS Rich Text Format          |        10 | text/rtf
  sib        | Sibelius                     |        10 |
  svg        | SVG                          |        10 |
  tei        | TEI Text Encoding Initiative |        50 |
  tex        | TeX                          |        10 |
  tiff       | TIFF Picture                 |        10 | image/tiff
  tr         | Tome Raider                  |        10 |
  txt        | Plain text                   |         7 | text/plain
  wav        | MS Wave Audio                |        10 |
  xml        | XML                          |        50 | text/xml
  xsl        | XSLT Stylesheet              |        10 |
(40 rows)


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Wed Mar 29 09:29:28 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Mar 29 09:29:44 2006
Subject: [gutvol-d] another dose of reality
Message-ID: <141.584d3e4a.315c1df8@aol.com>

joshua said:
>    PGDP *has* produced some XML based books.? 
>    Here is a short list of some of them 
>    (there are at least a few more, but this 
>    is a list I was able to put together quickly).

well yes, and you are to be commended, josh,
for your work in actually doing some markup.

as should marcello for the groundwork he has done,
and jim tinsley for setting a sensible adoption policy
(which is very important and not as easy as it looks).

without you three, the little progress that _has_
been made would not have been accomplished.

and i wish you the best in your future efforts.


>    I'd also like to point out that this *partial* list 
>    is still far more than bowerbird's zml efforts.

well, you might want to enjoy that "lead"
as much as possible while you still can...           :+)

***

as long-time subscribers know, we've already done
the discussion thing on this topic for long enough,
it's time for pudding now, so i'll limit my replies to:

1)   jon, that was a lot of verbiage, but i don't think
you said anything of substance, so no reply for you.

2)   holden, i think you mischaracterized my aims,
since i'm not merely "reworking" old e-texts but
laying out a workflow to handle new ones as well,
both digitized paper-books and "born-digital" ones.
(my next few examples will be in the latter category.)

3)   marcello, i don't know if it's your italian "nature"
or your german "nurture", but you certainly seem to
have a penchant for disinformation and "the big lie".
i always feel slimed after i read one of your posts.   ick!

anyway, folks, no need to have this discussion again.
anyone who wants to see it can go read the archives.
it's pudding time now -- show proof, or go home...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060329/16bf8f9d/attachment.html
From joey at joeysmith.com  Wed Mar 29 10:44:22 2006
From: joey at joeysmith.com (joey)
Date: Wed Mar 29 10:59:09 2006
Subject: File formats and the website (was Re: [gutvol-d] another dose of
	reality)
In-Reply-To: <442ABF0E.6000607@perathoner.de>
References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>
	<442ABF0E.6000607@perathoner.de>
Message-ID: <20060329184422.GA30671@joeysmith.com>

Marcello:

How about linking to wikipedia for those we don't have descriptions for?
If that's something we're interested in, I'd even be willing to take a
stab at creating new Wikipedia entries for those that don't exist in either
location.

Also, have we thought about making a PG wiki? It might take some of this
load off of you, even if we do it as a "staged to the wiki, rolled out
to 'production' on a given cycle after all changes in the current cycle
have been examined by one of the following trusted people..." to prevent
unsavory types from pollution the production site. I'd be more than willing
to help setup, maintain, and cross-populate such, if it's something PG is
interested in.


Here are some wikipedia links for you to use if you so choose. If you want
me to make the additional entries, just let me know.

avi: http://en.wikipedia.org/wiki/.avi
css: http://en.wikipedia.org/wiki/Cascading_Style_Sheets
dvi: http://en.wikipedia.org/wiki/DVI_file_format
eps: http://en.wikipedia.org/wiki/Encapsulated_PostScript
gif: http://en.wikipedia.org/wiki/GIF
html: http://en.wikipedia.org/wiki/HTML
iso: http://en.wikipedia.org/wiki/ISO_image
lit: http://en.wikipedia.org/wiki/Microsoft_Reader
ly: http://en.wikipedia.org/wiki/GNU_LilyPond
md5: http://en.wikipedia.org/wiki/MD5
mp3: http://en.wikipedia.org/wiki/MP3
mus: http://en.wikipedia.org/wiki/Finale_notation_program
pdb: http://en.wikipedia.org/wiki/Palm_OS
pdf: http://en.wikipedia.org/wiki/Portable_Document_Format
prc: http://en.wikipedia.org/wiki/Palm_OS
ps: http://en.wikipedia.org/wiki/PostScript
qt: http://en.wikipedia.org/wiki/QuickTime
rtf: http://en.wikipedia.org/wiki/RTF
sib: http://en.wikipedia.org/wiki/Sibelius_notation_program
svg: http://en.wikipedia.org/wiki/SVG
tei: http://en.wikipedia.org/wiki/Text_Encoding_Initiative
tex: http://en.wikipedia.org/wiki/TeX
tiff: http://en.wikipedia.org/wiki/TIFF
tr: http://en.wikipedia.org/wiki/TomeRaider
xml: http://en.wikipedia.org/wiki/XML
xsl: http://en.wikipedia.org/wiki/XSL



Also, you may or may not want to apply any of the following to the database.
These are brought over from the /etc/mime.types on my debian box, so apply
salt to taste. [See, I can make food analogies too! ;)]
update filetypes set mediatype = 'video/x-msvideo' where pk = 'avi';
update filetypes set mediatype = 'text/css' where pk = 'css';
update filetypes set mediatype = 'application/msword' where pk = 'doc';
update filetypes set mediatype = 'application/x-dvi' where pk = 'dvi';
update filetypes set mediatype = 'application/x-iso9660-image' where pk = 'iso';
update filetypes set mediatype = 'audio/midi' where pk = 'mid';
update filetypes set mediatype = 'text/plain' where pk = 'readme';
update filetypes set mediatype = 'image/svg+xml' where pk = 'svg';
update filetypes set mediatype = 'audio/x-wav' where pk = 'wav';
update filetypes set mediatype = 'application/xml' where pk = 'xsl';

From marcello at perathoner.de  Wed Mar 29 12:39:41 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Mar 29 12:39:46 2006
Subject: File formats and the website (was Re: [gutvol-d] another dose
	of	reality)
In-Reply-To: <20060329184422.GA30671@joeysmith.com>
References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>	<442ABF0E.6000607@perathoner.de>
	<20060329184422.GA30671@joeysmith.com>
Message-ID: <442AF08D.5000702@perathoner.de>

joey wrote:

> How about linking to wikipedia for those we don't have descriptions for?
> If that's something we're interested in, I'd even be willing to take a
> stab at creating new Wikipedia entries for those that don't exist in either
> location.

What we need is a short description like those you find here:

   http://www.gutenberg.org/help/bibrec#format

The average Wikipedia entry is too complex for people who just want to 
know which format to download.


> Also, have we thought about making a PG wiki?

There already is a wiki for the newsletter editors ... not used very often.



-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Wed Mar 29 12:48:20 2006
From: jon at noring.name (Jon Noring)
Date: Wed Mar 29 12:48:27 2006
Subject: File formats and the website (was Re: [gutvol-d] another dose of
	reality)
In-Reply-To: <442AF08D.5000702@perathoner.de>
References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>
	<442ABF0E.6000607@perathoner.de> <20060329184422.GA30671@joeysmith.com>
	<442AF08D.5000702@perathoner.de>
Message-ID: <1628983256.20060329134820@noring.name>

Marcello wrote:
> joey wrote:

>> How about linking to wikipedia for those we don't have descriptions for?
>> If that's something we're interested in, I'd even be willing to take a
>> stab at creating new Wikipedia entries for those that don't exist in either
>> location.

> What we need is a short description like those you find here:
>
>    http://www.gutenberg.org/help/bibrec#format
>
> The average Wikipedia entry is too complex for people who just want to
> know which format to download.

Well, whoever writes the short descriptions can link to Wikipedia
articles describing the media types in more detail, and for those
where there's no Wikipedia description, to write them.

But I agree with Marcello the first step is to write the short
descriptions.

Jon

From joey at joeysmith.com  Wed Mar 29 13:04:17 2006
From: joey at joeysmith.com (joey)
Date: Wed Mar 29 13:04:57 2006
Subject: File formats and the website (was Re: [gutvol-d] another dose
	of	reality)
In-Reply-To: <442AF08D.5000702@perathoner.de>
References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>
	<442ABF0E.6000607@perathoner.de>
	<20060329184422.GA30671@joeysmith.com>
	<442AF08D.5000702@perathoner.de>
Message-ID: <20060329210417.GB30671@joeysmith.com>

On Wed, Mar 29, 2006 at 10:39:41PM +0200, Marcello Perathoner wrote:
> joey wrote:
> 
> >How about linking to wikipedia for those we don't have descriptions for?
> >If that's something we're interested in, I'd even be willing to take a
> >stab at creating new Wikipedia entries for those that don't exist in either
> >location.
> 
> What we need is a short description like those you find here:
> 
>   http://www.gutenberg.org/help/bibrec#format
> 
> The average Wikipedia entry is too complex for people who just want to 
> know which format to download.

	I can do that. I hadn't ever been to this page, so I didn't know what
	the expectation was.

> >Also, have we thought about making a PG wiki?
> 
> There already is a wiki for the newsletter editors ... not used very often.

	That's a "not interested"? As I've previously mentioned, I'd be glad to help
	maintain stuff, but it's hard when so much of it exists only in your head.
	Or is there some documentation or a mailing list I'm not aware of?


	For my part, I don't use the newsletter editor wiki because I'm not a newsletter
	editor. :)
From joey at joeysmith.com  Wed Mar 29 13:20:51 2006
From: joey at joeysmith.com (joey)
Date: Wed Mar 29 13:21:31 2006
Subject: File formats and the website (was Re: [gutvol-d] another dose
	of	reality)
In-Reply-To: <442AF08D.5000702@perathoner.de>
References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>
	<442ABF0E.6000607@perathoner.de>
	<20060329184422.GA30671@joeysmith.com>
	<442AF08D.5000702@perathoner.de>
Message-ID: <20060329212051.GC30671@joeysmith.com>

So, before I get *too* far down this path, here's what I've come up with
so far. Is this usable to you?


AVI:
AVI files can contain both audio and video. They can generally be played with media players such as Windows Media Player, WinAmp, or Mplayer. See <a href="http://en.wikipedia.org/wiki/.avi">here</a> for more information.

CSS:
CSS (Cascading Style Sheets) are generally used to make HTML pages look nice, and are not intended for direct viewing. Your web browser will find these files as referenced by the HTML files that use them. See <a href="http://en.wikipedia.org/wiki/Cascading_Style_Sheets">here</a> for more information.

DVI:
The output format of a typesetting system called TeX. Generally more common on Unix-like platforms. Can be viewed using xdvi or Evince. See <a href="http://en.wikipedia.org/wiki/DVI_file_format">here</a> for more information.

EPS:
Short for "Encapsulated PostScript", it can generally be viewed with any PostScript viewer. A free PostScript viewer is available at <a href="http://www.cs.wisc.edu/~ghost/doc/AFPL/index.htm. See <a href="http://en.wikipedia.org/wiki/Encapsulated_PostScript">here</a> for more information.

GIF:
An image format generally viewable by any web browser. See <a href="http://en.wikipedia.org/wiki/GIF">here</a> for more information.

ISO:
A logical copy of a CD-ROM or other optical media. Most CD/DVD authoring utilities can deal with ISO images. A free tool for mounting these images on a Windows
machine as though they were inserted into a CD-ROM drive is available <a href="http://www.daemon-tools.cc/dtcc/download.php">here</a>. A tool for burning ISOs to a physical CD-R or CD-RW on Windows is available <a href="http://isorecorder.alexfeinman.com/isorecorder.htm">here</a>.

From Bowerbird at aol.com  Wed Mar 29 13:49:14 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Mar 29 13:49:26 2006
Subject: [gutvol-d] any graphic designers out there?
Message-ID: <2e8.47b10ad.315c5ada@aol.com>

please help a design-challenged e-book programmer...

check this out:
>?? http://www.greatamericannovel.com/meyer/shot.html

at the upper-left is a design for the title-page/cover.
at the upper-right is the same thing, with coordinates,
so you can tell how you'd suggest anything be moved...

the upper-left design is repeated at lower-left, with
an alternate design at lower-right.? do you prefer it?

this isn't just for this one cover, or i wouldn't bother...
it's concerning how i will write the all-purpose routine
for formatting covers, so i'd like to do a good job of it,
since it will be for thousands and thousands of books...

i noticed josh uses left-justified headers in his .tei books;
do people think that looks nice?   or is the old-fashioned
centering still the best way to go?   (i think so, but i don't
want to be too inflexible, so i'm willing to consider it all.)

any other suggestions -- a splash of color or what-not? --
would be welcome as well to spruce up the look of this and
move it into the digital world of the 21st-century e-book...

while i'm at it, here's one of the backgrounds i've been using.
>   http://www.greatamericannovel.com/meyer/goodbook.jpg

any feedback on that would be greatly appreciated, as would
a reworking of your own design.   (credit granted, naturally...)

and here's a nice "page" background from brewster kahle:
>   http://www.greatamericannovel.com/meyer/leftblank.jpg

combining my gutter with the page from brewster gives us:
>    http://www.greatamericannovel.com/meyer/blank.html
(the colors don't match up, but you get the idea here; breaking
the overall image up into pieces like these might be necessary.)

anyway, if this is fun for anyone out there, have at it...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060329/ceab10fd/attachment.html
From prosfilaes at gmail.com  Wed Mar 29 22:33:01 2006
From: prosfilaes at gmail.com (David Starner)
Date: Wed Mar 29 22:33:04 2006
Subject: [gutvol-d] another dose of reality
In-Reply-To: <008601c65350$38bf5470$6755fea9@Corp.Symantec.Com>
References: <20060329164206.9C3394F533@ws6-5.us4.outblaze.com>
	<008601c65350$38bf5470$6755fea9@Corp.Symantec.Com>
Message-ID: <6d99d1fd0603292233j7aa17276mefb93098c0da4b30@mail.gmail.com>

On 3/29/06, Brent Gueth <creeva@gmail.com> wrote:
> If it is supposed to be machine readable and the tools are not released with
> a method of quickly editing them or a description on how they are usable to
> the community, why are they posted on the web site.

Because it is the format of choice of certain people, and if it is to
be the master format for that etext, it should be available to
everyone. It would be against the policy of PG to find the master
format and only let certain people make changes to the master format
and use it to regenerate all the different forms.
From c.shepard at yahoo.com  Thu Mar 30 07:20:21 2006
From: c.shepard at yahoo.com (Chris Shepard)
Date: Thu Mar 30 08:13:21 2006
Subject: [gutvol-d] rdfterms
Message-ID: <20060330152021.45462.qmail@web38004.mail.mud.yahoo.com>

Hi,

The catalog.rdf file shows the pgterms namespace as living at
http://www.gutenberg.org/rdfterms, but it's not there. (i.e., 404)

Could someone send me the proper URL?

Many thanks.



"Computer Science is no more about computers than astronomy is 
 about telescopes."	
                                             -- E. W. Dijkstra

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
From marcello at perathoner.de  Thu Mar 30 08:55:57 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Mar 30 08:56:02 2006
Subject: [gutvol-d] rdfterms
In-Reply-To: <20060330152021.45462.qmail@web38004.mail.mud.yahoo.com>
References: <20060330152021.45462.qmail@web38004.mail.mud.yahoo.com>
Message-ID: <442C0D9D.1010501@perathoner.de>

Chris Shepard wrote:

> The catalog.rdf file shows the pgterms namespace as living at
> http://www.gutenberg.org/rdfterms, but it's not there. (i.e., 404)
> 
> Could someone send me the proper URL?

A namespace is not an URL (though it very much looks like one).

"The attribute's normalized value MUST be either an IRI reference ? the 
namespace name identifying the namespace ? or an empty string. The 
namespace name, to serve its intended purpose, SHOULD have the 
characteristics of uniqueness and persistence. It is not a goal that it 
be directly usable for retrieval of a schema (if any exists). Uniform 
Resource Names [RFC2141] is an example of a syntax that is designed with 
these goals in mind. However, it should be noted that ordinary URLs can 
be managed in such a way as to achieve these same goals."

   http://www.w3.org/TR/xml-names11/#ns-decl



-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbuchana at rogers.com  Thu Mar 30 16:31:12 2006
From: gbuchana at rogers.com (Gardner Buchanan)
Date: Thu Mar 30 16:37:13 2006
Subject: [gutvol-d] any graphic designers out there?
In-Reply-To: <2e8.47b10ad.315c5ada@aol.com>
References: <2e8.47b10ad.315c5ada@aol.com>
Message-ID: <442C7850.7030007@rogers.com>

Hi there,

Bowerbird@aol.com wrote:

> the upper-left design is repeated at lower-left, with
> an alternate design at lower-right.  do you prefer it?
> 

Yes, I like the lower-right version better.

I like "--" rendered as an m-dash.

I'm not fussy about turning "Copyright" int the "(C)" symbol,
but suit yourself.  I think if you wanted that symbol, you
would use the (C) markup.

============================================================
Gardner Buchanan                       <gbuchana@rogers.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.
From bzg at altern.org  Fri Mar 31 04:22:05 2006
From: bzg at altern.org (Bastien)
Date: Fri Mar 31 05:39:05 2006
Subject: [gutvol-d] Re: File formats and the website
In-Reply-To: <20060329212051.GC30671@joeysmith.com> (joey@joeysmith.com's
	message of "Wed, 29 Mar 2006 14:20:51 -0700")
References: <008201c6534c$1e830240$6755fea9@Corp.Symantec.Com>
	<442ABF0E.6000607@perathoner.de>
	<20060329184422.GA30671@joeysmith.com>
	<442AF08D.5000702@perathoner.de>
	<20060329212051.GC30671@joeysmith.com>
Message-ID: <87psk2olci.fsf@tallis.ilo.ucl.ac.uk>

joey <joey@joeysmith.com> writes:

> So, before I get *too* far down this path, here's what I've come up with
> so far. Is this usable to you?

I think it's a pretty good start. May i suggest you to have a look at
this: http://www.openformats.org/

Maybe it's often too didactic/normative for PG's purpose, but i think
you can grab some useful content.

Some excerpts:

* Plain text: http://www.openformats.org/en60

  Plain text (ASCII)

  Whenever possible, just avoid using formatted text: using plain text
  (either ascii or .txt format) guarantees complete access for
  everyone, regardless of their software, their operating system or
  the computer they are using. In your emails, if what is important to
  you is the content and not the formatting, send the text directly in
  the body of your message instead of sending it as an attachment.
  
  Plain text can carry no virus, it is extremely light and can be
  easily used to create tables (with tabs or commas) which any
  software is able to read.

* HTML: http://www.openformats.org/en61

  Hyper Text Markup Language (HTML)

  HTML format is the standard language for the web, and it was defined
  by an standardizing international organization (the
  W3_Consortium). HTML is a flexible universal format, rich and
  compact. Native HTML (with no javascript) can carry no virus and can
  be read on any platform.

  Note: The HTML code produced by Word is semi-proprietary, and it is
  prone to include information which cannot be displayed on all
  platforms.

* TeX, LaTeX, DVI: http://www.openformats.org/en62

  TeX, LaTeX and Device Independent Format (DVI)

  TeX is both a language to typeset documents and a programming
  language. Originally written to typeset mathematical documents in a
  professional manner, it is now used in many other areas.

  LaTeX is also a typsetting and programming language. It's actually a
  simplified version of TeX which enables top level instruction
  manipulation, just as HTML is a simplified version of SGML.

  DVI. A TeX or LaTeX source file must be compiled. The result of this
  compilation is in DVI format, readable on any platform. Most of the
  time, the result of the compilation will, in turn, be converted to
  PDF or PS.

* OpenDocument: http://www.openformats.org/en62x1

  OpenDocument is:

  a. An open, XML-based file format.

  b. An open standard, supported by the OASIS and ISO standards
     groups. 

  c. The default file format for OpenOffice.org 2.0 and KOffice 1.4. 

  d. A top prospect for an official format for the European
     Commission. 

  e. Our best chance to fight vendor lock-in associated with
     proprietary formats. 

* RTF: http://www.openformats.org/en63

  Rich Text Format (RTF)

  RTF format was introduced by Microsoft to create a standard format
  for text formatting. It offers the same format variety than DOC, all
  the while being (at least in its native version) a format with
  public specifications. Most word-processing programs are capable or
  reading and writing this format, but because certain programs tend
  to use proprietary extensions of this format, its compatibility
  remains uncertain.

* PS: http://www.openformats.org/en64

  PostScript (PS)

  The PostScript format is a language describing a page, developped by
  Adobe in 1985, created for printing and widely used in
  typography. One of its advantages is that it is universal (it is
  independent from the format of the original file) and it cannot
  carry viruses. Contrary to PDF format, PostScript does not allow to
  copy text viewed on a screen to paste it in another application. It
  can be generated with compatible printers (option: 'print in file')
  and with the GhostScript program.

* PDF: http://www.openformats.org/en65

  Portable Document Format (PDF)

  PDF format (Portable Document Format), developed by Adobe, is a
  document presentation format, the specifications for PDF are
  available on the web. It is a universal format (regardless of which
  platform and software are used to generate it), compatible with any
  printer, flexible (you can substitute fonts, add links, bookmarks,
  notes) and legible onscreen with the appropriate plugins. It can be
  generated with Adobe Acrobat, with the open source software
  GhostScript or created on the fly in a Unix environment.

* JPEG: http://www.openformats.org/en66

  Joint Photographic Expert Group (JPEG)

  JPEG is one of the most efficient picture compression formats
  currently available. This open format is very light and allows you
  to determine the rate of data compression, knowing that the higher
  the compression rate, the lower the quality of the picture. JPEG
  follows a process of cumulative compression: the image is clearly
  affected if you open it and save it with a new compression rate.

  A variant of this format, progressive JPEG, allows you to optimise
  the time it takes to display the picture on internet. The new
  JPEG_2000 standard, currently being defined, will allow for a better
  quality/compression ratio as well as the indexing of pictures with
  keywords.

* PNG: http://www.openformats.org/en67

  Portable Network Graphics (PNG)

  PNG-8 and PNG-24 are two open formats which are also
  license-free. They represent the principal alternative to the GIF
  format, specially created to optimise the display of images on
  internet. They allow data compression without loss of information
  and are supported by most browsers.

  The size of a PNG file remains significantly higher than its JPEG
  equivalent. However, PNG will advantageously replace GIF for images
  which are 8-bit or less.

* ... 

-- 
Bastien