From ag737 at freenet.carleton.ca  Tue Jan  1 11:45:20 2008
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Tue, 01 Jan 2008 14:45:20 -0500
Subject: [gutvol-d] Happy Public Domain Day!
Message-ID: <1bba7e1ba8e4.1ba8e41bba7e@ncf.ca>


http://www.copyrightwatch.ca/?p=49


From Bowerbird at aol.com  Wed Jan  2 19:29:14 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 2 Jan 2008 22:29:14 EST
Subject: [gutvol-d] happy
Message-ID: <c27.2892954d.34adb08a@aol.com>

please.

happy new year.

thank you.

-bowerbird


**************************************
See AOL's top rated recipes 
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080102/fd364031/attachment.htm 

From Bowerbird at aol.com  Fri Jan  4 10:27:12 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 4 Jan 2008 13:27:12 EST
Subject: [gutvol-d] moby dick -- a report on the state of the art of
	digitization
Message-ID: <ce0.21edd084.34afd480@aol.com>

ok, i've told you recently that
o.c.r. from the o.c.a. is good.

and that it can be improved by
post-o.c.r. clean-up programs.

and then improved even further
by comparison with an existing
digitization, if one should exist,
to the point it can be _finished_,
quickly, even largely automatically.

***

here's a report in support of that...

i examined the o.c.a. first volume of "moby dick":
>    http://www.archive.org/details/mobydickorwhale01melvuoft

i did an initial comparison of their o.c.r.
with the e-text from project gutenberg:
>    http://www.gutenberg.org/etext/2489

it didn't take long to determine that the
p.g. e-text was from a different edition
than the one which the o.c.a. scanned...

the first tip-off was that the o.c.a. edition
used british spellings, not the american
ones which are there in the p.g. e-text.

this brings up a good point to consider
when we're talking about _comparison_
as a strategy for correcting o.c.r. text...

specifically, there are a number of things
that will cause superfluous "differences"
that need to be ignored in comparisons,
like american/british spelling variants...
you don't want these differences flagged.
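A minimal sketch of how spelling variants could be kept out of a comparison. It rewrites known British spellings as American ones on both texts before diffing. The variant table below is hypothetical and deliberately tiny; it is not the list used in this report.

```python
import re

# Hypothetical, deliberately tiny British -> American table; a real run
# would need a far longer list built from the two editions being compared.
SPELLING_VARIANTS = {
    "colour": "color",
    "harbour": "harbor",
    "honour": "honor",
    "grey": "gray",
    "centre": "center",
}

_WORD = re.compile(r"[A-Za-z]+")


def harmonize_spelling(line, variants=SPELLING_VARIANTS):
    """Rewrite British spellings as American ones so they are not flagged."""

    def swap(match):
        word = match.group(0)
        target = variants.get(word.lower())
        if target is None:
            return word  # not a known variant: leave it untouched
        if word.isupper() and len(word) > 1:
            return target.upper()  # all-caps headings stay all-caps
        if word[0].isupper():
            return target.capitalize()  # "Colour" -> "Color"
        return target

    return _WORD.sub(swap, line)
```

Running it over both files is harmless, since American spellings pass through unchanged.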

also -- as is typical with british editions --
there was also a difference in quotemarks;
the british use single-quotemarks as default,
and nested quotes use double-quotemarks.
american editions use double-quotemarks
as the default, of course, with any internal
quotes signified by single-quotemarks...

in addition, one of the differences that you
will frequently find between editions involves
punctuation (especially colons and semicolons,
as well as various takes on hyphenated words),
and this is particularly true when one edition is
from a british publisher, the other an american.

there were other superficial differences
between these two texts.   as a quick list:
1. chapter numbers (roman versus arabic)
2. headings (all-upper versus mixed-case)
3. chapter initial capped words (versus not)
4. block indents (i.e., the o.c.a. text had none)
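The four superficial differences listed above could be folded away with a per-line normalizer. This is a sketch under assumptions. The "CHAPTER <numeral>" heading pattern is hypothetical, and lowercasing every line (to cover items 2 and 3) also hides genuine case errors, which is a trade-off.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a well-formed roman numeral, e.g. "XIV" -> 14."""
    total, prev = 0, 0
    for ch in reversed(numeral.upper()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value  # subtractive pair such as the "I" in "IV"
        else:
            total += value
            prev = value
    return total


# Hypothetical heading shape: "CHAPTER XIV. ..." versus "CHAPTER 14. ..."
_CHAPTER = re.compile(r"^(chapter)\s+([ivxlcdm]+)\b", re.IGNORECASE)


def normalize_line(line):
    line = line.strip()  # item 4: ignore block indents
    match = _CHAPTER.match(line)
    if match:  # item 1: roman -> arabic chapter numbers
        number = roman_to_int(match.group(2))
        line = f"{match.group(1)} {number}{line[match.end():]}"
    return line.lower()  # items 2 and 3: ignore case differences
```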

finally, one of the biggest complicating factors
on comparing these e-texts is due to a massive
_incompetence_ in the o.c.a. workflow, namely
that they lose all the em-dashes in their text...

that's right, you heard me correctly.

they lose all the em-dashes in their o.c.r. text.

some stupid person somewhere has evidently
mis-set some toggle, discarding em-dashes...

a glitch this big is ridiculously unforgivable...
my mind is just boggled that they could even
_make_ such a stupid mistake.   but they did...

even worse, i have tried -- tried repeatedly --
to bring it to their attention.   yet it persists...

this is just plain frustrating.   and i've decided
that i will make it one of my missions in 2008
to get them to fix this glitch.   wish me luck, eh?

in the meantime, though, we've gotta accept it,
and move on with our mission.

so, when you look at the results i will show you,
you need to keep in mind the following gotchas:
1. i have removed all the dashes from the results.
2. i have deleted quotemarks from the results too.
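Stripping dashes and quotemarks has to be applied identically to both files. Below is a sketch of such a filter. It assumes each dropped dash should become a single space; if the o.c.a. text instead glues the neighbouring words together, the dash would have to be deleted outright. Apostrophes are removed too, because a british single-quotemark and an apostrophe are the same character.

```python
import re

# Em-dash, en-dash, and the ASCII "--" that p.g. uses for an em-dash.
_DASHES = re.compile("\u2014|\u2013|--")
# Straight and curly quotemarks (and therefore apostrophes), on both sides.
_QUOTES = re.compile("[\"'\u201c\u201d\u2018\u2019]")
_SPACES = re.compile(r"\s+")


def strip_dashes_and_quotes(line):
    line = _DASHES.sub(" ", line)  # assumption: a dash separated two words
    line = _QUOTES.sub("", line)
    return _SPACES.sub(" ", line).strip()
```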

this means, of course, that this report will slightly
_under-estimate_ the number of errors present,
since none of the errors involving quotemarks or
em-dashes will be detected by the analysis here...

another frustration with the o.c.a. workflow is
that they lose the pagebreaks from their o.c.r.,
which means that our first task is restoring those.

once again, this is a _stupid_human_decision_,
and it reflects _extremely_poorly_ on the o.c.a.

in comparison, however, google is even worse.
google routinely loses not just pagebreaks and
em-dashes, but single- and double-quotemarks,
and the hyphens from end-line hyphenates too.
it's hard to imagine such mind-blowing stupidity
manifested by one of the richest businesses in the
world, ordinarily not _nearly_ so dumb about text.
but there it is, in black-and-white, for all to see...

i mean, _seriously_, doesn't anyone at these big
digitization projects even _look_ at their output?

these projects have scanned _millions_ of books
-- _literally_millions_ -- yet they remain oblivious
to a problem that reveals itself within 5 minutes!
it's sad.   and not just plain sad, it's tragically sad...

***

the first thing i had to do was fix the linebreaks
in the p.g. e-text so that they would conform to
the linebreaks in the o.c.r. text, to be compared...

after that, i went on to the next task, involving
-- wait a minute, did you just read over that and
accept it, as some kind of simple, routine task?
if so, think again.   restoring those linebreaks was
a difficult task, one that took far too much time.
sure, i wrote a program that did most of it, but
_that_ took some time.   and cleanup took more.
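The program mentioned above is not shown, so here is a hedged sketch of one way to automate re-breaking. It aligns the words of the p.g. text against the o.c.r. lines with Python's difflib. Inside matching runs, each o.c.r. line break carries over exactly. Inside runs where the texts disagree, the break is placed proportionally, so those spots would still need the manual cleanup described here.

```python
from difflib import SequenceMatcher


def rewrap_to_match(reference_lines, text):
    """Re-break `text` so its line endings follow those of `reference_lines`."""
    ref_words = []
    line_ends = []  # running word count at the end of each reference line
    for line in reference_lines:
        ref_words.extend(line.split())
        line_ends.append(len(ref_words))
    words = text.split()
    if not line_ends:
        return [" ".join(words)] if words else []

    # Map every word boundary in the reference onto a boundary in `text`.
    boundary = {}
    matcher = SequenceMatcher(None, ref_words, words, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        span = i2 - i1
        for i in range(i1, i2 + 1):
            # Exact inside "equal" runs; proportional inside changed runs.
            boundary[i] = j1 + ((i - i1) * (j2 - j1) // span if span else 0)

    lines = []
    start = 0
    for end in line_ends:
        stop = max(start, boundary.get(end, start))
        lines.append(" ".join(words[start:stop]))
        start = stop
    if start < len(words):
        # Trailing words the reference lacks are attached to the last line.
        lines[-1] = " ".join(filter(None, [lines[-1], " ".join(words[start:])]))
    return lines
```

Here `reference_lines` would be the o.c.r. lines and `text` the p.g. e-text (after the normalizations above).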

and -- since there was no good reason for p.g. to
rewrap the lines in the first place -- the time that
it took to restore the original linebreaks was just
wasted time.   i did it, because i had to, in order to
do this experiment, but still, it was a waste of time.

more on rewrapping later...

anyway...

this first volume of moby dick runs right at 600k.
so even though it's only one-half of "moby dick",
by itself it would constitute a relatively large book.

moreover, it consists of 11,411 (non-blank) lines.
keep that number in mind as i discuss my results.

my post-o.c.r. clean-up program is "in progress",
as i continue to improve it in an iterative process,
running it and then comparing outputs, and then
improving and re-running it for more comparison.
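The clean-up program itself isn't shown in this message, so the rules below are invented examples only. What the sketch does show is the shape of the iterative loop: apply each global change, count how many substitutions it made, then re-run and compare after every tweak.

```python
import re

# Hypothetical global-change rules; NOT the rule set used in this report.
RULES = [
    (re.compile(r"\btbe\b"), "the"),             # common "h" -> "b" misreading
    (re.compile(r"[ \t]+([,;:.!?])"), r"\1"),    # no space before punctuation
    (re.compile(r"[ \t]{2,}"), " "),             # collapse runs of spaces
]


def clean_up(text, rules=RULES):
    """Apply each rule globally; report how many changes each one made."""
    counts = {}
    for pattern, replacement in rules:
        text, n = pattern.subn(replacement, text)
        counts[pattern.pattern] = n
    return text, counts
```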

this means that the numbers are kind of "spongy",
in the sense that i could keep improving the app
till there are virtually _no_ differences remaining
between the output it gives and a "criterion" text.

but given a fair amount of "twiddling" for this book,
i came up with roughly _444_ lines which _differed_
between these versions of volume 1 of moby dick...

that's roughly 4% of the 11,411 lines, a percentage
which is very close to what i've gotten in other tests...
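Once both files are normalized and share linebreaks, counting the differing lines is a plain line diff. This is a sketch using Python's standard difflib, offered as one way such a count could be made rather than a description of the tool used here. The returned list is also exactly the set of lines a proofer would need to look at.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def differing_lines(ocr_lines, ref_lines):
    """Return (ocr_line_number, ocr_text, ref_text) for each differing line.

    ocr_line_number is 1-based; a side is None when it has no counterpart.
    """
    matcher = SequenceMatcher(None, ocr_lines, ref_lines, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        numbered = zip(range(i1 + 1, i2 + 1), ocr_lines[i1:i2])
        for ocr, ref in zip_longest(numbered, ref_lines[j1:j2]):
            number, text = ocr if ocr is not None else (None, None)
            found.append((number, text, ref))
    return found


def percent_differing(found, total_lines):
    """Share of o.c.r. lines that differ, as a percentage."""
    hits = sum(1 for number, _, _ in found if number is not None)
    return 100.0 * hits / total_lines
```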

however, you should keep in mind that some of the
differences between these two editions were due to
the fact that they _are_ separate editions, meaning
that some of the 444 differences are accounted for
by _edits_ that were made in the (later) p.g. edition.

further, the p.g. text had some errors in it as well,
which cause differences between that text and the
o.c.r. output that o.c.a. obtained from their edition.

i estimate that fully _half_ of the 444 different lines
were _not_ due to errors in the o.c.r., which means
that only _2%_ of the 11,411 lines reflect o.c.r. errors.
this figure is also consistent with my previous results.

this means that we could proof these o.c.r. results to
a very high standard of accuracy by simply examining
the 222 lines that were different from the p.g. e-text.

thus far, i haven't even looked at the scans themselves,
so i cannot give estimates of how many of the 222 lines
were _actually_ o.c.r. errors.   some of them were likely
errors in the original book, which the o.c.r. recognized
_correctly_, and thus will not count against its accuracy.

plus there are those cases where so-called "o.c.r. errors"
should really be attributed to the human operators and
the workflow that they have created around the process.

(an example of this is the garbage characters that o.c.r.
throws when it encounters pencil-marks in the margins;
a proper workflow "standardizes" the scans by cropping,
so the o.c.r. has a "bounding box" around the text-block,
and won't even extend recognition out into the margins.
it's unfair to blame o.c.r. for _our_ workflow deficiencies.)

when all is said and done, i expect that we will see about
111 lines with o.c.r. errors, or 1% of the 11,411 total lines.
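For reference, here is the arithmetic behind the 4%, 2%, and 1% figures and the "some 11,000" correct lines mentioned at the end. The counts are the ones reported above; the divisions only check the rounding.

```latex
\begin{aligned}
\frac{444}{11411} &\approx 0.039 = 3.9\% &&\text{lines that differed}\\
\frac{222}{11411} &\approx 0.019 = 1.9\% &&\text{lines estimated to reflect o.c.r. errors}\\
\frac{111}{11411} &\approx 0.0097 = 0.97\% &&\text{lines expected to hold true o.c.r. errors}\\
11411 - 444 &= 10967 \approx 11000 &&\text{lines that matched}
\end{aligned}
```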

if we have tools that can focus us in on those 111 lines,
it's obvious that we can proof our books _much_ faster
than the look-at-every-word-on-every-page method...

and this means that the combination of good o.c.a. o.c.r.
and an aggressive post-o.c.r. clean-up program can create
text that is phenomenally accurate, with little human help.
considering the millions of scan-sets we need to digitize,
this is good news indeed.

these results confirm those i've obtained on 2 books earlier:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=24008

***

next week, i'll share extensive results from this test.

i'll show you the global changes that were made by
my post-o.c.r. clean-up application-tool.

i'll show you the files that ended up being compared.

i'll show you the lines that differed between the files.

i'll show you how i categorized these different lines.
(some were edits, some were errors in the p.g. e-text,
and some were punctuation differences; the rest were
the lines the o.c.r. _probably_ recognized incorrectly.)

if i've gotten around to it, i'll let you know the results
of my check of these differences against the scans...

finally, i'll show you the lines that _might_ have had
"stealth scannos" on them -- lines which _might_ be
a _problem_ with the comparison method of proofing,
were they found to be a relatively common occurrence.
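"Stealth scannos" are misreadings that still spell a real word, so a spellcheck passes them. One simple way to list candidate lines is a watch-list scan over every line, whether or not it matched. The watch-list below is hypothetical; a real one would be compiled from observed o.c.r. errors.

```python
import re

# Hypothetical watch-list of real words that are frequent misreadings:
# "arid" for "and", "bis" for "his", "cf" for "of", "modem" for "modern".
STEALTH_SUSPECTS = {"arid", "bis", "cf", "modem"}

_WORD = re.compile(r"[a-z]+")


def flag_stealth_scannos(lines, suspects=STEALTH_SUSPECTS):
    """Yield (line_number, word) for every suspect word found."""
    for number, line in enumerate(lines, start=1):
        for word in _WORD.findall(line.lower()):
            if word in suspects:
                yield number, word
```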

***

for the weekend, however, chew on this little thought:
of the 11,411 lines in this e-text, some 11,000 of them
were digitized correctly, by the combination of the good
o.c.r. from the o.c.a. followed by my post-o.c.r. clean-up.

-bowerbird


**************
Start the year off right.  Easy ways to stay in shape.
     
http://body.aol.com/fitness/winter-exercise?NCID=aolcmp00300000002489

From Bowerbird at aol.com  Fri Jan  4 14:27:36 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 4 Jan 2008 17:27:36 EST
Subject: [gutvol-d] chinese versus english
Message-ID: <cb5.277325c7.34b00cd8@aol.com>

please.

in today's "posted" digest, the chinese versus english race is tight,
with chinese having 9 e-texts posted, and english surging with 10.
portuguese and esperanto had 1 each, to fill out the pack...

thank you.

-bowerbird



From julio.reis at tintazul.com.pt  Sat Jan  5 12:14:04 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Sat, 05 Jan 2008 20:14:04 +0000
Subject: [gutvol-d] chinese versus english
In-Reply-To: <mailman.2.1199563202.22361.gutvol-d@lists.pglaf.org>
References: <mailman.2.1199563202.22361.gutvol-d@lists.pglaf.org>
Message-ID: <1199564044.7678.74.camel@abetarda>

Speaking of Chinese e-texts... does anyone have a clue as to how these texts are produced? Do Chinese speakers have a website like DP? Is it the work of people going solo? Or are these texts already available at some other site?

Tintazul.


From sly at victoria.tc.ca  Sat Jan  5 12:25:28 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat, 5 Jan 2008 12:25:28 -0800 (PST)
Subject: [gutvol-d] chinese versus english
In-Reply-To: <1199564044.7678.74.camel@abetarda>
References: <mailman.2.1199563202.22361.gutvol-d@lists.pglaf.org>
	<1199564044.7678.74.camel@abetarda>
Message-ID: <Pine.GSO.4.58.0801051221420.17321@vtn1.victoria.tc.ca>


On Sat, 5 Jan 2008, Júlio Reis wrote:

> Speaking of Chinese e-texts... does anyone have a clue as to how these texts are produced? Have Chinese-speakers some web site like DP? Is it the work of people going solo? Or are these already available as e-texts in some other site?
>
> Tintazul.

Interesting question. In case it helps, when I went back
through the posted list, checking the recent Chinese
texts, I was expecting to see one or two people who
were submitting them. Instead, after checking about 20
items, I only saw an email address duplicated once.
So it appears these are being submitted by many different
people. Since this is happening all at the same time,
it would be natural to assume that there is _some_
kind of organization behind it...


Andrew

From vze3rknp at verizon.net  Sat Jan  5 12:45:27 2008
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Sat, 05 Jan 2008 15:45:27 -0500
Subject: [gutvol-d] chinese versus english
In-Reply-To: <Pine.GSO.4.58.0801051221420.17321@vtn1.victoria.tc.ca>
References: <mailman.2.1199563202.22361.gutvol-d@lists.pglaf.org>
	<1199564044.7678.74.camel@abetarda>
	<Pine.GSO.4.58.0801051221420.17321@vtn1.victoria.tc.ca>
Message-ID: <477FEC67.2060708@verizon.net>


Andrew Sly wrote:
> On Sat, 5 Jan 2008, Júlio Reis wrote:
>
>   
>> Speaking of Chinese e-texts... does anyone have a clue as to how these texts are produced? Have Chinese-speakers some web site like DP? Is it the work of people going solo? Or are these already available as e-texts in some other site?
>>
>> Tintazul.
>>     
>
> Interesting question. In case it helps, when I went back
> through the posted list, checking the recent Chinese
> texts, I was expecting to see one or two people who
> were submitting them. Instead, after checking about 20
> items, I only saw an email address duplicated once.
> So it appears these are being submitted by many different
> people. Since this is happening all at the same time,
> it would be natural to assume that there is _some_
> kind of organization behind it...
My understanding, which may be wrong, is that there's a professor (in 
Taiwan?) who has his students transcribe texts, perhaps as part of a 
course. They then upload them to PG. That's why the Chinese texts come 
in clumps, with nothing for a very long time, and then a whole lot.

Juliet


From julio.reis at tintazul.com.pt  Sun Jan  6 13:05:28 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Sun, 06 Jan 2008 21:05:28 +0000
Subject: [gutvol-d] Chinese
In-Reply-To: <mailman.2.1199649602.5335.gutvol-d@lists.pglaf.org>
References: <mailman.2.1199649602.5335.gutvol-d@lists.pglaf.org>
Message-ID: <1199653528.19731.46.camel@abetarda>


> > My understanding, which may be wrong, is that there's a professor (in 
> > Taiwan?) who has his students transcribe texts, perhaps as part of a 
> > course. They then upload them to PG. That's why the Chinese texts come 
> > in clumps, with nothing for a very long time, and then a whole lot.

So all we on the Portuguese team need to do is teach those hard-working
guys our language :-P, because our 6th place in Gutenberg is as good as
gone.

Long live the Taiwanese powerhouse, and may many more spring up around
the Chinese-speaking world.

Júlio.


From Bowerbird at aol.com  Sun Jan  6 14:38:28 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 6 Jan 2008 17:38:28 EST
Subject: [gutvol-d] chinese versus english
Message-ID: <c5d.24c460bb.34b2b264@aol.com>

in the r.s.s. feed today from the posted list,
chinese beats english by a score of 3-2, but
german pulls a surprise and trumps all with 5.

-bowerbird



From schultzk at uni-trier.de  Mon Jan  7 01:57:39 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 7 Jan 2008 10:57:39 +0100
Subject: [gutvol-d] moby dick -- a report on the state of the art of
	digitization
In-Reply-To: <ce0.21edd084.34afd480@aol.com>
References: <ce0.21edd084.34afd480@aol.com>
Message-ID: <6AE375A2-94C6-4314-B555-49ED68796102@uni-trier.de>

Hi Bowerbird,

	I have said it time and time again: you will not get decent
	ocr or oca without a proper grammar (in the computational
	linguistics sense) and a parser. Without these, oca will have
	the problems you mentioned below.

	As far as your English-American comparison is concerned, you
	need a decent dictionary (or dictionaries) and a translation module:
		1) colour - color: easy enough to handle.
		2) lorry - truck: a synonym dictionary could handle this,
		   as well as point one.
		3) boot - trunk (as in the trunk of a car): this problem
		   requires semantic or co-text analysis.

	1 and 2 are easy enough and cheap to do automatically, except
	where the author purposely mixes English and American usage
	(a very small percentage, I assume).

	3 is a different animal. It is the crux of any translation
	system. Yet a well-designed system will handle 90-95% of cases.
	This type of project would take at least a man-year to produce;
	in other words, it is far too expensive.

	Dashes are hard for any automatic system to differentiate. There
	are three kinds. Though they have different lengths, how can a
	system tell them apart? It would need intimate knowledge of the
	point size and font, which oc* systems do not have. Of course one
	could try to program the intelligence needed, but I assume it
	still would not work well. Humans are a lot better at this.

	Hyphenation and dashes can be differentiated with multi-line
	parsing, yet many programmers consider the effort not worth it.

	As time goes by, oca will get grammar rules and more intelligence,
	just as dictionaries and intelligence improved ocr.
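Keith's point about multi-line parsing can be sketched as follows, under stated assumptions. A trailing "--" is taken to be an em-dash (the p.g. convention) and left alone. A single trailing "-" is treated as a possible end-line hyphenate and rejoined only when the joined word is in a dictionary; otherwise the hyphen is kept as a possible compound. The word list here is a hypothetical stand-in for a full dictionary.

```python
# Hypothetical mini word list; a real tool would load a full dictionary.
DICTIONARY = {"harpooneer", "forecastle", "something", "whale"}


def join_end_of_line_hyphens(lines, dictionary=DICTIONARY):
    """Rejoin words split by a line-end hyphen when the joined form is known.

    Returns a new list with the same number of lines; the rejoined word is
    pulled up onto the line that held the hyphen.
    """
    out = list(lines)
    for i in range(len(out) - 1):
        line = out[i].rstrip()
        if not line.endswith("-") or line.endswith("--"):
            continue  # no hyphen, or an em-dash written as "--"
        words = line.split()
        parts = out[i + 1].split(maxsplit=1)
        if not words or not parts:
            continue
        head = words[-1][:-1].lstrip("\"'(")  # fragment before the hyphen
        if not head:
            continue  # a lone "-" is more likely a dash than a hyphenate
        tail = parts[0]  # first word of the next line
        if (head + tail.rstrip(".,;:!?\"')")).lower() in dictionary:
            out[i] = line[:-1] + tail  # drop the hyphen, pull the word up
            out[i + 1] = parts[1] if len(parts) > 1 else ""
    return out
```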

	regards
		Keith.

	
On 04.01.2008 at 19:27, Bowerbird at aol.com wrote:

> ok, i've told you recently that
> o.c.r. from the o.c.a. is good.
>
> and that it can be improved by
> post-o.c.r. clean-up programs.
>
> and then improved even further
> by comparison with an existing
> digitization, if one should exist,
> to the point it can be _finished_,
> quickly, even largely automatically.
>
> ***
>
> here's a report in support of that...
>
> i examined the o.c.a. first volume of "moby dick":
> >   http://www.archive.org/details/mobydickorwhale01melvuoft
>
> i did an initial comparison of their o.c.r.
> with the e-text from project gutenberg:
> >   http://www.gutenberg.org/etext/2489
>
> it didn't take long to determine that the
> p.g. e-text was from a different edition
> than the one which the o.c.a. scanned...
>
> the first tip-off was that the o.c.a. edition
> used british spellings, not the american
> ones which are there in the p.g. e-text.
>
> this brings up a good point to consider
> when we're talking about _comparison_
> as a strategy for correcting o.c.r. text...
>
> specifically, there are a number of things
> that will cause superfluous "differences"
> that need to be ignored in comparisons,
> like american/british spelling variants...
> you don't want these differences flagged.
>
> also -- as is typical with british editions --
> there was also a difference in quotemarks;
> the british use single-quotemarks as default,
> and nested quotes use double-quotemarks.
> american editions use double-quotemarks
> as the default, of course, with any internal
> quotes signified by single-quotemarks...
>
> in addition, one of the differences that you
> will frequently find between editions involves
> punctuation (especially colons and semicolons,
> as well as various takes on hyphenated words),
> and this is particularly true when one edition is
> from a british publisher, the other an american.
>
> there were other superficial differences
> between these two texts.  as a quick list:
> 1. chapter numbers (roman versus arabic)
> 2. headings (all-upper versus mixed-case)
> 3. chapter initial capped words (versus not)
> 4. block indents (i.e., the o.c.a. text had none)
>
> finally, one of the biggest complicating factors
> on comparing these e-texts is due to a massive
> _incompetence_ in the o.c.a. workflow, namely
> that they lose all the em-dashes in their text...
>
> that's right, you heard me correctly.
>
> they lose all the em-dashes in their o.c.r. text.
>
> some stupid person somewhere has evidently
> mis-set some toggle, discarding em-dashes...
>
> a glitch this big is ridiculously unforgivable...
> my mind is just boggled that they could even
> _make_ such a stupid mistake.  but they did...
>
> even worse, i have tried -- tried repeatedly --
> to bring it to their attention.  yet it persists...
>
> this is just plain frustrating.  and i've decided
> that i will make it one of my missions in 2008
> to get them to fix this glitch.  wish me luck, eh?
>
> in the meantime, though, we've gotta accept it,
> and move on with our mission.
[snip snip]

From Bowerbird at aol.com  Mon Jan  7 10:43:18 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 7 Jan 2008 13:43:18 EST
Subject: [gutvol-d] moby dick -- a report on the state of the art of
	digitization
Message-ID: <d38.1c26c66c.34b3ccc6@aol.com>

please.

if anyone wants to do the same test
on volume 1 of moby dick, just to
"keep me honest", please feel free...

maybe someone could put it through
distributed proofreaders, so that we
can check my work when it emerges
from d.p., 1 or 2 or 3 years from now.

or, if you want to do it on volume 2,
go ahead, as i plan to do that next...
(split-half reliability tests, ya know.)

***

oh yeah.   i shoulda mentioned that
sometimes i take the difficult route
just to be perverse.

if you do not feel like doing that,
use moby10b instead of moby11,
as moby10b _is_ based upon the 
same edition the o.c.a. scanned...

(or at least i _thought_ that was so,
for some reason i can't remember,
but i might've been wrong on that.)

and if you _do_ stay with moby11,
be advised of a major deficiency at:
>    the scene of the catastrophe

_ironic_, because there is a loss of text here
-- involving 240+ words -- which makes this
"the scene of the catastrophe", quite indeed...

***

keith said:
>    you will not get decent ocr without a proper grammar

you seem not to notice that we _are_ getting "decent" o.c.r.

in fact, when combined with a good post-o.c.r. cleanup tool,
the results become quite _astounding_...


>    As far as your comparison English-American is concerned 
>    you need a decent dictionary/ies and a translation module:

i'm not sure what your point is, but it has no applicability here.

the only reason that i got any differences of this type is because
i was comparing an american edition with an english edition, and
that means i just have to harmonize the two, not do a translation.


>    Dashes are hard to differentiate for any automatic system.

um, again, no applicability here.   the dashes were recognized,
i'm quite sure, but a human glitch in the workflow drops 'em,
most probably because they are "high-bit ascii" characters...
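To illustrate the guess above (it is not a confirmed account of the o.c.a. pipeline): the em-dash has no 7-bit ASCII code. It is byte 0x97 in Windows-1252 and U+2014 in Unicode. So an export step that forces ASCII and ignores errors deletes it silently. A safer step maps known punctuation to ASCII stand-ins first, then fails loudly on anything left over.

```python
def to_ascii_lossy(text):
    """Force 7-bit ASCII, silently discarding anything it can't encode."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# Map common "high-bit" punctuation to ASCII stand-ins before encoding.
_ASCII_STANDINS = str.maketrans({
    "\u2014": "--",  # em-dash -> the p.g. double-hyphen convention
    "\u2013": "-",   # en-dash
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
})


def to_ascii_safe(text):
    """Transliterate known punctuation, then raise on anything left over."""
    return text.translate(_ASCII_STANDINS).encode("ascii").decode("ascii")
```

With the lossy version, "Call me Ishmael\u2014some years ago" comes out as "Call me Ishmaelsome years ago"; with the safe version it comes out as "Call me Ishmael--some years ago".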


>    they have different lengths, how can a system tell them apart

modern o.c.r. does quite well on this task.


>    Hyphenation and dashes can be differentiated with multi-line 
>    parsing, yet many programmers consider the effort not worth it.

oh please.   those programmers don't deserve the appellation...


>    As time goes by oca will get grammar rules and more intelligence, 
>    just like the dictionaries and intelligence improved ocr.

we'll have near-perfect recognition of the characters _long_ before then...

***

thank you.

-bowerbird



From Bowerbird at aol.com  Mon Jan  7 14:30:28 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 7 Jan 2008 17:30:28 EST
Subject: [gutvol-d] why cyberspace is cheaper than meat-space
Message-ID: <c26.26ef6df0.34b40204@aol.com>

please.

***

why cyberspace is cheaper than meat-space...

meatspace:   artist -> company -> wholesaler -> retailer -> audience

cyberspace: artist -> audience

***

thank you.

-bowerbird



From creeva at gmail.com  Tue Jan  8 06:04:26 2008
From: creeva at gmail.com (Brent Gueth)
Date: Tue, 8 Jan 2008 09:04:26 -0500
Subject: [gutvol-d] why cyberspace is cheaper than meat-space
In-Reply-To: <c26.26ef6df0.34b40204@aol.com>
References: <c26.26ef6df0.34b40204@aol.com>
Message-ID: <2510ddab0801080604k7606da6ha9f339c0eef989f0@mail.gmail.com>

I would say you're close.

This would be more accurate:

 meatspace:  artist -> company -> wholesaler -> retailer -> audience

 cyberspace: artist -> distributor -> audience

While the artist may be the distributor themselves, there are web site
fees and Internet access fees, and these all go up on a curve the more
popular the artist is. To match the revenue stream of traditional
publishing, there will be costs associated with it; either that, or
they aren't doing their taxes properly ;)

We take for granted that publishing on the Internet is "Free", but it's
not. A free distribution service would normally require you to go the
Creative Commons route and use archive.org to distribute your data, or
to use a service that advertises to your readers, consumers, watchers,
etc., in which case you don't get a cut or cannot adequately control
the content.

While the cost and barrier to entry have in other ways been greatly
reduced, we cannot turn a blind eye and say it doesn't cost anything.
It costs money to run Gutenberg, and it costs money to run archive.org;
just because the artist may or may not see it does not negate the fact
that it is there.

I agree with you that it is the better method all around; I just
wanted to make sure that everything is accounted for.

On Jan 7, 2008 5:30 PM,  <Bowerbird at aol.com> wrote:
> please.
>
>  ***
>
>  why cyberspace is cheaper than meat-space...
>
>  meatspace:  artist -> company -> wholesaler -> retailer -> audience
>
>  cyberspace: artist -> audience
>
>  ***
>
>  thank you.
>
>  -bowerbird
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

From Bowerbird at aol.com  Tue Jan  8 09:57:23 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 8 Jan 2008 12:57:23 EST
Subject: [gutvol-d] why cyberspace is cheaper than meat-space
Message-ID: <c59.226aad7a.34b51383@aol.com>

brent said:
>   there is web site fees, Internet Access fees

right.   and i agree that in some cases, those can be substantial.

but i do not agree it means there should be a "distributor" node
in between the artist and the audience.   those nodes are meant
to indicate _entities_that _take_a _cut_ from the revenue stream.

web-site and internet access fees are the cost of doing business;
indeed, they're pretty much the cost of _being_human_ these days.

(yesterday i bought a domain-name for my sister's granddaughter,
an hour after she was born; no kid should be without a web-site.)

artists of various stripes have always had to incur costs to do art;
musicians have to pay money for lessons, and their instruments;
artists have to buy brushes and paints and canvas and stretchers;
sculptors have to put out a _ton_ of money for their raw material,
and tools to work it.   digital artists must buy hardware and software.

so your internet costs are just another expense on the ledger sheet.

i also say it's important to note that there _are_ ways to _minimize_
the costs of your digital distribution.   it's not _necessary_, and may
even be _undesirable_, for you to bear the full burden of distribution.
take advantage of the number-one digital benefit -- i.e., free copies.
give fans the explicit right to spread copies of your work far and wide.
allow them to put copies on their own web-sites, and let _them_ bear
some of the cost.   even better, seed copies to peer-to-peer networks,
so you don't have to even host an original copy on your own web-site.

myspace and facebook are now willing to host all kinds of your content
(although i encourage you to examine their terms of service carefully.)
youtube -- and other sites too -- will host your video for you, for free.
there's lots of photo-hosting sites (e.g., photobucket, flickr, shutterfly).
the internet archive -- and p.g. too -- are all too happy to host text...

also, lots of bloggers are finding that adsense pays their hosting bills.

so, while granting your point, i believe it's a relatively small concern...

-bowerbird



From creeva at gmail.com  Tue Jan  8 10:10:36 2008
From: creeva at gmail.com (Brent Gueth)
Date: Tue, 8 Jan 2008 13:10:36 -0500
Subject: [gutvol-d] why cyberspace is cheaper than meat-space
In-Reply-To: <c59.226aad7a.34b51383@aol.com>
References: <c59.226aad7a.34b51383@aol.com>
Message-ID: <2510ddab0801081010t6d5f713ak68c6b0e963da145e@mail.gmail.com>

I completely agree with you that it's a small concern. It's just a
post-production cost of business; compared to the pre-production costs
(supplies, lessons, etc.) it's nominal in most regards and can be
almost completely free. It's just there, and I wanted to throw it out
for those who think it's a completely free ride.

On Jan 8, 2008 12:57 PM,  <Bowerbird at aol.com> wrote:
> brent said:
>  >   there is web site fees, Internet Access fees
>
>  right.  and i agree that in some cases, those can be substantial.
>
>  but i do not agree it means there should be a "distributor" node
>  in between the artist and the audience.  those nodes are meant
>  to indicate _entities_that _take_a _cut_ from the revenue stream.
>
>  web-site and internet access fees are the cost of doing business;
>  indeed, they're pretty much the cost of _being_human_ these days.
>
>  (yesterday i bought a domain-name for my sisters granddaughter,
>  an hour after she was born; no kid should be without a web-site.)
>
>  artists of various stripes have always had to incur costs to do art;
>  musicians have to pay money for lessons, and their instruments;
>  artists have to buy brushes and paints and canvas and stretchers;
>  sculptors have to put out a _ton_ of money for their raw material,
>  and tools to work it.  digital artists must buy hardware and software.
>
>  so your internet costs are just another expense on the ledger sheet.
>
>  i also say it's important to note that there _are_ ways to _minimize_
>  the costs of your digital distribution.  it's not _necessary_, and may
>  even be _undesirable_, for you to bear the full burden of distribution.
>  take advantage of the number-one digital benefit -- i.e., free copies.
>  give fans the explicit right to spread copies of your work far and wide.
>  allow them to put copies on their own web-sites, and let _them_ bear
>  some of the cost.  even better, seed copies to peer-to-peer networks,
>  so you don't have to even host an original copy on your own web-site.
>
>  myspace and facebook are now willing to host all kinds of your content
>  (although i encourage you to examine their terms of service carefully.)
>  youtube -- and other sites too -- will host your video for you, for free.
>  there's lots of photo-hosting sites (e.g., photobucket, flickr, shutterfly).
>  the internet archive -- and p.g. too -- are all too happy to host text...
>
>  also, lots of bloggers are finding that adsense pays their hosting bills.
>
>  so, while granting your point, i believe it's a relatively small concern...
>
>  -bowerbird
>
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

From hart at pglaf.org  Tue Jan  8 15:19:25 2008
From: hart at pglaf.org (Michael Hart)
Date: Tue, 8 Jan 2008 15:19:25 -0800 (PST)
Subject: [gutvol-d] why cyberspace is cheaper than meat-space
In-Reply-To: <c59.226aad7a.34b51383@aol.com>
References: <c59.226aad7a.34b51383@aol.com>
Message-ID: <Pine.LNX.4.64.0801081518380.22241@pglaf.org>


I see all this noise about money.

Just where is this money supposed to be going?

Or coming from, for that matter?


mh

From Bowerbird at aol.com  Tue Jan  8 15:40:11 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 8 Jan 2008 18:40:11 EST
Subject: [gutvol-d] why cyberspace gives twice as much money to artists as
	meatspace does
Message-ID: <bec.18ad883a.34b563db@aol.com>

why cyberspace gives more money to artists than meatspace...

meatspace:  artist -$1-> company -$4-> wholesaler -$8-> retailer -$16-> audience

cyberspace: artist -$2-> audience

(dollar-amounts represent the selling-price per unit, so are paid right-to-left;
so out of the $16 price for an average book or music c.d., the artist receives $1.)
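here's the arithmetic in that diagram as a tiny python sketch, using the illustrative per-unit prices above (MEATSPACE, CYBERSPACE, and artist_share are names made up for this example):

```python
# Per-unit selling price at each hand-off, as in the diagram above (illustrative figures).
MEATSPACE = [("artist", 1), ("company", 4), ("wholesaler", 8), ("retailer", 16)]
CYBERSPACE = [("artist", 2)]


def artist_share(chain):
    """Return (dollars the artist receives, dollars the audience pays, fraction)."""
    artist_gets = chain[0][1]     # what the first buyer pays the artist
    audience_pays = chain[-1][1]  # the final selling price to the audience
    return artist_gets, audience_pays, artist_gets / audience_pays


for name, chain in (("meatspace", MEATSPACE), ("cyberspace", CYBERSPACE)):
    gets, pays, frac = artist_share(chain)
    print(f"{name}: artist gets ${gets} of ${pays} ({frac:.0%})")
```

per unit, the artist's take doubles ($2 versus $1), while the audience pays $2 instead of $16.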

-bowerbird



From Bowerbird at aol.com  Tue Jan  8 15:48:29 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 8 Jan 2008 18:48:29 EST
Subject: [gutvol-d] moby dick -- data in support of the report on the state
	of the art of digitization
Message-ID: <c33.229fb3dc.34b565cd@aol.com>

please.

here's those results from my research on comparing
cleaned-up o.c.r. results to an existing digitization...

to remind you, we're doing volume 1 of moby dick:
>    http://www.archive.org/details/mobydickorwhale01melvuoft
>    http://www.gutenberg.org/etext/2489

***

before beginning, let me make a brief comment on
"frankenstein" texts, assembled from many sources.

one of the p.g. versions of moby dick -- there are
three of them -- evolved through a few iterations:
>    This text is a combination of etexts, one from 
>    the now-defunct ERIS project at Virginia Tech 
>    and one from Project Gutenberg's archives.
>    The proofreaders of this version are indebted to 
>    The University of Adelaide Library for preserving 
>    the Virginia Tech version.   The resulting etext 
>    was compared with a public domain hard copy 
>    version of the text.

it doesn't matter.   each line is either right or wrong.
if a line is right, it doesn't matter how it came about.
and if it's wrong, it doesn't matter how it got that way.

i'll have more to say about "frankenstein" texts later,
but for now, the important point is "it doesn't matter".

****

the results of this experiment replicated those of the 2 done before,
confirming that a word-by-word examination of each page
is unnecessarily wasteful of human time and energy in
the production of a highly-accurate book digitization...

in this regard, it is perhaps useful to get an overview of
the state-of-the-site over at distributed proofreaders...

in 2007, d.p. posted 2,222 finished e-texts to p.g.

there are maybe 2-4 times that many books in their system
at the current time, being processed or in queues waiting...

it is for this reason that most books now can be expected
to take anywhere from 2-4 years to traverse the system...

most of the thousands of people who volunteer on the site
probably don't care much how long a book takes to digitize,
but _some_ do, and are becoming _increasingly_ unhappy
about the long time-period it takes to produce most books.

i've remarked before that i think d.p.'s workflow is grossly
inefficient, and that it wastes far too much time and energy
of the human beings who are volunteering their services...

i won't elaborate on it here, again, but i really do think that
it needs to be kept in mind when considering these results...

***

doing a comparison involves, first, a "shaping" of the files...

not to give away all my secrets here, but the initial step of
this "shaping" is to mold files with matching paragraphs...

for the most part, paragraphing in the p.g. e-text was clear,
but the o.c.a. text required quite a bit of work in this regard.
(e.g., runheads needed to be removed, paragraphs fixed...)
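the runhead removal isn't spelled out here, so this is only a hypothetical python heuristic for it, not the method actually used: a short line that recurs on many pages (ignoring page-numbers) is assumed to be a running head, and a line holding only digits is assumed to be a page-number. the min_repeats threshold and the 40-character cutoff are arbitrary assumptions.

```python
import re
from collections import Counter


def strip_runheads(lines, min_repeats=5):
    """Drop probable running heads and bare page-numbers from o.c.r. lines."""
    # Normalize away page-numbers so "12 MOBY DICK" and "MOBY DICK 14" count as one.
    norm = [re.sub(r"\d+", "", ln).strip().upper() for ln in lines]
    counts = Counter(n for n in norm if n)
    kept = []
    for ln, n in zip(lines, norm):
        if re.fullmatch(r"\s*\d+\s*", ln):
            continue  # a line of nothing but digits: assumed page-number
        if n and counts[n] >= min_repeats and len(n) < 40:
            continue  # short line recurring on many pages: assumed runhead
        kept.append(ln)
    return kept
```

a real paragraph line that happened to repeat often would be dropped too, so the output still wants a glance.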

the following step was to rewrap the lines of the p.g. e-text
according to the linebreaks from the o.c.a. file.   as i've said,
this involved quite a lot of work as well, most of it because
i am writing and perfecting a program to perform this task.
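the rewrapping program itself isn't shown, so here's a hypothetical python sketch of one way to do it: split both texts into words, align them with the standard library's difflib.SequenceMatcher, and carry each o.c.a. line-end over to the matching spot in the p.g. text. where the texts disagree, the break is pushed to one edge of the disagreeing stretch. the function names are made up.

```python
import difflib


def _map_boundary(opcodes, i, n_target):
    """Map a word boundary i in the reference onto a boundary in the target."""
    for tag, i1, i2, j1, j2 in opcodes:
        if i == i1:
            return j1
        if i1 < i < i2:
            # inside a matching run, offsets carry over one-for-one;
            # inside a changed run, push the break to the end of the change
            return j1 + (i - i1) if tag == "equal" else j2
    return n_target


def rewrap_to_match(reference_lines, target_text):
    """Re-break target_text so its lines end where reference_lines end."""
    ref_words, breaks = [], []
    for line in reference_lines:
        ref_words.extend(line.split())
        breaks.append(len(ref_words))  # boundary after this line's last word
    tgt_words = target_text.split()

    matcher = difflib.SequenceMatcher(a=ref_words, b=tgt_words, autojunk=False)
    opcodes = matcher.get_opcodes()
    cuts = [_map_boundary(opcodes, b, len(tgt_words)) for b in breaks]
    if cuts:
        cuts[-1] = len(tgt_words)  # never drop words at the end of the target

    lines, start = [], 0
    for cut in cuts:
        cut = max(cut, start)  # keep cuts in order
        lines.append(" ".join(tgt_words[start:cut]))
        start = cut
    return lines
```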

the next step was to track down the _irrelevant_ differences,
and provide a means of controlling for them.   this wrinkle is
why an off-the-shelf tool like "wdiff" won't work for this job.

you might remember that a while back, i invited carlo to give
a workshop in using wdiff to compare different digitizations.
he never came through with it, but he _has_ recently posted
the output from one of his efforts to do such a comparison:
>    http://posso.dm.unipi.it/~traverso/Restricted/sh-wd3.txt
i encourage you to take a good hard look at his output, and
evaluate its usefulness compared to my output shown here...

the last step in the process is to find and correct differences.
(i'll probably elaborate on this last step in some future posts.)

now that i've given you a summary of the comparison process,
let's examine each of those steps a little bit more closely, ok?

***

you will recall that -- because of some stupid human being
who set a toggle incorrectly over at the o.c.a. -- their text is
_missing_ all of its em-dashes.   therefore, i decided to delete
the dashes from the p.g. e-text, to avoid spurious differences.

likewise, because the p.g. e-text used american double-quotes
for conversation, while the o.c.a. text used british single-quotes,
i changed all double-quotes to single-quotes, so they'd match...

finally, i believe i standardized some punctuation differences too.
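a minimal python sketch of those two controls, assuming plain-text input (normalize_for_comparison is a made-up name, and the unspecified punctuation standardizing isn't attempted): dashes are dropped (replaced by a space, which is an assumption), and every double-quote becomes a single-quote, so neither difference shows up in the comparison.

```python
import re


def normalize_for_comparison(line):
    """Iron out differences the comparison is meant to ignore."""
    # the o.c.a. text lost its em-dashes, so drop them from the other side too
    line = line.replace("\u2014", " ").replace("--", " ")
    # american double-quotes vs. british single-quotes (straight or curly)
    for dq in ('"', "\u201c", "\u201d"):
        line = line.replace(dq, "'")
    return re.sub(r"\s+", " ", line).strip()
```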

keep all these changes in mind when you examine my results...

***

ok, finally, here is some real, honest-to-goodness data, at last!

in my last post, i said:
>    i'll show you the files that ended up being compared.

the "shaped" o.c.a. text is located here:
>    http://z-m-l.com/misc/mobyv1-oca-worked.txt

and the "shaped" p.g. e-text is here:
>    http://z-m-l.com/misc/moby11-all-worked.txt

***

in my last post, i said:
>    i'll show you the global changes that were made by
>    my post-o.c.r. clean-up application-tool.

i've appended to this post the changes that i made to the file.

for the most part, these clean-up routines will be familiar to
anyone experienced with the text typically returned by o.c.r.
(i didn't do anything distributed proofreaders couldn't do...)

for instance, i treat punctuation and characters "gone wild",
as listed, and searched for some high-probability scannos.
(the 3 listed were the only ones which occurred in this file;
ironically, in this book about a whale, there were 2 cases of
an _actual_ "arid" in this book, and 2 where it was a scanno.)
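as a rough python sketch of these two checks (the character set and the three scannos are the ones in the appended list; the function name is made up), note that it only _flags_ lines rather than changing them, since, as noted, "arid" can be a real word.

```python
import re

WILD_CHARS = set("*[]{}|/\\<>_@#$%^&")  # "characters gone wild" from the appended list
SCANNO_SUSPECTS = {"arid": "and", "lie": "he", "hi": "in"}  # whole words only


def flag_line(line):
    """Return reasons this o.c.r. line deserves a human look (empty if none)."""
    reasons = []
    wild = sorted(set(line) & WILD_CHARS)
    if wild:
        reasons.append("wild chars: " + "".join(wild))
    for word in re.findall(r"[A-Za-z]+", line):
        if word.lower() in SCANNO_SUSPECTS:
            reasons.append(f"{word} -> {SCANNO_SUSPECTS[word.lower()]}?")
    return reasons
```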

after the general class of _garbage_clean-up_, the next class
was concerned with the changes made by the editing process
in the creation of the _later_edition_ used for the p.g. e-text...

most of the cases in this latter class were _british_ spellings;
i have included those, so you can see what actually occurred.

there were also instances that might be british variants, or
might just be _editorial_decisions_, i'm not sure, including:
>    *wards // *ward
>    ay // aye
>    phrensy // frenzy

in general, british variants use "*ise", "*ising", and "*isation"
-- e.g., characterise, characterising, and characterisation --
whereas american variants use "*ize", "*izing", and "*ization"
-- e.g., characterize, characterizing, and characterization...

you can't do a global change because some british variants
do contain the "ize" or "izing".   i listed ones occurring here:
>    baptize, capsize, denizen, mizen*, seize, seizing, size

but as you can see, there are quite a few different terms,
and a number of them were of a relatively high frequency,
so controlling for them was a necessary part of this task...

the process of getting these term-pairs to simply "drop out"
of the files is a fairly complicated one, but i believe that i am
starting to get a relatively good handle on that programming.
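a hypothetical python sketch of making the *ize/*ise pairs drop out, by converting american "iz" forms to british ones while leaving alone the words listed above (matched by prefix, so "seized" and "capsizing" are covered too). the exception list is only what occurred in this volume; other texts would need more entries (e.g., "prize"), and the -our and other pairs would need their own rules. the names here are made up.

```python
import re

# Words from the list above that keep "iz" in both spellings (prefix match).
KEEP_IZ = ("baptiz", "capsiz", "deniz", "miz", "seiz", "siz")

# A whole word ending in an "iz" suffix; group 1 is the stem, group 2 the ending.
_IZE = re.compile(r"\b([A-Za-z]*?)iz(e|es|ed|ing|ings|ation|ations)\b")


def britishize(text):
    """Turn american -ize/-izing/-ization words into -ise forms, except KEEP_IZ."""
    def fix(m):
        word = m.group(0)
        if word.lower().startswith(KEEP_IZ):
            return word  # e.g. "seize", "size": the same in both spellings
        return m.group(1) + "is" + m.group(2)
    return _IZE.sub(fix, text)
```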

***

in my last post, i said:
>    i'll show you the lines that differed between the files.

there were 444 lines where the files exhibited differences.

it's important to remember that not all these differences
were errors in the o.c.r., and i elaborate on that fact next.

***

in my last post, i said:
>    i'll show you how i categorized these different lines.
>    (some were edits, some were errors in the p.g. e-text,
>    and some were punctuation differences; the rest were
>    the lines the o.c.r. _probably_ recognized incorrectly.)

based on just a cursory look at each pair of different lines,
i made a rough split of them into the following categories...

one category was 96 _edits_ made in the later (p.g.) e-text:
>    http://z-m-l.com/misc/moby-out11-goodedit-96.html
your review of these pairs will demonstrate what i mean.

the next category was 48 _likely_errors_ in the p.g. e-text:
>    http://z-m-l.com/misc/moby-out11-badedit-48.html
again, your brief review should give an idea what i mean.

another obvious category was 64 _punctuation_differences_:
>    http://z-m-l.com/misc/moby-out11-punct-64.html
some of these might be o.c.r. errors, but for the most part,
it seemed to me these were intentionally-produced edits...

the next category was a special one -- 14 "stealth scannos":
>    http://z-m-l.com/misc/moby-out11-stealth-14.html
i pulled these for special consideration, discussed below.

finally, the category we are interested in, 222 o.c.r. errors:
>    http://z-m-l.com/misc/moby-out-scanno-222.html

this category's 222 pairs were _half_ of the original 444...

***

for those of you who didn't actually go and _look_ at those,
here are a few samples showing how i present the differences.

view these using a monospaced font for the superior results;
the third line underneath each pair helps you quickly perceive
where the lines differ; learn to use it, as it's extremely handy...
(i've tested presentation of differences; this is the _best_ way.)

>    Vhe bulwarks of ships from China; some high aloft in
>    the bulwarks glasses! of ships from China; some high aloft in
>    x============xxx=xxxx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

>    professor. Yes as everyone knows meditation andli
>    professor. Yes as every one knows meditation and
>    =======================xxxxxxxxxxxxxxxxxxxxxxxxxx

>    you yourself feel such a mystical vibration when first;
>    you yourself feel such a mystical vibration when first
>    ======================================================x

>    of the great whale himself. Such a gortentous and
>    of the great whale himself. Such a portentous and
>    ===================================x=============

>    With anxious grapnelsJE had sounded my pocket and only
>    With anxious grapnels I had sounded my pocket and only
>    =====================xx===============================

>    less service the soles of mv boots were in a most miserable
>    less service the soles of my boots were in a most miserable
>    ===========================x===============================

>    hear the sounds of the tinkling glasses within. But go i
>    hear the sounds of the tinkling glasses within. But go
>    ======================================================cx

>    on Ishmael said I at last; dont you hear? get away l
>    on Ishmael said I at last; dont you hear? get away
>    ==================================================cx

>    all but deserted. But presently I carne to a smoky
>    all but deserted. But presently I came to a smoky
>    ====================================xxxxxxxxxxxxxx

>    city Gomorrah? But The Cfossed Harpoons and
>    city Gomorrah? But The Crossed Harpoons and
>    ========================x==================

>    it from that Cashless window where the frost is on both
>    it from that sashless window where the frost is on both
>    =============x=========================================
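the third line can be approximated by a position-by-position comparison; here's a python sketch (marker_line is a made-up name). it puts "=" where the columns agree and "x" where they don't, or where one line has run out; it does not try to reproduce the "c" marks, which aren't explained above.

```python
from itertools import zip_longest


def marker_line(ocr, reference):
    """Mark each column '=' where the two lines agree and 'x' where they differ."""
    return "".join(
        "=" if a == b else "x"
        for a, b in zip_longest(ocr, reference, fillvalue=None)
    )
```

marker_line(ocr, ref).find("x") then gives the column to check on the scan.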

***

ok, those 11 difference-pairs constitute the probable scannos
found in the first two chapters (11 pages) of our book here...

so let's take them individually, and look at the page-scans,
to see if we can tell exactly what might've caused the error.

>    Vhe bulwarks of ships from China; some high aloft in
>    the bulwarks glasses! of ships from China; some high aloft in
>    x============xxx=xxxx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

the capital-v is a scanno.   the "glasses!" is an _edit_
-- i probably called it a bad edit -- in the p.g. e-text.


>    professor. Yes as everyone knows meditation andli
>    professor. Yes as every one knows meditation and
>    =======================xxxxxxxxxxxxxxxxxxxxxxxxxx

the "li" was caused by pencil-marks in the page's margin.
we can't really call that an o.c.r. "error" --
better cropping would have ensured that stuff like that
would not happen -- but it still has to be cleaned up...


>    you yourself feel such a mystical vibration when first;
>    you yourself feel such a mystical vibration when first
>    ======================================================x

again, pencil-marks in the margin...


>    of the great whale himself. Such a gortentous and
>    of the great whale himself. Such a portentous and
>    ===================================x=============

pencil-mark underlining the word could have caused this...


>    With anxious grapnelsJE had sounded my pocket and only
>    With anxious grapnels I had sounded my pocket and only
>    =====================xx===============================

again, pencil-mark underlining probably caused this...


>    less service the soles of mv boots were in a most miserable
>    less service the soles of my boots were in a most miserable
>    ===========================x===============================

straightforward o.c.r. error...


>    hear the sounds of the tinkling glasses within. But go i
>    hear the sounds of the tinkling glasses within. But go
>    ======================================================cx

pencil-mark in the margin again...


>    on Ishmael said I at last; dont you hear? get away l
>    on Ishmael said I at last; dont you hear? get away
>    ==================================================cx

the same pencil-mark from above caused this glitch too.


>    all but deserted. But presently I carne to a smoky
>    all but deserted. But presently I came to a smoky
>    ====================================xxxxxxxxxxxxxx

straightforward o.c.r. error...


>    city Gomorrah? But The Cfossed Harpoons and
>    city Gomorrah? But The Crossed Harpoons and
>    ========================x==================

a stray printer's mark on the page caused this one...


>    it from that Cashless window where the frost is on both
>    it from that sashless window where the frost is on both
>    =============x=========================================

another pencil-mark -- circling this word -- caused this...

***

summing it all up, then, we've got 7 pencil-marks as causes,
and 1 bad edit (in the p.g. e-text), and 1 printer's mark snafu,
and 3 scannos.   all in all, i'm impressed with the o.c.r. quality...

so, yeah, this book had a lot of pencil-marks in the margin.
and, quite honestly, when you consider it's been in a library
for maybe 50 or 100 years, that's not all that surprising, is it?

but it does show how ridiculous it would be to _honor_ these
pages with an extremely-high scanning resolution, which is
what some obsessive-compulsive people want us to do, as if
they constituted some pristine "idealized" version of this book.

and, to repeat, we certainly cannot blame our o.c.r. programs
when we feed them pages that we have not cropped properly.
if we don't want it to attend to stuff in the margins, then we
should draw a bounding-box around the text-block for it...
(not much we can do about marks _within_ the text-block, but
then again, maybe some image-wizards could fix that up too.)
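a minimal python sketch of the bounding-box idea, assuming the o.c.r. engine reports a pixel box for every word (the Word type, its field names, and the coordinates are all hypothetical): words falling outside the text-block are dropped before comparison, which would have kept margin marks like the "li" out.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    text: str
    x0: int  # left edge on the page scan, in pixels
    y0: int  # top edge
    x1: int  # right edge
    y1: int  # bottom edge


def inside_text_block(words: List[Word], block: Tuple[int, int, int, int]) -> List[Word]:
    """Keep only words whose box lies entirely within the text-block box."""
    bx0, by0, bx1, by1 = block
    return [w for w in words
            if w.x0 >= bx0 and w.y0 >= by0 and w.x1 <= bx1 and w.y1 <= by1]
```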

the point is, our o.c.r. programs are doing some _great_ work...
(there were lots and lots of marks that _didn't_ foul up the o.c.r.)

in conclusion, if we did our jobs as well as the o.c.r. does its job,
the text that came out of our workflow would be nearly perfect...

***

in my last post, i said:
>    if i've gotten around to it, i'll let you know the results
>    of my check of these differences against the scans...

i didn't bother to do this yet.   i'm thinking that i will use
moby10b.txt instead, as it's based on the same edition
of the book as the one which was scanned by the o.c.a.
(at least i had some reason to think that might be true,
even though i cannot remember what that reason was.)

at any rate, i think you can tell by examining the 222 lines
where differences occurred that some had scanning errors
in them, while others might not have.   good enough for me.
even if all 222 had scanning errors, that's just 2% of the file.

***

in my last post, i said:
>    finally, i'll show you the lines that _might_ have had
>    "stealth scannos" on them -- lines which _might_ be
>    a _problem_ with the comparison method of proofing,
>    were they found to be a relatively common occurrence.

a "stealth scanno" is an o.c.r. error that will _not_ show up
in a regular spell-check, because the "wrong" word is "valid",
in the sense that it is the _correct_ spelling for another word.

so if "stage" is incorrectly recognized as "state", it won't be
flagged by a spell-checker, because "state" is also a word...
(whereas, for instance, "spage" would be flagged as wrong.)

"stealth" scannos are of concern to the comparison method
-- especially when we compare two sets of o.c.r. results --
because they might occur in both of the digitizations and --
since an identical error would lead to the lines "matching" --
we wouldn't look at that line, so we would _miss_the_error_.
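a python sketch of that distinction (the tiny word-set is a stand-in for a real spell-check dictionary, and the function name is made up): a one-for-one word substitution is only a _possible_ stealth scanno when the o.c.r. word would pass a spell-check. note it can only see differences; an identical error in both texts produces no difference at all, which is exactly the blind spot described above.

```python
import difflib


def classify_word_diffs(ocr_line, ref_line, dictionary):
    """Return (ocr_word, ref_word, kind) for each one-for-one word substitution."""
    a, b = ocr_line.split(), ref_line.split()
    opcodes = difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    out = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "replace" and i2 - i1 == j2 - j1:
            for wa, wb in zip(a[i1:i2], b[j1:j2]):
                bare = wa.lower().strip(".,;:!?'")
                # a valid word sails through a spell-check: possible stealth scanno
                kind = "stealth?" if bare in dictionary else "scanno"
                out.append((wa, wb, kind))
    return out
```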

therefore, in these experiments of mine, i'm super-sensitive
to any stealth-type scannos that might be uncovered, to see
if their frequency would constitute a troubling significance...

in the first 2 experiments, i was staggered to discover that
i found _not_a_single_trace_ of troublesome stealth scannos.
i was prepared to accept a _small_ number of stealth scannos,
since they wouldn't disturb the cost-benefit ratio _that_ much,
but i found _none_, even with some rather extensive checking.

in _this_ test, i uncovered instances that _might_have_been_
stealth scannos.   so, of course, i checked them very carefully.

this experiment was a great way to uncover stealth scannos,
since the p.g. text had gone through several human proofers,
and thus presumably could be thought free of stealth scannos.
(it's too bad we don't have human-proofed text for _all_ books!)

again, i was surprised by the infrequency of stealth scannos...

i found only _two_ in this entire 600k of text:

>    ---->             match // watch (p. 214) stealth (italics)
>    (Foresail rises and discovers the match standing lounging
>    (Foresail rises and discovers the watch standing lounging
>    ==================================x======================

>    ---->             shook // shock (p. 259) error (footnote)
>    and thereby combining the speed of the two objects for the shook; to
>    and thereby combining the speed of the two objects for the shock; to
>    ==============================================================x=====

it's worth noting that one of these two stealth scannos occurred on
an _italicized_ word, and the other was in a (small-font) footnote,
where the "c" does look very much like an "o", even to a human eye.

in a dozen other cases that _might've_ constituted a stealth scanno,
a review of the scan itself showed the o.c.r. had recognized correctly.
(the difference, accidental or intentional, was in the p.g. e-text.)

those dozen other cases are listed here:

>    ---->      state // stage (p. 34) o.c.r. was correct
>    tion state neither caterpillar nor butterfly. He was
>    tion stage neither caterpillar nor butterfly. He was
>    ========x===========================================

>    ---->      distinct // distant (p. 48) o.c.r. was correct
>    face shed a distinct spot of radiance upon the ships tossed
>    face shed a distant spot of radiance upon the ships tossed
>    ================xxx========================================

>    ---->      whaleman // whalemen (p. 137) o.c.r. was correct
>    the whaleman who first broke through the jealous policy
>    the whalemen who first broke through the jealous policy
>    ==========x============================================

>    ---->      liberally // literally (p. 149) o.c.r. was correct
>    these cases the native American liberally provides the
>    these cases the native American literally provides the
>    ==================================x====================

>    ---->      odd // old (p. 157) o.c.r. was correct
>    tanrail to mainmast Stubb the odd second mate came
>    taffrail to mainmast Stubb the old second mate came
>    ===xx=========================xxxx=================

>    ---->      place // space (p. 166) o.c.r. was correct
>    the various species or in this place at least to much of
>    the various species or in this space at least to much of
>    ===============================xx=======================

>    ---->      those // these (p. 167) o.c.r. was correct
>    given you those items. But in brief they are those:
>    given you those items. But in brief they are these:
>    ===============================================x===

>    ---->      ever // even (p. 192) o.c.r. was correct
>    and ever when most obscured by that London smoke
>    and even when most obscured by that London smoke
>    =======x========================================

>    ---->      had // has (p. 223) o.c.r. was correct
>    assailants had completely escaped them; to some minds
>    assailants has completely escaped them; to some minds
>    =============x=======================================

>    ---->      not // nor (p. 224) o.c.r. was correct
>    influences at work. Not even at the present day has the
>    influences at work. Nor even at the present day has the
>    ======================x================================

>    ---->      his // its (p. 281) o.c.r. was correct
>    his boats bow with his tail these allusions of his were at
>    his boats bow with its tail these allusions of his were at
>    ===================xx=====================================

>    ---->      whole // while (p. 322) o.c.r. was correct
>    mind to flog them all round thought upon the whole
>    mind to flog them all round thought upon the while
>    ===============================================x==

the results in general support the notion that the o.c.r. from the o.c.a.
is _extremely_good_, but these particular results specifically verify it...

a near-total absence of stealth scannos is a testament to high quality,
and the o.c.a. o.c.r. has demonstrated it in every test of my research...

i don't know exactly what o.c.r. app they're using, but it's _good_, folks.

if only o.c.a. fixed their glitches, so we could actually _use_ their text...

***

so, at the top of this post, i said "frankenstein" e-texts
do have their place with this comparison methodology.

at the same time, however, they have little enduring use.

that's because we now have a scan-set with a lifetime that
is probably as rock-solid as we can imagine for _any_ file,
being backed by one of the world's biggest companies...
nothing is certain, of course, but the continued presence
of this scan-set at google is _relatively_ assured.

given that, my digital-text companion-file also probably
has a good chance of being mirrored into the far future...

since i will be linking my text-file up to the actual scans,
so the text can be verified in a very convenient manner,
the _accuracy_ of my digitization can be checked easily...

the project gutenberg e-texts, however, will have utility
that is relatively small now, since they cannot reliably be
verified against a particular scan-set, even a short-lived one...

it's not that i think _trustworthiness_ is all that essential;
i have argued that most people don't care about it much.
and indeed, as i've shown, i'm not reluctant to _do_edits_.

but if one text-file offers _verifiable_ "trustworthiness",
while another version simply says "trust me", it's certain
which of the two will be the one people come to prefer...

and yes, i'm aware that the p.g. plan is to post scan-sets,
and i know that many are in the process of being posted;
but with the lines rewrapped, any verification process will
always be a clumsy one.   make of that whatever you want.
but if i were you, i'd stop rewrapping the lines right now...
any rewrapped text that can't be verified will be trashed...

***

repeating the main results, in this book of 11,411 lines,
only 2% of them -- 222 -- were different across the files.

moreover, as my output shows, it's not even the case that
we need to proof those _entire_lines_, because a simple
routine clearly shows exactly where the difference occurs,
so it's often a matter of focusing on _a_single_character_.
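the summary figures can be re-checked in a few lines of python; the counts are the ones reported in this post (the five categories of differing lines, and 11,411 lines in all).

```python
# Counts as reported above for volume 1 of moby dick.
categories = {
    "edits in the p.g. e-text": 96,
    "likely errors in the p.g. e-text": 48,
    "punctuation differences": 64,
    "possible stealth scannos": 14,
    "o.c.r. errors": 222,
}
total_lines = 11411

differing = sum(categories.values())
scanno_lines = categories["o.c.r. errors"]
print(differing)                            # 444 lines differed between the files
print(f"{scanno_lines / total_lines:.1%}")  # 1.9% of the book needs a look at the scan
```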

and resolving our 222 differences, by viewing the scans,
leads to a digitization that's undoubtedly highly accurate.
i won't claim perfection, because glitches always happen,
but given the minimal amount of human time and energy
required by this comparison methodology relative to the
other methodologies commonly used, i _strongly_ believe
the cost-benefit ratio of my methodology is far superior...

this text is clean enough for "continuous proofreading"
-- a smooth-reading by people interested in its content.
my oft-repeated standard for that is 1-error-in-10-pages,
and there's no uncertainty at all that we reached that level...

***

it's obvious that we can compare digitizations much faster
than the proof-every-word-on-every-page methodology.

the results show that the combination of good o.c.a. o.c.r.
and an aggressive post-o.c.r. clean-up program can create
text that is phenomenally accurate, with little human help.
considering the millions of scan-sets we need to digitize,
this is good news indeed.

these results confirm those i've obtained on 2 books earlier:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=24008

at some point, d.p. will no longer be able to _ignore_ this...
even now, i believe they are on questionable moral ground,
considering the time and energy that their workflow wastes,
human resources that might well have been volunteered with
a reasonable expectation that they were being utilized wisely.
would you give to the red cross if you knew they wasted money?

at any rate, i welcome any feedback on this research experiment.

thank you.

-bowerbird

p.s.   appended are global changes made for the comparison, which
took two forms: cleaning up the o.c.r., and controlling for the edits.

>    --------------------------------------------------------------
>    --------------------------------------------------------------
>    --------------------------------------------------------------
>    global changes made to clean up the o.c.r. garbage
>    --------------------------------------------------------------
>    --------------------------------------------------------------
>    --------------------------------------------------------------
>
>    *****> characters gone wild
>
>    * [ ] { } | / \ < > _ @ # $ % ^ &
>
>    --------------------------------------------------------------
>
>    *****> contractions gone wild
>
>    's
>    he?s
>    he? s
>    J s to 's
>    I 've
>    I Ve
>    ye 've
>    we 've
>    Ve
>
>    --------------------------------------------------------------
>
>    *****> improper (or unlikely) punctuation settings
>
>    space-period
>    period-space-lowercase
>    comma-space-uppercase (not a name)
>
>    --------------------------------------------------------------
>
>    *****> improper (or unlikely) double-punctuation
>
>    ,;
>    ::
>
>    --------------------------------------------------------------
>
>    *****> common stealth scannos
>
>    arid = and (whole word)
>    lie = he (whole word)
>    hi = in (whole word)
>
>    --------------------------------------------------------------
>
>    *****> one-letter words that are not "o" or "a" or capital "i"
>
>    c // '
>    hear the sounds of the tinkling glasses within. But go i
>    on, Ishmael, said I at last; don't you hear? get away l
>    'Landlord,' I whispered, w that ain't the harpooneer,
>
>    --------------------------------------------------------------
>
>    *****> improper (or unlikely) line-starting punctuation
>
>    . semble; in some sort, did still. But that thing of his
>    )xtras with startling accounts of commonplaces never
>    ! life is gulped and gone. Steward, refill!
>
>    --------------------------------------------------------------
>
>    *****> improper (or unlikely) line-ending punctuation
>
>    thousand boat lowerings ere the White Whale had torn(
>
>    --------------------------------------------------------------
>    --------------------------------------------------------------
>    --------------------------------------------------------------
>    global changes made as controls for the edits
>    --------------------------------------------------------------
>    --------------------------------------------------------------
>    --------------------------------------------------------------
>
>    *****> spellings changed between editions, including british variants
>
>    &c // etc.
>    afterwards
>    agonising
>    armour
>    around // round
>    ay // aye
>    bedwards
>    behaviour
>    Cooke // Cook
>    caster // castor
>    characterising
>    civilise
>    clamour
>    colour
>    connection // connexion
>    considerating // considering
>    Duodecimoes // Duodecimos
>    dropt // dropped
>    enclosed // inclosed
>    endeavour
>    envelops // envelopes
>    favour
>    favourite
>    flavour
>    generalising
>    grey // gray
>    Hallo // Halloa
>    Hollo // Halloa
>    Holloa // Halloa
>    hindoo // hindu
>    homewards
>    honour
>    honourable
>    humour
>    idealised
>    idolater // idolator
>    individualising
>    insure // ensure
>    inwards
>    jeopardise
>    labour
>    licence
>    monopolising
>    neighbour
>    organise
>    Pottsfich // Pottsfisch
>    parlour
>    patronising
>    phrensy // frenzy
>    phrensies // frenzies
>    popularise
>    pulverise
>    realise
>    recognise
>    reverie // revery
>    rumour
>    savor
>    scrutinising
>    sermonising
>    soliloquise
>    southwards
>    specialities // specialties
>    spiralise
>    succour
>    symbolise
>    symbolisings
>    systematised // systemised
>    systemised
>    tantalising
>    tranquillise
>    uncivilise
>    upwards
>    valour
>    vapour
>    vigour
>    villan // villain
>    villanous // villainous
>    yea // yes
>    _Pequod_
>    _Pequod_'s
>
>    --------------------------------------------------------------
>
>    *****> british words that _do_ have "ize" or "izi"
>
>    baptize
>    capsize
>    denizen
>    mizen
>    mizentop
>    seize
>    seizing
>    size
>
>    --------------------------------------------------------------
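
The checks and global changes listed above can be automated. Here is a rough Python sketch of one way to do it. It is not the tool used for this report. The names `normalize_edition` and `flag_suspects`, the regexes, and the short variant map are all illustrative assumptions. The map is a small subset of the list, and it assumes the left-hand form is rewritten to the right-hand form.

```python
import re

# Illustrative subset of the edition-variant list above. The direction
# (left-hand form rewritten to right-hand form) is an assumption.
EDITION_VARIANTS = {
    "colour": "color",
    "grey": "gray",
    "honour": "honor",
    "phrensy": "frenzy",
}
VARIANT_RX = re.compile(r"\b(" + "|".join(EDITION_VARIANTS) + r")\b", re.IGNORECASE)

# Line checks from the list; these regexes are this sketch's approximations.
LINE_CHECKS = {
    "double punctuation": re.compile(r",;|::"),
    "line-starting punctuation": re.compile(r"^\s*[.,;:!?)]"),
    "line-ending punctuation": re.compile(r"\(\s*$"),
}
# Stealth scannos: real words that OCR produces in place of others.
STEALTH_RX = re.compile(r"\b(?:arid|lie|hi)\b")
# A single letter standing alone (not part of a word or a contraction).
LONE_LETTER_RX = re.compile(r"(?<![\w'])[A-Za-z](?![\w'])")
# "a", "o" and capital "I"; allowing capital A and O too is an assumption.
ALLOWED_SINGLE_LETTERS = set("aAoOI")


def normalize_edition(text):
    """Rewrite known edition variants so they don't register as OCR errors."""
    def swap(m):
        word = m.group(0)
        target = EDITION_VARIANTS[word.lower()]
        # Preserve an initial capital from the source word.
        return target.capitalize() if word[0].isupper() else target
    return VARIANT_RX.sub(swap, text)


def flag_suspects(lines):
    """Yield (line_number, check_name, line) for each check a line trips."""
    for n, line in enumerate(lines, start=1):
        for name, rx in LINE_CHECKS.items():
            if rx.search(line):
                yield n, name, line
        if any(m.group() not in ALLOWED_SINGLE_LETTERS
               for m in LONE_LETTER_RX.finditer(line)):
            yield n, "one-letter word", line
        if STEALTH_RX.search(line):
            yield n, "stealth scanno", line
```

Normalizing the edition variants first is meant to keep them from showing up as differences when the o.c.r. is compared against the p.g. e-text. This is the "controls" idea in the list heading. Any lines the checks flag still go to a person.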


**************
Start the year off right.  Easy ways to stay in shape.
     
http://body.aol.com/fitness/winter-exercise?NCID=aolcmp00300000002489
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080108/260f196d/attachment-0001.htm 

From creeva at gmail.com  Tue Jan  8 17:52:02 2008
From: creeva at gmail.com (Brent Gueth)
Date: Tue, 8 Jan 2008 20:52:02 -0500
Subject: [gutvol-d] why cyberspace is cheaper than meat-space
In-Reply-To: <Pine.LNX.4.64.0801081518380.22241@pglaf.org>
References: <c59.226aad7a.34b51383@aol.com>
	<Pine.LNX.4.64.0801081518380.22241@pglaf.org>
Message-ID: <2510ddab0801081752i4a01231evc37216380bcb0539@mail.gmail.com>

I think he was talking in the abstract.

On Jan 8, 2008 6:19 PM, Michael Hart <hart at pglaf.org> wrote:
>
> I see all this noise about money.
>
> Just where is this money supposed to be going?
>
> Or coming from, for that matter?
>
>
>
> mh
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From joyce.b.wilson at sbcglobal.net  Wed Jan  9 10:46:20 2008
From: joyce.b.wilson at sbcglobal.net (Joyce Wilson)
Date: Wed, 09 Jan 2008 12:46:20 -0600
Subject: [gutvol-d] Duplicate texts?
Message-ID: <4785167C.1040101@sbcglobal.net>

I ran across a couple of pairs of seemingly identical Chinese texts:

http://www.gutenberg.org/etext/24120
and
http://www.gutenberg.org/etext/24140

Also

http://www.gutenberg.org/etext/24112
and
http://www.gutenberg.org/etext/24155

Apologies if this is the wrong place to post about them.
Joyce W

From hart at pglaf.org  Wed Jan  9 11:29:50 2008
From: hart at pglaf.org (Michael Hart)
Date: Wed, 9 Jan 2008 11:29:50 -0800 (PST)
Subject: [gutvol-d] Duplicate texts?
In-Reply-To: <4785167C.1040101@sbcglobal.net>
References: <4785167C.1040101@sbcglobal.net>
Message-ID: <Pine.LNX.4.64.0801091129250.3799@pglaf.org>


Forwarded your note to our CEO and Prof. Mao.

Thanks!!!

Michael S. Hart
Founder
Project Gutenberg

Recommended Books:

Dandelion Wine, by Ray Bradbury:  For The Right Brain
Atlas Shrugged, by Ayn Rand:  For The Left Brain [or both]
Diamond Age, by Neal Stephenson:  To Understand The Internet
The Phantom Tollbooth, by Norton Juster:  Lesson of Life. . .


On Wed, 9 Jan 2008, Joyce Wilson wrote:

> I ran across a couple of pairs of seemingly identical Chinese texts:
>
> http://www.gutenberg.org/etext/24120
> and
> http://www.gutenberg.org/etext/24140
>
> Also
>
> http://www.gutenberg.org/etext/24112
> and
> http://www.gutenberg.org/etext/24155
>
> Apologies if this is the wrong place to post about them.
> Joyce W

From Bowerbird at aol.com  Wed Jan  9 13:54:19 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 9 Jan 2008 16:54:19 EST
Subject: [gutvol-d] an insistence on doing things the hard way
Message-ID: <bd6.24bf75c7.34b69c8b@aol.com>

please.

what do you do with people who insist on doing things the hard way?

i dunno.

d.p. is persisting in their attempts to find a way of measuring the
"confidence" that a certain page has been proofed "enough" times.

now they put out a call for "statisticians and data analysis gurus"
to help them solve this problem, which they believe must be done
before they can implement a "roundless" system.

how stupid.   how silly.

the "secret" is very simple, and i've given it to them repeatedly...

they're asking the wrong question.   you know that a page is "done"
when a certain number of people -- pick a number, any number --
have looked at the page and can no longer find anything to correct.

you don't need any fancy statistics.   you don't need _any_ statistics.
you just need to see whether any changes were made to the page...

and you don't even need to know _what_ was changed,
you only need to know _whether_ any change was made,
so you need nothing more than a simple equivalence test:
>    if before-text = after-text then no changes were made.

_any_ time a page is changed -- _any_ change, on _any_ page --
that change should be reviewed to make sure that it was correct.

if you don't have that as a solid policy which can never be violated,
you're going to have errors slipping through.   it's purely inevitable.

but if you _do_ have that as a solid policy, you need no other policy.
the only question remaining is how many times it must be verified...
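
The rule above needs no statistics, only an equality test. Here is a minimal Python sketch of it. It assumes each proofing pass is recorded as a (before, after) pair of page text. `page_is_done` and `required_clean` are hypothetical names. `required_clean` is the "pick a number, any number" from above.

```python
from typing import List, Tuple


def page_is_done(passes: List[Tuple[str, str]], required_clean: int = 2) -> bool:
    """True once the most recent `required_clean` passes changed nothing.

    Any pass that edits the page resets the count, so every change is
    looked at by at least `required_clean` later proofers.
    """
    clean_streak = 0
    for before, after in passes:
        if before == after:   # simple equivalence test: no change was made
            clean_streak += 1
        else:                 # a change was made, so it must be reviewed
            clean_streak = 0
    return clean_streak >= required_clean
```

The only tuning knob is how many clean passes count as "enough".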

i told them this years ago.   and i am telling them _again_ right now.
and i'll probably have to repeat it still another time, years from now.

what do you do with people who insist on doing things the hard way?

i dunno.   maybe feel sorry for them because they're stupid?
or mock them because they're stupid?   i dunno.   you tell me.

thank you.

-bowerbird

p.s.   and yes, this is _doubly_ stupid in light of the research i just posted
showing that -- after a good post-o.c.r. clean-up program -- most pages
won't have _any_ errors in them, and the ones that do will get a laser-focus.

p.p.s.   and the doubling cube of stupidity does another flip when they add in
the "confidence score" they want to assign to each and every _proofer_ as well.
i won't even bother to deal with that asinine nonsense...


From schultzk at uni-trier.de  Thu Jan 10 02:56:26 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 10 Jan 2008 11:56:26 +0100
Subject: [gutvol-d] an insistence on doing things the hard way
In-Reply-To: <bd6.24bf75c7.34b69c8b@aol.com>
References: <bd6.24bf75c7.34b69c8b@aol.com>
Message-ID: <1CD6168E-A541-46E1-A9A5-936EFDDF25B5@uni-trier.de>

Hi Bowerbird,

	I will disagree with you here.

	You are right that it could work out, but
	quality could suffer. A lot of coulds.

	The DP approach could prove to be better and
	does have its very good caveats. It all depends
	on the implementation and the design of their
	parameters, none of which I can say anything about.

	Yet, as we both know DP and the way they handle
	things, they will not get it right.

	They do not need "statisticians and data analysis gurus",
	but a good linguist. A linguist will be well acquainted
	with proofing and know more than enough about data
	analysis and statistics. Even better would be a computational
	linguist, like me, but I do not like the DP way of things.

	Before the flames come in about my stance on DP, let
	me say DP is doing a great job of producing texts.
	regards
		Keith.

	
On 09.01.2008 at 22:54, Bowerbird at aol.com wrote:

> please.
>
> what do you do with people who insist on doing things the hard way?
>
> i dunno.
>
> d.p. is persisting in their attempts to find a way of measuring the
> "confidence" that a certain page has been proofed "enough" times.
>
> now they put out a call for "statisticians and data analysis gurus"
> to help them solve this problem, which they believe must be done
> before they can implement a "roundless" system.
>
> how stupid.  how silly.
>
> the "secret" is very simple, and i've given it to them repeatedly...
>
> they're asking the wrong question.  you know that a page is "done"
> when a certain number of people -- pick a number, any number --
> have looked at the page and can no longer find anything to correct.
>
> you don't need any fancy statistics.  you don't need _any_ statistics.
> you just need to see whether any changes were made to the page...
>
> and you don't even need to know _what_ was changed,
> you only need to know _whether_ any change was made,
> so you need nothing more than a simple equivalence test:
> >   if before-text = after-text then no changes were made.
>
> _any_ time a page is changed -- _any_ change, on _any_ page --
> that change should be reviewed to make sure that it was correct.
>
> if you don't have that as a solid policy which can never be violated,
> you're going to have errors slipping through.  it's purely inevitable.
>
> but if you _do_ have that as a solid policy, you need no other policy.
> the only question remaining is how many times it must be verified...
>
> i told them this years ago.  and i am telling them _again_ right now.
> and i'll probably have to repeat it still another time, years from now.
>
> what do you do with people who insist on doing things the hard way?
>
> i dunno.  maybe feel sorry for them because they're stupid?
> or mock them because they're stupid?  i dunno.  you tell me.
>
> thank you.
>
> -bowerbird
>
> p.s.  and yes, this is _doubly_ stupid in light of the research i just posted
> showing that -- after a good post-o.c.r. clean-up program -- most pages
> won't have _any_ errors in them, and the ones that do will get a laser-focus.
>
> p.p.s.  and the doubling cube of stupidity does another flip when they add in
> the "confidence score" they want to assign to each and every _proofer_ as well.
> i won't even bother to deal with that asinine nonsense...
>
>
>

From Bowerbird at aol.com  Thu Jan 10 10:43:45 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 10 Jan 2008 13:43:45 EST
Subject: [gutvol-d] chinese versus english
Message-ID: <c3d.29d640e0.34b7c161@aol.com>

after several days where english swamped chinese
-- that d.p. is an awesome digitizing machine --
chinese makes a surge back today to take it 11-4.

-bowerbird


From Bowerbird at aol.com  Fri Jan 11 13:33:34 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 11 Jan 2008 16:33:34 EST
Subject: [gutvol-d] state of the art
Message-ID: <bf3.209091b5.34b93aae@aol.com>

please.

next week, i'll conclude my "state of the art" report...

the wrap-up will include topics such as the front-matter,
a g.u.i. for making corrections, and conversion to z.m.l.,
including the preparation for "continuous proofreading".

if anyone has any questions or comments, you can
post them here or send them to me backchannel...

the conclusion, however, should already be very clear.
this comparison methodology works extremely well,
and it is an order of magnitude more _efficient_ than
the old system of proofing every word on every page.

this has been demonstrated in 3 separate tests now,
with stark and striking results obtained in each one...

i will do one or two additional replications, and then
turn my focus to perfecting tools to make it happen...

have a nice weekend.

thank you.

-bowerbird


From ricardofdiogo at gmail.com  Sun Jan 13 18:44:57 2008
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Mon, 14 Jan 2008 02:44:57 +0000
Subject: [gutvol-d] #23961 copyrighted remove from catalog
Message-ID: <9c6138c50801131844r378d0efeh58c14855e902b93f@mail.gmail.com>

Etext #23961 (Manifesto Anti-Dantas) is not in the public domain in
the USA under the pre-1923 rule. Unless the editor gave his
permission, it must be removed from the catalog. 1916 is _NOT_ the
publication date. It's part of the text.

In cases where the publishing date is not prominent on Portuguese
title pages, you are most welcome to ask for my help before giving the
clearance line.

Ricardo

From Catenacci at Ieee.Org  Mon Jan 14 05:44:47 2008
From: Catenacci at Ieee.Org (Onorio Catenacci)
Date: Mon, 14 Jan 2008 08:44:47 -0500
Subject: [gutvol-d] Status of Magazine Articles
Message-ID: <c26320b80801140544p40b0f85at98258aea912f4efe@mail.gmail.com>

Hi all,

Apologies in advance if this has been discussed recently.  I've not
been getting the PG Volunteer Discussion List mails for some time due
to some other priorities taking my time.  Also, if this is answered on
a FAQ, please just point me to it.

I'd like to digitize articles from a magazine from 1929.  I'm fairly
sure that the magazine did not pay its writers so I'm also fairly sure
that the writers would have retained copyright.  However, there's no
way that I know of to be certain that the writers were not paid for
their work.

Is there any way that I can confirm the copyright status of these
individual articles?  Assuming the authors were not paid for their
work, would the authors (or their estates) retain copyright?   I mean
I know that copyright laws have changed since 1929 but I'd think the
work for hire aspects would have been the same even then.  I've
managed to get hold of the family of one of the authors and they've
tentatively given me permission to reprint the article; I would have
asked for their permission even if I were sure the article is in the
Public Domain because it just seems rude to me not to ask and I don't
believe that asking permission changes the basic copyright status of
the article either way.  I just want to make sure I'm not opening
myself up to a copyright infringement lawsuit from some other relative
who may think they can make a fast buck.

Any advice would be greatly appreciated.  If anyone knows of a
directory of lawyers that know IP law, that'd be fine with me; I don't
mind paying for legal advice.  I just don't know where I would find
lawyers that have expertise in this area of law.


-- 
Onorio Catenacci III

From grythumn at gmail.com  Mon Jan 14 05:51:21 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Mon, 14 Jan 2008 08:51:21 -0500
Subject: [gutvol-d] Status of Magazine Articles
In-Reply-To: <c26320b80801140544p40b0f85at98258aea912f4efe@mail.gmail.com>
References: <c26320b80801140544p40b0f85at98258aea912f4efe@mail.gmail.com>
Message-ID: <15cfa2a50801140551p2de7f0c8ie7ce70ba39fb6518@mail.gmail.com>

On Jan 14, 2008 8:44 AM, Onorio Catenacci <Catenacci at ieee.org> wrote:
> I'd like to digitize articles from a magazine from 1929.  I'm fairly
> sure that the magazine did not pay its writers so I'm also fairly sure
> that the writers would have retained copyright.  However, there's no
> way that I know of to be certain that the writers were not paid for

Was it published in the US, and were the authors of US birth? If so,
it may be clearable under Rule 6. The research is fairly long, but it
is where the majority of the SF is coming from. Also, check to see if
they have copyright notices; some of the amateur magazines can be
cleared that way.

R C

From Catenacci at Ieee.Org  Mon Jan 14 06:07:34 2008
From: Catenacci at Ieee.Org (Onorio Catenacci)
Date: Mon, 14 Jan 2008 09:07:34 -0500
Subject: [gutvol-d] Status of Magazine Articles
In-Reply-To: <15cfa2a50801140551p2de7f0c8ie7ce70ba39fb6518@mail.gmail.com>
References: <c26320b80801140544p40b0f85at98258aea912f4efe@mail.gmail.com>
	<15cfa2a50801140551p2de7f0c8ie7ce70ba39fb6518@mail.gmail.com>
Message-ID: <c26320b80801140607l6eba4b7fhf5ee5f8e878f0d72@mail.gmail.com>

On Jan 14, 2008 8:51 AM, Robert Cicconetti <grythumn at gmail.com> wrote:
> On Jan 14, 2008 8:44 AM, Onorio Catenacci <Catenacci at ieee.org> wrote:
> > I'd like to digitize articles from a magazine from 1929.  I'm fairly
> > sure that the magazine did not pay its writers so I'm also fairly sure
> > that the writers would have retained copyright.  However, there's no
> > way that I know of to be certain that the writers were not paid for
>
> Was it published in the US, and were the authors of US birth? If so,
> it may be clearable under Rule 6. The research is fairly long, but it
> is where the majority of the SF is coming from. Also, check to see if
> they have copyright notices.. some of the amateur magazines can be
> cleared that way.
>

Yes published in the US and yes authors were US born.  If anyone doing
the research for the SF stuff could point me to resources to check
this, I would appreciate the help.  When I  looked through the FAQ,
Rule 6 seemed most applicable to me but I have to confess it wasn't
quite clear to me.

I did look at the table of contents for the magazine and I believe the
publication info was at the bottom of that page.  I didn't see a
copyright notice but I may have missed it.  I do believe that this
magazine was mostly written by amateur authors interested in the hobby
that the magazine was discussing.

-- 
Onorio Catenacci III

From greg at durendal.org  Mon Jan 14 06:10:39 2008
From: greg at durendal.org (Greg Weeks)
Date: Mon, 14 Jan 2008 09:10:39 -0500 (EST)
Subject: [gutvol-d] Status of Magazine Articles
In-Reply-To: <c26320b80801140544p40b0f85at98258aea912f4efe@mail.gmail.com>
References: <c26320b80801140544p40b0f85at98258aea912f4efe@mail.gmail.com>
Message-ID: <Pine.LNX.4.63.0801140908530.31504@durendal.durendal.org>

On Mon, 14 Jan 2008, Onorio Catenacci wrote:

> Is there any way that I can confirm the copyright status of these
> individual articles?  Assuming the authors were not paid for their
> work, would the authors (or their estates) retain copyright?   I mean
> I know that copyright laws have changed since 1929 but I'd think the
> work for hire aspects would have been the same even then.  I've
> managed to get hold of the family of one of the authors and they've
> tentatively given me permission to reprint the article; I would have
> asked for their permission even if I were sure the article is in the

I've had some luck chasing down the current owners of the magazines and 
asking them. F&SF/Venture and Analog both responded to my inquiries. I've 
been ignored a lot too though.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From greg at durendal.org  Mon Jan 14 06:13:06 2008
From: greg at durendal.org (Greg Weeks)
Date: Mon, 14 Jan 2008 09:13:06 -0500 (EST)
Subject: [gutvol-d] Status of Magazine Articles
In-Reply-To: <c26320b80801140607l6eba4b7fhf5ee5f8e878f0d72@mail.gmail.com>
References: <c26320b80801140544p40b0f85at98258aea912f4efe@mail.gmail.com>
	<15cfa2a50801140551p2de7f0c8ie7ce70ba39fb6518@mail.gmail.com>
	<c26320b80801140607l6eba4b7fhf5ee5f8e878f0d72@mail.gmail.com>
Message-ID: <Pine.LNX.4.63.0801140911060.31504@durendal.durendal.org>

On Mon, 14 Jan 2008, Onorio Catenacci wrote:

> I did look at table of contents for the magazine and I believe the
> publication info was at the bottom of that page.  I didn't see a
> copyright notice but I may have missed it.  I do believe that this
> magazine was mostly written by amateur authors interested in the hobby
> that the magazine was discussing.

It sounds like a good candidate for Rule 5, no copyright notice. I've 
cleared at least one issue of Astounding that way. Look at the cover and a
few pages around the table of contents page to be sure. I've never seen
the copyright notice on a magazine anywhere else but the table of
contents page.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From grythumn at gmail.com  Mon Jan 14 07:54:55 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Mon, 14 Jan 2008 10:54:55 -0500
Subject: [gutvol-d] #23961 copyrighted remove from catalog
In-Reply-To: <9c6138c50801131844r378d0efeh58c14855e902b93f@mail.gmail.com>
References: <9c6138c50801131844r378d0efeh58c14855e902b93f@mail.gmail.com>
Message-ID: <15cfa2a50801140754g71d9ebf8v51badf4be13fd011@mail.gmail.com>

How do you know this is a rule 1? Do you have the clearance key?

R C

On Jan 13, 2008 9:44 PM, Ricardo F Diogo <ricardofdiogo at gmail.com> wrote:
> Etext #23961 (Manifesto Anti-Dantas) is not in the public domain in
> the USA under the pre-1923 rule. Unless the editor gave his
> permission, it must be removed from the catalog. 1916 is _NOT_ the
> publication date. It's part of the text.

From ricardofdiogo at gmail.com  Mon Jan 14 08:15:12 2008
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Mon, 14 Jan 2008 16:15:12 +0000
Subject: [gutvol-d] #23961 copyrighted remove from catalog
In-Reply-To: <15cfa2a50801140754g71d9ebf8v51badf4be13fd011@mail.gmail.com>
References: <9c6138c50801131844r378d0efeh58c14855e902b93f@mail.gmail.com>
	<15cfa2a50801140754g71d9ebf8v51badf4be13fd011@mail.gmail.com>
Message-ID: <9c6138c50801140815j6f4e6f76qe277aeeba94772eb@mail.gmail.com>

2008/1/14, Robert Cicconetti <grythumn at gmail.com>:
> How do you know this is a rule 1? Do you have the clearance key?
>
> R C
>

I don't. You can call it intuition.

Ricardo

From gbnewby at pglaf.org  Mon Jan 14 10:51:06 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 14 Jan 2008 10:51:06 -0800
Subject: [gutvol-d] #23961 copyrighted remove from catalog
In-Reply-To: <9c6138c50801131844r378d0efeh58c14855e902b93f@mail.gmail.com>
References: <9c6138c50801131844r378d0efeh58c14855e902b93f@mail.gmail.com>
Message-ID: <20080114185106.GB12200@mail.pglaf.org>

On Mon, Jan 14, 2008 at 02:44:57AM +0000, Ricardo F Diogo wrote:
> Etext #23961 (Manifesto Anti-Dantas) is not in the public domain in
> the USA under the pre-1923 rule. Unless the editor gave his
> permission, it must be removed from the catalog. 1916 is _NOT_ the
> publication date. It's part of the text.
> 
> In cases where the publishing date is not prominent in Portuguese
> Title Pages, you are much welcome to ask for my help before giving the
> clearance line.
> 
> Ricardo

Thanks for your note, Ricardo.  Please email copyright at pglaf.org (which
goes to me & Juliet, who perform the clearances) for copyright
info/inquiries.

For this item, the library catalog says it was published in 1916.
Here's a long link:
  http://opac.porbase.org/ipac20/ipac.jsp?session=T1981772V77E7.45462&profile=porbase&uri=full=3100024@!1405290@!2&ri=1&aspect=power&menu=search&source=192.168.0.17@!porbase&ipp=20&staffonly=&term=manifesto+anti&index=.TW&uindex=&oper=and&term=almada+negreiros&index=.AW&uindex=&aspect=power&menu=search&ri=1 

When do you think it was published?
  -- Greg


From Bowerbird at aol.com  Mon Jan 14 11:43:02 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 14 Jan 2008 14:43:02 EST
Subject: [gutvol-d] a quick update on "the stare of the art of digitization"
Message-ID: <c67.26496ac5.34bd1546@aol.com>

this is a quick update on "the stare of the art of digitization".

first, i've determined that the two p.g. versions of "moby dick"
have substantial differences, even though one of them was said
to be based -- in part -- on the other, so i've decided to use them
_both_ as comparison criteria against the one scanned by o.c.a.

i'll let you know how that goes...

***

also, i went out to gather material for the next replication of
my research, and was pleasantly surprised by what i found...

i looked up "books and culture", which was _the_ first book
from the public-domain that google made publicly available.

google now offers _five_ (count 'em, 5) scan-sets of this book!
(those are the "full view" ones; they have "no preview" ones too.)
2 from umichigan, 2 from stanford, and 1 from the n.y. public...

in addition, the o.c.a. offers 1 from the university of california and
1 from the university of toronto.   (and i expect more from the u.c.)

so we have plenty of versions with which to do comparisons on this,
plus it indicates we might enjoy a similar plenitude on other books.
with multiple sets of o.c.r. for one book, the results will be awesome!
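
Several o.c.r. sets of the same book could be combined with a simple line-by-line majority vote. This is a sketch of that technique. The technique itself is an assumption and is not described above. It also assumes the scan-sets have already been aligned line for line, which in practice is the hard part. `vote_lines` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional


def vote_lines(versions: List[List[str]]) -> List[Optional[str]]:
    """Majority vote across aligned OCR versions of the same book.

    Assumes every version has the same number of lines, already aligned.
    Returns the winning text for each line, or None when no reading has
    a strict majority (those lines would go to a human).
    """
    if len({len(v) for v in versions}) > 1:
        raise ValueError("versions must be aligned to the same line count")
    result: List[Optional[str]] = []
    for readings in zip(*versions):
        text, count = Counter(readings).most_common(1)[0]
        result.append(text if count * 2 > len(readings) else None)
    return result
```

With only two sets, any disagreement comes back as None. This is the case where a third set pays off.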

i knew this development would come to pass sooner or later, but it's
nice to know that it's already happened; it's tremendously good news!

now...   if only google and the o.c.a. would get their _shit_together_
and stop dropping characters (like em-dashes and quote-marks),
we could get to the job of cleaning the o.c.r. from millions of books.

-bowerbird


From Bowerbird at aol.com  Mon Jan 14 11:45:23 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 14 Jan 2008 14:45:23 EST
Subject: [gutvol-d] "stare of the art" -- ha ha!
Message-ID: <d49.1d3a2b30.34bd15d3@aol.com>

"stare of the art" -- ha ha!

it's a good thing i'm not a proofreader!        ;+)

-bowerbird


From Bowerbird at aol.com  Mon Jan 14 13:24:31 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 14 Jan 2008 16:24:31 EST
Subject: [gutvol-d] a quick update on "the state of the art of
	digitization"
Message-ID: <c86.2345f255.34bd2d0f@aol.com>

i said:
>    google now offers _five_ (count 'em, 5) scan-sets of this book!
>    ...
>    in addition, the o.c.a. offers 1 from the university of california and
>    1 from the university of toronto.   (and i expect more from the u.c.)

i couldn't resist a quick comparison of the 2 sets of o.c.r. from the o.c.a.

remember, one of these books was scanned at the university of toronto,
and the other was scanned at the university of california.

i did only a rudimentary clean-up on each, but it's already the case that
there are >6,000 lines in common between them, and <200 different...
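
A tally of lines in common versus lines different can be approximated with a line-level diff. Here is a minimal sketch using Python's standard `difflib`. It assumes both o.c.r. files have had the same clean-up and are split into lines. `compare_ocr` is a hypothetical helper, and this is not necessarily how the counts above were produced.

```python
import difflib
from typing import List, Tuple


def compare_ocr(a: List[str], b: List[str]) -> Tuple[int, List[Tuple[str, List[str], List[str]]]]:
    """Return (lines in common, differing hunks) for two OCR texts of one book."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    common = 0
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            common += i2 - i1
        else:
            # replace / delete / insert: these are the lines a person checks
            hunks.append((tag, a[i1:i2], b[j1:j2]))
    return common, hunks
```

Under this approach, only the lines in the returned hunks need a human eye. The lines both scans agree on are set aside.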

i say, this comparison method is looking better and better all the time...   :+)

-bowerbird


From hart at pglaf.org  Mon Jan 14 13:26:58 2008
From: hart at pglaf.org (Michael Hart)
Date: Mon, 14 Jan 2008 13:26:58 -0800 (PST)
Subject: [gutvol-d] !@! GUARDIAN says PG "World's Best Public Library
Message-ID: <Pine.LNX.4.64.0801141326410.16785@pglaf.org>


http://www.guardian.co.uk/technology/2008/jan/14/project.gutenberg?gusrc=rss&feed=technology

From Bowerbird at aol.com  Tue Jan 15 13:11:11 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 15 Jan 2008 16:11:11 EST
Subject: [gutvol-d] happy birthday
Message-ID: <cce.2381e843.34be7b6f@aol.com>

please.

>    I Have a Dream
>
>    by Martin Luther King, Jr.
>
>    excerpts from his speech of 28 August 1963,
>    at the Lincoln Memorial, in Washington D.C.
>
>
>    Let us not wallow in the valley of despair,
>    I say to you today, my friends.
>
>    ...so even though we face the difficulties of today and tomorrow,
>    I still have a dream.
>    It is a dream deeply rooted in the American dream.
>
>    I have a dream that one day this nation will rise up
>    and live out the true meaning of its creed:
>    "We hold these truths to be self-evident,
>    that all people are created equal."
>
>    I have a dream that one day
>    on the red hills of Georgia,
>    the sons of former slaves and
>    the sons of former slave owners will be able to
>    sit down together at the table of brotherhood.
>
>    I have a dream that one day even the state of Mississippi,
>    a state sweltering with the heat of injustice,
>    sweltering with the heat of oppression,
>    will be transformed into an oasis of freedom and justice.
>
>    I have a dream that my four little children
>    will one day live in a world where
>    they will not be judged by the color of their skin but
>    by the content of their character.
>
>    I have a dream today!
>    that little black boys
>    and black girls
>    will be able to
>    join hands with
>    little white boys
>    and white girls
>    as sisters and brothers.
>
>    I have a dream today!

happy birthday, dr. king...
thank you for your dream!

-bowerbird


From julio.reis at tintazul.com.pt  Wed Jan 16 13:48:53 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Wed, 16 Jan 2008 21:48:53 +0000
Subject: [gutvol-d] happy birthday
In-Reply-To: <mailman.2.1200513602.6672.gutvol-d@lists.pglaf.org>
References: <mailman.2.1200513602.6672.gutvol-d@lists.pglaf.org>
Message-ID: <1200520134.7247.152.camel@abetarda>

... and no luck in getting the corresponding e-text back on-line?

> > >   I Have a Dream
> > >
> > >   by Martin Luther King, Jr.


From gbnewby at pglaf.org  Wed Jan 16 14:40:23 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed, 16 Jan 2008 14:40:23 -0800
Subject: [gutvol-d] happy birthday
In-Reply-To: <1200520134.7247.152.camel@abetarda>
References: <mailman.2.1200513602.6672.gutvol-d@lists.pglaf.org>
	<1200520134.7247.152.camel@abetarda>
Message-ID: <20080116224023.GA10033@mail.pglaf.org>

On Wed, Jan 16, 2008 at 09:48:53PM +0000, Júlio Reis wrote:
> ... and no luck in getting the corresponding e-text back on-line?
> 
> > > >   I Have a Dream
> > > >
> > > >   by Martin Luther King, Jr.

There was a legal case (not involving PG) in which it was
determined that the speech is still under copyright protection.
So, PG removed it from our archive (years ago).

Under current copyright laws, it will be a while before the speech
enters the public domain.  I don't think we ever asked, but the
King estate does not seem keen on granting redistribution rights.
  -- Greg


From Bowerbird at aol.com  Wed Jan 16 15:32:49 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 16 Jan 2008 18:32:49 EST
Subject: [gutvol-d] happy birthday
Message-ID: <d2c.1d78df45.34bfee21@aol.com>

please.

i'll send a copy of the whole thing to anyone who asks.

or, you know, you can find it yourself, on the internet,
just like i did.   (i hit the first site google regurgitated.)

imagine someone trying to tell me i can't copy a speech
that was given to _me_, about a _dream_ given to _me_.

ha!   try an' stop me.

free at last, free at last,
thank god almighty, we're free at last...

-bowerbird

p.s.   i kiss my girl any time i want, for as long as she likes.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080116/4efcdaff/attachment.htm 

From schultzk at uni-trier.de  Thu Jan 17 00:22:12 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 17 Jan 2008 09:22:12 +0100
Subject: [gutvol-d] happy birthday
In-Reply-To: <20080116224023.GA10033@mail.pglaf.org>
References: <mailman.2.1200513602.6672.gutvol-d@lists.pglaf.org>
	<1200520134.7247.152.camel@abetarda>
	<20080116224023.GA10033@mail.pglaf.org>
Message-ID: <89624D09-10C0-45F9-82CE-B0838B6B5FFD@uni-trier.de>

Hi There,

	I am not sure what that case was about, but
	a speech made in public is in the public domain.
	Furthermore, King was then a public figure, and
	therefore his speech is even more public.

	The speech itself is public domain. What is not
	public domain per se are publications thereof.

	So what was the source of the PG version?

	regards
		Keith.

Am 16.01.2008 um 23:40 schrieb Greg Newby:

> On Wed, Jan 16, 2008 at 09:48:53PM +0000, Júlio Reis wrote:
>> ... and no luck in getting the corresponding e-text back on-line?
>>
>>>>>   I Have a Dream
>>>>>
>>>>>   by Martin Luther King, Jr.
>
> There was a legal case (not involving PG) in which it was
> determined that the speech is still under copyright protection.
> So, PG removed it from our archive (years ago).
>
> Under current copyright laws, it will be a while before the speech
> enters the public domain.  I don't think we ever asked, but the
> King estate does not seem keen on granting redistribution rights.
>   -- Greg
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d


From gbnewby at pglaf.org  Thu Jan 17 10:38:11 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu, 17 Jan 2008 10:38:11 -0800
Subject: [gutvol-d] happy birthday
In-Reply-To: <89624D09-10C0-45F9-82CE-B0838B6B5FFD@uni-trier.de>
References: <mailman.2.1200513602.6672.gutvol-d@lists.pglaf.org>
	<1200520134.7247.152.camel@abetarda>
	<20080116224023.GA10033@mail.pglaf.org>
	<89624D09-10C0-45F9-82CE-B0838B6B5FFD@uni-trier.de>
Message-ID: <20080117183810.GC27689@mail.pglaf.org>

On Thu, Jan 17, 2008 at 09:22:12AM +0100, Schultz Keith J. wrote:
> Hi There,
> 
> 	I am not sure what that case was about, but
> 	a speech made in public is in the public domain.
> 	Furthermore, King was then a public figure, and
> 	therefore his speech is even more public.
> 
> 	The speech itself is public domain. What is not
> 	public domain per se are publications thereof.

Feel free (encouraged) to research the case and fight it
out with the King estate on our behalf.

Maybe it's public domain in other countries?

> 	So what was the source of the PG version?

Dunno.  Michael Hart probably knows.
  -- Greg

> 	regards
> 		Keith.
> 
> Am 16.01.2008 um 23:40 schrieb Greg Newby:
> 
> >On Wed, Jan 16, 2008 at 09:48:53PM +0000, Júlio Reis wrote:
> >>... and no luck in getting the corresponding e-text back on-line?
> >>
> >>>>>  I Have a Dream
> >>>>>
> >>>>>  by Martin Luther King, Jr.
> >
> >There was a legal case (not involving PG) in which it was
> >determined that the speech is still under copyright protection.
> >So, PG removed it from our archive (years ago).
> >
> >Under current copyright laws, it will be a while before the speech
> >enters the public domain.  I don't think we ever asked, but the
> >King estate does not seem keen on granting redistribution rights.
> >  -- Greg

From Bowerbird at aol.com  Thu Jan 17 11:28:09 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 17 Jan 2008 14:28:09 EST
Subject: [gutvol-d] happy birthday
Message-ID: <d68.1de3dd22.34c10649@aol.com>

greg said:
>    Dunno.  Michael Hart probably knows.

the story i remember michael telling is that
coretta scott king wanted to shake p.g. down.
(i'm sure he used a more delicate phrasing...)

-bowerbird


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080117/b6237651/attachment.htm 

From hart at pglaf.org  Thu Jan 17 14:43:59 2008
From: hart at pglaf.org (Michael Hart)
Date: Thu, 17 Jan 2008 14:43:59 -0800 (PST)
Subject: [gutvol-d] happy birthday
In-Reply-To: <d68.1de3dd22.34c10649@aol.com>
References: <d68.1de3dd22.34c10649@aol.com>
Message-ID: <Pine.LNX.4.64.0801171442270.31538@pglaf.org>


Just the opposite. . .The King Estate NEVER gave PG any grief;
we just took the Dream speech down when we learned of the final 
judicial events, after several reversals in favor of CBS.

mh

On Thu, 17 Jan 2008, Bowerbird at aol.com wrote:

> greg said:
>>    Dunno.  Michael Hart probably knows.
>
> the story i remember michael telling is that
> coretta scott king wanted to shake p.g. down.
> (i'm sure he used a more delicate phrasing...)
>
> -bowerbird
>

From Bowerbird at aol.com  Thu Jan 17 15:05:57 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 17 Jan 2008 18:05:57 EST
Subject: [gutvol-d] happy birthday
Message-ID: <c5e.291d75fd.34c13955@aol.com>

michael said:
>    Just the opposite. . .The King Estate NEVER gave PG any grief;
>    we just took the Dream speech down when we learned of the
>    final judicial events, after several reversals in favor of CBS.

oh, ok.   my apologies for misremembering what transpired...

but michael, since that is the case, you might want to see what
the lawyer for the king estate said, when the suit was brought:
>    "Let's talk about who's being greedy," Beck said. 
>    "We give the speech to schools for free. 
>    We give the speech to non-profits and churches for free. 
>    CBS -- they don't deny it -- charges $1,000 a minute for 
>    a public school to have access to 'I Have a Dream.'" 
>    http://www.cnn.com/US/9905/11/king.speech.02/index.html

given p.g.'s status as a non-profit library, i'd guess you're safe...

-bowerbird


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080117/a0b32296/attachment.htm 

From ricardofdiogo at gmail.com  Fri Jan 18 18:02:35 2008
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Sat, 19 Jan 2008 02:02:35 +0000
Subject: [gutvol-d] Portuguese blog on ebooks
Message-ID: <9c6138c50801181802q65e6d338ye12afce2313b8fc5@mail.gmail.com>

Everyone has a blog. _I have the right to have one too!!_ Just in case
there's some Portuguese-speaking volunteer around, here's my new blog
on ebooks:

http://ler-digital.blogspot.com/

Ricardo

From julio.reis at tintazul.com.pt  Sun Jan 20 07:09:04 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Sun, 20 Jan 2008 15:09:04 +0000
Subject: [gutvol-d] I Have A Dream
In-Reply-To: <mailman.2.1200686402.16698.gutvol-d@lists.pglaf.org>
References: <mailman.2.1200686402.16698.gutvol-d@lists.pglaf.org>
Message-ID: <1200841744.9965.31.camel@abetarda>

So how about someone asking the King Estate? I'd be happy to translate
it into Portuguese.

Júlio.

> given p.g.'s status as a non-profit library, i'd guess you're safe...


From prosfilaes at gmail.com  Sun Jan 20 07:54:05 2008
From: prosfilaes at gmail.com (David Starner)
Date: Sun, 20 Jan 2008 10:54:05 -0500
Subject: [gutvol-d] I Have A Dream
In-Reply-To: <1200841744.9965.31.camel@abetarda>
References: <mailman.2.1200686402.16698.gutvol-d@lists.pglaf.org>
	<1200841744.9965.31.camel@abetarda>
Message-ID: <6d99d1fd0801200754t704eb98bs19db19cb37135865@mail.gmail.com>

On Jan 20, 2008 10:09 AM, Júlio Reis <julio.reis at tintazul.com.pt> wrote:
> So how about someone asking the King Estate? I'd be happy to translate
> it into Portuguese.

(a) The reason we know it's copyrighted is because the King Estate
spent lots of money litigating it. There are enough examples of the
King Estate being restrictive on reuse to negate the interest most of
us might have in asking.

(b) I find it highly unlikely that, even if they gave us permission to
host the speech, they would let us make derivative works, including
translations.

From f.fuchs at gmx.net  Sun Jan 20 12:06:30 2008
From: f.fuchs at gmx.net (Franz Fuchs)
Date: Sun, 20 Jan 2008 21:06:30 +0100
Subject: [gutvol-d] YouTube: A Librarian Reviews the XO Laptop
References: <c26320b80801140544p40b0f85at98258aea912f4efe@mail.gmail.com>
Message-ID: <001801c85b9f$fd0eea80$8c00000a@frf>

http://youtube.com/watch?v=quJIAucDOU0

---
I wish all the best for the One Laptop Per Child Foundation. Nicholas
Negroponte is a true visionary and I applaud his efforts to help provide an
education to children in developing nations. I wonder if he and others
working with the OLPC realize how much they are educating adults in this
nation as we partner with them to help bridge the gap in the digital divide.
http://librarianbydesign.blogspot.com/

Added: January 13, 2008
---

Best regards
FrF


From Bowerbird at aol.com  Fri Jan 25 11:09:32 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 25 Jan 2008 14:09:32 EST
Subject: [gutvol-d] author pirates his own books
Message-ID: <bde.27e760fe.34cb8dec@aol.com>

author pirates his own books, and increases sales dramatically:
>    http://torrentfreak.com/alchemist-author-pirates-own-books-080124/

-bowerbird


**************
Biggest Grammy Award surprises of all time on AOL Music.
     
(http://music.aol.com/grammys/pictures/never-won-a-grammy?NCID=aolcmp00300000002548)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080125/9e727373/attachment.htm 

From ricardofdiogo at gmail.com  Fri Jan 25 12:53:29 2008
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Fri, 25 Jan 2008 20:53:29 +0000
Subject: [gutvol-d] author pirates his own books
In-Reply-To: <bde.27e760fe.34cb8dec@aol.com>
References: <bde.27e760fe.34cb8dec@aol.com>
Message-ID: <9c6138c50801251253k1166141fo96c4e7b2a127b35@mail.gmail.com>

Yes. He also told me that PG is allowed to distribute his books. I'm
waiting for an answer from Greg so that Coelho can send the permission
letter to PG.

Ricardo

2008/1/25, Bowerbird at aol.com <Bowerbird at aol.com>:
> author pirates his own books, and increases sales dramatically:
> >    http://torrentfreak.com/alchemist-author-pirates-own-books-080124/
>
>  -bowerbird

From hart at pglaf.org  Fri Jan 25 13:15:32 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 25 Jan 2008 13:15:32 -0800 (PST)
Subject: [gutvol-d] author pirates his own books
In-Reply-To: <9c6138c50801251253k1166141fo96c4e7b2a127b35@mail.gmail.com>
References: <bde.27e760fe.34cb8dec@aol.com>
	<9c6138c50801251253k1166141fo96c4e7b2a127b35@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0801251314080.11547@pglaf.org>


You can have an answer from me.

Go for it!


And let's be sure to mention all this INSIDE the books.

Perhaps get permission from the author of one or two articles
that mention the whole process. . . .?


Michael

On Fri, 25 Jan 2008, Ricardo F Diogo wrote:

> Yes. He also told me that PG is allowed to distribute his books. I'm
> waiting for an answer from Greg so that Coelho can send the permission
> letter to PG.
>
> Ricardo
>
> 2008/1/25, Bowerbird at aol.com <Bowerbird at aol.com>:
>> author pirates his own books, and increases sales dramatically:
>> >    http://torrentfreak.com/alchemist-author-pirates-own-books-080124/
>>
>>  -bowerbird

From ricardofdiogo at gmail.com  Fri Jan 25 13:24:47 2008
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Fri, 25 Jan 2008 21:24:47 +0000
Subject: [gutvol-d] author pirates his own books
In-Reply-To: <Pine.LNX.4.64.0801251314080.11547@pglaf.org>
References: <bde.27e760fe.34cb8dec@aol.com>
	<9c6138c50801251253k1166141fo96c4e7b2a127b35@mail.gmail.com>
	<Pine.LNX.4.64.0801251314080.11547@pglaf.org>
Message-ID: <9c6138c50801251324o4edff427j7c7da588cc1e905a@mail.gmail.com>

Can I ask him to send you the following letter?

Michael S. Hart
Founder, Project Gutenberg
405 West Elm Street
Urbana IL, 61801-3231, USA

Dear Project Gutenberg:

It gives me pleasure to grant Project Gutenberg perpetual,
worldwide, non-exclusive rights to distribute all my books in electronic
form through Project Gutenberg Web sites, CDs or other current and
future formats.  No royalties are due for these rights. The same applies
to end users.

Sincerely,


Will it be enough?

Ricardo

2008/1/25, Michael Hart <hart at pglaf.org>:
>
>
>
> You can have an answer from me.
>
> Go for it!
>
>
> And let's be sure to mention all this INSIDE the books.
>
> Perhaps get permission from the author of one or two articles
> that mention the whole process. . . .?
>
>
> Michael
>
> On Fri, 25 Jan 2008, Ricardo F Diogo wrote:
>
> > Yes. He also told me that PG is allowed to distribute his books. I'm
> > waiting for an answer from Greg so that Coelho can send the permission
> > letter to PG.
> >
> > Ricardo
> >
> > 2008/1/25, Bowerbird at aol.com <Bowerbird at aol.com>:
> >> author pirates his own books, and increases sales dramatically:
> >> >    http://torrentfreak.com/alchemist-author-pirates-own-books-080124/
> >>
> >>  -bowerbird

From Bowerbird at aol.com  Mon Jan 28 13:08:32 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 28 Jan 2008 16:08:32 EST
Subject: [gutvol-d] talk about a thin computer!
Message-ID: <d50.2084aac7.34cf9e50@aol.com>

talk about a thin computer!
>    http://www.youtube.com/watch?v=i6yBo9NPkCQ

-bowerbird


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080128/68ae4e57/attachment.htm 

From Bowerbird at aol.com  Tue Jan 29 15:29:40 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 29 Jan 2008 18:29:40 EST
Subject: [gutvol-d] more on "doing things the hard way"
Message-ID: <d0c.25d0f8e3.34d110e4@aol.com>

remember the project over at d.p. where they're attempting
to determine a "confidence in page accuracy" computation,
to tell if a page is accurate enough, or needs more proofing?

well, i'm sure they're hard at work, cranking their numbers,
but meanwhile, here comes a good observational analysis:

>    The P1/P1/P2   - P3 skips are filtering through to F2 now, 
>    and look to be in pretty good shape.   Thumbs Up
>    I think it is the best initiative we have had for a while. 
>    http://www.pgdp.net/phpBB2/viewtopic.php?p=417654#417654

that's right.   sending text through 2 rounds of p1, then a p2,
results in clean text, probably not all that much different from
a p1-p2-p3 routing.   3 rounds will usually give you clean text,
_even_without_any_"proven-talent"_p3_proofer_in_the_mix_...

whether you can stop after 2, or even 1, is what the question is.

but p1 proofers are _plentiful_, so why not just do p1-p1-p1?
and even just p1-p1 _if_ the second proofer makes no change?

oh yeah, then you'd just be following my rule about consensus...

no complex statistics necessary, just a simple test of equivalence.

you know people _want_ things to be difficult, _like_ them difficult,
when they won't even _try_ the simple way first...               :+)
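the consensus rule above can be sketched in a few lines. this is a
minimal python illustration, not anything d.p. actually runs: it assumes
each proofing round can be modeled as a function that takes a page's text
and returns the revised text. the names `proof_until_consensus`, `Proofer`,
and `max_rounds` (capped at 3 here, to match the three-round routings
discussed above) are all hypothetical. a page stops as soon as a round
after the first leaves the previous round's text unchanged.

```python
from typing import Callable, List, Tuple

# Hypothetical model of one proofing round: page text in, revised text out.
Proofer = Callable[[str], str]


def proof_until_consensus(
    ocr_text: str, proofers: List[Proofer], max_rounds: int = 3
) -> Tuple[str, int]:
    """Run proofing rounds in order, stopping early on consensus.

    Consensus means a round (other than the first) returned exactly the
    text it was given, i.e. two proofers in a row agree on the page.
    Returns the final text and the number of rounds actually used.
    """
    text = ocr_text
    rounds_used = 0
    for proofer in proofers[:max_rounds]:
        revised = proofer(text)
        rounds_used += 1
        # The "simple test of equivalence": no change means agreement.
        if rounds_used > 1 and revised == text:
            return revised, rounds_used
        text = revised
    # No consensus within max_rounds; hand back the last version.
    return text, rounds_used


if __name__ == "__main__":
    # Toy proofer that fixes one typical o.c.r. error ("tbe" for "the").
    def fix_scanno(t: str) -> str:
        return t.replace("tbe", "the")

    print(proof_until_consensus("tbe whale", [fix_scanno] * 3))
    # prints ('the whale', 2)
```

the second round makes no change, so the page is done after two rounds
instead of three. the equivalence check here is plain string equality.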

-bowerbird

p.s.   oh yeah, and the best way to obtain great accuracy is to
program a checker that flags _all_ errors, and _only_ errors...
a hint: it's not as impossible as you might think at first glance.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080129/b881b704/attachment.htm 

From creeva at gmail.com  Tue Jan 29 16:08:53 2008
From: creeva at gmail.com (Brent Gueth)
Date: Tue, 29 Jan 2008 19:08:53 -0500
Subject: [gutvol-d] more on "doing things the hard way"
In-Reply-To: <d0c.25d0f8e3.34d110e4@aol.com>
References: <d0c.25d0f8e3.34d110e4@aol.com>
Message-ID: <2510ddab0801291608v2547fe1atd3e5313454354a81@mail.gmail.com>

People reinvent the wheel to give themselves a place in the hierarchy

Or

Bureaucracy makes the world go round.

Or

People think there is always a better way

Or finally,

Show a geek how you do something and he'll always find a way to show
why his method is better.


I'm not taking sides on either angle of this argument - but it's been
going on for a while - what does it take to get a consensus and move on
from there?  Someone just needs to say this is the way it's going to
be done - then we can all move on, quietly gripe and take potshots
while work is getting accomplished versus debate.

On Jan 29, 2008 6:29 PM,  <Bowerbird at aol.com> wrote:
> remember the project over at d.p. where they're attempting
>  to determine a "confidence in page accuracy" computation,
>  to tell if a page is accurate enough, or needs more proofing?
>
>  well, i'm sure they're hard at work, cranking their numbers,
>  but meanwhile, here comes in a good observation analysis:
>
>  >   The P1/P1/P2  - P3 skips are filtering through to F2 now,
>  >   and look to be in pretty good shape.  Thumbs Up
>  >   I think it is the best initiative we have had for a while.
>  >   http://www.pgdp.net/phpBB2/viewtopic.php?p=417654#417654
>
>  that's right.  sending text through 2 rounds of p1, then a p2,
>  results in clean text, probably not all that much different from
>  a p1-p2-p3 routing.  3 rounds will usually give you clean text,
>  _even_without_any_"proven-talent"_p3_proofer_in_the_mix_...
>
>  whether you can stop after 2, or even 1, is what the question is.
>
>  but p1 proofers are _plentiful_, so why not just do p1-p1-p1?
>  and even just p1-p1 _if_ the second proofer makes no change?
>
>  oh yeah, then you'd just be following my rule about consensus...
>
>  no complex statistics necessary, just a simple test of equivalence.
>
>  you know people _want_ things to be difficult, _like_ them difficult,
>  when they won't even _try_ the simple way first...              :+)
>
>  -bowerbird
>
>  p.s.  oh yeah, and the best way to obtain great accuracy is to
>  program a checker that flags _all_ errors, and _only_ errors...
>  a hint: it's not as impossible as you might think at first glance.
>

From Morasch at aol.com  Wed Jan 30 13:29:24 2008
From: Morasch at aol.com (Morasch at aol.com)
Date: Wed, 30 Jan 2008 16:29:24 EST
Subject: [gutvol-d] more on "doing things the hard way"
Message-ID: <d10.1c77cae9.34d24634@aol.com>

brent said:
>    but it's been going on for a while

i'll say!                     :+)

and the stray comment i make here every once in a while is 
just a very small tip of the big iceberg of discussion over there.

i can point you to literally _dozens_ of different threads, where
many different proposals have been made, and some executed
-- threads going _dozens_ of pages, at 15 messages/page --
so these projects are discussed ad infinitum and then forgotten,
at least until a similar thread raises its head years down the line.

they've talked this issue to death, and basically gotten nowhere,
and it's extremely frustrating to them, to a great many of them...

that's why it's so comical when you can see the answer is so easy.


>    what does it take to get a consensus and move on from there?

none of them seem to know that, either, and say so, frequently...

basically, it means juliet giving the go-ahead,   but she's confused,
hopelessly confused, and that means everyone ends up confused.

but yeah, she's who stated that this "confidence in page" thingee
needed to be calculable before d.p. can go to a roundless mode.

it was at that time that i made my "that's not really the case" post.

i would have said it there -- said it _again_, that is, since i said it
there many times before -- except i was _banned_ from speaking.

so i said it here instead.   (thanks, michael, for freedom to speak.)

not many of the people that are off on this useless quest are here,
though, except for carlo (who's the main leader of the uselessness),
so it doesn't matter all that much.   just me feeling a need to say it...


>    then we can all move on, quietly gripe and take potshots
>    while work is getting accomplished versus debate.

it's not quite so easy to say that "work is getting accomplished"...

of course _some_ work is being "accomplished", but the question is
"at what expense in human time and energy?"   if the process wastes
a huge amount of resources, and could be massively more efficient
(getting more "accomplished" and creating more happiness as well),
shouldn't someone who can _recognize_that_ step up and speak out?

i certainly believe so, and believe so strongly.   so when that person is
_me_, i'm gonna step up and speak out.   and that's how it's gonna be.

but i'm sure glad no one has been making a federal case out of it lately.

i just wanna put myself on the record, so when d.p. eventually wises up,
an objective observer sees that they should've listened to me originally.

-bowerbird

p.s.   i'm also trying to inspire some thinking at a much higher level.
perhaps you would like it more if i just pitched posts at that altitude?

for example, since people are expending the effort to try to determine
how to predict if a page is accurate-enough or not, what if it appears
that -- with just a bit more effort -- they could obtain a useful answer?
then, even though it was a big mistake to _start_out_ on that pathway,
should they nevertheless continue?   now _that's_ an interesting query!
i would say _yes_, they should, even though i believe they won't succeed.
but i could have instead posted a message that considered this question.
would you have preferred that?

or, to take it even further, let's ask ourselves what kind of system they'll
employ in order to _test_the_efficacy_ of their predictor, if they do use it.
i would argue that they will need to utilize some kind of _infrastructure_
that collects error-reports downstream and feeds-back to their predictor.
otherwise, their predictor could be flawed, and they would never find out.
but they haven't thought that far ahead, and realized they need to build it.
moreover, if they _do_ create a downstream error-reporting system, then
_that_ could be considered their "last line of defense", and thus there is a
good reason to propose that they don't even _need_ a predictor machine.
and, in this regard, it's interesting to note that they have not made use of
their closest proxy to that variable now, namely the errors that are being
reported by their "smooth-readers", who read a final text _for_content_.
these smooth-readers do find errors -- even after 3 rounds of proofing
and 2 rounds of formatting, yes! -- and it would be extremely cogent to
ascertain the underlying nature of such errors, if there happens to be one.
so, would you prefer having a discussion that was pitched at _that_ level?

i'm a self-starter.   i'm happy -- quite happy -- to post without any replies.

but, you know, if anyone wants to have a _conversation_, i can do that too.
just let me know at what level of the mountain you want to pitch the tent...


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080130/5f8c42ce/attachment.htm 

From klofstrom at gmail.com  Wed Jan 30 18:22:19 2008
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Wed, 30 Jan 2008 16:22:19 -1000
Subject: [gutvol-d] more on "doing things the hard way"
In-Reply-To: <d10.1c77cae9.34d24634@aol.com>
References: <d10.1c77cae9.34d24634@aol.com>
Message-ID: <1e8e65080801301822t4d66de0crea0285523af0c5ad@mail.gmail.com>

On Jan 30, 2008 11:29 AM,  <Morasch at aol.com> wrote:

>  i'm a self-starter.  i'm happy -- quite happy -- to post without any replies.

So much so that you change your username to avoid killfiles. That's
not someone who doesn't mind whether anyone is listening or not;
that's someone intent to annoy. Your comments on the DP process are
worthless because you've never done any higher-round proofing,
formatting, or PPing. You've done a few pages a few years ago ... and
you're an expert? It is to laugh.

Any sensible list moderator would have banned you long ago. Now into
the killfile with you.

-- 
Karen Lofstrom

From schultzk at uni-trier.de  Thu Jan 31 01:40:22 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 31 Jan 2008 10:40:22 +0100
Subject: [gutvol-d] more on "doing things the hard way"
In-Reply-To: <d10.1c77cae9.34d24634@aol.com>
References: <d10.1c77cae9.34d24634@aol.com>
Message-ID: <6A1B3201-C914-41A4-A1C6-767CEBAC2C16@uni-trier.de>

Hi there,

	This post somewhat confuses me.
		1) In the middle it is signed Bowerbird
		2) a copy is sent to bowerbird
		3) the style is not bowerbird's
		4) the content does seem to indicate bowerbird
		5) it is not bowerbird's way to hide himself

	So is the post below actually from bowerbird?

	Anyway, as with any open project, anarchy tends to rule.
	There are long discussions that do not seem to lead
	to a meaningful end, that is, to implementation of what was discussed.
	More often than not, only a very small part actually gets implemented,
	since a consensus cannot be reached.

	Projects such as DP and PG need some kind of authoritative manager.
	Without such a manager, the evolution of the project takes time.

	If DP has a problem with you, then stay away. I do, as I do not like
	their way of doing things. But that is just my opinion.

	regards
		Keith

Am 30.01.2008 um 22:29 schrieb Morasch at aol.com:

> brent said:
> >   but it's been going on for a while
>
> i'll say!                    :+)
>
> and the stray comment i make here every once in a while is
> just a very small tip of the big iceberg of discussion over there.
>
> i can point you to literally _dozens_ of different threads, where
> many different proposals have been made, and some executed
> -- threads going _dozens_ of pages, at 15 messages/page --
> so these projects are discussed ad infinitum and then forgotten,
> at least until a similar thread raises its head years down the line.
>
> they've talked this issue to death, and basically gotten nowhere,
> and it's extremely frustrating to them, to a great many of them...
>
> that's why it's so comical when you can see the answer is so easy.
>
>
> >   what does it take to get a consensus and move on from there?
>
> none of them seem to know that, either, and say so, frequently...
>
> basically, it means juliet giving the go-ahead,  but she's confused,
> hopelessly confused, and that means everyone ends up confused.
>
> but yeah, she's who stated that this "confidence in page" thingee
> needed to be calculable before d.p. can go to a roundless mode.
>
> it was at that time that i made my "that's not really the case" post.
>
> i would have said it there -- said it _again_, that is, since i said it
> there many times before -- except i was _banned_ from speaking.
>
> so i said it here instead.  (thanks, michael, for freedom to speak.)
>
> not many of the people that are off on this useless quest are here,
> though, except for carlo (who's the main leader of the uselessness),
> so it doesn't matter all that much.  just me feeling a need to say  
> it...
>
>
> >   then we can all move on, quietly gripe and take potshots
> >   while work is getting accomplished versus debate.
>
> it's not quite so easy to say that "work is getting accomplished"...
>
> of course _some_ work is being "accomplished", but the question is
> "at what expense in human time and energy?"  if the process wastes
> a huge amount of resources, and could be massively more efficient
> (getting more "accomplished" and creating more happiness as well),
> shouldn't someone who can _recognize_that_ step up and speak out?
>
> i certainly believe so, and believe so strongly.  so when that  
> person is
> _me_, i'm gonna step up and speak out.  and that's how it's gonna be.
>
> but i'm sure glad no one has been making a federal case out of it  
> lately.
>
> i just wanna put myself on the record, so when d.p. eventually  
> wises up,
> an objective observer sees that they should've listened to me  
> originally.
>
> -bowerbird
>
> p.s.  i'm also trying to inspire some thinking at a much higher level.
> perhaps you would like it more if i just pitched posts at that  
> altitude?
>
> for example, since people are extending the effort to try to determine
> how to predict if a page is accurate-enough or not, what if it appears
> that -- with just a bit more effort -- they could obtain a useful  
> answer?
> then, even though it was a big mistake to _start_out_ on that pathway,
> should they nevertheless continue?  now _that's_ an interesting query!
> i would say _yes_, they should, even though i believe they won't  
> succeed.
> but i could have instead posted a message that considered this  
> question.
> would you have preferred that?
>
> or, to take it even further, let's ask ourselves what kind of  
> system they'll
> employ in order to _test_the_efficacy_ of their predictor, if they  
> do use it.
> i would argue that they will need to utilize some kind of  
> _infrastructure_
> that collects error-reports downstream and feeds-back to their  
> predictor.
> otherwise, their predictor could be flawed, and they would never  
> find out.
> but they haven't thought that far ahead, and realized they need to  
> build it.
> moreover, if they _do_ create a downstream error-reporting system,  
> then
> _that_ could be considered their "last line of defense", and thus  
> there is a
> good reason to propose that they don't even _need_ a predictor  
> machine.
> and, in this regard, it's interesting to note that they have not  
> made use of
> their closest proxy to that variable now, namely the errors that  
> are being
> reported by their "smooth-readers", who read a final text  
> _for_content_.
> these smooth-readers do find errors -- even after 3 rounds of proofing
> and 2 rounds of formatting, yes! -- and it would be extremely  
> cogent to
> ascertain the underlying nature of such errors, if there happens to  
> be one.
> so, would you prefer having a discussion that was pitched at _that_  
> level?
>
> i'm a self-starter.  i'm happy -- quite happy -- to post without  
> any replies.
>
> but, you know, if anyone wants to have a _conversation_, i can do  
> that too.
> just let me know at what level of the mountain you want to pitch  
> the tent...
>
>
>
> **************
> Start the year off right. Easy ways to stay in shape.
> http://body.aol.com/fitness/winter-exercise?NCID=aolcmp00300000002489
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080131/64aafefa/attachment-0001.htm 

From schultzk at uni-trier.de  Thu Jan 31 01:56:07 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 31 Jan 2008 10:56:07 +0100
Subject: [gutvol-d] more on "doing things the hard way"
In-Reply-To: <1e8e65080801301822t4d66de0crea0285523af0c5ad@mail.gmail.com>
References: <d10.1c77cae9.34d24634@aol.com>
	<1e8e65080801301822t4d66de0crea0285523af0c5ad@mail.gmail.com>
Message-ID: <FFBD3475-E6A7-4079-9D7F-0D97AD41F65D@uni-trier.de>

Hi,

	What's in a username? I have about 8. I do tend to use
	just 2 e-mail addresses. The person did sign in the middle.

	Though, as I noted in another reply, I am not quite convinced
	it is actually Bowerbird.

	I do not know whether you have ever done systems analysis, but
	it can be, and is, done without practical experience. It is highly
	theoretical. I wonder if Einstein had practical experience with
	relativity. O.K., I know he did not. Yet somehow he was right.
	At least, that is what physics tells us today.

	No. No. No. Bowerbird does not come close to Einstein. He does
	have his caveats. I also tend to disagree with him, and I enjoy
	the discussions, because he is willing to debate.

	regards
		Keith.

On 31.01.2008 at 03:22, Karen Lofstrom wrote:

> On Jan 30, 2008 11:29 AM,  <Morasch at aol.com> wrote:
>
>>  i'm a self-starter.  i'm happy -- quite happy -- to post without any replies.
>
> So much so that you change your username to avoid killfiles. That's
> not someone who doesn't mind whether anyone is listening or not;
> that's someone intent on annoying. Your comments on the DP process are
> worthless because you've never done any higher-round proofing,
> formatting, or PPing. You've done a few pages a few years ago ... and
> you're an expert? It is to laugh.
>
> Any sensible list moderator would have banned you long ago. Now into
> the killfile with you.
>
> -- 
> Karen Lofstrom


From Bowerbird at aol.com  Thu Jan 31 10:32:29 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 31 Jan 2008 13:32:29 EST
Subject: [gutvol-d] more on "doing things the hard way"
Message-ID: <d52.22cc2472.34d36e3d@aol.com>

keith-

please accept my apologies.   i didn't mean to confuse anyone.
i just happened to e-mail that post from the wrong log-in...
so here it is again, so you know it's "genuine"...   thank you...

-bowerbird

=======================================

brent said:
>    but it's been going on for a while

i'll say!                     :+)

and the stray comment i make here every once in a while is 
just a very small tip of the big iceberg of discussion over there.

i can point you to literally _dozens_ of different threads, where
many different proposals have been made, and some executed
-- threads going _dozens_ of pages, at 15 messages/page --
so these projects are discussed ad infinitum and then forgotten,
at least until a similar thread raises its head years down the line.

they've talked this issue to death, and basically gotten nowhere,
and it's extremely frustrating to them, to a great many of them...

that's why it's so comical when you can see the answer is so easy.


>    what does it take to get a consensus and move on from there?

none of them seem to know that, either, and say so, frequently...

basically, it means juliet giving the go-ahead,   but she's confused,
hopelessly confused, and that means everyone ends up confused.

but yeah, she's who stated that this "confidence in page" thingee
needed to be calculable before d.p. can go to a roundless mode.

it was at that time that i made my "that's not really the case" post.

i would have said it there -- said it _again_, that is, since i said it
there many times before -- except i was _banned_ from speaking.

so i said it here instead.   (thanks, michael, for freedom to speak.)

not many of the people that are off on this useless quest are here,
though, except for carlo (who's the main leader of the uselessness),
so it doesn't matter all that much.   just me feeling a need to say it...


>    then we can all move on, quietly gripe and take potshots
>    while work is getting accomplished versus debate.

it's not quite so easy to say that "work is getting accomplished"...

of course _some_ work is being "accomplished", but the question is
"at what expense in human time and energy?"   if the process wastes
a huge amount of resources, and could be massively more efficient
(getting more "accomplished" and creating more happiness as well),
shouldn't someone who can _recognize_that_ step up and speak out?

i certainly believe so, and believe so strongly.   so when that person is
_me_, i'm gonna step up and speak out.   and that's how it's gonna be.

but i'm sure glad no one has been making a federal case out of it lately.

i just wanna put myself on the record, so when d.p. eventually wises up,
an objective observer sees that they should've listened to me originally.

-bowerbird

p.s.   i'm also trying to inspire some thinking at a much higher level.
perhaps you would like it more if i just pitched posts at that altitude?

for example, since people are expending the effort to try to determine
how to predict if a page is accurate-enough or not, what if it appears
that -- with just a bit more effort -- they could obtain a useful answer?
then, even though it was a big mistake to _start_out_ on that pathway,
should they nevertheless continue?   now _that's_ an interesting query!
i would say _yes_, they should, even though i believe they won't succeed.
but i could have instead posted a message that considered this question.
would you have preferred that?

or, to take it even further, let's ask ourselves what kind of system they'll
employ in order to _test_the_efficacy_ of their predictor, if they do use it.
i would argue that they will need to utilize some kind of _infrastructure_
that collects error-reports downstream and feeds them back to their predictor.
otherwise, their predictor could be flawed, and they would never find out.
but they haven't thought that far ahead, or realized they need to build it.
moreover, if they _do_ create a downstream error-reporting system, then
_that_ could be considered their "last line of defense", and thus there is a
good reason to propose that they don't even _need_ a predictor machine.
and, in this regard, it's interesting to note that they have not made use of
their closest proxy to that variable now, namely the errors that are being
reported by their "smooth-readers", who read a final text _for_content_.
these smooth-readers do find errors -- even after 3 rounds of proofing
and 2 rounds of formatting, yes! -- and it would be extremely cogent to
ascertain the underlying nature of such errors, if there happens to be one.
so, would you prefer having a discussion that was pitched at _that_ level?
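
A minimal Python sketch of the downstream check described above, assuming error reports arrive as (page id, category) pairs from smooth-readers or other downstream readers. The function names, the report format, and the category labels are hypothetical illustrations, not anything d.p. actually runs. It measures how often pages a predictor marked "finished" were later flagged, and tallies what kinds of errors slip through.

```python
from collections import Counter


def predictor_miss_rate(predicted_done, reports):
    """Share of pages the predictor called finished that downstream readers flagged.

    predicted_done: page ids a (hypothetical) predictor marked as finished.
    reports: (page_id, category) error reports, e.g. from smooth-readers.
    """
    done = set(predicted_done)
    if not done:
        return 0.0
    flagged = {page_id for page_id, _category in reports}
    return len(done & flagged) / len(done)


def errors_by_category(reports):
    """Tally downstream reports by category, to look for an underlying pattern."""
    return Counter(category for _page_id, category in reports)


reports = [("p1", "scanno"), ("p3", "hyphenation"), ("p3", "scanno")]
print(predictor_miss_rate(["p1", "p2"], reports))  # 0.5
print(errors_by_category(reports).most_common(1))  # [('scanno', 2)]
```

A persistently high miss rate would signal a flawed predictor; the category tally is one simple way to ask whether the errors that survive every round share a common nature.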

i'm a self-starter.   i'm happy -- quite happy -- to post without any replies.

but, you know, if anyone wants to have a _conversation_, i can do that too.
just let me know at what level of the mountain you want to pitch the tent...

From Bowerbird at aol.com  Thu Jan 31 11:12:17 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 31 Jan 2008 14:12:17 EST
Subject: [gutvol-d] more on "doing things the hard way"
Message-ID: <bff.2dcde807.34d37791@aol.com>

keith said:
>    No. No. No. Bowerbird does not come close to Einstein.

well, my hair _is_ quite beautiful,
but hey, _nothing_ could top his.
that guy had the greatest hair ever.

as for zora (karen lofstrom),
well, she acts like a dingbat...

(notice i did _not_ say she _is_
a dingbat, because that would
be an ad hominem argument;
i am only speaking about her
_behavior_, which is within
her capacity for _change_...)

the argument that you need to
be _inside_ a tar-pit to know
it's a tar-pit is laughably silly...

a lot of things are much more
easy to see from an _objective_
perspective, even a distant one.

and i have a ton of experience
digitizing text outside of d.p.

i've done dc-10 flight manuals,
text-books, magazines, poetry,
novels, and a host of other stuff,
including public-domain books...

if she had paid attention,
zora would even know that i've
analyzed a book she processed;
i documented _dozens_ of errors
-- embarrassing ones -- inside it,
errors that remain to this very day.

i clean up and format digitized text
for entertainment, like other people
will do crossword puzzles or sudoku.

heck, i did lessig's "the future of ideas"
last week, just for the fun of it, which
was nice because i had _clean_text_,
since i just copied it out of the .pdf,
but was also a bear because i had to
rework the formatting extensively,
since i copied the text out of a .pdf...
>    http://z-m-l.com/go/llfoi/llfoif001.html

the thing is, when you do something
_for_fun_, you simply won't let yourself
get trapped in a tar-pit...

that i make suggestions as to how d.p.
could get itself out of its current tar-pit
is _an_act_of_love_, because i highly value
the individuals who are volunteering time
and energy in support of the public-domain.

the failure of the d.p. "powers that be" --
not to mention many of those volunteers
-- is on _them_, and not my responsibility.

it's neither here nor there, though, because
within a few years' time, anyone will be able to
digitize any book they want, by simply dropping
the o.c.r. results onto a clean-up program and
answering a few questions to resolve ambiguity,
so d.p. will either morph to that reality, or die...

-bowerbird

From Bowerbird at aol.com  Thu Jan 31 11:23:32 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 31 Jan 2008 14:23:32 EST
Subject: [gutvol-d] two overarching thoughts on a roundless system of
	proofing
Message-ID: <d60.1e3b803b.34d37a34@aol.com>

please let me repeat this post, from saturday, december 15, 2007 at 1:29pm:

i'll have a lot more later -- it's written already, but i think i will wait
until monday to send it to this list -- but here are two overarching
thoughts about implementing a _roundless_ system of proofing...

(in case you're wondering why, this is a topic that is being discussed
over on the d.p. forums, presently, and often over the past few years.
and it's a shame it never moves past the discussion phase, since the
current system -- where _every_page_of_every_book_ is slated to go
through a specific number of rounds -- is grossly inefficient, and has
led to a huge waste of time and energy, plus endless discussions and
a wide array of experiments to overcome its obvious shortcomings.
however, the discussion is marred by a bunch of people who simply
don't know what they're talking about, and by the fact that no one
over there seems to be able to separate the wheat from the chaff...)

anyway, here are those two overarching thoughts.

1.  it's unnecessary to "formulate some kind of metric" to inform you
when a specific page can be considered "finished".  it is _done_ when
a certain number of people -- say 2 to 4 -- can't find any errors in it.
at that point, even if there _are_ still errors in it, it has simply become
unproductive to schedule yet _another_ set of eyes to look for them...
but, for the vast majority of pages, there just won't be any errors left.
you don't have to believe me.  just try it -- as the simplest thing that
_might_ work -- and you will happily discover it does indeed work...

2.  it's unnecessary to "formulate some kind of metric" to inform you
about the proofing skills of each volunteer.  it's easy enough to use
the obvious measures to determine a score, but it's unnecessary to
_use_ that score in order to assign pages to the proofer, since the
measure of whether a page is "finished" or not is impervious to the
skill levels of the proofers.  if 2-4 "average" proofers find no errors
left on a page, then the odds are that a "great" proofer won't either.
and -- once again -- you don't have to believe me that this is true;
try it -- as the simplest thing that _might_ work -- and find it does...
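
Both points reduce to a very simple stopping rule. Here is a minimal Python sketch, assuming a "clean pass" means a proofer submitted the page unchanged, and that the clean passes must be consecutive (any correction resets the count). The threshold of 3 is just one pick from the "2 to 4" range, and all names (`Page`, `record_pass`, `next_page`) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: "say 2 to 4" clean passes; 3 is an arbitrary pick.
CLEAN_PASSES_REQUIRED = 3


@dataclass
class Page:
    text: str
    clean_streak: int = 0  # consecutive passes that found no errors
    history: list = field(default_factory=list)  # (proofer, changed) log

    def is_done(self, required: int = CLEAN_PASSES_REQUIRED) -> bool:
        """Finished once `required` proofers in a row changed nothing."""
        return self.clean_streak >= required

    def record_pass(self, proofer: str, new_text: str) -> None:
        """Record one proofing pass; any correction resets the clean streak."""
        changed = new_text != self.text
        self.history.append((proofer, changed))
        if changed:
            self.text = new_text
            self.clean_streak = 0
        else:
            self.clean_streak += 1


def next_page(pages):
    """Return the first unfinished page; proofer skill is deliberately ignored."""
    for page in pages:
        if not page.is_done():
            return page
    return None


# Usage: one correction, then three clean passes finish the page.
page = Page("Tbe whale")
page.record_pass("alice", "The whale")
for name in ("bob", "carol", "dave"):
    page.record_pass(name, page.text)
print(page.is_done())  # True
```

Note that `next_page` never consults a skill score, in line with point 2; a real system would presumably also avoid handing a page straight back to a proofer who has just seen it.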

in other words, don't make it more complicated than it has to be...

-bowerbird

p.s.   thank you...