From Bowerbird at aol.com  Fri Aug  1 10:15:58 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 13:15:58 EDT
Subject: [gutvol-d] bastien
Message-ID: 

once again, bastien enters the arena with ad hominem crap.

his description of me as "the one doing that much noise" _is_,
however, one of the more poetic forms of "troll" that i've seen.

remember, folks, if you simply put the phrase "bastien rules"
in the subject header of any thread-topic that you introduce,
i've pledged to not comment on that thread for a good while.
so if you really want to talk about something without my input,
that's the way to do it.

now, who wants to step up and create the first bird-free thread?
because we know how you're all itching to be so "constructive"...

-bowerbird

p.s.   at one time in the past, an alternative listserve _was_ formed.
i can't remember if it never took off at all, or only lasted two days.

p.p.s.   you might want to have jon noring moderate your listserve.
he has a _lot_ of experience with that sort of thing...



**************
Looking for a car that's sporty, fun and fits in your budget? 
Read reviews on AOL Autos.
      
(http://autos.aol.com/cars-BMW-128-2008/expert-review?ncid=aolaut00050000000017 )
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080801/9f51d449/attachment.htm 

From bzg at altern.org  Fri Aug  1 10:52:57 2008
From: bzg at altern.org (Bastien)
Date: Fri, 01 Aug 2008 19:52:57 +0200
Subject: [gutvol-d] bastien
In-Reply-To:  (Bowerbird@aol.com's message of
	"Fri, 1 Aug 2008 13:15:58 EDT")
References: 
Message-ID: 

Hello Bowerbird,

Bowerbird at aol.com writes:

> once again, bastien enters the arena with ad hominem crap.

Sometimes we're so close to the problem we can't even see it.
Maybe you're just so close to yourself that you can't understand
the harm you're doing to this list.

I think your story with Michael is a love story.  You won't give
up until there is only you and him on this list.

Good luck!

I'm off now.

-- 
Bastien

From Bowerbird at aol.com  Fri Aug  1 10:59:39 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 13:59:39 EDT
Subject: [gutvol-d] how to clean up ("preprocess") the o.c.r. for a book -- 031
Message-ID: 

31.  create a hotlinked table-of-contents page for the book

and of course, once we made that list of pages with chapter-headers,
we had essentially created a hotlinked table-of-contents for the book,
so with just a few global find/replace in the word-processor, voila:
>    http://z-m-l.com/go/mount/mount-c-toc.html

i'll also append it to this e-mail, so if your e-mail client can grok
weblinks, then you should have a ready-made table-of-contents for
this book in your e-mail, that's how fluid the book has become.

***

76 more lines corrected (i.e., more or less created out of thin air),
for a grand total of 574, on 31 routines.   thanks for coming along!

i'll be back tomorrow with the next (last?) suggestion in this series...

if i get the laptop up and connected, that is, since i'm flying off to
madison for the 2008 national poetry slam team championship...
otherwise, if you don't hear from me in a little while, that is why...

you guys are fun and all that, but with hundreds and hundreds of
performance poets in my face, i'm not too sure i'll have much time
for e-mail and listserves and all that other online minutiae stuff...

-bowerbird

table of contents for "mountain blood"...

 ONE
     missing chapter 1 here... (on page 009.html)
             II -- It was a mountain surrey, with a top and rolled
             III -- MEANWHILE, they drew steadily over the
             IV -- The dank, green smell hung in their nostrils
     missing chapter 5 here... (on page 023.html)
             VI -- A SMALL, familiar group awaited the arrival
             VII -- A SMOOTH, conical hill rose sharply to the
             VIII -- OUTSIDE, the village, the Greenstream Valley,
             IX -- At noon, on the day following, he stood on
             X -- Large kerosene lamps dilated by tin reflectors
             XI -- Late in the night they were still playing
             XII -- He stumbled hastily down the stairway, and
             XIII -- The afternoon was waning when he gazed
             XIV -- He woke at dawn. The whippoorwills, the
             XV -- One of the canvas-covered mountain wagons
             XVI -- Gordon had intended to avoid the vicinity
             XVII -- Minus certain costs and the amount of his
             XVIII -- CLARE'S body was brought back to Greenstream
             XIX -- Gordon thought again of Lettice Hollidew
             XX -- CLARE'S funeral deducted a further sum
             XXI -- Again sober, without the resources of the
             XXII -- The room was singularly bare: a tin lamp
             XXIII -- When he woke the room was bright with
             XXIV -- She retreated, as he advanced, within the
             XXV -- He made his way to where Greenstream village
             XXVI -- THE "board or so" to be replaced on the ice
             XXVII -- His miscellaneous labors at the minister's

 TWO
     missing chapter 1 here... (on page 149.html)
             II -- IT was his own home to which he returned, the
             III -- Beyond the dining room was their bedroom,
             IV -- The following morning, "Oh, Gordon!" Lettice
     missing chapter 5 here... (on page 175.html)
             VI -- The spring night was potent, warm and
             VII -- The memory of Meta Beggs was woven like a
             VIII -- He drove over the road that lay at the base
             IX -- META BEGGS saw Gordon at the same
             X -- Gordon found Meta Beggs on the outskirt
             XI -- ON Sunday he strolled soon after breakfast
             XII -- Gordon carefully explained the entire circumstance
             XIII -- WHEN Gordon returned to his dwelling
             XIV -- RUTHERFORD BERRY and Effie, Barnwell
             XV -- It was comparatively a short distance to the elder
             XVI -- ' 'TT'VE got something for you," Gordon said sud-
             XVII -- V -- BUT, curiously, sitting alone, he gave little
             XVIII -- GORDON MAKIMMON made one step toward
             XIX -- A HOARSE, thin cry sounded from within
             XX -- He passed through the dining room to the
             XXI -- After a while he rose, impelled once more

 THREE
     missing chapter 1 here... (on page 281.html)
             II -- The purpose of this gathering was instantly
             III -- TWENTY-SEVEN hundred and ninety
             IV -- The fitful wind had, apparently, driven the
     missing chapter 5 here... (on page 294.html)
             VI -- Some days after the Vibards' arrival Gordon
             VII -- He knew, generally, where Alexander Crandall's
             VIII -- It was dark when Gordon closed the stable door
             IX -- On an afternoon of mid-August Gordon was
             X -- The heat thickened with the dusk. The wailing
             XI -- THE year, in the immemorial, minute shifting
             XII -- GORDON MAKIMMON, absorbed in the
             XIII -- Even if he proved able to buy out Simmons,
             XIV -- Gordon paid Valentine Simmons eighty-nine
             XV -- Gordon placed on the table before him the
             XVI -- As customary on Saturday noon Gordon
             XVII -- He felt strangely lost in the sudden emptiness
             XVIII -- The stir and heat of Sim's presence died
             XIX -- The cold sharpened; the sky, toward evening,
             XX -- Gordon met Valentine Simmons squarely
             XXI -- He rose at five on Thursday and consumed a
             XXII -- BUCKLEY SIMMONS was late in arriving
             XXIII -- GORDON MAKIMMON rose to a sitting
             XXIV -- An overwhelming desire possessed Gordon
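
[editor's sketch] The post above builds the table of contents with global
find/replace in a word-processor. The same idea can be sketched in Python,
assuming a list of (page file, chapter numeral, opening words) tuples is
already in hand. The function names and the page files 011.html and
014.html are hypothetical, chosen only for illustration; this is not the
routine the post actually used.

```python
from html import escape


def toc_entry(page, numeral, opening):
    """Return one hotlinked table-of-contents line (hypothetical helper)."""
    label = escape(f"{numeral} -- {opening}")
    return f'<a href="{escape(page, quote=True)}">{label}</a><br>'


def build_toc(title, entries):
    """Join a heading and one link per chapter into an HTML fragment."""
    lines = [f"<h1>{escape(title)}</h1>"]
    lines.extend(toc_entry(*entry) for entry in entries)
    return "\n".join(lines)


# Illustrative entries only: the page files are assumed, not taken
# from the book's real page list.
entries = [
    ("011.html", "II", "It was a mountain surrey, with a top and rolled"),
    ("014.html", "III", "MEANWHILE, they drew steadily over the"),
]

print(build_toc("table of contents for mountain blood", entries))
```

Escaping the opening words keeps stray quotes or ampersands in the o.c.r.
text from breaking the generated links.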




From Bowerbird at aol.com  Fri Aug  1 11:06:57 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 14:06:57 EDT
Subject: [gutvol-d] bastien
Message-ID: 

bastien said:
>   Maybe you're just so close to yourself that you can't understand
>    the harm you're doing to this list.

and maybe you're just so full of yourself that you can't understand
that you don't add anything to this list except ad hominem crap...

but believe you me, if the only kind of messages that _i_ sent to 
this list were ones like this (where i'm purposely mimicking you,
to show just how content-less your communications really are),
believe you me _i_ would realize how little i was contributing...


>   Good luck!

same to you!            :+)

-bowerbird




From Bowerbird at aol.com  Fri Aug  1 11:28:16 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 14:28:16 EDT
Subject: [gutvol-d] another view of those "mountain blood" p2 changes
Message-ID: 

i wrote up the analysis of the p2 diffs on "mountain blood" here:
>    http://z-m-l.com/go/mount/mount-c-p2results.html
scroll down past the text and you'll see the difference-pairs...

i've also recently created a new whole-book view on those diffs:
>    http://z-m-l.com/go/mount/mount-c-p2all.html
search for "^^^^^^^^^" to expose the colorized difference-pairs...

-bowerbird




From hart at pglaf.org  Fri Aug  1 11:48:33 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 1 Aug 2008 11:48:33 -0700 (PDT)
Subject: [gutvol-d] bastien
In-Reply-To: 
References:  
Message-ID: 



Speak for yourself, Bastien.

YOU are the one answering his messages, not I.

I have told bowerbird right to his face, on multiple occasions,
that I think his emails are not what they would/could/should be,
and the protection that allows YOU to speak freely here is the
same exact protection that allows HIM to speak freely.

"Where they censor people,
they eventually censor books."


Michael



On Fri, 1 Aug 2008, Bastien wrote:

> Hello Bowerbird,
>
> Bowerbird at aol.com writes:
>
>> once again, bastien enters the arena with ad hominem crap.
>
> Sometimes we're so close to the problem we can't even see it.
> Maybe you're just so close to yourself that you can't understand
> the harm you're doing to this list.
>
> I think your story with Michael is a love story.  You won't give
> up until there is only you and him on this list.
>
> Good luck!
>
> I'm off now.
>
> -- 
> Bastien
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From hart at pglaf.org  Fri Aug  1 12:24:02 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 1 Aug 2008 12:24:02 -0700 (PDT)
Subject: [gutvol-d] !@! Re:  woman in her own right -- 008 (and final)
In-Reply-To: 
References: 
	
	
	<4891EF2C.7020309@xs4all.nl>
	
	
	
Message-ID: 



On Fri, 1 Aug 2008, Bastien Guerry wrote:

> Michael Hart  writes:
>
>> So, once again, I simply point out that if you don't want a 
>> contact with bowerbird. . .which you all SAY. . .all you do is 
>> start your own listserver and don't let him in, or use a heavy 
>> hand on "moderation" if you do let him in.
>>
>> These solutions are simple.
>
> I'm not in favor of moderation.

At least we agree on that!


> But it's not that easy to build another list.

Actually, it is.

Not only that, but we will do all the heavy lifting to set it up,
and then all you have to do is run it however you like.


> If I build another list, I want people to know about this, and I 
> will surely send an email here, because I believe the gutvol-d 
> list attracted many interesting people.

That would be just fine, sending an email here is a good idea, and
I would be more than willing to put a blurb in the Newsletters, so
you would get maximum coverage.


> How then can I be sure that the one doing that much noise on this 
> list will not join the new list under another name?

I think such an event would be pretty obvious, but even if not, the
"new" person in question would get tossed out immediately, no?


> Ignoring noise is always possible, but it requires a lot of 
> energy. I think people would prefer to spend this energy on 
> discussing things in a more constructive way.

The funny part of this is, of course, that the people who are
creating this entire situation REFUSE to ignore the noise AND
create ever so much MORE NOISE OF THEIR OWN!!!

Hee hee!

Michael


>
> Anyway.
>
> -- Bastien
>

From klofstrom at gmail.com  Fri Aug  1 12:33:01 2008
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri, 1 Aug 2008 09:33:01 -1000
Subject: [gutvol-d] bastien
In-Reply-To: 
References:  
	
Message-ID: <1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>

On Fri, Aug 1, 2008 at 8:48 AM, Michael Hart  wrote:

> I have told bowerbird right to his face, on multiple occasions,
> that I think his emails are not what they would/could/should be,
> and the protection that allows YOU to speak freely here is the
> same exact protection that allows HIM to speak freely.

Free speech absolutism is libertarianism run amok. Just as you need
police in real life, you need oversight and civility policies in
online discussion groups. That's not censorship. If there were only
one group, and it were run by the government, then it would be
censorship. But, since there are multiple venues for discussion,
guaranteeing an orderly environment is not censorship. In fact, it's
essential for a productive discussion.

Of course moderation isn't easy, any more than overseeing a police
department is easy. There are bad police and bad moderators. But the
alternative is worse.

Michael Hart is playing the part of the regulars on Usenet newsgroups
who say, "We don't need moderation. Just killfile the troll. If
everyone killfiles the troll, he won't get the feedback he wants and
will go away." This has never worked! Some people won't use a killfile
and get angry enough to respond -- yes, that's true. But the "don't
feed the trolls" argument ignores newbies. Someone visiting a
newsgroup or an email list for the first time will not have a
killfile. He or she will read the insanity or venom on parade and
either leave as fast as possible, or tell off the troll. Which makes
the troll happy, because the group is then
alt.all.about.the.mighty.troll, forever and ever.

The NYTimes recently published a long article about trolls and
griefers. Interesting reading.

Would the mighty Michael allow the group to use disemvowelling? (See
article in Wikipedia.) That leaves the message, to be deciphered by
anyone who is curious, but the sting is gone. It's easy to ignore.
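
[editor's sketch] Disemvowelling, as mentioned above, means removing the
vowels from a message so that it stays decipherable but is easy to skip.
A minimal Python sketch follows; the function name is hypothetical, and
treating only a, e, i, o, u as vowels is an assumption for illustration.

```python
# Assumption: only the ASCII vowels are stripped; "y" and accented
# letters are left alone.
VOWELS = set("aeiouAEIOU")


def disemvowel(text):
    """Return text with its vowels removed, leaving everything else intact."""
    return "".join(ch for ch in text if ch not in VOWELS)


print(disemvowel("this list is mine"))
```

Spaces and punctuation are untouched, so a curious reader can still
work out the original wording.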

FWIW, I've had BB killfiled for years. I don't reply to him. But that
doesn't seem to have helped at all, because I still have to read the
responses to his behavior.

--
Karen Lofstrom

From hart at pglaf.org  Fri Aug  1 13:00:01 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 1 Aug 2008 13:00:01 -0700 (PDT)
Subject: [gutvol-d] bastien
In-Reply-To: <1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
Message-ID: 


If you take the time to read Karen Lofstrom's message below,
you will see that she answers her own questions, but not in a
way that supports her purported point of view.

Before she gets to that, however, she entices the readers in
a popular "category error" methodology that encourages those
readers to simply "lump" everything into "libertarianism" to
be simply flushed down the toilet with all the rest.

No one needs police for this sort of thing on listservers.

Her own answer of the "killfile" solution takes care of it.

However, as she, herself, points out in her closing phrases,
it is her own group of anti-bowerbird fanatics that take her
to task the most, because she hasn't the nerve to "killfile"
THEM along with her nemesis.

THEIR behaviour is obviously the cause of all her pain, not,
as she would have you believe, bowerbird's or my own.

If Karen were correct, Project Gutenberg would have never in
a million years made it. . .as this list has weathered storm
after storm of such issues with flame wars worse than this.

However, I must congratulate our population, as a whole, for
not taking the baiting that we have seen this week.

All in all this week has been a pretty good indicator of who
is able to act, and react, sensibly, rather than with noise.


mh



On Fri, 1 Aug 2008, Karen Lofstrom wrote:

> On Fri, Aug 1, 2008 at 8:48 AM, Michael Hart  
> wrote:
>
>> I have told bowerbird right to his face, on multiple occasions, 
>> that I think his emails are not what they would/could/should be, 
>> and the protection that allows YOU to speak freely here is the 
>> same exact protection that allows HIM to speak freely.
>
> Free speech absolutism is libertarianism run amok. Just as you 
> need police in real life, you need oversight and civility policies 
> in online discussion groups. That's not censorship. If there were 
> only one group, and it were run by the government, then it would 
> be censorship. But, since there are multiple venues for 
> discussion, guaranteeing an orderly environment is not censorship. 
> In fact, it's essential for a productive discussion.
>
> Of course moderation isn't easy, any more than overseeing a police 
> department is easy. There are bad police and bad moderators. But 
> the alternative is worse.
>
> Michael Hart is playing the part of the regulars on Usenet 
> newsgroups who say, "We don't need moderation. Just killfile the 
> troll. If everyone killfiles the troll, he won't get the feedback 
> he wants and will go away." This has never worked! Some people 
> won't use a killfile and get angry enough to respond -- yes, 
> that's true. But the "don't feed the trolls" argument ignores 
> newbies. Someone visiting a newsgroup or an email list for the 
> first time will not have a killfile. He or she will read the 
> insanity or venom on parade and either leave as fast as possible, 
> or tell off the troll. Which makes the troll happy, because the 
> group is then alt.all.about.the.mighty.troll, forever and ever.
>
> The NYTimes recently published a long article about trolls and 
> griefers. Interesting reading.
>
> Would the mighty Michael allow the group to use disemvowelling? 
> (See article in Wikipedia.) That leaves the message, to be 
> deciphered by anyone who is curious, but the sting is gone. It's 
> easy to ignore.
>
> FWIW, I've had BB killfiled for years. I don't reply to him. But 
> that doesn't seem to have helped at all, because I still have to 
> read the responses to his behavior.
>
> -- Karen Lofstrom
>

From Bowerbird at aol.com  Fri Aug  1 13:25:02 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 16:25:02 EDT
Subject: [gutvol-d] some sort of list manual
Message-ID: 

michael said:
>   he has been set up to quite a
>    significant degree by "tag team flamers" who alternate a
>   series of messages carefully gauged to antagonize him but
>   are couched in these "civil" terms and, when confronted,
>   the "tag team flamers" each say they only send a minimal
>   number of messages each while bowerbird sent many. . .an
>   equal number perhaps to those from the tag team flamers.
>
>    I have also seen this on a number of other listservers--
>    and I hope it makes it into some sort of list manual.

yes, so i had to swat down each one of them individually...

the perplexing thing to me for a while was why they persisted.

i mean, it was clear that i was defeating each of their posts, so
i wondered why they kept coming back for more punishment...

then i realized that their _buddies_ didn't see the smackdowns,
because they've all got me on killfile.

that was quite an aha moment for me.

they could still look good to their buddies, by attacking me,
as none of the buddies realized that the mud wasn't sticking.
reinforcement from their buddies offset the pain i delivered.
they were caught in their little anti-bowerbird echo-chamber.

it was also quite a ha-ha moment for me.

that's because i realized that people reading _all_ of the posts
were seeing the mud-slinging _and_ the resultant smackdown,
thus knew the whole situation.   so no need to belabor the point.

that's when i decided that i could killfile most of my adversaries,
because i had already made all of them look silly stupid enough,
plus they'd maintain their "reputations" with occasional flames,
all the while failing to post any significant content of their own...
(and um, yes, i should have had bastien killfiled already as well.)

-bowerbird




From steven at desjardins.org  Fri Aug  1 13:53:02 2008
From: steven at desjardins.org (Steven desJardins)
Date: Fri, 1 Aug 2008 16:53:02 -0400
Subject: [gutvol-d] bastien
In-Reply-To: 
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
Message-ID: <41fd8970808011353h157e97f5wdf64bf2f8801520d@mail.gmail.com>

On Fri, Aug 1, 2008 at 4:00 PM, Michael Hart  wrote:
>
> All in all this week has been a pretty good indicator of who
> is able to act, and react, sensibly, rather than with noise.

What you're missing is that this past week--and past month, and past
year--has also been a test of who is willing to suppress their voice
completely. When civilized discussion becomes impossible, only
uncivilized discussion remains; if the responses to an individual,
after years of activity, are uniformly uncivilized, that is not a
defense of his behavior, it is an indictment.

I do agree with your point that, if all hope for this list is gone,
the answer is to make a new list with different rules. I'm unsubscribing
after sending this message; if somebody does make a mailing list for
productive volunteer discussions, I look forward to joining it.

From bzg at altern.org  Fri Aug  1 15:08:18 2008
From: bzg at altern.org (Bastien)
Date: Sat, 02 Aug 2008 00:08:18 +0200
Subject: [gutvol-d] Let's have a physical meeting and see what the noise
	sounds like!
In-Reply-To:  (Michael Hart's
	message of "Fri, 1 Aug 2008 11:48:33 -0700 (PDT)")
References:  
	
Message-ID: 

Michael Hart  writes:

> Speak for yourself, Bastien.

I do, thanks!

> YOU are the one answering his messages, not I.

Did I give you the impression I was putting words into your mouth?
I cannot see anything like that in my post.

And I was not answering *HIS* message, I was answering yours, insisting
on the fact that it's not that easy to build another list. By "building"
I do not mean "create a list", I mean "gather people interested in the
same thing."

> I have told bowerbird right to his face, on multiple occasions,
> that I think his emails are not what they would/could/should be,
> and the protection that allows YOU to speak freely here is the
> same exact protection that allows HIM to speak freely.

I'm not discussing freedom.

Just imagine that we are all in the same physical room.  Then what I am
trying to say becomes more obvious: if there is too much noise in one
room, it's not easy to concentrate, or to move to another room in the
hope that the noise will not follow.  This is just what I said before BB
accused me of an ad hominem attack.

The paradox here is that you seem to tell off people who complain a
bit about the noise, while indirectly supporting BB's noise in the name
of free speech.  

-- 
Bastien

From bzg at altern.org  Fri Aug  1 15:09:56 2008
From: bzg at altern.org (Bastien)
Date: Sat, 02 Aug 2008 00:09:56 +0200
Subject: [gutvol-d] bastien
In-Reply-To:  (Bowerbird@aol.com's message of
	"Fri, 1 Aug 2008 13:15:58 EDT")
References: 
Message-ID: 

Bowerbird at aol.com writes:

> once again, bastien enters the arena with ad hominem crap.

By the way, I've never seen such a rude way of making an ad hominem attack
as the one you used in your post: putting my name in the subject line.

-- 
Bastien

From bzg at altern.org  Fri Aug  1 15:21:22 2008
From: bzg at altern.org (Bastien)
Date: Sat, 02 Aug 2008 00:21:22 +0200
Subject: [gutvol-d] bastien
In-Reply-To:  (Michael Hart's
	message of "Fri, 1 Aug 2008 13:00:01 -0700 (PDT)")
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
Message-ID: 

Michael Hart  writes:

> Her own answer of the "killfile" solution takes care of it.

The killfile solution won't do for newcomers.

The killfile solution doesn't do for people who don't know how to do it.

The killfile solution doesn't do for people with a grain of humanity,
those who feel it's rude to shut someone down.

Defending free speech by telling anyone that he's got the right not to
listen to others' speech looks wrong to me.  It only encourages people
not to listen; it doesn't help them contribute in a useful way.

-- 
Bastien

From Bowerbird at aol.com  Fri Aug  1 15:35:21 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 18:35:21 EDT
Subject: [gutvol-d] bowerbird
Message-ID: 

bastien wants a conversation, but all i make (according to him) is "noise".

and he doesn't understand why that is an ad hominem attack.
(does he even know what "ad hominem" means?   i think not.
does he know why it's considered bad form?   obviously not.)

nor does he understand that there is no substance to his post,
so he cannot understand why it is _his_ post that is the "noise".

does it matter to him that he doesn't make sense?
or is it sufficient just to throw a little mud at me?

meanwhile, i've made 31 posts in the last month about
"how to clean up o.c.r." -- and that was just one series --
and there was very little "conversation" about _any_ of it...

yet these guys who bellyache don't realize how _stupid_
it makes them look...

here's a clue, kids: when you talk about _me_, you're _off-topic_.

am i gonna go off-topic with you?   you bet your sweet bippy i am,
to make sure you are defeated each and every time you go there,
because i won't let myself be bullied.   i will fight you back and win.
and i will do that every time -- every single time -- until you stop.
3 times a day, 3 times a year, it doesn't matter, _every_darn_time_.

-bowerbird

p.s.   by the way, i set up a listserve a long time back for people
to have a place to vent when they get angry at one of my posts.
it's called "bash-bowerbird", and it's a yahoogroups listserve...
>    http://games.groups.yahoo.com/group/bash-bowerbird/
no posts there yet.   perhaps bastien would like to be the first!




From bzg at altern.org  Fri Aug  1 15:33:28 2008
From: bzg at altern.org (Bastien)
Date: Sat, 02 Aug 2008 00:33:28 +0200
Subject: [gutvol-d] some sort of list manual
In-Reply-To:  (Bowerbird@aol.com's message of
	"Fri, 1 Aug 2008 16:25:02 EDT")
References: 
Message-ID: 

Bowerbird at aol.com writes:

> (and um, yes, i should have had bastien killfiled already as well.)

Thanks!  

I decided to react because last week I moved to a country where all
the internet connections are nearly broken. It takes at least one minute
to download the emails from this list, and it takes the same to send mine.

I don't have any particular problem with noise.  In countries with good
connections, we can stand trolls -- some of them are even funny!  :+)

But in poor countries, the ones where PG makes the most sense, the
ones where it would be great to see people participating in PG's effort,
the ones which don't have libraries or bookstores, the noise is not
bearable anymore.  Here the noise would prevent anyone from joining the
community.

-- 
Bastien

From Bowerbird at aol.com  Fri Aug  1 15:41:04 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 18:41:04 EDT
Subject: [gutvol-d] bowerbird
Message-ID: 

bastien said:
>    Defending free speech by telling anyone that 
>    he's got the right not to listen to others' speech 
>    looks wrong to me.  It only encourages people
>    not to listen; it doesn't help them contribute in a useful way.

do you really not realize the solution?

if not, i'll be happy to tell you what it is:

_create_a_compelling_conversation_.

you can do it elsewhere if you want,
or you can do it _right_here_, now...

and if you can't start one yourself,
jump into one started by someone else.

jon richfield, for instance, made a
very intriguing post just yesterday,
one full of wit and insightful points.

-bowerbird




From Bowerbird at aol.com  Fri Aug  1 15:45:48 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 18:45:48 EDT
Subject: [gutvol-d] some sort of list manual
Message-ID: 

bastien said:
>    Here the noise would prevent anyone from joining the community.

again with the "noise" label, as if it were some kind of objective truth.

whatever, bastien.   and i empathize with your narrow bandwidth.
i suggest you accept michael's offer and make a separate listserve...

-bowerbird




From bzg at altern.org  Fri Aug  1 15:53:24 2008
From: bzg at altern.org (Bastien)
Date: Sat, 02 Aug 2008 00:53:24 +0200
Subject: [gutvol-d] bowerbird
In-Reply-To:  (Bowerbird@aol.com's message of
	"Fri, 1 Aug 2008 18:35:21 EDT")
References: 
Message-ID: 

A non-text attachment was scrubbed...
Name: grrr.jpg
Type: image/jpeg
Size: 21992 bytes
Desc: not available
Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080802/2c2814cb/attachment-0001.jpg 

From Bowerbird at aol.com  Fri Aug  1 16:28:29 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 1 Aug 2008 19:28:29 EDT
Subject: [gutvol-d] bowerbird
Message-ID: 

bastien said:
>    How do you explain this?

all of the routines were fairly obvious.   not much to say about 'em.

what i _cannot_ explain is why they're not being used over at d.p.
it's a mystery.   for that, you'll need to talk to the people over there.


>    Grrrr!!!!

cute.   i love calvin.   and hobbes.           :+)

(but will you look at that.   pictures in e-mail!
a darn miracle.   what will they think of next?)

-bowerbird




From donovan at abs.net  Fri Aug  1 17:43:35 2008
From: donovan at abs.net (D Garcia)
Date: Fri, 1 Aug 2008 20:43:35 -0400
Subject: [gutvol-d] Laissez-faire Moderation
In-Reply-To: 
References: 
	
	
Message-ID: <200808012043.35723.donovan@abs.net>

On Thursday 31 July 2008 13:50:14 Michael Hart wrote:
> You are ALL welcome to start your own listservers at expense
> to be 100% defrayed by Project Gutenberg.

They shouldn't have to. This is the *PG Volunteer* discussion list, not 
the "let an essentially non-contributory element flood the list with posts 
that few care about enough to even be bothered to read or respond to unless 
they're particularly inflammatory" discussion list.

> So, once again, I simply point out that if you don't want a
> contact with bowerbird. . .which you all SAY. . .all you do
> is start your own listserver and don't let him in, or use a
> heavy hand on "moderation" if you do let him in.

Which apparently means that you encourage and endorse the creation of a 
divisive split within a community you founded, for the sole reason to not 
have to exercise your responsibilities as a moderator therein. Interesting. 
Disappointing, but interesting.

> If you just ignored bowerbird, it would not be there.

Has that worked for you, or is he still here?

> Once again I have been pilloried for NOT killing him off in
> a situation you could have eliminated several easy ways.

But it's not their job. That's the moderator's responsibility.

> If you think you can force me into using moderation weapons
> then I suggest you think again. . . .

"Force" has nothing to do with this. I suspect you can't explain how "keeping 
your community alive and growing by appropriately reducing obstacles to 
productive discussion" equates to "using moderation weapons" (a strong and 
most inappropriate word). You seem to have confused "weapon" with "tool."

It's really too bad that you've never accepted that your laissez-faire 
approach to PG isn't always appropriate. You're asking people on this list to 
do for themselves the job you implicitly promised to--and, despite your 
successes in many other areas, you've failed them terribly here.

Regretfully blunt, but sincerely honest:

David Garcia
System Administrator
Distributed Proofreaders

The opinions expressed above are my own, and do not represent an official 
position of any organization or employer with which I may be affiliated.

From hart at pglaf.org  Fri Aug  1 21:05:12 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 1 Aug 2008 21:05:12 -0700 (PDT)
Subject: [gutvol-d] some sort of list manual
In-Reply-To: 
References:  
Message-ID: 


On Sat, 2 Aug 2008, Bastien wrote:

> Bowerbird at aol.com writes:
>
>> (and um, yes, i should have had bastien killfiled already as well.)
>
> Thanks!
>
> I decided to react because last week I moved to a country where all
> the internet connections are nearly broken. It takes at least one minute
> to download emails of this list, and it takes the same to send mine.
>
> I don't have any particular problem with noise.  In countries with good
> connection, we can stand trolls -- some of them are even funny!  :+)
>
> But in poor countries, the ones where PG makes more sense, and the
> ones where it would be great to see people participating in PG's effort,
> the ones which don't have libraries or bookstores, here the noise is not
> bearable anymore.  Here the noise would prevent anyone from joining the
> community.
>
> -- 
> Bastien

Then STOP making noise!!!


Michael



> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From hart at pglaf.org  Fri Aug  1 21:13:43 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 1 Aug 2008 21:13:43 -0700 (PDT)
Subject: [gutvol-d] bastien
In-Reply-To: 
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
Message-ID: 




On Sat, 2 Aug 2008, Bastien wrote:

> Michael Hart  writes:
>
>> Her own answer of the "killfile" solution takes care of it.
>
> The killfile solution won't do for newcomers.

Anyone can do it.


> The killfile solution doesn't do for people who don't know how to do it.

Just search for "killfile" etc., with any search engine.

We all have our own learning curves, however, the "newbies" as
many still call them, have grown up with computers, and learn,
as we have seen, so much faster than we did.

It's a natural language to them, not an acquired one.


> The killfile solution doesn't do for people with a grain of humanity,
> those who feel it's rude to shut someone down.

Yet that is exactly what all those people are doing to bowerbird,
being so very "rude to shut someone down."

The difference is that it is so much MORE RUDE to censor someone
from the ENTIRE audience than just yourself.

It's only a matter of quantity; either way you're a killer.


> Defending free speech by telling anyone that he's got the right
> not to listen to others' speech looks wrong to me.  It only
> encourages people not to listen, it doesn't help them contribute in
> a useful way.

You can't have it both ways.

Either this silly flame war encourages people not to listen,
or telling people they have the right not to listen is what
discourages them from listening???

No. . .you make no sense at all here, or with your ideal for
rudeness to shut people down. . . .

BOTH of these are things you can't have BOTH ways. . . .

Make up your mind. . . .


mh

>
> -- 
> Bastien

From hart at pglaf.org  Fri Aug  1 21:33:01 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 1 Aug 2008 21:33:01 -0700 (PDT)
Subject: [gutvol-d] Let's have a physical meeting and see what the noise
 sounds like!
In-Reply-To: 
References:  
	
	
Message-ID: 



On Sat, 2 Aug 2008, Bastien wrote:

> Michael Hart  writes:
>
>> Speak for yourself, Bastien.
>
> I do, thanks!
>
>> YOU are the one answering his messages, not I.
>
> Did I give you the impression I was putting words into your mouth?
> I cannot see anything like that in my post.

YOU are creating more noise. . . .

If you are going to complain about the noise,
start by speaking that for yourself.


> And I was not answering *HIS* message, I was answering yours, 
> insisting on the fact that it's not that easy to build another 
> list. By "building" I do not mean "create a list", I mean "gather 
> people interested in the same thing."

If you can't gather people to your cause, sorry, but that's just a
problem everyone has to deal with on their own.

However, we will still provide all the assistance mentioned,
but the content, the membership drives, etc., are up to you.

If you can't get people to work with you. . .that's just you.


>> I have told bowerbird right to his face, on multiple occasions,
>> that I think his emails are not what they would/could/should be,
>> and the protection that allows YOU to speak freely here is the
>> same exact protection that allows HIM to speak freely.
>
> I'm not discussing freedom.

You certainly are using that freedom a lot, and saying a lot about
how it works for someone not discussing freedom.


> Just imagine that we all are in the same physical room.

But we are not.

We are not even in the same time zones, much less continents.

You can't interrupt here, or be interrupted, you can't shout
anybody down, you can't throw things at them, hit them, even
toss a drink in their faces.

Only those who want to curtail this freedom talk about being
in the same physical room, but it's just a pretense.


> Then what I try to say looks more obvious: if there is too much 
> noise in one room, it's not easy to concentrate or move to another 
> room in hope that the noise will not follow.

If you can't learn to concentrate on what you want and leave
the rest, then you will have great trouble socially.

The world is FULL of things that would distract you, fuller,
by far, than this little pile of letters and words.


> This is just what I
> said before BB accused me of an ad hominem attack.

This entire conversation has been an ad hominem attack.

First on bowerbird, then on me.

I have even been attacked for pointing out that it was flame
war material, as though that were my own flaming.

These people, including you, seem to think anyone who's
disagreeing with you is flaming ad hominem, but it is a
very easy thing to tell who has a message and who has an
empty message with nothing but vitriol.


> The paradox here is that you seem to tell off people who
> complain a bit about the noise, while indirectly supporting BB's 
> noise in the name of free speech.

See???  Now there's a perfect example.

You complain that I tell off people who are doing noise
in way of complaining, but you are complaining yourself
so make up your mind!

You are complaining about the very freedom you're using
and that you say you are not discussing.

You, and the others, have contradicted yourselves in an
instant in your own messages.

I'm not supporting anyone's noise here.

I get after bb as much as I get after anyone.

However, this time around bb has been very constructive
and you guys haven't.

Other times it's been the other way around.

Most obvious is that you think by ganging up on him you
can beat him or me into submission.

This is NOT getting you any points. . . .


mh

>
> -- 
> Bastien

From hart at pglaf.org  Fri Aug  1 21:52:37 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 1 Aug 2008 21:52:37 -0700 (PDT)
Subject: [gutvol-d] bastien
Message-ID: 


> On Fri, Aug 1, 2008 at 4:00 PM, Michael Hart  
> wrote:
>> 
>> All in all this week has been a pretty good indicator of who
>> is able to act, and react, sensibly, rather than with noise.
> 
> What you're missing is that this past week--and past month, and 
> past year--has also been a test of who is willing to suppress 
> their voice completely.

So, you are saying that because things were a certain way in the
past years that they can't be any different this year???

This is the kind of verbiage that stops progress in its tracks.

Also, "who is willing to suppress their voice completely" is one
of the other major fallacies, the one called "ad absurdum," but
sometimes "ad infinitum" by saying if something is taken to these
extremes it doesn't make sense.

Very few things make sense when taken to such extremes.

"Completely" is just another ad absurdum or ad infinitum as per
the various interpretations of those terms.


> When civilized discussion becomes impossible, only uncivilized 
> discussion remains; if the responses to an individual, after 
> years of activity, are uniformly uncivilized, that is not a 
> defense of his behavior, it is an indictment.

So, you are saying, even in this month when bowerbird sent in so
many great suggestions of how to improve our eBooks, that every,
single, one, of those "uniformly uncivilized" people, are right?

Right by mob appeal?

Are you saying bowerbird had driven them all insane?

Should we let those insane arguments rule?

No matter if those in the wrong become unanimous. . . .

They are still wrong.

What these people MUST do to be heard properly is to rationally,
calmly, not "ad" anything, come to the making of their points to
the extent of making eBooks in better manners.

If there isn't any of THAT in there, it is non sequitur.

Those who have kept this entire thread alive have no content, no
hopes, no dreams, no product, no nothing in their message that's
what could be called any real CONTENT.

It's all just mob warfare.


> I do agree with your point that, if all hope for this list is 
> gone, the answer is to make a new list with different rules.

This list is now about 20 years old, one of the oldest, and the
list will continue to survive after you have taken your ball and
gone home in a huff.

However, I wish you all the best in getting what you want, ONLY
in terms of eBook productivity, but NOT in terms of censorship.

"Where they censor people,
they will censor books too!"


> I'm unsubscribing after sending this message; if somebody does 
> make a mailing list for productive volunteer discussions, I look 
> forward to joining it.

Of course, you'll have to find out about it some other way, but
it won't happen UNLESS people like YOU actually do it. . . .

You take all this energy complaining, but save none for doing a
new list the way YOU say YOU want it. . . .

There is a word for that, but not used in polite society.


mh

From hart at pglaf.org  Fri Aug  1 22:26:57 2008
From: hart at pglaf.org (Michael Hart)
Date: Fri, 1 Aug 2008 22:26:57 -0700 (PDT)
Subject: [gutvol-d] !@!!@!Re:  Laissez-faire Moderation
In-Reply-To: <200808012043.35723.donovan@abs.net>
References: 
	
	
	<200808012043.35723.donovan@abs.net>
Message-ID: 



On Fri, 1 Aug 2008, D Garcia wrote:

> On Thursday 31 July 2008 13:50:14 Michael Hart wrote:
>> You are ALL welcome to start your own listservers at expense to be
>> 100% defrayed by Project Gutenberg.
>
> They shouldn't have to. This is the *PG Volunteer* discussion 
> list, not the "let an essentially non-contributory element flood 
> the list with posts that few care about enough to even be bothered 
> to read or respond to unless they're particularly inflammatory" 
> discussion list.

Actually, that last part should be just the opposite!

They should NOT "respond to" "particularly inflammatory" notes.

By the way, those who contributed the most to these flame wars
have usually been those who contributed the least to eBooks.

I have teased bb about this publicly in the past, but I have a
responsibility to treat his recent offerings rationally and not
throw them out without due consideration, just as I take quite
real time and effort here to answer you and the others.

You all get as proper a treatment as I can manage.

The trouble is that you don't want SOME to get that.


>> So, once again, I simply point out that if you don't want a 
>> contact with bowerbird. . .which you all SAY. . .all you do is 
>> start your own listserver and don't let him in, or use a heavy 
>> hand on "moderation" if you do let him in.
>
> Which apparently means that you encourage and endorse the creation 
> of a divisive split within a community you founded, for the sole 
> reason to not have to exercise your responsibilities as a 
> moderator therein. Interesting. Disappointing, but interesting.

That's right. . .as founder, I have a different perspective, and I
don't just have to follow the pack, I can encourage various packs,
a variety of approaches, unlike those who get ossified, into their
positions of power, and then can't adapt to the future.

Think rust belt.

Think IBM.

Think Detroit.

The only place you don't see this is in sports.

Real life is more important, so why not???

Because "power corrupts, and absolute power corrupts absolutely."

I am not, and never have been, into that kind of power.

That's why no one has been able to take over Project Gutenberg--
to re-form, or "reform" it into their own image.

Just about every year people say Project Gutenberg cannot go
another year under my kind of management.

Everybody can run Project Gutenberg better than I do. . . .


Don't forget, the word "moderator" meant "one who is moderate,"
and "one who uses power moderately."

NOT "one who forces others to be moderate."

Of course, the "new" definitions shifted yards to the right,
but I'm not so easy to shift in my values, though I am easy,
quite literally, to get to try new ideas.

However, these messages are not suggesting new ideas for the
creation and distribution of eBooks, so they are irrelevant.

On the other hand, at least this past month, bowerbird quite
literally sent us a total of 31 suggestions for better books
and while I don't like ALL of them, I think it would be very
easy to use those suggestions in our eBook goals.

The goals of personal power, etc., I leave laying there.


>> If you just ignored bowerbird, it would not be there.
>
> Has that worked for you, or is he still here?

Not the way he is for you, eh???!!!

I just left it all alone a month or two ago, and things
improved quite a bit, eh?

Now YOU have NOT left it all alone, even when things in
the greater portion were going better, and more calmly.

Eh???


>> Once again I have been pilloried for NOT killing him off in a 
>> situation you could have eliminated several easy ways.
>
> But it's not their job. That's the moderator's responsibility.

We don't have that kind of moderator position here, never have.

You want it?

Make your own list.

There is NEVER going to be that kind of political power here,
and you can't create it.

People try every few years. . .it never works. . .life goes on.


>> If you think you can force me into using moderation weapons then 
>> I suggest you think again. . . .
>
> "Force" has nothing to do with this. I suspect you can't explain 
> how "keeping your community alive and growing by appropriately 
> reducing obstacles to productive discussion" equates to "using 
> moderation weapons" (a strong and most inappropriate word). You 
> seem to have confused "weapon" with "tool."

The "obstacles to productive discussion" are obviously those who
operate under the pretense that they would HAVE such discussions
if it weren't for bowerbird.

Well. . .I haven't seen any evidence for this, any attempt to do
"productive discussion" by these people. . .the evidence speaks,
volumes, for itself.

WHAT has bb stopped you from saying or doing???

This is so totally transparent that once again I am going to ask
if anyone needs me to point this out, and if no such messages do
arrive in my mailbox, I won't bother, and you will flame out, as
always, because no one is replying.


> It's really too bad that you've never accepted that your 
> laissez-faire approach to PG isn't always appropriate. You're 
> asking people on this list to do for themselves the job you 
> implicitly promised to--and, despite your successes in many other 
> areas, you've failed them terribly here.

You can't define my job. . .period.

You can't make promises for me. . .period.

I treat you as equally as possible. . .period.

If Project Gutenberg is a failure. . .more gruel please!

You don't define my success. . . .


> Regretfully blunt, but sincerely honest:


Try being as regretfully blunt and honest with yourself.

What have you proposed here other than mob censorship?

Ask one of your good friends, without any preamble,
to go back a month or two and read all this, and to
be blunt and honest with you.

They will tell you there is NO CONTENT to all this.

It's all just "kids in a sandbox" throwing tantrums
but saying it's the other kid's fault and trying to
get him thrown out by the teacher.

Sorry, but I've seen all this a dozen times just as
your parents and teachers saw through all your tiny
political maneuverings when you were kids.

I'm really surprised that people don't do better as
they grow up than they did in the sandboxes.

The worst part???

All too many moderators get taken in by this.

But it won't work here, not with me, not with Greg,
who is our CEO, not with any of our board members.

You have Distributed Proofreaders, isn't that enuf?


Michael S. Hart
Founder
Project Gutenberg


>
> David Garcia
> System Administrator
> Distributed Proofreaders
>
> The opinions expressed above are my own, and do not represent an
> official position of any organization or employer with which I may
> be affiliated.

From richfield at telkomsa.net  Sat Aug  2 05:13:13 2008
From: richfield at telkomsa.net (Jon Richfield)
Date: Sat, 02 Aug 2008 14:13:13 +0200
Subject: [gutvol-d] !@!ACTA trade agreement brief for July 29-31
Message-ID: <48944F59.9090201@telkomsa.net>

To those who responded: much thanks for kind words.

There are all sorts of well-worn arguments about the protection of intellectual creations and property, and these are not likely to go away very soon.  The problem is firstly the concept of property, and that intrinsically entails conflicts of interest.  In a world of want and woe this in turn implies predators and prey, an ecology in which creators and possessors not only get screwed, but are widely accepted as legitimate targets.  Just ask Chairman Mao or the Lady Evita P. after they had attained the status of the empowered.    
Conversely, when possession enhances power, it feeds back positively to the successful predator and parasite whose motto is that inventors and creators are *supposed* to get screwed, in a community where to give a sucker an even break is not so much a crime or a faux pas, as an obscenity, a sin.  The truly virtuous disapprove of giving such lower forms of life any more than the wherewithal for life, because it detracts from the resources that rightly belong to the elite and the lickspittles, who wish to profit for evermore from their spoils.  
And who am I to criticise?  'Tis not in mortals to countermand success, not in politics surely, because the powerful and rich, or those that have their approval, have the political clout and they wield it for those who helped them into high station.  
So let us accept the needy man of prey at table, seeing that if we do not, he is likely to confiscate the table.  And let us by no means begrudge the occasional mouthful of chaff to the kine treading the corn. In our terminology, that amounts to saying that the labourer is worthy of his hire and the owner of his interest.  By all means grant the copyright and the proceeds of the copyright, if that really is what it takes to inspire and empower the creator.  
However, when we find that no one is interested in publishing or republishing material of public value, just because they don't see sufficient profit in it themselves, and yet wish to deny others the right to publish it without monetary profit, that is not in any public interest, nor yet in the interest of the author or publisher.  
Much to the contrary.  
If this is the kernel of the problem, then there are ways of cracking the nut without bruising fingers with sledgehammers or feelings with litigation.  The appropriate nutcracker should be assignment of rights, not by arbitrarily fixed periods of time from the date of this or that, but by nurture.  
"Money's like muck, that's profitable while
 'T serves for manuring of some fruitful soil.
 But on a barren one, like thee, methinks,
 'Tis like a dunghill that lies still and stinks." 
That's easy for Father Flecknoe to say about money, but for copyright matters are worse.  Economic theory has not nearly caught up with the concept of the value and negotiability of information in my opinion and as far as I know.  I don't understand money in the first place, and I am not sure how to deal with information as a structure of entities.  How to reconcile and combine the two in a single body of theory I have barely the slightest idea.  
However, something like the following seems to me to be a viable approach.  
1 On producing something presumably worth while, a creator or his heirs or assignees hold more or less absolute right over its publication until such time as it is published.  If he decides nay, that is his right, and anyone publishing it in the face of his refusal commits an offence.  (How serious an offence and what is to be done about it, and by whom, is not of primary interest here.)  Of course, if no one else thinks it worth while, as happened with van Gogh, then there should be no problem.  
2 Once the material has been published in some non-trivial fashion, such that the public have access (the author flashing a manuscript in the face of a witness is insufficient to establish publication) and such that the date of publication is reasonably ascertainable to within a year or so, then for some period thereafter (75 years? Sure! Who cares?) that work is in copyright and no one else may publish it for profit without permission.  
3 Furthermore, as long as it is commercially available for some reasonable price to a public of reasonable size, then during that period it remains in copyright.  Further yet more, if it is material of a nature that say, the author or copyright holder uses as proprietary information in his industry or in classes that his (or her, naturally!) company presents or the like, then the volume of his publication runs is no one's business but his own.  
4 However, should the volume of publication be too small to meet public demand over a long period (say 10 years?) then it should be open to any member of the public to give notice to the publisher or publicly in acceptable news media, or to some statutory body, that if *adequate* publication is not resumed within a reasonable period (1 year perhaps?) that he will commence publishing the material on some certified non-profit basis at some time soon after.  (Free electronic distribution or the like?)  Second-hand books do not count as re-commencement of publication.  
5 Should the original publishers reawaken to some prince's or toad's kiss (the latter usually being less germy, of course, but toads are fussier about whom they kiss) and undertake to re-commence adequate publication, they may do so, and accordingly require that the non-profit publication be stopped from the date of effective re-commencement.   They cannot however ask to put the evils back into Pandora's box, and anyone who has in the mean time obtained copies in good faith may retain them, though not to publish them in turn.  
6 In this way no one would lose.  Mr Muniglut or Margaret Mitchell's heirs could hold onto their rights as long as they supplied reasonable demand.  The public would not be deprived either of that which they were willing to pay for, or which no one thought worth offering them for money, but would be of inestimable value to posterity, or for that matter, to particular or perceptive members of contemporary society.  The author would have a far better chance of the kind of immortality that Woody Allen despised, but which Piet Hein referred to in:
Giving in is no defeat
Passing on is no retreat
Selves were made to rise above
You shall live in what you love.

Now, all this overflowed at the mouth from a full heart that hitherto has not been concerned with such matters, so I know very well that I have forgotten a lo-o-o-o-t.  I am only too happy however to discuss related concepts, however futilely.  I regret to say however, that my son is unlikely to have any opportunity to join in at present, as he has heavy hay on his fork.

Cheers,

Jon







From grythumn at gmail.com  Sat Aug  2 09:41:06 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sat, 2 Aug 2008 12:41:06 -0400
Subject: [gutvol-d] bastien
In-Reply-To: 
References: 
Message-ID: <15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>

On Sat, Aug 2, 2008 at 12:52 AM, Michael Hart  wrote:
> Also, "who is willing to suppress their voice completely" is one
> of the other major fallacies, the one called "ad absurdum," but
> sometimes "ad infinitum" by saying if something is taken to these
> extremes it doesn't make sense.

No, but it does mean that most of the active discussion is being
driven to other fora with lower noise floors and more helpful
attitudes. We (at DP, since that is where most of his sniping is
directed) have no shortage of people who can point out problem areas..
 we are short of people who can actually write code to fix them.  We
have a shortage of trained statisticians who can help determine a
reliable algorithm for when a page/project is "done". We have a
shortage of people who are willing to commit to the hours to days of
polishing works in post processing, or again, of people to write code
to distribute the tasks that do not need to be done by a single
person.

Besides, weren't you going to ignore gutvol-d until Greg N pointed out
some specific interesting discussions?

R C

From walter.van.holst at xs4all.nl  Sat Aug  2 09:51:33 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Sat, 02 Aug 2008 18:51:33 +0200
Subject: [gutvol-d] bastien
In-Reply-To: <15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
References: 
	<15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
Message-ID: <48949095.1040307@xs4all.nl>

Robert Cicconetti wrote:

> No, but it does mean that most of the active discussion is being
> driven to other fora with lower noise floors and more helpful
> attitudes. We (at DP, since that is where most of his sniping is

Which ones are those? Or do you mean the DP webfora?

Frankly, I am rather tired of the fact that a single person is allowed 
to poison this well. If someone were to start a separate mailing list, I 
might very well join.



Regards,

  Walter

From bruce at zuhause.org  Sat Aug  2 10:26:51 2008
From: bruce at zuhause.org (Bruce Albrecht)
Date: Sat, 2 Aug 2008 12:26:51 -0500
Subject: [gutvol-d] Moderation  (was: Re: bastien)
In-Reply-To: 
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
Message-ID: <18580.39131.718459.841886@celery.zuhause.org>

Michael Hart writes:
 > The difference is that it is so much MORE RUDE to censor someone
 > from the ENTIRE audience than just yourself.

Just because a mailing list is moderated, does not mean that someone
is not allowed to have a voice.  A moderator could reject a message
and request that the poster rewrite it so that it is not attacking a
specific person or persons.  If there is a point to the message beyond
the attack, the poster probably will be willing to trim the offensive
portion of the message, if not, the poster probably didn't care enough
about the point to make it.

Quite frankly, I find it far more rude, for example, for Marcello
Parathon to send a message that has no point other than he thinks
Bowerbird is an idiot, than for a moderator to block that post.

If all of us were in a room together at some sort of panel, and two
persons started shouting at each other about how stupid the other one
was, would you let them both go at it until the end of the panel?  No,
you'd do something to calm things down, like call on other people, or
ask (make) them leave the room, so others could make their points.
Why is there something magical about a mailing list that means that if
people can't self-censor, others shouldn't help (make) them do it?

Quite frankly, I only read this list about once a month, and I don't
blacklist anyone, but I tend to skip a lot of messages that appear to have
no other point than X thinks Y is an idiot, sometimes with a lucid
explanation, but more often not.  I'd be more likely to read and
participate in discussions here if there were more ideas and fewer
attacks. 

From grythumn at gmail.com  Sat Aug  2 13:25:28 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sat, 2 Aug 2008 16:25:28 -0400
Subject: [gutvol-d] bastien
In-Reply-To: <48949095.1040307@xs4all.nl>
References: 
	<15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
	<48949095.1040307@xs4all.nl>
Message-ID: <15cfa2a50808021325x3de25815l96910fd2b43d07ce@mail.gmail.com>

On Sat, Aug 2, 2008 at 12:51 PM, Walter van Holst
 wrote:
> Robert Cicconetti wrote:
>
>> No, but it does mean that most of the active discussion is being
>> driven to other fora with lower noise floors and more helpful
>> attitudes. We (at DP, since that is where most of his sniping is
>
> Which ones are those? Or do you mean the DP webfora?
>
> Frankly, I am rather tired of the fact that a single person is allowed
> to poison this well. If someone were to start a separate mailing list, I
> might very well join.

DP, DP-can, ebookforge, are ones I know of on the production side.
Plus some related stuff such as teleread, and some yahoo groups. The
cataloging team have their own list, the whitewashers have theirs, not
sure if Juliet and Greg have one..

I always thought this list should be the place where the various
subunits and independent producers talk about various things, and it
has functioned that way in the past, but doesn't seem to function
that way recently. If the noise or aggravation level is too high,
people don't bother to come here and coordinate. And I think that is
sad, as it will let problems go unaddressed and groups uncoordinated.

R C

From hart at pglaf.org  Sat Aug  2 17:17:27 2008
From: hart at pglaf.org (Michael Hart)
Date: Sat, 2 Aug 2008 17:17:27 -0700 (PDT)
Subject: [gutvol-d] bastien
In-Reply-To: <15cfa2a50808021325x3de25815l96910fd2b43d07ce@mail.gmail.com>
References: 
	<15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
	<48949095.1040307@xs4all.nl>
	<15cfa2a50808021325x3de25815l96910fd2b43d07ce@mail.gmail.com>
Message-ID: 


It appears to me that it is the group, organized or not,
that would prefer to poison this particular well that is
doing the greatest damage by carrying on so about what a
rational person would simply write off as some mountains
made out of molehills.

This conversation is not an active discussion. . . .

It is merely a flame war that those who SAY they will be
more interested in something without so much noise are a
greater source of this noise than any other contributor.

By the way, I do save all these messages, not just in the
archives, but in my own email archives, and when you are
trying to point out that YOU have not been part of these
circuses of noisemaking, I will just send them back from
the future to demonstrate this to all concerned.

So. . .I suggest the noisemakers think a little more out
of the present and into the future where reputations are
going to come to the real test.

The more of this you do, the worse you will look and all
the other parties will simply look consistent.


Michael S. Hart
Founder
Project Gutenberg

On Sat, 2 Aug 2008, Robert Cicconetti wrote:

> On Sat, Aug 2, 2008 at 12:51 PM, Walter van Holst
>  wrote:
>> Robert Cicconetti wrote:
>>
>>> No, but it does mean that most of the active discussion is being
>>> driven to other fora with lower noise floors and more helpful
>>> attitudes. We (at DP, since that is where most of his sniping is
>>
>> Which ones are those? Or do you mean the DP webfora?
>>
>> Frankly, I am rather tired of the fact that a single person is allowed
>> to poison this well. If someone were to start a separate mailing list, I
>> might very well join.
>
> DP, DP-can, ebookforge, are ones I know of on the production side.
> Plus some related stuff such as teleread, and some yahoo groups. The
> cataloging team have their own list, the whitewashers have theirs, not
> sure if Juliet and Greg have one..
>
> I always thought this list should be the place where the various
> subunits and independent producers talk about various things, and it
> has functioned that way in the past, but doesn't seem to function
> that way recently. If the noise or aggravation level is too high,
> people don't bother to come here and coordinate. And I think that is
> sad, as it will let problems go unaddressed and groups uncoordinated.
>
> R C

From hart at pglaf.org  Sat Aug  2 17:33:02 2008
From: hart at pglaf.org (Michael Hart)
Date: Sat, 2 Aug 2008 17:33:02 -0700 (PDT)
Subject: [gutvol-d] bastien
In-Reply-To: <15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
References: 
	<15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
Message-ID: 



On Sat, 2 Aug 2008, Robert Cicconetti wrote:

> On Sat, Aug 2, 2008 at 12:52 AM, Michael Hart  wrote:
>> Also, "who is willing to suppress their voice completely" is one
>> of the other major fallacies, the one called "ad absurdum," but
>> sometimes "ad infinitum" by saying if something is taken to these
>> extremes it doesn't make sense.
>
> No, but it does mean that most of the active discussion is being
> driven to other fora with lower noise floors and more helpful
> attitudes.

And it is just this continuing noise level of yours that does it.

This conversation has just about been ALL noise from those who do
nothing but complain about the noise. . .the actual noisemakers--
whom you have accused--have contributed very little such noise.


> We (at DP, since that is where most of his sniping is
> directed)

And just where is all the CURRENT level of sniping coming from?

Do you NOT realize those accusing of sniping are doing more than
anyone else to create their own tempest in a teapot?


> have no shortage of people who can point out problem areas..
> we are short of people who can actually write code to fix them.  We
> have a shortage of trained statisticians who can help determine a
> reliable algorithm for when a page/project is "done". We have a
> shortage of people who are willing commit to the hours to days of
> polishing works in post processing, or again, of people to write code
> to distribute the tasks that do not need to be done by a single
> person.

Now. . .if you will just DO what you have outlined above. . .or DO
something to encourage others to DO what you have outlined above--
a new situation might arise.

But. . .the way you have put your message above you are doing the
very same thing you are accusing about. . .saying something SHOULD
be done, COULD be done. . .etc. . .without actually doing more for
the situation than just complaining,

"If your comments are not part of a proposed solution,
then your comments are merely part of the noise storm."


> Besides, weren't you going to ignore gutvol-d until Greg N pointed out
> some specific interesting discussions?

And you are doing your best to make the discussion interesting,
are you not?

But without adding any REAL CONTENT that might be useful in the
creation and distribution of more and better eBooks.

THAT is the only measure of successful contribution to the very
discussions you seem to be trying to poison.

Your comments are doing more harm than good, at least it seems,
or is that what you all have intended in the first place?

It wouldn't be the first time. . . .

Several groups have tried it before. . . .

Without success. . . .


Michael S. Hart
Founder
Project Gutenberg


From bzg at altern.org  Sat Aug  2 15:55:51 2008
From: bzg at altern.org (Bastien Guerry)
Date: Sat, 02 Aug 2008 17:55:51 -0500
Subject: [gutvol-d] Let's have a physical meeting and see what the noise
	sounds like!
In-Reply-To:  (Michael Hart's
	message of "Fri, 1 Aug 2008 21:33:01 -0700 (PDT)")
References:  
	
	
	
Message-ID: <87hca2lxiz.fsf@altern.org>

Michael Hart  writes:

> You complain that I tell off people who are doing noise
> in way of complaining, but you are complaining yourself
> so make up your mind!

It's not because my own complaining adds more reasons for
others to complain that my own silence will deprive anyone
of such reasons.  Why?  Because good trolls don't only
enjoy replies, they also enjoy silence.  Both are signs
of an audience that still pays attention, somehow.

The circle is elsewhere: the circle is in the fact that, 
if someone puts energy on a constructive discussion, there
are good chances that BB will put more energy in denying 
any value in what the OP is proposing.  This is also 
rationality: induction and preductions.  

> You are complaining about the very freedom you're using
> and that you say you are not discussing.

I'm not complaining about any freedom.  
I'm not in favor of moderation.  

My initial point was that it might not be such a great
idea to tell the people here: "Guys, if you're not happy
with the current way of discussing, get the hell out, I can 
even provide the car and the fuel."

I find it rude.

> You, and the others, have contradicted yourselves in an
> instant in your own messages.

You try to reduce people to self-contradiction because it 
looks like you don't want to see the complexity here.  See

> I'm not supporting anyone's noise here.
>
> I get after bb as much as I get after anyone.

I'm not requiring special attention to my own noise.  

> Most obvious is that you think by ganging up on him you
> can beat him or me into submission.

I'm not trying to beat anyone, please don't let your emotions 
take over.  I'm not encouraging any sense of being a victim or 
whatsoever.  Maybe it's time to relax a bit!

Cheers,

-- 
Bastien

From bzg at altern.org  Sat Aug  2 15:57:23 2008
From: bzg at altern.org (Bastien)
Date: Sat, 02 Aug 2008 17:57:23 -0500
Subject: [gutvol-d] Thanks
In-Reply-To:  (Michael Hart's
	message of "Fri, 1 Aug 2008 21:52:37 -0700 (PDT)")
References: 
Message-ID: <87abfulxiz.fsf@altern.org>

Can you stop using my name in the subject line?

This is offensive.

-- 
Bastien

From bzg at altern.org  Sat Aug  2 15:19:13 2008
From: bzg at altern.org (Bastien Guerry)
Date: Sat, 02 Aug 2008 17:19:13 -0500
Subject: [gutvol-d] some sort of list manual
In-Reply-To:  (Michael Hart's
	message of "Fri, 1 Aug 2008 21:05:12 -0700 (PDT)")
References:  
	
Message-ID: <87vdyilxj2.fsf@altern.org>

Michael Hart  writes:

> On Sat, 2 Aug 2008, Bastien wrote:
>
>> Bowerbird at aol.com writes:
>>
>>> (and um, yes, i should have had bastien killfiled already as well.)
>>
>> Thanks!
>>
>> I decided to react because last week I moved to a country where all
>> the internet connections are nearly broken. It takes at least one minute
>> to download emails of this list, and it takes the same to send mine.
>>
>> I don't have any particular problem with noise.  In countries with good
>> connection, we can stand trolls -- some of them are even funny!  :+)
>>
>> But in poor countries, the ones where PG makes the most sense, the
>> ones where it would be great to see people participating in PG's effort,
>> the ones which don't have libraries or bookstores, the noise is not
>> bearable anymore.  Here the noise would prevent anyone from joining the
>> community.
>>
>> -- 
>> Bastien
>
> Then STOP making noise!!!

Your answer is offensive. 

Can we please get back to a more civilized tone?

In the email above I'm discussing the purpose of this list, as part of
the purpose of PG.  I'm pointing out that the way the list currently
runs doesn't help people from outside to jump in, especially those with
a slow connection, those who might be in greatest want of public domain
Ebooks.

I'm sorry you call this "noise".

-- 
Bastien

From bzg at altern.org  Sat Aug  2 15:32:30 2008
From: bzg at altern.org (Bastien Guerry)
Date: Sat, 02 Aug 2008 17:32:30 -0500
Subject: [gutvol-d] bastien
In-Reply-To:  (Michael Hart's
	message of "Fri, 1 Aug 2008 21:13:43 -0700 (PDT)")
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
Message-ID: <87od4alxiz.fsf@altern.org>

Michael Hart  writes:

> On Sat, 2 Aug 2008, Bastien wrote:
>
>> Michael Hart  writes:
>>
>>> Her own answer of the "killfile" solution takes care of it.
>>
>> The killfile solution won't do for newcomers.
>
> Anyone can do it.

Newcomers don't know who to put in their killfile.  That's what I was
saying.  

>> The killfile solution doesn't do for people who don't know how to do it.
>
> Just search for "killfile" etc., with any search engine.

> We all have our own learning curves, however, the "newbies" as
> many still call them, have grown up with computers, and learn,
> as we have seen, so much faster than we did.
>
> It's a natural language to them, not an acquired one.

To "them" who?  Those who live in NYC?  Yes, maybe.  I don't buy Prensky's
trendy concept of digital natives vs digital immigrants but maybe you're
right that it's easier for the new generation to killfile someone.  

But that's only part of the world.  

>> The killfile solution doesn't do for people with a grain of humanity,
>> those who feel it's rude to shut someone down.
>
> Yet that is exactly what all those people are doing to bowerbird,
> being so very "rude to shut someone down."
>
> The difference is that it is so much MORE RUDE to censor someone
> from the ENTIRE audience than just yourself.

Yes.  I'm not in favor of moderation, as I said before.

But I'm in favor of self-moderation.  And sometimes self-moderation
happens thanks to the people around us who help us restrain ourselves.

Of course anyone can wait for the sacred "wisdom of the crowds"
to create the conditions for self-moderation.  But when it doesn't,
and when people seem to complain about just one contributor,
then you cannot accuse the crowd of being crazy.

> No. . .you make no sense at all here, or with your ideal for
> rudeness to shut people down. . . .

Your tone is offensive.

And I've not expressed any "ideal for rudeness to shut people down".

Can you please be a bit more considerate when replying?

Thanks,

-- 
Bastien

From hart at pglaf.org  Sat Aug  2 18:24:02 2008
From: hart at pglaf.org (Michael Hart)
Date: Sat, 2 Aug 2008 18:24:02 -0700 (PDT)
Subject: [gutvol-d] bastien
In-Reply-To: <87od4alxiz.fsf@altern.org>
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
	<87od4alxiz.fsf@altern.org>
Message-ID: 




On Sat, 2 Aug 2008, Bastien Guerry wrote:

> Michael Hart  writes:
>
>> On Sat, 2 Aug 2008, Bastien wrote:
>>
>>> Michael Hart  writes:
>>>
>>>> Her own answer of the "killfile" solution takes care of it.
>>>
>>> The killfile solution won't do for newcomers.
>>
>> Anyone can do it.
>
> Newcomers don't know who to put in their killfile.  That's what I was
> saying.


Saying this over and over is supposed to do what???

True newcomers don't know anything about the new situation.

This is true of all newcomers to all situations, other than
when some knowledge from some other field may prove to have
some value "outside the box."


>
>>> The killfile solution doesn't do for people who don't know how to do it.
>>
>> Just search for "killfile" etc., with any search engine.
>
>> We all have our own learning curves, however, the "newbies" as
>> many still call them, have grown up with computers, and learn,
>> as we have seen, so much faster than we did.
>>
>> It's a natural language to them, not an acquired one.
>
> To "them" who?  Those who live in NYC?  Yes, maybe.  I don't buy 
> Prensky's trendy concept of digital natives vs digital immigrants
> but maybe you're right that it's easier for the new generation to 
> killfile someone.

Do you really expect anyone to believe that only NYC natives have
any source for information?

Do you not realize how many languages Google works in?

"Digital natives?"

"Digital immigrants?"

You could write a best seller with those as the title.

However, we are all digital immigrants, none of us was born into
this culture.


> But that's only part of the world.

But NOW the majority of those who could be digital, ARE digital!


>
>>> The killfile solution doesn't do for people with a grain of humanity,
>>> those who feel it's rude to shut someone down.
>>
>> Yet that is exactly what all those people are doing to bowerbird,
>> being so very "rude to shut someone down."
>>
>> The difference is that it is so much MORE RUDE to censor someone
>> from the ENTIRE audience than just yourself.
>
> Yes.  I'm not in favor of moderation, as I said before.
>
> But I'm in favor of self-moderation.  And sometimes
> self-moderation happens thanks to the people around us who help us
> restrain ourselves.

So far this particular thread has been nearly all noise.

>
> Of course anyone can wait for the sacred "wisdom of the crowds"
> to create the conditions for self-moderation.  But when it doesn't,
> and when people seem to complain about just one contributor,
> then you cannot accuse the crowd of being crazy.

Sure I can.

It just takes integrity.

As I said before, being unanimous does not make you correct,
but what we have here is perhaps a dozen people all ganging
up on one or two people. . .without being constructive in a
manner they CLAIM is what they want.


>> No. . .you make no sense at all here, or with your ideal for
>> rudeness to shut people down. . . .
>
> Your tone is offensive.
>
> And I've not expressed any "ideal for rudeness to shut people down".
>
> Can you please be a bit more considerate when replying?

My apologies. . .did I confuse some quote from another person
with something I thought you were saying?

If so, Please correct me, and I will write a new reply.


Hoping to thank you for your time and consideration soon,


Michael



> Thanks,
>
> -- 
> Bastien

From hart at pglaf.org  Sat Aug  2 18:28:13 2008
From: hart at pglaf.org (Michael Hart)
Date: Sat, 2 Aug 2008 18:28:13 -0700 (PDT)
Subject: [gutvol-d] some sort of list manual
In-Reply-To: <87vdyilxj2.fsf@altern.org>
References:  
	
	<87vdyilxj2.fsf@altern.org>
Message-ID: 



On Sat, 2 Aug 2008, Bastien Guerry wrote:

> Michael Hart  writes:
>
>> On Sat, 2 Aug 2008, Bastien wrote:
>>
>>> Bowerbird at aol.com writes:
>>>
>>>> (and um, yes, i should have had bastien killfiled already as well.)
>>>
>>> Thanks!
>>>
>>> I decided to react because last week I moved to a country where all
>>> the internet connections are nearly broken. It takes at least one minute
>>> to download emails of this list, and it takes the same to send mine.
>>>
>>> I don't have any particular problem with noise.  In countries with good
>>> connection, we can stand trolls -- some of them are even funny!  :+)
>>>
>>> But in poor countries, the ones where PG makes the most sense, the
>>> ones where it would be great to see people participating in PG's effort,
>>> the ones which don't have libraries or bookstores, the noise is not
>>> bearable anymore.  Here the noise would prevent anyone from joining the
>>> community.
>>>
>>> --
>>> Bastien
>>
>> Then STOP making noise!!!
>
> Your answer is offensive.
>
> Can we please get back to a more civilized tone?
>
> In the email above I'm discussing the purpose of this list, as 
> part of the purpose of PG.  I'm pointing out that the way the list 
> currently runs doesn't help people from outside to jump in, 
> especially those with a slow connection, those who might be in
> greatest want of public domain Ebooks.
>
> I'm sorry you call this "noise".
>
> -- 
> Bastien


Again I apologize if I confused something you said with something
someone else said, and will be only too glad to reply again to all
you say I should have been replying to, and NOT to what you might
say I should not have been replying to.

Yes, I agree that the noise level is not conducive, either to new
members OR to old ones.

However, just SAYING what you want this list to be is not enough,
you actually have to START DOING what you are saying. . . .

I would be only too happy to see you start just such a thread.


Hoping to be thanking you for doing so in the near future,


Michael


From hart at pglaf.org  Sat Aug  2 18:50:26 2008
From: hart at pglaf.org (Michael Hart)
Date: Sat, 2 Aug 2008 18:50:26 -0700 (PDT)
Subject: [gutvol-d] Let's have a physical meeting and see what the noise
 sounds like!
In-Reply-To: <87hca2lxiz.fsf@altern.org>
References:  
	
	
	
	<87hca2lxiz.fsf@altern.org>
Message-ID: 



On Sat, 2 Aug 2008, Bastien Guerry wrote:

> Michael Hart  writes:
>
>> You complain that I tell off people who are doing noise
>> in way of complaining, but you are complaining yourself
>> so make up your mind!
>
> It's not because my own complaining adds more reasons for
> others to complain that my own silence will deprive anyone
> of such reasons.  Why?  Because good trolls don't only
> enjoy replies, they also enjoy silence.  Both are signs
> of an audience that still pays attention, somehow.

Sorry, I just can't buy this one.

You can't have it both ways, as much as you might like to.

I'm thinking of starting a "moderated" list just as a sort
of experiment to show just how strong silence can be.


> The circle is elsewhere: the circle is in the fact that,
> if someone puts energy on a constructive discussion, there
> are good chances that BB will put more energy in denying
> any value in what the OP is proposing.  This is also
> rationality: induction and preductions.

What you are doing here is trying to put words to bowerbird,
or actions, that you would LIKE to prove your point.

The only problem is that they are fictional. . . .


Fact:

Bowerbird has been on pretty good behavior this past month.


Fact:

This hasn't stopped the complaining, it accelerated it.

His improved behavior is NOT being rewarded.

Just the opposite.

This is NOT how to induce the trend you SAY you want.


>> You are complaining about the very freedom you're using
>> and that you say you are not discussing.
>
> I'm not complaining about any freedom.

Sorry, it sounded like it to me.


> I'm not in favor of moderation.

Even for bowerbird???


> My initial point was that it might not be such a great
> idea to tell the people here: "Guys, if you're not happy
> with the current way of discussing, get the hell out, I can
> even provide the car and the fuel."

I thought that was exactly what I was saying when I said this
could evolve into another discussion group we would pay for.

By the way, I remember someone complaining that this would do
more to create splinter groups or the like, and I somehow did
not get my comment about the


Fact:

DP started out as just exactly that same kind of splinter.


Fact:

_I_ had thousands of dollars taken out of what would have in
other cases gone into my paycheck. . .just to get DP going--
and yet DP has treated me on and off as some kind of enemy.

Personally, I think helping start DP was one of the best I'm
ever likely to help start!

However, if I don't help OTHER splinter groups start there's
never going to be a chance for ANOTHER such success.

I didn't view PG as the end all be all of the universe, so I
helped start DP.

I didn't view DP as the end all be all of the universe, so I
have been willing to help others try various startups.

When I stop trying to encourage new ideas to start up you'll
know I am truly ossified.

Until then, I keep looking for new horizons to go past.



> I find it rude.

Once again I apologize, and offer to reply again if you will
send me a clearer picture of what I replied to incorrectly.

I was not TRYING to be rude, I was just trying to answer the
messages in their same tone, and perhaps I can do better.



>> You, and the others, have contradicted yourselves in an
>> instant in your own messages.
>
> You try to reduce people to self-contradiction because it
> looks like you don't want to see the complexity here.  See

Sorry, it looks as if your second sentence got cut off here.

As for the complexity, perhaps you could elucidate further.


>> I'm not supporting anyone's noise here.
>>
>> I get after bb as much as I get after anyone.
>
> I'm not requiring special attention to my own noise.

_I_ have to pay attention, whether you ask me to or not.


>> Most obvious is that you think by ganging up on him you
>> can beat him or me into submission.
>
> I'm not trying to beat anyone, please don't let your emotions
> take over.

I would suggest that you, and all others concerned,
take this advice to heart.

Please note that I did NOT state this in the obvious manner,
which would have been a cheap shot.


> I'm not encouraging any sense of being a victim or whatsoever.

Sorry, it seemed, at least for a moment, that you were ganging
up on bowerbird along with everyone else.

I will await the chance for you and I to correct that. . . .


> Maybe it's time to relax a bit!

I certainly hope so!

Thanks!!

me


>
> Cheers,
>
> -- 
> Bastien

From grythumn at gmail.com  Sat Aug  2 19:39:50 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sat, 2 Aug 2008 22:39:50 -0400
Subject: [gutvol-d] bastien
In-Reply-To: 
References: 
	<15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
	
Message-ID: <15cfa2a50808021939p7be46152nbea407a98bca6d63@mail.gmail.com>

On Sat, Aug 2, 2008 at 8:33 PM, Michael Hart  wrote:
> Now. . .if you will just DO what you have outlined above. . .or DO
> something to encourage others to DO what you have outlined above--
> a new situation might arise.
>
> But. . .the way you have put your message above you are doing the
> very same thing you are accusing about. . .saying something SHOULD
> be done, COULD be done. . .etc. . .without actually doing more for
> the situation than just complaining,
>
> "If your comments are not part of a proposed solution,
> then your comments are merely part of the noise storm."

Me? I tried to get some discussion going on markup for the OED, but
got little response outside some encouragement. I  also have books in
progress at DP, I am developing a new PDF image extractor, and have
on-going experiments in low-cost microfilm scanning.  I was scanning
and clearing Rule 6 SF periodicals, but have dropped off lately due to
time constraints. I have tried to discuss some of these (at least the
OED) on this list... but they get buried in the noise. So I continue
the discussion elsewhere.

Do your research, since you archive all mail on this list, instead of
attacking people blindly. In the last few months, my posts consist of:
-1 post in this thread, reacting to your flaming of people complaining
about lack of moderation;
-a link to page images from a DP book, that you requested
-a post briefly describing tools available for preprocessing at DP
-a request for help with formatting in the OED
-a link to some netiquette sites to correct your misunderstanding that
taking offtopic discussions offlist was impolite
-a correction concerning the copy of Hypnotomachia that I PM'd at DP
-a question regarding running google ads at PG

Noise storm? Frankly, most of the noise I've been seeing here lately
has been from _you_.

-R C

From Bowerbird at aol.com  Sat Aug  2 22:51:50 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 3 Aug 2008 01:51:50 EDT
Subject: [gutvol-d] summer ain't over, folks, it's gonna get hotter
Message-ID: 

have you noticed the heat in here gets turned up
every time my proof becomes ever more focused?

i am about to conclude a month-long series that
shows unequivocally that a book could have been
taken to near-perfection with good preprocessing
-- a book _they_ chose, as their own experiment --
using two dozen routines that are plainly obvious,
which would have taken an hour or two to execute.

yet d.p. used some 6-12 person-hours per round
to proof this book, having done 3 rounds _so_far_,
with the book queued up for _another_ proofing...
and there will be many more hours poured into it.

do the d.p. decision-makers who are responsible
for this awful workflow want you to _see_ all this?

nope.   so d.p. apologists are now kicking up dust.

they hope to _distract_ you from what i am saying.
they tell you it's nothing, that i contribute nothing.
they tell you that it's "noise", to try to convince you.
they desperately want you to stop listening to me.
they'd become _censors_ so you couldn't hear me.
they will poison this list if necessary, to silence me.

don't worry...   the dust will settle.   and i'll be here,
still,
with proof that builds and gets even more focused,
until nobody can ignore it.

i've got at least 3 more sets of analyses to divulge...
each one will make the truth even more transparent.

summer ain't over, folks...   it's gonna get _hotter_...

-bowerbird




From ralf at ark.in-berlin.de  Sun Aug  3 00:03:19 2008
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sun, 3 Aug 2008 09:03:19 +0200
Subject: [gutvol-d] Please change the subject
Message-ID: <20080803070319.GA11777@ark.in-berlin.de>

Please change the subject line when posting Re: bastien
You wouldn't want YOUR name as out-of-context subject
dragged all over the list?

Yes, and that's why bowerbird is an aggressive person,
since he did start this subject. Now, Mr Hart, I expect
from you a real Iraq on that Saddam, or will you lose out
like your countrymen, too?


ralf

From bzg at altern.org  Sun Aug  3 05:02:43 2008
From: bzg at altern.org (Bastien)
Date: Sun, 03 Aug 2008 07:02:43 -0500
Subject: [gutvol-d] Let's have a physical meeting and see what the noise
	sounds like!
In-Reply-To:  (Michael Hart's
	message of "Sat, 2 Aug 2008 18:50:26 -0700 (PDT)")
References:  
	
	
	
	<87hca2lxiz.fsf@altern.org>
	
Message-ID: <877iaypam3.fsf@altern.org>

Michael Hart  writes:

> I'm thinking of starting a "moderated" list just as a sort
> of experiment to show just how strong silence can be.

I'm not in favor of moderation.

>> The circle is elsewhere: the circle is in the fact that,
>> if someone puts energy on a constructive discussion, there
>> are good chances that BB will put more energy in denying
>> any value in what the OP is proposing.  This is also
>> rationality: induction and preductions.
>
> What you are doing here is trying to put words to bowerbird,
> or actions, that you would LIKE to prove your point.
>
> The only problem is that they are fictional. . . .

I don't think so.  Let me put thoughts in the crowd: I think 
many people here have this feeling that whatever they will try
to propose, there will be someone to discourage them, either by
overcriticism or by (intentionally) miscommunicating.

If I'm not wrong, it means that many individuals have this feeling, 
and this feeling comes from many observations.  

All good for free-BB if he has behaved well recently, but you can't redeem
yourself that easily in people's opinion.  (See?  I like to speak 
for the "people"!)

To be clear: I have nothing against BB, I'm reading many of his 
posts.  I just think his way of replying is too often inappropriate.

-- 
Bastien

From bzg at altern.org  Sun Aug  3 04:54:54 2008
From: bzg at altern.org (Bastien)
Date: Sun, 03 Aug 2008 06:54:54 -0500
Subject: [gutvol-d] bastien
In-Reply-To:  (Michael Hart's
	message of "Sat, 2 Aug 2008 18:24:02 -0700 (PDT)")
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
	<87od4alxiz.fsf@altern.org>
	
Message-ID: <87ej56pam4.fsf@altern.org>

Michael Hart  writes:

>>> No. . .you make no sense at all here, or with your ideal for
>>> rudeness to shut people down. . . .
>>
>> Your tone is offensive.
>>
>> And I've not expressed any "ideal for rudeness to shut people down".
>>
>> Can you please be a bit more considerate when replying?
>
> My apologies. . .did I confuse some quote from another person
> with something I thought you were saying?

I don't know.

Here is what I said: 

  Defending free speech by telling anyone that he's got the right not to
  listen to others' speech looks wrong to me.  It only encourages people
  not to listen, it doesn't help them contribute in a useful way.

And here is how you translated it:

  No. . .you make no sense at all here, or with your ideal for 
  rudeness to shut people down. . . .

So yes, maybe you were replying to someone else.


I repeated many times that I'm all for a non-moderated list, and I am
all for a free BB.  What I tried to say boils down to this: you cannot
request self-moderation from the crowds, you can only request it from
individuals.  You are doing right in requesting it from me (though I'm
trying to discuss the purpose of this list, which might not be 100%
noise), but you would do right in requesting it from BB for the many
times he's not posting very useful posts.  That's it.

What happens is that you address yourself to "your people" (sic) as if
it was an anonymous mob.  But it's not.  So each time you ask the mob
not to add noise by complaining, people complain that they are
individuals, and that not encouraging self-moderation soon enough is not
doing harm to one list (who cares?) but doing harm to that many
individuals.

-- 
Bastien

From Bowerbird at aol.com  Sun Aug  3 09:56:27 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 3 Aug 2008 12:56:27 EDT
Subject: [gutvol-d] overcriticism
Message-ID: 

bastien said:
>    I think many people here have this feeling that 
>   whatever they will try to propose, 
>   there will be someone to discourage them, 
>   either by overcriticism or 
>   by (intentionally) miscommunicating.

by "someone", of course, _you_ mean _me_.

and what an interesting comment this is...

what, for instance, constitutes "overcriticism"?
do you want a trophy because you "proposed"
something or other?   all praise and no criticism?

besides, i give praise as willingly as criticism,
when it is praise that is deserved.   for instance,
i thought highly of jon richfield's recent posts...
but aside from michael, all of you ignored them.

yet you're jumping all over the bowerbird thread,
with you in the lead this time, bastien.

(and there have been plenty of times where _my_
name was the one in the message subject headers.
but it doesn't bother me.   i'm sure you noticed that
i even changed the subject header to my own name.)

you also accuse me of "miscommunicating", and even
go on to say that i am doing it "intentionally".   what,
exactly, does that mean?, bastien, i am very curious...

if you mean i'm misconstruing something, then it will
be very easy for you to point out the error in my logic,
weaken my position, and prevail in the market of truth.

so stop whining, and rise to the occasion!

defeat me with superior argumentation, that's the ticket.

as it is, your argument is quite funny.   because there is
no one on this list who has been the _victim_ of more
"overcriticism" and "intentional miscommunication" than
i have been subjected to.   not that i'm whining about it;
it comes with the territory when you challenge the norm.
but it's the height of irony to hear that _i_ am doing it!

-bowerbird




From bzg at altern.org  Sun Aug  3 10:37:36 2008
From: bzg at altern.org (Bastien)
Date: Sun, 03 Aug 2008 12:37:36 -0500
Subject: [gutvol-d] bastien
In-Reply-To:  (Michael Hart's
	message of "Sat, 2 Aug 2008 17:17:27 -0700 (PDT)")
References: 
	<15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
	<48949095.1040307@xs4all.nl>
	<15cfa2a50808021325x3de25815l96910fd2b43d07ce@mail.gmail.com>
	
Message-ID: <87fxpm9ezj.fsf@altern.org>

Michael Hart  writes:

> So. . .I suggest the noisemakers think a little more out
> of the present and into the future where reputations are
> going to come to the real test.

You mean...  the heaven/hell test?

Well, in the meantime we're living in some kind of inconsistent
purgatory, where heaven and hell are just hypothetical horizons.

If I cared about who is right and who is wrong /when the universe
collapses/, I'd wait for it to collapse before voicing my opinions.

Maybe I'm short-minded, but I'm not trying to reach absolute truth.
I'm just expressing random and fuzzy ideas about what I think of 
this list, in the hope that you and others can think about them.

-- 
Bastien

From bzg at altern.org  Sun Aug  3 10:55:26 2008
From: bzg at altern.org (Bastien)
Date: Sun, 03 Aug 2008 12:55:26 -0500
Subject: [gutvol-d] overcriticism
In-Reply-To:  (Bowerbird@aol.com's message of
	"Sun, 3 Aug 2008 12:56:27 EDT")
References: 
Message-ID: <87bq0a9e5t.fsf@altern.org>

Bowerbird at aol.com writes:

> bastien said:
>>   I think many people here have this feeling that
>>   whatever they will try to propose,
>>   there will be someone to discourage them,
>>   either by overcriticism or
>>   by (intentionally) miscommunicating.
>
> by "someone", of course, _you_ mean _me_.

You now.  Maybe me later, or any other.

> yet you're jumping all over the bowerbird thread,
> with you in the lead this time, bastien.

This is not your thread.

> (and there have been plenty of times where _my_
> name was the one in the message subject headers.
> but it doesn't bother me.  i'm sure you noticed that
> i even changed the subject header to my own name.)

Up to you.

> you also accuse me of "miscommunicating", and even
> go on to say that i am doing it "intentionally".  what,
> exactly, does that mean?, bastien, i am very curious...

Let me give a simple example:

April 2008: someone sends a message about "PG ebooks on OLPC XO"
  http://lists.pglaf.org/private.cgi/gutvol-d/2008-April/008277.html

You reply in a nice and constructive way:
  http://lists.pglaf.org/private.cgi/gutvol-d/2008-April/008389.html

The conversation goes on, still okay:
  http://lists.pglaf.org/private.cgi/gutvol-d/2008-April/008390.html

Then you reply this:
  http://lists.pglaf.org/private.cgi/gutvol-d/2008-April/008391.html

Your tone is okay.  The content is also okay: you're pointing at the
difficulties.  But this email is not very constructive in other aspects:

- a bit too long for being effective;

- your core message is: 
  "Unless you're a genius, you'd better give up."

- your last word is:
  "No, I don't share my software."

You could work on all three.  By trying to be more concise and more to
the point.  By trying to give directions to some solutions instead of
pointing out that the problems will persist forever.  By sharing your
software instead of making it the hidden proof that you'll reach 
heaven with no friend from this list.

I'm taking this example because it wouldn't be helpful to take an
example of pure rant.  The conversation above is quite acceptable.  It
just oversteps the limits of a useful conversation by a thin margin.

-- 
Bastien

From ricardofdiogo at gmail.com  Sun Aug  3 11:16:28 2008
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Sun, 3 Aug 2008 19:16:28 +0100
Subject: [gutvol-d] overcriticism
In-Reply-To: <87bq0a9e5t.fsf@altern.org>
References:  <87bq0a9e5t.fsf@altern.org>
Message-ID: <9c6138c50808031116g34ba8fa8i8cc2de1c96906d3b@mail.gmail.com>

2008/8/3 Bastien :
>
> - a bit too long for being effective;
>
> - your core message is:
>  "Unless you're a genius, you'd better give up."
>
> - your last word is:
>  "No, I don't share my software."
>

Yup that's BB all right. There's nothing he can do about it. Unless he
decides to be someone else, which I think is not fair to ask...

All these threads about Bowerbird are getting quite _boring_ actually.

Ricardo

From hart at pglaf.org  Sun Aug  3 11:29:03 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 3 Aug 2008 11:29:03 -0700 (PDT)
Subject: [gutvol-d] bastien
In-Reply-To: <87ej56pam4.fsf@altern.org>
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
	<87od4alxiz.fsf@altern.org>
	
	<87ej56pam4.fsf@altern.org>
Message-ID: 




On Sun, 3 Aug 2008, Bastien wrote:

> Michael Hart  writes:
>
>>>> No. . .you make no sense at all here, or with your ideal for
>>>> rudeness to shut people down. . . .
>>>
>>> Your tone is offensive.
>>>
>>> And I've not expressed any "ideal for rudeness to shut people down".
>>>
>>> Can you please be a bit more considerate when replying?
>>
>> My apologies. . .did I confuse some quote from another person
>> with something I thought you were saying?
>
> I don't know.
>
> Here is what I said:
>
>  Defending free speech by telling anyone that he's got the right 
> not to listen to others' speech looks wrong to me.

What is your proposed alternative to "the right not to listen"
if not to impede /the right to speak/, presuming the "killfile"
option is still not acceptable to you?

I should perhaps also mention those who complain that even if
they "killfile" someone, they are still annoyed by others,
even those supposedly on their side, because these others quite
literally make up 90% of the resulting "noise."


> It only encourages people not to listen, it doesn't help them 
> contribute in a useful way.

The FIRST thing required to "help them contribute in a useful way"
. . .or to contribute in any manner whatsoever, is their ability
to speak freely.

All else comes SECOND.


mh

From hart at pglaf.org  Sun Aug  3 11:41:50 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 3 Aug 2008 11:41:50 -0700 (PDT)
Subject: [gutvol-d] Please change the subject
In-Reply-To: <20080803070319.GA11777@ark.in-berlin.de>
References: <20080803070319.GA11777@ark.in-berlin.de>
Message-ID: 


On Sun, 3 Aug 2008, Ralf Stephan wrote:

> Please change the subject line when posting Re: bastien
> You wouldn't want YOUR name as out-of-context subject
> dragged all over the list?
>
> Yes, and that's why bowerbird is an aggressive person,
> since he did start this subject. Now, Mr Hart, I expect
> from you a real Iraq on that Saddam, or will you lose out
> like your countrymen, too?
>
>
> ralf


Actually, and as I mentioned to Bastien earlier privately,
even Bastien was using his own name in the subject line so
when anyone, including myself, replied, his name was there
automatically again in the subject line.

We fixed that part, at least. . .at least some of them.

As for changing subject lines, people usually also put the
comments (was "previous subject line") when doing so.

As far as starting the subject, no, I don't think bowerbird
or any of the recent messagers started it; bowerbird did a
new subject line, but not what I thought of as a new subject:

The originator of this thread, for those of us who have an
attention span that lasts more than one week, was comments
I seem to recall were made by Andrew Sly, who has made no more,
not a single other comment, since the original message.

Some of the messages I received declare such silence good,
while others declare it bad, and at least ONE declared the
fact that BOTH good and bad were the case.

The truth is. . .that there is little truth going on here,
in this particular non-conversation, that I am declaring I
will probably NOT be answering all in the near future.

However, I promised someone, just as I was leaving for one
of our additions special "Geek Lunch" meetings, that I was
going to answer in detail later, but I lost the message.

Would that person please let me know who it was so I could
do some searching to find it again???


Thanks!!!


Michael

From hart at pglaf.org  Sun Aug  3 11:53:25 2008
From: hart at pglaf.org (Michael Hart)
Date: Sun, 3 Aug 2008 11:53:25 -0700 (PDT)
Subject: [gutvol-d] overcriticism
In-Reply-To: <9c6138c50808031116g34ba8fa8i8cc2de1c96906d3b@mail.gmail.com>
References:  <87bq0a9e5t.fsf@altern.org>
	<9c6138c50808031116g34ba8fa8i8cc2de1c96906d3b@mail.gmail.com>
Message-ID: 


At first, and perhaps last, I thought the subject line
"overcriticism" applied to the comments ABOUT bowerbird
rather than those indicated with quotation marks below:

Are those really direct quotes from bowerbird???

Or are these false quotation marks, and you are actually
just quoting yourself???

This certainly does not improve the signal/noise ratio here,
and could likely come up again in these conversations,
in reference to your own reputation.

I can hardly believe people aren't just too ashamed to
actually make such statements in public fora.


Michael


On Sun, 3 Aug 2008, Ricardo F Diogo wrote:

> 2008/8/3 Bastien :
>>
>> - a bit too long for being effective;
>>
>> - your core message is:
>>  "Unless you're a genius, you'd better give up."
>>
>> - your last word is:
>>  "No, I don't share my software."
>>
>
> Yup that's BB all right. There's nothing he can do about it. Unless he
> decides to be someone else, which I think is not fair to ask...
>
> All these threads about Bowerbird are getting quite _boring_ actually.
>
> Ricardo
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From bzg at altern.org  Sun Aug  3 14:18:54 2008
From: bzg at altern.org (Bastien)
Date: Sun, 03 Aug 2008 16:18:54 -0500
Subject: [gutvol-d] overcriticism
In-Reply-To:  (Michael Hart's
	message of "Sun, 3 Aug 2008 11:53:25 -0700 (PDT)")
References:  <87bq0a9e5t.fsf@altern.org>
	<9c6138c50808031116g34ba8fa8i8cc2de1c96906d3b@mail.gmail.com>
	
Message-ID: <877iax94qp.fsf@altern.org>

Michael Hart  writes:

> Are those really direct quotes from bowerbird???

No.  These are mine, and this is pretty obvious given the context.
Also, I gave the links to the emails, so you can check by yourself.

The exact quotation for the third point is:

"my source is not available, no.  you can buy it, but the pricetag is 6-figure."

http://lists.pglaf.org/private.cgi/gutvol-d/2008-April/008391.html

> This certainly does not improve the signal/noise ratio here,
> and could likely come up again in these conversations,
> in reference to your own reputation.

My reputation is okay, thanks.

Note that the ones putting pressure on others' reputations are the same
ones who say they don't care about their own, they just care about Truth.

And while advocating free speech you keep threatening people that what
they say might stick to their reputation *forever*. Uh. Scary!

-- 
Bastien

From schultzk at uni-trier.de  Mon Aug  4 00:15:55 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 4 Aug 2008 09:15:55 +0200
Subject: [gutvol-d] some sort of list manual
In-Reply-To: 
References:  
Message-ID: <8ABE630F-91FC-4255-8C00-01235E9ACC9B@uni-trier.de>

Hi Bastien,

	Where are you located ?

	If the internet service is so bad then
	I assume that there are also not many
	users who would access PG.

	Yes, I agree though that it would probably
	benefit the users of that country.

	regards
		Keith.

Am 02.08.2008 um 00:33 schrieb Bastien:

> Bowerbird at aol.com writes:
>
>> (and um, yes, i should have had bastien killfiled already as well.)
>
> Thanks!
>
> I decided to react because last week I moved to a country where all
> the internet connections are nearly broken. It takes at least one minute
> to download emails of this list, and it takes the same to send mine.
>
> I don't have any particular problem with noise.  In countries with good
> connections, we can stand trolls -- some of them are even funny!  :+)
>
> But in poor countries, the ones where PG makes more sense, and the
> ones where it would be great to see people participating in PG's effort,
> the ones which don't have libraries or bookstores, here the noise is not
> bearable anymore.  Here the noise would prevent anyone from joining the
> community.
>
> -- 
> Bastien


From schultzk at uni-trier.de  Mon Aug  4 00:10:27 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 4 Aug 2008 09:10:27 +0200
Subject: [gutvol-d] bastien
In-Reply-To: 
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
Message-ID: 

Hi Bastien,

	I have to disagree.
	Maybe the newcomers want the kind of "information"
	that so-defined trolls give.

	As far as the "killfile" goes, I am sure almost all e-mail
	users know about filters!

	As far as your free speech argument is concerned:
	I do not understand! Not listening to someone
	only means that one does not care to contribute
	to the speaker's view or standpoint.

	I personally do not follow all threads and I also
	stop reading some that, shall I say, "go the wrong way"!
	This process takes only seconds.

	regards
		Keith.

	
Am 02.08.2008 um 00:21 schrieb Bastien:

> Michael Hart  writes:
>
>> Her own answer of the "killfile" solution takes care of it.
>
> The killfile solution won't do for newcomers.
>
> The killfile solution doesn't do for people who don't know how to do it.
>
> The killfile solution doesn't do for people with a grain of humanity,
> those who feel it's rude to shut someone down.
>
> Defending free speech by telling anyone that he's got the right not to
> listen to others' speech looks wrong to me.  It only encourages people
> not to listen, it doesn't help them contribute in a useful way.
>
> -- 
> Bastien


From schultzk at uni-trier.de  Mon Aug  4 00:24:31 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 4 Aug 2008 09:24:31 +0200
Subject: [gutvol-d] bastien
In-Reply-To: <15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
References: 
	<15cfa2a50808020941uc3153a3j8c8f3b5587abada6@mail.gmail.com>
Message-ID: <90446762-4FC0-4353-93C4-7F12294FAE05@uni-trier.de>

Hi Robert,

	I tried to discuss things (here) with DP and
	was blasted for criticism, much as BB has been.

	I could help program, but given the way DP handles
	criticism I will not. NOT because I would not want
	to, but because, unless I am paid for it, I will not
	work with those who are not willing to accept
	criticism if it does not contain praise for them.

	regards
		Keith.

Am 02.08.2008 um 18:41 schrieb Robert Cicconetti:

> On Sat, Aug 2, 2008 at 12:52 AM, Michael Hart  wrote:
>> Also, "who is willing to suppress their voice completely" is one
>> of the other major fallacies, the one called "ad absurdum," but
>> sometimes "ad infinitum" by saying if something is taken to these
>> extremes it doesn't make sense.
>
> No, but it does mean that most of the active discussion is being
> driven to other fora with lower noise floors and more helpful
> attitudes. We (at DP, since that is where most of his sniping is
> directed) have no shortage of people who can point out problem areas..
>  we are short of people who can actually write code to fix them.  We
> have a shortage of trained statisticians who can help determine a
> reliable algorithm for when a page/project is "done". We have a
> shortage of people who are willing to commit to the hours to days of
> polishing works in post processing, or again, of people to write code
> to distribute the tasks that do not need to be done by a single
> person.
>
> Besides, weren't you going to ignore gutvol-d until Greg N pointed out
> some specific interesting discussions?
>
> R C


From schultzk at uni-trier.de  Mon Aug  4 00:31:44 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 4 Aug 2008 09:31:44 +0200
Subject: [gutvol-d] Moderation  (was: Re: bastien)
In-Reply-To: <18580.39131.718459.841886@celery.zuhause.org>
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
	<18580.39131.718459.841886@celery.zuhause.org>
Message-ID: <0DD657F2-5703-42B7-80FE-34EE170846DC@uni-trier.de>

Hi Bruce,

	Nice point. Generally, true.
	Yet, unlike your example this panel
	is open ended. Furthermore, everybody
	gets to speak simultaneously!

	I, too, would prefer a lot fewer attacks,
	but you know the way KIDS are.

	regards
		Keith.

Am 02.08.2008 um 19:26 schrieb Bruce Albrecht:

> Michael Hart writes:
>> The difference is that it is so much MORE RUDE to censor someone
>> from the ENTIRE audience than just yourself.
>
> Just because a mailing list is moderated, does not mean that someone
> is not allowed to have a voice.  A moderator could reject a message
> and request that the poster rewrite it so that it is not attacking a
> specific person or persons.  If there is a point to the message beyond
> the attack, the poster probably will be willing to trim the offensive
> portion of the message, if not, the poster probably didn't care enough
> about the point to make it.
>
> Quite frankly, I find it far more rude, for example, for Marcello
> Parathon to send a message that has no point other than he thinks
> Bowerbird is an idiot, than for a moderator to block that post.
>
> If all of us were in a room together at some sort of panel, and two
> persons started shouting at each other about how stupid the other one
> was, would you let them both go at it until the end of the panel?  No,
> you'd do something to calm things down, like call on other people, or
> ask (make) them leave the room, so others could make their points.
> Why is there something magical about a mailing list that means that if
> people can't self-censor, others shouldn't help (make) them do it?
>
> Quite frankly, I only read this list about once a month, and I don't
> blacklist anyone, but I tend to skip a lot of messages that appear to have
> no other point than X thinks Y is an idiot, sometimes with a lucid
> explanation, but more often not.  I'd be more likely to read and
> participate in discussions here if there were more ideas and fewer
> attacks.

From schultzk at uni-trier.de  Mon Aug  4 00:41:31 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 4 Aug 2008 09:41:31 +0200
Subject: [gutvol-d] Thanks (OT)
In-Reply-To: <87abfulxiz.fsf@altern.org>
References: 
	<87abfulxiz.fsf@altern.org>
Message-ID: <907C4A4B-0C75-4F5D-9FB1-0417744B1853@uni-trier.de>

Hi Bastien,

	I have completely lost you in your last post.
	I do not understand your offense. It is simply
	the subject of the thread and quotes are
	properly indicated in the content of the
	e-mails.

	regards
		Keith.

Am 03.08.2008 um 00:57 schrieb Bastien:

> Can you stop using my name in the subject line?
>
> This is offensive.
>
> -- 
> Bastien


From schultzk at uni-trier.de  Mon Aug  4 00:51:47 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 4 Aug 2008 09:51:47 +0200
Subject: [gutvol-d] overcriticism
In-Reply-To: <9c6138c50808031116g34ba8fa8i8cc2de1c96906d3b@mail.gmail.com>
References:  <87bq0a9e5t.fsf@altern.org>
	<9c6138c50808031116g34ba8fa8i8cc2de1c96906d3b@mail.gmail.com>
Message-ID: <98D597D0-7CF4-49A0-8769-0A2474CB12BB@uni-trier.de>

Hi Ricardo,

	Yes, he does not want to share his software!
	YET, HE HAS SHARED HIS METHODOLOGIES.
	They are not that hard to understand and integrate
	into another system. Which would be easier than
	integrating his software.

	regards
		Keith.

Am 03.08.2008 um 20:16 schrieb Ricardo F Diogo:

> 2008/8/3 Bastien :
>>
>> - a bit too long for being effective;
>>
>> - your core message is:
>>  "Unless you're a genius, you'd better give up."
>>
>> - your last word is:
>>  "No, I don't share my software."
>>
>
> Yup that's BB all right. There's nothing he can do about it. Unless he
> decides to be someone else, which I think is not fair to ask...
>
> All these threads about Bowerbird are getting quite _boring_ actually.
>
> Ricardo


From gbnewby at pglaf.org  Mon Aug  4 01:05:38 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 4 Aug 2008 01:05:38 -0700
Subject: [gutvol-d] Logging hits to PG files
In-Reply-To: <48903A07.3010307@baechler.net>
References: <48903A07.3010307@baechler.net>
Message-ID: <20080804080538.GA5262@mail.pglaf.org>

On Wed, Jul 30, 2008 at 02:53:11AM -0700, Tony Baechler wrote:
> Hello,
> 
> Obviously, PG has no control over what their mirrors do, so this is 
> mostly about gutenberg.org and readingroo.ms.  My question is this: How 
> is PG logging and using information regarding what books are downloaded 
> and by whom?  I know gutenberg.org keeps Apache logs because they've 
> been mentioned here before.  What I'm wondering is what PG uses this 
> information for and how it's used.  I've never seen any mention of 
> readingroo.ms logs even though it is an official PG server and is owned 
> by Greg Newby.

readingroo.ms is one of the many mirrors, and has some other
stuff too.  It's one of four mirrors that I have some level of
direct control over [actually, three mirrors, and a non-public
copy]. 

Marcello's description of ibiblio, which hosts gutenberg.org,
is accurate as far as I know.

For readingroo.ms, it's located physically in the Seattle
Collocation server cage in Seattle.  

Logs for Apache & vsftpd are kept indefinitely.  I don't do
anything with them [no analysis or statistics, except ad hoc or
to diagnose troubles].  

Interestingly, I was thinking of setting up a Tor server on
readingroo.ms, but unfortunately I'm already guilty of driving
the Colo bandwidth costs higher than they would otherwise be,
so have instead tried to keep a low profile.  The good news
is that for most people, readingroo.ms is much faster than
ibiblio -- they're on comparable fat network pipes, but readingroo.ms
is far less busy.

You are right that we *should* have a privacy policy for our
various sites & mirrors.  I'm not sure how to craft one globally,
though...maybe each site/mirror should have its own.

In the case of readingroo.ms, items are logged automatically,
but aren't shared, automatically or otherwise.  Some sort of
court order or somesuch could result in sharing, as could legal
or illegal wiretapping.

We have never used information from people who download [or donate,
or contribute eBooks, or whatever!] for marketing, third parties,
etc.  Heck, we don't even add people to our email list when they
make a donation.  So I think you have nothing to worry about in
that regard.  That doesn't mean nasty people aren't out to get
you, of course.

Other than some sort of law enforcement scenario, I can't imagine
turning over server logs to any stranger.  I can easily imagine
trying to use the logs for our own sites' improvements, though, as
Marcello described.  But the folks who get their hands on such
data are trusted & few.

BTW, I have very deep trust in the ibiblio folks, through over
10 years of personal contact.  Because they're based in librarianship
& journalism departments, they have some pretty in-depth notions
of proper professional, legal & ethical behavior.

As for worldebookfair, gutenberg.cc and others in that group:
you can email to ask.  
  John Guagliardo 

Ditto for all the other sites like PGofAU, PG-Canada,
Librivox, etc...
  -- Greg

> What prompted this question, besides the usual concerns about privacy 
> and security, is that I've recently been setting up and using TOR.  
> While I like the general concept and I very much like anonymous 
> browsing, it is very, very slow and is not good for file downloads.  I'm 
> not too worried on one hand whether PG knows what I download or not, but 
> on the other hand, an official published statement from PG would be 
> nice.  I'm also thinking of people outside of the US who either may not 
> legally use this material (but do anyway, obviously) or who may not read 
> it because of the restrictions placed on them by their governments.  In 
> my case, PG wouldn't get much out of my downloads anyway because I 
> download everything in English with a plain text edition, but I would 
> still be happier knowing that PG isn't going to use, sell, track, or 
> otherwise make use of information like my IP address, browser, etc.  I 
> do trust PG to a point, but the philosophy of TOR is to trust no one and 
> I'm starting to see more and more how easy it is to track someone's 
> browsing habits.  The only reason why I don't switch to TOR for almost 
> everything is that it is very slow and is very short on relays.
> 
> [TOR https://www.torproject.org/] is the link for more information on 
> TOR.  There are versions for Windows, Mac, Linux, etc.
> 
> Thanks very much for providing clarification about this.  If this is in 
> the FAQ somewhere, sorry for not finding it, but other than on this list 
> and Pre-prints, I've not seen mention of readingroo.ms before.  While 
> I'm here, a similar statement about worldebookfair.com would be 
> helpful.  I don't trust worldebookfair.com or PGCC because they aren't 
> directly under the control of PG, Newby or Hart.

From bzg at altern.org  Mon Aug  4 04:06:08 2008
From: bzg at altern.org (Bastien)
Date: Mon, 04 Aug 2008 06:06:08 -0500
Subject: [gutvol-d] overcriticism
In-Reply-To: <98D597D0-7CF4-49A0-8769-0A2474CB12BB@uni-trier.de> (Schultz
	Keith J.'s message of "Mon, 4 Aug 2008 09:51:47 +0200")
References:  <87bq0a9e5t.fsf@altern.org>
	<9c6138c50808031116g34ba8fa8i8cc2de1c96906d3b@mail.gmail.com>
	<98D597D0-7CF4-49A0-8769-0A2474CB12BB@uni-trier.de>
Message-ID: <87y73d6nvj.fsf@altern.org>

"Schultz Keith J."  writes:

> 	They are not that hard to understand and integrate
> 	into another system. Which would be easier than
> 	integrating his software.

Why would it be hard to integrate his software?  

I don't understand.

-- 
Bastien

From bzg at altern.org  Mon Aug  4 05:04:40 2008
From: bzg at altern.org (Bastien)
Date: Mon, 04 Aug 2008 07:04:40 -0500
Subject: [gutvol-d] bastien
In-Reply-To:  (Schultz
	Keith J.'s message of "Mon, 4 Aug 2008 09:10:27 +0200")
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
Message-ID: <87ljzd6l5z.fsf@altern.org>

"Schultz Keith J."  writes:

> 	As far as your free speech argument is concerned:
> 	I do not understand! Not listening to someone
> 	only means that one does not care to contribute
> 	to the speaker's view or standpoint.

Let's say there are these two basic rights:

R1: the right to voice one's opinions.
R2: the right not to pay attention to others' opinions.

Complementary to these rights, let's say that there are these abilities:

A1: the ability to express oneself
A2: the ability not to pay attention to others' opinions
A3: the ability to moderate oneself
A4: the ability to try to understand others' opinions

You can defend (as I do) R1 as an ethical absolute.

Usually, when people complain about the noise in a mailing list, they
are not discussing R1 from a theoretical perspective, they are moving to
pragmatic considerations and say: defending R1 is not enough, you also
have to promote A3 and A4 somehow.

You can say "Okay, let's try." and try to wonder what are the practical
things you can _do_ to encourage A[1-4].

You can also dismiss this pragmatic request and say: "No, there is no
need to promote A3 and A4 because you have R2, and this should be enough
to make the noise bearable."

The problem with this answer is that it places itself on theoretical
ground to address a pragmatic problem.  The complainers can go on and
say: "R2 is okay but it is useless until people have A2."

Etc, etc.   

There are two problems here: the first one is to always dismiss
pragmatical requests by going back to principles.  

The other problem is that, in my opinion, there is no such thing as R2.
You could say: "There is a right of not getting wet when the rain falls"
but then you would misuse the word "right".

Promoting R2 as an ethically grounded so-called "right" is not only
misleading, it's also discouraging people from getting A4 (first), then
A3, then (on a forum) A2 and finally A1.

So while I agree R1 is an absolute pre-condition, I think A1 is the real
thing we shall try to achieve.

-- 
Bastien

From Bowerbird at aol.com  Mon Aug  4 11:20:30 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 4 Aug 2008 14:20:30 EDT
Subject: [gutvol-d] overcriticism
Message-ID: 

bastien said:
>   You could work on all three.
>   By trying to be more concise and more to the point.

you're telling me!  heck, you just have to _read_ my
long messages, but i have to _type_ the darn things!

-bowerbird




From bzg at altern.org  Mon Aug  4 11:22:55 2008
From: bzg at altern.org (Bastien)
Date: Mon, 04 Aug 2008 13:22:55 -0500
Subject: [gutvol-d] overcriticism
In-Reply-To:  (Bowerbird@aol.com's message of
	"Mon, 4 Aug 2008 14:20:30 EDT")
References: 
Message-ID: <87tze04p34.fsf@altern.org>

Bowerbird at aol.com writes:

> bastien said:
>>   You could work on all three.
>>   By trying to be more concise and more to the point.
>
> you're telling me!  heck, you just have to _read_ my
> long  messages, but i have to _type_ the darn things!

Then do yourself a favor :)

-- 
Bastien

From hart at pglaf.org  Mon Aug  4 11:23:23 2008
From: hart at pglaf.org (Michael Hart)
Date: Mon, 4 Aug 2008 11:23:23 -0700 (PDT)
Subject: [gutvol-d] Taking Leave Once Again
Message-ID: 


I'll be watching but not commenting,
as much as possible for August.

Have fun flaming yourselves out.


Micahel S. Hart
Founder
Project Gutenberg


From Bowerbird at aol.com  Mon Aug  4 11:52:24 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 4 Aug 2008 14:52:24 EDT
Subject: [gutvol-d] boring yet amusing
Message-ID: 

ricardo said:
>    All these threads about Bowerbird 
>    are getting quite _boring_ actually.

gosh, isn't _that_ the truth!

they were getting quite boring years ago.

***

keith said:
>    Yes, he does not want to share his software!
>    YET, HE HAS SHARED HIS METHODOLOGIES.

yes, it's kind of amusing, isn't it?

if i _did_ hand them my code, they'd be asking me
for pseudo-code instead, which is what i've been
giving them all along.   like i said, it's very amusing.

of course, it's also amusing how they want to insist
on open source, but then confess that they are short
on programmers who could actually re-work the code.
open-source advocates take programmers for granted.

what it boils down to is they don't know what they want,
they don't know what they need, they don't know what
their problems are, they don't know how to solve them,
they don't know how to do research on their processes,
they don't know how to analyze data from their research,
they don't know how to translate their research results to
a plan, and they don't know how to activate an action plan.

yet they could _still_ be making progress on improvement,
if only they could accept constructive criticism, but they can't
do that either...   instead they attack me and label me a "troll".
ok, be that way...          :+)

-bowerbird




From Bowerbird at aol.com  Mon Aug  4 11:53:40 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 4 Aug 2008 14:53:40 EDT
Subject: [gutvol-d] Taking Leave Once Again
Message-ID: 

michael-

>    Micahel S. Hart

looks like you could use a break...         ;+)

-bowerbird




From dakretz at gmail.com  Mon Aug  4 16:10:36 2008
From: dakretz at gmail.com (don kretz)
Date: Mon, 4 Aug 2008 16:10:36 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 12
In-Reply-To: 
References: 
Message-ID: <627d59b80808041610h461a9c80h68778334eb5eb58a@mail.gmail.com>

The state of Distributed Proofreaders is a conundrum to me.

The assertion is made (and reinforced in another post) that DP is aware of
much wondrous improvement that could (and should) be wrought on the site
software. But all in vain, for there are no (0, none, not any) developers
with the skill and desire to offer assistance.

Yet, in this very thread, on another listserv far away (yet in the same
universe, I suppose), we have very recently seen four or five postings from
specific skilled and willing individuals who could demonstrate (and in some
cases already have demonstrated) precisely those attributes, the lack of
which has been so lamented.

Michael's term, "ossification", is one that I myself have used in the past.
I myself would love to see proposals on how to break through this wearisome
impasse. I can't think of any. The only approach that seems to gain any
traction at all, much to everyone's chagrin and regret, appears to have been
that of Mr. Bird.


From grythumn at gmail.com  Mon Aug  4 16:41:17 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Mon, 4 Aug 2008 19:41:17 -0400
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 12
In-Reply-To: <627d59b80808041610h461a9c80h68778334eb5eb58a@mail.gmail.com>
References: 
	<627d59b80808041610h461a9c80h68778334eb5eb58a@mail.gmail.com>
Message-ID: <15cfa2a50808041641w75e1ae3cu4f5b5fe5fa1f93dd@mail.gmail.com>

On Mon, Aug 4, 2008 at 7:10 PM, don kretz  wrote:
> The assertion is made (and reinforced in another post) that DP is aware of
> much wondrous improvement that could (and should) be wrought on the site
> software. But all in vain, for there are no (0, none, not any) developers
> with the skill and desire to offer assistance.

How did you come to this conclusion? There is a shortage of
development time and talent, not a lack of it.

> Yet, in this very thread, on another listserv far away (yet in the same
> universe, I suppose), we have very recently seen four or five postings from
> specific skilled and willing individuals who could demonstrate (and in some
> cases already have demonstrated) precisely those attributes, the lack of
> which has been so lamented.

Whom are you referring to? BB? I believe he mostly programs in Visual
Basic, and is unwilling to part with his code without a very large sum
of money. As DP is GPL'd and written in PHP, there are problems even before
you consider the personality incompatibilities involved.

If someone _is_ interested in working on this, the DP code is
available at SourceForge, and sandboxes are available on the test
server for developers. There is a task list at DP if anyone wants some
smaller projects before tackling something as ingrained in the code
base as the round system.

R C

From dakretz at gmail.com  Mon Aug  4 17:48:31 2008
From: dakretz at gmail.com (don kretz)
Date: Mon, 4 Aug 2008 17:48:31 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 12
In-Reply-To: <627d59b80808041610h461a9c80h68778334eb5eb58a@mail.gmail.com>
References: 
	<627d59b80808041610h461a9c80h68778334eb5eb58a@mail.gmail.com>
Message-ID: <627d59b80808041748j3028e2c9w6c7bc7777e23bf28@mail.gmail.com>

I found it especially interesting to note the response of Mr. Vandenberg in
representing Wikisource. Somehow, amazingly, he was able not only to beguile
The Bird into a dialog with genuine content, but also to entice him to
consider contributing cooperatively.

That was especially intriguing because it occurred in close proximity to a
series of posts by rfrank (one of my mentionees), describing his process of
trying to set up special procedures for collecting statistics wrt his own
projects, and knowing how unable he has been to get access to DP site data
by other means. He's been so unfailingly nice, polite, and obliging in his
approach; but there's no apparent vandenberg-equivalent that I've seen
helping him out.

From lee at novomail.net  Mon Aug  4 17:52:35 2008
From: lee at novomail.net (Lee Passey)
Date: Mon, 04 Aug 2008 18:52:35 -0600
Subject: [gutvol-d] Please change the subject
In-Reply-To: <20080803070319.GA11777@ark.in-berlin.de>
References: <20080803070319.GA11777@ark.in-berlin.de>
Message-ID: <4897A453.4090900@novomail.net>

Over the course of the past several weeks, BowerBird (if that is, in
fact, his real name :-)) has regaled us with a number of regex-type
transformations which he believes will improve the efficiency of
Distributed Proofreaders when performed on a new text before that text
is exposed to the text proofreaders. (Why he would choose this forum to
make recommendations to DP is curious; after all, there is no real
connection between DP and Project Gutenberg other than the fact that PG
is, so far, the sole recipient of DP's largess.)

Presumably, all of the transformations that BowerBird has suggested can
easily be built into some sort of script such as sed, perl, php, even
bash. The problem, of course, is that many of these transformations are
only heuristics, and will frequently lead to the wrong result; but
probably not more often than they will lead to the correct result.

This uncertainty factor raises what I believe is an interesting
question: if these heuristics are applied to a text /before/ it is
exposed to human proofreaders, is there any reason /not/ to automate the
application of these heuristics, applying them to all texts before they
are fed into the DP mill, relying on subsequent human intervention to
find the newly introduced errors?

It seems to me that the /source/ of the errors is unimportant as
compared to the /quantity/ of errors. Of course, even more important is
the /quality/ of the errors; if the original errors are blatant, but the
introduced errors are subtle, it may make more sense to simply let the
humans deal with the errors as they are more likely to catch the
blatant errors than the subtle ones.

So what do others think about this? (How's that for changing the subject?)
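
A minimal sketch of the unattended approach asked about above, assuming
Python and three made-up heuristics. The patterns, their names, and the
preprocess() function are hypothetical illustrations, not rules taken from
any post on this list. Each heuristic is applied to the whole text with no
human in the loop, and a per-rule count is kept so the number of changes can
at least be inspected afterwards.

```python
import re

# Hypothetical heuristics for illustration only; none comes from the list posts.
HEURISTICS = [
    ("rejoin end-of-line hyphenation", re.compile(r"(\w+)-\n(\w+)"), r"\1\2"),
    ("drop space before punctuation", re.compile(r" +([,;:!?])"), r"\1"),
    ("digit 1 misread for I", re.compile(r"\b1\b(?= (?:am|was|have)\b)"), "I"),
]


def preprocess(text):
    """Apply every heuristic unattended; return the new text and a change log."""
    log = []
    for name, pattern, replacement in HEURISTICS:
        # subn returns the rewritten text and how many substitutions were made.
        text, count = pattern.subn(replacement, text)
        log.append((name, count))
    return text, log
```

Note that the first rule also turns a genuinely hyphenated "well-" / "known"
split into "wellknown", which is the kind of newly introduced, subtle error
the question is about.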



From Bowerbird at aol.com  Mon Aug  4 17:55:27 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 4 Aug 2008 20:55:27 EDT
Subject: [gutvol-d] to do what must be done
Message-ID: 

i said:
>    what it boils down to is they don't know what they want,
>   they don't know what they need, they don't know what
>   their problems are, they don't know how to solve them,
>   they don't know how to do research on their processes,
>   they don't know how to analyze data from their research,
>   they don't know how to translate their research results to
>   a plan, and they don't know how to activate an action plan.

dkretz is the exception that proves the rule.

roger frank shows a lot of potential, but...

and a few others get lucky once in a while...

but until the d.p. codebase contains a standalone
tool that any naive user can easily install and use,
a tool which does book-wide editing and viewing
with the capacity to bring focus to a specific point,
they ain't gonna be able to do what must be done.

of course, if d.p. would have had that kind of tool
at their disposal, as a cost-free app, a reality that
i would have brought about if not for buttheads,
then d.p. wouldn't have needed to have the code.

-bowerbird




From grythumn at gmail.com  Mon Aug  4 18:44:22 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Mon, 4 Aug 2008 21:44:22 -0400
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 12
In-Reply-To: <627d59b80808041748j3028e2c9w6c7bc7777e23bf28@mail.gmail.com>
References: 
	<627d59b80808041610h461a9c80h68778334eb5eb58a@mail.gmail.com>
	<627d59b80808041748j3028e2c9w6c7bc7777e23bf28@mail.gmail.com>
Message-ID: <15cfa2a50808041844g5cd98c89u91be8ca892559dd@mail.gmail.com>

On Mon, Aug 4, 2008 at 8:48 PM, don kretz  wrote:
> That was especially intriguing because it occurred in close proximity to a
> series of posts by rfrank (one of my mentionees), describing his process
> trying to set up special procedures for collecting statistics wrt his own
> projects, and knowing how unable he has been to get access to DP site data
> by other means. He's been so unfailingly nice, polite, and obliging in his
> approach; but there's no apparent vandenberg-equivalent that I've seen
> helping him out.

rfrank IS one of the DP developers (at least on a number of pre- and
post-processing tools; I'm not sure if he is working on the site code),
as well as being a PM, a PP, and possibly wearing several other hats I
don't recall offhand. He's quite capable of helping himself out, or asking
db-req for something if it is not available via the current
interfaces.

R C

From dakretz at gmail.com  Mon Aug  4 21:11:53 2008
From: dakretz at gmail.com (don kretz)
Date: Mon, 4 Aug 2008 21:11:53 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
In-Reply-To: 
References: 
Message-ID: <627d59b80808042111h29eed1c5v96a147a27d426107@mail.gmail.com>

Lee at Novomail.net posted:

> It seems to me that the /source/ of the errors is unimportant as
> compared to the /quantity/ of errors. Of course, even more important is
> the /quality/ of the errors; if the original errors are blatant, but the
> introduced errors are subtle, it may make more sense to simply let the
> humans deal with the errors as they are more likely to catch the
> blatant errors than the subtle ones.
>
> So what do others think about this? (How's that for changing the subject?)

I hate to be the bearer of bad news, but, distressingly, you are 100% on topic.

That's exactly the proposal The Bird has been making. It's the self-evident
means to make practical use of the tools rfrank has been working on. And also
the regexes that I've made available. It's been the topic of many threads
among the DP fora. There is, I think, no opposition (at least active
opposition) from anyone at Distributed Proofreaders.

But there's also no one who has both authority and willingness to provide
assistance and direction in incorporating it into the site codebase. At least
so far. I don't *think* it's because it hasn't occurred to the right people
that the need exists, and the resources are just waiting for someone like,
say, John Vandenberg.

From schultzk at uni-trier.de  Mon Aug  4 23:52:27 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Tue, 5 Aug 2008 08:52:27 +0200
Subject: [gutvol-d] overcriticism
In-Reply-To: <87y73d6nvj.fsf@altern.org>
References:  <87bq0a9e5t.fsf@altern.org>
	<9c6138c50808031116g34ba8fa8i8cc2de1c96906d3b@mail.gmail.com>
	<98D597D0-7CF4-49A0-8769-0A2474CB12BB@uni-trier.de>
	<87y73d6nvj.fsf@altern.org>
Message-ID: 

Hi Bastien,

	First I would like to say I cannot be sure.
	It depends on how it was written, and I do not
	have the code.

	As far as your question is concerned, his routines
	are bound into a program with a particular input
	and output format. If these are simple text files
	then it should be easy enough. Now if he uses a
	GUI to interact with the user, then there is definitely
	a problem, as what he shows on the screen and where he gets
	it from are most likely completely different
	from what the other system is using.

	The first thing I do when taking a program (software package)
	and integrating it into another system is throw out the
	user interface. Then I analyze the routines of interest and recreate
	the API for these routines to match my program. Finally, I
	adjust the routines to the new API. More often than not it is
	quicker to just take the method and write my own code.

	Hope this helps

	regards
		Keith.
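
The integration steps described above can be sketched roughly as follows, in
Python. Everything here is assumed for illustration: legacy_fix_line() stands
in for a routine lifted out of the original program once its user interface
has been thrown away, and PageStore is an invented stand-in for whatever
interface the receiving system really has. Neither reflects actual code from
DP or from anyone on this list.

```python
def legacy_fix_line(line):
    """Stand-in for a routine of interest, separated from its original UI."""
    return line.replace(" ,", ",")


class PageStore:
    """Hypothetical host-system interface: page text keyed by page name."""

    def __init__(self, pages):
        self.pages = dict(pages)


def fix_pages(store, routine=legacy_fix_line):
    """Adapter: feed the host system's pages through the routine, line by line."""
    # Iterate over a copy of the items so the values can be reassigned safely.
    for name, text in list(store.pages.items()):
        fixed_lines = [routine(line) for line in text.split("\n")]
        store.pages[name] = "\n".join(fixed_lines)
    return store
```

The adapter is the only part that knows about both sides, so the routine can
be swapped for a rewritten one without touching the host system.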

On 04.08.2008 at 13:06, Bastien wrote:

> "Schultz Keith J."  writes:
>
>> 	They are not that hard to understand and integrate
>> 	into another system. Which would be easier than
>> 	integrating his software.
>
> Why would it be hard to integrate his software?
>
> I don't understand.
>
> -- 
> Bastien
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d


From Bowerbird at aol.com  Tue Aug  5 00:31:52 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 5 Aug 2008 03:31:52 EDT
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
Message-ID: 



   That's exactly the proposal The Bird has been making.





From Bowerbird at aol.com  Tue Aug  5 00:54:07 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 5 Aug 2008 03:54:07 EDT
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
Message-ID: 

dakretz said:
>    That's exactly the proposal The Bird has been making.

um, no, sorry, 180 degrees not.

lee proposed a human-less change process.

that is exactly _not_ what i am proposing.

in the specific type of preprocessing that _i_ am proposing,
a human being making decisions is the _essential_element_
of the methodology.   it makes little sense to do it otherwise.

how many (hundreds of) posts do i have to make
before you get the basic essentials correct?   really.

again, as i told you before, as i have told you countless times,
you need an offline tool that can give you book-wide editing
combined with a scan-viewing capability as well as the function
of bringing specific problem-areas to the attention of the human.
who makes the decision.   to edit or not to edit, or which type of edit.

once you build that tool, you'll understand - deeply and intimately --
that you no longer want to do any script-based unreviewed changes
_ever_again_, and it will jack up your efficiency to the necessary level.

until then, you're doomed to live in the dumb land of misunderstanding.

-bowerbird
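
A minimal sketch of the reviewed-change idea described above, assuming Python.
This is not bowerbird's tool, whose code has not been shared; the names
Candidate, find_candidates, and apply_reviewed are invented for illustration,
and scan viewing is left out entirely. The point shown is only the split
between finding possible edits (with context for the human) and applying the
ones a person has accepted.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int      # offset where the match begins
    end: int        # offset just past the match
    original: str   # text as it stands now
    proposed: str   # text the heuristic would substitute
    context: str    # surrounding text to show the reviewer


def find_candidates(text, pattern, replacement, width=20):
    """Locate every spot a heuristic would change, without changing anything."""
    found = []
    for m in re.finditer(pattern, text):
        context = text[max(0, m.start() - width):m.end() + width]
        found.append(
            Candidate(m.start(), m.end(), m.group(0), m.expand(replacement), context)
        )
    return found


def apply_reviewed(text, candidates, decide):
    """Apply only candidates that decide() accepts, right to left so offsets hold."""
    for c in sorted(candidates, key=lambda c: c.start, reverse=True):
        if decide(c):
            text = text[:c.start] + c.proposed + text[c.end:]
    return text
```

In an interactive tool, decide() would display the context (and, ideally, the
page scan) and wait for the human's choice; here it is just a callable so the
flow can be tried from a script.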




From schultzk at uni-trier.de  Tue Aug  5 01:36:14 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Tue, 5 Aug 2008 10:36:14 +0200
Subject: [gutvol-d] ethics of noise (was Bastien)
In-Reply-To: <87ljzd6l5z.fsf@altern.org>
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
	<87ljzd6l5z.fsf@altern.org>
Message-ID: <8C245BA3-8AAA-4FC4-9186-56322A2ADD34@uni-trier.de>

Hi Bastien,

	Very nicely said, yet the argumentation is slightly
	flawed. This discussion is quite OT, but as long as
	it is allowed I will go along. It should be helpful.
	If necessary we can go offlist.

On 04.08.2008 at 14:04, Bastien wrote:

> "Schultz Keith J."  writes:
>
>> 	Far as your free speech argument is concerned:
>> 	I do not understand ! Not listening to someone
>> 	only means that one does not care to contribute
>> 	to the speakers view or stand point.
>
> Let's say there are these two basic rights:
>
> R1: the right to voice one's opinions.
> R2: the right not to pay attention to others' opinions.
>
> Complementary to these rights, let's say that there are these abilities:
	A right to something encompasses the ability to do it.
>
> A1: the ability to express oneself
	A1 is the prerequisite to R1. This is universal. How
	good one is at it is another matter, and a skill.

> A2: the ability not to pay attention to others' opinions
	A2 is the prerequisite to R2! But I consider this ability
	to be universal. There are very few situations that are legal
	(e.g. a court of law) where one is unable not to pay attention
	to the statements (opinions) of others.

> A3: the ability to moderate oneself
	A3 belongs to A1 insofar as command of this skill
	aids the success of A1. Furthermore, it aids others
	in gaining an understanding of an argument.

> A4: the ability to try to understand others' opinions
	A4 is, as you have defined it, universal. One has the ability to
	TRY to understand. Mastering the skill of understanding, or the
	ability to understand, is another matter. A4, as formulated, is the
	result of NOT choosing to exercise R2!
>
> You can defend (as I do) R1 as an ethical absolute.
	As stated above A1 is universal and I consider R1 as
	universal, too. Though there are societies, in which
	exercising R1 would be dangerous.
>
> Usually, when people complain about the noise in a mailing list, they
> are not discussing R1 from a theoretical perspective; they are moving to
> pragmatical considerations and say: defending R1 is not enough, you also
> have to promote A3 and A4 somehow.
	Here I definitely disagree! R1 is a prerequisite of a mailing
	list per se. A3 is, as I have mentioned above, helpful and
	should be encouraged. A4, as you have formulated it, can be neither
	promoted nor facilitated, as one has to decide to use it. (the
	ability to TRY...)

>
> You can say "Okay, let's try." and try to wonder what are the practical
> things you can _do_ to encourage A[1-4].
	One cannot encourage an ability. Either one has it or not. The
	ability can be refined, but that is not what we are discussing.
	The question should be how we encourage the use of R1 and NOT
	R2. A3 is very helpful in this case.



> You can also dismiss this pragmatical request and say: "No, there is no
> need to promote A3 and A4 because you have R2, and this should be enough
> to make the noise bearable."
	The use of A4 contradicts R2, as mentioned above.
	Agreed, promoting the use of A3 is desirable. Whether we can help
	others to use it and refine it is another matter.
>
> The problem with this answer is that it places itself on a theoretical
> ground to address a pragmatical problem.  The complainers can go on and
> say: "R2 is okay but it is useless until people have A2."
	As I have tried to show, A2 is universal and the prerequisite to R2.
	It is up to every list user to exercise R2 by using A2 (simply
	not reading the post, or ignoring it).

>
> Etc, etc.
>
> There are two problems here: the first one is to always dismiss
> pragmatical requests by going back to principles.
>
> The other problem is that, in my opinion, there is no such thing as R2.
> You could say: "There is a right of not getting wet when the rain falls"
> but then you would misuse the word "right".
	The question is NOT the RIGHT of not getting wet, but whether I am
	able not to get wet when it rains. I can choose not to get wet if I
	want to. I certainly have the ability to do it!

>
> Promoting R2 as an ethically grounded so-called "right" is not only
> misleading, it's also discouraging people from getting A4 (first), then
> A3, then (on a forum) A2 and finally A1.
	Here you lost me. Yet the fact is that people have A2 (see above).
	Furthermore, if one chooses A4, one will not need A2 and R2! If one
	chooses NOT to use A4, then A2 and R2 are needed!
>
> So while I agree R1 is an absolute pre-condition, I think A1 is the  
> real
> thing we shall try to achieve.
	As mentioned above A1 and R1 is the reason for a mailing list in the
	first place.

	What is needed is:
		A5 the ability to understand! Which is the direct result from
	           of using A4 and not using A2!

		A6 the ability to respect others opinions
		
		A7 the ability not to get personal

		A8 the ability to want others to understand

	We ALL have these abilities. Though I do admit that human nature and
	egos do get in the way of use of the aforementioned abilities A5-8!

	The noise that is created here is a direct result of not using A6
	and A7, and of not being willing to accept that an attack on one's
	arguments is not personal.

	The fact is that negative criticism can be very helpful. Understanding
	this fact is very important. Positive criticism is not always possible.

	regards
		Keith.

	

From schultzk at uni-trier.de  Tue Aug  5 01:43:19 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Tue, 5 Aug 2008 10:43:19 +0200
Subject: [gutvol-d] boring yet amusing
In-Reply-To: 
References: 
Message-ID: 


Am 04.08.2008 um 20:52 schrieb Bowerbird at aol.com:

> ricardo said:
> >   All these threads about Bowerbird
> >   are getting quite _boring_ actually.
>
> gosh, isn't _that_ the truth!
>
> they were getting quite boring years ago.
>
> ***
>
> keith said:
> >   Yes, he does not want to share his software!
> >   YET, HIS HAS SHARED HIS METHODOLOGIES.
>
> yes, it's kind of amusing, isn't it?
>
> if i _did_ hand them my code, they'd be asking me
> for pseudo-code instead, which is what i've been
> giving them all along.  like i said, it's very amusing.
>
> of course, it's also amusing how they want to insist
> on open source, but then confess that they are short
> on programmers who could actually re-work the code.
> open-source advocates take programmers for granted.
	Open-source does not need to be that way. You can still
	keep the copyright in the form of a notice that the source is the
	copyrighted property of such and such, that it must retain a statement
	saying so, that changes must be labelled appropriately, and that it
	may otherwise be used as one sees fit.

	regards
		Keith.


From schultzk at uni-trier.de  Tue Aug  5 02:05:20 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Tue, 5 Aug 2008 11:05:20 +0200
Subject: [gutvol-d] Please change the subject
In-Reply-To: <4897A453.4090900@novomail.net>
References: <20080803070319.GA11777@ark.in-berlin.de>
	<4897A453.4090900@novomail.net>
Message-ID: 

Hi Lee,

Am 05.08.2008 um 02:52 schrieb Lee Passey:

[snip, snip]

> Presumably, all of the transformations that BowerBird has suggested can
> easily be built into some sort of script such as sed, perl, php, even
> bash. The problem, of course, is that many of these transformations are
> only heuristics, and will frequently lead to the wrong result; but
> probably not more often than they will lead to the correct result.
	This depends. Computational linguistics deals with
	these heuristics and has shown that they can be quite
	good. It is a matter of careful programming or setting up
	the proper rules.

>
> This uncertainty factor raises what I believe is an interesting
> question: if these heuristics are applied to a text /before/ it is
> exposed to human proofreaders, is there any reason /not/ to automate the
> application of these heuristics, applying them to all texts before they
> are fed into the DP mill, relying on subsequent human intervention to
> find the newly introduced errors?
	There is no reason not to. That is why I suggested using a parser.
	It can be far more intelligent. It can even identify situations
	in which it is unsure and flag those cases as appropriate.
	As a matter of fact, one could flag all possible errors, even the
	ones the parser is sure of, and have a human accept them or not,
	much in the way a spell checker works.
	This would require retooling the interface and process somewhat,
	which is why I assume that DP considers it unfeasible.
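
The flag-and-accept idea above can be sketched roughly as follows. This is only a minimal illustration, not anyone's actual tool: the two rules ("tbe" and "arid", both common OCR confusions) and the names Suggestion, suggest and review are assumptions made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Suggestion:
    start: int
    end: int
    replacement: str
    sure: bool  # True when the rule considers the fix safe


# Hypothetical rules, for illustration only: (pattern, replacement, sure).
RULES = [
    (re.compile(r"\btbe\b"), "the", True),    # "tbe" is not a word
    (re.compile(r"\barid\b"), "and", False),  # "arid" is a real word
]


def suggest(text):
    """Collect every suggestion from every rule, in text order."""
    found = []
    for pattern, replacement, sure in RULES:
        for m in pattern.finditer(text):
            found.append(Suggestion(m.start(), m.end(), replacement, sure))
    return sorted(found, key=lambda s: s.start)


def review(text, accept):
    """Apply a suggestion only if accept(suggestion) returns True."""
    out, pos = [], 0
    for s in suggest(text):
        out.append(text[pos:s.start])
        out.append(s.replacement if accept(s) else text[s.start:s.end])
        pos = s.end
    out.append(text[pos:])
    return "".join(out)


page = "In tbe arid hills arid valleys"
print(review(page, lambda s: s.sure))
```

Here the accept callback stands in for the human: passing `lambda s: s.sure` mimics an unattended run, while an interactive front end would ask the proofer about each suggestion in turn.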
	
>
> It seems to me that the /source/ of the errors is unimportant as
> compared to the /quantity/ of errors. Of course, even more important is
> the /quality/ of the errors; if the original errors are blatant, but the
> introduced errors are subtle, it may make more sense to simply let the
> humans deal with the errors as they are more likely to catch the
> blatant errors than the subtle ones.
	I believe that is the argument. Furthermore, it is the reason
	for the suggestion of inefficiency.

	As it stands now, there seems to be no one willing to do the work,
	for various reasons.

>
> So what do others think about this? (How's that for changing the subject?)
>
	We have been there and back again. Yet, we may still see progress.

	regards
		Keith.


From schultzk at uni-trier.de  Tue Aug  5 02:21:58 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Tue, 5 Aug 2008 11:21:58 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
In-Reply-To: 
References: 
Message-ID: 


Am 05.08.2008 um 09:54 schrieb Bowerbird at aol.com:

> dakretz said:
> >   That's exactly the proposal The Bird has been making.
>
> um, no, sorry, 180 degrees not.
>
> lee proposed a human-less change process.
>
> that is exactly _not_ what i am proposing.
>
> in the specific type of preprocessing that _i_ am proposing,
> a human being making decisions is the _essential_element_
> of the methodology.  it makes little sense to do it otherwise.
	At some point human intervention is definitely needed; the
	question is when and how to do it.

>
>
> how many (hundreds of) posts do i have to make
> before you get the basic essentials correct?  really.
>
> again, as i told you before, as i have told you countless times,
> you need an offline tool that can give you book-wide editing
> combined with a scan-viewing capability as well as the function
> of bringing specific problem-areas to the attention of the human.
> who makes the decision.  to edit or not to edit, or which type of edit.
	This can be done online as well. There is no specific need to do it
	strictly offline. I do understand the caveats of having it done
	offline, but in order to do this the resources have to be downloaded
	to your computer. That is a lot of megabytes that are not always
	necessary. One can have book-wide edits while not transferring
	everything at once. An on-demand basis would be preferable.

	As I have mentioned in other posts, errors could be flagged and given
	to the proofers for final acceptance.
	[snip, snip]

	regards
		Keith.


From dakretz at gmail.com  Tue Aug  5 09:34:59 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 5 Aug 2008 09:34:59 -0700
Subject: [gutvol-d] Error detection and maybe correction
Message-ID: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>

The topic of data, rules, and user interfaces is a hot one in software these
days (and probably most days). And there's a generally accepted design
strategy that addresses it directly, and goes by the acronym MVC (for Model,
View, Controller). It's not the only model available, but it's been around a
long time (going back at least to the Xerox PARC Alto project, whence Apple
Mac, mice, networked apps, Windows, and a bunch of other current design
metaphors), and it's currently the most popular one, I believe.

This is quick and off the top of my head, so the other software-aware folks
can (and hopefully will) pick out the holes.

To quote wikipedia,

*Model-view-controller* (*MVC*) is an architectural pattern used in software
engineering. Successful use of the pattern isolates business logic from user
interface considerations, resulting in an application where it is easier to
modify either the visual appearance of the application or the underlying
business rules without affecting the other. In MVC, the *model* represents the
information (the data) of the application and the business rules used to
manipulate the data, the *view* corresponds to elements of the user interface
such as text, checkbox items, and so forth, and the *controller* manages
details involving the communication to the model of user actions such as
keystrokes and mouse movements.

Good software isolates these from each other, both logically and physically.

In the arena currently under discussion, the Model comprises the data in
the database, the page images, the text being reviewed, and the algorithms,
regexes, heuristics, statistics, etc. The View and the Controller are the
static and dynamic components of the user interface and processing flow.
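
As a rough illustration of that split applied to text checking, here is a minimal sketch. It is not the DP code or any existing tool; the class names, the single "teh" rule, and the accept-hit-by-index action are all assumptions made up for the example.

```python
import re


class Model:
    """Holds the text and the rules; knows nothing about display."""

    def __init__(self, text, rules):
        self.text = text
        self.rules = rules  # list of (compiled pattern, replacement)

    def hits(self):
        """Return (offset, old, new) for each place a rule matches."""
        return [(m.start(), m.group(0), repl)
                for pattern, repl in self.rules
                for m in pattern.finditer(self.text)]

    def apply(self, start, old, new):
        self.text = self.text[:start] + new + self.text[start + len(old):]


class View:
    """Formats hits for display; knows nothing about the rules."""

    @staticmethod
    def render(hits):
        return [f"{start}: {old!r} -> {new!r}" for start, old, new in hits]


class Controller:
    """Turns a user action ("accept hit N") into a model update."""

    def __init__(self, model):
        self.model = model

    def accept(self, index):
        self.model.apply(*self.model.hits()[index])


# Hypothetical rule and text, for illustration only.
model = Model("teh cat sat", [(re.compile(r"\bteh\b"), "the")])
print(View.render(model.hits()))
Controller(model).accept(0)
print(model.text)
```

The point of the separation is that a new rule only touches the list handed to Model, and a new front end (browser, desktop, command line) only replaces View and Controller.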

The DP code liberally, intentionally intermingles all three of these. There
are data structures that are responsible for user interface elements. There
is little distinction between UI style and algorithms. And whatever
intentional rigor is embedded is not anywhere described or documented. How
consistent it is, is hard to guess, since any principles to be applied are
not evident.

So adding text evaluation rules in the current design involves a deep
familiarity with the complete, overall program corpus, because anything you
change in  one area can (and probably will) have unintended consequences.

By way of illustration (and not a fair comparison, since my example is such
a small application):

In the case of my regular expressions, this is pretty naturally the way it's
been done. They are what you will find in the file I posted, and whose link
is further back up this list. (It also contains some comments; but that's
it.) I use the regexes by opening that file in my editor, and then
simultaneously opening a text file to edit. Sequentially, incrementally
applying the rules to the text is then a manual process by which I can
selectively execute bulk changes (some of which impose several hundred
changes at once); or see a list of the changes that would occur if I were to
apply a bulk change; or I can skip from point to point through the text and
approve changes selectively, and only where appropriate.
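
For readers who want to try the same three modes outside an editor (bulk change, preview of the changes, and point-by-point approval), a minimal sketch might look like this. The single rule shown is a hypothetical example and is not taken from the posted rules file; the function names are likewise made up.

```python
import re

# Hypothetical rule, for illustration only: drop stray spaces before
# punctuation. Each rule is a (compiled pattern, replacement) pair.
RULE = (re.compile(r" +([,;:!?])"), r"\1")


def preview(text, rule):
    """List (offset, old, new) for every change the rule would make."""
    pattern, repl = rule
    return [(m.start(), m.group(0), m.expand(repl))
            for m in pattern.finditer(text)]


def apply_bulk(text, rule):
    """Apply the rule everywhere at once."""
    pattern, repl = rule
    return pattern.sub(repl, text)


def apply_selectively(text, rule, approve):
    """Apply the rule only where approve(match) returns True."""
    pattern, repl = rule
    return pattern.sub(
        lambda m: m.expand(repl) if approve(m) else m.group(0), text)


sample = "Wait , he said ; the price was 3 : 1 ."
for change in preview(sample, RULE):
    print(change)
print(apply_bulk(sample, RULE))
print(apply_selectively(sample, RULE, lambda m: m.group(1) != ":"))
```

In an interactive setting the approve callback would show the match in context (ideally beside the page scan) and wait for a yes or no.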

One indication of the usefulness of this style is the fact that, based only
on my posting above, my regex rules file has been downloaded nine times over
the last several weeks, and it should be easy for other developers to make
use of them in various contexts.

But to make this kind of work generally useful, the DP site code, and the
people who control it, need to understand these issues, and be willing
(maybe eager) to collaborate with other developers.

From dakretz at gmail.com  Tue Aug  5 10:30:11 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 5 Aug 2008 10:30:11 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 14
In-Reply-To: 
References: 
Message-ID: <627d59b80808051030p3ea3a3f2q4f4db2a1943a894f@mail.gmail.com>

---------- Forwarded message ----------
From: Bowerbird at aol.com
To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com
Date: Tue, 5 Aug 2008 03:54:07 EDT
Subject: Re: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
dakretz said:
>   That's exactly the proposal The Bird has been making.

um, no, sorry, 180 degrees not.

lee proposed a human-less change process.

that is exactly _not_ what i am proposing.

Maybe not, but your actions (detailing the rules) are agnostic about the
type or existence of a user interface. You know as well as anyone that I'm
an advocate of UIs - I've done two recently for managing the conveyance of
provenance files from DP to PG - one that's entirely browser-based (but
useless, because DP chooses not to install it); the other, more clumsy one
that requires the user to completely download all the project files to their
PC from DP, and then completely upload all of them to PG.

I agree that, for your algorithms, the attended model seems far more useful.
But (as with some of my regexes), even an unsupervised application would be
useful and beneficial in providing a more accurate text to start with.

From Bowerbird at aol.com  Tue Aug  5 10:36:42 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 5 Aug 2008 13:36:42 EDT
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
Message-ID: 

keith said:
>    At some point human intervention is definitely needed;
>    the question is when and how to do it.

and i've answered that question, clearly and unequivocally...

it should be done during preprocessing, before the text goes
in front of any proofers.   and "how" it should be done is with
a tool built for that purpose, working on a book-wide basis...


>    This can be done online as well.

well, it _can_.   but it's not _efficient_ to do it that way.
and the main reason it's not efficient is because you are
constantly summoning up scans to evaluate the text, so
if you have to wait for each of those scans to download,
then you might as well be doing this as an _offline_ task.
(you also re-save the text of individual pages frequently,
but the bandwidth on those exchanges is manageable...)


>    No specific need to do it strictly offline.

except if you value efficiency.   which is the key benefit here.


>    I do understand the caveats of having it done offline,
>    but in order to do this the resources have to be downloaded
>    to your computer.

you have to download the resources to do the job either way.
(it's amazing how often we seem to forget this is the case...)

so it's far better to download all of the scans as one .zip file
so they live with the proper names right on your hard-drive,
rather than one at a time by the browser when "needed" and
living in a not-easily-addressed place in the browser cache.


>    That is a lot of megabytes that are not always necessary.

eventually you will look at every single page, with certainty.
so there are never any megabytes that are "unnecessary".

so the question is whether you download the scans in a way
that you can reliably access them easily any time you want,
or if you don't...


>    One can have book-wide edits while not transferring
>    everything at once. An on-demand basis would be preferable.

you're wrong.

spectacularly wrong.

if you download the zipped scan-set, you don't even have to
sit there and wait for it, you can go outside and smoke a joint.
when you download them one at a time, you wait for each one;
even at 3-4 seconds each, it adds up to unnecessary minutes.


>    As I have mentioned in other posts, errors could be
>    flagged and given to the proofers for final acceptance.

you'd think that that would be a workable alternative...

but roger frank's efforts in this regard have bombed...

it ends up that putting stuff _into_ the text (i.e., flags)
that you expect proofers to then remove from the text
doesn't seem to be quite that wise in retrospect...

-bowerbird




From Bowerbird at aol.com  Tue Aug  5 11:04:30 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 5 Aug 2008 14:04:30 EDT
Subject: [gutvol-d] Error detection and maybe correction
Message-ID: 

dkretz said:
>    The topic of data, rules, and user interfaces is a hot one in software
>    these days

i don't think you'll gain much with this theoretical gobbledygook.
you know the domain of your needs, so focus things at that level.


>    I use the regexes by opening that file in my editor, and 
>    then simultaneously opening a text file to edit. 

that will work...   but it's clumsy, not efficient.   especially since there is
no accommodation of the need to view the scan to make the decision.


>    Sequentially, incrementally applying the rules to the text is then 
>    a manual process by which I can selectively execute bulk changes 
>    (some of which impose several hundred changes at once); or see 
>    a list of the changes that would occur if I were to apply a bulk change; 

>    or I can skip from point to point through the text and approve changes 
>    selectively, and only where appropriate.

except for a few things -- most notably spacey-quotes -- i've shown
with "mountain blood" (and with other books in my own experiments)
that the number of changes turned up by each routine is rather small,
so approving changes "selectively, and only where appropriate" is really
the only way to go, especially since some routines do have false-alarms.

besides, it's not always the case that you can even _specify_ the change
that should be made.   if you go back and examine the various routines
that i reported on, you'll see lots of cases where i had to group the hits
into various categories that represented different changes to be made...
so you couldn't code a specific change for these instances anyway...

and honestly, i don't understand this desire to make changes _blindly_.
it makes no sense to me.   on the one hand, you have the proofers doing
an unguided search for errors, in the hope that they will spot all of 'em.
so don't people understand the appeal -- and the fantastic efficiency --
of having the computer lead you right to the errors?   why complain that
you still need to look at the error and determine if it really is an error?
don't complain because you have to do the edit "manually"; it's still easy.

of course, i know the reason...   the reason is that you don't have a tool
that gives you (1) whole-book editability, (2) page-scan viewability, and
(3) a focusing mechanism that takes you directly to each suspected error.

once you _do_ have such a tool, you'll realize that operating any other way
is sheer stupidity, and it will leave your mind entirely.


>    But to make this kind of work generally useful, the DP site code, 
>    and the people who control it, need to understand these issues, 
>    and be willing (maybe eager) to collaborate with other developers.

again, this preprocessing is best done as an offline standalone task.
so there's no need for the d.p. site code to "understand these issues",
or for the people who control that code to understand them, or even
for those people to be interested to collaborate with other developers.

all you have to do is deliver them the tool, and people will start using it...

-bowerbird




From klofstrom at gmail.com  Tue Aug  5 11:24:08 2008
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Tue, 5 Aug 2008 08:24:08 -1000
Subject: [gutvol-d] Error detection and maybe correction
In-Reply-To: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>
References: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>
Message-ID: <1e8e65080808051124y488466f8s4dea57627a23e823@mail.gmail.com>

On Tue, Aug 5, 2008 at 6:34 AM, don kretz  wrote:

> The DP code liberally, intentionally intermingles all three of these. There are data structures that are responsible for user interface elements. There is little distinction between UI style and algorithms. And whatever intentional rigor is embedded is not anywhere described or documented. How
> consistent it is, is hard to guess, since any principles to be applied
> are not evident.

DKretz, beloved DP contributor, says that DP code is a tangled mess.
DP contributor Bill Tozier has complained that the DP code base is a
mess.  Juliet, who heads up the DP PTB (powers that be) agrees that
the DP code is a mess. That's what you get when you have a shifting
group of volunteer programmers adding bits here and there over the
years and no Linus Torvalds to exercise iron control over the code.

It would be nice if we had a foundation grant, say, to hire a team to
completely rewrite the DP code base. It's hard to figure out what to
do lacking such a grant. Pointing accusing fingers at DP from a
distance and saying "Nyah nyah your code sucks" may make some people
happy, but it doesn't improve matters on the ground. Nor does telling
us what we *should do*, without any real familiarity with the DP code
or DP operations and without any intention of actually doing any of
the work.

--
Karen Lofstrom




From Bowerbird at aol.com  Tue Aug  5 11:25:17 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 5 Aug 2008 14:25:17 EDT
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
Message-ID: 

keith said:
>    >   I do understand the caveats of having it done offline,
>    >   but in order to do this the resources have to be downloaded
>    >   to your computer.

i said:
>    you have to download the resources to do the job either way.
>    (it's amazing how often we seem to forget this is the case...)

oh yeah, i should have mentioned here that i believe this task of
preprocessing could (should?) be done by the content provider.

and of course, they've already _got_ the scans on their computer,
and the text too, so they don't have to download anything at all...

***

anyway, let's not make a big deal about downloading those scans.
my "banana cream" tool downloads a scan-set in the background
right while you're working on preprocessing the text, so it's fast...

-bowerbird




From dakretz at gmail.com  Tue Aug  5 11:55:37 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 5 Aug 2008 11:55:37 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 15
In-Reply-To: 
References: 
Message-ID: <627d59b80808051155y71fe269l45e864aff2db1cc8@mail.gmail.com>

Mileage differs, but I've found that I have pretty high confidence in a good
number of my regex alterations, without needing to visit each one.

As a point of possible general interest, I've made available a zip file
containing before and after versions of the most recent EB project I've
worked on (about 250 pages). You can see
all the changes that were made before releasing the project for people to
work on it. With maybe a dozen exceptions, the changes were all found and
applied with the regexes in (the most recent version of) the regex file I've
posted at the same location.

There are any number of diff viewing tools available; I generally use
WinMerge.

There are 1775 lines with one or more changes. (Be sure to include
whitespace differences, otherwise it ignores spaceyquotes.) The majority of
the changes are spaceyquotes (which will be found in most FineReader-output
projects), and number-related issues (which is more significant in EB than
most projects).
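
If you would rather count the changed lines with a script than with a diff viewer, here is a minimal sketch using Python's standard difflib. The sample lines are made up, not taken from the EB project, and it assumes a spaceyquote is a quotation mark with a stray space beside it. The ignore_whitespace switch is there only to show why whitespace differences need to be included.

```python
import difflib


def changed_line_count(before, after, ignore_whitespace=False):
    """Count lines of `before` that were replaced or deleted in `after`.

    Lines that exist only in `after` (pure insertions) are not counted.
    """
    if ignore_whitespace:
        before = ["".join(line.split()) for line in before]
        after = ["".join(line.split()) for line in after]
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    return sum(i2 - i1
               for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
               if tag in ("replace", "delete"))


# Made-up sample: one spaceyquote fix and one number fix (letter O for zero).
before = ['" Hello, " he said.', "No change here.", "It was 1O o'clock."]
after = ['"Hello," he said.', "No change here.", "It was 10 o'clock."]

print(changed_line_count(before, after))
print(changed_line_count(before, after, ignore_whitespace=True))
```

With whitespace ignored, the spaceyquote line compares as unchanged and drops out of the count, which is the effect described above.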

I would be comfortable that over half of the changes could be applied, even
if I didn't expect 5 more "sets of eyes" to be applied. But I wouldn't want
to suggest that anyone else would need to draw the same conclusion.

I've never included a final, post-DP version of the text. It would be
interesting to see, but I bet I've fixed well over half the errors that
would eventually be found.

From dakretz at gmail.com  Tue Aug  5 12:16:37 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 5 Aug 2008 12:16:37 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 16
In-Reply-To: 
References: 
Message-ID: <627d59b80808051216s21c3ee2fode23d16d090f51d3@mail.gmail.com>

> It would be nice if we had a foundation grant, say, to hire a team to
> completely rewrite the DP code base. It's hard to figure out what to do
> lacking such a grant. Pointing accusing fingers at DP from a distance and
> saying "Nyah nyah your code sucks" may make some people happy, but it
> doesn't improve matters on the ground. Nor does telling us what we *should
> do*, without any real familiarity with the DP code or DP operations and
> without any intention of actually doing any of the work.


Maybe, but you have more confidence in the power of money to solve this
problem for us than I.

I think it's not boasting to say that I'm as familiar as anyone outside the
original developers with the existing DP codebase. At least familiar enough
to have spent several months' time developing and user-testing several major
(uninstalled) enhancements, including the user's projects list, a revised
diff viewer, and a complete reimplementation of the page editing interface;
all of which have been installed and user-tested on the development server.

I also happen to believe (based on personal contact in addition to postings
here and on the DP site) that the number of people able and willing to
"actually do any of the work" is larger than generally recognized.

From klofstrom at gmail.com  Tue Aug  5 12:58:19 2008
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Tue, 5 Aug 2008 09:58:19 -1000
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 16
In-Reply-To: <627d59b80808051216s21c3ee2fode23d16d090f51d3@mail.gmail.com>
References: 
	<627d59b80808051216s21c3ee2fode23d16d090f51d3@mail.gmail.com>
Message-ID: <1e8e65080808051258p573cafb8sb6058cda7c26d8a4@mail.gmail.com>

On Tue, Aug 5, 2008 at 9:16 AM, don kretz  wrote:

> I also happen to believe (based on personal contact in addition to postings here and on the DP site) that the number of people able and willing to "actually do any of the work" is larger than generally recognized.

Hmmm. Perhaps someone would be willing to lead a project to rewrite
the DP code base? Rebuild it from scratch? An entirely separate
project from the existing "introduce new features one by one" and
"patch the spit and baling wire" efforts.

Of course, by saying this, I am doing exactly what I criticize (alas!)
since I can't code. My one and only C, in my entire college career,
was in second-semester introductory programming. So I'll mention
"possible DP code refactoring"  once and then shut up.

-- 
Karen Lofstrom

From Bowerbird at aol.com  Tue Aug  5 16:06:44 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 5 Aug 2008 19:06:44 EDT
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 15
Message-ID: 

dkretz said:
>    Mileage differs, but I've found that I have 
>   pretty high confidence in a good number of my 
>   regex alterations, without needing to visit each one.

i'm sure there are some in which you can have confidence,
even without examining instances.   but those are the least
interesting of the bunch.   and -- by themselves -- they will
_not_ take a text to perfection, even if they do _improve_ it.

i too, for instance, have found it's not necessary to monitor
the spacey-quote corrections.   because of the redundancies
in that routine, it very rarely makes an erroneous correction,
and even most of those are detected with follow-on routines.

so, yeah, sure, you can (and typically will) fix those "blindly"...

but again, those types of never-miss corrections are atypical.
and, more importantly, they will not take a text to perfection.

to get to perfection, you need the routines i've been listing...
(and a handful more, to account for glitches in other books.)

-bowerbird




From Bowerbird at aol.com  Tue Aug  5 16:17:07 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 5 Aug 2008 19:17:07 EDT
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 16
Message-ID: 

dkretz said:
>    At least familiar enough to have spent several months' time 
>    developing and user-testing several major (uninstalled) enhancements

i can say what dkretz cannot:   "lack of developer time" is an _excuse_
by the "powers-that-be" for any change they don't want to implement.

and that's why his "several major enhancements" are still _uninstalled_.

d.p. could also modify its round-based system into a _roundless_ one
with the wave of a wand, by simply declaring a large number of rounds
(e.g., 21), and then marking each page as a "no-diff" if it was a "no-diff"
in the 2 rounds previous to the current round.   it would essentially mean
that the only pages that were proofed in a specific round were the pages
which had been changed in the round before, or the round before that...
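
the rule is simple enough to sketch in a few lines.   this is only an illustration, not d.p. code; the function names are made up, and it assumes that every page gets at least two rounds before it can be skipped.

```python
MAX_ROUNDS = 21  # the "large number of rounds" from the example above
LOOKBACK = 2     # rounds a page must go unchanged before it is skipped


def needs_proofing(changed_history):
    """changed_history[i] is True if round i+1 changed the page."""
    if len(changed_history) < LOOKBACK:
        return True  # assumption: every page gets at least LOOKBACK rounds
    return any(changed_history[-LOOKBACK:])


def rounds_proofed(changes_by_round):
    """Count how many rounds a page is actually shown to a proofer.

    changes_by_round[i] says whether a proofer would change the page
    if it were shown in round i+1.
    """
    history = []
    for would_change in changes_by_round[:MAX_ROUNDS]:
        if not needs_proofing(history):
            break
        history.append(would_change)
    return len(history)


print(rounds_proofed([False] * MAX_ROUNDS))
print(rounds_proofed([True, False, False, False]))
```

a clean page drops out after two no-diff rounds, and a page that keeps changing stays in until the round limit.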

and yes, this has been suggested over at d.p., and no, not just by me...

but the ears are deaf there when it's something they don't want to hear.
and the mouth repeats the standard line about "lack of developer time"...

-bowerbird




From rfrank at pobox.com  Tue Aug  5 18:15:28 2008
From: rfrank at pobox.com (Roger Frank)
Date: Tue, 05 Aug 2008 19:15:28 -0600
Subject: [gutvol-d] limits on machine-only preprocessing
Message-ID: <4898FB30.6030806@pobox.com>

All of my preprocessing code, including those pieces that are based solely on
regexes, runs without human intervention. That's not the best of all ways to
preprocess. The more human oversight you can give to the process, the better.
I've had to leave out many automated checks that could have been easily
resolved with a quick look. Bowerbird, I believe, uses a regex to find what's
questionable, and then finds the spot in the text and makes a judgement on what
he sees. Whether that is preprocessing or proofing round zero doesn't matter.
It will always produce better results, both from the human decision making
and from the ability to include regexes that generate many false positives
that can be resolved by the pre-proofer or pre-processor person.

The question is one of efficiency. Over at DP I believe we are completely
out of "Newcomer's Only" books. Since I am the primary provider of those
books and the one to give the feedback, I'd like to get something in the
queue right quick. I can use the usual guiprep code--that's fast and easy
and finds many things. I can use my cpprep code, which is getting better
all the time as I learn what should be fixed and what should be flagged.
That's quick, too. I could scan and start four books a day with either
of those options. If I take an approach where I check the errors, and if
I assume conservatively 300 errors (that's only one per page), then for
those same four books, I have to click through 1200 errors. I just don't
have the bandwidth for that much manual preprocessing.

Is the cpprep code better than guiprep? If you look at the project threads
of all the newcomer's projects I preprocess, it shows how many errors
were missed in P1. Since the type (and version) of the preprocessing is
shown for each book on the project page, I can see that the later versions
of cpprep do more for the P1s than guiprep. Cpprep works and is getting
better, within the limits of an automated tool.

That said, I would like to devote some bandwidth to developing an open-source
tool that would easily present each questionable spot to the pre-proofer for
modification. How to do that? Online? That would make the users maintain
their connection; it would probably need to be written in PHP (unless someone
is willing to work with me on a DP site clone based on Ruby on Rails). But it
wouldn't require the user to install new software. It could be triggered right
after the upload to DPscans and before going to P1 waiting.

If it's an offline tool, there is more latitude. Guiguts has an embedded editor
written in Perl. I suppose I could go that way (about a third of my DP support
tools at fadedpage.com are written in Perl.) But I don't like to write in Perl;
I like to write in Ruby. My question for the list is this: is there a
framework of an editor that could be leveraged into something like
Bowerbird's suggested tool--image on the left, text on the right, heat maps,
whatever. Any language that we could ask a user to install (right now they
almost all have Perl and many have Ruby). I'm open to anything. Something in
FXRuby would make my day. I know it could be done in PHP because that's what
the proofing interface looks like and is based on.

Bottom line: my fully-automated cpprep tool fixes what it is sure of, flags
what it thinks is wrong, and leaves the rest to the P1s, who do a good job
and seem to be comfortable with what they are given. Better than that would
be a way to get a human involved, but no tool I know of now does that.
As I see it, the tool would be useful to content providers at DP and to
those that roll their own. I just wish that the "as I see it" part were
a clearer vision of what it would be.
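
The "fixes what it is sure of, flags what it thinks is wrong" split can be
pictured with a small sketch. This is only an illustration and not cpprep's
actual code: the rule lists, the function name `preprocess` and the example
patterns are all invented here, and the "sure" rules assume English text.

```python
import re

# Hypothetical rules, invented for this sketch (not taken from cpprep).
# "Sure" rules are applied automatically; "suspect" rules only report.
SURE_RULES = [
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),  # trailing whitespace
    (re.compile(r" +([,;:!?])"), r"\1"),         # space before punctuation
]
SUSPECT_RULES = [
    ("digit inside a word", re.compile(r"\b[A-Za-z]+[0-9]+[A-Za-z]+\b")),
    ("'tbe' (often 'the')", re.compile(r"\btbe\b")),
]

def preprocess(text):
    """Return (fixed_text, flags); each flag is (line_no, reason, match)."""
    for pattern, replacement in SURE_RULES:
        text = pattern.sub(replacement, text)
    flags = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for reason, pattern in SUSPECT_RULES:
            for m in pattern.finditer(line):
                flags.append((line_no, reason, m.group(0)))
    return text, flags

if __name__ == "__main__":
    fixed, flags = preprocess("He opened tbe door , and saw the c1ock.  \n")
    print(fixed)
    for flag in flags:
        print(flag)
```

The flags are returned as a separate list rather than written into the text,
so a review tool (or a human) can decide what to do with each one.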

I'm open to suggestions from anyone. Thanks.

--Roger

From bzg at altern.org  Tue Aug  5 18:19:04 2008
From: bzg at altern.org (Bastien Guerry)
Date: Tue, 05 Aug 2008 20:19:04 -0500
Subject: [gutvol-d] ethics of noise
In-Reply-To: <8C245BA3-8AAA-4FC4-9186-56322A2ADD34@uni-trier.de> (Schultz
	Keith J.'s message of "Tue, 5 Aug 2008 10:36:14 +0200")
References:  
	
	<1e8e65080808011233o112e0fc4m774021b354765c4e@mail.gmail.com>
	
	
	
	<87ljzd6l5z.fsf@altern.org>
	<8C245BA3-8AAA-4FC4-9186-56322A2ADD34@uni-trier.de>
Message-ID: <87abfraqk7.fsf@altern.org>

Hi Keith,

"Schultz Keith J."  writes:

> 	A right to something encompasses the ability to do it.

This is where we disagree.

Maybe "ability" was not the right word.  What I really mean is
"capacity".  A right does not imply the capacity by itself, it just
implies an obligation from the society to bring this capacity to anyone
as much as possible.  But the "as much as possible" is what prevents
the capacity from being directly encompassed in a right.

For example, the right to vote does not encompass the capacity to do 
it, it only implies that the society has a moral obligation to make it
possible for everyone to vote. 

> 	A1 is the prerequisite to R1. 

No.  A1 and R1 are not directly linked.  It wouldn't make sense to
stipulate R1 if we knew that A1 is not reachable *at all*.  Just as it
wouldn't make sense to stipulate R1 if A1 were always the case.  R1 is
there because A1 is not always the case, and to make sure the society
(or the institution standing for it) tries to promote A1.

Anyway, my main point was just this: you cannot answer by just defending
a principle when people complain about the effective state of things.
Yes the principle is necessary but it's not sufficient.

Regards,

-- 
Bastien

From dakretz at gmail.com  Tue Aug  5 19:05:31 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 5 Aug 2008 19:05:31 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 16
In-Reply-To: <627d59b80808051216s21c3ee2fode23d16d090f51d3@mail.gmail.com>
References: 
	<627d59b80808051216s21c3ee2fode23d16d090f51d3@mail.gmail.com>
Message-ID: <627d59b80808051905w229c5021qb6e6d5b6448f1f0e@mail.gmail.com>

Roger,

Are you familiar with the Twister app? I have a version of it to which I've
added an editing window, that displays the text for any image you choose.
It's written in Flex, which is a free tool you can download from the Adobe
site.

It has two interchangeable editor windows available. One is the standard
one, equivalent to an html text box. The other is a "Rich Text" editor, with
a control bar for bold, italics, font selection, etc.

I'm not sure how challenging it will be for you to pick up. The underlying
language is an enhanced form of javascript. The Flex language is completely
OO, perhaps close enough to Ruby to match things up for you.

Twister is my maiden project, and I feel pretty comfortable with it.

I'll post the Twister source on google if you are interested, and give you
all the help you want. It might be advantageous to work together through a
common interface for several functional tools, and I'm already viewing it
for the purpose of preprocessing text.

Ultimately, we could target the dp editor interface. It's written in
javascript already, and I'm intimately familiar with it (and with
refactoring and extending it.)

Another option is Silverlight, which also supports a set of class libraries
accessible from javascript on a browser, plus all the .NET languages
(including a Ruby dialect, I believe.)

The benefit of both approaches is that they give you a single development
toolset for both browser-hosted and workstation development; and Flex is
platform-agnostic. All it needs is the Flash browser runtime, installed on
something like 99% of PCs via the common browsers.

From Bowerbird at aol.com  Wed Aug  6 02:19:44 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 6 Aug 2008 05:19:44 EDT
Subject: [gutvol-d] limits on machine-only preprocessing
Message-ID: 

wow, a non-defensive reply from roger frank.
what a positive development.   more to say later...

-bowerbird




From rfrank at pobox.com  Wed Aug  6 04:07:00 2008
From: rfrank at pobox.com (Roger Frank)
Date: Wed, 06 Aug 2008 05:07:00 -0600
Subject: [gutvol-d] Twister, Flex
Message-ID: <489985D4.7040200@pobox.com>

Thanks, Don, for the lead on Flex. I'll spend some time with it. I
think it needs Adobe Flash 9 and I'm not sure if I can put that on
this Linux box. I see the Flex developer software is ~250$. Do you use
that and is it worth it? Also how can I (we) see the Twister app
as you've set it up?

--Roger

From dakretz at gmail.com  Wed Aug  6 07:48:17 2008
From: dakretz at gmail.com (don kretz)
Date: Wed, 6 Aug 2008 07:48:17 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 17
In-Reply-To: 
References: 
Message-ID: <627d59b80808060748h1b2c02e0i79b997f171e483f8@mail.gmail.com>

Karen,

I didn't want to leave this without a response. As far as I can see, your
suggestion is a good one, and the only one that I've seen that potentially
can be useful in possibly moving DP off dead-center. A number of us think
that the evidence leads to the conclusion that DP, as it currently is
constituted, simply won't scale - the model is incapable of supporting a
proofing rate greater than it currently delivers; and the current model
concentrates too much of the responsibility and effort on too few highly
dedicated but self-evidently overworked people. Not because they are bad
people, or behaving particularly badly, but it's too little resource spread
too thinly, and both those people and the work of the entire site are paying
too high a price.

I would be willing to work on a rewrite. As I've often stated, the rewrite
I've already done is available, but I'd as eagerly help with a completely
different model, using other languages and web resources. I used PHP, mySQL,
etc. primarily in the hope that the site could make use of it in its current
form. They aren't tools I've used elsewhere - I had to learn them from
scratch on my own (which can be its own reward if your personality is
appropriately warped. :) Something more like wikisource might be worth
considering.

I do think it would need the grudging acceptance and endorsement, if not the
enthusiastic participation, of the current DP administration, however; but I
think that's unlikely.



  From: "Karen Lofstrom" 
  To: "Project Gutenberg Volunteer Discussion" 
  Date: Tue, 5 Aug 2008 09:58:19 -1000
  Subject: Re: [gutvol-d] gutvol-d Digest, Vol 49, Issue 16
  On Tue, Aug 5, 2008 at 9:16 AM, don kretz  wrote:


  Hmmm. Perhaps someone would be willing to lead a project to rewrite
  the DP code base? Rebuild it from scratch? An entirely separate
  project from the existing "introduce new features one by one" and
  "patch the spit and bailing wire" efforts.

  Of course, by saying this, I am doing exactly what I criticize (alas!)
  since I can't code. My one and only C, in my entire college career,
  was in second-semester introductory programming. So I'll mention
  "possible DP code refactoring" once and then shut up.

  --
  Karen Lofstrom

From lee at novomail.net  Wed Aug  6 07:49:01 2008
From: lee at novomail.net (Lee Passey)
Date: Wed, 06 Aug 2008 08:49:01 -0600
Subject: [gutvol-d] limits on machine-only preprocessing
In-Reply-To: 
References: 
Message-ID: <4899B9DD.80104@novomail.net>

Bowerbird at aol.com wrote:

[A personal attack, totally without substance.]

If you ever wonder why people label you a troll, this is why. This 
message surpasses even your typically churlish behavior.


From dakretz at gmail.com  Wed Aug  6 09:09:30 2008
From: dakretz at gmail.com (don kretz)
Date: Wed, 6 Aug 2008 09:09:30 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 17
In-Reply-To: <627d59b80808060748h1b2c02e0i79b997f171e483f8@mail.gmail.com>
References: 
	<627d59b80808060748h1b2c02e0i79b997f171e483f8@mail.gmail.com>
Message-ID: <627d59b80808060909wcb7d95rd2cb484a49be5a1a@mail.gmail.com>

Roger,

If you have any difficulty installing on Linux, let me know. It's supposed
to be completely portable (with the appropriate flash runtime). But I'm a
lot less interested in using flex or flash if it's not.

The only non-free-and-open-source piece should be Flex Builder; and you can
use that free for 30 days. Flex itself has a command-line compiler. There's
also a free design IDE, FlashDevelop, which works with Flex.

I'll compile and upload a copy of twister with the (incompletely implemented
- doesn't save your work) editor window. I'll put up the source as well. It
will be at the same location where I posted the regexes. I'll post when it's
there - probably later today or early tomorrow.

Google uses subversion to store source code.

Roger Frank wrote:

> Thanks, Don, for the lead on Flex. I'll spend some time with it. I
> think it needs Adobe Flash 9 and I'm not sure if I can put that on
> this Linux box. I see the Flex developer software is ~250$. Do you use
> that and is it worth it? Also how can I (we) see the Twister app
> as you've set it up?
>
> --Roger

From Bowerbird at aol.com  Wed Aug  6 11:01:33 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 6 Aug 2008 14:01:33 EDT
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 12
Message-ID: 

dkretz said:
>    I myself would love to see proposals on how 
>   to break through this wearisome impasse. 
>   I can't think of any. The only approach that 
>   seems to gain any traction at all, much to 
>   everyone's chagrin and regret, appears to 
>   have been that of Mr. Bird.

before we segue into a more positive direction,
hallelujah!, let me comment on this statement...

i don't know -- or much care -- exactly how
don intended this statement when he wrote it,
but i'm sure someone out there will interpret it
as saying that bowerbird is the asshole, and his
influence (to the extent that he has any) is what
causes "us good d.p. people" chagrin and regret.

so let's focus a spotlight of truth on that thought.

the reason this topic stayed on the table so long is
due to the tenacity i have displayed for it to persist,
because i've _known_ that it's an important concern.

the "people at d.p." -- or, more correctly, a smallish
subset of them -- are the ones who are responsible
for making this interaction turn into long-term ugly.

so any "chagrin and regret" must be blamed on them,
not scapegoated onto me.

i did what needed to be done, and did it in the face of
personal vilification that was completely unnecessary,
and reflects very poorly on the people dishing it out...

i'm not telling you here that i am some kind of "hero".
what i _am_ saying though is that, if the current drift
of positivity manifests itself into better tools and an
intelligent workflow at d.p., then _i_ will have been the
passerby who pulled the baby out of the burning house.

-bowerbird




From dakretz at gmail.com  Wed Aug  6 15:50:21 2008
From: dakretz at gmail.com (don kretz)
Date: Wed, 6 Aug 2008 15:50:21 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 18
In-Reply-To: 
References: 
Message-ID: <627d59b80808061550h85b0b0fqcc505ddc23c5dd8d@mail.gmail.com>

rfrank (and anyone else interested) --

I've compiled a variant of the Twister tool with an editor window added. It's
called TwistEd.  Just browse
to a directory containing image and .txt files with matching names.

It's not impressive - mostly still the old app, but there is an edit window,
and it should load the page text into it (after you select a page in the
files list tab.)
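
For anyone wanting to build something similar outside Flex, the "image and
.txt files with matching names" convention could be handled roughly as below.
This is a sketch in Python, not TwistEd's code; the function name
`pair_pages` and the list of image suffixes are assumptions made for the
example.

```python
from pathlib import Path

# Assumed set of scan formats; adjust to whatever the project really uses.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def pair_pages(directory):
    """Return sorted (image_path, text_path) pairs that share a file stem."""
    directory = Path(directory)
    texts = {p.stem: p for p in directory.glob("*.txt")}
    pairs = []
    for image in sorted(directory.iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES and image.stem in texts:
            pairs.append((image, texts[image.stem]))
    return pairs
```

Images without a text file (and text files without an image) are simply
skipped here; a real tool would probably want to report them.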

I've also put the source code files (along with a lot of crufty files) on
the same site under "Source".

From vlsimpson at gmail.com  Wed Aug  6 17:41:38 2008
From: vlsimpson at gmail.com (V. L. Simpson)
Date: Wed, 6 Aug 2008 19:41:38 -0500
Subject: [gutvol-d] Bowerbir OCR steps
Message-ID: 

I've gathered up BB's OCR steps into one single file and attached
some regular expressions for the easy parts.

http://vls.freeshell.org/dpfiles/textchecks.txt

I'm no regexp expert, so if anyone else has improvements let me know.

Vance

From lee at novomail.net  Wed Aug  6 18:54:16 2008
From: lee at novomail.net (Lee Passey)
Date: Wed, 06 Aug 2008 19:54:16 -0600
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
In-Reply-To: <627d59b80808042111h29eed1c5v96a147a27d426107@mail.gmail.com>
References: 
	<627d59b80808042111h29eed1c5v96a147a27d426107@mail.gmail.com>
Message-ID: <489A55C8.5080705@novomail.net>

don kretz wrote:

>> Lee at Novomail.net posted:
>> 
>> 
>> It seems to me that the /source/ of the errors is unimportant as 
>> compared to the /quantity/ of errors. Of course, even more 
>> important is the /quality/ of the errors; if the original errors 
>> are blatant, but the introduced errors are subtle, it may make more
>>  sense to simply let the humans deal with the errors as they
>> are more likely to catch the blatant errors than the subtle ones.

[snip]

> That's exactly the proposal The Bird has been making.

As he pointed out, this is /not/ the proposal he is making. In fact,
it's not the proposal /I'm/ making; I'm just asking the question: is it
preferable/acceptable/inadvisable to automate some error correction,
even if it introduces new errors, and then pass the resulting text to
the unwashed proofreading masses without flagging the altered text? Explain.

> It's the self-evident means to make practical use of the tools jfrank
> has been working on. And also the regexes that I've made available.
> It's been the topic of many threads among the DP fora. There is, I 
> think, no opposition (at least active opposition) from anyone at 
> Distributed Proofreading.

Generally, I'm reluctant to talk about what DP should, or should not,
do, here in a PG forum. My interest here is /not/ in DP's practices, but
rather what practices should be adopted as Best Known Practices; whether
DP chooses to adopt BKP does not interest me. Let me, however, make a
few comments prompted by your reply.

I suspect that the preprocessing tools you and Mr. Frank have been
working on may be inappropriate for inclusion in DP's web-based
workflow. I agree with bb that this type of tool should be used on a
document-wide basis. And if what you're going to do is document-wide, I
think that neither web-based tools, nor a web interface, is the right
approach; I would recommend C, C++ or some other programming language.
You might be able to cobble something together using piped scripts;
it might suffer from performance issues, but it may be more flexible in
adding new tests.
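
To make the "piped scripts" idea concrete, here is a minimal sketch of a
document-wide filter built from small stages. It is written in Python purely
for illustration; the stage functions and the names `run_stages` and
`filter_stream` are hypothetical, and the two stages shown are deliberately
harmless ones that cannot introduce new errors into the text.

```python
import re

def normalize_line_endings(text):
    """Turn CRLF and bare CR line endings into LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def strip_trailing_whitespace(text):
    """Drop spaces and tabs at the end of every line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

# Hypothetical stage list; adding a new test means appending a function.
STAGES = [normalize_line_endings, strip_trailing_whitespace]

def run_stages(text, stages=STAGES):
    """Pass the whole book through each stage in order."""
    for stage in stages:
        text = stage(text)
    return text

def filter_stream(infile, outfile):
    """Read a whole book from infile and write the cleaned text to outfile."""
    outfile.write(run_stages(infile.read()))
```

Calling filter_stream(sys.stdin, sys.stdout) at the bottom of a script would
let it sit in a shell pipe alongside other tools, which is where the
flexibility mentioned above would come from.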

In any case, as bb pointed out, you're talking about a tool designed for
a submitter or project manager to use to preprocess a text before it is
fragmented into the DP "one page/one picture" paradigm, so it is
reasonable to assume that the individual who is using the tool has
access to the entire text without having to download it, and who also
possesses a computer powerful enough to have created or manipulated the
text in the first place. A web interface seems to me to be a solution in
search of a problem.

> But there's also no one who has both authority and willingness to 
> provide assistance and direction in incorporating it into the site 
> codebase.

As I mentioned, it seems to me that this kind of tool need not be
incorporated into the site codebase at all. Simply write it, make it
available for download, and encourage submitters to use it. (This also
has the beneficial side effect that it is available to text-processors
who have chosen not to be associated with DP). Because this type of
program falls outside of DP's "one page/one picture" paradigm, it
probably does not fit well into any of the existing codebase, but at the
same time it is not inconsistent with it either.

[remainder snipped]




From grythumn at gmail.com  Wed Aug  6 19:18:03 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Wed, 6 Aug 2008 22:18:03 -0400
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
In-Reply-To: <489A55C8.5080705@novomail.net>
References: 
	<627d59b80808042111h29eed1c5v96a147a27d426107@mail.gmail.com>
	<489A55C8.5080705@novomail.net>
Message-ID: <15cfa2a50808061918m193749dga4ed69e6d6cdccc8@mail.gmail.com>

On Wed, Aug 6, 2008 at 9:54 PM, Lee Passey  wrote:
> I suspect that the preprocessing tools you and Mr. Frank have been
> working on may be inappropriate for inclusion in DP's web-based
> workflow. I agree with bb that this type of tool should be used on a
> document-wide basis. And if what you're going to do is document-wide, I
> think that neither web-based tools, nor a web interface, is the right
> approach; I would recommend C, C++ or some other programming language.
> You might be able to cobble something together using piped scripts;
> it might suffer from performance issues, but it may be more flexible in
> adding new tests.

http://home.comcast.net/~thundergnat/guiprep.html

Rather than reinvent the wheel why don't you look at guiprep? It can
definitely use some tuning for LOTE* and some of the functionality is
deprecated (we don't use OCR markup extraction at all anymore.) It
has some troubles... don't try to run it while Abbyy Finereader has
the png directory locked, for example... but it cleans up a lot of the
more common stuff. Oh, and make sure you turn off the long-S handling
unless you really need it.

R C
(*Languages other than English)

From hyphen at hyphenologist.co.uk  Wed Aug  6 23:30:04 2008
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu, 7 Aug 2008 07:30:04 +0100
Subject: [gutvol-d] Bowerbir OCR steps
In-Reply-To: 
References: 
Message-ID: <000001c8f857$08410060$18c30120$@co.uk>



V. L. Simpson Wrote

>I've gathered up BB's OCR steps into one single file and attached
>some regular expressions for the easy parts.

>http://vls.freeshell.org/dpfiles/textchecks.txt

>I'm no regexp expert, so if anyone else has improvements let me know.

>Vance

Assuming that they work as intended, almost all of these can occur
occasionally in real text, notably in prose mixed with poetry.
They thus require a human to check whether they are OCR mistakes or intended.

As an example:
16. Double-line-end followed by lowercase
Common in poetry

A few do not work on non-English text.
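
A sketch of how check 16 might report its hits instead of changing them is
given below. The pattern is my own reading of "double-line-end followed by
lowercase" (a blank line followed by a line starting with a lowercase
letter), not the expression from the textchecks file, and the function name
is made up.

```python
import re

# Assumed reading of check 16: a blank line, then a line whose first
# non-blank character is a lowercase ASCII letter.
PARA_BREAK_THEN_LOWER = re.compile(r"\n[ \t]*\n([ \t]*[a-z][^\n]*)")

def suspicious_paragraph_starts(text):
    """Return (line_no, line) for each line matching the pattern above."""
    hits = []
    for m in PARA_BREAK_THEN_LOWER.finditer(text):
        # Count the newlines before the flagged line to get its number.
        line_no = text.count("\n", 0, m.start(1)) + 1
        hits.append((line_no, m.group(1)))
    return hits
```

Run on a page that mixes a broken paragraph with an indented verse line, it
would list both, and only a person looking at the scan can say which one is
the OCR mistake. The [a-z] class is also one reason such a check fails on
non-English text.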

Dave F



From schultzk at uni-trier.de  Thu Aug  7 00:02:12 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 09:02:12 +0200
Subject: [gutvol-d] Error detection and maybe correction
In-Reply-To: <1e8e65080808051124y488466f8s4dea57627a23e823@mail.gmail.com>
References: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>
	<1e8e65080808051124y488466f8s4dea57627a23e823@mail.gmail.com>
Message-ID: 

I will step in here.

	First, thanks to Don for his technical description of
	a part of software engineering.

	It is good programming style to separate data and user
	interaction. The model used for driving a program
	depends on the task. MVC is very well applicable to
	DP as far as I can see.

	The problem is that for MVC to work well you need some kind
	of framework to help in using this model. DP lacks this,
	but that is no fault of theirs, and it would be overkill to develop
	one. Using existing frameworks is also not a route to take,
	as DP would lose too much flexibility and platform independence.

	What is needed is a design that modularizes the data processing and
	the user interaction. This means very good interfaces (APIs) have to
	be developed and developers required to adhere to them. Not very hard
	to do. Furthermore, a system for documentation must be instituted to
	allow developers to easily grasp the structure of the DP system and
	program appropriately.

	Achieving the above is a very large task, and it becomes even larger
	with the lack of documentation.

	Being involved in a similar task, I would suggest that the DP code
	base be developed (engineered) from the beginning. That is:
		1) specification of the work flow in DP
		2) specification of the individual processes
		3) reimplementation of the parts of the work flow
		4) transition from the old code base to the new one.

	1 and 2 must be done first. 3 and 4 are a somewhat dynamic process
	and are interchangeable.

	This so-called theoretical gobbledygook, especially 1 and 2, is very
	important in order to coordinate everything. I can help with this, and
	even non-developers or non-programmers can participate. They are more
	often than not helpful, as they tend to bring up points which are not
	evident to the developers. Furthermore, 1 and 2 are important for the
	developers so that they can continue if somebody drops out or comes
	in, or takes a break. All one needs to do is stick to the
	specification.

	Last but not least, anyone would be pleased to be paid. Yet, if this
	dogmatic change in development is accepted at DP, then I do not see
	any need for it.
	BTW, DP can keep on developing their current code base during the
	transition. This code will help things a lot.

	regards
		Keith.

Am 05.08.2008 um 20:24 schrieb Karen Lofstrom:

> On Tue, Aug 5, 2008 at 6:34 AM, don kretz  wrote:
>
>> The DP code liberally, intentionally intermingles all three of  
>> these. There are data structures that are responsible for user  
>> interface elements. There is little distinction between UI style  
>> and algorithms. And whatever intentional rigor is embedded is not  
>> anywhere described or documented. How
> consistent it is, is hard go guess, since any principles to be applied
> are not evident.
>
> DKretz, beloved DP contributor, says that DP code is a tangled mess.
> DP contributor Bill Tozier has complained that the DP code base is a
> mess.  Juliet, who heads up the DP PTB (powers that be) agrees that
> the DP code is a mess. That's what you get when you have a shifting
> group of volunteer programmers adding bits here and there over the
> years and no Linus Torvalds to exercise iron control over the code.
>
> It would be nice if we had a foundation grant, say, to hire a team to
> completely rewrite the DP code base. It's hard to figure out what to
> do lacking such a grant. Pointing accusing fingers at DP from a
> distance and saying "Nyah nyah your code sucks" may make some people
> happy, but it doesn't improve matters on the ground. Nor does telling
> us what we *should do*, without any real familiarity with the DP code
> or DP operations and without any intention of actually doing any of
> the work.
>
> --
> Karen Lofstrom
>


From hyphen at hyphenologist.co.uk  Thu Aug  7 00:33:35 2008
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu, 7 Aug 2008 08:33:35 +0100
Subject: [gutvol-d] PG got a mention on BBC
In-Reply-To: <15cfa2a50808061918m193749dga4ed69e6d6cdccc8@mail.gmail.com>
References: 	<627d59b80808042111h29eed1c5v96a147a27d426107@mail.gmail.com>	<489A55C8.5080705@novomail.net>
	<15cfa2a50808061918m193749dga4ed69e6d6cdccc8@mail.gmail.com>
Message-ID: <000001c8f85f$e808ede0$b81ac9a0$@co.uk>

The flagship program, Today, on BBC (British Broadcasting Corporation) Radio
4 today.
Gave a mention to PG in a piece on e-book readers.

Dave F


From schultzk at uni-trier.de  Thu Aug  7 00:40:26 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 09:40:26 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
In-Reply-To: 
References: 
Message-ID: 


Am 05.08.2008 um 19:36 schrieb Bowerbird at aol.com:

> keith said:
> >   At some point human intervention is definately needed
> >   the question is when and how to do it.
>
> and i've answered that question, clearly and unequivocally...
>
> it should be done during preprocessing, before the text goes
> in front of any proofers.  and "how" it should be done is with
> a tool built for that purpose, working on a book-wide basis...
	Just for the sake of argument: the user interaction inside of
	preprocessing is PROOFING the preprocessing. So does it really matter
	who does the initial check of the preprocessing, or when? Agreed, it
	may be too much information for the PROOFER at the time of the
	"proofing cycle". I would discuss this in more detail, but unless
	things get set up differently this discussion would be of merely
	theoretical value.

>
>
>
> >   This can be done online as well.
>
> well, it _can_.  but it's not _efficient_ to do it that way.
> and the main reason it's not efficient is because you are
> constantly summoning up scans to evaluate the text, so
> if you have to wait for each of those scans to download,
> then you might as well be doing this as an _offline_ task.
> (you also re-save the text of individual pages frequently,
> but the bandwidth on those exchanges is manageable...)
	Agreed. Yet this model assumes only one preprocessor. Again, as stated
	above, discussing this further is only of philosophical value.


>
>
>
> >   No specific need to do it strictly offline.
>
> except if you value efficiency.  which is the key benefit here.
>
>
> >   I do understand the cavets of having it done offline,
> >   but inorder to do this the resources have to downloaded
> >   to your computer.
>
> you have to download the resources to do the job either way.
> (it's amazing how often we seem to forget this is the case...)
	I have not forgotten this fact. The fact remains that preprocessing an
	entire book will take time. Furthermore, it is most likely to be done
	by others as well, so that certain changes will already have been
	made, shall we say, overnight (for whatever reasons). These changes do
	not need to be done again. The question of the scanner images is an
	interesting problem.

>
>
> so it's far better to download all of the scans as one .zip file
> so they live with the proper names right on your hard-drive,
> rather than one at a time by the browser when "needed" and
> living in a not-easily-addressed place in the browser cache.
	I do not see it that way. It all depends on the implementation, and we
	cannot tell until we have an actual tool, whether browser-based or not.
	Also, your approach advocates several passes (at least as far
	as you have described it). Mine would use one.


>
> >   That is alot of megabytes that are not always necessary.
>
> eventually you will look at every single page, with certainty.
> so there are never any megabytes that are "unnecessary".
	Only if you assume one preprocessor does the whole book.
	I believe it is very well possible to do it otherwise.
>
>
> so the question is whether you download the scans in a way
> that you can reliably access them easily any time you want,
> or if you don't...
	I am glad we agree on this. Yet, as you have said: EASILY, [WHEN]
	YOU WANT.
>
>
>
> >   One can have book-wide edits while not transfering
> >   everthing at once. A on demand basis would be preferable.
>
> you're wrong.
>
> spectacularly wrong.
>
> if you download the zipped scan-set, you don't even have to
> sit there and wait for it, you can go outside and smoke a joint.
> when you download the one at a time, you wait for each one;
> even at 3-4 seconds each, it adds up to unnecessary minutes.
	I think you have misunderstood my process. The OCR is already done,
	and the "flagging" of proposed errors is already done. The only thing
	the preprocessing proofer needs to do is accept, discard, or alter
	the proposed change. Sorry for the confusion.
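
The accept/discard/alter step could be sketched as follows. This is only an
illustration of the idea in Python; the `Proposal` record, its fields and the
function `apply_decisions` are invented for the example, and the proposals are
assumed not to overlap.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    start: int       # offset of the suspect span in the text
    end: int         # offset one past the end of the span
    suggestion: str  # replacement proposed by the automated pass

def apply_decisions(text, decisions):
    """Apply (proposal, action, custom) triples to text.

    action is "accept", "discard" or "alter"; custom is the proofer's own
    replacement and is used only for "alter".
    """
    # Work from the end of the text so that earlier offsets stay valid.
    ordered = sorted(decisions, key=lambda d: d[0].start, reverse=True)
    for proposal, action, custom in ordered:
        if action == "accept":
            new = proposal.suggestion
        elif action == "alter":
            new = custom
        elif action == "discard":
            continue
        else:
            raise ValueError(f"unknown action: {action!r}")
        text = text[:proposal.start] + new + text[proposal.end:]
    return text
```

Because the proposals live outside the text, a discarded one leaves no trace
in the page, and nothing has to be removed by a later proofer.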
	
>
> >   As I have mentioned in other posts errors could be
> >   flag and given to the proofers for final acceptance.
>
> you'd think that that would be a workable alternative...
>
> but roger frank's efforts is this regard have bombed...
>
> it ends up that putting stuff _into_ the text (i.e., flags)
> that you expect proofers to then remove from the text
> doesn't seem to be quite that wise in retrospect...
	Are you not basically doing the same thing? All a flag means is to
	make the preprocessing proofer (editor) aware that here is a
	problem. So what is the REAL difference in the end result?
	Are you suggesting that spell checkers and grammar checkers
	do not work?

	regards
		Keith.


From schultzk at uni-trier.de  Thu Aug  7 00:50:00 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 09:50:00 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 13
In-Reply-To: 
References: 
Message-ID: 

I am wondering why you are changing the context!!
I will not bite.

It is just like talking about how to cook a rump steak and then
say I cook my rump roast differently.

Sorry ! ;-))

regards
	Keith.

Am 05.08.2008 um 20:25 schrieb Bowerbird at aol.com:

> keith said:
> >   >   I do understand the cavets of having it done offline,
> >   >   but inorder to do this the resources have to downloaded
> >   >   to your computer.
>
> i said:
> >   you have to download the resources to do the job either way.
> >   (it's amazing how often we seem to forget this is the case...)
>
> oh yeah, i should have mentioned here that i believe this task of
> preprocessing could (should?) be done by the content provider.
>
> and of course, they've already _got_ the scans on their computer,
> and the text too, so they don't have to download anything at all...


From schultzk at uni-trier.de  Thu Aug  7 01:03:34 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 10:03:34 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 16
In-Reply-To: <1e8e65080808051258p573cafb8sb6058cda7c26d8a4@mail.gmail.com>
References: 
	<627d59b80808051216s21c3ee2fode23d16d090f51d3@mail.gmail.com>
	<1e8e65080808051258p573cafb8sb6058cda7c26d8a4@mail.gmail.com>
Message-ID: <85149B17-35FD-4107-9F0C-9DBF78C6B126@uni-trier.de>

Hi Karen,

	I will truly think about it.

Am 05.08.2008 um 21:58 schrieb Karen Lofstrom:

> On Tue, Aug 5, 2008 at 9:16 AM, don kretz  wrote:
>
>> I also happen to believe (based on personal contact in addition to  
>> postings here and on the DP site) that the number of people able  
>> and willing to "actually do any of the work" is larger than  
>> generally recognized.
>
> Hmmm. Perhaps someone would be willing to lead a project to rewrite
> the DP code base? Rebuild it from scratch? An entirely separate
> project from the existing "introduce new features one by one" and
> "patch the spit and bailing wire" efforts.
	No, patching will not do. Transitions are needed. That is, e.g.,
	get the preprocessing up and working, insert it into the workflow,
	then go on to the next task.

	There will have to be a lot of cooperation, so that both projects
	are in sync and the transition goes on without disrupting the work
	at DP. Also, I would hate to disgruntle those developing at DP.
	Furthermore, parts of their code will be needed. It will possibly
	save a lot of time.
>
> Of course, by saying this, I am doing exactly what I criticize (alas!)
> since I can't code. My one and only C, in my entire college career,
> was in second-semester introductory programming. So I'll mention
> "possible DP code refactoring"  once and then shut up.
	Karen, no need to shut up; what you have said makes sense!
	I find it good to have someone around who is open-minded,
	even if there is a difference in opinions or details.
	I at least am not perfect.

	regards
		Keith.


From schultzk at uni-trier.de  Thu Aug  7 01:19:30 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 10:19:30 +0200
Subject: [gutvol-d] limits on machine-only preprocessing
In-Reply-To: <4898FB30.6030806@pobox.com>
References: <4898FB30.6030806@pobox.com>
Message-ID: <981D3B6A-113E-49C0-879F-607057D6A2CB@uni-trier.de>

Hi Roger,

	I hate to mention it, but Java would give you what you
	want. Yeah, it will be tricky getting everything together,
	calling the different processes to get this and that done and
	keeping things in sync. Yet, workable.

	Another route would be using tcl/gtk.

	regards
		Keith.


	
On 06.08.2008 at 03:15, Roger Frank wrote:

>
> That said, I would like to devote some bandwidth to developing an
> open-source tool that would easily present each questionable spot to
> the pre-proofer for modification. How to do that? Online? That would
> make the users maintain their connection; it would probably need to be
> written in PHP (unless someone is willing to work with me on a DP site
> clone based on Ruby on Rails). But it wouldn't require the user to
> install new software. It could be triggered right after the upload to
> DPscans and before going to P1 waiting.
>
> If it's an offline tool, there is more latitude. Guiguts has an
> embedded editor written in Perl. I suppose I could go that way (about a
> third of my DP support tools at fadedpage.com are written in Perl.) But
> I don't like to write in Perl; I like to write in Ruby. My question for
> the list is this: is there a framework of an editor that could be
> leveraged into something like Bowerbird's suggested tool--image on the
> left, text on the right, heat maps, whatever. Any language that we
> could ask a user to install (right now they almost all have Perl and
> many have Ruby). I'm open to anything. Something in FXRuby would make
> my day. I know it could be done in PHP because that's what the proofing
> interface looks like and is based on.
>
> Bottom line: my fully-automated cpprep tool fixes what it is sure of,
> flags what it thinks is wrong, and leaves the rest to the P1's who do a
> good job and seem to be comfortable with what they are given. Better
> than that would be a way to get a human involved but it's not with any
> tool I know of now. As I see it, the tool would be useful to content
> providers at DP and to those that roll their own. I just wish that the
> "as I see it" part were a clearer vision of what it would be.
>
> I'm open to suggestions from anyone. Thanks.
>
> --Roger


From Morasch at aol.com  Thu Aug  7 02:32:29 2008
From: Morasch at aol.com (Morasch at aol.com)
Date: Thu, 7 Aug 2008 05:32:29 EDT
Subject: [gutvol-d] Bowerbir OCR steps
Message-ID: 

vlsimpson said:
>    I've gathered up BB's OCR steps 

ok, do yourself a favor, and don't attach my name to these...

first, there's no need, because you'd already collected them all.

second, since y'all have reinforced the dogs of d.p. to snap at the
slightest mention of me, they won't use a tool named after me.
and you want them to use this tool, so pick another name...

and third, because these rules are just a pass at 1/3 of the tool,
and it isn't even the most important third, and although i _am_
willing to guide the design of the tool, it must be programmed
by people other than me, so name the tool after _those_ people.

***

dave said:
>    Thus require a human to check if they are OCR mistakes or intended.

dave has either been sleeping in the back of the classroom, or
has not been reading along lately, as he obviously doesn't know
that this matter has been decided -- once and for all -- decidedly.

yes, dave, you're absolutely right, a human being is required.   thank you.

-bowerbird

p.s.   use my name in your subject any time.   don't even have to spell it right...




From marcello at perathoner.de  Thu Aug  7 02:43:46 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu, 07 Aug 2008 11:43:46 +0200
Subject: [gutvol-d] woman in her own right -- 008 (and final)
In-Reply-To: 
References: 	
	
Message-ID: <489AC3D2.2080406@perathoner.de>

Michael Hart wrote:

> Since there is not substantive content to this purported
> "civil response" it still falls into the FLAME category.

Wrong.

"Flaming is the hostile and insulting interaction between Internet 
users."  ---- http://en.wikipedia.org/wiki/Flaming_(internet)


> This puts you on the list for being banned if and when a
> situation arises if/when this list supports censorship.

Stop being ridiculous.


> Not that I have any particular interest in defending all
> the posts by bowerbird, but I feel I should point out to
> the concerned parties that he has been set up to quite a
> significant degree by "tag team flamers" who alternate a
> series of message carefully gauged to antagonize him

Stop being paranoid.

He has been antagonized by many people because many people (who actually 
*do* some work for PG) disagree with Bowerbird's ideas. If Bowerbird 
actually *did* some work for PG he would soon find out his ideas don't 
work in real life. Sadly, doing some useful work is not in BB's style.




-- 
Marcello Perathoner
webmaster at gutenberg.org


From marcello at perathoner.de  Thu Aug  7 02:55:46 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu, 07 Aug 2008 11:55:46 +0200
Subject: [gutvol-d] !@! Re:  woman in her own right -- 008 (and final)
In-Reply-To: 
References: 			<4891EF2C.7020309@xs4all.nl>	
	
Message-ID: <489AC6A2.5090507@perathoner.de>

Michael Hart wrote:

> However, flames are flames, and should be pointed out by an
> internet listserv moderator, which I have done.

You are *not* a moderator for this list.

So your pointing out messages as flames is just flaming.


> You are ALL welcome to start your own listservers at expense
> to be 100% defrayed by Project Gutenberg.

DP has done this long ago. No need to do that twice.

Created own lists. Created own structures. Incorporated. People who 
really want to work for ebooks go to DP anyway.

Not the point.

The point is that many new people subscribe to this list and immediately 
get wrong ideas about the working climate at PG.

By letting Bowerbird inflame this list, we lose a lot of potential 
volunteers.


> So, once again, I simply point out that if you don't want a
> contact with bowerbird. . .which you all SAY. . .all you do
> is start your own listserver and don't let him in, or use a
> heavy hand on "moderation" if you do let him in.

Are you sending hundreds of people away because one person cannot 
behave? Stop being ridiculous.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From marcello at perathoner.de  Thu Aug  7 03:07:40 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu, 07 Aug 2008 12:07:40 +0200
Subject: [gutvol-d] bastien
In-Reply-To: 
References:  
	
Message-ID: <489AC96C.4030807@perathoner.de>

Michael Hart wrote:

> "Where they censor people,
> they eventually censor books."


You got your quote wrong ... again. Still don't use Google?


In his 1821 play, Almansor, the German writer Heinrich Heine -- referring 
to the burning of the Muslim holy book, the Qur'an, during the Spanish 
Inquisition -- famously wrote:

     "Where they burn books, so too will they in the end burn human 
beings." ("Dort, wo man Bücher verbrennt, verbrennt man auch am Ende 
Menschen.")

---- http://en.wikipedia.org/wiki/Book_burning#Historical_background



-- 
Marcello Perathoner
webmaster at gutenberg.org


From Bowerbird at aol.com  Thu Aug  7 03:16:50 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 7 Aug 2008 06:16:50 EDT
Subject: [gutvol-d] twisted (bravo)
Message-ID: 

dakretz said:
>    I've compiled a variant of the Twister tool with an editor window added.

congratulations.   you've taken a huge stride that will forever change your vision.


>    browse to a directory containing image and .txt files with matching names.

close, but not quite...   those .txt files should be combined into a single file.
remember, we're working on a book-wide basis; that's our unit of analysis.
on input, your tool splits the text up into pages for their individual display...

other than that, there's some online/offline/bothline concerns at play here.

but more on that in a much more extensive post to come.

for now, just solidify the text-file.   after that, bravo!

-bowerbird

p.s.   i'll let you know what i think of the actual app when i test it out tomorrow.
but intellectually, what you've done is jumped the snake river on a motorcycle.

p.p.s.   fastest return on your programming investment next is a "find" routine.
search the text of the current page through subsequent pages until the last,
and then cycle back from the first to the current.   it's very powerful navigation,
and it's an easy "tell me if this string is contained in that string" thing to code...
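
The wrap-around "find" described above can be sketched in a few lines. This is only a minimal illustration, assuming the pages are already held in memory as a list of strings; the name find_next and the sample pages are hypothetical and not part of Twister or any other existing tool.

```python
def find_next(pages, current, needle):
    """Return the index of the next page containing needle, or None.

    Searches the current page first, then each later page through the
    last, then wraps around from the first page back toward the current.
    """
    n = len(pages)
    for step in range(n):
        idx = (current + step) % n      # wrap past the last page to the first
        if needle in pages[idx]:        # plain "is this string in that string" test
            return idx
    return None


# Hypothetical sample data: four very short "pages".
pages = ["chapter one", "the cat sat", "chapter two", "the end"]
print(find_next(pages, 2, "cat"))       # has to wrap around to find it
print(find_next(pages, 1, "chapter"))   # found on a later page
```

A real editor would presumably start from the cursor position rather than the top of the current page, and would want case-insensitive matching as an option; both are left out here to keep the sketch short.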




From schultzk at uni-trier.de  Thu Aug  7 03:18:12 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 12:18:12 +0200
Subject: [gutvol-d] woman in her own right -- 008 (and final)
In-Reply-To: <489AC3D2.2080406@perathoner.de>
References: 	
	
	<489AC3D2.2080406@perathoner.de>
Message-ID: <597F8C23-FB8A-46E6-815C-C0DB52AAB4DD@uni-trier.de>

Hi Marcello,

On 07.08.2008 at 11:43, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> Since there is not substantive content to this purported
>> "civil response" it still falls into the FLAME category.
>
> Wrong.
>
> "Flaming is the hostile and insulting interaction between Internet  
> users."  ---- http://en.wikipedia.org/wiki/Flaming_(internet)
	I could not resist reading this. BTW, you forgot the last ")" in your link.
	Wiki is nice, but not authoritative. If you had carefully read its entire
	entry, you would have to admit that most of the posts here on this list
	and others would be defined per se as flames. Even this one would be
	considered a flame. Actually, it is meant as a tease :-))

>
>
>> This puts you on the list for being banned if and when a
>> situation arises if/when this list supports censorship.
>
> Stop being ridiculous.
		Oops, a flame!
>
>
>> Not that I have any particular interest in defending all
>> the posts by bowerbird, but I feel I should point out to
>> the concerned parties that he has been set up to quite a
>> significant degree by "tag team flamers" who alternate a
>> series of message carefully gauged to antagonize him
>
> Stop being paranoid.
	Another flame!
>
> He has been antagonized by many people because many people (who  
> actually *do* some work for PG) disagree with Bowerbirds ideas. If  
> Bowerbird actually *did* some work for PG he would soon find out  
> his ideas don't work in real life. Sadly, doing some useful work is  
> not in BB's style.
	And here again.

	As proof, from the aforementioned definition:
		Similarly, a normal, non-flame message may have elements of a
		flame -- it may be hostile, for example -- but it is not a flame
		if its author seriously intends to advance the discussion.

	At least you do not seem to advance this discussion.

	As an added fun point from wiki:
		Recently, several online forums have actively encouraged flaming
		amongst fellow posters.

  ;-))

	regards
		Keith.


From marcello at perathoner.de  Thu Aug  7 03:58:24 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu, 07 Aug 2008 12:58:24 +0200
Subject: [gutvol-d] Please change the subject
In-Reply-To: <4897A453.4090900@novomail.net>
References: <20080803070319.GA11777@ark.in-berlin.de>
	<4897A453.4090900@novomail.net>
Message-ID: <489AD550.7010907@perathoner.de>

Lee Passey wrote:

> It seems to me that the /source/ of the errors is unimportant as
> compared to the /quantity/ of errors. Of course, even more important is
> the /quality/ of the errors; if the original errors are blatant, but the
> introduced errors are subtle, it may make more sense to simply let the
> the humans deal with the errors as they are more likely to catch the
> blatant errors than the subtle ones.
> 
> So what do others think about this? (How's that for changing the subject?)


On one hand preprocessing the text would save proofers keystrokes.

On the other hand preprocessing would reduce proofers' attention and reward.


(It's funny how Bowerbird can propose to preprocess text to reduce 
errors and at the same time propose to reintroduce deliberate errors to 
keep proofers' attention up.)

Bowerbird likes to rant about the amount of wasted proofers' time. But he 
failed to demonstrate that a proofer takes significantly longer to do a 
dirty page than to do a clean one. He never took the trouble to do a few 
pages with a stopwatch in his hand. (He never posted his results, so it's 
safe to conclude he never did it.)


I think the difference is negligible. The typing time spent on a dirty 
page is very small compared to the eyeballing time.


Sure, the only way to know for certain is to set up a test environment 
and do a book both ways counting remaining errors and used time. Then we 
can decide which way is better.




-- 
Marcello Perathoner
webmaster at gutenberg.org


From walter.van.holst at xs4all.nl  Thu Aug  7 04:04:58 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Thu, 07 Aug 2008 13:04:58 +0200
Subject: [gutvol-d] Please change the subject
In-Reply-To: <489AD550.7010907@perathoner.de>
References: <20080803070319.GA11777@ark.in-berlin.de>	<4897A453.4090900@novomail.net>
	<489AD550.7010907@perathoner.de>
Message-ID: <489AD6DA.30408@xs4all.nl>

Marcello Perathoner wrote:

> 
> On one hand preprocessing the text would save proofers keystrokes.
> 
> On the other hand preprocessing would reduce proofers attention and reward.

What about preprocessing that introduces suggestions that are to be 
checked by proofers? A heatmap would be a nice UI for this.

Regards,

  Walter

From marcello at perathoner.de  Thu Aug  7 04:21:05 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu, 07 Aug 2008 13:21:05 +0200
Subject: [gutvol-d] Error detection and maybe correction
In-Reply-To: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>
References: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>
Message-ID: <489ADAA1.7000101@perathoner.de>

don kretz wrote:

> The topic of data, rules, and user interfaces is a hot one in software these
> days (and probably most days). And there's a generally accepted design
> strategy that addresses it directly, and goes by the acronym MVC (for Model,
> View, Controller). It's not the only model available, but it's been around a
> long time (going at least back to the Xerox Parc Alto project, whence Apple
> Mac, mice, networked apps, Windows, and a bunch of other current design
> metaphors,) and it's currently the most popular one, I believe.
> 
> ...
> 
> But to make this kind of work generally useful, the DP site code, and the
> people who control it, need to understand these issues, and be willing
> (maybe eager) to collaborate with other developers.


The code at DP (and at PG) has evolved over time. When DP started nobody 
knew exactly where it was headed. Nobody knew how big the codebase 
was going to be. Nobody knew how many hands were at its disposal.

Of course, code refactoring is always an option, and when you refactor 
you can incorporate all the buzzwords of the day (OO, patterns, extreme, 
fuzzy, Java, MVC, visitors, ...).

But you also have to consider the cost of refactoring. The cost at DP is 
to rewrite everything from scratch. With only half a dozen hands 
experienced enough to do this: not an option.


At the present state, incorporating changes is hard. (Nobody is the 
culprit.) Developers are not unwilling or disobliging. They just don't 
have time to sort things out. (And it surely didn't help that BB asked 
for these changes.)


In summary: the code at DP is working well. Which is more than can be 
said for many buzzword-driven projects and Vista.



-- 
Marcello Perathoner
webmaster at gutenberg.org


From schultzk at uni-trier.de  Thu Aug  7 04:28:00 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 13:28:00 +0200
Subject: [gutvol-d] bastien
In-Reply-To: <489AC96C.4030807@perathoner.de>
References:  
	
	<489AC96C.4030807@perathoner.de>
Message-ID: 

Hi Marcello,

	Michael's "quote", as you put it, is actually quite correct. Believe me, I
	am German and American, and my studies are/were in English and linguistics.

On 07.08.2008 at 12:07, Marcello Perathoner wrote:

> Michael Hart wrote:
>
>> "Where they censor people,
>> they eventually censor books."
>
>
> You got your quote wrong ... again. Still don't use Google?
>
>
> In his 1821 play, Almansor, the German writer Heinrich Heine --
> referring to the burning of the Muslim holy book, the Qur'an,
> during the Spanish Inquisition -- famously wrote:
>
>     "Where they burn books, so too will they in the end burn human
> beings." ("Dort, wo man Bücher verbrennt, verbrennt man auch am
> Ende Menschen.")
	For a more precise translation:
		"There, where they burn books, they burn also in the end humans"
		Here the "auch" can be attached either to the end or, as you did
		it, to the verb.

	The translation you have given would be correct if Heine had written:
		"Dort wo man Bücher verbrennt, verbrennt man, auch, am Ende Menschen."
	Of course I am supposing modern German punctuation. I assume, though,
	that this was probably true in Heine's time.

	I was not sure if the German quote was correct, so I googled it and
	came up with this one:
	"Dort wo man Bücher verbrennt, verbrennt man am Ende auch Menschen."

	source: http://www.zitate-online.de/literaturzitate/allgemein/19065/dort-wo-man-buecher-verbrennt-verbrennt-man.html

	Which translates as
	  "There, where they burn books, they burn in the end, also humans"

	If need be, I can check the book out of the university library and
	determine which quote is correct. Yet, in the light of the German quote
	I found, I am sure that Michael's quote is correct.

	Furthermore, the injection of "so too will" (more modern: "so too
	they will"; it is not a question) can be dropped,
	giving you Michael's quote.

	Either way you turn it, Michael is correct.

	regards
		Keith.


From schultzk at uni-trier.de  Thu Aug  7 04:35:36 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 13:35:36 +0200
Subject: [gutvol-d] twisted (bravo)
In-Reply-To: 
References: 
Message-ID: <91309CE8-7372-48C5-9ECD-D7C0938BC951@uni-trier.de>


On 07.08.2008 at 12:16, Bowerbird at aol.com wrote:

> dakretz said:
> >   I've compiled a variant of the Twister tool with an editor  
> window added.
>
> congratulations.  you've taken a huge stride that will forever  
> change your vision.
>
>
> >   browse to a directory containing image and .txt files with  
> matching names.
>
> close, but not quite...  those .txt files should be combined into a  
> single file.
> remember, we're working on a book-wide basis; that's our unit of  
> analysis.
> on input, your tool splits the text up into pages for their  
> individual display...
	I see no reason why it cannot be broken up into individual files
	(pages). It is only up to the analysis software to notice when it
	needs the other page and act accordingly.

	Agreed, a single file makes for simpler programming, but for the human
	interactor it makes no difference. There is more than one way to skin a cat!
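
	One way to have both, sketched here under assumptions only: keep the
	pages as separate texts, but hand the analysis a single book-wide
	string plus a table of where each page starts. The names join_pages
	and page_of and the two sample pages are hypothetical, not taken
	from any existing tool.

```python
import bisect


def join_pages(pages):
    """Concatenate page texts into one book string; also return each page's start offset."""
    starts, pos = [], 0
    for text in pages:
        starts.append(pos)
        pos += len(text)
    return "".join(pages), starts


def page_of(starts, offset):
    """Map a character offset in the book string back to the index of its page."""
    return bisect.bisect_right(starts, offset) - 1


# Hypothetical sample: a word hyphenated across a page break.
pages = ["the quick brown fox jum-\n", "ped over the lazy dog\n"]
book, starts = join_pages(pages)
hit = book.find("jum-\nped")            # only visible in the book-wide text
print(hit, page_of(starts, hit))        # offset of the hit and the page it starts on
```

	The analysis sees the whole book, while the person still works page
	by page; which layout is stored on disk then becomes a detail.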

	regards
		Keith.


From schultzk at uni-trier.de  Thu Aug  7 04:53:23 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 13:53:23 +0200
Subject: [gutvol-d] Error detection and maybe correction
In-Reply-To: <489ADAA1.7000101@perathoner.de>
References: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>
	<489ADAA1.7000101@perathoner.de>
Message-ID: 

Hi Marcello,

	Well said!

	The problem is sorting things out. Because things are not sorted out,
	things will get worse from a programming standpoint.

	If the code base is engineered properly from scratch, writing the
	code will not be that hard. Things can be engineered by the few with
	the know-how, and the others write the code. All that is needed is that the
	programmers adhere to the specification of the program or routine they
	are working on, and all the pieces will fit into place. Furthermore,
	the individual programmers can make their own test suites for their
	own parts if they wish to, without interfering with the rest of the
	project.

	I am sure you have worked on large projects and know how this works.

	regards
		Keith.

On 07.08.2008 at 13:21, Marcello Perathoner wrote:

> don kretz wrote:
>
>> The topic of data, rules, and user interfaces is a hot one in  
>> software these
>> days (and probably most days). And there's a generally accepted  
>> design
>> strategy that addresses it directly, and goes by the acronym MVC  
>> (for Model,
>> View, Controller). It's not the only model available, but it's  
>> been around a
>> long time (going at least back to the Xerox Parc Alto project,  
>> whence Apple
>> Mac, mice, networked apps, Windows, and a bunch of other current  
>> design
>> metaphors,) and it's currently the most popular one, I believe.
>> ...
>> But to make this kind of work generally useful, the DP site code,  
>> and the
>> people who control it, need to understand these issues, and be  
>> willing
>> (maybe eager) to collaborate with other developers.
>
>
> The code at DP (and at PG) has evolved over time. When DP started  
> nobody knew exactly where it was headed to. Nobody knew how big the  
> codebase was going to be. Nobody knew how many hands where at  
> disposal.
>
> Of course, code refactoring is always an option, and when you  
> refactor you can incorporate all buzzwords of the days (OO,  
> patterns, extreme, fuzzy, Java, MVC, visitors, ...).
>
> But you also have to consider the cost of refactoring. The cost at  
> DP is to rewrite everything from scratch. With only half a dozen  
> hands experienced enough to do this: not an option.
>
>
> At the present state, incorporating changes is hard. (Nobody is the
> culprit.) Developers are not unwilling or disobliging. They just
> don't have time to sort things out. (And it surely didn't help BB
> asked for these changes.)
>
>
> In summary: the code at DP is working well. Which is more than can  
> be said for many buzzword-driven projects and Vista.
>
>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org
>


From joshua at hutchinson.net  Thu Aug  7 04:58:46 2008
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu, 7 Aug 2008 11:58:46 +0000 (GMT)
Subject: [gutvol-d] bastien
Message-ID: <1872592925.262851218110326603.JavaMail.mail@webmail08>


Keith, I'm confused.  You say Michael is correct, but the translation you give is very close to Marcello's quote, not Michael's.

To clarify, Michael wrote:

>> "Where they censor people,
>> they eventually censor books."

Marcello wrote:

"There, where they burn books, they burn also in the end humans"

You said a better translation is:

"There, where they burn books, they burn in the end, also humans"

Either way, Michael's quote mentions censoring people and books ... you and Marcello mention burning books and people.

Josh



From marcello at perathoner.de  Thu Aug  7 05:18:46 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu, 07 Aug 2008 14:18:46 +0200
Subject: [gutvol-d] bastien
In-Reply-To: 
References: 
			<489AC96C.4030807@perathoner.de>
	
Message-ID: <489AE826.70201@perathoner.de>

Schultz Keith J. wrote:

>     Either way you turn it Michael is correct.

   Almansor:
   Wir hörten daß der furchtbare Ximenes,
   Inmitten auf dem Markte, zu Granada -
   Mir starrt die Zung im Munde - den Koran
   In eines Scheiterhaufens Flamme warf!

   Hassan:
   Das war ein Vorspiel nur, dort wo man Bücher
   Verbrennt, verbrennt man auch am Ende Menschen.


Almansor:

We heard that the terrible Ximenes,
in the midst of the marketplace of Granada --
my tongue stiffens in my mouth -- threw
the Koran into a pile of flames.

Hassan:

That was only a prelude: where they burn books,
they end with burning people too.


Heine was referring to the burning of witches by the Holy Inquisition, 
which took place after the Spanish reconquista.



 >> Michael Hart wrote:
 >>
 >>> "Where they censor people,
 >>> they eventually censor books."


Seems to *me* that Michael got it the wrong end first, besides swapping 
"burning" with "censoring", and making a mess of one of the greatest 
German poets.





-- 
Marcello Perathoner
webmaster at gutenberg.org


From schultzk at uni-trier.de  Thu Aug  7 05:18:46 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 14:18:46 +0200
Subject: [gutvol-d] bastien
In-Reply-To: <1872592925.262851218110326603.JavaMail.mail@webmail08>
References: <1872592925.262851218110326603.JavaMail.mail@webmail08>
Message-ID: <536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>

Hi Josh,

	To explain: Michael is using the words of Heine as a metaphor.
	This metaphor is considered a quote.

	Michael says "censor" where Heine uses "burn"; he also reverses the
	positions of humans and books.

	I assume Marcello did not like what Michael is getting at. So he
	simply stated that Michael misquoted, which he did not.

	Of course, Michael did not do an exact quote, but a metaphorical one.
	
	The implied meaning of Michael's quote I leave to the inclined reader,
	or to Michael if he is inclined to explain it.

	So much flaming about subtleties!

	regards
		keith.

On 07.08.2008 at 13:58, Joshua Hutchinson wrote:

>
> Keith, I'm confused.  You say Michael is correct, but the  
> translation you give is very close to Marcello's quote, not Michael's.
>
> To clarify, Michael wrote:
>
>>> "Where they censor people,
>>> they eventually censor books."
>
> Marcello wrote:
>
> "There, where they burn books, they burn also in the end humans"
>
> You said a better translation is:
>
> "There, where they burn books, they burn in the end, also humans"
>
> Either way, Michael's quote mentions censoring people and books ...  
> you and Marcello mention burning books and people.
>
> Josh
>
>


From schultzk at uni-trier.de  Thu Aug  7 05:25:34 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu, 7 Aug 2008 14:25:34 +0200
Subject: [gutvol-d] bastien
In-Reply-To: <489AE826.70201@perathoner.de>
References: 
			<489AC96C.4030807@perathoner.de>
	
	<489AE826.70201@perathoner.de>
Message-ID: <62F07C70-0EF1-4059-A951-0A7B7C1FCAED@uni-trier.de>

Hi Marcello,


	Much better. "they end" or "in the end" fits "eventually" quite well.
	
	Thanks for shedding light on the metaphor, though I did not need
	it.

	regards
		Keith

On 07.08.2008 at 14:18, Marcello Perathoner wrote:

> Schultz Keith J. wrote:
>
>>     Either way you turn it Michael is correct.
>
>   Almansor:
>   Wir hörten daß der furchtbare Ximenes,
>   Inmitten auf dem Markte, zu Granada -
>   Mir starrt die Zung im Munde - den Koran
>   In eines Scheiterhaufens Flamme warf!
>
>   Hassan:
>   Das war ein Vorspiel nur, dort wo man Bücher
>   Verbrennt, verbrennt man auch am Ende Menschen.
>
>
> Almansor:
>
> We heard that the terrible Ximenes,
> in the midst of the marketplace of Granada --
> my tongue stiffs in my mouth -- threw
> the Koran into a pile of flames.
>
> Hassan:
>
> That was only a prelude: where they burn books,
> they end with burning people too.
>
>
> Heine was referring to the burning of witches by the Holy  
> Inquisition, which took place after the Spanish reconquista.
>
>
>
> >> Michael Hart wrote:
> >>
> >>> "Where they censor people,
> >>> they eventually censor books."
>
>
> Seems to *me* that Michael got it the wrong end first, besides  
> swapping "burning" with "censoring", and making a mess of one of  
> the greatest German poets.
>
>
>
>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org
>


From marcello at perathoner.de  Thu Aug  7 05:32:58 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu, 07 Aug 2008 14:32:58 +0200
Subject: [gutvol-d] bastien
In-Reply-To: <536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>
References: <1872592925.262851218110326603.JavaMail.mail@webmail08>
	<536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>
Message-ID: <489AEB7A.4080600@perathoner.de>

Schultz Keith J. wrote:

>     Of course, Michael did not do an exact quote, but a metaphorical one.

Ah! He made it up himself. That may explain things.

But then he should not use quotation marks, which always imply somebody is 
actually being quoted, and the Heine quote is the only one with books 
and people in existence.

(If you find more of that kind, please tell me. (And I already know 
"Fahrenheit 451".))


-- 
Marcello Perathoner
webmaster at gutenberg.org


From lee at novomail.net  Thu Aug  7 06:44:28 2008
From: lee at novomail.net (Lee Passey)
Date: Thu, 07 Aug 2008 07:44:28 -0600
Subject: [gutvol-d] Bowerbir OCR steps
In-Reply-To: <000001c8f857$08410060$18c30120$@co.uk>
References: 
	<000001c8f857$08410060$18c30120$@co.uk>
Message-ID: <489AFC3C.7030006@novomail.net>

Dave Fawthrop wrote:
> 
> V. L. Simpson Wrote
> 
>> I've gathered up BB's OCR steps into one single file and attached
>> some regular expressions for the easy parts.
> 
>> http://vls.freeshell.org/dpfiles/textchecks.txt
> 
>> I'm no regexp expert so if anyone else has improvements let me know.
> 
>> Vance
> 
> Assuming that they work as intended, almost all of these can occur
> occasionally in real text, notably in prose mixed with poetry.
> They thus require a human to check whether they are OCR mistakes or intended.
> 
> As an example:
> 16. Double-line-end followed by lowercase
> Common in poetry
> 
> A few do not work on non-English text.

Yes, but this is a straw-man argument, because so far no one has 
suggested that humans be removed from the process. The suggestion is to 
use automated processes to reduce the number of errors /before/ it is 
reviewed by humans, and not to do it /after/ human review where it may 
produce unintended consequences.
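
As a rough illustration of flagging rather than fixing, here is a minimal
sketch in Python of check 16 quoted above (a double line-end followed by a
lowercase letter). The pattern and the function name are assumptions made up
for this example, not taken from Vance's textchecks.txt. The check only
reports line numbers and leaves the decision to a human, since poetry can
trigger it legitimately.

```python
import re

# Hypothetical pattern: a blank line followed by a line starting in lowercase.
BREAK_THEN_LOWERCASE = re.compile(r"\n[ \t]*\n(?=[a-z])")

def flag_suspect_breaks(text):
    """Return 1-based line numbers of lines that start lowercase after a blank line."""
    flagged = []
    for match in BREAK_THEN_LOWERCASE.finditer(text):
        # match.end() sits on the first character of the suspect line.
        flagged.append(text.count("\n", 0, match.end()) + 1)
    return flagged

sample = "The cat sat on the\n\nmat and purred.\n\nA new paragraph.\n"
print(flag_suspect_breaks(sample))
```

Nothing is changed in the text itself; the list of line numbers is what would
be handed to the reviewer.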

If a human proof-reader can catch 90% of all errors, it seems to me you 
would want him to catch 90% of 10 errors, rather than 90% of 100 errors, 
even if they are two different sets of errors.
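
To put numbers on that, a tiny sketch in Python. The 90% catch rate and the
counts of 10 and 100 are just the illustrative figures from the paragraph
above, and the function name is hypothetical.

```python
def errors_left(errors_before, catch_percent=90):
    """Errors still in the text after one proofing pass at the given catch rate."""
    return errors_before * (100 - catch_percent) / 100

for errors_before in (100, 10):
    print(errors_before, "errors in ->", errors_left(errors_before), "left")
```

With the same proofer, the text that went in with 100 errors comes out with
about ten, and the one that went in with 10 comes out with about one.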

From lee at novomail.net  Thu Aug  7 06:57:54 2008
From: lee at novomail.net (Lee Passey)
Date: Thu, 07 Aug 2008 07:57:54 -0600
Subject: [gutvol-d] Please change the subject
In-Reply-To: <489AD550.7010907@perathoner.de>
References: <20080803070319.GA11777@ark.in-berlin.de>	<4897A453.4090900@novomail.net>
	<489AD550.7010907@perathoner.de>
Message-ID: <489AFF62.6050908@novomail.net>

Marcello Perathoner wrote:

> On one hand preprocessing the text would save proofers keystrokes.
> 
> On the other hand preprocessing would reduce proofers attention and reward.

The notion that proofers should be presented with texts containing a 
significant number of errors, perhaps artificially seeded if there are 
not enough, seems to me the height of perversity. I simply cannot 
imagine that any proofer would feel bored or unfulfilled if s/he were
unable to find an error in a text. The notion boggles my mind. If there 
are such individuals (which I doubt) they're probably not the kind of 
detail-oriented people you would want proofing texts.

> (It's funny how Bowerbird can propose to preprocess text to reduce 
> errors and at the same time propose to reintroduce deliberate errors to 
> keep proofers attention up.)

It's funny how people can suggest that bb has proposed deliberately 
introducing errors, when he has never done so and has vehemently opposed 
the suggestion whenever it has been made. The only people who have 
suggested that errors should be deliberately introduced are those who 
feel threatened by the notion that proofers should start with texts 
which are as clean as possible, even if it requires a modicum more of 
effort before publication.

From vze3rknp at verizon.net  Thu Aug  7 07:04:57 2008
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Thu, 07 Aug 2008 10:04:57 -0400
Subject: [gutvol-d] DP code base (was Error detection and maybe
	correction)
In-Reply-To: 
References: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>
	<489ADAA1.7000101@perathoner.de>
	
Message-ID: <489B0109.3090400@verizon.net>

The developers who are currently working with the DP code are well aware 
of the problems with it. They've been gradually working at refactoring 
it as they touch various pieces of it for other things. My understanding 
is that certain areas are now significantly better than they were, while 
others, most notably the user interface, are still in horrible shape.

One constraint that hasn't been mentioned is that we don't know exactly 
where we will end up, in terms of processes. It's one thing to structure 
code when you have, or can obtain, exact specifications as to what it is 
to do. But we don't have those, so our developers have been trying to 
build in as much flexibility as possible. The original code base made 
lots of assumptions about how things should/would work which have proven 
to be insufficient as we've gained experience. Some of those assumptions 
are still built in to the data structures etc. But on the whole, the 
efforts of our developers have been to make things as flexible as 
possible. Starting from scratch would be better, of course, but we are 
trying to head in the right direction.

Another matter that is often overlooked in discussion of workflow 
changes at DP is that changing the code is only the very beginning of 
the effort. We have lots of documentation that has to be updated as 
well, which is no trivial task. We have almost as few people who are 
really willing to work on documentation as we have developers who are 
familiar with the code. And then, of course, there's our large 
population of volunteers who would also need to be educated about the 
changes. All of this is not to say that fundamental change is 
impossible. We've shown that we can make large changes. It's just 
something that we approach with great care.

JulietS

Schultz Keith J. wrote:
> Hi Marcello,
>
>     Well said!
>
>     The problem is sorting things out. Because things are not sorted out,
>     things will get worse from a programming standpoint.
>
>     If the code base is engineered properly from scratch, writing the
>     code will not be that hard. Things can be engineered by the few with
>     the know-how, and the others write the code. All that is needed is
>     that the programmers adhere to the specification of the program or
>     routine they are working on, and all the pieces will fit into place.
>     Furthermore, the individual programmers can make their own test
>     suites for their own parts if they wish, without interfering with
>     the rest of the project.
>
>     I am sure you have worked on large projects and know how this works.
>
>     regards
>         Keith.
>
> Am 07.08.2008 um 13:21 schrieb Marcello Perathoner:
>
>> don kretz wrote:
>>
>>> The topic of data, rules, and user interfaces is a hot one in software
>>> these days (and probably most days). And there's a generally accepted
>>> design strategy that addresses it directly, and goes by the acronym MVC
>>> (for Model, View, Controller). It's not the only model available, but
>>> it's been around a long time (going at least back to the Xerox PARC Alto
>>> project, whence Apple Mac, mice, networked apps, Windows, and a bunch of
>>> other current design metaphors), and it's currently the most popular
>>> one, I believe.
>>> ...
>>> But to make this kind of work generally useful, the DP site code, and
>>> the people who control it, need to understand these issues, and be
>>> willing (maybe eager) to collaborate with other developers.
>>
>>
>> The code at DP (and at PG) has evolved over time. When DP started,
>> nobody knew exactly where it was headed. Nobody knew how big the
>> codebase was going to be. Nobody knew how many hands were at its disposal.
>>
>> Of course, code refactoring is always an option, and when you
>> refactor you can incorporate all the buzzwords of the day (OO, patterns,
>> extreme, fuzzy, Java, MVC, visitors, ...).
>>
>> But you also have to consider the cost of refactoring. The cost at DP 
>> is to rewrite everything from scratch. With only half a dozen hands 
>> experienced enough to do this: not an option.
>>
>>
>> At the present state, incorporating changes is hard. (Nobody is the
>> culprit.) Developers are not unwilling or disobliging. They just
>> don't have time to sort things out. (And it surely didn't help that BB
>> asked for these changes.)
>>
>>
>> In summary: the code at DP is working well. Which is more than can be 
>> said for many buzzword-driven projects and Vista.

From joshua at hutchinson.net  Thu Aug  7 07:24:44 2008
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu, 7 Aug 2008 14:24:44 +0000 (GMT)
Subject: [gutvol-d] Please change the subject
Message-ID: <1585478277.1793141218119085003.JavaMail.mail@webmail09>



On Aug 7, 2008, lee at novomail.net wrote: 

The notion that proofers should be presented with texts containing a 
significant number of errors, perhaps artificially seeded if there are 
not enough, seems to me the height of perversity. I simply cannot 
imagine that any proofer would feel bored or unfulfilled if s/he were
unable to find an error in a text. The notion boggles my mind. If there 
are such individuals (which I doubt) they're probably not the kind of 
detail-oriented people you would want proofing texts.

***

Actually, Lee, that exact sentiment pops up A LOT over at DP forums.  And not from the "powers-that-be" that BB likes to rail against (they've stayed out of it to a large extent).  It comes from brand spanking newbies to crusty old hands.  

Personally, I don't buy into it, but it is a very recurring thread of discussion among DP volunteers.

Josh

From vze3rknp at verizon.net  Thu Aug  7 08:18:17 2008
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Thu, 07 Aug 2008 11:18:17 -0400
Subject: [gutvol-d] Please change the subject
In-Reply-To: <489AFF62.6050908@novomail.net>
References: <20080803070319.GA11777@ark.in-berlin.de>
	<4897A453.4090900@novomail.net>	<489AD550.7010907@perathoner.de>
	<489AFF62.6050908@novomail.net>
Message-ID: <489B1239.6040402@verizon.net>

Lee Passey wrote:
> Marcello Perathoner wrote:
>
>> On one hand preprocessing the text would save proofers keystrokes.
>>
>> On the other hand preprocessing would reduce proofers attention and 
>> reward.
>
> The notion that proofers should be presented with texts containing a 
> significant number of errors, perhaps artificially seeded if there are 
> not enough, seems to me the height of perversity. I simply cannot 
> imagine that any proofer would feel bored or unfulfilled if s/he were
> unable to find an error in a text. The notion boggles my mind. If 
> there are such individuals (which I doubt) they're probably not the 
> kind of detail-oriented people you would want proofing texts.
>
The notion may boggle your mind, but we have consistently found that 
there is a decided preference among some of our volunteers for working 
in our first proofing round. They like to feel useful, and that means 
correcting errors. They do, in fact, find it boring to not have to DO 
anything. Actually making corrections is just more fun than checking a 
page and finding nothing to fix. It takes a while for many people to
understand that finding the subtle, easily-overlooked errors is much 
more challenging. The truth is that many of our volunteers are not the 
incredibly detail-oriented folks who get satisfaction from finding that 
one tricky error every 10 pages. They are ordinary folks who want the 
immediate feel-good feedback of finding and fixing OCR errors. I don't 
know how many errors is "enough" for them to feel good about what 
they've done, or how few it takes for them to feel bored. I just know 
that there are folks who want to feel like they've Done Something on 
most of the pages that they touch. I can sympathize with this point of 
view since it is where I started out when I first came to DP.

I am NOT in favor of deliberately adding in errors for volunteers to 
find. That feels very wrong to me. I suppose I might change my mind if 
there were very strong evidence that doing so actually resulted in 
better texts in the end, but for now it just seems like a breaking of 
the tacit contract we make with our volunteers that says that their 
efforts are directly resulting in cleaner, better texts.

I'm also NOT saying that there should be no preprocessing of the 
material presented to the DP volunteers. On the contrary, I think that 
as much automated preprocessing should be done as is possible to do 
accurately. There's no excuse for garbage in the files that could have 
easily been removed automatically ahead of time. Preprocessing that 
notes where human judgment is required is fine. But having the content 
provider do all possible checks ahead of time, making all of the 
judgments, isn't something that I would advocate with our model. Our 
whole purpose is to distribute the work involved in creating a finished 
text. That, combined with leaving something for the volunteers to 
actually DO, says to me that there's a balance between what should be 
done ahead and what can be left for the proofing process. guiprep 
incorporated our collective understanding as a community of what that 
balance should be at the time that it was written. Roger Frank's tools 
reflect a better current understanding of that balance. I expect that 
eventually most content providers at DP will be using Roger's tools 
rather than guiprep, or perhaps guiprep will incorporate his tools.

JulietS

From dakretz at gmail.com  Thu Aug  7 08:53:48 2008
From: dakretz at gmail.com (don kretz)
Date: Thu, 7 Aug 2008 08:53:48 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 20
In-Reply-To: 
References: 
Message-ID: <627d59b80808070853h3c4c737cy509eff5d5fe3ad46@mail.gmail.com>

Consolidating the text file is already coded - it wants only a button to
push. Or, alternatively, you can load an existing text file (but I haven't
coded synchronizing it with the images.) At least this way you can see the
text and image selection matching as you browse.

The "Find/Replace" button is commented out because it's not done yet.  Your
description of wraparound Find is a good description of the "n" key in the
vim editor, the "?" key does the same thing backwards. Both are basic
requirements.

In the same vein, the trickier one to envision is "show me a scrolling list
of all matches in context (probably line above and line below), and let me
browse it, with the 'Find' target preselected and the trigger primed to
replace it with a default string, or replace it with what I type instead."

And on, and on ...
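
For anyone who wants to picture the two features described above, here is a
minimal sketch in Python. It is not the code behind the tool in this message;
the function names and the line-list representation are assumptions chosen
for the example. The first function does a wraparound find, the second
collects every match with the line above and the line below.

```python
import re

def find_next(lines, pattern, start):
    """Index of the first matching line at or after `start`, wrapping around.

    Returns None when no line matches at all.
    """
    rx = re.compile(pattern)
    count = len(lines)
    for offset in range(count):
        index = (start + offset) % count
        if rx.search(lines[index]):
            return index
    return None

def matches_in_context(lines, pattern):
    """List of (index, line_above, line, line_below) for every matching line."""
    rx = re.compile(pattern)
    hits = []
    for index, line in enumerate(lines):
        if rx.search(line):
            above = lines[index - 1] if index > 0 else ""
            below = lines[index + 1] if index + 1 < len(lines) else ""
            hits.append((index, above, line, below))
    return hits

page = ["tbe cat", "sat on", "tbe mat"]
print(find_next(page, r"sat", 2))            # wraps past the end of the page
print(matches_in_context(page, r"\btbe\b"))  # one tuple per hit, with context
```

A browsable list would then just be a view over the tuples, with the replace
action applied to whichever hit is selected.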

From Bowerbird at aol.com  Thu Aug  7 09:37:07 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 7 Aug 2008 12:37:07 EDT
Subject: [gutvol-d] my spam folder
Message-ID: 

boy, my spam folder is really lit up this morning...

for those who don't know, i now ignore the following trolls:
marcello, josh, lee, robert, karen, and a handful of others...

i do that because, in literally _years_ of my experience, they
almost _never_ advanced the dialog here in a meaningful way.

which is not to say that they would be incapable of doing so.

so please, if _any_ of 'em has said _anything_ that's worthwhile,
put it into your own words and tell me that they made a point...

otherwise, i'll assume that their input is up to its usual standard,
and i won't waste even one little bit of my time reading any of it.

with dakretz and maybe rfrank making some real progress now,
and even vlsimpson (who has been maintaining guiguts lately)
chipping in on the collaboration, it's no time to bother with trolls.

-bowerbird




From sly at victoria.tc.ca  Thu Aug  7 10:17:59 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu, 7 Aug 2008 10:17:59 -0700 (PDT)
Subject: [gutvol-d] Please change the subject
In-Reply-To: <1585478277.1793141218119085003.JavaMail.mail@webmail09>
References: <1585478277.1793141218119085003.JavaMail.mail@webmail09>
Message-ID: 



> On Aug 7, 2008, lee at novomail.net wrote:
>
> The notion that proofers should be presented with texts containing a
> significant number of errors, perhaps artificially seeded if there are
> not enough, seems to me the height of perversity. I simply cannot
> imagine that any proofer would feel bored or unfulfilled if s/he were
> unable to find an error in a text. The notion boggles my mind. If there
> are such individuals (which I doubt) they're probably not the kind of
> detail-oriented people you would want proofing texts.

On Thu, 7 Aug 2008, Joshua Hutchinson wrote:

> Actually, Lee, that exact sentiment pops up A LOT over at DP forums.  And not from the "powers-that-be" that BB likes to rail against (they've stayed out of it to a large extent).  It comes from brand spanking newbies to crusty old hands.
>
> Personally, I don't buy into it, but it is a very recurring thread of discussion among DP volunteers.


I have to say I agree with Josh here. I've had the impression that
many ppl working in P1 at DP like having something that needs to
be changed, and like fixing the "obvious" errors.

And since there is always an abundance of available eyes in P1, why
not let them...

--Andrew

From dakretz at gmail.com  Thu Aug  7 11:09:39 2008
From: dakretz at gmail.com (don kretz)
Date: Thu, 7 Aug 2008 11:09:39 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 22
In-Reply-To: 
References: 
Message-ID: <627d59b80808071109o54ff657cu71a1a8445bf39542@mail.gmail.com>

> On Aug 7, 2008, lee at novomail.net wrote:
>
> The notion that proofers should be presented with texts containing a
> significant number of errors, perhaps artificially seeded if there are
> not enough, seems to me the height of perversity. I simply cannot
> imagine that any proofer would feel bored or unfulfilled if s/he were
> unable to find an error in a text. The notion boggles my mind. If there
> are such individuals (which I doubt) they're probably not the kind of
> detail-oriented people you would want proofing texts.

On Thu, 7 Aug 2008, Joshua Hutchinson wrote:

> Actually, Lee, that exact sentiment pops up A LOT over at DP forums. And
> not from the "powers-that-be" that BB likes to rail against (they've stayed
> out of it to a large extent). It comes from brand spanking newbies to crusty
> old hands.
>
> Personally, I don't buy into it, but it is a very recurring thread of
> discussion among DP volunteers.

> I have to say I agree with Josh here. I've had the impression that many ppl
> working in P1 at DP like having something that needs to be changed, and like
> fixing the "obvious" errors.
>
> And since there is always an abundance of available eyes in P1, why not let
> them...
>
> --Andrew



Yup, that's the common assertion. My personal, 100% emotional response is
that I don't bond well with lab rats.

(This statement is not intended for any purpose other than possible
amusement value.)

From dakretz at gmail.com  Thu Aug  7 11:33:04 2008
From: dakretz at gmail.com (don kretz)
Date: Thu, 7 Aug 2008 11:33:04 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 22
In-Reply-To: 
References: 
Message-ID: <627d59b80808071133x2fed3a0dr29b1ce2fd9db7fa6@mail.gmail.com>

---------- Forwarded message ----------

From: Juliet Sutherland 

To: Project Gutenberg Volunteer Discussion 

Date: Thu, 07 Aug 2008 11:18:17 -0400


I'm also NOT saying that there should be no preprocessing of the material
presented to the DP volunteers. On the contrary, I think that as much
automated preprocessing should be done as is possible to do accurately.
There's no excuse for garbage in the files that could have easily been
removed automatically ahead of time. Preprocessing that notes where human
judgment is required is fine. But having the content provider do all
possible checks ahead of time, making all of the judgments, isn't something
that I would advocate with our model. Our whole purpose is to distribute the
work involved in creating a finished text. That, combined with leaving
something for the volunteers to actually DO, says to me that there's a
balance between what should be done ahead and what can be left for the
proofing process. guiprep incorporated our collective understanding as a
community of what that balance should be at the time that it was written.
Roger Frank's tools reflect a better current understanding of that balance.
I expect that eventually most content providers at DP will be using Roger's
tools rather than guiprep, or perhaps guiprep will incorporate his tools.


JulietS

I think we run little risk of erring on the side of expecting too much
preprocessing on the part of Content Providers.

I've posted the before and after texts from an EB project, if anyone wants
an idea of what's possible. I've also posted the regexes (preprocessing
rules) that I used to make those alterations. And doing it with a text
editor is not my definition of user-friendly, nor efficient. But doing those
checks and making those corrections took somewhere between one and two
hours. In the context of what it takes to prepare a DP project,
that's pretty close to negligible by anyone's definition.
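
To give a flavour of what a list of preprocessing rules can look like, here
is a minimal sketch in Python. The three rules are hypothetical examples
picked for illustration (a common "tbe" scanno, a stray space before
punctuation, trailing whitespace); they are not the regexes posted for the EB
project. The space-before-punctuation rule in particular is only an
assumption that suits English text.

```python
import re

# Hypothetical rules: (compiled pattern, replacement). Not the posted EB rules.
RULES = [
    (re.compile(r"\btbe\b"), "the"),               # frequent OCR misread of "the"
    (re.compile(r" +([,;:!?])"), r"\1"),           # space before punctuation
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),    # trailing whitespace
]

def preprocess(text):
    """Apply every rule in order; return the new text and per-rule hit counts."""
    counts = []
    for pattern, replacement in RULES:
        text, hits = pattern.subn(replacement, text)
        counts.append(hits)
    return text, counts

cleaned, counts = preprocess("tbe cat , sat  \non tbe mat ;\n")
print(repr(cleaned), counts)
```

Keeping the hit counts per rule makes it easy to see which rules earn their
keep on a given project and which ones never fire.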

I think there are few Content Providers at dp that would find it
objectionable. I *know* there are CPs who would love to make those checks.

I think the point is, why doesn't dp acknowledge the value, support it as
an important goal, and actually provide technical (as in system-related)
assistance and encouragement in incorporating it into the workflow?

From joshua at hutchinson.net  Thu Aug  7 11:42:55 2008
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu, 7 Aug 2008 18:42:55 +0000 (GMT)
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 22
Message-ID: <1103004776.1830941218134575440.JavaMail.mail@webmail09>

On Aug 7, 2008, dakretz at gmail.com wrote: 

I think there are few Content Providers at dp that would find it objectionable. I *know* there are CPs who would love to make those checks.
I think the point is, why doesn't dp acknowledge the value, support it as an important goal, and actually provide technical (as in system-related) assistance and encouragement in incorporating it into the workflow?

***

What makes you think DP doesn't support it?  Granted the tools may need some updating, but GuiPrep in its various iterations has existed since right after the original Slashdotting*.  We've always recognized (and used) the importance of pre-processing, but most CPers in the past only wanted the automated style preprocessing.

Josh

* In DP history, the original Slashdotting came after DP was a story on slashdot.org and traffic spiked to unmatched levels.  It marks the point where DP really became a major player.

From paulmaas at airpost.net  Thu Aug  7 11:57:45 2008
From: paulmaas at airpost.net (Paul Maas)
Date: Thu, 07 Aug 2008 11:57:45 -0700
Subject: [gutvol-d] Please change the subject
In-Reply-To: <489B1239.6040402@verizon.net>
References: <20080803070319.GA11777@ark.in-berlin.de>
	<4897A453.4090900@novomail.net> <489AD550.7010907@perathoner.de>
	<489AFF62.6050908@novomail.net> <489B1239.6040402@verizon.net>
Message-ID: <1218135465.17655.1267534083@webmail.messagingengine.com>

Introducing errors to the text may be quite beneficial if done properly.

When added, the errors must be subtle, representing the types of errors
that are difficult for eagle-eyed proofers to spot.

I would presume that the long-time DP volunteers will be able to
assemble a comprehensive list of hard-to-spot error categories.

The benefit is obvious:  when these introduced errors are found, it is
very likely that any real and similarly hard-to-spot errors were also
found.

As a bonus, this allows DP to keep track of the "eagle-eye" performance
of individual proofers.

One might even consider telling the proofers that one or more subtle
errors were intentionally added to the text.  If the proofer finds them,
along with any other real errors (the proofer will not know which is
which), then they will get some kind of recognition or reward.  I'm not
sure this will work, but it is an experiment worth trying.


On Thu, 07 Aug 2008 11:18:17 -0400, "Juliet Sutherland"
 said:
> Lee Passey wrote:
> > Marcello Perathoner wrote:
> >
> >> On one hand preprocessing the text would save proofers keystrokes.
> >>
> >> On the other hand preprocessing would reduce proofers attention and 
> >> reward.
> >
> > The notion that proofers should be presented with texts containing a 
> > significant number of errors, perhaps artificially seeded if there are 
> > not enough, seems to me the height of perversity. I simply cannot 
> > imagine that any proofer would feel bored or unfulfilled if s/he were
> > unable to find an error in a text. The notion boggles my mind. If 
> > there are such individuals (which I doubt) they're probably not the 
> > kind of detail-oriented people you would want proofing texts.
> >
> The notion may boggle your mind, but we have consistently found that 
> there is a decided preference among some of our volunteers for working 
> in our first proofing round. They like to feel useful, and that means 
> correcting errors. They do, in fact, find it boring to not have to DO 
> anything. Actually making corrections is just more fun than checking a 
> page and finding nothing to fix. It takes awhile for many people to 
> understand that finding the subtle, easily-overlooked errors is much 
> more challenging. The truth is that many of our volunteers are not the 
> incredibly detail-oriented folks who get satisfaction from finding that 
> one tricky error every 10 pages. They are ordinary folks who want the 
> immediate feel-good feedback of finding and fixing OCR errors. I don't 
> know how many errors is "enough" for them to feel good about what 
> they've done, or how few it takes for them to feel bored. I just know 
> that there are folks who want to feel like they've Done Something on 
> most of the pages that they touch. I can sympathize with this point of 
> view since it is where I started out when I first came to DP.
> 
> I am NOT in favor of deliberately adding in errors for volunteers to 
> find. That feels very wrong to me. I suppose I might change my mind if 
> there were very strong evidence that doing so actually resulted in 
> better texts in the end, but for now it just seems like a breaking of 
> the tacit contract we make with our volunteers that says that their 
> efforts are directly resulting in cleaner, better texts.
> 
> I'm also NOT saying that there should be no preprocessing of the 
> material presented to the DP volunteers. On the contrary, I think that 
> as much automated preprocessing should be done as is possible to do 
> accurately. There's no excuse for garbage in the files that could have 
> easily been removed automatically ahead of time. Preprocessing that 
> notes where human judgment is required is fine. But having the content 
> provider do all possible checks ahead of time, making all of the 
> judgments, isn't something that I would advocate with our model. Our 
> whole purpose is to distribute the work involved in creating a finished 
> text. That, combined with leaving something for the volunteers to 
> actually DO, says to me that there's a balance between what should be 
> done ahead and what can be left for the proofing process. guiprep 
> incorporated our collective understanding as a community of what that 
> balance should be at the time that it was written. Roger Frank's tools 
> reflect a better current understanding of that balance. I expect that 
> eventually most content providers at DP will be using Roger's tools 
> rather than guiprep, or perhaps guiprep will incorporate his tools.
> 
> JulietS
-- 
  Paul Maas
  paulmaas at airpost.net



From dakretz at gmail.com  Thu Aug  7 12:20:15 2008
From: dakretz at gmail.com (don kretz)
Date: Thu, 7 Aug 2008 12:20:15 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 23
In-Reply-To: 
References: 
Message-ID: <627d59b80808071220i3b8f95d3s860135a7dac79fe8@mail.gmail.com>

---------- Forwarded message ----------
From: Joshua Hutchinson 
To: gutvol-d at lists.pglaf.org
Date: Thu, 7 Aug 2008 18:42:55 +0000 (GMT)
Subject: Re: [gutvol-d] gutvol-d Digest, Vol 49, Issue 22
On Aug 7, 2008, dakretz at gmail.com wrote:

I think there are few Content Providers at dp that would find it
objectionable. I *know* there are CPs who would love to make those checks.
I think the point is, why doesn't dp acknowledge the value, support it as
an important goal, and actually provide technical (as in system-related)
assistance and encouragement in incorporating it into the workflow?

***

What makes you think DP doesn't support it?  Granted the tools may need some
updating, but GuiPrep in its various iterations has existed since right
after the original Slashdotting*.  We've always recognized (and used) the
importance of pre-processing, but most CPers in the past only wanted the
automated style preprocessing.

Josh

* In DP history, the original Slashdotting came after DP was a story on
slashdot.org and traffic spiked to unmatched levels.  It marks the point
where DP really became a major player.


-----------

Good question. I don't know any other way to determine what's supported than
by what is asserted to be supported (which is only ever elucidated by
Juliet, I think), and then by what is actually given development direction
and resources. I can recall some amount of Type A support, but I've detected
no Type B support. (What few sporadic changes there have been were
provided, AFAIK, by developers no more under dp direction or assistance
than you or I.) I've always considered guiprep to be a resource entirely
from outside the dp development organization, requiring no assistance or
cooperation. I don't think you'll find it embodied within the SourceForge dp
codebase, for instance.

From grythumn at gmail.com  Thu Aug  7 12:31:00 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Thu, 7 Aug 2008 15:31:00 -0400
Subject: [gutvol-d] Please change the subject
In-Reply-To: <1218135465.17655.1267534083@webmail.messagingengine.com>
References: <20080803070319.GA11777@ark.in-berlin.de>
	<4897A453.4090900@novomail.net> <489AD550.7010907@perathoner.de>
	<489AFF62.6050908@novomail.net> <489B1239.6040402@verizon.net>
	<1218135465.17655.1267534083@webmail.messagingengine.com>
Message-ID: <15cfa2a50808071231g243a2a2cm6be56d9ed70e9f08@mail.gmail.com>

On Thu, Aug 7, 2008 at 2:57 PM, Paul Maas  wrote:
> Introducing errors to the text may be quite beneficial if done properly.
>
> [...]
>
> As a bonus, this allows DP to keep track of the "eagle-eye" performance
> of individual proofers.

Right. Most importantly, IMO, it gives a direct and quantifiable
element to be used to estimate error rates in a text, as well as
instantaneous feedback. Built-in QC.
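
A minimal sketch in Python of how seeded errors could yield such an estimate.
It rests on one big assumption: seeded errors are about as hard to spot as
real ones. The function and parameter names are made up for illustration;
this is ordinary seeding arithmetic, not anything DP has implemented.

```python
def estimate_real_errors(seeded_total, seeded_found, real_found):
    """Estimate real errors in a text from how many seeded errors were caught.

    Returns (detection_rate, estimated_total_real, estimated_remaining_real).
    Assumes seeded and real errors are equally likely to be caught.
    """
    if seeded_total <= 0 or not 0 < seeded_found <= seeded_total:
        raise ValueError("need at least one seeded error found, and found <= total")
    detection_rate = seeded_found / seeded_total
    estimated_total = real_found * seeded_total / seeded_found
    return detection_rate, estimated_total, estimated_total - real_found

# Example: 10 errors seeded, 8 of them caught, 12 real errors caught.
print(estimate_real_errors(10, 8, 12))
```

If proofers catch 8 of 10 seeded errors along with 12 real ones, the estimate
is roughly 15 real errors in total, so about 3 still in the text.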

R C

From sly at victoria.tc.ca  Thu Aug  7 12:59:41 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu, 7 Aug 2008 12:59:41 -0700 (PDT)
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 22
In-Reply-To: <627d59b80808071133x2fed3a0dr29b1ce2fd9db7fa6@mail.gmail.com>
References: 
	<627d59b80808071133x2fed3a0dr29b1ce2fd9db7fa6@mail.gmail.com>
Message-ID: 


Well, so you have me convinced that your proposal for
pre-processing is a valid option for some people.

So, to take the Michael Hart approach, I could just say "yes!"

Yes, prepare more texts for DP as you describe.
Yes, post messages in the DP forums telling other Content
  Providers about what you suggest.
Yes, itemize exactly what steps to follow on a page on the
  DP wiki. The pre-processing page on that wiki appears
  meagre at this time.

Andrew

On Thu, 7 Aug 2008, don kretz wrote:

> I think we run little risk of erring on the side of expecting too much
> preprocessing on the part of Content Providers.
>
> I've posted the before and after texts from an EB project, if anyone wants
> an idea of what's possible. I've also posted the regexes (preprocessing
> rules) that I used to make those alterations. And doing it with a text
> editor is not my definition of user-friendly, nor efficient. But doing those
> checks and making those corrections took somewhere between one and two
> hours. In the context of what it takes to prepare a DP project,
> that's pretty close to negligible by anyone's definition.
>
> I think there are few Content Providers at dp that would find it
> objectionable. I *know* there are CPs who would love to make those checks.
>
> I think the point is, why doesn't dp acknowledge the value, support it as
> an important goal, and actually provide technical (as in system-related)
> assistance and encouragement in incorporating it into the workflow?
>

From grythumn at gmail.com  Thu Aug  7 13:01:44 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Thu, 7 Aug 2008 16:01:44 -0400
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 23
In-Reply-To: <627d59b80808071220i3b8f95d3s860135a7dac79fe8@mail.gmail.com>
References: 
	<627d59b80808071220i3b8f95d3s860135a7dac79fe8@mail.gmail.com>
Message-ID: <15cfa2a50808071301y5650d6a7i92839f86791afe66@mail.gmail.com>

On Thu, Aug 7, 2008 at 3:20 PM, don kretz  wrote:
> Good question. I don't know any other way to determine what's supported than
> by what is asserted to be supported (which is only ever elucidated by
> Juliet, I think), and then by what is actually given development direction
> and resources. I can recall some amount of Type A support, but I've detected

You seem to be assuming DP is some sort of corporation where resources
can be assigned to task units. DP developers are volunteers who work on
what interests them. There is some sort of coordination, but you'd be
much better off talking to the dev list directly, or posting a task
request at DP.

You are incorrect about the tools.. guiprep, gutwrench, guiguts,
snatch, rfrank's stuff, etc. etc. were developed by and for DP, even
if not incorporated into the DP site code.  They were written for the
same reason I'm writing a new pdf extractor.. a DPer saw a need for a
tool, and wrote one. They could, in theory, be hosted on the
sourceforge CVS, but they are often fairly specialized and written by
one or two people.

R C

From Bowerbird at aol.com  Thu Aug  7 13:15:36 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 7 Aug 2008 16:15:36 EDT
Subject: [gutvol-d] a scrolling list of all "find" matches
Message-ID: 

dakretz said:
>    the trickier one to envision is 
>   "show me a scrolling list of all matches in context 
>   (probably line above and line below), 
>   and let me browse it

again...   close, but not _quite_ right.         :+)

the line-above and line-below "context"
means you're still doing scripting-think.

a full-on interface means you _always_have_
the "context" of the full page when needed,
so the matching lines themselves are enough.

you'll see this when i put out "banana cream".

-bowerbird




From Bowerbird at aol.com  Thu Aug  7 17:06:16 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 7 Aug 2008 20:06:16 EDT
Subject: [gutvol-d] online, offline, and bothline
Message-ID: 

in deciding who to listen to, and how to weight things they say,
i'd like to ask you, especially you, dkretz, to consider how closely
you will listen to people who haven't programmed this tool-chain.

and to think about how closely you will listen to _me_...

you've got a good start with flex, because it operates online and off,
inside the browser and out, so you don't have to waste time thinking
about whether you are an online process or an offline one, or whether
you operate inside the browser or outside of it.   this is all to the good.

what this means, of course, in terms of your own thinking about it
is that your app will live and work in both worlds at the same time.

the resources -- text and images -- need to exist online in the cloud,
_and_ offline on the hard-disks of any user who chooses to grab 'em.

i have already -- of course -- set up a basic system that operates as so,
a cyberlibrary that is built in a systematic way so as to be exploited as such.

the base u.r.l. for the library is this:
>    http://z-m-l.com/go/

after that, each book is named with a unique 5-letter prefix
-- let's call it "bname", for book-name or bowerbird-name --
and the files are loaded into a folder with that unique bname.
>    http://z-m-l.com/go/bname

so, for instance, if the bname is "mabie", the folder would be:
>    http://z-m-l.com/go/mabie

after that, there are certain certainties, e.g., the .zml file is named as:
>    http://z-m-l.com/go/bname/bname.zml

so, for the "mabie" book, it would be this:
>    http://z-m-l.com/go/mabie/mabie.zml

there are a handful of books loaded into this system already:
>    http://z-m-l.com/go/mabie/mabie.zml
>    http://z-m-l.com/go/myant/myant.zml
>    http://z-m-l.com/go/sgfhb/sgfhb.zml
>    http://z-m-l.com/go/woodc/woodc.zml
>    http://z-m-l.com/go/mount/mount.zml

furthermore, by perusing the directory of each of these unique folders,
you can see what resources compose the book.   files are named using
the _page-number_ of the page in the printed-book, so are predictable.

the main file that's associated with each page-number is the image-file,
the scan of the page.   these filenames are also completely predictable...

for instance, the scan of page 123 in the "mabie" book has this name:
>    http://z-m-l.com/go/mabie/mabiep123.jpg

continuing, the scan of page 123 in the "myant" book has this name:
>    http://z-m-l.com/go/myant/myantp123.png

or, per the template, the scan for page 123 in the "bname" book is:
>    http://z-m-l.com/go/bname/bnamep123.png

notice that the image-type will depend on what it is...

the "p" in front of the "123" stands for "page", of course.

for forward-matter pages, they are prefixed by an "f", like:
>    http://z-m-l.com/go/myant/myantf009.png

replacing the ".png" or ".jpg" in each of the above lines with ".html"
will display a page with the text on one side, the image on the other.

and take a look at the folder contents at:
>    http://z-m-l.com/go/myant

notice how unnumbered pages in the body of the book are named, like:
>    http://z-m-l.com/go/myant/myantp046x2a.png
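
A minimal sketch of the naming convention described so far, for anyone who wants to generate these addresses from a tool. The function names are made up for illustration, and the three-digit zero padding is an assumption read off the examples (f009, p045, p123).

```python
BASE_URL = "http://z-m-l.com/go/"


def book_folder(bname):
    """Folder holding every file for the book with this 5-letter bname."""
    return f"{BASE_URL}{bname}"


def zml_url(bname):
    """Address of the single .zml text file for the book."""
    return f"{BASE_URL}{bname}/{bname}.zml"


def page_url(bname, page, ext="png", front=False, suffix=""):
    """Address of one page file.

    page   -- printed page number (assumed to be zero-padded to 3 digits)
    ext    -- "png" or "jpg" for the scan, "html" for the text+image view
    front  -- True for front-matter pages ("f" prefix instead of "p")
    suffix -- extra tag for unnumbered pages, e.g. "x2a"
    """
    prefix = "f" if front else "p"
    return f"{BASE_URL}{bname}/{bname}{prefix}{page:03d}{suffix}.{ext}"


print(page_url("mabie", 123, ext="jpg"))
print(page_url("myant", 9, front=True))
print(page_url("myant", 46, suffix="x2a"))
print(page_url("mabie", 123, ext="html"))
```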

so one way of ascertaining the files that make up the book is to grab
the folder-listing and sort the filenames to get the sequence info...

in addition, however, in a belt-and-suspenders type of redundancy,
the information is also contained in the text-file itself, as we will see.

if you grab the .zml file for any of those books, and split the contents
on [space-openbrace-openbrace], the top line of each item will have
the filename of the page-scan for that page.   so, for instance, you see
that it gives us this, for the "myant" book, along about page 45...

>     {{myantp045.png}}
>     {{myantp046.png}}
>     {{myantp046x2a.png}}
>     {{myantp046x2b.png}}
>     {{myantp047.png}}
>     {{myantp048.png}}
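
A small sketch of that split, assuming each page in the .zml file begins with a marker line of the form " {{filename}}" as described above. The sample text is invented for illustration and is not taken from a real .zml file.

```python
def page_sequence(zml_text):
    """Return page-scan filenames in book order.

    Splits on space-openbrace-openbrace and reads the top line of each
    item, as described in the message.
    """
    items = zml_text.split(" {{")[1:]  # drop whatever precedes the first marker
    names = []
    for item in items:
        top_line = item.split("\n", 1)[0]
        names.append(top_line.rstrip().rstrip("}"))
    return names


# Hypothetical sample, shaped like the listing above.
sample = (
    "some header text\n"
    " {{myantp045.png}}\ntext of page 45\n"
    " {{myantp046.png}}\ntext of page 46\n"
    " {{myantp046x2a.png}}\nan unnumbered page\n"
)
print(page_sequence(sample))
```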

***

ok, so what does all of this mean?

with a filenaming convention like the one that i've laid out here,
your offline tool can have the smarts to build a bridge to the cloud.

give it the unique bname for the book, and it knows where to get
(1) the .zml file, which informs it (2) which files come in what order.

the tool can then download all the image-files, in the background,
and save them in the appropriately-named bname folder _locally_,
after which the book exists both in the cloud and on their machine.

so i will suggest that the next iteration of your program be able to
do this downloading so as to represent the book on your machine.
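
One way such a downloader could look, as a sketch only. It assumes the .zml marker format described above, and it takes the fetch function as a parameter so the logic can be tried without touching the network. The names mirror_book and fetch_bytes are invented here.

```python
import os
import urllib.request

BASE_URL = "http://z-m-l.com/go/"


def fetch_bytes(url):
    """Download one resource and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def mirror_book(bname, dest_root=".", fetch=fetch_bytes):
    """Copy a book's .zml file and page scans into dest_root/bname."""
    folder = os.path.join(dest_root, bname)
    os.makedirs(folder, exist_ok=True)

    # (1) the .zml file, found purely from the bname
    zml_bytes = fetch(f"{BASE_URL}{bname}/{bname}.zml")
    with open(os.path.join(folder, f"{bname}.zml"), "wb") as out:
        out.write(zml_bytes)

    # (2) the page order, read from the " {{filename}}" marker lines
    text = zml_bytes.decode("utf-8", errors="replace")
    names = [
        os.path.basename(item.split("\n", 1)[0].rstrip().rstrip("}"))
        for item in text.split(" {{")[1:]
    ]

    # (3) the scans themselves, skipping any already on disk
    for name in names:
        target = os.path.join(folder, name)
        if not os.path.exists(target):
            with open(target, "wb") as out:
                out.write(fetch(f"{BASE_URL}{bname}/{name}"))
    return names


# Usage (performs real downloads): mirror_book("mount", dest_root="books")
```

Because existing files are skipped, running it again only fetches what is missing.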

just so you know, the "mount" bname is where i've stored my version
of the "mountain blood" book that i was using in my "clean-up" series,
so that might be one that we could agree on as a communal project...
(it's good when you know you've got a clean text to use as a criterion.)

this filenaming infrastructure obviates the need for any _database_
to store the locations of the various resources that comprise a book,
and it introduces transparency and clarity to make it comprehensible
to even fairly naive observers who want to add value to the equation...

enough for now...   further wrinkles later...

-bowerbird




From Bowerbird at aol.com  Thu Aug  7 17:07:26 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 7 Aug 2008 20:07:26 EDT
Subject: [gutvol-d] thursday's child has far to go
Message-ID: 

keith said:
>    I see no reason why it can not be broken up into individual files (pages).
>    It is only up to the analysis software to handle when we need the other
>    page and can act accordingly.   Agreed, it makes simpler programming,
>    but from the human interactor it makes no difference.

actually, it is from the point of the software that "it makes no difference".
but we want to give users the option of handling the text with another tool.
that's the main reason to solidify the whole book into one file, along with
the fact that this generally gives us a file between 200k and 900k, which is
quite manageable in terms of machine processing power and our bandwidth.
because of the convenience, the cost-benefit ratio tips us to the single-file.
a text-file for every page creates too much file clutter.

(plus editing concerns will introduce page-based "edit files" soon enough;
see, for the purpose of demarcation of edits, separate files are a good idea,
which is one of the reasons why we save the convention for _that_ purpose;
remember that my experience here is across the totality of the tool-chain.)

-bowerbird

***

juliet said:
>    we have consistently found that there is a decided preference among 
>    some of our volunteers for working in our first proofing round. 
>    They like to feel useful, and that means correcting errors. 
>    They do, in fact, find it boring to not have to DO anything. 

hold that thought.


>    Actually making corrections is just more fun 
>    than checking a page and finding nothing to fix. 

i agree that making corrections is a blast.

but you're confusing a few processes here...

let me begin to elucidate that confusion by asking if 
your smooth-readers "get bored" by too-perfect books.

if you say "no, because the expectation is that the book
will be in clean shape by the time it gets to that stage",
then you have made my point: that expectations matter.

your proofreaders -- like proofreaders in the real world --
should _expect_ that they will make _very_few_changes_,
and only because they've caught a finely-nuanced error...

they should _expect_ that their job is to _verify_ a page
that is perfect -- to verify that that page _is_ perfect...

they should even expect that their job is to _verify_ the
_previous_ verification, so each page gets verified twice.

in other words, all your proofers tomorrow will be much
like your p2/p3 proofers are today -- fine-teeth combs.

your p1 proofers -- the ones who make lots of changes --
will be p0 preprocessors, because that's where the action is.


>    It takes awhile for many people to understand that 
>    finding the subtle, easily-overlooked errors 
>    is much more challenging.

well, if your pages weren't full of errors that are fully
detectable with a single pass of an automated process,
the only errors that would be left remaining would be
those subtle, easily-overlooked errors, and perhaps
your proofers would become attuned to the challenge.

you feed 'em full of "junk-food" errors, and then you
wonder why they don't appreciate a delicate dessert...


>    I just know that there are folks who want to 
>    feel like they've Done Something on
>    most of the pages that they touch.

the thing is, with _much_ o.c.r. that's done correctly,
a good number of pages come out perfect right away.

so, you know, maybe you need to give your people
the chance to bite off a bigger piece of the pizza...

and here's where we come to the solution for all this.

you want to turn your proofers into preprocessors...

for the ones who want more than "a page at a time",
offer them the chance to "do something" book-wide.

preprocessing enables them to correct _lots_of_errors_,
and to do it _very_quickly_.   and they will most definitely
"feel like they've done something", because they will have.

in the space of an hour or so, they'll have brought a book
to near-perfection, correcting all the "big errors" in it, and
leaving only the "subtle easily-overlooked errors" behind...

i feel _incredibly_efficient_ when i am doing preprocessing...
it goes _fast_, and the velocity of improvement is compelling.

indeed, after being used to this improvement velocity, when i
settled into your page-by-page system, i felt claustrophobic.
it was far too slow, too primitive, too constrained, too clumsy.
i felt incredibly _inefficient_, and stupid to be so handicapped.

in fact...

considering how fast it is for a person to do o.c.r. clean-up,
i'd suggest having two people do it for the same set of o.c.r.,
and then comparing the two as a quality-assurance method...
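
A bare-bones sketch of that comparison, using Python's standard difflib to list the lines where two independent clean-ups disagree. The function name and the sample texts are invented for illustration; this is not an existing DP or z.m.l. tool.

```python
import difflib


def disagreements(text_a, text_b):
    """List the places where two cleaned-up versions differ.

    Returns (line number in version A, lines from A, lines from B)
    for each stretch that is not identical in both versions.
    """
    a_lines = text_a.splitlines()
    b_lines = text_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    found = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            found.append((a1 + 1, a_lines[a1:a2], b_lines[b1:b2]))
    return found


version_a = "the mountain rose\nabove the valley\nin the morning"
version_b = "the mountain rose\nabove the va11ey\nin the morning"
for line_no, from_a, from_b in disagreements(version_a, version_b):
    print(line_no, from_a, from_b)
```

Every spot the two versions agree on needs no further attention; only the disagreements have to be checked against the scan.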


>    On the contrary, I think that as much automated preprocessing 
>    should be done as is possible to do accurately

when are we going to put this canard to bed completely?

the amount of "automated" preprocessing you can do "accurately"
is next to none, and largely so unsurprising that it doesn't matter.
strike the term "automated preprocessing" from the conversation.
it's a pipe-dream.   it doesn't exist, except in a meaningless way...


>    Preprocessing that notes where human judgment is required is fine.

if you're talking about leaving flags in the text, it simply doesn't work.
all of roger's recent experiments have shown that it just doesn't work...


>    But having the content provider do all possible checks ahead of time, 
>    making all of the judgments, isn't something that I would advocate

ok, i've dealt with this several times already, and it's getting very tiring.

i do believe that it should be part of the job of the _content-provider_ 
to do preprocessing.   it's certainly part of the job as currently defined.

but yes, i'm talking about making the "preprocessing" a little bigger job.

so it would certainly be the case you could have somebody _other_than_
the content-provider do it.   a good choice is the eventual postprocessor;
preprocessing as i perceive it is much like what the postprocessor does...

so it really doesn't matter -- it does not matter! -- _who_ does the job.
the only thing that matters is that _someone_ does it, at the right time...

plus, as i made clear above, i would expect that many "ordinary proofers"
would take to preprocessing like a fish takes to water, precisely because
preprocessing gives you a _huge_ sense of immediate accomplishment...

don't get me wrong -- the page-by-page way was _brilliant_ in its day.
but that day was circa 2003.   with today's bandwidth, it's just handcuffs.


>    Our whole purpose is to distribute the work involved 
>    in creating a finished text. 

there's a lot of different ways you can "distribute" the work involved...
i encourage you to think about expanding your too-narrow definition.
take off the blinders...


>    That, combined with leaving something for the volunteers to
>    actually DO, says to me that there's a balance between what should 
>    be done ahead and what can be left for the proofing process.

you seem to see preprocessing as something done to the book _before_
"the volunteers" get it.   i see preprocessing as something the volunteers 
would do.   no, not the p1 volunteers, that's true.   but the p0 volunteers...

"but we don't have a p0," you might say.

well, you don't have a separate round to _merge_ the _parallel_p1_ files
either, but that doesn't seem to be stopping rfrank from pursuing _that_.
he's assuming you'll interject that round if the process finds it worthy...
and i'm telling you that i've already found preprocessing to be worthy...

-bowerbird




From dakretz at gmail.com  Thu Aug  7 17:19:39 2008
From: dakretz at gmail.com (don kretz)
Date: Thu, 7 Aug 2008 17:19:39 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 24
In-Reply-To: 
References: 
Message-ID: <627d59b80808071719rf4cdf3u69dc945b52646cec@mail.gmail.com>

---------- Forwarded message ----------
From: Andrew Sly 
To: Project Gutenberg Volunteer Discussion 
Date: Thu, 7 Aug 2008 12:59:41 -0700 (PDT)
Subject: Re: [gutvol-d] gutvol-d Digest, Vol 49, Issue 22

Well, so you have me convinced that your proposal for
pre-processing is a valid option for some people.

So, to take the Michael Hart approach, I could just say "yes!"

Yes, prepare more texts for DP as you describe.
Yes, post messages in the DP forums telling other Content
 Providers about what you suggest.
Yes, itemize exactly what steps to follow on a page on the
 DP wiki. The pre-processing page on that wiki appears
 meagre at this time.

Andrew

Yes, I think that's a pretty fair description of what I've done for a long
time. I don't tend to advocate for anything I'm not willing to put my own
time and effort into. You won't have much difficulty searching for my many
postings on the DP forums in this regard.

I quit posting to the wiki after I had my writings there removed, deleted,
and hidden under my User tag. It's counterproductive and hard enough to find
stuff anyway. Ref. JulietS. :)

Andrew, is that an offer to help? I can probably find a way for you to do
that. It looks like we're taking names. :)

From dakretz at gmail.com  Thu Aug  7 17:41:21 2008
From: dakretz at gmail.com (don kretz)
Date: Thu, 7 Aug 2008 17:41:21 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 24
In-Reply-To: 
References: 
Message-ID: <627d59b80808071741p15729a1i2703965e2837a74c@mail.gmail.com>

---------- Forwarded message ----------
From: "Robert Cicconetti" 
To: "Project Gutenberg Volunteer Discussion" 
Date: Thu, 7 Aug 2008 16:01:44 -0400
Subject: Re: [gutvol-d] gutvol-d Digest, Vol 49, Issue 23
On Thu, Aug 7, 2008 at 3:20 PM, don kretz  wrote:

>> You seem to be assuming DP is some sort of corporation where resources
>> can be assigned to task units. DP developers are volunteers who work on
>> what interests them. There is some sort of coordination, but you'd be
>> much better off talking to the dev list directly, or posting a task
>> request at DP.

I'm sorry if I gave you that impression, although I don't see how I did. I'm
only assuming it's an organization with a leadership group who makes
decisions and allocates resources.

I tremble to imagine that the development process is as completely
anarchical as you describe. I know if Juliet asked me to implement just
about anything on her list, and gave me the access and (free) resources on
the system, I'd accept, give it my best shot, and probably deliver. If there
isn't at least that level of commitment, we are in trouble.

The dev list ran dry months and months ago. When it was active, I
participated actively. Any planning processes either don't exist, or they
are now out of sight.


>> You are incorrect about the tools.. guiprep, gutwrench, guiguts,
>> snatch, rfrank's stuff, etc. etc. were developed by and for DP, even
>> if not incorporated into the DP site code. They were written for the
>> same reason I'm writing a new pdf extractor.. a DPer saw a need for a
>> tool, and wrote one. They could, in theory, be hosted on the
>> sourceforge CVS, but they are often fairly specialized and written by
>> one or two people.

Unfortunately that approach didn't work too well for the page editor
interface I developed, or the My Projects list, or the diff viewer. They
seemed to work better integrated with the code base. At least that was the
case on the dev server, where we did user testing.

In fact, I think your examples reinforce my assertions. All standalone
utilities that required no cooperation or collaboration with dp software
people.

Which may be fine, and the way it's intended to be, but then we're better
off boldly declaring it, and leave off bemoaning the insufficiencies of
development resources.

I'm mainly bringing these issues up here in the hope that I can find other
avenues (wikisource, whatever) that are interested in accepting the type of
work dp has no interest in. Anything dp can use (e.g. twister, twistEd,
whatever) they are more than welcome to have - it's where I spend a fair
amount of my non-technical contribution, too, you  know. So far, that's
seemingly quite acceptable, and a much more positive experience for me
personally, obviously.

From dakretz at gmail.com  Thu Aug  7 18:07:35 2008
From: dakretz at gmail.com (don kretz)
Date: Thu, 7 Aug 2008 18:07:35 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 24
In-Reply-To: 
References: 
Message-ID: <627d59b80808071807i64a562d2s1f84895096e148ed@mail.gmail.com>

---------- Forwarded message ----------
From: Bowerbird at aol.com
To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com
Date: Thu, 7 Aug 2008 16:15:36 EDT
Subject: [gutvol-d] a scrolling list of all "find" matches
dakretz said:
> the trickier one to envision is
> "show me a scrolling list of all matches in context
> (probably line above and line below),
> and let me browse it

again... close, but not _quite_ right. :+)

the line-above and line-below "context"
means you're still doing scripting-think.

a full-on interface means you _always_have_
the "context" of the full page when needed,
so the matching lines themselves are enough.

you'll see this when i put out "banana cream".

-bowerbird

Ah, but that's not so great for a.) checking to see the consequences of a
bulk change I'm considering making, b.) counting and examining similar
errors and considering what new rules and replacements I might consider, c.)
comparing possible search-and-replace strategies wrt their scope and
accuracy.
Once I've made the "safe" corrections, then you'll notice I've allowed for
your one-at-a-time mode for the stuff that wants individual inspection.
There are a number of those cases (some duly noted) in my regex file.
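
For concreteness, a sketch of the "all matches in context" list, with one line above and one below each hit. This is an illustration in Python, not the code of any tool mentioned in this thread, and the function name is made up.

```python
import re


def matches_in_context(text, pattern, context=1):
    """Return (line number, surrounding lines) for every matching line."""
    lines = text.splitlines()
    rx = re.compile(pattern)
    hits = []
    for i, line in enumerate(lines):
        if rx.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[start:end]))
    return hits


sample = 'first line\nhe said, " hello\nlast line'
for line_no, block in matches_in_context(sample, r', " '):
    print(f"--- line {line_no} ---")
    print("\n".join(block))
```

The length of the list is the count of similar errors, and scrolling through it before running a bulk change shows how wide that change would reach.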
-----------------------------------------------
You might want to observe that I've already covered you on the file naming
issue. That's the primary purpose of Twister (initially because PG wants it
that way too - so why not get it right to start with.) I could make the dp
host changes to accommodate, if ... but, we've been there.  :)

From grythumn at gmail.com  Thu Aug  7 18:24:08 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Thu, 7 Aug 2008 21:24:08 -0400
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 24
In-Reply-To: <627d59b80808071741p15729a1i2703965e2837a74c@mail.gmail.com>
References: 
	<627d59b80808071741p15729a1i2703965e2837a74c@mail.gmail.com>
Message-ID: <15cfa2a50808071824w26d18821s83e7f4102d52c7d0@mail.gmail.com>

On Thu, Aug 7, 2008 at 8:41 PM, don kretz  wrote:
>>> You are incorrect about the tools.. guiprep, gutwrench, guiguts,
>>> snatch, rfrank's stuff, etc. etc. were developed by and for DP, even
>>> if not incorporated into the DP site code. They were written for the
>>> same reason I'm writing a new pdf extractor.. a DPer saw a need for a
>>> tool, and wrote one. They could, in theory, be hosted on the
>>> sourceforge CVS, but they are often fairly specialized and written by
>>> one or two people.
>
> Unfortunately that approach didn't work too well for the page editor
> interface I developed, or the My Projects list, or the diff viewer. They
> seemed to work better integrated with the code base. At least that was the
> case on the dev server, where we did user testing.

Those are a different class of problem that interact with the DP
database, so of course they work better in site code. There are also a
ton of utilities that the squirrels use that don't get published at
all. The standalone utilities are primarily for CP/PPs. I think there
was once an offline proofing application that died off because it
wasn't maintained and fell out of sync with the online interface. But
they are still utilities designed specifically for the DP workflow.

R C

From jayvdb at gmail.com  Thu Aug  7 18:56:19 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Fri, 8 Aug 2008 11:56:19 +1000
Subject: [gutvol-d] PG got a mention on BBC
In-Reply-To: <000001c8f85f$e808ede0$b81ac9a0$@co.uk>
References: 
	<627d59b80808042111h29eed1c5v96a147a27d426107@mail.gmail.com>
	<489A55C8.5080705@novomail.net>
	<15cfa2a50808061918m193749dga4ed69e6d6cdccc8@mail.gmail.com>
	<000001c8f85f$e808ede0$b81ac9a0$@co.uk>
Message-ID: 

On Thu, Aug 7, 2008 at 5:33 PM, Dave Fawthrop
 wrote:
> The flagship program, Today, on BBC (British Broadcasting Corporation) Radio
> 4 today.
> Gave a mention to PG in a piece on e-book readers.

It looks like this is the online article for it:

http://news.bbc.co.uk/today/hi/today/newsid_7545000/7545598.stm

--
John

From bzg at altern.org  Thu Aug  7 20:11:47 2008
From: bzg at altern.org (Bastien)
Date: Thu, 07 Aug 2008 22:11:47 -0500
Subject: [gutvol-d] bastien
In-Reply-To: <536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de> (Schultz
	Keith J.'s message of "Thu, 7 Aug 2008 14:18:46 +0200")
References: <1872592925.262851218110326603.JavaMail.mail@webmail08>
	<536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>
Message-ID: <87hc9wgpzg.fsf@altern.org>

"Schultz Keith J."  writes:

> 	Of course, Michael did not do an exact quote, but a metaphorical
> 	one.

Well, quoting is quoting.  Quotes clearly say: I'm quoting something.
If you're quoting from memory and you're not sure, then it's honest to
say so.  There is no such thing as a "metaphorical quote" (these quotes
imply I'm referring to what *you* might have in mind with this.)

> 	So much flaming about subtleties!

"When censors change the quotes, they eventually change the truth."

:)

-- 
Bastien

From vlsimpson at gmail.com  Thu Aug  7 20:59:35 2008
From: vlsimpson at gmail.com (vlsimpson at gmail.com)
Date: Thu, 7 Aug 2008 22:59:35 -0500
Subject: [gutvol-d] OCR Processing steps [was something else]
Message-ID: 

On 8/7/08, Dave Fawthrop  wrote:
> Assuming that they work as intended, almost all these can occur
> occasionally in real text. Notably in mixed prose with poetry.
> Thus require a human to check if they are OCR mistakes or intended.
>
> As an example:
> 16. Double-line-end followed by lowercase
> Common in poetry
>
> A few do not work on non English text.

The regexps work; I tested them. I've been using them for quite
a while independently of the series.

I think the series is useful and a request was made for a translation
from 'look for this' into a regular expression so ... there is my
effort.

I don't willy-nilly run a global search and replace. I plug in the
pattern and look at each hit.

As for not working for non-English text; yes there is that issue. The
patterns can be adapted I'm sure.

From azkar0 at gmail.com  Thu Aug  7 22:18:34 2008
From: azkar0 at gmail.com (Scott Olson)
Date: Thu, 7 Aug 2008 23:18:34 -0600
Subject: [gutvol-d] Bowerbir OCR steps
In-Reply-To: 
References: 
Message-ID: <2362473e0808072218t3636232ai89c2424db1248843@mail.gmail.com>

> 18. Fix the spacey quotes.
>
>     Regexp: This one made my brain hurt ;). I'll figure it out later (unless
>     someone else can do it).

This one seems difficult to do as a pure regex search, since I believe it
involved counting odd/even quotes and the like. You might be able to
accomplish it by doing several passes, replacing the quotes with
placeholders to signify assumed open/close status (the unicode curly quotes
would be ideal as a visual for the operator). Then do searches looking for
spacey quotes (space after open or before closed), and non-terminating
quotes (an open without a close before double newline). Then replace back in
the ASCII quotes.
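
A rough sketch of that multi-pass idea, assuming quotes simply alternate open/close within each paragraph. All names are invented for illustration. Note that a quotation running over several paragraphs legitimately opens without closing, so the "unterminated" flag is only a prompt for a human to look.

```python
import re

OPEN, CLOSE = "\u201c", "\u201d"  # curly quotes used as visible placeholders


def mark_quotes(paragraph):
    """Pass 1: replace straight quotes with assumed open/close placeholders."""
    out = []
    is_open = False
    for ch in paragraph:
        if ch == '"':
            out.append(CLOSE if is_open else OPEN)
            is_open = not is_open
        else:
            out.append(ch)
    return "".join(out), is_open  # is_open True means no closing quote was seen


def check_paragraphs(text):
    """Pass 2: report paragraphs with spacey or non-terminating quotes."""
    reports = []
    for number, para in enumerate(re.split(r"\n\s*\n", text), start=1):
        marked, unterminated = mark_quotes(para)
        # space after an open quote, or space before a close quote
        spacey = re.findall(OPEN + r"\s|\s" + CLOSE, marked)
        if spacey or unterminated:
            reports.append((number, marked, len(spacey), unterminated))
    return reports


def restore_ascii(marked):
    """Pass 3: put the ASCII quotes back."""
    return marked.replace(OPEN, '"').replace(CLOSE, '"')


sample = 'He said, " hello there."\n\n"Fine," she said.\n\n"Left open'
for number, marked, spacey_count, unterminated in check_paragraphs(sample):
    print(number, marked, spacey_count, unterminated)
```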

On Wed, Aug 6, 2008 at 6:41 PM, V. L. Simpson  wrote:

> I've gathered up BB's OCR steps into one single file and attached
> some regular expressions for the easy parts.
>
> http://vls.freeshell.org/dpfiles/textchecks.txt
>
> I'm no regexp expert so if anyone else has improvements let me know.
>
> Vance
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From dakretz at gmail.com  Thu Aug  7 22:54:06 2008
From: dakretz at gmail.com (don kretz)
Date: Thu, 7 Aug 2008 22:54:06 -0700
Subject: [gutvol-d] spaceyquotes
Message-ID: <627d59b80808072254q69aa63acwc1b6c59830b1b251@mail.gmail.com>

Here is the set of regexes from my regex file that deal with spaceyquotes.
This is in lieu of counting odd-evens. Doing it both ways is probably a good
idea. Some of these are pretty close to infallible, others need to be
inspected.

-- ' " xxxxxx"' => ' "xxxxxx"'
1,$s/ " \(\a.*\)"/ "\1"/

-- comma-space-quote-space => trim right space
1,$s/, " /, "/g

-- same for a period
1,$s/\. " /\. "/g

-- same for semicolon
1,$s/; " /; "/g

-- same for colon
1,$s/: " /: "/g

-- space-quote-semicolon => trim leading space
1,$s/ ";/";/g

-- space-quote-space-leftparen => trim leading space
1,$s/ " (/" (/g

-- space-quote-rightparen => trim leading space
1,$s/ ")/")/g

-- leftparen-quote-space => trim right space
1,$s/(" /("/g

-- space-quote-dash => remove space
1,$s/ "-/"-/g

-- rightparen-space-quote-space => remove trailing space
1,$s/) " /) "/g

-- no spaceyquote after "and", "the", "is", "a", "as", "was", or a digit
1,$s/\(\s\)and " /\1and "/g
1,$s/\(\s\)the " /\1the "/g
1,$s/\(\s\)is " /\1is "/g
1,$s/\(\s\)a " /\1a "/g
1,$s/\(\s\)as " /\1as "/g
1,$s/\(\s\)was " /\1was "/g
1,$s/\(\d\) " /\1 "/
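
For anyone not working in an ex/vi-style editor, here is a sketch of a few of the rules above rewritten for Python's re module. It is a translation made for illustration, not part of the regex file itself; the rule list is partial and the function name is made up. It also reports a count per rule, so the less reliable rules can still be inspected before the result is trusted.

```python
import re

# (pattern, replacement, note): Python spellings of some of the rules above
RULES = [
    (r', " ', ', "', "comma-space-quote-space"),
    (r'\. " ', '. "', "period-space-quote-space"),
    (r'; " ', '; "', "semicolon-space-quote-space"),
    (r': " ', ': "', "colon-space-quote-space"),
    (r' ";', '";', "space-quote-semicolon"),
    (r' "\)', '")', "space-quote-rightparen"),
    (r'\(" ', '("', "leftparen-quote-space"),
    (r' "-', '"-', "space-quote-dash"),
    (r'(\s)(and|the|is|a|as|was) " ', r'\1\2 "', "common word before quote"),
    (r'(\d) " ', r'\1 "', "digit before quote"),
]


def apply_rules(text):
    """Apply every rule to the whole text and count the hits per rule."""
    counts = {}
    for pattern, replacement, note in RULES:
        text, n = re.subn(pattern, replacement, text)
        if n:
            counts[note] = n
    return text, counts


fixed, counts = apply_rules('He said, " hello ";')
print(fixed)
print(counts)
```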

From schultzk at uni-trier.de  Fri Aug  8 00:05:28 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri, 8 Aug 2008 09:05:28 +0200
Subject: [gutvol-d] bastien
In-Reply-To: <489AEB7A.4080600@perathoner.de>
References: <1872592925.262851218110326603.JavaMail.mail@webmail08>
	<536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>
	<489AEB7A.4080600@perathoner.de>
Message-ID: <687B0D9B-D3BB-435E-B2E6-CDD753A1B891@uni-trier.de>

Hi Marcello,

	As we are OT here I am stepping out.

	I am surprised you did not catch my metaphorical quote
	at the end of my message.

	Much ado about nothing.

	regards
		Keith.

Am 07.08.2008 um 14:32 schrieb Marcello Perathoner:

> Schultz Keith J. wrote:
>
>>     Of course, Michael did not do an exact quote, but a
>> metaphorical one.
>
> Ah! He made it up himself. That may explain things.
>
> But then he should not use quote signs, which always imply somebody  
> is actually being quoted, and the Heine quote is the only one with  
> books and people in existence.
>
> (If you find more of that kind, please tell me. (And I already know  
> "Fahrenheit 451".))
>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d


From jayvdb at gmail.com  Fri Aug  8 00:11:13 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Fri, 8 Aug 2008 17:11:13 +1000
Subject: [gutvol-d] bastien
In-Reply-To: <489AEB7A.4080600@perathoner.de>
References: <1872592925.262851218110326603.JavaMail.mail@webmail08>
	<536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>
	<489AEB7A.4080600@perathoner.de>
Message-ID: 

On Thu, Aug 7, 2008 at 10:32 PM, Marcello Perathoner wrote:
> Schultz Keith J. wrote:
>
>>    Of course, Michael did not do an exact quote, but a metaphorical one.
>
> Ah! He made it up himself. That may explain things.
>
> But then he should not use quote signs, which always imply somebody is
> actually being quoted, and the Heine quote is the only one with books and
> people in existence.
>
> (If you find more of that kind, please tell me. (And I already know
> "Fahrenheit 451".))

There is an assorted bunch here:

http://en.wikiquote.org/wiki/Censorship

There are a few others that hint at the link between books and people.

"Books won't stay banned. They won't burn. Ideas won't go to jail. In
the long run of history, the censor and the inquisitor have always
lost. The only sure way against bad ideas is better ideas. The source
of better ideas is freedom. The surest path to wisdom is liberal
education." ~ Alfred Whitney Griswold

"Assassination is the extreme form of censorship." ~ "The to
Tolerance", in preface of The Shewing-up of Blanco Posnet by George
Bernard Shaw

--
John Mark Vandenberg

From schultzk at uni-trier.de  Fri Aug  8 00:13:33 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri, 8 Aug 2008 09:13:33 +0200
Subject: [gutvol-d] Please change the subject
In-Reply-To: <489AFF62.6050908@novomail.net>
References: <20080803070319.GA11777@ark.in-berlin.de>	<4897A453.4090900@novomail.net>
	<489AD550.7010907@perathoner.de> <489AFF62.6050908@novomail.net>
Message-ID: 

Hi Everybody,

Am 07.08.2008 um 15:57 schrieb Lee Passey:

> Marcello Perathoner wrote:
>
>> On one hand preprocessing the text would save proofers keystrokes.
>> On the other hand preprocessing would reduce proofers attention  
>> and reward.
>
> The notion that proofers should be presented with texts containing  
> a significant number of errors, perhaps artificially seeded if  
> there are not enough, seems to me the height of perversity. I  
> simply cannot imagine that any proofer would feel bored or unfilled  
> if s/he were unable to find an error in a text. The notion boggles  
> my mind. If there are such individuals (which I doubt) they're  
> probably not the kind of detail-oriented people you would want  
> proofing texts.
	The idea that proofers must not be bored and need to find a reward
	reminds me of dogs trained to search for drugs.
	If they do not find anything, drugs are planted on purpose.

	I am utterly fearful of these implications.

>
>> (It's funny how Bowerbird can propose to preprocess text to reduce  
>> errors and at the same time propose to reintroduce deliberate  
>> errors to keep proofers attention up.)
>
> It's funny how people can suggest that bb has proposed deliberately  
> introducing errors, when he has never done so and has vehemently  
> opposed the suggestion whenever it has been made. The only people  
> who have suggested that errors should be deliberately introduced  
> are those who feel threatened by the notion that proofers should  
> start with texts which are as clean as possible, even if it  
> requires a modicum more of effort before publication.
>
	BB never said he would introduce errors deliberately. Somebody
	discussing the use of regexes mentioned that there is a possibility
	of errors being introduced.

	As a matter of fact, according to posts here, it is DP that has
	introduced (or is at least planning to introduce) errors to test
	the proofers' quality.

	regards
		Keith.


From schultzk at uni-trier.de  Fri Aug  8 01:00:31 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri, 8 Aug 2008 10:00:31 +0200
Subject: [gutvol-d] DP code base (was Error detection and maybe
	correction)
In-Reply-To: <489B0109.3090400@verizon.net>
References: <627d59b80808050934v25fb4c56tf4bd74a86a298ad8@mail.gmail.com>
	<489ADAA1.7000101@perathoner.de>
	
	<489B0109.3090400@verizon.net>
Message-ID: 

Hi All, JulietS,

	Let me explain the type of restructuring and refactoring I am
	thinking about. I will try to keep things simple and cut some
	corners here.

	First of all, everything becomes modularized. That is, parts become
	exchangeable. This is possible because the specification defines
	rigid interfaces. That is, we know in what form the input will be
	and what it will look like in the end.

	The system will be layered. So if we need something new, we analyze
	to which layer it belongs and add it there, or create a new layer.
	Then it is integrated into the workflow or process.

	We will need libraries for filtering, searching, editing, statistics,
	markup, whitewashing, deployment, format conversion and display,
	just to name a few.

	Let's look at the editing. There is an editor for the proofing:
		It displays the text
		It displays the scans

	Well, has anybody noticed that we could use this during
	preprocessing, as preprocessing is suggested by some? Yes, I know
	that it is designed right to do the job. The editor needs to be
	redesigned so that we can put together a configurable editor or
	editors. But we would have libraries of the editor routines and
	could reuse those software components.

	Preprocessing could work this way. The input is flagged, marking
	all problematic errors; the editor can then display the text and
	the scans. For the display, say we use the proposed heat maps. Say
	along the way we decide it is better to use markup instead. We do
	not need to rewrite the entire editor, just switch the module for
	displaying the text. The preprocessor will not need to see how we
	flagged or marked the text. The editor does all that work. In the
	end we remove the error markup and pass it on.

	Say we decide to fix only the sure things during preprocessing.
	Just remove the editor, add a correction module, finished. Then we
	notice things do not go so well. Take out the automatic correction,
	put the editor back, and have it show just these errors, ignoring
	the rest.
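
	The idea of swapping the display module without touching the rest of
	the editor can be sketched as follows. This is a minimal illustration
	in Python, not DP code: the names Flag, Editor, markup_display and
	caret_display are hypothetical, and the caret underline is only a
	stand-in for a real heat map. It assumes single-line text and
	non-overlapping flags.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Flag:
    start: int    # offset where the suspect span begins
    end: int      # offset just past the suspect span
    reason: str   # why the preprocessing step flagged it


# A display module is any function taking text and flags, returning a view.
Display = Callable[[str, List[Flag]], str]


def markup_display(text: str, flags: List[Flag]) -> str:
    """Show flags as inline markup: wrap each span in [? ... ?]."""
    out, pos = [], 0
    for f in sorted(flags, key=lambda f: f.start):
        out.append(text[pos:f.start])
        out.append("[?" + text[f.start:f.end] + "?]")
        pos = f.end
    out.append(text[pos:])
    return "".join(out)


def caret_display(text: str, flags: List[Flag]) -> str:
    """Stand-in for a heat map: underline flagged spans with carets."""
    marks = [" "] * len(text)
    for f in flags:
        for i in range(f.start, f.end):
            marks[i] = "^"
    return text + "\n" + "".join(marks).rstrip()


class Editor:
    """The editor only knows the Display interface, not how flags are drawn."""

    def __init__(self, display: Display):
        self.display = display

    def show(self, text: str, flags: List[Flag]) -> str:
        return self.display(text, flags)


if __name__ == "__main__":
    flags = [Flag(0, 3, "unknown word")]
    print(Editor(caret_display).show("tbe cat", flags))
    print(Editor(markup_display).show("tbe cat", flags))
```

	Because the flags are kept apart from the text, "removing the error
	markup" at the end is just a matter of dropping the flag list.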

	Before the efficiency friends start up: flexibility comes at a cost.
	But which would you rather have, the user waiting a second longer,
	or waiting months for the developers to change the system?

	This kind of engineering will take time, especially the theoretical
	gobbledygook, as some call it: at least half a year or a year, if
	not more. What will scare most is that not a single line of code
	will have been written, only heavy discussion.

	I know this will work. I have done it before. I even got paid for it.

	regards
		Keith.



Am 07.08.2008 um 16:04 schrieb Juliet Sutherland:

> The developers who are currently working with the DP code are well  
> aware of the problems with it. They've been gradually working at  
> refactoring it as they touch various pieces of it for other things.  
> My understanding is that certain areas are now significantly better  
> than they were, while others, most notably the user interface, are  
> still in horrible shape.
>
> One constraint that hasn't been mentioned is that we don't know  
> exactly where we will end up, in terms of processes. It's one thing  
> to structure code when you have, or can obtain, exact  
> specifications as to what it is to do. But we don't have those, so  
> our developers have been trying to build in as much flexibility as  
> possible. The original code base made lots of assumptions about how  
> things should/would work which have proven to be insufficient as  
> we've gained experience. Some of those assumptions are still built  
> in to the data structures etc. But on the whole, the efforts of our  
> developers have been to make things as flexible as possible.  
> Starting from scratch would be better, of course, but we are trying  
> to head in the right direction.
>
> Another matter that is often overlooked in discussion of workflow  
> changes at DP is that changing the code is only the very beginning  
> of the effort. We have lots of documentation that has to be updated  
> as well, which is no trivial task. We have almost as few people who  
> are really willing to work on documentation as we have developers  
> who are familiar with the code. And then, of course, there's our  
> large population of volunteers who would also need to be educated  
> about the changes. All of this is not to say that fundamental  
> change is impossible. We've shown that we can make large changes.  
> It's just something that we approach with great care.
>
> JulietS
>

From schultzk at uni-trier.de  Fri Aug  8 01:11:25 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri, 8 Aug 2008 10:11:25 +0200
Subject: [gutvol-d] my spam folder
In-Reply-To: 
References: 
Message-ID: <30C172C6-D039-4ADF-8CC5-2B88677450A3@uni-trier.de>

Come on, BB,

	Marcello did have an interesting and worthwhile post,
	but do we actually need the childish nit-picking?

	If you are REALLY interested, do the work yourself.

	regards
		Keith.

Am 07.08.2008 um 18:37 schrieb Bowerbird at aol.com:

> boy, my spam folder is really lit up this morning...
>
> for those who don't know, i now ignore the following trolls:
> marcello, josh, lee, robert, karen, and a handful of others...
>
> i do that because, in literally _years_ of my experience, they
> almost _never_ advanced the dialog here in a meaningful way.
>
> which is not to say that they would be incapable of doing so.
>
> so please, if _any_ of 'em has said _anything_ that's worthwhile,
> put it into your own words and tell me that they made a point...
>
> otherwise, i'll assume that their input is up to its usual standard,
> and i won't waste even one little bit of my time reading any of it.
>
> with dakretz and maybe rfrank making some real progress now,
> and even vlsimpson (who has been maintaining guiguts lately)
> chipping in on the collaboration, it's no time to bother with trolls.
>
> -bowerbird


From schultzk at uni-trier.de  Fri Aug  8 01:15:43 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri, 8 Aug 2008 10:15:43 +0200
Subject: [gutvol-d] Please change the subject
In-Reply-To: 
References: <1585478277.1793141218119085003.JavaMail.mail@webmail09>
	
Message-ID: <82E07E0D-EF76-4EDC-9157-F6588F35F0B1@uni-trier.de>

No problem with letting them correct errors! It has been suggested
that preprocessing flag all possible errors. P1 would then decide
whether they are valid or not.

Looks to me like we could solve two problems at one time. We only need
to redefine the role of P1. No biggie!? ;-))

regards
	Keith.

Am 07.08.2008 um 19:17 schrieb Andrew Sly:

>
>
>> On Aug 7, 2008, lee at novomail.net wrote:
>>
>> The notion that proofers should be presented with texts containing a
>> significant number of errors, perhaps artificially seeded if there  
>> are
>> not enough, seems to me the height of perversity. I simply cannot
>> imagine that any proofer would feel bored or unfilled if s/he were
>> unable to find an error in a text. The notion boggles my mind. If  
>> there
>> are such individuals (which I doubt) they're probably not the kind of
>> detail-oriented people you would want proofing texts.
>
> On Thu, 7 Aug 2008, Joshua Hutchinson wrote:
>
>> Actually, Lee, that exact sentiment pops up A LOT over at DP  
>> forums.  And not from the "powers-that-be" that BB likes to rail  
>> against (they've stayed out of it to a large extent).  It comes  
>> from brand spanking newbies to crusty old hands.
>>
>> Personally, I don't buy into it, but it is a very recurring thread  
>> of discussion among DP volunteers.
>
>
> I have to say I agree with Josh here. I've had the impression that
> many ppl working in P1 at DP like having something that needs to
> be changed, and like fixing the "obvious" errors.
>
> And since there is always an abundance of available eyes in P1, why
> not let them...
>
> --Andrew


From schultzk at uni-trier.de  Fri Aug  8 01:25:55 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri, 8 Aug 2008 10:25:55 +0200
Subject: [gutvol-d] thursday's child has far to go
In-Reply-To: 
References: 
Message-ID: <123AE3FA-07EB-435C-8F94-2CEB1847CA5D@uni-trier.de>


Am 08.08.2008 um 02:07 schrieb Bowerbird at aol.com:

> keith said:
> >   I see no reason why it can not be broken up into individual files (pages).
> >   It is only up to the analysis software to handle when we need the other
> >   page  and can act accordingly.  Agreed, it makes simpler programming,
> >   but from the human interactor it makes no difference.
>
> actually, it is from the point of the software that "it makes no difference".
> but we want to give users the option of handling the text with another tool.
> that's the main reason to solidify the whole book into one file, along with
> the fact that this generally gives us a file between 200k and 900k, which is
> quite manageable in terms of machine processing power and our bandwidth.
> because of the convenience, the cost-benefit ratio tips us to the single-file.
> a text-file for every page creates too much file clutter.
	So what is the problem? Suppose we develop tools that work with the
	text page by page. How hard would it be to concatenate the pages,
	and how long would it take you?
	Your argument is moot.

	regards
		Keith.


From schultzk at uni-trier.de  Fri Aug  8 01:32:55 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri, 8 Aug 2008 10:32:55 +0200
Subject: [gutvol-d] bastien
In-Reply-To: <87hc9wgpzg.fsf@altern.org>
References: <1872592925.262851218110326603.JavaMail.mail@webmail08>
	<536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>
	<87hc9wgpzg.fsf@altern.org>
Message-ID: 

As I have mentioned in another post, what is a quote and what is not
we can discuss in Literature 101.

I wonder who is censoring whom. Truth is not a universal concept.

regards
	Keith.


Am 08.08.2008 um 05:11 schrieb Bastien:

> "Schultz Keith J."  writes:
>
>> 	Of course, Michael did not do an exact quote, but a metaphorical
>> 	one.
>
> Well, quoting is quoting.  Quotes clearly say: I'm quoting something.
> If you're quoting from memory and you're not sure, then it's honest to
> say so.  There is no such thing as a "metaphorical quote" (these  
> quotes
> imply I'm referring to what *you* might have in mind with this.)
>
>> 	So much flamming about subtleties!
>
> "When censors change the quotes, they eventually change the truth."
>
> :)
>
> -- 
> Bastien


From marcello at perathoner.de  Fri Aug  8 01:46:56 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 08 Aug 2008 10:46:56 +0200
Subject: [gutvol-d] spaceyquotes
In-Reply-To: <627d59b80808072254q69aa63acwc1b6c59830b1b251@mail.gmail.com>
References: <627d59b80808072254q69aa63acwc1b6c59830b1b251@mail.gmail.com>
Message-ID: <489C0800.2060905@perathoner.de>

don kretz wrote:

> Here is the set of regexes from my regex file that deal with spaceyquotes.

When the solution gets that complex, it's usually an indication that the 
algorithm is badly chosen.

Why don't you take 100 well-proofed texts and run a statistical analysis 
of the characters you find around quotes? Assuming iso-8859-1 there are 
only 256 * 256 possible combinations. Sum hits in a table. Normalize 
table into percentiles.

Then scan the dirty text and flag all combinations you never or seldom 
found before.
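
A rough sketch of that idea in Python, for illustration only. The function names and the threshold value are made up for this example, and plain relative frequencies stand in for the percentile table, which is a simplification.

```python
from collections import Counter

QUOTE = '"'


def quote_contexts(text):
    """Yield (index, (char_before, char_after)) for every quote in text."""
    for i, ch in enumerate(text):
        if ch == QUOTE:
            # Treat the start and end of the text like a line break.
            before = text[i - 1] if i > 0 else "\n"
            after = text[i + 1] if i + 1 < len(text) else "\n"
            yield i, (before, after)


def build_table(clean_texts):
    """Sum hits per (before, after) pair and normalize to frequencies."""
    counts = Counter()
    for text in clean_texts:
        counts.update(ctx for _, ctx in quote_contexts(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ctx: n / total for ctx, n in counts.items()}


def flag_rare(dirty_text, table, threshold=0.01):
    """Return the quotes whose context was never or seldom seen before."""
    return [(i, ctx) for i, ctx in quote_contexts(dirty_text)
            if table.get(ctx, 0.0) < threshold]


if __name__ == "__main__":
    table = build_table(['He said, "Hello." Then "Bye."'])
    for index, context in flag_rare('He said, " Hello."', table):
        print(index, repr(context))
```

With only one short training text the table is far too sparse to be useful; the point of using many well-proofed texts is that legitimate combinations get seen often enough to pass the threshold.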


> -- space-quote-semicolon => trim leading space
> 1,$s/ ";/";/g

In French typography you often find space before (semi)colon.



-- 
Marcello Perathoner
webmaster at gutenberg.org


From schultzk at uni-trier.de  Fri Aug  8 01:57:44 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri, 8 Aug 2008 10:57:44 +0200
Subject: [gutvol-d] spaceyquotes
In-Reply-To: <489C0800.2060905@perathoner.de>
References: <627d59b80808072254q69aa63acwc1b6c59830b1b251@mail.gmail.com>
	<489C0800.2060905@perathoner.de>
Message-ID: 

Hi Marcello,

	There are actually few possible combinations.
	But we do not want to get into that.

	Furthermore, there are better ways of resolving this.

	Algorithms more often depend on the resources at hand.
	Here, regex.

	For this kind of work there is gawk, flex, bison, yacc, etc.

	Keith.

	
Am 08.08.2008 um 10:46 schrieb Marcello Perathoner:

> don kretz wrote:
>
>> Here is the set of regexes from my regex file that deal with  
>> spaceyquotes.
>
> When the solution gets that complex, its usually an indication that  
> the algorithm is badly chosen.
>
> Why don't you take 100 well-proofed texts and run a statistical  
> analysis of the characters you find around quotes? Assuming  
> iso-8859-1 there are only 256 * 256 possible combinations. Sum hits  
> in a table. Normalize table into percentiles.
>
> Then scan the dirty text and flag all combinations you never or  
> seldom found before.
>
>
>> -- space-quote-semicolon => trim leading space
>> 1,$s/ ";/";/g
>
> In French typography you often find space before (semi)colon.
>
>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org


From marcello at perathoner.de  Fri Aug  8 04:20:25 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 08 Aug 2008 13:20:25 +0200
Subject: [gutvol-d] spaceyquotes
In-Reply-To: 
References: <627d59b80808072254q69aa63acwc1b6c59830b1b251@mail.gmail.com>	<489C0800.2060905@perathoner.de>
	
Message-ID: <489C2BF9.9060808@perathoner.de>

Schultz Keith J. wrote:

>     algoriths are more often dependant on resources at hand.
>     Here regex.

I suppose it is tempting, if the only tool you have is a hammer, to 
treat everything as if it were a nail.


>     For this kind of work there is gawk, flex, bison, yacc, etc.

For real regex work your choices are: sed, awk and perl.

perl can do regexes quite nicely besides providing real computing power 
where you need it.



-- 
Marcello Perathoner
webmaster at gutenberg.org


From bzg at altern.org  Fri Aug  8 04:46:47 2008
From: bzg at altern.org (Bastien Guerry)
Date: Fri, 08 Aug 2008 06:46:47 -0500
Subject: [gutvol-d] ...
In-Reply-To:  (Schultz
	Keith J.'s message of "Fri, 8 Aug 2008 10:32:55 +0200")
References: <1872592925.262851218110326603.JavaMail.mail@webmail08>
	<536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>
	<87hc9wgpzg.fsf@altern.org>
	
Message-ID: <87y737afvc.fsf_-_@altern.org>

"Schultz Keith J."  writes:

> As I have mentioned in other post. What is a quote and not we
> can discuss in literature 101.

Everything you said about Michael's "quote" looks driven by the wish to
defend him.  I agree that there is nothing terribly wrong with wrong
quotes (especially on a small mailing list), but there is something
wrong in saying that wrong quotes are ok because they are metaphorical
quotes...  Culture might be about mixing things together, but hardly
about distorting them.

-- 
Bastien

From jayvdb at gmail.com  Fri Aug  8 07:15:22 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Sat, 9 Aug 2008 00:15:22 +1000
Subject: [gutvol-d] online, offline, and bothline
In-Reply-To: 
References: 
Message-ID: 

On Fri, Aug 8, 2008 at 10:06 AM,   wrote:
> in deciding who to listen to, and how to weight things they say,
> i'd like to ask you, especially you, dkretz, to consider how closely
> you will listen to people who haven't programmed this tool-chain.
>
> and to think about how closely you will listen to _me_...
>
> you've got a good start with flex, because it operates online and off,
> inside the browser and out, so you don't have to waste time thinking
> about whether you are an online process or an offline one, or whether
> you operate inside the browser or outside of it.  this is all to the good.
>
> what this means, of course, in terms of your own thinking about it
> is that your app will live and work in both worlds at the same time.
>
> the resources -- text and images -- need to exist online in the cloud,
> _and_ offline on the hard-disks of any user who chooses to grab 'em.
>
> i have already -- of course -- set up a basic system that operates as so,
> a cyberlibrary that is built in a systematic way so as to be exploited as such.
>
> the base u.r.l. for the library is this:
>>   http://z-m-l.com/go/
>
> after that, each book is named with a unique 5-letter prefix
> -- let's call it "bname", for book-name or bowerbird-name --
> and the files are loaded into a folder with that unique bname.
>>   http://z-m-l.com/go/bname
>
> so, for instance, if the bname is "mabie", the folder would be:
>>   http://z-m-l.com/go/mabie
>
> after that, there are certain certainties, e.g., the .zml file is named as:
>>   http://z-m-l.com/go/bname/bname.zml
>
> so, for the "mabie" book, it would be this:
>>   http://z-m-l.com/go/mabie/mabie.zml
>
> there are a handful of books loaded into this system already:
>>   http://z-m-l.com/go/mabie/mabie.zml
>>   http://z-m-l.com/go/myant/myant.zml
>>   http://z-m-l.com/go/sgfhb/sgfhb.zml
>>   http://z-m-l.com/go/woodc/woodc.zml
>>   http://z-m-l.com/go/mount/mount.zml
>
> furthermore, by perusing the directory of each of these unique folders,
> you can see what resources compose the book.  files are named using
> the _page-number_ of the page in the printed-book, so are predictable.
>
> the main file that's associated with each page-number is the image-file,
> the scan of the page.  these filenames are also completely predictable...
>
> for instance, the scan of page 123 in the "mabie" book has this name:
>>   http://z-m-l.com/go/mabie/mabiep123.jpg
>
> continuing, the scan of page 123 in the "myant" book has this name:
>>   http://z-m-l.com/go/myant/myantp123.png
>
> or, per the template, the scan for page 123 in the "bname" book is:
>>   http://z-m-l.com/go/bname/bnamep123.png
>
> notice that the image-type will depend on what it is...
>
> the "p" in front of the "123" stands for "page", of course.
>
> for forward-matter pages, they are prefixed by an "f", like:
>>   http://z-m-l.com/go/myant/myantf009.png
>
> replacing the ".png" or ".jpg" in each of the above lines with ".html"
> will display a page with the text on one side, the image on the other.
>
> and take a look at the folder contents at:
>>   http://z-m-l.com/go/myant
>
> notice how unnumbered pages in the body of the book are named, like:
>>   http://z-m-l.com/go/myant/myantp046x2a.png
>
> so one way of ascertaining the files that make up the book is to grab
> the folder-listing and sorting the filenames to get the sequence info...
>
> in addition, however, in a belt-and-suspenders type of redundancy,
> the information is also contained in the text-file itself, as we will see.
>
> if you grab the .zml file for any of those books, and split the contents
> on [space-openbrace-openbrace], the top line of each item will have
> the filename of the page-scan for that page.  so, for instance, you see
> that it gives us this, for the "myant" book, along about page 45...
>
>>    {{myantp045.png}}
>>    {{myantp046.png}}
>>    {{myantp046x2a.png}}
>>    {{myantp046x2b.png}}
>>    {{myantp047.png}}
>>    {{myantp048.png}}
>
> ***
>
> ok, so what does all of this mean?
>
> with a filenaming convention like the one that i've laid out here,
> your offline tool can have the smarts to build a bridge to the cloud.
>
> give it the unique bname for the book, and it knows where to get
> (1) the .zml file, which informs it (2) which files come in what order.
>
> the tool can then download all the image-files, in the background,
> and save them in the appropriately-named bname folder _locally_,
> after which the book exists both in the cloud and on their machine.
>
> so i will suggest that the next iteration of your program be able to
> do this downloading so as to represent the book on your machine.

This is almost exactly how the pagescan sets are externally visible on
Wikisource, and from my very small experience with DP, it looks like
they have a methodical structure as well.  Essentially you are
creating a well-defined interface for retrieving the text & images.

What you have described, and built, is slightly simpler and a lot
neater than we have at Wikisource.
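
To make the interface concrete, here is a small Python sketch of the naming convention quoted above. It is an illustration only: the function names are hypothetical, the three-digit zero padding is inferred from examples such as "myantf009.png", and the regular expression for the {{...}} page markers is an assumption about the .zml file rather than a description of its full format. Unnumbered pages such as "myantp046x2a.png" are not covered by scan_url.

```python
import re

BASE = "http://z-m-l.com/go/"


def book_folder(bname):
    """Folder holding every resource for the book named bname."""
    return BASE + bname


def zml_url(bname):
    """Location of the book's .zml text file."""
    return f"{BASE}{bname}/{bname}.zml"


def scan_url(bname, page, ext="png", prefix="p"):
    """Page scan URL; prefix 'p' for body pages, 'f' for forward matter."""
    return f"{BASE}{bname}/{bname}{prefix}{page:03d}.{ext}"


def page_view_url(scan):
    """Swap the image extension for .html to get the text-plus-image page."""
    return re.sub(r"\.(png|jpg)$", ".html", scan)


def scan_names(zml_text):
    """List the page-scan filenames in the order they appear in the text."""
    return re.findall(r"\{\{([^{}]+)\}\}", zml_text)


if __name__ == "__main__":
    print(zml_url("mabie"))
    print(scan_url("mabie", 123, ext="jpg"))
    print(page_view_url(scan_url("myant", 9, prefix="f")))
```

An offline tool could then fetch zml_url(bname), run scan_names over the result, and download each listed file from book_folder(bname) into a local folder of the same name.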

> this filenaming infrastructure obviates the need for any _database_
> to store the locations of the various resources that comprise a book,
> and it introduces transparency and clarity to make it comprehensible
> to even fairly naive observers who want to add value to the equation...

The need for a database comes in when you massively distribute the
task.  Breaking the text into pages, which you have said is
undesirable in this day and age, is also helpful to avoid conflicts,
but not necessary and it introduces other headaches.

Now on the topic of database or not, there is a third option :- a
massively distributed "database," that has recently been built ...
git.

http://en.wikipedia.org/wiki/Git_(software)

There are other version control systems that might do the trick, but I
think this is the most malleable, and thus suitable for the task.

If the PG etext database was rebuilt as a git repository, the many
disparate transcription projects would be able to "talk" a common
structure/language which has the intrinsic ability to allow fluid
interchange of etexts and fixes/changes/whatever, with etexts being
pushed to PG once they are completed.

For example, bowerbird could expose his "blood mountain" work onto a
public server, and tell me to grab it.  I could then go offline and
play around with it and then email him a few revisions that I was able
to make (git allows transmission of diffs via email).  He would import
those into his repo, and then someone else might pull the revised
work, and make some further improvements.  After a few people have
done this, he could then post-process it and flag it as ready for PG
to grab it.  At some point PG could _pull_ it from bowerbird.

The beauty of git is that it is flexible about workflow.  revisions
can flow in any directions, so the _people_ determine the process.

If someone gets hit by a bus, the process can adapt to the natural
change in people that results.  If two groups want to interact
differently from the typical workflow, the same framework is still
used with their workflow being different. (For example, DP might be
authorised to _push_ etexts to PG because they are known to do their
copyright homework.)  If one content provider proves to be unreliable,
nobody bothers to _pull_ their work any more, and their changes become
isolated and the rest move on.

With a git-based repo, copyright clearance could be implemented as an
intermediary step.  Instead of submitting evidence via a webform, I
could expose my work to someone PG trusts, who then imports it and
verifies it is acceptable before pushing it into PG.  If two people
are required to verify the copyright, it is a simple matter of putting
a second person between submitter and the final PG acceptance repo.

For over a year now I have wondered how to feed to PG the "fixes" that
Wikisource editors have made to PG etexts, as it doesn't seem well
publicised; I've also been frying other fish so I haven't bothered to
ask.  I can see the process to upload etexts is documented:

http://www.gutenberg.org/wiki/Gutenberg:Public_Domain_eBook_Submission_How-To

But there is no "Corrections How-To" in here:

http://www.gutenberg.org/wiki/Category:How-To

How would I submit a single revision to PG?  How would I go about
submitting hundreds of revisions feasibly?  Is the revision history of
an etext retained?

--
John Mark Vandenberg

From azkar0 at gmail.com  Fri Aug  8 07:31:05 2008
From: azkar0 at gmail.com (Scott Olson)
Date: Fri, 8 Aug 2008 08:31:05 -0600
Subject: [gutvol-d] online, offline, and bothline
In-Reply-To: 
References: 
	
Message-ID: <2362473e0808080731j723b6a4en2cfebf8d9cb953e8@mail.gmail.com>

>
> For over a year now I have wondered how to feed to PG the "fixes" that
> Wikisource editors have made to PG etexts, as it doesnt seem well
> publicised; I've also been frying other fish so I havent bothered to
> ask.  I can see the process to upload etexts is documented:
>
>
> http://www.gutenberg.org/wiki/Gutenberg:Public_Domain_eBook_Submission_How-To
>
> But there is no "Corrections How-To" in here:
>
> http://www.gutenberg.org/wiki/Category:How-To
>
> How would I submit a single revision to PG.  How would I go about
> submitting hundreds of revisions feasibly?  Is the revision history of
> an etext retained?


It's buried in the Reader's FAQ:
http://www.gutenberg.org/wiki/Gutenberg:Readers%27_FAQ#R.26._I.27ve_found_some_obvious_typos_in_a_Project_Gutenberg_text._How_should_I_report_them.3F

The FAQ says to send errata to the work's submitter, but I think that's now
mostly handled by errata at pglaf.org.

On 8/8/08, John Vandenberg  wrote:
>
> On Fri, Aug 8, 2008 at 10:06 AM,   wrote:
> > in deciding who to listen to, and how to weight things they say,
> > i'd like to ask you, especially you, dkretz, to consider how closely
> > you will listen to people who haven't programmed this tool-chain.
> >
> > and to think about how closely you will listen to _me_...
> >
> > you've got a good start with flex, because it operates online and off,
> > inside the browser and out, so you don't have to waste time thinking
> > about whether you are an online process or an offline one, or whether
> > you operate inside the browser or outside of it.  this is all to the
> good.
> >
> > what this means, of course, in terms of your own thinking about it
> > is that your app will live and work in both worlds at the same time.
> >
> > the resources -- text and images -- need to exist online in the cloud,
> > _and_ offline on the hard-disks of any user who chooses to grab 'em.
> >
> > i have already -- of course -- set up a basic system that operates as so,
> > a cyberlibrary that is built in a systematic way so as to exploited as
> such.
> >
> > the base u.r.l. for the library is this:
> >>   http://z-m-l.com/go/
> >
> > after that, each book is named with a unique 5-letter prefix
> > -- let's call it "bname", for book-name or bowerbird-name --
> > and the files are loaded into a folder with that unique bname.
> >>   http://z-m-l.com/go/bname
> >
> > so, for instance, if the bname is "mabie", the folder would be:
> >>   http://z-m-l.com/go/mabie
> >
> > after that, there are certain certainties, e.g., the .zml file is named
> as:
> >>   http://z-m-l.com/go/bname/bname.zml
> >
> > so, for the "mabie" book, it would be this:
> >>   http://z-m-l.com/go/mabie/mabie.zml
> >
> > there are a handful of books loaded into this system already:
> >>   http://z-m-l.com/go/mabie/mabie.zml
> >>   http://z-m-l.com/go/myant/myant.zml
> >>   http://z-m-l.com/go/sgfhb/sgfhb.zml
> >>   http://z-m-l.com/go/woodc/woodc.zml
> >>   http://z-m-l.com/go/mount/mount.zml
> >
> > furthermore, by perusing the directory of each of these unique folders,
> > you can see what resources compose the book.  files are named using
> > the _page-number_ of the page in the printed-book, so are predictable.
> >
> > the main file that's associated with each page-number is the image-file,
> > the scan of the page.  these filenames are also completely predictable...
> >
> > for instance, the scan of page 123 in the "mabie" book has this name:
> >>   http://z-m-l.com/go/mabie/mabiep123.jpg
> >
> > continuing, the scan of page 123 in the "myant" book has this name:
> >>   http://z-m-l.com/go/myant/myantp123.png
> >
> > or, per the template, the scan for page 123 in the "bname" book is:
> >>   http://z-m-l.com/go/bname/bnamep123.png
> >
> > notice that the image-type will depend on what it is...
> >
> > the "p" in front of the "123" stands for "page", of course.
> >
> > for forward-matter pages, they are prefixed by an "f", like:
> >>   http://z-m-l.com/go/myant/myantf009.png
> >
> > replacing the ".png" or ".jpg" in each of the above lines with ".html"
> > will display a page with the text on one side, the image on the other.
> >
> > and take a look at the folder contents at:
> >>   http://z-m-l.com/go/myant
> >
> > notice how unnumbered pages in the body of the book are named, like:
> >>   http://z-m-l.com/go/myant/myantp046x2a.png
> >
> > so one way of ascertaining the files that make up the book is to grab
> > the folder-listing and sort the filenames to get the sequence info...
> >
> > in addition, however, in a belt-and-suspenders type of redundancy,
> > the information is also contained in the text-file itself, as we will see.
> >
> > if you grab the .zml file for any of those books, and split the contents
> > on [space-openbrace-openbrace], the top line of each item will have
> > the filename of the page-scan for that page.  so, for instance, you see
> > that it gives us this, for the "myant" book, along about page 45...
> >
> >>    {{myantp045.png}}
> >>    {{myantp046.png}}
> >>    {{myantp046x2a.png}}
> >>    {{myantp046x2b.png}}
> >>    {{myantp047.png}}
> >>    {{myantp048.png}}
> >
> > ***
> >
> > ok, so what does all of this mean?
> >
> > with a filenaming convention like the one that i've laid out here,
> > your offline tool can have the smarts to build a bridge to the cloud.
> >
> > give it the unique bname for the book, and it knows where to get
> > (1) the .zml file, which informs it (2) which files come in what order.
> >
> > the tool can then download all the image-files, in the background,
> > and save them in the appropriately-named bname folder _locally_,
> > after which the book exists both in the cloud and on their machine.
> >
> > so i will suggest that the next iteration of your program be able to
> > do this downloading so as to represent the book on your machine.
>
> This is almost exactly how the pagescan sets are externally visible on
> Wikisource, and from my very small experience with DP, it looks like
> they have a methodical structure as well.  Essentially you are
> creating a well-defined interface for retrieving the text & images.
>
> What you have described, and built, is slightly simpler and a lot
> neater than we have at Wikisource.
>
> > this filenaming infrastructure obviates the need for any _database_
> > to store the locations of the various resources that comprise a book,
> > and it introduces transparency and clarity to make it comprehensible
> > to even fairly naive observers who want to add value to the equation...
>
> The need for a database comes in when you massively distribute the
> task.  Breaking the text into pages, which you have said is
> undesirable in this day and age, is also helpful to avoid conflicts,
> but not necessary and it introduces other headaches.
>
> Now on the topic of database or not, there is a third option :- a
> massively distributed "database," that has recently been built ...
> git.
>
> http://en.wikipedia.org/wiki/Git_(software)
>
> There are other version control systems that might do the trick, but I
> think this is the most malleable, and thus suitable for the task.
>
> If the PG etext database was rebuilt as a git repository, the many
> disparate transcription projects would be able to "talk" a common
> structure/language which has the intrinsic ability to allow fluid
> interchange of etexts and fixes/changes/whatever, with etexts being
> pushed to PG once they are completed.
>
> For example, bowerbird could expose his "blood mountain" work onto a
> public server, and tell me to grab it.  I could then go offline and
> play around with it and then email him a few revisions that I was able
> to make (git allows transmission of diffs via email).  He would import
> those into his repo, and then someone else might pull the revised
> work, and make some further improvements.  After a few people have
> done this, he could then post-process it and flag it as ready for PG
> to grab it.  At some point PG could _pull_ it from bowerbird.
>
> The beauty of git is that it is flexible about workflow.  revisions
> can flow in any directions, so the _people_ determine the process.
>
> If someone gets hit by a bus, the process can adapt to the natural
> change in people that results.  If two groups want to interact
> differently from the typical workflow, the same framework is still
> used with their workflow being different. (For example, DP might be
> authorised to _push_ etexts to PG because they are known to do their
> copyright homework.)  If one content provider proves to be unreliable,
> nobody bothers to _pull_ their work any more, and their changes become
> isolated and the rest move on.
>
> With a git based repo, copyright clearance could be implemented as an
> intermediary step.  Instead of submitting evidence via a webform, I
> could expose my work to someone PG trusts, who then imports it and
> verifies it is acceptable before pushing it into PG.  If two people
> are required to verify the copyright, it is a simple matter of putting
> a second person between submitter and the final PG acceptance repo.
>
> For over a year now I have wondered how to feed to PG the "fixes" that
> Wikisource editors have made to PG etexts, as it doesn't seem well
> publicised; I've also been frying other fish so I haven't bothered to
> ask.  I can see the process to upload etexts is documented:
>
>
> http://www.gutenberg.org/wiki/Gutenberg:Public_Domain_eBook_Submission_How-To
>
> But there is no "Corrections How-To" in here:
>
> http://www.gutenberg.org/wiki/Category:How-To
>
> How would I submit a single revision to PG?  How would I go about
> submitting hundreds of revisions feasibly?  Is the revision history of
> an etext retained?
>
> --
> John Mark Vandenberg

From rfrank at pobox.com  Fri Aug  8 08:13:24 2008
From: rfrank at pobox.com (Roger Frank)
Date: Fri, 08 Aug 2008 09:13:24 -0600
Subject: [gutvol-d] spacey quotes
Message-ID: <489C6294.9060509@pobox.com>

I took Don Kretz's regexes and put them in the cpprep code.
When I put them in before any of my checks, I get this result:

  978 start of line spaced double quote
  317 dak3
  185 double spaces
  162 dak2
  109 dak1
   84 dak4
   80 false paragraph break suspect
   38 top line added
   24 single quote spacing, type 2
   23 suspect start of line
   22 end of line spaced double quote
   22 single quote spacing, type 1
   14 start page asterisk added
   14 double quote spacing, type 2
   14 end page asterisk added
   11 missing space?
    9 ellipsis character error
    8 double quote spacing, type 1
    7 unhandled unicode
    7 spaced question mark
    4 spaced exclamation
    3 dak5
    3 spaced punctuation
    3 dak11
    2 dak14
    2 initial semicolon
    2 dak18
    1 spaced double-punctation
    1 too many dashes
    1 three dashes
    1 dak17
    1 dak19

All the reports labelled dak* are his regexes, numbered consecutively.
Now to take the same routines and put them after my regexes. With
that done, here is the run output:

  978 start of line spaced double quote
  560 double quote spacing, type 1
  185 double spaces
  138 double quote spacing, type 2
   80 false paragraph break suspect
   38 top line added
   24 single quote spacing, type 2
   23 suspect start of line
   22 end of line spaced double quote
   22 single quote spacing, type 1
   14 start page asterisk added
   14 end page asterisk added
   11 missing space?
    9 ellipsis character error
    7 spaced question mark
    7 unhandled unicode
    4 spaced exclamation
    3 dak11
    3 spaced punctuation
    2 initial semicolon
    1 three dashes
    1 exclamation quote
    1 too many dashes
    1 dak14
    1 spaced double-punctation

The interesting part is that dak11 hit three times and dak14 hit once.
These were ones I missed completely.

dak11 is "space-quote-dash => remove space" (three hits)
dak14 is "the  quote" (one hit)

the dak11's before fix:
[dak11]: get my hat and sacque; and then "--a rosy flush stole up to
[dak11]: grand match "--she heard him spoken of as a wealthy South-
[dak11]: would have been but one escape "--quite unconsciously she slid

the dak14 before fix:
[dak14]: to you. You are all alone in the "world, you see. Of course

Those are all errors, detected correctly by dakretz's regexes. I'm going
to bake them in to the next version. Thanks, Don.
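
A minimal Python sketch of the two checks described above. The patterns are reconstructed from the one-line descriptions and the sample hits in this message; they are assumptions, not the actual contents of Don's regex file or of cpprep. Only dak11 gets an automatic fix here, because the right correction for dak14 is not stated.

```python
import re

# Hypothetical reconstructions of the two rules, based only on the
# descriptions "space-quote-dash => remove space" and "the  quote".
DAK11 = re.compile(r' "--')      # space, double quote, two hyphens
DAK14 = re.compile(r'\bthe "')   # the word "the", a space, a double quote


def fix_dak11(line: str) -> str:
    """Drop the space in front of a double quote that is followed by a dash."""
    return DAK11.sub('"--', line)


def flag_dak14(line: str) -> bool:
    """Report (but do not change) a line where a quote opens right after 'the'."""
    return DAK14.search(line) is not None


samples = [
    'grand match "--she heard him spoken of as a wealthy South-',
    'to you. You are all alone in the "world, you see. Of course',
]
for line in samples:
    print(fix_dak11(line), "| dak14 suspect:", flag_dak14(line))
```

Keeping each rule as a named pattern makes it easy to count hits per rule, as in the reports above.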

--Roger Frank

From dakretz at gmail.com  Fri Aug  8 09:14:19 2008
From: dakretz at gmail.com (don kretz)
Date: Fri, 8 Aug 2008 09:14:19 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 28
In-Reply-To: 
References: 
Message-ID: <627d59b80808080914m42e68532nf9a954b6ead382ef@mail.gmail.com>

>
> ---------- Forwarded message ----------
> From: Marcello Perathoner 
> To: Project Gutenberg Volunteer Discussion 
> Date: Fri, 08 Aug 2008 10:46:56 +0200
> Subject: Re: [gutvol-d] spaceyquotes
> don kretz wrote:
>
>  Here is the set of regexes from my regex file that deal with spaceyquotes.
>>
>
> When the solution gets that complex, it's usually an indication that the
> algorithm is badly chosen.
>
> Why don't you take 100 well-proofed texts and run a statistical analysis of
> the characters you find around quotes? Assuming iso-8859-1 there are only
> 256 * 256 possible combinations. Sum hits in a table. Normalize table into
> percentiles.
>
> Then scan the dirty text and flag all combinations you never or seldom
> found before.
>
>
>  -- space-quote-semicolon => trim leading space
>> 1,$s/ ";/";/g
>>
>
> In French typography you often find space before (semi)colon.
>
>
>
> --
> Marcello Perathoner
> webmaster at gutenberg.org
>
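
The statistical approach suggested in the quoted message can be sketched roughly as follows. This is a simplified illustration, not anyone's existing tool: it counts the (character before, character after) pair around each double quote in known-good texts, then flags pairs in a dirty text whose share of the total falls below a threshold. The function names and the `min_share` cutoff are hypothetical, and a plain share of the total stands in for the percentile table mentioned above.

```python
from collections import Counter

QUOTE = '"'


def quote_contexts(text):
    """Yield (char_before, char_after) for every double quote in text."""
    padded = "\n" + text + "\n"   # so a quote at either end still has neighbours
    for i, ch in enumerate(padded):
        if ch == QUOTE:
            yield padded[i - 1], padded[i + 1]


def build_table(clean_texts):
    """Sum the hits for each context pair over a set of well-proofed texts."""
    table = Counter()
    for text in clean_texts:
        table.update(quote_contexts(text))
    return table


def flag_rare(dirty_text, table, min_share=0.001):
    """Return (line number, pair, line) for contexts seldom or never seen."""
    total = sum(table.values()) or 1
    hits = []
    for lineno, line in enumerate(dirty_text.splitlines(), start=1):
        for pair in quote_contexts(line):
            if table[pair] / total < min_share:
                hits.append((lineno, pair, line))
    return hits
```

With a small training set almost every pair will look rare, so the threshold only becomes meaningful once the table is built from a large body of clean text.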

I've found that the Natural Language Toolkit addresses this and many,
many similar strategies (and others equally useful, but not similar). There
is also a textbook covering what the toolkit does, how and why, and how to
write software using it, integrated with a fairly complete tutorial on
Python - sufficient, in my case, to learn Python and the toolkit
simultaneously. If nothing else, it helps me put your targeted suggestions
into a conceptual framework.

How can I help you help us apply your suggestions?

From dakretz at gmail.com  Fri Aug  8 09:46:05 2008
From: dakretz at gmail.com (don kretz)
Date: Fri, 8 Aug 2008 09:46:05 -0700
Subject: [gutvol-d] regex revision
Message-ID: <627d59b80808080946nc2fb086r1d98aac481a7cb58@mail.gmail.com>

I've posted a new copy of my regex file with many revisions.

A couple notes (assertions that in my mind almost go without saying):

Any software I write to use these (or anyone else's regexes, rules,
algorithms, whatever) needs to separate those elements into separate files
from the UI, so they can be selectively made available by language, project,
purpose, etc.

No question these are ad hoc, heuristic elements that probably are
redundant, overlap and duplicate one another, and are in general badly
organized and documented. It's evolving, and any additions, contradictions,
objections, suggestions, and help are welcome and will be considered and
included where appropriate. Caveat emptor. (And that applies to the BB and
rfrank contributions, as soon as I get a chance.)

(Hmmm ... cute ... today is 080808. I'm probably among the last to notice. I
wonder what the Chinese numerological consequences are?)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: re.vim.080808
Type: application/octet-stream
Size: 1441 bytes
Desc: not available
URL: 

From Bowerbird at aol.com  Fri Aug  8 12:35:02 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 8 Aug 2008 15:35:02 EDT
Subject: [gutvol-d] spacey quotes -- silly season
Message-ID: 

i detailed how to fix spacey quotes -- and determine
other problems revolving around quotation marks --
in an earlier post.   my methodology works, very well...

any future programmers looking in on this thread 
would do good to ignore the current "silly season"
surrounding this topic and go back to my message.

i also find it humorous -- in an ironic sort of way --
that some of the people who called me "a troll" for
the years that i've been discussing preprocessing
now -- all of a sudden -- would have you believe
that _they_ are the "experts" in how to do it...

-bowerbird




From Bowerbird at aol.com  Fri Aug  8 13:17:42 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 8 Aug 2008 16:17:42 EDT
Subject: [gutvol-d] online, offline, and bothline
Message-ID: 

john said:
>    This is almost exactly how the pagescan sets are externally visible 
>    on Wikisource, and from my very small experience with DP, 
>    it looks like they have a methodical structure as well.

d.p. has "a methodical structure" of sorts, but it's _not_ like mine.

d.p. file-location information is kept in a database that's exposed
to the human user via the "project page" for every specific book...
since you don't have access to the database, the only way you can
get that information is to scrape the project page, which i've done,
but it's essentially far too much work to get what _could_ be simple.


>    Essentially you are creating a well-defined interface 
>    for retrieving the text & images.

exactly.

and it's even better than developing an a.p.i., in my opinion, because
you don't need programming chops to benefit from it, or to add value.


>    The need for a database comes in 
>    when you massively distribute the task.

no, it doesn't.   and i'll show you that, as time goes on...

essentially, you just follow the same general philosophy,
of forming naming conventions that allow you to attain
the stuff that you need, without resorting to a database.
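
A minimal sketch of how a tool could derive resource locations from the naming convention laid out earlier in this thread, with no database lookup. The function names are hypothetical, and the three-digit zero padding is an assumption read off the examples (p123, f009, p045). Unnumbered pages such as p046x2a are not generated here; they are read back from the {{...}} markers in the .zml text instead.

```python
import re

BASE_URL = "http://z-m-l.com/go/"

# A page-scan marker such as {{myantp046x2a.png}}
SCAN_NAME = re.compile(r"\{\{([^{}]+)\}\}")


def zml_url(bname: str) -> str:
    """Location of the text file for a book, e.g. .../mabie/mabie.zml"""
    return f"{BASE_URL}{bname}/{bname}.zml"


def page_url(bname: str, page: int, kind: str = "p", ext: str = "png") -> str:
    """Location of one scan; kind is 'p' (body page) or 'f' (forward-matter)."""
    return f"{BASE_URL}{bname}/{bname}{kind}{page:03d}.{ext}"


def scan_names(zml_text: str) -> list:
    """Page-scan filenames in book order, as listed inside the .zml text."""
    return SCAN_NAME.findall(zml_text)


print(zml_url("mabie"))
print(page_url("mabie", 123, ext="jpg"))
print(page_url("myant", 9, kind="f"))
```

Swapping the image extension for ".html" in a generated address gives the side-by-side text-and-image page mentioned in the same description.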


>    Now on the topic of database or not, 
>    there is a third option :- a massively distributed "database," 
>    that has recently been built ... git.

again, we have a tremendous gulf in our philosophies...

once you start adding in things like version control systems,
the number of people who grok your system takes a nosedive.

so if it's not necessary -- and it's not -- then i vote against it.


>    If the PG etext database was rebuilt as a git repository, 
>   the many disparate transcription projects would be able to 
>   "talk" a common structure/language which has 
>   the intrinsic ability to allow fluid interchange of etexts and 
>   fixes/changes/whatever, with etexts being pushed to PG 
>   once they are completed.

from this observer's perspective, the chances of p.g. implementing
that kind of system are small, very small, extremely small, maybe nil.


>    The beauty of git is that it is flexible about workflow.  revisions
>    can flow in any directions, so the _people_ determine the process.

perhaps my vision is limited, but i don't see the need for such a
complex system.   what i have termed "the march to perfection"
for an e-text is a simple procedure, with steps that grow small
very quickly, and then become nearly impossibly tiny minutiae.

version control systems -- even those focusing on documents --
are targeted at _living_documents_, ones that are "alive" in that
they are constantly evolving and being changed and updated...

that's not what we're doing here.   we're working on "dead" text.
we're representing something that was considered as "finished"
a _long_ time ago.   it was already poured into a physical mold...

so our texts _can_ -- and _will_ -- get to a point where we can
confidently say that their vast bulk will never be changed again.

in fact, i'd say it's our responsibility to take a text _to_ that point
before we ever release it to the public at large in the first place...

so we don't _need_ a "version control system", thank you much...


>    For over a year now I have wondered how to feed to PG 
>    the "fixes" that Wikisource editors have made to PG etexts, 

the p.g. system for updating e-texts is filled with major stupidity.


>    How would I submit a single revision to PG?

e-mail it to them.   check back a year later to see if it's been fixed.


>    How would I go about submitting hundreds of revisions feasibly?

they suggest you send them an updated copy of the file...


>    Is the revision history of an etext retained?

not really.   sometimes the old version is retained, so you could
execute your own compare operation between the two e-texts.
but sometimes the old version isn't even retained...

furthermore, there's no good way to ascertain whether your copy
of an e-text is equivalent to the latest copy available, except to
download the current one and compare it to your existing copy...

-bowerbird




From marcello at perathoner.de  Fri Aug  8 13:49:17 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 08 Aug 2008 22:49:17 +0200
Subject: [gutvol-d] online, offline, and bothline
In-Reply-To: 
References: 
	
Message-ID: <489CB14D.7060600@perathoner.de>

John Vandenberg wrote:

> Now on the topic of database or not, there is a third option :- a
> massively distributed "database," that has recently been built ...
> git.

That's still a file based system.

What we need is an XML based system, where we can make updates down to 
the granularity of single XML elements. There's no need to transfer the 
whole work if only one paragraph is going to be changed.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From dakretz at gmail.com  Fri Aug  8 13:55:23 2008
From: dakretz at gmail.com (don kretz)
Date: Fri, 8 Aug 2008 13:55:23 -0700
Subject: [gutvol-d] pre-, mid-, post-, and other processing in general
Message-ID: <627d59b80808081355t2f7dc2f2gdcdd0403f3bb22fd@mail.gmail.com>

I think the single most important issue to keep in mind as we evaluate the
effectiveness and productivity associated with various techniques and
processes is whether, and how, we measure things.

If there's a single reason the bird-assault has been able to maintain
momentum, it's because all the numbers have been coming from one direction.
The dp system, as currently designed and operated, simply has had no
objective way to respond. The argument has been so bloody because the
resistance has had only subjective, anecdotal arguments to make.

Recently there have been several individuals who have taken on the mission
of painfully constructing quality measures based on a very limited, atypical
set of custom projects, mostly without access to the dp data sources. I
personally suspect it's because those who *do* have access don't have the
ability to generate quantitative information (not that they simply don't
deign to spend their valuable personal time on it); but whichever the
reason, everyone involved would be so much better off if we could open up
the system and the process and the data sources to responsible analysis, and
create a planning process based on measurable results. It's not that hard.

Same thing for various tools and methods, whether originating from BB,
rfrank, myself, or anyone else. The standard question should be: Show me the
numbers.

From Morasch at aol.com  Fri Aug  8 14:32:15 2008
From: Morasch at aol.com (Morasch at aol.com)
Date: Fri, 8 Aug 2008 17:32:15 EDT
Subject: [gutvol-d] preprocessing changes made "blindly"
Message-ID: 

let's talk for a minute -- not long, i promise -- about
exactly what types of change you can make "blindly",
just so we can see how uninteresting these cases are...

one example of a routine that you can run "blindly" is
deleting the space in front of a spacey-question-mark.

in "in her own right", there were 8 such instances:
>    languid: "Been away, somewhere, haven't you ?
>    if Gaspard, his particular waiter, missed him ?
>    the Duvals didn't keep an eye on Greenberry Point ?
>    "You are determined ?--Very well, then, come
>    "But you're not quite sure ?--oh! modest man!"
>    moment, will you ?--you're hipped on it!"
>    "Than your Southern ancestors ?--isn't that
>    will be: 'Come over and see us, won't you ?'"

now, like i said, you can do this change "blindly", and
you probably won't be burned by it.   but, to my mind,
anyway, i like to see a list of the hits, as shown above,
so i can scope out each line to make sure it looks right.

on cases like these, i won't bother to check the image,
because it's clear enough what needs to be done here
-- there's no way the image could "change my mind" --
but i _do_ like to see the changes that are being made.

and i don't have to see each instance on its own page;
a list, just like the one above, is more than adequate...
(and no, we do not need surrounding "context" lines.)

so there's another wrinkle on the interface we desire,
an ability to execute a list of changes in one fell swoop.
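
A minimal sketch of that list-then-apply interface for the spacey-question-mark case, assuming the text is held as a list of lines. The function names are hypothetical; the idea is only that the hits are shown as one list first and the whole batch is changed afterwards.

```python
import re

SPACEY_QMARK = re.compile(r" \?")   # a space directly in front of a question mark


def list_hits(lines):
    """Return (line number, line) for every line containing a spacey question mark."""
    return [(n, line) for n, line in enumerate(lines, start=1)
            if SPACEY_QMARK.search(line)]


def apply_all(lines):
    """Apply the fix to every line in one pass."""
    return [SPACEY_QMARK.sub("?", line) for line in lines]


text = [
    'languid: "Been away, somewhere, haven\'t you ?',
    '"You are determined ?--Very well, then, come',
    "a line that needs no change",
]
for n, line in list_hits(text):
    print(n, line)          # review the list first
text = apply_all(text)      # then change them all at once
```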

***

or, for something out of the ordinary, but still _boring_,
also from "in her own right", we have very strange stuff.

i've mentioned this before, but the quotemarks here are
very weird.   there are lots of cases where the quotemarks
are just plain _wrong_ -- instead of being a close-quote
attached to the previous word, they are an _open-quote_
that is attached to the _next_ word.   they are not spacey,
which would be common.   they are actually outright wrong.

i honestly cannot remember that i've seen that before, ever!

but i know for sure i've never seen it happen several times
in a book, let alone dozens and dozens of times, like here.

take the construction [space][doublequote]said:   "said

that seems like an extremely improbable construction...

yet, in the "o.c.r." that came from "in her own right",
this construction occurred a full 47 times, appended.

examine these cases, and you'll find that _all_ of them
are errors.   so i suspect that this is not "an o.c.r. error",
but rather a bug in the routine to fix spacey-quotemarks.

so in this case, i would do a global correction, and blindly.
(but yes, it'd be best to just fix the bug, if that's what it is.)
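
A minimal sketch of such a global correction, assuming the intended fix is to move the quotemark back onto the preceding word. The pattern is an illustration for this one construction only and is not taken from any existing routine.

```python
import re

# [space][doublequote]said  ->  [doublequote][space]said
MISPLACED_QUOTE = re.compile(r' "said\b')


def fix_said(line: str) -> str:
    """Re-attach a close-quote that was wrongly glued to the word 'said'."""
    return MISPLACED_QUOTE.sub('" said', line)


print(fix_said('"Damn! "said Croyden.'))
print(fix_said('"Josephine! "said Dick, "here is Mr. Croyden,'))
```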

***

nonetheless, this little bug gives us a good plumbing tool.
we just need to think of these errors as being "embedded
intentionally, to see if the proofers were paying attention"...

as i said, the o.c.r. had this strange construction 47 times.

after p1, the occurrences dropped to 13.   (p1 got 34/47, 72%.)

after p2, there were 4 occurrences left.   (p2 caught 9/13, 69%.)

a proofing-accuracy rate of .7 is what we're getting here, and
that's not a bad rate.   it's also pretty close to what you always
get from d.p. people, as i can say having looked at a lot of these
types of "errors"... given .7 accuracy, two rounds goes to 91%,
while three rounds zooms you to 97%, and four goes to 99%...
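
The arithmetic behind those percentages, as a small sketch. It assumes each round independently catches the same fraction of whatever errors remain, which is a simplification; the function name is hypothetical.

```python
def cumulative_catch_rate(per_round: float, rounds: int) -> float:
    """Fraction of the original errors caught after the given number of rounds."""
    return 1.0 - (1.0 - per_round) ** rounds


for n in range(1, 5):
    print(n, round(100 * cumulative_catch_rate(0.7, n), 1))
# 1 70.0
# 2 91.0
# 3 97.3
# 4 99.2
```

The same formula gives the share left behind: 30% of the errors survive one round and 9% survive two.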

but why not catch and fix all of those [space][doublequote]said
errors _automatically_, so base accuracy starts at a higher level?

there's no good reason why we shouldn't do that.

***

anyway, as just noted, here's the best reason _not_ to embed
any intentional errors in an e-text, or even precaution "flags"
-- because 30% of them will go unnoticed by the p1 proofers,
and 9% will persist even after having gone in front of p2 people.

-bowerbird

>    "Blocks, seh! "said the negro. "'Tain't no
>    "Yass, seh! Yass, seh! "said the porter, and,
>    "Josephine! "said Dick, "here is Mr. Croyden,
>    "Oh, Lord! the old dragoon! "said Leigh. "I
>    "I'll double! "said Miss Tilghman.
>    "Very interesting! "said Macloud. "Very interesting,
>    "I see! "said Macloud, laughing. "What time
>    "Come on! "said Macloud, adjusting the stirrups.
>    "We can see them again! "said Croyden. "The
>    "Sure! "said Macloud. "I'll follow your voice
>    "No, I didn't lose anything! "said Croyden
>    "You go to the devil! "said Croyden. "She
>    "Damn! "said Croyden.
>    "Hum! "said Macloud. "So you're coming
>    "Granted! "said Macloud. "But how are we
>    "Neither! "said Croyden. "There is another
>    "Mr. Secretary! "said Rickrose, "my friends
>    "Greenbury Point! "said the Secretary, vaguely
>    "Very good! "said Croyden. "Have you any
>    "My dear Mr. Croyden! "said Axtell, "I don't
>    "What! "said Croyden.
>    "Just so! "said Croyden.
>    "I thought as much! "said Croyden. "Well,
>    "Shut up! "said Croyden. "I don't care to
>    "Very good! "said Macloud--" you're the one
>    "We wish you a very good day! "said Croyden.
>    "It's a foolish hunt, anyway! "said Croyden.
>    "Friends! "said Macloud. "Are there such
>    "Proceed! "said Croyden. "We are arriving,
>    "Hi!--I sut'n'y does! seh, I sut'n'y does! "said
>    "If you please, yes! "said Macloud.
>    "Yes, I have! "said Croyden.
>    "And that is like the Duvals! "said she. "It
>    "But, seriously! "said Macloud, "it would be
>    "I couldn't see it! "said Macloud. "I noticed
>    "Oh! damn Northumberland! "said Croyden.
>    "Not at all! "said Croyden. "It's no worse
>    "A woman! You're safe! "said Macloud. "He
>    "Oh, very well! "said Croyden. "Can you
>    us! "said Elaine. "I suppose it's scarcely proper
>    "It is, indeed! "said Elaine as she saw the table,
>    "That, for the police! "said Croyden, snapping
>    "This looks natural! "said Elaine. "We must
>    "There! "said he, as he arose. "Pirate's gold
>    "You are! "said Macloud. "I never saw a
>    "Alone! "said Croyden, bending over her.
>    "Nothing! "said Croyden.




From Morasch at aol.com  Fri Aug  8 14:39:54 2008
From: Morasch at aol.com (Morasch at aol.com)
Date: Fri, 8 Aug 2008 17:39:54 EDT
Subject: [gutvol-d] pre-, mid-, post-, and other processing in general
Message-ID: 

dakretz said:
>    The standard question should be: Show me the numbers.

once you get the right tool, you will know you are moving,
and you will even know that you are moving _fast_, and you
will cease to feel much need to prove it with a speedometer.

i love data.

but as i just said, the "march to perfection" has steps that
get smaller and smaller and smaller and smaller as you go.
at some point (in time and space), a g.p.s. unit is immaterial.

-bowerbird




From Bowerbird at aol.com  Fri Aug  8 14:49:27 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 8 Aug 2008 17:49:27 EDT
Subject: [gutvol-d] online, offline, and bothline
Message-ID: 

so, dkretz, when will you have a new version of "twisted" that will
download the resource files from the cloud and store 'em locally?

might as well review the upload capability when you figure out the
download capability, since we'll need to upload any corrected text.

not rushing you, just wondering when we will be able to expect it?,
because i'll wait to pontificate on that until it will make more sense.

-bowerbird




From ajhaines at shaw.ca  Fri Aug  8 15:01:48 2008
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Fri, 8 Aug 2008 15:01:48 -0700
Subject: [gutvol-d] Revisions to PG etexts (was online, offline,
	and bothline)
References: 
Message-ID: <003901c8f9a2$5b272790$6401a8c0@ahainesp2400>

>   For over a year now I have wondered how to feed to PG
>   the "fixes" that Wikisource editors have made to PG etexts,

Errata reports can be submitted via email to errata_AT_pglaf.org.  Check the PG contacts page at 
http://www.gutenberg.org/wiki/Gutenberg:Contact_Information

Also take a look at PG's FAQ pages at http://www.gutenberg.org/wiki/Gutenberg:Readers%27_FAQ - 
specifically R.26, R.27 and R.28.

It should be noted that the main thing PG's errata system lacks is people.  Currently I think there 
are only 3-5 (and probably fewer) people (the Whitewasher team) handling errata reports.  The WWers 
(of which I'm one) are fully stretched with producing their own submissions and WWing others' 
submissions.  In their spare time (ha!), two of the WWers are also working their way through PG's 
old etexts, generally bringing them up to current standards, and moving them into PG's new folder 
structure.  (I'm working through 1996's etexts, another WWer through 1997's.)


>   the p.g. system for updating e-texts is filled with major stupidity.

I won't comment on this except to say that it's beneath comment.



>>   How would I submit a single revision to PG.
>  e-mail it to them.  check back a year later to see if it's been fixed.

>>   How would I go about submitting hundreds of revisions feasibly?
>  they suggest you send them an updated copy of the file...

Only minimally correct - read the above FAQ articles.



>>   Is the revision history of an etext retained?

>  not really.  sometimes the old version is retained, so you could
>  execute your own compare operation between the two e-texts.
>  but sometimes the old version isn't even retained...

Wrong.  Previous versions are retained, and archived into a folder below that which contains the new 
version.  The new version contains the etext's original Release date (e.g. May 1996), and the 
Posting date of the new version (e.g. August 5, 2008).   (For most of the previously-mentioned old 
etexts, there were no intervening revisions.)

Read FAQ articles R.35 and R.36 for more.



Al





From dakretz at gmail.com  Fri Aug  8 15:06:56 2008
From: dakretz at gmail.com (don kretz)
Date: Fri, 8 Aug 2008 15:06:56 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 30
In-Reply-To: 
References: 
Message-ID: <627d59b80808081506p7fa58a36r28ebd8754f1d525@mail.gmail.com>

>> so, dkretz, when will you have a new version of "twisted" that will
>> download the resource files from the cloud and store 'em locally?
>>
>> might as well review the upload capability when you figure out the
>> download capability, since we'll need to upload any corrected text.
>>
>> not rushing you, just wondering when we will be able to expect it?,
>> because i'll wait to pontificate on that until it will make more sense.
>>
>> -bowerbird

Well, you need to describe the process a little bit.

I wouldn't be surprised (though I haven't tested it) if what you already
have couldn't take a url as easily as a filesystem path (although you'll
need to type it in - you can't browse to it.) As long as the directory is
readable, it will fill the page table, and read the files.

Saving them locally is an option, but a bit fraught, setting up where,
checking for filesystem space, handling write permissions, etc.; and if you
have a fast net connection (mine is 150mbps) there's not much point with the
images, especially if they are cached.

Then, what to do with the text files? You want to enforce read-only on the
host. Even locally, it sounds like you don't care much, but I don't want to
overwrite the originals.

From Bowerbird at aol.com  Fri Aug  8 15:14:56 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 8 Aug 2008 18:14:56 EDT
Subject: [gutvol-d] the reg-ex collection
Message-ID: 

dkretz said:
>    No question these are ad hoc, heuristic elements that 
>    probably are redundant, and overlap, duplicate, and 
>    are in general badly organized and documented. 
>    It's evolving, and any additions, contradictions, objections, 
>    suggestions, and help is welcomed and will be considered 
>    and included where appropriate. Caveat emptor.

it's very tempting to look at this as "a collection",
and feel a natural desire to try to make it bigger.

understandable, yes.   but the wrong path to take.

you want the smallest set of these that does the job
yet has the benefit of having the fewest false-alarms.

every false-alarm burns human energy inefficiently...

-bowerbird




From dakretz at gmail.com  Fri Aug  8 15:22:10 2008
From: dakretz at gmail.com (don kretz)
Date: Fri, 8 Aug 2008 15:22:10 -0700
Subject: [gutvol-d] Bird's - what's it? Cream Puff?
Message-ID: <627d59b80808081522q67517dc2kbd7973c89ade0acb@mail.gmail.com>

I'm getting nervous about the variety of text you've used. All I've seen so
far (partially due to the same nervous problem I have with DP's variety of
test projects) is that we'll end up with something ideal for trivial
projects. The kind I never work on, unfortunately.

Are you equally confident about:

Encyclopedia Britannica?
LOTE projects?
Tables?
How do you handle TOC/Index/Appendix type stuff?
Mathematics?
Chemistry?
Poetry, creatively indented?
Drama?
Dictionaries?
Embedded Greek?
Scripture?
Mixed languages? (I remember an English narrative with pages of French
footnotes for references.)
Diaries/Chronicles?
Newspapers (e.g. "Notes and Queries" or "Punch")?
Footnotes? Sidenotes?
etc.
etc.

From Bowerbird at aol.com  Fri Aug  8 15:38:02 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 8 Aug 2008 18:38:02 EDT
Subject: [gutvol-d] Revisions to PG etexts (was online, offline,
	and bothline)
Message-ID: 

al said:
>    It should be noted that the main thing PG's errata system lacks is people.

um, no, the _main_ thing that the system lacks is a coherent infrastructure,
one that specifically includes change-logs.

and the second major thing missing is _the_scans_ of most of the books...


>    Currently I think there are only 3-5 (and probably fewer) people 
>    (the Whitewasher team) handling errata reports.
>    The WWers (of which I'm one) are fully stretched with 
>    producing their own submissions and WWing others' submissions.

the insiders always blame the absence of progress on a lack of insiders...
with a better system for handling errata, there would be more volunteers.


>   I won't comment on this except to say that it's beneath comment.

aren't we all superior today?

i _will_ comment on this, to say that i have made _lots_ of comments
on the stupidity of the p.g. errata system, and i have backed up those
"comments" with hard-nosed examples, and laid out nice blueprints
on how the job _should_ be done.   you can find them in the archives.

and if _anyone_ -- specifically including mr. haines here -- would like
to discuss those old posts, you will find that i am quite willing to do so.


>   Only minimally correct - read the above FAQ articles.

right.   because everything happens _exactly_ like it says in the f.a.q.

let me give some clear direction to john vandersource (new nickname!):
before you embark on any large plan to send errata to the p.g. team,
do some simple tests to see if they can handle what you intend to send.
be prepared to face the possibility that they cannot, and thus that you
should decide _not_ to spend your time on such a pointless endeavor.

-bowerbird




From dakretz at gmail.com  Fri Aug  8 15:38:42 2008
From: dakretz at gmail.com (don kretz)
Date: Fri, 8 Aug 2008 15:38:42 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 30
In-Reply-To: <627d59b80808081506p7fa58a36r28ebd8754f1d525@mail.gmail.com>
References: 
	<627d59b80808081506p7fa58a36r28ebd8754f1d525@mail.gmail.com>
Message-ID: <627d59b80808081538m630169ebofe387e2ceb381e9c@mail.gmail.com>

>> dkretz said:
>> >    No question these are ad hoc, heuristic elements that
>> >    probably are redundant, and overlap, duplicate, and
>> >    are in general badly organized and documented.
>> >    It's evolving, and any additions, contradictions, objections,
>> >    suggestions, and help is welcomed and will be considered
>> >    and included where appropriate. Caveat emptor.
>>
>> it's very tempting to look at this as "a collection",
>> and feel a natural desire to try to make it bigger.
>>
>> understandable, yes.   but the wrong path to take.
>>
>> you want the smallest set of these that does the job
>> yet has the benefit of having the fewest false-alarms.
>>
>> every false-alarm burns human energy inefficiently...
>>
>> -bowerbird
>>

My experience is that there is an inverse relationship between precision and
coverage.
That's not defending my list, or saying it shouldn't be as simple as
possible. But
you either end up with more rules, or rules with increasingly obscure
clauses appended.

Make it as simple as possible. But no simpler. That's a quote.

From Bowerbird at aol.com  Fri Aug  8 16:06:57 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 8 Aug 2008 19:06:57 EDT
Subject: [gutvol-d] Bird's - what's it? Cream Puff?
Message-ID: 

dkretz said:
>   My experience is that there is an inverse relationship 
>    between precision and coverage. That's not defending my list, 
>    or saying it shouldn't be as simple as possible. But you either end up 
>    with more rules, or rules with increasingly obscure clauses appended.

you've essentially come to the same point as me;
that you got there from the opposite direction is
of little consequence.


>    Make it as simple as possible. But no simpler. That's a quote.

again, i said this in the opposite order -- do the job,
but do it as simply as possible -- so there is no cause
(or utility) for disagreement among us.

i am quite sure that you will come to detest false-alarms
as much as i do, and do as much as possible to avoid 'em.

think of it as a challenge to your creativity...           :+)


>    I'm getting nervous about the variety of text you've used.

"nervous"?

why in the world would you get "nervous"?

this isn't rocket science.   we won't lose any sleep over it...

my approach -- which starts with zero-based budgeting --
is that when you find an error in a book, you devise a routine
that would have captured that error automatically (if you can).

you test that routine, and if it proves its relative value across
several books, with a minimum of false-alarms, it's a keeper.

if you encounter different types of text, with different errors,
the errors themselves will teach you what routines are needed.

until then, don't try to "imagine" what errors _might_ pop up...
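
the keep-or-discard test described above can be sketched roughly as follows.
this is only an illustration, not anyone's actual tool: the function name,
the shape of the `books` mapping, and the 20% false-alarm threshold are all
assumptions made up for the example.

```python
import re


def evaluate_routine(pattern, books, max_false_alarm_rate=0.2):
    """Run one error-finding regex over several books and tally the results.

    `books` maps a title to (text, known_errors), where known_errors is the
    set of line numbers a human proofer actually had to correct.
    (Hypothetical data layout, chosen only for this sketch.)
    """
    regex = re.compile(pattern)
    hits = 0
    false_alarms = 0
    for text, known_errors in books.values():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                if lineno in known_errors:
                    hits += 1          # routine flagged a real error
                else:
                    false_alarms += 1  # routine wasted a human's attention
    flagged = hits + false_alarms
    rate = false_alarms / flagged if flagged else 0.0
    # A "keeper" finds at least one real error and stays under the threshold.
    keep = hits > 0 and rate <= max_false_alarm_rate
    return {"hits": hits, "false_alarms": false_alarms, "keep": keep}
```

a routine that never fires on a real error, or that fires mostly on clean
lines, comes back with `keep` set to False and would be dropped from the set.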

***

and, to repeat...   you're putting the emphasis in the wrong place.
the list of error-finding routines is _not_ what is most important.
the _tool_ that lets you handle the text and the images is the key.
go make that part first.   once you've got that, the rest will follow...

-bowerbird




From Bowerbird at aol.com  Fri Aug  8 16:32:43 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 8 Aug 2008 19:32:43 EDT
Subject: [gutvol-d] online, offline, and bothline
Message-ID: 

dkretz said:
>    Well, you need to describe the process a little bit.

um...   i already described it, quite fully.


>    I wouldn't be surprised (though I haven't tested it) 
>    if what you already have couldn't take a url 
>    as easily as a filesystem path 
>    (although you'll need to type it in - you can't browse to it.) 
>    As long as the directory is readable, it will fill the page table, 
>    and read the files.

didn't work when i tried it.   so when can i expect an update?
one that you have tested, and which you firmly believe works?

i'd like one that works with my files -- the ones i pointed you to --
but i'd be happy to settle for one that works with files on your site...

but until i can see the capability to download a book from the cloud
and have it represented on my local hard-drive, we can't do much...


>    Saving them locally is an option, but a bit fraught, 

saving them locally is not really "an option", it's a requirement,
at least to my way of thinking, because this is an _offline_task_.
you shouldn't need a persistent internet connection to do this...


>    Saving them locally is an option, but a bit fraught, 
>    setting up where, checking for filesystem space, 
>    handling write permissions, etc.; 

well, you make it sound all so complicated.   and maybe it is...
for you...   with my development environment, it's quite simple.


>    and if you have a fast net connection (mine is 150mbps) 
>    there's not much point with the images, especially if they are cached.

there's an essential philosophical cornerstone here you're missing.

this is an offline task, and we want the resources comprising a book
to be available to the offline user, in a way that they are _solid_ on
the hard-drive of that machine, solid and accessible and useable with
any piece of software that that particular user might want to bring in.

i don't want a scan that the browser can find, or your program alone.
i want a scan that sits as an image-file in a directory on my machine,
right next to all the other scans from the same book, thank you kindly.

that's a _basic_ of what i'm talking about.   it's not a negotiable item...


>    Then, what to do with the text files? 
>    You want to enforce read-only on the host. 
>    Even locally, it sounds like you don't care much, 
>    but I don't want to overwrite the originals.

i've hinted at that, and i will get into it more deeply as we proceed,
but there's no use getting so far ahead of ourselves for right now,
especially now while i'm still busy here at the national poetry slam.
(by the way, it's been an awesome week here.   great-quality poetry!)

if you do as i said, and wrap the individual .txt files into a single file,
and download that whole-book text-file with the scans, we'll be fine.


>    but I don't want to overwrite the originals.

the whole-book text-file _is_ "the originals", and is _not_ overwritten.

it will all become very clear, very simple, very transparent, in its time,
so just trust.   or, if you don't want to just trust, then go ahead and sit,
and next week when i have more time i'll be able to explain everything.

-bowerbird

p.s.   i haven't had time to generate material in the form required for
your program.   do you have an example .zip file we could download?

p.p.s.   and could you _please_ stop using the digest subject-headers?




From dakretz at gmail.com  Fri Aug  8 19:53:49 2008
From: dakretz at gmail.com (don kretz)
Date: Fri, 8 Aug 2008 19:53:49 -0700
Subject: [gutvol-d] Bird,
Message-ID: <627d59b80808081953m75e88687iccdf50e3b6b92aeb@mail.gmail.com>

First, slow down. Think. Do you remember where you put the meds?

OK, one thing at a time.

1. Does the current version load your projects, if they are already located
on your local PC?

2. I'm not trying to duplicate the functionality of Banana Cream. What I
have is an existing application that renames image files properly for
submitting to PG. I added a new form to it, to be able to view the text and
image together, and preprocess texts the way I'm currently comfortable doing
it, with my EB-specific regexes. Pretty soon I'll be able to load regexes
from the external file you've already seen, and execute them as I described;
which includes the mode(s) you described, plus one or two others you may not
hold in as high regard. No problem.

3. You have additional preprocessing capabilities that you will include in
your software. If it's really, really easy to implement the same rules as
regexes, I may add them to my file. If it's not easy (ok, it may be easy for
you, but say it's not easy for me, as determined by me), then I have high
expectations that I can just use Banana Cream.

4. I have already added (but not recompiled and posted) a version that
concatenates the pages into one file. That took about 15 minutes, as you
would imagine it would take, to write:
    for each file in directory/*.txt {
        read text from file;
        append text to big string;
    }

5. You have a repository that stores text and images in some probably
compatible structure on a host. You want my program to add the capability to
transfer files from a host. I do that using an ftp utility. It would be
perhaps nice, and non-disruptive, to have my app transfer files. How about
if you send me a gmail address you would like to use for the purpose, and
I'll add it to the svn server where the source is posted (mentioned above),
and you can add it yourself? If you wait for me to do it, it will take
longer probably, given priorities and Real Life and stuff.
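
For reference, the page-concatenation loop from item 4 could look something
like the following in Python. This is a sketch only, not the code in the
application discussed here: the function name and the assumption that page
files sort correctly by filename (001.txt, 002.txt, ...) are mine.

```python
from pathlib import Path


def concatenate_pages(directory, separator=""):
    """Join every *.txt page file in `directory` into one big string.

    Assumes (hypothetically) that the page files are named so that a plain
    filename sort puts them in reading order, e.g. 001.txt, 002.txt, ...
    """
    pages = sorted(Path(directory).glob("*.txt"))
    return separator.join(page.read_text(encoding="utf-8") for page in pages)
```

Because the function only reads the page files and returns a new string, the
originals are never overwritten; writing the result out to a whole-book file
is left to the caller.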

From jayvdb at gmail.com  Fri Aug  8 21:24:08 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Sat, 9 Aug 2008 14:24:08 +1000
Subject: [gutvol-d] online, offline, and bothline
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 9, 2008 at 6:17 AM,   wrote:
> john said:
>>   This is almost exactly how the pagescan sets are externally visible
>>   on Wikisource, and from my very small experience with DP,
>>   it looks like they have a methodical structure as well.
>
> d.p. has "a methodical structure" of sorts, but it's _not_ like mine.
>
> d.p. file-location information is kept in a database that's exposed
> to the human user via the "project page" for every specific book...
> since you don't have access to the database, the only way you can
> get that information is to scrape the project page, which i've done,
> but it's essentially far too much work to get what _could_ be simple.

Indeed, but my point is that they have a structure.  Page scraping is
feasible if we collaborate on building a tool that does it.  The first
incarnations of a "wiki" API were built around html scraping, which
was a nightmare because the wiki software grew considerably and the
html was constantly changing.  The _big_ difference is that _lots_ of
people worked on the page-scraping "wiki" API, meaning it was well
maintained.  DP doesn't have many people interested in interacting in a
disengaged manner.  As far as I can see, the majority of people are
happy to operate via the web based interface.

To get an idea of the scale of automated edits in the wiki world, the
majority of edits on English Wikipedia are done by bots:

http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits

And wikisource has three bots which outrank humans:

http://stats.wikimedia.org/wikisource/EN/TablesWikipediaEN.htm#wikipedians

Most wikis end up like this, as humans invariably figure out a way to
make someone else do the grunt work for them, and software developers
invariably like to build smart grunts. The smarter the better, for all
involved.

>>   Essentially you are creating a well-defined interface
>>   for retrieving the text & images.
>
> exactly.
>
> and it's even better than developing an a.p.i., in my opinion, because
> you don't need programming chops to benefit from it, or to add value.
>
>
>>   The need for a database comes in
>>   when you massively distribute the task.
>
> no, it doesn't.  and i'll show you that, as time goes on...
>
> essentially, you just follow the same general philosophy,
> of forming naming conventions that allow you to attain
> the stuff that you need, without resorting to a database.

You have balked at the use of the word "database".

I appreciate the quick response, but you need to go back and read my
email, in total, and read about git.

>>   Now on the topic of database or not,
>>   there is a third option :- a massively distributed "database,"
>>   that has recently been built ... git.
>
> again, we have a tremendous gulf in our philosophies...
>
> once you start adding in things like version control systems,
> the number of people who grok your system takes a nosedive.

Excuse me?  You are talking at length about the undocumented nature of
the PG repo.  I am telling you how to make that repo more
understandable by putting it into a system that adds a "language" of
revisions.

The people that currently understand the PG repo are the only ones
that would _need_ to understand the _PG repo hosted on git_, and we
could probably even set it up so that those people didn't need to
understand much of it at all.

> from this observer's perspective, the chances of p.g. implementing
> that kind of system are small, very small, extremely small, maybe nil.

They don't need to.  Again, this is a feature of git.  It is, for want
of a better term that conveys some sense of it, a distributed
database.  PG doesn't need to implement it.  It can and does _grow_
from any node.  git is also designed to allow the history to be
grafted on; initially Linus started with a snapshot of the history up
to the time it went live, and over time the earlier revisions have
been added.

>>   The beauty of git is that it is flexible about workflow.  revisions
>>   can flow in any directions, so the _people_ determine the process.
>
> perhaps my vision is limited,

perhaps I need to seek medication...

> but i don't see the need for such a
> complex system.

PG and DP are already complex systems.  These systems are, as best I
can tell with limited exposure so far:
1) web interfaces with very strict software which dictates the
process, and depends on complex software enhancements to allow process
improvements
2) email addresses with very lax definitions of what is acceptable, in
order to be flexible and forgiving as it is designed for newbies.

Both of those are workflow problems.

> what i have termed "the march to perfection"
> for an e-text is a simple procedure, with steps that grow small
> very quickly, and then become nearly impossibly tiny minutiae.

right.  revisions, revisions, revisions.  The size of these revisions
doesn't make them any less of a revision.

> version control systems -- even those focusing on documents --
> are targeted at _living_documents_, ones that are "alive" in that
> they are constantly evolving and being changed and updated...

Wrong.  version control systems (vcs) are built to distribute a task
across many people, or across time.  PG _has_ a version control
system, albeit constructed of text files, rsync, email, and other
hairy links - and it appears that the PG vcs is _lossy_.  Linus
created a "version control system" which was similar in nature to what
PG is using, only his was not lossy, and it was constructed to deal
with his crazy idea of decentralisation.  He purposely didn't adopt any
of the major vcs available at the time because they were all
fundamentally flawed in that they distributed the task across time,
but they didn't properly distribute it across people.  So he then built
"git", which has no fixed topology built into the system - it is a
tool to facilitate the movement of changes in any direction one
chooses.  A wiki is also a version control system - an odd one I might
add, but nevertheless it has all the features one would expect, and
there are some crazy people using mediawiki for software versioning.

We are all doing version control; we need a language of interchange for
those versions.

Also, consider a "United States Code" etext, or more feasible, a
subtitle of it.  That is a living document.

Consider an unpublished English translation of Heinrich Heine's
Almansor, which seems to be within PG's scope, which might be first
published by PG.  Despite everyone's best intentions, such a work will
not be perfect.  Revisions to it would be desirable.  We want a system
that facilitates that.  (and _scales_ to hundreds of revisions to
various etexts)

> that's not what we're doing here.  we're working on "dead" text.
> we're representing something that was considered as "finished"
> a _long_ time ago.  it was already poured into a physical mold...

More importantly than the fact that we can, and from my perspective should,
consider "living documents" in any revised model, I'm mostly proposing
a common language for us to talk while a work is "etextus infectus" -
i.e. when it is still under construction, with many revisions being
done and more to do.

DP is built as a self-enclosed system which emits completed works, as
PG is a depository of completed works.  That model works well, except
that it puts immense pressure on the DP software to meet all needs,
and DP has many stale projects which other groups might be interested
in.  e.g. a person might want to "take" a DP project, import it into a
client based application to do an entire stage in a disconnected
manner, and then push it back to DP.  This interchange can be managed
by using a distributed database+version control system.

Bowerbird has a different way of working on etexts, Wikisource has
another, other developers on this list are mentioning their own client
side tools in development, and there are hundreds of other smaller
projects and depositories of reasonable size around the place.  If we
can develop a fluid method of moving "etextus infectus" around between
projects, new software will spring up to target more specific needs.
We might find a group that is specifically interested in works that
are riddled with formulae, and they might decide that the DP software
is not sufficient to tackle them en masse, so they build their own
software and take a few projects from DP while they fine tune their
software.  A DPer could then import the etext and inspect it to
determine how many rounds are going to be necessary before it is
finished to their level of satisfaction.

> so our texts _can_ -- and _will_ -- get to a point where we can
> confidently say that their vast bulk will never be changed again.

\o/

My point is that it doesn't happen overnight, and in the meantime, to
converse in a productive manner, we must talk revisions.  All other
talk is offtopic :-)

> in fact, i'd say it's our responsibility to take a text _to_ that point
> before we ever release it to the public at large in the first place...

I am less concerned about having a work 100% correct before it is
released, but that is an unrelated topic, and it is for others to
decide their own responsibilities.  I have strong opinions on this,
but in regards to the direction of Wikisource.  I like different
models to co-exist.

> so we don't _need_ a "version control system", thank you much...

I'm hoping to make you _want_ one.

>>   For over a year now I have wondered how to feed to PG
>>   the "fixes" that Wikisource editors have made to PG etexts,
>
> the p.g. system for updating e-texts is filled with major stupidity.
>
>
>>   How would I submit a single revision to PG.
>
> e-mail it to them.  check back a year later to see if it's been fixed.

I am very bad at workflows that have gaps of longer than a week.

>>   How would I go about submitting hundreds of revisions feasibly?
>
> they suggest you send them an updated copy of the file...
>
>
>>   Is the revision history of an etext retained?
>
> not really.  sometimes the old version is retained, so you could
> execute your own compare operation between the two e-texts.
> but sometimes the old version isn't even retained...
>
> furthermore, there's no good way to ascertain whether your copy
> of an e-text is equivalent to the latest copy available, except to
> download the current one and compare it to your existing copy...

... and you don't want a version control system?  Colour me confused!
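
To make the point concrete: git names each stored file by a hash of its
content, so "is my copy the latest?" becomes a comparison of two short ids
rather than a download-and-diff.  The sketch below reproduces the blob id
that git computes (SHA-1 over a "blob <length>\0" header plus the content).
The helper names are mine, and the idea that the latest id would be
published somewhere alongside the etext is an assumption, not something PG
does today.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id git assigns to a file with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def same_version(local_copy: bytes, latest_id: str) -> bool:
    """Compare a local etext against a published id (hypothetical workflow)."""
    return git_blob_id(local_copy) == latest_id
```

A reader holding an old copy would only need the published id of the current
version to know whether it is worth fetching the file again.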

--
John

From jayvdb at gmail.com  Fri Aug  8 21:41:54 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Sat, 9 Aug 2008 14:41:54 +1000
Subject: [gutvol-d] online, offline, and bothline
In-Reply-To: <489CB14D.7060600@perathoner.de>
References: 
	
	<489CB14D.7060600@perathoner.de>
Message-ID: 

On Sat, Aug 9, 2008 at 6:49 AM, Marcello Perathoner
 wrote:
> John Vandenberg wrote:
>
>> Now on the topic of database or not, there is a third option :- a
>> massively distributed "database," that has recently been built ...
>> git.
>
> That's still a file based system.

That it is.  It is also much more.

> What we need is an XML based system, where we can make updates down to the
> granularity of single XML elements. There's no need to transfer the whole
> work if only one paragraph is going to be changed.

granularity can be built into a git repo.  granularity adds overhead,
as does XML.  I expect we will need XML in the mix, and many
discussions about granularity.

At this stage I'm not so concerned with how we would design the repo,
and what formats would be used.  I'm interested to see who is
interested in a model shift :- towards an ad hoc mesh structure, of
people and projects operating disengaged but with a common language
keeping them unified.

--
John Vandenberg

From jayvdb at gmail.com  Sat Aug  9 01:38:51 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Sat, 9 Aug 2008 18:38:51 +1000
Subject: [gutvol-d] woman in her own right -- 008 (and final)
In-Reply-To: <489AC3D2.2080406@perathoner.de>
References: 
	
	
	<489AC3D2.2080406@perathoner.de>
Message-ID: 

On Thu, Aug 7, 2008 at 7:43 PM, Marcello Perathoner
 wrote:
> He has been antagonized by many people because many people (who actually
> *do* some work for PG) disagree with Bowerbird's ideas.

I can see this.

> If Bowerbird actually
> *did* some work for PG he would soon find out his ideas don't work in real
> life.

The solution, barring censorship, will be found in critically
assessing his output rather than his ideas.

> Sadly, doing some useful work is not in BB's style.

I've heard this from a few people now, on and off the list...

Could someone please clarify whether this is exaggerated a little?

i.e. Has bowerbird added *nothing* to PG except mailing list posts,
which I've found to be plentiful, and sometimes even useful?

(wouldn't it be useful if PG had a mechanism where I could inquire
"show me all contributions from bowerbird"?)

--
John Mark Vandenberg
(who has, thus far, done nothing useful for PG)

From Bowerbird at aol.com  Sat Aug  9 03:45:36 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 9 Aug 2008 06:45:36 EDT
Subject: [gutvol-d] Bird,
Message-ID: 

dkretz said:
>    First, slow down. Think. Do you remember where you put the meds?

i'm at the national poetry slam, dude, the meds are _everywhere_,
in the food, in the water, floating like tiny dust particles in the air.


>   1. Does the current version load your projects, 
>    if they are already located on your local PC?

of course not.   it checks to see if they are already there,
and if they are, it loads 'em.   otherwise it downloads 'em,
and stores them locally so they'll be there the next time...

which, by the way, is the same thing that your browser does.
except your browser doesn't store them in a nice logical place,
with correct filenames, and it even erases them eventually.   ick.

this is a _book_ that i want to have represented on the hard-disk
of a user, in a way that the user _knows_ that they have the book.

see, for me, this isn't just a way to pull down books for proofing.
it is the system used as a means of distribution and propagation.
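
the check-locally-then-download behavior described above can be sketched like
this.  it is a minimal illustration under assumptions, not the code of any
tool mentioned in this thread: the function names, the flat one-folder-per-book
layout, and the idea of passing in an explicit list of filenames are all
made up for the example.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_over_http(url):
    """Default fetcher: return the raw bytes served at `url`."""
    with urlopen(url) as response:
        return response.read()


def ensure_book(base_url, filenames, local_dir, fetch=fetch_over_http):
    """Make sure every file of a book exists in `local_dir`.

    Files already on disk are left alone; missing ones are downloaded from
    `base_url` and stored under the same filename, so the local folder
    mirrors the online one.  Returns the names that had to be downloaded.
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for name in filenames:
        target = local_dir / name
        if not target.exists():
            target.write_bytes(fetch(base_url.rstrip("/") + "/" + name))
            downloaded.append(name)
    return downloaded
```

after the first run, the scans and text sit as ordinary files in one folder,
usable offline by any other program, and a second run downloads nothing.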


>    2. I'm not trying to duplicate the functionality of Banana Cream. 

well of course you don't need to _think_ of it that way.

as long as you think of yourself as making a preprocessing tool,
you'll do just fine.   but of course, when you think of it _that_ way,
you _will_ end up "duplicating the functionality" of banana cream.

unless you've arranged your infrastructure completely differently...


>    What I have is an existing application that renames image files

i know this.   i don't think other people know it, though, so perhaps
you've written this history into your post so that they will learn it...

but i'm not sure why you think it's appropriate for them to learn it.
it doesn't bear much weight, not that i can see.


>    I added a new form to it, to be able to view the text and image 
>    together, and preprocess texts the way I'm currently comfortable 
>    doing it, with my EB-specific regexes. 

well, if you're just making this tool for yourself, that will be just fine.
but once you've scratched your own itch, you'll want to make a tool
that other people use to scratch their itch, and you'll need to decide
how they will obtain the resources that they will need to do the job...

that means the text and the scan-set.

you can just assume that they appeared on the hard-disk magically,
or instead you can incorporate the download ability into your tool...
given the ease of downloading from the cloud, you will do the latter,
if i don't miss my guess...


>    Pretty soon I'll be able to load regexes from the external file 
>    you've already seen, and execute them as I described; 
>    which includes the mode(s) you described, plus one or two 
>    others you may not hold in as high regard. No problem.

well, i guess i have to say it again, but you're putting far too much
emphasis on those reg-ex.   the rest of the tool is more important.

the reg-ex that will work for any specific book are easily discovered,
simply by assessing changes that were made by proofers during p1.

once you've done 20-40 books, you'll have a good idea of the _set_
of reg-ex needed to do the job for the type of books you have done.
if you've done 200-400, you will have an _excellent_ idea of the set...

so spending a lot of time poring over them isn't a good use of time.
that's my opinion.   but do whatever you want...
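
one way to "assess the changes made by proofers" is to diff the raw o.c.r.
against the p1 output and count which substitutions recur.  the sketch below
does that at the word level with python's difflib; the function name and the
word-level granularity are assumptions for illustration, and a real tool
would likely want to look at characters and punctuation too.

```python
import difflib
from collections import Counter


def proofer_changes(ocr_text, proofed_text):
    """Count (before, after) word substitutions between two text versions."""
    before = ocr_text.split()
    after = proofed_text.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pair = (" ".join(before[i1:i2]), " ".join(after[j1:j2]))
            changes[pair] += 1
    return changes
```

substitutions that show up again and again across books are the candidates
worth turning into a routine; one-off fixes are not.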


>    3. You have additional preprocessing capabilities that you will 
>    include in your software. If it's really, really easy to implement 
>    the same rules as regexes, I may add them to my file.

if you can do an array, you can do what i do.   it's extremely simple.


>    4. I have already added (but not recompiled and posted) 
>    a version that concatenates the pages into one file. 
>    That took about 15 minutes, as you would imagine it would take

once you've coded that same routine, from scratch, 20 times,
like i have, i'd imagine you'll be able to do it in about 2 minutes.   :+)

but i don't even have that capability in "banana cream", because
i simply instruct the user to have the o.c.r. combine all the text...
doing everything right from the get-go will minimize the steps...
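
(for reference, a minimal sketch of the page-concatenation routine being discussed, assuming one plain-text file per page in a single folder, with filenames that sort in page order. the function name, the `.txt` extension and the output filename are assumptions, not details of either tool.)

```python
from pathlib import Path


def concatenate_pages(folder, output_name="book.txt"):
    """Join every per-page .txt file in `folder` into one file, in filename order."""
    folder = Path(folder)
    out_path = folder / output_name
    # Collect the page list first, and skip a previous output file if present.
    pages = sorted(p for p in folder.glob("*.txt") if p.name != output_name)
    with out_path.open("w", encoding="utf-8") as out:
        for page in pages:
            # Normalize each page to end with exactly one newline.
            out.write(page.read_text(encoding="utf-8").rstrip("\n") + "\n")
    return out_path
```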


>    5. You have a repository that stores text and images 
>    in some probably compatible structure on a host.

i'm unsure what "repository" and "host" mean in that sentence, sorry.

but yes, the book exists both in the cloud -- the web, the internets --
and on the individual hard-disks of any person who goes and grabs it.
lots of copies keeps stuff safe.   the simple structure of the book online
is mirrored on a user's hard-drive with the exact same simple structure:
all of the files in a folder/directory compose the resources of that book...


>    You want my program to add the capability to transfer files from a host.
>    I do that using an ftp utility. 

i think it's better not to expect that your users will understand ftp.

even if they do understand ftp, i think it's better if they don't have to
use ftp.   especially since the tool can easily do what needs to be done.


>    It would be perhaps nice, and non-disruptive, 
>    to have my app transfer files. 

it would be nice.   and non-disruptive.   whatever that means...
more important than that, however, it would be darn _useful_.


>    How about if you send me a gmail address you would like to use 
>    for the purpose, and I'll add it to the svn server where the source
>    is posted (mentioned above), and you can add it yourself? 

um, i don't want to learn flex.   i have absolutely no need to learn flex.
i can already make cross-plat executables in my language of choice...
why would i need to learn flex?

and i don't intend to help with _any_ of the actual _programming_...

i'm posting my general advice on the _development_ of such a tool
because i've already _programmed_ such a tool, so i have reason to
think that i know how to do it, and have faced some of the decisions
that have to be made when you traverse this route, so i believe that
i can give programmers -- like you, but also other ones yet to come --
some good advice that will save them boatloads of time to get where
they will eventually want to go.

you can take that advice, or leave it, no skin off my nose either way...
it is my gift to the walls in the lobby of the project gutenberg library.


>    If you wait for me to do it, it will take longer probably, 
>    given priorities and Real Life and stuff.

no problem.   i've been singing this song for 5 years now,
so a little more waiting don't make no difference to me...

-bowerbird




From Bowerbird at aol.com  Sat Aug  9 03:49:38 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 9 Aug 2008 06:49:38 EDT
Subject: [gutvol-d] online, offline, and bothline
Message-ID: 

john-

i think we're talking at cross-purposes.

d.p. does things the d.p. way.   i think i have a better way of doing it.

p.g. does things the p.g. way.   i think i have a better way of doing it.

i'm in the process of informing people about my way of doing it...
trust me, the process does involve revisions, change-logs, all of it.
it does include a version-control-system, one based on simplicity.
(which, by the way, does _not_ mean that it lacks for max power.)
but "version control" is only one small part of a very long workflow.

anyway, like i said, i think i have a better way of doing it.

people might agree with me, or they might disagree with me.   fine.

you're talking about "git", and you say that i need to learn about it.

sorry, but i'm already able to do everything i want to do without it,
everything i need without it, so i don't think i need to learn about it.

whether p.g. would benefit from "git", i can't really say, to be honest.

whether d.p. would benefit from "git", i can't really say, to be honest.

but there's no way i can see that my methods would benefit from "git".

now maybe once i've described my methods, you might be able to tell me
precisely how i might benefit from "git", and i'll be open to the possibility.

but in the meantime, i can't see spending any time to go learn about it...
i can tell just from the terminology that it's the kind of technoid mumble
that sends me running out of the room screaming...   maybe it's just me...
but i seek to build simple, transparent processes with no learning curve.

however, perhaps the p.g. people here would like to hear about this "git"?
or perhaps the d.p. people here would like to hear about it?   let john know.

-bowerbird




From marcello at perathoner.de  Sat Aug  9 04:45:30 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat, 09 Aug 2008 13:45:30 +0200
Subject: [gutvol-d] woman in her own right -- 008 (and final)
In-Reply-To: 
References: 			<489AC3D2.2080406@perathoner.de>
	
Message-ID: <489D835A.8020808@perathoner.de>

John Vandenberg wrote:

> (wouldnt it be useful if PG had a mechanism where I could inquire
> "show me all contributions from bowerbird")

Find a summary here:

   http://www.gnutenberg.de/bowerbird/


This is also a demonstration of PGTEI 0.5 (with sources included).


-- 
Marcello Perathoner
webmaster at gutenberg.org


From prosfilaes at gmail.com  Sat Aug  9 06:59:03 2008
From: prosfilaes at gmail.com (David Starner)
Date: Sat, 9 Aug 2008 09:59:03 -0400
Subject: [gutvol-d] woman in her own right -- 008 (and final)
In-Reply-To: 
References: 
	
	
	<489AC3D2.2080406@perathoner.de>
	
Message-ID: <6d99d1fd0808090659r5208f16av282d226458584a2c@mail.gmail.com>

On Sat, Aug 9, 2008 at 4:38 AM, John Vandenberg  wrote:
> i.e. Has bowerbird added *nothing* to PG except mailing list posts
> which I've found to be plentiful, and sometimes even useful.

His stats page on DP is
; you might
have to log in to see it. To summarize, he once spent the time to do
32 pages, and has never completed even a single page under the new
system he complains about constantly. IIRC, DP admins dragged out
every account they believed was his, and found no evidence he had done
work under another account, nor has he claimed such.

> (wouldnt it be useful if PG had a mechanism where I could inquire
> "show me all contributions from bowerbird")

Not really; I've never heard the request before. Unless you choose to
be anonymous, every book you upload to PG has your name on it;
Bowerbird could easily point to his real name or alias used for
uploading books to PG if he is in fact doing so.

From dakretz at gmail.com  Sat Aug  9 12:02:12 2008
From: dakretz at gmail.com (don kretz)
Date: Sat, 9 Aug 2008 12:02:12 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 33
In-Reply-To: 
References: 
Message-ID: <627d59b80808091202o11b4774ej436a83178a4a811e@mail.gmail.com>

>>
>> ---------- Forwarded message ----------
>> From: "John Vandenberg" 
>> To: "Project Gutenberg Volunteer Discussion" 
>> Date: Sat, 9 Aug 2008 14:41:54 +1000
>> Subject: Re: [gutvol-d] online, offline, and bothline
>> On Sat, Aug 9, 2008 at 6:49 AM, Marcello Perathoner
>>  wrote:
>>
>> At this stage I'm not so concerned with how we would design the repo,
>> and what formats would be used. I'm interested to see who is
>> interested in a model shift :- towards an adhoc mesh structure, of
>> people and projects operating disengaged but with a common language
>> keeping them unified.
>>
>> --
>> John Vandenberg
>>

Count me. I've advocated for investigating SCC software as the text
repository since I first started working with DP. And the software just
keeps getting better (or at least my awareness of it ...).
The three most attractive benefits that I anticipate are:
1. Being given a stable platform and a bunch of useful tools (e.g. diff,
audit trail) that we don't need to reinvent.
2. Forcing an abstraction of the repository interface, so the developers can
get beyond thinking about every problem in hardware and low level software
terms.
3. Nearly automatic support for multiple UIs and utilities sharing resources
without interfering with each other.

From cannona at fireantproductions.com  Sat Aug  9 16:56:45 2008
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sat, 9 Aug 2008 18:56:45 -0500
Subject: [gutvol-d] removing show-through from scanned images
Message-ID: 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Hi all.

Does anyone know of a way to easily get rid of show-through from grayscale
scans using Irfanview or Gimp?  By show-through, I of course mean the faint
text which bleeds through from the other side of the page.  I assume it's
some sort of thresholding option, but can't seem to figure it out.

Thanks.

Aaron


- --
Skype: cannona
MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iEYEAREDAAYFAkieLsIACgkQI7J99hVZuJd+OgCeO8ASe0OScb8AM508jTXZ3dqW
6s8AoM6MjUTwIuyTXoB0Tqs1e0t8zJQA
=1GWz
-----END PGP SIGNATURE-----

From jayvdb at gmail.com  Sat Aug  9 19:43:54 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Sun, 10 Aug 2008 12:43:54 +1000
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 33
In-Reply-To: <627d59b80808091202o11b4774ej436a83178a4a811e@mail.gmail.com>
References: 
	<627d59b80808091202o11b4774ej436a83178a4a811e@mail.gmail.com>
Message-ID: 

On Sun, Aug 10, 2008 at 5:02 AM, don kretz  wrote:
>>> ---------- Forwarded message ----------
>>> From: "John Vandenberg" 
>>> To: "Project Gutenberg Volunteer Discussion" 
>>> Date: Sat, 9 Aug 2008 14:41:54 +1000
>>> Subject: Re: [gutvol-d] online, offline, and bothline
>>> On Sat, Aug 9, 2008 at 6:49 AM, Marcello Perathoner
>>>  wrote:
>>>
>>> At this stage I'm not so concerned with how we would design the repo,
>>> and what formats would be used. I'm interested to see who is
>>> interested in a model shift :- towards an adhoc mesh structure, of
>>> people and projects operating disengaged but with a common language
>>> keeping them unified.
>>>
>>> --
>>> John Vandenberg
>>>
>
> Count me. I've advocated for investigating SCC software as the text
> repository since I first started working with DP. And the software just
> keeps getting better (or at least my awareness of it ...).
> The three most attractive benefits that I anticipate are:
> 1. Being given a stable platform and a bunch of useful tools (e.g. diff,
> audit trail) that we don't need to reinvent.
> 2. Forcing an abstraction of the repository interface, so the developers can
> get beyond thinking about every problem in hardware and low level software
> terms.
> 3. Nearly automatic support for multiple UIs and utilities sharing resources
> without interfering with each other.

This is a nice overview & comparison.

http://www.infoq.com/articles/dvcs-guide

Of particular importance for a PG repo will be partial checkouts, a
feature of git, so that people only need to obtain the "bundle" of
files that are needed to do the work they intend to do.

The MS Windows support of git is behind the unix/OSX implementation,
which has caused many projects to adopt other alternatives, however
the Windows command line support is more than usable, and there is a
java implementation which is also doing very well.  I have faith that
by the time we have constructed a functioning git repo, a windows GUI
would be stable.

If there is another vcs you think we should consider, I am all ears.

--
John Vandenberg

From gilrick at iinet.net.au  Sat Aug  9 23:25:35 2008
From: gilrick at iinet.net.au (gilrick at iinet.net.au)
Date: Sun, 10 Aug 2008 14:25:35 +0800
Subject: [gutvol-d] removing show-through from scanned images
In-Reply-To: 
References: 
Message-ID: <6rfke8$ag4gj5@outbound.icp-qv1-irony-out2.iinet.net.au>

Aaron,

I have never had any success (post scanning) getting rid of the 
show-through.  When scanning, placing a solid sheet of totally opaque 
paper, card or plastic to match the colour of the printing on top of 
the page to be scanned removes the show-through.  So a sheet of solid 
black over black print gives a clean scan.

Good luck,

Gil



From grythumn at gmail.com  Sun Aug 10 01:55:53 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sun, 10 Aug 2008 04:55:53 -0400
Subject: [gutvol-d] removing show-through from scanned images
In-Reply-To: 
References: 
Message-ID: <15cfa2a50808100155o4eea2ff3nf96270f2d6230c1a@mail.gmail.com>

On Sat, Aug 9, 2008 at 7:56 PM, Aaron Cannon
 wrote:
> Hi all.
>
> Does anyone know of a way to easily get rid of show-through from grayscale
> scans using Irfanview or Gimp?  By show-through, I of course mean the faint
> text which bleeds through from the other side of the page.  I assume it's
> some sort of thresholding option, but can't seem to figure it out.

Simple thresholding will work, or at least reduce it, if you are
working with something with uniform lighting (book on an Optibook,
say, or a destructively scanned book). Adaptive thresholding works
better for something with non-uniform lighting (microfilm, digicam,
book on regular flatbed scanner, etc.) Abbyy Finereader does a pretty
good job (Be sure to enable "Convert grayscale/Color to black and
white"), and recent versions of imagemagick have a local-adaptive
thresholding option, IIRC.
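
A minimal pure-Python sketch of the difference between the two approaches, on a grayscale image held as a list of rows (0 = black, 255 = white). The function names and the cutoff, radius and offset values are illustrative assumptions; this is not how Finereader or imagemagick implement it, only the basic idea.

```python
def global_threshold(img, cutoff=128):
    """Simple thresholding: one fixed cutoff for the whole page."""
    return [[0 if px < cutoff else 255 for px in row] for row in img]


def adaptive_threshold(img, radius=1, offset=10):
    """Adaptive thresholding: compare each pixel with the mean of its neighbourhood.

    A pixel turns black only if it is clearly darker (by `offset`) than
    its surroundings, so a dim corner of the page is not turned solid black.
    """
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            row.append(0 if img[y][x] < mean - offset else 255)
        out.append(row)
    return out
```

With even lighting, faint show-through (say 200) sits well above the cutoff and is dropped while ink (say 40) is kept. On a dimly lit strip such as `[[100, 100, 40, 100, 100]]` the fixed cutoff turns everything black, whereas the adaptive version keeps only the ink pixel.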

R C

From ralf at ark.in-berlin.de  Sun Aug 10 10:11:56 2008
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sun, 10 Aug 2008 19:11:56 +0200
Subject: [gutvol-d] removing show-through from scanned images
In-Reply-To: 
References: 
Message-ID: <20080810171156.GA9841@ark.in-berlin.de>

You wrote 
> Does anyone know of a way to easily get rid of show-through from grayscale
> scans using Irfanview or Gimp?  By show-through, I of course mean the faint
> text which bleeds through from the other side of the page.  I assume it's
> some sort of thresholding option, but can't seem to figure it out.

Try also Colors/Retinex with gimp.


ralf

From cannona at fireantproductions.com  Sun Aug 10 17:55:30 2008
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun, 10 Aug 2008 19:55:30 -0500
Subject: [gutvol-d] removing show-through from scanned images
References: 
	<6rfke8$ag4gj5@outbound.icp-qv1-irony-out2.iinet.net.au>
Message-ID: 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Who woulda' thunk it.  The black background did the trick.  Thanks all for
the suggestions.

Aaron


- --
Skype: cannona
MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iEYEAREDAAYFAkifkC0ACgkQI7J99hVZuJf0nACgmCMzE9cyNMNIfe9/76ISAbL1
b14AoKM5p9QYv2F31kww+eWGLtOLjjez
=skKO
-----END PGP SIGNATURE-----

From hyphen at hyphenologist.co.uk  Mon Aug 11 00:55:11 2008
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Mon, 11 Aug 2008 08:55:11 +0100
Subject: [gutvol-d] removing show-through from scanned images
In-Reply-To: 
References: 	<6rfke8$ag4gj5@outbound.icp-qv1-irony-out2.iinet.net.au>
	
Message-ID: <000c01c8fb87$961192f0$c234b8d0$@co.uk>


Aaron Cannon wrote:

> Who woulda' thunk it.  The black background did the trick.  Thanks all for
> the suggestions.

>Aaron

How strange! I have had the problem but would never have guessed.  
Perhaps someone should put that trick on the PG web site FAQ.

Dave Fawthrop


From Bowerbird at aol.com  Tue Aug 12 10:39:22 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 12 Aug 2008 13:39:22 EDT
Subject: [gutvol-d] back in los angeles
Message-ID: 

ok, i'm back from the national poetry slam team championship.   (charlotte won.)

so, john, did people ever get you straight on whether i'm "useless" or not?   :+)

every person has a purpose...

and i am quite content in knowing mine...

and i'm not sure exactly why some people think they can question mine...

but be that as it may...

now that the festival adrenaline has worn off and i'm back at home,
i'll probably be mostly sleeping for the rest of the week, so i will just
reprise some posts i've made in the past, while i regain my bearings...

one of the subjects that john and don want me to address is _revisions_
-- error-reports, change-logs, the whole 9 yards -- so i'll focus on that.

-bowerbird




From Bowerbird at aol.com  Tue Aug 12 10:46:03 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 12 Aug 2008 13:46:03 EDT
Subject: [gutvol-d] from the way-back machine -- 01
Message-ID: 

here's a post that i made way back on march 9, 2005...

subject: a wiki-like mechanism for continuous proofreading and error-reporting

jon said:
>    I believe as you do that an error reporting system is a good idea
>    so readers may submit errors they find in the texts they use -- 
>    sort of an ongoing post-DP proofing process.

i didn't elaborate earlier that it goes much deeper than that.

a very important point here is that an error-reporting system
-- over and above the obvious effect of getting errors fixed --
will actively incorporate readers into the entire infrastructure,
making them active participants cumulating a world of e-books.

if you have ever edited a page on a wiki, you're likely aware that
the experience gives a very strong feeling of _empowerment_ --
because you can "leave your mark" right on a page, quite literally.

if we set up a wiki-page to collect the error-reports for an e-text,
in a system allowing people to check the text against a page-image,
they'll be much more motivated to report errors than they are now,
with the "send an e-mail" system.   the feedback is more immediate,
and compelling, with a wiki.   furthermore, by collecting the reports,
in the change-log right on the wiki, you can avoid duplicate reports.
you can also give a rationale for rejecting any submitted error-reports,
and/or engage people in discussion about whether to act on a report.

all of this makes your readers feel _responsible_ for the e-texts.

a lifetime of experience with printed matter has made people very
_passive_ about typographic errors.   there's no reason to "report"
an error they find in a newspaper, for instance, because hey, it's
already been printed.   the same with a magazine or a printed book.
water under the bridge.   and they translate that same attitude over
to e-books, even though it _does_ do good to report errors there.
so we need to do something to shake them out of their passivity,
something to make them feel _responsible_ for helping fix errors.

(just for the record, although i use the term "wiki", i don't mean it
literally.   what i have in mind is more of a "guestbook" type method,
where people can _add_ their text to the page, but not necessarily
_delete_ what other people have added.   it's thus more like a blog,
where everyone can add their comments to the bottom of the page,
but the top part stays constant, to list the "official" information.
but i'll still use the term "wiki" to connote a free-flowing attitude.)

in addition to the wiki, you can build an error-reporting capability
into the viewer-program that you give people to display the e-texts.
if they doubt something in the e-text, they click a button and boom!,
that page-image is downloaded into the program so they can see it.
if they have indeed found an error, they copy the line in its bad form,
correct it to its good form, and then click another button and boom!,
the error-report is e-mailed right off to the proper e-mail address.

this symbolic (and real!) incorporation of readers into our processes
is a rad thing to do.   but it's not the _only_ benefit of such a system;
it also facilitates the automation of the error-correction procedures.

the error-report can be formatted such that your software can
automatically summon the e-text _and_ the relevant page-scan.
so you see a screen with the page-scan _and_ the error-report.
you check its merit, and if it's good, click the "approve" button
and the e-text is automatically edited.   further, the change-log
is updated right on the wiki-page for that e-text, and anyone who
requested error-notification gets an e-mail describing the change.
auxiliary versions of the e-text -- like the .html and .pdf files --
are automatically updated.   and all you did was click one button...
face it, if you're dealing with 15,000+ e-texts, doing it manually
is a sure-fire way to burn yourself out.   who needs that hassle?
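
(a minimal sketch of what the "approve" button could do behind the scenes, assuming an error-report carries the bad line exactly as it appears plus its corrected form. the `ErrorReport` fields, the function name and the change-log format are all hypothetical, not the format of any existing system.)

```python
from dataclasses import dataclass


@dataclass
class ErrorReport:
    page: str        # hypothetical page identifier, e.g. "mountp123"
    bad_line: str    # the line exactly as it appears in the e-text
    fixed_line: str  # the same line with the correction made


def apply_report(text, report, change_log):
    """Apply an approved report to the e-text and record it in the change-log.

    Returns (new_text, applied). The edit is refused when the bad line is
    missing or appears more than once, since the target is then ambiguous.
    """
    lines = text.split("\n")
    hits = [i for i, line in enumerate(lines) if line == report.bad_line]
    if len(hits) != 1:
        return text, False
    lines[hits[0]] = report.fixed_line
    change_log.append(f"{report.page}: {report.bad_line!r} -> {report.fixed_line!r}")
    return "\n".join(lines), True
```

regenerating the .html and .pdf versions and notifying subscribers would hang off the same `applied` result; those steps are left out here.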

i mocked up a demo of this, using a simple a.o.l. guestbook script.
i'm sure you versatile script-kiddies here could do something that
was much more sophisticated, but my version will give you the idea:
      http://users.aol.com/bowerbird/proof_wiki.html

-bowerbird



From Bowerbird at aol.com  Tue Aug 12 16:33:07 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 12 Aug 2008 19:33:07 EDT
Subject: [gutvol-d] tanglewood tales
Message-ID: 

posted digest 51-23 came through today at noon.

"tanglewood tales" -- e-text #976 -- was updated.

i suggest that we use that as an experimental text
for our discussion of _revisions,_change-logs,_etc._

my analysis shows 36 corrections were made to it...

of course, unless you're willing to do the comparison,
you won't know that, because project gutenberg gives
_no_change-log_ for its revisions...

not only that, but they've made it more difficult than it
needs to be for people to do the comparison, because
they rewrapped the text, giving the world _yet_another_
set of linebreaks for this book, for absolutely no reason.
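
(a minimal sketch of one way to do the comparison despite the rewrapping, assuming both versions separate paragraphs with blank lines. the function names are hypothetical, and this is not necessarily how the count of 36 above was obtained.)

```python
import difflib


def unwrap(text):
    """Join each blank-line-separated paragraph into one line, so linebreaks stop mattering."""
    paragraphs = text.replace("\r\n", "\n").split("\n\n")
    return [" ".join(p.split()) for p in paragraphs if p.strip()]


def changed_paragraphs(old_text, new_text):
    """Return (old, new) paragraph groups that differ once wrapping is ignored."""
    old, new = unwrap(old_text), unwrap(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((old[i1:i2], new[j1:j2]))
    return pairs
```

each returned pair can then be diffed word by word to see the individual corrections.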

at any rate, 36 corrections is about what i would expect.
this was a _very_ early e-text, obviously, but contrary to
what many people would have you believe, it is _untrue_ that
early e-texts have inferior quality.   some do, some don't.

at 36 errors -- with none of 'em all that consequential --
this was a fairly typical early e-text, a tad better than most.

of course, this e-text is dated september 2001, while the
_publication_ date is july 1997, so it _might_ have_been_
reworked between 1997 and 2001, we just don't know...

indeed, it might have been reworked _since_ 2001, and
they just kept the same date, we don't know that either...

but for our purposes, we'll use the 2001 and 2008 dates,
and take the e-texts at face value...

-bowerbird

p.s.   let's applaud david widger for redoing these e-texts!
and for providing an .html version alongside the .txt file...
in creating a .txt file that will auto-generate an .html file,
david is creating consistency in the .txt files, a good thing!




From Bowerbird at aol.com  Wed Aug 13 02:02:46 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 13 Aug 2008 05:02:46 EDT
Subject: [gutvol-d] tanglewood tales -- 01
Message-ID: 

ok, let's take a look at the new improved "tanglewood tales", e-text #976.

one of the easiest checks to do is for a period-whitespace-lowercase combo.

oops!

got 1 hit right off the bat:
>    partly because it seemed so sad a thing to take away this young man's
>    life. however wicked he might be, and partly, no doubt, because his
>    heart was wiser than his head, and quaked within him at the thought of

that period after "life" (in the second line) should probably be a comma...

not a serious error, no, but one that was very easy for the machine to catch.
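
(a minimal sketch of that check as a regular expression, assuming plain ascii text. the names are hypothetical, and the pattern will also flag legitimate cases such as abbreviations like "e.g." or an ellipsis followed by a lowercase word, so every hit still needs a human look.)

```python
import re

# A period, then whitespace (which may include a linebreak), then a lowercase letter.
SUSPECT = re.compile(r"\.\s+[a-z]")


def find_suspects(text):
    """Return (line_number, matched_text) for each period-whitespace-lowercase hit."""
    hits = []
    for match in SUSPECT.finditer(text):
        # Line number of the period itself, counting from 1.
        line_no = text.count("\n", 0, match.start()) + 1
        hits.append((line_no, match.group()))
    return hits
```

run against the three lines quoted above, this reports a single hit on the second line, at "life. however".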

-bowerbird




From Bowerbird at aol.com  Wed Aug 13 14:09:43 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 13 Aug 2008 17:09:43 EDT
Subject: [gutvol-d] on error-reporting, revisions, change-logs, etc.
Message-ID: 

john, did yesterday's re-post of that march 2005 message
help you better understand how i think of error-reporting?

i stress:
1)   shaking people out of their apathy on reporting of errors.
2)   an infrastructure which makes it easy to check for an error.
3)   the importance of immediate feedback to activate readers.
4)   a system that makes error-reports available to the public.
5)   the possibility of dialog revolving around an error-report.
6)   the means for registering a "minority opinion" on an error.

i've implemented this type of system on the books i've posted.

>    http://z-m-l.com/go/myant/myantp123.html
>    http://z-m-l.com/go/mabie/mabiep123.html
>    http://z-m-l.com/go/sgfhb/sgfhbp123.html
>    http://z-m-l.com/go/mount/mountp123.html

as you'll see on each of those pages -- on all of my pages --
there is an error-reporting form at the bottom of the page...

to report an error, they simply copy in the bad line, as is,
and then make the correction to it.   very straightforward.

and, as expected, the name of the error-log is standardized:
>    http://z-m-l.com/go/myant/myant-er.html
>    http://z-m-l.com/go/mabie/mabie-er.html
>    http://z-m-l.com/go/sgfhb/sgfhb-er.html
>    http://z-m-l.com/go/mount/mount-er.html
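
(because the names are standardized, a tool can build these addresses from nothing more than the book's short name and a page number. a minimal sketch, inferred only from the urls listed in this message; the function names are hypothetical, and the three-digit zero-padding for pages below 100 is an assumption.)

```python
BASE = "http://z-m-l.com/go"


def page_url(book, page):
    """Address of one page of a book, e.g. page 123 of "mount".

    Assumption: page numbers are zero-padded to three digits.
    """
    return f"{BASE}/{book}/{book}p{page:03d}.html"


def error_log_url(book):
    """Address of the error-log that collects every report for a book."""
    return f"{BASE}/{book}/{book}-er.html"
```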

additionally, people can use the form to leave a comment...

john, i made a note for you on this page:
>    http://z-m-l.com/go/mount/mountp123.html

if you go there, you'll see the note on the bottom of the page,
plus the link to the error-log for that book.   click the link and
you can see all of the error-reports for the book, _including_
the note that i left.   each report there contains a back-link too.

this makes it very easy for an administrator to get an overview
of all of the error-reports for the book, and to check each one.

if a specific report has pointed out something actually incorrect,
then the administrator can set wheels into motion to correct it...
(more about that tomorrow...)

i call this system of public input "continuous proofreading", and
i would require every book to go through 6 months of it _before_
being placed into a format more conducive to immersive reading.

in addition to this online "continuous proofreading" mode, however,
an error-reporting capability can be put into our offline viewer-app.

when a person is reading our book in this viewer, and suspects that
there's an error, they can click a button to download the page-scan.
if an examination of the scan shows that the text is in fact in error,
then -- just as on the web-page -- the person can report the bug
by copying in the bad line of text, then making the correction to it,
and then clicking a button to upload the error-report, which is sent
directly to the form for that page on the website, for public viewing.
(in this regard, it's no different from if they'd made the report there.)

no "send us an e-mail and it might be corrected in the future" crap;
the error-report is _made_public_immediately_, so that the person
who comes to the web-page the very next second is informed of it...
(and can make up their own mind on its validity, by viewing the scan.)

so, john, this gives you an introduction to my error-reporting system.
as i said, tomorrow i will get into things from the _administrator_ side,
but i think it's important to see the interface from the _user_ side first...

-bowerbird




From whirl123 at gmail.com  Thu Aug 14 08:40:34 2008
From: whirl123 at gmail.com (Wendy Verbruggen)
Date: Thu, 14 Aug 2008 16:40:34 +0100
Subject: [gutvol-d] no copyright information?
Message-ID: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>

Hi all,

I'm a real newbie to PG, so I hope I'm posting to the correct list..

I've been trying to find a book I could start on, and I thought I had found
3 possible candidates:

(1) Mary Grant Bruce - Captain Jim
(2) Thornton Burgess - Old Man Coyote
(3) Ernest Poole - Blind

None of them has been done yet or is on the "in-progress" list, and as far as
I could find out on the internet they were published before
1923.

I have also found a copy of all three in our college library (and they
certainly look and smell old enough ;-), but there's a problem:

(2) contains the text "First published in 1938" - I'm assuming that just
means that it is a reprint? and therefore that I can't use it?

But what's really worrying me is that (1) and (3) don't contain *any*
information at all. There's a page with the title and author, even the
publisher is mentioned - but no date whatsoever. Does that happen a lot? And
what does it mean? Should I try to find other copies of these books, or
would that be a waste of time (I'm not even sure where to start looking)? Or
does the absence of copyright information actually mean I *can* use these
books?

I'll happily do some more searching for a suitable book, but I'm a bit
worried that of the three I randomly chose already 2 of them have no
information.. I can't look at them in the library and have to order them and
then wait for a day before I get them, so it would be nice to at least
increase the odds.

Any help/information will be much appreciated!

Thanks,
Wendy

From walter.van.holst at xs4all.nl  Thu Aug 14 09:00:56 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Thu, 14 Aug 2008 18:00:56 +0200
Subject: [gutvol-d] no copyright information?
In-Reply-To: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
References: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
Message-ID: <48A456B8.1030802@xs4all.nl>

Wendy Verbruggen wrote:
> Hi all,
> 
> I'm a real newbie to PG, so I hope I'm posting to the correct list..
> 
> I've been trying to find a book I could start on, and I thought I had 
> found 3 possible candidates:
> 
> (1) Mary Grant Bruce - Captain Jim

According to the Library of Congress Catalog she lived from 1878-1958, 
but this particular title is not in that catalog.

> (2) Thornton Burgess - Old Man Coyote

Again, the Library of Congress Catalog has this author, but also this title:

LC Control No.: 	16022282
LCCN Permalink: 	http://lccn.loc.gov/16022282
Type of Material: 	Book (Print, Microform, Electronic, etc.)
Personal Name: 	Burgess, Thornton W. (Thornton Waldo), 1874-1965.
Main Title: 	The adventures of Old Man Coyote, by Thornton W. Burgess ... with illustrations by Harrison Cady.
Published/Created: 	Boston, Little, Brown, and company, 1916.
Description: 	120 p. illus. 18 cm.


> (3) Ernest Poole - Blind

LC Control No.: 	20018299
LCCN Permalink: 	http://lccn.loc.gov/20018299
Type of Material: 	Book (Print, Microform, Electronic, etc.)
Personal Name: 	Poole, Ernest, 1880-1950.
Main Title: 	Blind; a story of these times,
Published/Created: 	New York, The Macmillan company, 1920.
Description: 	3 p. l., 3-416 p. 19 cm.


> But what's really worrying me is that (1) and (3) don't contain *any* 
> information at all. There's a page with the title and author, even the 
> publisher is mentioned - but no date whatsoever. Does that happen a lot? 
> And what does it mean? Should I try to find other copies of these books, 
> or would that be a waste of time (I'm not even sure where to start 
> looking)? Or does the absence of copyright information actually mean I 
> *can* use these books?

This happens a lot, especially with older books. It usually is possible 
to gather the relevant data from public catalogs. In this case I picked 
the Library of Congress Catalog since that is one of the most extensive 
ones available in the English speaking world.

Regards,

  Walter

From grythumn at gmail.com  Thu Aug 14 09:18:17 2008
From: grythumn at gmail.com (Robert Cicconetti)
Date: Thu, 14 Aug 2008 12:18:17 -0400
Subject: [gutvol-d] no copyright information?
In-Reply-To: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
References: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
Message-ID: <15cfa2a50808140918m70994763u92a9bd4fbf699d94@mail.gmail.com>

On Thu, Aug 14, 2008 at 11:40 AM, Wendy Verbruggen wrote:
> Hi all,
> I'm a real newbie to PG, so I hope I'm posting to the correct list..
> I've been trying to find a book I could start on, and I thought I had found
> 3 possible candidates:
> (1) Mary Grant Bruce - Captain Jim
> (2) Thornton Burgess - Old Man Coyote
> (3) Ernest Poole - Blind
[...]
> I have also found a copy of all three in our college library (and they
> certainly look and smell old enough ;-), but there's a problem:
> (2) contains the text "First published in 1938" - I'm assuming that just
> means that it is a reprint? and therefore that I can't use it?
> But what's really worrying me is that (1) and (3) don't contain *any*
> information at all. There's a page with the title and author, even the
> publisher is mentioned - but no date whatsoever. Does that happen a lot? And
> what does it mean? Should I try to find other copies of these books, or
> would that be a waste of time (I'm not even sure where to start looking)? Or
> does the absence of copyright information actually mean I *can* use these
> books?

Hmm. First question.. (forgive this if it seems obvious) have you
looked on the back side of the title page (aka verso)?

You need to look at the country of origin.. Copyright notices were a
requirement on US publications, but not necessary on books published
in the UK. Posting the publisher information would make this more
certain, but:

For the first one, if your publisher notice matches this you should be
fine (From BL integrated catalogue):
 Bruce, Mary Grant, Captain Jim, etc. (pp. 311. Ward, Lock & Co.:
London, 1919.)

The third appears to have been published in the US.. the BL points to a
US edition (From the LoC catalog):
Main Title: 	 Blind; a story of these times,
Published/Created: 	New York, The Macmillan company, 1920.

The second one was originally published before 1923, but may have
modern content that must be omitted or cleared as a Rule 6. "First
Published" often refers to new material. You can also do a comparison
against a known PD version. The keywords for reprints are "reprint"
and "facsimile".

Good catalogs to look in (for English-language books): LoC, BL, NYPL
(Catnyp?), Amicus, Worldcat (usually involves a per-search charge to
the library you are searching from; use last). If your edition matches
the information of a major catalog, you can use that to establish the
date for clearance purposes.

R C

From sly at victoria.tc.ca  Thu Aug 14 09:49:43 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu, 14 Aug 2008 09:49:43 -0700 (PDT)
Subject: [gutvol-d] no copyright information?
In-Reply-To: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
References: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
Message-ID: 


Ok, first things first. Are you aware of Distributed Proofreaders?
I would strongly suggest you spend some time there first if you
are able to. You can go and help out with just one page at a time,
and get some idea of issues that are likely to come up in
transcribing old books.

Otherwise, you can try to take on a whole project on your own,
but you'll have a steep learning curve in front of you if you do.

For your first few texts, _keep things as simple as possible_;
don't try to do some complex copyright clearance. The easiest
thing is to work with a book with a clear pre-1923 copyright
or publication statement. But one of the ones you've found
could work.

As Robert mentioned, more data is needed. This can include
_all_ the text on the title page, and its verso. Occasionally
there is useful information on the last few pages as well.

The point is to be able to match it up with a library
record, so things like publisher, title statement, and
number of pages (determined by the last page that has
a number printed on it) are important.

Does that help?

--Andrew


From whirl123 at gmail.com  Thu Aug 14 10:14:06 2008
From: whirl123 at gmail.com (whirl)
Date: Thu, 14 Aug 2008 18:14:06 +0100
Subject: [gutvol-d] no copyright information?
In-Reply-To: <15cfa2a50808140918m70994763u92a9bd4fbf699d94@mail.gmail.com>
References: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
	<15cfa2a50808140918m70994763u92a9bd4fbf699d94@mail.gmail.com>
Message-ID: <817982f10808141014r2c840c67je04aec98925d41ee@mail.gmail.com>

Thanks everyone for the information!

I think I will submit the Mary Grant Bruce book for copyright clearance
since the information given by Robert matches the book exactly (and with a
bit of luck I'll get to the other two as well after that!).

> For the first one, if your publisher notice matches this you should be
> fine (From BL integrated catalogue):
> Bruce, Mary Grant, Captain Jim, etc. (pp. 311. Ward, Lock & Co.:
> London, 1919.)

I still think it's a bit strange that the books contain so little
information, but at least they do have a publisher so that's good :-)

Thanks again! I'm sure I'll be back in due course with more questions ;-)

Wendy

From ajhaines at shaw.ca  Thu Aug 14 10:19:55 2008
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Thu, 14 Aug 2008 10:19:55 -0700
Subject: [gutvol-d] no copyright information?
References: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
Message-ID: <001401c8fe31$f8b76400$6401a8c0@ahainesp2400>

You might also want to talk to your library's staff to see if they're aware of Project Gutenberg. 
Perhaps they can help you find a book that's *definitely* clearable under PG's Rule 1, i.e. it has a 
pre-1923 copyright or publish date, rather than a book that's undated and needs to be researched. 
(Andrew: I wrote this before seeing your similar advice (great minds, etc.).)

One thing to be cautious of is that scanning can be rough on a book, especially if the binding is 
old, and stiff or fragile.  A library might not be too thrilled at loaning a book in reasonable 
condition, and getting it back somewhat the worse for wear.

Thrift/charity stores, or library book sales (check http://www.booksalefinder.com/ to find one near 
you), are excellent places to find old books on the cheap, and since you'd own such books, you 
wouldn't have to worry about being responsible to someone else for the wear and tear.


Start with something simple--all prose, with no illustrations.  (Illustrations require an HTML 
version of the etext, as well as the basic text version required by PG.)  Stay away from poetry or 
plays, unless you like doing the necessary formatting.

As one of the PG Whitewashers, I've dealt with a number of first-time submitters who bit off much 
more than they could chew.


All three of the authors you mention are represented in PG.  You should take a look at some of those 
submissions to see how they've been handled.

Al



From whirl123 at gmail.com  Thu Aug 14 10:31:05 2008
From: whirl123 at gmail.com (whirl)
Date: Thu, 14 Aug 2008 18:31:05 +0100
Subject: [gutvol-d] no copyright information?
In-Reply-To: <001401c8fe31$f8b76400$6401a8c0@ahainesp2400>
References: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
	<001401c8fe31$f8b76400$6401a8c0@ahainesp2400>
Message-ID: <817982f10808141031j39f324f4j2d0b9edaa19b9ec1@mail.gmail.com>

Thanks for your concern, and I know that this is going to be a big project.
But I'm not planning on scanning the book (I would never do that to a
library book!), I'm going to type it. While that is of course a lot of work,
I think it will limit the technical difficulties.

I've typed books before, just for fun.. but then I heard of PG and figured I
might as well make my typing useful to someone :)

It's a good point though to look at books on PG of these same authors to see
how they're formatted, I'll certainly do that!

Wendy


From sly at victoria.tc.ca  Thu Aug 14 10:49:35 2008
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu, 14 Aug 2008 10:49:35 -0700 (PDT)
Subject: [gutvol-d] no copyright information?
In-Reply-To: <817982f10808141031j39f324f4j2d0b9edaa19b9ec1@mail.gmail.com>
References: <817982f10808140840w19b857c8yaaa21320a9f3e2f5@mail.gmail.com>
	<001401c8fe31$f8b76400$6401a8c0@ahainesp2400>
	<817982f10808141031j39f324f4j2d0b9edaa19b9ec1@mail.gmail.com>
Message-ID: 

Hi Wendy.

Al's advice is good; in the beginning try to avoid something
with complex formatting.

Just a reminder: get the copyright clearance done before you
start if you have any doubts.

Since you plan on typing the text out, it might be a good idea
to do one chapter and then get some feedback on it. I would
volunteer to do that if you like.

--Andrew


From Bowerbird at aol.com  Thu Aug 14 11:15:25 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 14 Aug 2008 14:15:25 EDT
Subject: [gutvol-d] tanglewood tales -- 02
Message-ID: 

we're examining the new improved "tanglewood tales", e-text #976.

another very easy routine is a search for comma-whitespace-uppercase,
controlling for names, of course.   so let's run that and see what pops up.

oops!

a hit on this routine too:
>    Was Theseus afraid? By no means, my dear auditors. What! a hero like
>    Theseus afraid, Not had the Minotaur had twenty bull-heads instead of
>    one. Bold as he was, however, I rather fancy that it strengthened his

that comma after "afraid" should be an exclamation point, according to google:
>    http://books.google.com/books?id=JtvjZ5GjtHoC
navigate to page 32...

that's 2 easily-found-by-the-computer errors in this just-updated book...

-bowerbird




From Bowerbird at aol.com  Thu Aug 14 12:53:12 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 14 Aug 2008 15:53:12 EDT
Subject: [gutvol-d] on error-reporting, revisions, change-logs, etc.
Message-ID: 

ok, so yesterday we looked at things from the user's perspective.

today we'll look at the error-reporting process from the view of
the actions to be taken by the administrator of the cyberlibrary...

first we will consider the actual _formatting_ of the error-reports.

i use a "bad/good" paired-lines format which is, as usual, simple.

the top line is the original line with error, and the bottom is the fix.

so we might have something like this, for instance:
>    will not bc tempted to become an author by profession. If so I
>    will not be tempted to become an author by profession. If so I

occasionally, to facilitate easy human grasping of the difference,
i'll also include a third line that highlights the point of difference:
>    will not bc tempted to become an author by profession. If so I
>    will not be tempted to become an author by profession. If so I
>    ==========^===================================================
(you have to use a monospaced font to see the change-line correctly.)

i have considered a lot of ways of presenting points of difference,
and -- in my experience -- this is the one that gives best results.
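
A minimal sketch of how that third line could be generated automatically, assuming both lines are plain strings and the fix is a same-length substitution. The function name `change_line` is made up for illustration; it is not part of the system described here.

```python
def change_line(bad: str, good: str) -> str:
    """Return a marker line: '=' where the two lines agree, '^' where they differ."""
    width = max(len(bad), len(good))
    # Pad the shorter line so the comparison covers every column.
    bad = bad.ljust(width)
    good = good.ljust(width)
    return "".join("=" if b == g else "^" for b, g in zip(bad, good))


bad = "will not bc tempted to become an author by profession. If so I"
good = "will not be tempted to become an author by profession. If so I"
marker = change_line(bad, good)

print(bad)
print(good)
print(marker)
```

If a fix inserts or deletes characters, everything after that point is marked as different, so this column-by-column comparison only suits one-for-one character fixes like the example above.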

note that this format would _not_ work for large-scale changes
-- where entire paragraphs get moved around in a document --
but that's not the kind of "editing" that's typically done with o.c.r.

note also that this methodology requires that the linebreaks are
retained throughout the process.   the simplicity of this system is
one reason why i believe p-book linebreaks should be retained...

(once you start dealing with _revisions_, the complexity of it all
jumps up considerably if the p-book linebreaks aren't retained,
to the point it becomes unmanageable for all practical purposes,
especially since there are so many _other_ good reasons for the
retention of p-book linebreaks.   remember, once the end-user
gets the file, they are totally free to rewrap the text if they want,
so that's not a consideration; we're only talking about our format.)

i'm assuming that the text that goes in is fairly well representative
of the text that comes out, and for that, this format works well...

moreover, this format lends itself easily to _machine-readability_,
an extremely important consideration in the scheme of our stuff,
since my whole process is meant to be executed programmatically.
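
Because each report is just a bad line followed by a good line, applying it needs little more than an exact line match. The sketch below is one possible reading of that idea, not the actual implementation; `parse_pairs`, `apply_pair` and the two filler lines in `page` are hypothetical.

```python
def parse_pairs(report: str) -> list[tuple[str, str]]:
    """Split a report into (bad, good) pairs; non-blank lines alternate bad, good."""
    lines = [ln for ln in report.split("\n") if ln.strip()]
    if len(lines) % 2:
        raise ValueError("report must contain an even number of lines")
    return list(zip(lines[0::2], lines[1::2]))


def apply_pair(text: str, bad: str, good: str) -> str:
    """Replace the one line equal to `bad` with `good`; refuse if it is ambiguous."""
    lines = text.split("\n")
    hits = [i for i, line in enumerate(lines) if line == bad]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one matching line, found {len(hits)}")
    lines[hits[0]] = good
    return "\n".join(lines)


# Hypothetical page text: the first and last lines are filler for the demo.
page = (
    "filler line above\n"
    "will not bc tempted to become an author by profession. If so I\n"
    "filler line below"
)
report = (
    "will not bc tempted to become an author by profession. If so I\n"
    "will not be tempted to become an author by profession. If so I\n"
)

fixed = page
for bad_line, good_line in parse_pairs(report):
    fixed = apply_pair(fixed, bad_line, good_line)
print(fixed)
```

The exact-match lookup is what depends on the retained linebreaks: once the text is rewrapped, the bad line no longer exists as a line and the lookup fails.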

the first step is checking whether the error-report calls for action;
the relevant page-scan needs to be summoned for human review.

if there _is_ an error, then a chain of events is brought into action:
*   thank-you e-mail is sent to the person who reported the error.
*   the text-file is called up and the edit is automatically executed.
*   all the change-logs are updated to show the edit that was done.
*   the auxiliary versions -- .html, .pdf, etc. -- are auto-generated.
*   all of the old files are renamed and the new files take their place.
*   the library catalog is updated to list the new dates of these files.
*   a notification of the change is sent to the listserve for that book.
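
Two of the steps above (renaming the old file and updating a change-log) can be sketched as follows. The file names, the dated naming scheme and the log layout are all assumptions made for illustration; the posts do not specify them.

```python
import tempfile
from datetime import date
from pathlib import Path


def publish_revision(book: Path, new_text: str, bad: str, good: str, today: date) -> Path:
    """Archive the current file under a dated name, write the new text, log the edit."""
    stamp = today.isoformat()
    archived = book.with_name(f"{book.stem}.{stamp}{book.suffix}")
    book.rename(archived)                         # the old file is renamed...
    book.write_text(new_text, encoding="utf-8")   # ...and the new file takes its place
    log = book.with_name(book.stem + ".changelog.txt")
    with log.open("a", encoding="utf-8") as fh:   # append a bad/good pair to the log
        fh.write(f"{stamp}\n{bad}\n{good}\n\n")
    return archived


# Demo in a throwaway directory; "mount.txt" is a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "mount.txt"
    book.write_text("will not bc tempted\n", encoding="utf-8")
    archived = publish_revision(
        book,
        "will not be tempted\n",
        "will not bc tempted",
        "will not be tempted",
        date(2008, 8, 14),
    )
    archived_name = archived.name
    old_text = archived.read_text(encoding="utf-8")
    current_text = book.read_text(encoding="utf-8")
    log_text = (Path(tmp) / "mount.changelog.txt").read_text(encoding="utf-8")

print(archived_name)
print(log_text)
```

The remaining steps (e-mail, regenerating .html and .pdf, catalog update, list notification) would hang off the same function call, but they depend on infrastructure that is not described here.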

if the judgment of the administrator is that the error-report is bad,
e-mail about the decision is sent to the person who filed the report,
and the decision is also posted to the webpage of the relevant page
in order to head off any duplicates of the error-report in the future.

again, everything that _can_ be automated within these processes
_will_ be automated, so ongoing maintenance of the cyberlibrary
can be done with a minimum of human energy.   it's fully possible.
nutshell: if it's costly to maintain your library, you're doing it wrong.

tomorrow i'll give you some actual examples of the system in action,
so if anyone has any questions or wants to suggest a sample book,
do speak up.   otherwise i'm happy to continue this as a monologue.

-bowerbird




From Bowerbird at aol.com  Thu Aug 14 13:42:13 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 14 Aug 2008 16:42:13 EDT
Subject: [gutvol-d] whenever there's a rumor site that needs page views
Message-ID: 

ha!   geek culture has the teleblawg's number:

>    http://www.geekculture.com/joyoftech/joyarchives/1136.html

-bowerbird




From ebooks at ibiblio.org  Thu Aug 14 21:39:49 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Fri, 15 Aug 2008 00:39:49 -0400
Subject: [gutvol-d] tanglewood tales -- 02
In-Reply-To: 
References: 
Message-ID: <48A50895.5080107@ibiblio.org>

Bowerbird wrote:

> we're examining the new improved "tanglewood tales", e-text #976.
> 
> another very easy routine is a search for comma-whitespace-uppercase, 
> controlling for names, of course.  so let's run that and see what pops up.


It is a "very easy routine," especially if one knows how to use it. 
Let's try it on Bowerbird's version of "Mountain Blood," which he's 
talked about so much recently.

http://z-m-l.com/go/mount/mount.zml

And we get a hit on these two lines on page 20:


     Nothing further back was known in Greenstream,
     It was well known that the first George Gordon MacKimmon


That comma after "Greenstream" should be a period, according to the 
scan of page 20 mounted next to the text of the page in Bowerbird's 
continuous proofreading version:

http://z-m-l.com/go/mount/mountp020.html

(In case the period isn't clear enough in that scan, here's a link to 
a larger image:

http://z-m-l.com/go/mount/mountp020.png )


> oops!


You can say that again. Oops! The next time you do a search for 
"comma-whitespace-uppercase" you may want to make sure that the 
"whitespace" can include a new line. :)


> that's 2 easily-found-by-the-computer errors in this just-updated book...


Here's a second "easily-found-by-the-computer" error in Bowerbird's 
version of "Mountain Blood." Since he's so fond of "spacey quotes," 
let's search for them. Sure enough, we get a hit on this line on page 163:

     pork. Then he muttered, " -- full of ideas and airs.

http://z-m-l.com/go/mount/mountp163.html

Oops again!
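
A sketch of one way to search for such quotes, assuming a "spacey quote" means a straight double quote with whitespace on both sides (that definition is a guess at what is meant here):

```python
import re

# A straight double quote with whitespace immediately before and after it.
SPACEY_QUOTE = re.compile(r'(?<=\s)"(?=\s)')


def spacey_quotes(text: str) -> list[int]:
    """Return the character offsets of every spacey quote in the text."""
    return [m.start() for m in SPACEY_QUOTE.finditer(text)]


line = 'pork. Then he muttered, " -- full of ideas and airs.'
print(spacey_quotes(line))
```

A quote that hugs a word on either side is not reported, so ordinary dialogue passes through untouched.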


Jose Menendez


P.S. I  thought about using Bowerbird's "error-reporting form" on 
pages 20 and 163 of his continuous proofreading version, but I hate 
filling out forms. :)

From marcello at perathoner.de  Fri Aug 15 05:13:23 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 15 Aug 2008 14:13:23 +0200
Subject: [gutvol-d] tanglewood tales -- 02
In-Reply-To: <48A50895.5080107@ibiblio.org>
References:  <48A50895.5080107@ibiblio.org>
Message-ID: <48A572E3.8060304@perathoner.de>

Jose Menendez wrote:

> You can say that again. Oops! The next time you do a search for 
> "comma-whitespace-uppercase" you may want to make sure that the 
> "whitespace" can include a new line. :)

He also must not read the file line-wise. MUAHAHAHA!
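
Both points can be seen in a short sketch. It assumes the check is a regular expression run over the whole text, with an allow-list standing in for "controlling for names"; the function name and the allow-list contents are illustrative, not taken from anyone's actual routine.

```python
import re

# Comma, then any whitespace (including a newline), then a capitalized word.
COMMA_UPPER = re.compile(r",\s+([A-Z][A-Za-z']*)")


def comma_uppercase_hits(text: str, allowed: set[str]) -> list[str]:
    """Return capitalized words that follow a comma and are not on the allow-list."""
    return [m.group(1) for m in COMMA_UPPER.finditer(text) if m.group(1) not in allowed]


allowed = {"I", "Theseus", "Minotaur"}

tanglewood = (
    "Was Theseus afraid? By no means, my dear auditors. What! a hero like\n"
    "Theseus afraid, Not had the Minotaur had twenty bull-heads instead of\n"
    "one. Bold as he was, however, I rather fancy that it strengthened his"
)
mountain = (
    "Nothing further back was known in Greenstream,\n"
    "It was well known that the first George Gordon MacKimmon"
)

print(comma_uppercase_hits(tanglewood, allowed))
print(comma_uppercase_hits(mountain, allowed))

# The same check applied one line at a time misses the comma at the line end.
linewise = [w for ln in mountain.split("\n") for w in comma_uppercase_hits(ln, allowed)]
print(linewise)
```

Run over the whole text, the pattern catches the line-ending comma after "Greenstream"; run line by line, the same pattern finds nothing there.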



From wvholst at xs4all.nl  Fri Aug 15 07:56:42 2008
From: wvholst at xs4all.nl (Walter H. van Holst)
Date: Fri, 15 Aug 2008 16:56:42 +0200 (CEST)
Subject: [gutvol-d] tanglewood tales -- 02
Message-ID: <21693.80.127.124.230.1218812202.squirrel@webmail.xs4all.nl>

> P.S. I  thought about using Bowerbird's "error-reporting form" on
> pages 20 and 163 of his continuous proofreading version, but I hate
> filling out forms. :)

Schadenfreude is hardly ever a good thing. Nonetheless, I found this
rather amusing.

Regards,

 Walter


From lee at novomail.net  Fri Aug 15 08:09:08 2008
From: lee at novomail.net (Lee Passey)
Date: Fri, 15 Aug 2008 09:09:08 -0600
Subject: [gutvol-d] tanglewood tales -- 02
In-Reply-To: <48A572E3.8060304@perathoner.de>
References:  <48A50895.5080107@ibiblio.org>
	<48A572E3.8060304@perathoner.de>
Message-ID: <48A59C14.5020906@novomail.net>

Marcello Perathoner wrote:

> Jose Menendez wrote:
> 
>> You can say that again. Oops! The next time you do a search for 
>> "comma-whitespace-uppercase" you may want to make sure that the 
>> "whitespace" can include a new line. :)
> 
> He also must not read the file line-wise. MUAHAHAHA!

And because both of you allegedly are in his "kill file," (as am I) 
he'll never even know about the flaws. If, at some point in the future, 
we discover that his routines suddenly account for newlines as 
whitespace, we'll know that all his bluster about not paying any 
attention to his critics and detractors is just show.

From Bowerbird at aol.com  Fri Aug 15 09:46:07 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 15 Aug 2008 12:46:07 EDT
Subject: [gutvol-d] tanglewood tales -- 03
Message-ID: 

we're looking at the new improved "tanglewood tales", e-text #976.

another check is whether all the double-quotes are nested correctly.

oops!

got a hit on this check too:
>    "O no, dear Proserpina," cried the sea nymphs; "we dare not go with you
>    upon the dry land. We are apt to grow faint, unless at every breath we
>    can snuff up the salt breeze of the ocean. And don't you see how careful
>    we are to let the surf wave break over us every moment or two, so as
>    to keep ourselves comfortably moist? If it were not for that, we should
>    look like bunches of uprooted seaweed dried in the sun.
>    
>    "It is a great pity," said Proserpina. "But do you wait for me here, and
>    I will run and gather my apron full of flowers, and be back again before

the close-quote on the upper paragraph is missing, 
since the lower paragraph has a different speaker...

3 total easily-found-by-the-computer errors in this just-updated book...
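
A rough sketch of a quote-balance check of this kind, in Python. It assumes straight double quotes and paragraphs separated by blank lines; the function name is hypothetical. It only reports paragraphs with an odd number of quote marks. Deciding whether the next paragraph continues the same speaker (in which case the open quote is legitimate) is left to a person.

```python
import re


def unbalanced_paragraphs(text):
    """Return (index, paragraph) pairs whose count of '"' is odd."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(i, p) for i, p in enumerate(paragraphs)
            if p.count('"') % 2 == 1]
```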

-bowerbird




From Bowerbird at aol.com  Fri Aug 15 09:49:31 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 15 Aug 2008 12:49:31 EDT
Subject: [gutvol-d] tanglewood tales -- 02
Message-ID: 

thanks, jose, but i'm not finished with "mountain blood" yet.
there are other errors in it as well...   as i am sure you know...            :+)

-bowerbird




From Bowerbird at aol.com  Fri Aug 15 09:53:42 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 15 Aug 2008 12:53:42 EDT
Subject: [gutvol-d] tanglewood tales -- 02
Message-ID: 

jose said:
>    >   P.S. I thought about using Bowerbird's "error-reporting form"
>    >    on pages 20 and 163 of his continuous proofreading version, 
>    >   but I hate filling out forms. :)

walter said:
>    Schadenfreude is hardly ever a good thing.

ah, heck, take your humor wherever you can find it...
if a few little errors on my part brighten up your day,
feel free to feel all the "schadenfreude" you want, walter...


>    Nonetheless, I found this rather amusing.

you found it amusing that jose hates filling out forms?
wow, walter, you must be _really_ hard-up for humor...            ;+)

-bowerbird




From Bowerbird at aol.com  Fri Aug 15 11:52:24 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 15 Aug 2008 14:52:24 EDT
Subject: [gutvol-d] on error-reporting, revisions, change-logs, etc.
Message-ID: 

for my change-logs, i use a sample "bad/good" paired-lines format.

the top line is the original line with error, and the bottom is the fix.
occasionally, i'll also include a third line to pinpoint the difference:
>    will not bc tempted to become an author by profession. If so I
>    will not be tempted to become an author by profession. If so I
>    ==========^===================================================
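
The pointer line can be produced mechanically. A minimal Python sketch, assuming the fix is a same-length substitution (an insertion or deletion would shift every later column and mark them all); the function name is hypothetical:

```python
def pointer_line(bad, good):
    """Build a row of '=' with '^' under each column where the lines differ."""
    width = max(len(bad), len(good))
    marks = []
    for i in range(width):
        a = bad[i] if i < len(bad) else None
        b = good[i] if i < len(good) else None
        marks.append("^" if a != b else "=")
    return "".join(marks)


bad = "will not bc tempted to become an author by profession. If so I"
good = "will not be tempted to become an author by profession. If so I"
print(bad)
print(good)
print(pointer_line(bad, good))
```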

a chain of events is brought into action to execute this revision:
*   thank-you e-mail is sent to the person who reported the error.
*   the text-file is called up and the edit is automatically executed.
*   all the change-logs are updated to show the edit that was done.

*   the auxiliary versions -- .html, .pdf, etc. -- are auto-generated.
*   all of the old files are renamed and the new files take their place.
*   the library catalog is updated to list the new dates of these files.
*   a notification of the change is sent to the listserve for that book.

it's important to note that all of these steps are done _automatically_.

so let's look at each of them individually...


*   thank-you e-mail is sent to the person who reported the error.

if the person included their e-mail address on the error-report,
a "thank-you" e-mail is sent to them to reward their sharp eyes...


*   the text-file is called up and the edit is automatically executed.

this is simple enough.   the program calls up the .zml text-file and
does a search for the bad line(s), replaces it with the good line(s),
appends the change-time to the name of the older .zml text-file,
and then saves the new file with the default .zml name.   all done...
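
A sketch of that edit step in Python, under stated assumptions: the timestamp format and the "--" separator in the backup name are invented for illustration, and the function refuses to act unless the bad line occurs exactly once, which is a safety choice not mentioned in the post.

```python
from datetime import datetime
from pathlib import Path


def apply_edit(zml_path, bad, good, now=None):
    """Replace `bad` with `good`; keep the old file under a timestamped name."""
    path = Path(zml_path)
    text = path.read_text(encoding="utf-8")
    found = text.count(bad)
    if found != 1:
        raise ValueError(f"expected exactly one match, found {found}")
    # Hypothetical naming scheme: book.zml -> book--20080815-145224.zml
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.stem}--{stamp}{path.suffix}")
    path.rename(backup)                      # older file gets the change-time
    path.write_text(text.replace(bad, good), encoding="utf-8")
    return backup
```

Refusing ambiguous matches means a change-log entry either applies cleanly or is bounced back to the administrator.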


*   all the change-logs are updated to show the edit that was done.

the paired-lines of the edit are appended to the book's change-log.
they are also appended to any other change-logs that are relevant,
such as the library change-log, the month change-log, and so on...


*   the auxiliary versions -- .html, .pdf, etc. -- are auto-generated.

the next automatic-step is to regenerate all of the auxiliary versions.


*   all of the old files are renamed and the new files take their place.

the old versions of the auxiliary files are renamed (or just deleted),
and the new versions take their place...


*   the library catalog is updated to list the new dates of these files.

the library catalog is updated, as is any other metadata being kept.


*   a notification of the change is sent to the listserve for that book.

in my cyberlibrary, each book has a listserve/wiki attached to it, and
notifications of any changes are posted, to inform interested people.
(these days, notice-giving could also occur via an r.s.s. feed or even
via twitter or friendfeed or any of a number of social-aggregators.)


again, all this happens _automatically_ whenever an administrator
submits a list of changes to the program that will execute that list.

***

ok, i just regenerated all that off the top of my head, without going
back to earlier posts to see if i'd forgotten anything, so there might
be a few things i would add to this list if my memory was refreshed,
or if someone made a good suggestion that i hadn't thought of, but
this should give you a good idea of the type of system i have in mind.

-bowerbird




From Bowerbird at aol.com  Fri Aug 15 13:06:37 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 15 Aug 2008 16:06:37 EDT
Subject: [gutvol-d] on error-reporting, revisions, change-logs, etc.
Message-ID: 

the paired-lines change-log format can be extended to other uses.

for instance, consider this file:
>    http://z-m-l.com/go/oneoo/weball.html

in this case, i was comparing two different digitizations of a book.

the black lines are ones that were identical in the two digitizations.
the red/blue lines are the ones that differed between digitizations.

***

here's another (smaller) version of that same file:
>    http://z-m-l.com/go/oneoo/webone.html

this version only contains the lines where there was a difference.
this version _also_ contains a radio-button by each differing line,
so that proofers can go through the text -- summoning the scan
if they desire -- and click the radio-button of the _correct_ line...

if neither line is correct, they can click the "check it" box instead.

(and if both lines look to be correct, which could happen here,
because these two digitizations were of different _editions_ of 
the book, then the proofer would also click the "check it" box...
or, of course, they can just decline to click any of those buttons.)

when they press the "doit" button at the very bottom of the page,
their "votes" are stored with all the votes from the other proofers.

a program can then determine which lines are consistently selected,
and create a change-log in paired-line format for an administrator,
who would submit that file to the program which executes changes.
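
One way the vote-counting step might look, sketched in Python. The thresholds (`min_votes`, `min_share`), the choice labels, and the data layout are all assumptions for illustration; the post does not say what "consistently selected" means in numbers.

```python
from collections import Counter


def tally(votes, min_votes=3, min_share=0.8):
    """Pick a winner per line from (line_id, choice) votes.

    choice is "A", "B", or "check". A line is decided only when its top
    choice has at least `min_votes` votes and at least `min_share` of all
    votes cast on that line.
    """
    per_line = {}
    for line_id, choice in votes:
        per_line.setdefault(line_id, Counter())[choice] += 1
    decided = {}
    for line_id, counts in per_line.items():
        choice, n = counts.most_common(1)[0]
        if n >= min_votes and n / sum(counts.values()) >= min_share:
            decided[line_id] = choice
    return decided
```

Lines that come back decided as "A" or "B" would feed the paired-lines change-log; anything undecided, or decided as "check", stays with a human.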

none of this is difficult to program.

***

i think that's all i have on this topic for now, so i'll send this off.
if i do think of more to say, i'll type it up for you for next week...

have a good weekend!         :+)

-bowerbird




From Bowerbird at aol.com  Sat Aug 16 05:46:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 16 Aug 2008 08:46:08 EDT
Subject: [gutvol-d] tanglewood tales -- 04
Message-ID: 

we're examining the new improved "tanglewood tales", e-text #976.

i like to make hyphenates consistent across the book --
most especially when they were consistent in the p-book.

appended are 22 mid-line hyphenates that had variant forms.
my examination indicates that 15 corrections should be made,
but i'm not gonna count those as "errors" here on this exercise,
although i probably should.   jose would count them against me.      ;+)

however, one case came up when i searched for an uppercase letter
directly following a lowercase letter, so i _will_ count this instance...

there were two occurrences of "wonder-book" in the updated version:
>    'Wonder-Book'?"
>    my literary experience by constituting me editor of the "Wonder-Book."

and also two occurrences of "wonderbook" in the updated version:
>    the "WonderBook."
>    ask me to edit a third "WonderBook," the public of little folks must not

a check of the scans on google showed they were all hyphenated there,
so we'll have to presume the two unhyphenated instances were errors...
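
The lowercase-then-uppercase search is a one-line regular expression. A small Python sketch (the names are hypothetical); it will also flag legitimate words such as "McDonald", so hits need review:

```python
import re

# A word containing a lowercase letter directly followed by an uppercase one.
LOWER_UPPER = re.compile(r"\b\w*[a-z][A-Z]\w*")


def camel_hits(text):
    """Return every word with an internal lowercase-to-uppercase transition."""
    return [m.group(0) for m in LOWER_UPPER.finditer(text)]
```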

it also showed that none of the instances was surrounded by quotemarks,
but i won't count that as "errors", since that _could_ be a version difference.

still, 5 total easily-found-by-the-computer errors in this just-updated book...

-bowerbird

p.s.   here are the 22 instances of variant forms of a mid-line hyphenate.
i've indicated each of the 15 lines i would change with a "***" at the end.

blood-thirsty
2    that they had once been bloodthirsty men, and would now tear him limb
1    impossible to withstand this blood-thirsty battalion with his single ***

bull-head
1    Ah! the bull-headed villain! And O, my good little people, you will
1    Theseus afraid, Not had the Minotaur had twenty bull-heads instead of
3    fair upon the neck, and made his bull head skip six yards from his human ***

dinner-time
1    there was a good fire in the kitchen, and that, at dinner-time, a ***
3    be served up. If your appetites tell you it is dinner time, then come
3    hospitable hostess that any hour of the day was dinner time with them,

dew-drops
1    of oaken bark, and she that sprinkled dew-drops from her fingers' ends,
2    of salt water, and the fountain nymph, besides scattering dewdrops from ***

far-off
1    themselves among the far-off clouds. That sight, to be sure, made them
3    friendly and affectionate manner, far, far off, in the middle of hot
3    But her brothers were too far off to hear; especially as the fright took
3    would be heard as far off as King Agenor's palace!
3    of providing a comfortable meal. Not far off they saw a tuft of trees,
3    of it that there was a crew of poor shipwrecked mariners, not far off,

farm-house
2    farmhouses they sought hospitality, needed their assistance in the
1    All night long, at the door of every cottage and farm-house, Ceres ***

figure-head
1    "cut me off! cut me off! and carve me into a figure-head for your
2    tree. A carver in the neighborhood engaged to make the figurehead. ***
1    figure-heads, in what he intended for feminine shapes, and looking
1    until it was completed, and set up where a figure-head has always stood,
1    advice be wiser than this which Jason received from the figure-head of
1    figure-head.
1    so deeply that the figure-head drank the wave with its marvelous lips,

gold-hilted
1    King Aegeus trembled again. His eyes had fallen on the gold-hilted sword
1    his sandals and gold-hilted sword), and, hastening to the king, inquired
1    his gold-hilted sword, ready drawn from its scabbard, in the other, and

golden-hilted
1    valiant feats with his father's golden-hilted sword, and had gained ***

grown-up
3    grown up so suddenly out of the earth. But soon he saw the mountain
3    grown up there, in the center of which was seen a stately palace of
3    the habitation of a mighty king. It had grown up out of the earth in
1    The child promised to be as prudent as if she were a grown-up woman;
3    him. Had you left him to my care, he would have grown up like a child of
3    boys ride upon his back. And so, when his scholars had grown up, and
3    sandal. She had grown up in a very wild way, and talked much about the
3    think the dragon very terrible. You have grown up from infancy in the

ill-natured
1    such bodies as this. If he had been as ill-natured to them as he was
1    will, and looking as cross and ill-natured as you can imagine, on its
3    "Well, Jason," whispered Medea (for she was ill natured, as all ***

king-like
3    The king likewise heard the noise of the shuttle in the loom and the
1    Ulysses looked even more manly and king-like than before. He gave the
3    clouds, a long distance off, and looking like a flock of wild geese.

life-like
2    walked, and made other lifelike motions, there yet was a kind of jerk
1    was a life-like picture of their recent adventures, showing them in the ***
2    appeared a lifelike representation of the head of Medusa with the snaky

life-long
1    upper servant of his people, and that it must be his life-long labor to
3    service, I promise to be grateful to you my whole life long."' Gazing

look-out
3    woman, with a kindly look out of her beautiful brown eyes. "Only let
1    was stationed as a look-out in the prow, where he saw a whole day's sail

many-colored
1    of many-colored hailstones, upon the heads of grown people and children,
3    wreaths that shall be as lovely as this necklace of many colored ***
1    could the most brilliant of the many-colored gems, which Proserpina had

pine-tree
3    little. He was so very tall that he carried a pine tree, which was eight
1    friends. His pine-tree walking stick lay on the ground, close by his
1    "Halloo, brother Antaeus! Get up this minute, and take your pine-tree
1    club, which looked bulkier and heavier than the pine-tree walking stick
3    the while brandishing the sturdy pine tree, so that it whistled through
3    "By hitting you a rap with this pine tree here," shouted Antaeus,
3    blow at him with his pine tree, which Hercules caught upon his club; and
3    who groaned and trembled at the stroke. His pine tree went so deep into
3    feet again, and pulled his pine tree out of the earth; and, all aflame
3    Giant's pine tree was shattered into a thousand splinters, most of which
3    "Step forward," cried he. "Since I've broken your pine tree, we'll try

re-create
1    in order to re-create the original myths.
2    voice from the top of the stairs, and who loved to recreate himself with ***

sea-shore
2    together near the seashore in their father's kingdom of Phoenicia. They ***
2    seashore, scampered across the sand, took an airy leap, and plunged ***
1    of his comrades, whom he had left at the sea-shore. These being arrived,
1    sea-shore, she hastened thither as fast as she could, and there beheld
1    towards the sea-shore; and in that direction, over the people's heads,
1    Argonauts spread a plentiful feast on the sea-shore, well knowing, from

to-morrow
1    "To-morrow, at breakfast time, you shall have an opportunity of judging
2    tomorrow. If I do not then return, you must hoist sail, and endeavor to ***
1    to-morrow or the next day, or a hundred years hence, but were generally
1    to-day, and to-morrow morning, since you insist upon it, you shall try
1    you know. Come! Your night's work has been well performed; and to-morrow
1    set sail from Colchis before to-morrow's sunrise, the king means to

walking-stick
3    feet through the butt, for a walking stick. It took a far-sighted Pygmy,
3    friends. His pine-tree walking stick lay on the ground, close by his
3    walking stick in your hand. Here comes another Giant to have a tussle
3    club, which looked bulkier and heavier than the pine-tree walking stick
3    and seizing his walking stick, he strode a mile or two to meet him; all
1    skull with my walking-stick!" ***

wide-open
1    whose wide-open eyes are fixed so eagerly upon him. Thus the stories
3    she stood, with her pretty mouth wide open, as pale as the white lilies
3    and throwing them wide open, passed into the next room. Eurylochus,
3    stepped boldly forward, and threw the folding doors wide open. The
1    the contents of the gold box right down the monster's wide-open throat.
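
A listing like the one above can be approximated with a short Python sketch. In the list, the leading 1, 2 and 3 appear to mark the hyphenated, closed and open (two-word) forms respectively; that reading is an inference from the examples. The function below is an assumption-laden illustration, not the routine that produced the list: it matches whole words only, so it will not pair "bull-heads" with "bull head" or pick up "king likewise".

```python
import re


def hyphenate_variants(text):
    """Map each mid-line hyphenate to counts of its three spellings.

    Only compounds that also occur closed ("bloodthirsty") or open
    ("dinner time") are reported.
    """
    # Treat line breaks as spaces; end-of-line hyphens then become
    # "word- next" and are ignored by the pattern below.
    lower = re.sub(r"\s+", " ", text).lower()
    compounds = set(re.findall(r"\b[a-z]+-[a-z]+\b", lower))
    report = {}
    for word in sorted(compounds):
        left, right = word.split("-")
        counts = {
            "hyphenated": len(re.findall(rf"\b{left}-{right}\b", lower)),
            "closed": len(re.findall(rf"\b{left}{right}\b", lower)),
            "open": len(re.findall(rf"\b{left} {right}\b", lower)),
        }
        if counts["closed"] or counts["open"]:
            report[word] = counts
    return report
```

As the "grown-up" and "look-out" entries show, many open-form hits are ordinary two-word phrases, so the counts are a prompt for inspection rather than a list of errors.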




From Bowerbird at aol.com  Sun Aug 17 00:52:46 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 17 Aug 2008 03:52:46 EDT
Subject: [gutvol-d] tanglewood tales -- 05
Message-ID: 

we're looking at the new improved "tanglewood tales", e-text #976.

i always change those ridiculous 4-dash em-dashes to 2-dash ones.
but i also then do a check for a 3-dash sequence, just to make sure...

oops!

got one:
>    and faithful ally---this virtuous Giant--this blameless and excellent

6 total easily-found-by-the-computer errors in this just-updated book...
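
A sketch of the three-dash check in Python, with hypothetical names. The lookarounds make sure exactly three hyphens are matched, so two-dash and four-dash runs are not reported.

```python
import re

# Exactly three hyphens: not preceded and not followed by another hyphen.
TRIPLE_DASH = re.compile(r"(?<!-)-{3}(?!-)")


def triple_dash_lines(text):
    """Return (line_number, line) for every line with a 3-dash run."""
    return [(n, line)
            for n, line in enumerate(text.splitlines(), start=1)
            if TRIPLE_DASH.search(line)]
```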

-bowerbird




From schultzk at uni-trier.de  Sat Aug 16 23:58:07 2008
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Sun, 17 Aug 2008 08:58:07 +0200
Subject: [gutvol-d] ...
In-Reply-To: <87y737afvc.fsf_-_@altern.org>
References: <1872592925.262851218110326603.JavaMail.mail@webmail08>
	<536B0A73-E9A7-437B-A2C1-231A9A366907@uni-trier.de>
	<87hc9wgpzg.fsf@altern.org>
	
	<87y737afvc.fsf_-_@altern.org>
Message-ID: 

Hi Bastien,

	There are different ways of quoting.
	Most people see quotes as an exact quote
	word for word.

	In literature a quote can be several things.
		1) exact quote
		2) a character
		3) a theme

	One can quote music and art, too.

	Marcello was offended by what Michael said and
	tried to disarm the quote stating it was faulty.
	It was not. Especially, when one grasps the deeper
	meaning and motives.

	You would be surprised by the quotes we use
	every day.

	As L. Zadeh says: "We are still confused, but
	on a higher level"

	regards
		Keith,

Am 08.08.2008 um 13:46 schrieb Bastien Guerry:

> "Schultz Keith J."  writes:
>
>> As I have mentioned in other post. What is a quote and not we
>> can discuss in literature 101.
>
> All what you said about Michael's "quote" looks driven by the will to
> defend him.  I'm okay that there is nothing terribly wrong with wrong
> quotes (especially on a small mailing list), but there is something
> wrong in saying that wrong quotes are ok because they are metaphorical
> quotes...  Culture might be about mixing things together, but hardly
> about distorting them.
>
> -- 
> Bastien


From hyphen at hyphenologist.co.uk  Sun Aug 17 01:16:04 2008
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sun, 17 Aug 2008 09:16:04 +0100
Subject: [gutvol-d] tanglewood tales -- 05
In-Reply-To: 
References: 
Message-ID: <001101c90041$7ef99590$7cecc0b0$@co.uk>

 

 

Bowerbird at aol.com wrote:

>we're looking at the new improved "tanglewood tales", e-text #976.

>i always change those ridiculous 4-dash em-dashes to 2-dash ones.
>but i also then do a check for a 3-dash sequence, just to make sure...

>oops!

>got one:
>>   and faithful ally---this virtuous Giant--this blameless and excellent

>6 total easily-found-by-the-computer errors in this just-updated book...

Yet another case of imposing modern typographic conventions on an old book :(

 

Dave Fawthrop


From marcello at perathoner.de  Sun Aug 17 06:10:51 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun, 17 Aug 2008 15:10:51 +0200
Subject: [gutvol-d] tanglewood tales -- 05
In-Reply-To: <001101c90041$7ef99590$7cecc0b0$@co.uk>
References: 
	<001101c90041$7ef99590$7cecc0b0$@co.uk>
Message-ID: <48A8235B.9020505@perathoner.de>

Dave Fawthrop wrote:

>>i always change those ridiculous 4-dash em-dashes to 2-dash ones.
>>but i also then do a check for a 3-dash sequence, just to make sure...


There is a multitude of dash characters in typography. Perusing the 
Unicode standard we find:

   hyphen-minus
   hyphen
   figure dash
   en dash
   em dash
   horizontal bar (quotation dash)

Each has its particular use. (More information can be found in the TeX 
and Metafont books by Knuth)


The same is valid for spaces.


> Yet another case of imposing modern typographic conventions on an old book :(

More than that. A case of deliberately introducing errors. Take:

   came out of S---- Place and walked towards K---- bridge

The four hyphens stand for omitted text. The author intends thus to 
leave the town unnamed in which the story happens.

A typographer would have used "horizontal bars".


Now look at this:

   came out of S-- Place and walked towards K-- bridge

This is now an obvious error.


Which will be further compounded to:

   came out of S--Place and walked towards K--Bridge

by Bowerbird's whitespace-eating snake oil formulas.


A case of easily-introduced-by-the-computer errors.
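
The objection can be made concrete with a Python sketch of a dash-shortening pass that tries to spare elisions. The rule used here (leave a four-hyphen run alone when it directly follows a lone capital letter) is an assumption chosen for illustration; it would wrongly spare a stammered "I----I", and nothing in the thread says anyone's routine works this way.

```python
import re

FOUR_DASHES = re.compile(r"(?<!-)-{4}(?!-)")
# A single capital letter, not part of a longer word, at the very end.
LONE_CAPITAL = re.compile(r"(?<![A-Za-z])[A-Z]\Z")


def shorten_dashes(text):
    """Turn ---- into -- unless it follows a lone capital (assumed elision)."""
    def repl(match):
        before = text[:match.start()]
        if LONE_CAPITAL.search(before):
            return match.group(0)      # e.g. "S----": leave untouched
        return "--"
    return FOUR_DASHES.sub(repl, text)
```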


From Bowerbird at aol.com  Sun Aug 17 13:09:31 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 17 Aug 2008 16:09:31 EDT
Subject: [gutvol-d] tanglewood tales -- 05
Message-ID: 

dave said:
>    Yet another case of imposing modern typographic conventions on an old book.

exactly.

precisely because i've republished that "old" book for "modern" people.       :+)

which is exactly what publishers have been doing for centuries now...

plus i am creating a _library_ that can be accessed _electronically_ with
machinery that was largely unimaginable when this book was published
originally, so i need to "impose" _consistency_ on that library in order to
maximize the tremendous benefits of that large-scale electronic access.

but hey, if you really _want_ to look at that "old" book as it was published,
take a look at the scans, baby, i've mounted them for your convenience...

***

besides, aside from all that, consider this...

an em-dash indicates a pause of indeterminate length.

does placing two of them together indicate 
a _longer_ "pause of indeterminate length"?

what would that possibly mean?

is 2 times infinity more than infinity?   probably not, thank you...

***

further, the change i counted as an error was a _three-dash-sequence_.

three dashes.   not two.   not four.   three.   exactly three.

do you really want to try to tell anyone that there actually is such a beast?
evidently it would indicate a pause of "indeterminate length" times 1.5...

here's the exact line:
>    faithful ally---this virtuous Giant--this blame-

you can find it on this page:
>    http://z-m-l.com/go/tnglw/tnglwp098.html

now, if you think i've offended the author's intentions, i'd _love_ to hear it.
otherwise, i believe you're missing the larger point of this series of posts...

-bowerbird




From Bowerbird at aol.com  Mon Aug 18 00:05:52 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 18 Aug 2008 03:05:52 EDT
Subject: [gutvol-d] tanglewood tales -- 06
Message-ID: 

we're examining the new improved "tanglewood tales", e-text #976.

our next check is for doublequote-whitespace-doublequote.

oops!

yet another hit:
>    the enemy! Lead us to the charge! Death or victory!" "Come on, brave

you can confirm this error here:
>    http://z-m-l.com/go/tnglw/tnglwp324.html

7 total easily-found-by-the-computer errors in this just-updated book...

-bowerbird

p.s.   and yes, having learned from jose, i was more careful this time to check for
doublequote-lineend-doublequote, not just doublequote-space-doublequote.
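
A Python sketch of such a check, with hypothetical names. The gap between the two quote marks may be spaces or a single line break, but a blank line is deliberately not matched: a closing quote, a paragraph break, and an opening quote is the normal shape of dialogue and would otherwise flood the results. That exclusion is this sketch's own design choice, not something stated in the post.

```python
import re

# Quote, then spaces/tabs or exactly one line break, then another quote.
QUOTE_GAP_QUOTE = re.compile(r'"(?:[ \t]+|[ \t]*\n[ \t]*)"')


def adjacent_quotes(text):
    """Return (line_number, context) for each quote-gap-quote hit."""
    hits = []
    for m in QUOTE_GAP_QUOTE.finditer(text):
        line_no = text.count("\n", 0, m.start()) + 1
        context = text[max(0, m.start() - 20):m.end() + 20].replace("\n", " ")
        hits.append((line_no, context))
    return hits
```

As the follow-up in this thread shows, a hit is only a candidate: several speakers shouting within one paragraph can produce the same pattern legitimately.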




From ajhaines at shaw.ca  Mon Aug 18 11:16:24 2008
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Mon, 18 Aug 2008 11:16:24 -0700
Subject: [gutvol-d] tanglewood tales -- 06
References: 
Message-ID: <000d01c9015e$864badb0$6401a8c0@ahainesp2400>

Wrong.  David's posted version is correct, quote-mark-wise.  

It's quite common to have several quoted sentences in a single paragraph.  In this case, it's from several people shouting battle cries more or less simultaneously.

You're assuming that your edition of this book is identical with the edition from which the original was prepared.  You've blotted your copybook on this one.

Al

  ----- Original Message ----- 
  From: Bowerbird at aol.com 
  To: gutvol-d at lists.pglaf.org ; Bowerbird at aol.com 
  Sent: Monday, August 18, 2008 12:05 AM
  Subject: [gutvol-d] tanglewood tales -- 06


  we're examining the new improved "tanglewood tales", e-text #976.

  our next check is for doublequote-whitespace-doublequote.

  oops!

  yet another hit:
  >   the enemy! Lead us to the charge! Death or victory!" "Come on, brave

  you can confirm this error here:
  >   http://z-m-l.com/go/tnglw/tnglwp324.html

  7 total easily-found-by-the-computer errors in this just-updated book...

  -bowerbird

  p.s.  and yes, having learned from jose, i was more careful this time to check for
  doublequote-lineend-doublequote, not just doublequote-space-doublequote.




From Bowerbird at aol.com  Mon Aug 18 12:33:21 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 18 Aug 2008 15:33:21 EDT
Subject: [gutvol-d] tanglewood tales -- 06
Message-ID: 

al said:
>    Wrong.  David's posted version is correct, quote-mark-wise.

prove it.


>    It's quite common to have several quoted sentences in a single paragraph.

i strongly disagree it's "common".   but i _will_ grant you that it's not unknown.

still, i've _proven_ -- with a scan -- that it is _incorrect_ in this particular case.


>    You're assuming that your edition of this book is identical
>    with the edition from which the original was prepared.

and you're trying to argue -- without summoning one shred of proof --
that the original was prepared from a different edition.   again, prove it...

i've pointed to a publicly-available scan-set -- one from archive.org --
that we expect will be around for a _very_ long time which shows clearly
that the p.g. e-text is in error on this point.

if you can point to another publicly-available scan-set which shows clearly
that the p.g. e-text accurately reflects what is on the page in that scan-set,
then you will have sidestepped this particular error. (and i would be most
delighted to continue my research using _that_ edition instead, thank you.)

but if you just want to argue there _might_ be a different edition out there
which does indeed show the text as it is recorded in the p.g. e-text, but
you cannot _show_ anyone a scan-set of that different edition, well then
i would imagine that you yourself can figure out how poor that argument is.

***

perhaps you haven't been reading my posts to know that this is a major point
that i have argued for some time now, but the inability of project gutenberg
to back up the accuracy of its e-texts with scan-sets is going to be a _crucial_
failing of p.g., because other cyberlibraries _will_ be able to provide that proof.

even in the cases where p.g. _does_ provide a scan-set, the fact that you have
_rewrapped_ your text will make it extremely difficult for people to _compare_
the text with the scans... so they will naturally -- and quite understandably --
gravitate instead to cyberlibraries which have _not_ rewrapped the text, so as
to make it _easy_ for a person to use the scan to verify the accuracy of the text.

i repeat, prove it...

-bowerbird




From ebooks at ibiblio.org  Mon Aug 18 21:20:47 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Tue, 19 Aug 2008 00:20:47 -0400
Subject: [gutvol-d] tanglewood tales -- 02
In-Reply-To: 
References: 
Message-ID: <48AA4A1F.7020908@ibiblio.org>

On Aug. 15, 2008, Bowerbird wrote:

> thanks, jose, but i'm not finished with "mountain blood" yet.
> there are other errors in it as well...  as i am sure you 
> know...            :+)


I like the way you ignored the fact that those were two 
"easily-found-by-the-computer errors" in your ebook. Heck, I didn't 
even need a regex to find that spacey quote. All I had to search for 
was space-doublequote-space ( " ).
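
A slightly broader version of that search, sketched in Python with hypothetical names: it reports a double quote that has whitespace (or a line boundary) on both sides, which covers the plain space-doublequote-space case as well as a stray quote alone at the start or end of a line.

```python
import re

# A '"' with whitespace or a line edge on each side.
FLOATING_QUOTE = re.compile(r'(?:(?<=\s)|^)"(?=\s|$)', re.MULTILINE)


def floating_quote_lines(text):
    """Return the 1-based numbers of lines containing a floating quote."""
    return sorted({text.count("\n", 0, m.start()) + 1
                   for m in FLOATING_QUOTE.finditer(text)})
```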

But you're certainly right about there being other errors in it. For 
instance, here's your last line on page 195:

     darkly, Gordon, stood still, Meta Beggs fell be-

It should be this:

     darkly. Gordon stood still, Meta Beggs fell be-

http://z-m-l.com/go/mount/mountp195.html
http://z-m-l.com/go/mount/mountp195.png

Two errors in a single line. Oops again! Here's a tip: Sentences 
sometimes begin with a name. :)

As for your not being finished with it yet, why it was back on June 
27th--7 weeks ago!--that you posted this message:

http://lists.pglaf.org/private.cgi/gutvol-d/2008-June/008691.html

Here's an excerpt:

> so what i've done is to show people what a well-preprocessed version
> of this file would look like.  you can find it up on my website already:
> >   http://z-m-l.com/go/mount/mount.zml
> 
> so, no, it doesn't take long to do the preprocessing right.  not long at all.


And the very next day, June 28th, you posted this challenge:

http://lists.pglaf.org/private.cgi/gutvol-d/2008-June/008692.html


> i decided to move "blood mountain" along a bit,
> so it should be pretty much finished right now...
> 
> all the people who claim that i don't know how to
> do this shit are invited to find the flaws in my work.
> 
> >   http://z-m-l.com/go/mount/mount.zml
> >   http://z-m-l.com/go/mount/mountp001.html


"blood mountain"?! You know, it's usually not a good indicator of 
accuracy when you get the book's title backwards. ;)

So on June 28th, you said, "it should be pretty much finished right 
now..." But *7 weeks* later you say, "i'm not finished with 'mountain 
blood' yet." Somehow I doubt that David Widger spent 7 weeks on 
"Tanglewood Tales."

Of course, I knew that you weren't finished with "Mountain Blood" back 
when you issued that challenge, because you kept making more 
corrections to it as you watched the DP proofreaders work on their 
version. If we take a look at your "mount" file directory,

http://z-m-l.com/go/mount/

we'll see these ZML files listed:


   Name                    Last modified      Size

   mount--old1.zml         27-Jun-2008 14:23  410K
   mount--old2.zml         27-Jun-2008 22:50  410K
   mount--old3.zml         28-Jun-2008 14:39  410K
   mount--old4.zml         29-Jun-2008 19:29  410K
   mount-old5.zml          30-Jun-2008 11:03  410K
   mount.zml               10-Jul-2008 15:18  410K


First of all, it's funny how you, the self-proclaimed master of 
file-naming, have inconsistent file names. Note the single hyphen in 
the "old5" name vs. the two hyphens in the other "old" names.

Second, note the ones dated after June 28th, when you said, "it should 
be pretty much finished right now..." And those aren't the only ZML 
files you had. Back on July 7th, I looked at that directory, and 
"mount.zml" had a last modified date of "07-Jul-2008 09:32." I saved 
it and posted it on my website here:

http://www.ibiblio.org/ebooks/mount-07-Jul.zml

You apparently had a July 6th ZML file, too. If we look at the 
directory sorted by last modified date,

http://z-m-l.com/go/mount/?C=M;O=A

we'll see that most of your HTML files are dated "06-Jul-2008."

Now in this post on July 11th:

[gutvol-d] how to clean up ("preprocess") the o.c.r. for a book -- 
roadmap2
http://lists.pglaf.org/private.cgi/gutvol-d/2008-July/008742.html

you claimed this:

> more importantly, from my perspective, the preprocessing i did
> seems to have left _just_3_errors_ in this book of over 360 pages,
> which these human p1 proofers detected...
> 
> here they are:
> 
> >   If they took away the chair, Gordon knew, he wag
> should be:
> >   If they took away the chair, Gordon knew, he was
> 
> >   "Why, damn it fell, Gord!" exclaimed an individual,
> should be:
> >   "Why, damn it t'ell, Gord!" exclaimed an individual,
> 
> >   grip of these blood-money men; we'll have a state
> >   la wed bank; a rate of interest a man can carry without
> should be:
> >   grip of these blood-money men; we'll have a state
> >   lawed bank; a rate of interest a man can carry without


Let's compare those later ZML files to each other and to the one dated 
June 28th to see if your claim is true. We'll start with 
"mount--old3.zml" (from June 28th)

http://z-m-l.com/go/mount/mount--old3.zml

and "mount--old4.zml" (from June 29th).

http://z-m-l.com/go/mount/mount--old4.zml

Well, I see that you changed "he wag" to "he was" in that June 29th 
file. Was that the only correction you made that day? No. You also 
changed this line:

     I'm no sheep to drive into their lot and shear I"

to this:

     I'm no sheep to drive into their lot and shear!"

And you changed this line:

     'but at night -- satin gowns with trains and bare

to this:

     but at night -- satin gowns with trains and bare

Besides "he wag," your preprocessing missed those other errors, too, 
didn't it?
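
For anyone who wants to repeat this kind of version-to-version comparison, here is a minimal sketch using Python's standard `difflib`. It assumes both ZML files have already been read into strings; the function name is hypothetical and the sample text reuses one line quoted above.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return a list of (old_lines, new_lines) pairs, one per changed region."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            changes.append((old[i1:i2], new[j1:j2]))
    return changes


# Tiny stand-ins for two versions of the file.
older = "first line\nIf they took away the chair, Gordon knew, he wag\nlast line"
newer = "first line\nIf they took away the chair, Gordon knew, he was\nlast line"

for before, after in changed_lines(older, newer):
    print("was:", before)
    print("now:", after)
```

Working from opcodes rather than a printed diff avoids confusing diff markers with book lines that themselves begin with hyphens.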

Now let's compare "mount--old4.zml" to "mount-old5.zml" (from June 30th).

http://z-m-l.com/go/mount/mount-old5.zml

In that one, besides some hyphenation changes, you changed the
10 occurrences of "Lattice" to "Lettice." You changed this line:

     place like that. He would be prou'd of me, and all

to this:

     place like that. He would be proud of me, and all

You changed this line:

     "We've never been storekeepers,"

to this:

     "We've never been storekeepers."

And you changed this line:

     planted grew as if by magic. Old Matthew Zane'

to this:

     planted grew as if by magic. Old Matthew Zane

Your preprocessing missed those errors, too, didn't it?

By the way, there's something funny about your correction of that
"We've never been storekeepers," line. In this post on July 21st,

[gutvol-d] how to clean up ("preprocess") the o.c.r. for a book -- 019
http://lists.pglaf.org/private.cgi/gutvol-d/2008-July/008768.html

your tip was "19.  check the paragraph-breaks that occur on 
page-breaks." And the first two bad lines you cited were these:

     radiant content settled upon her,

     "We've never been storekeepers,"

But the first line was already corrected to

     radiant content settled upon her.

in your first June 27th ZML file, "mount--old1.zml."

http://z-m-l.com/go/mount/mount--old1.zml

That 19th preprocessing routine of yours must have been running very 
slowly, since it needed 3 more days to find that "We've never been 
storekeepers," line. ;)

Next, let's compare "mount-old5.zml" (from June 30th) to the one I 
saved on July 7th and put on my website:

http://www.ibiblio.org/ebooks/mount-07-Jul.zml

Besides many more hyphenation changes, you changed this line:

     other day, sitting right in the house there, Pompey

to this:

     other day, sitting right in the house there, 'Pompey

You changed this line (the second of the three you admitted missing):

     "Why, damn it fell, Gord!" exclaimed an individual,

to this:

     "Why, damn it t'ell, Gord!" exclaimed an individual,

You changed this line:

     you'd meet in a day on a horse. You mind Erne

to this:

     you'd meet in a day on a horse. You mind Effie

You changed this line:

     that had overtaken his sister Erne, remarked by her

to this:

     that had overtaken his sister Effie, remarked by her

You changed this line (the last of the three you admitted missing):

     la wed bank; a rate of interest a man can carry without

to this:

     lawed bank; a rate of interest a man can carry without

And you changed this line:

     He year, in the immemorial, minute shifting

to this:

     The year, in the immemorial, minute shifting

Your preprocessing missed those other errors, too, didn't it?

Finally, let's compare the ZML file I saved from July 7th to the
latest one you have online, "mount.zml" (from July 10th).

http://z-m-l.com/go/mount/mount.zml

There's only one difference between them. You changed this line:

     but not Kenny's for nineteen years." Another bore,

to this:

     but not Henny's for nineteen years." Another bore,

Your preprocessing missed that error, too, didn't it?

Hmmm.... It seems that your claim was false, Bowerbird.


Jose Menendez

From walter.van.holst at xs4all.nl  Mon Aug 18 21:50:15 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Tue, 19 Aug 2008 06:50:15 +0200
Subject: [gutvol-d] tanglewood tales -- 02
In-Reply-To: <48AA4A1F.7020908@ibiblio.org>
References:  <48AA4A1F.7020908@ibiblio.org>
Message-ID: <48AA5107.4070501@xs4all.nl>

Jose Menendez wrote:

> As for your not being finished with it yet, why it was back on June 
> 27th--7 weeks ago!--that you posted this message:

Although you are a better man than I am for providing this long overdue 
reality-check to our capitalisation-challenged 'performance poet', I 
have to come to his defense: to my knowledge he has never claimed that 
preprocessing (which incidentally, I feel should be fully automatic, 
human oversight makes it proofing) would be able to catch all OCR errors.

Regards,

  Walter

From dakretz at gmail.com  Mon Aug 18 21:58:52 2008
From: dakretz at gmail.com (don kretz)
Date: Mon, 18 Aug 2008 21:58:52 -0700
Subject: [gutvol-d] TwistEd
Message-ID: <627d59b80808182158t3be94c3bra7302f1a01886b07@mail.gmail.com>

There's a new, minimally functional version to look at.

Its most compelling feature is showing how far there is to go. But it does
javascript-compatible regexes pretty adequately.

From Bowerbird at aol.com  Mon Aug 18 22:25:06 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 19 Aug 2008 01:25:06 EDT
Subject: [gutvol-d] tanglewood tales -- 02
Message-ID: 

see, jose, this is why i so often put you in my spam folder...

you go on and on and on about the smallest of points,
while simultaneously you blithely ignore the big picture.

so i'm gonna have to ignore any more posts you make
under this subject-header.   your shelf-date has expired.

***

yes, there were errors in all my versions of "mountain blood".
and there _continue_ to be errors, especially since the copy
that's up there right now was posted way back on july 10th.

there will even be errors in it when i mount my newest copy,
i would suppose, because i haven't proofed it, and i _won't_,
as i'm sure it's good enough for public "continuous proofing",
which i've defined as less-than-1-error-for-every-10-pages...

i've never said i have an error-free copy up.   that's unimportant.
nobody at d.p. or p.g. would care about that.   why should they?

what _is_ important is that -- if you use the right set of routines --
you can eliminate _all_ of the o.c.r. errors in this book _except_3_.

_that_ is something that the d.p. and p.g. people should care about,
because it shows them a good way to do their job more efficiently...
and because _none_ of their own preprocessing even comes close!

and you're smart enough to _know_ that that's the important point,
jose, so if you have a reply to _that_, then i would be glad to hear it.

but this constant carping about other irrelevant crap is bullshit --
you're smart enough to know it; you just make yourself look bad...


>    because you kept making more corrections to it as
>    you watched the DP proofreaders work on their version. 

that's exactly right.   and if you've been reading all my posts,
you'll know that i suggested to dkretz that this was precisely
how _he_ should go about finding the minimal necessary set
of routines -- by seeing what corrections are made during p1.

do that on enough books, and you'll have the _correct_recipe_.
i've said this clearly, and directly; there should be no confusion.

there was a good reason that all of my routines returned "hits";
because i only included a routine in this list _if_ it returned a hit.
in other words, it was the "minimal necessary set" _for_this_book_.
and -- in the end -- it caught every single error except for three...
that's darn good preprocessing, better than we could ever expect.
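
The idea of keeping only those routines that return a hit on a given book can be sketched roughly as below. This is an assumption-laden illustration: the three regex routines and their names are hypothetical examples, not the actual routines from the series.

```python
import re

# Hypothetical candidate routines: name -> regex that flags a suspicious line.
ROUTINES = {
    "spacey quote": r' " ',
    "digit inside word": r"[A-Za-z]\d[A-Za-z]",
    "doubled comma": r",,",
}


def run_routines(routines, lines):
    """Run every routine over the book and keep only those that return hits.

    The keys of the result are the "minimal necessary set" for this text;
    the values are the (line_number, text) pairs each routine flagged.
    """
    hits = {}
    for name, pattern in routines.items():
        regex = re.compile(pattern)
        matched = [(n, line) for n, line in enumerate(lines, start=1)
                   if regex.search(line)]
        if matched:
            hits[name] = matched
    return hits


book = ['He said, " stop."', "the c0at was wet", "nothing wrong here"]
for name, matched in run_routines(ROUTINES, book).items():
    print(name, matched)
```

A routine added after a proofer finds a new kind of error would simply be another entry in the dictionary, rerun against the whole book.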


>    If we take a look at your "mount" file directory,
>    http://z-m-l.com/go/mount/
>    we'll see these ZML files listed:

exactly right.   i renamed old versions as i posted new ones,
so _anyone_ -- including you -- could track their progress.
you act like i'm hiding these files, when they're _right_there_.

a simple "index.html" file would prevent you from seeing them,
but you won't find such a file on _any_ of my /go/ books, since
my goal is making the file-structure as transparent as possible.


>    Besides "he wag," your preprocessing missed those other errors, 
>    too, didn't it?

you seem to be under the impression that my preprocessing was
done in one fell swoop.   but it wasn't.   it's been a piecemeal thing.
when p1 finds and fixes an error, i ask whether that error could've
been detected by a preprocessing routine.   if so, i add that routine,
and check to see if it turns up any _other_ errors in the book.   if so,
i fix those errors.   and that's precisely how d.p. should be operating.
and i've been very clear about this.   either you're not paying attention,
or (more likely) you just wanna give an appearance of "correcting" me.
it's bullshit, jose, and i'm calling you on your bullshit, so stop doing it.

to everyone else, please excuse my "french", but i only wrote this post
so that all of you would be clear on what i'm doing here, so if anyone
has any questions on my modus operandi, please feel free to ask me...

-bowerbird




From Bowerbird at aol.com  Mon Aug 18 22:26:35 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 19 Aug 2008 01:26:35 EDT
Subject: [gutvol-d] TwistEd
Message-ID: 

dkretz said:
>    There's a new, minimally functional version to look at.

cool, i'll play around with it.

-bowerbird




From Bowerbird at aol.com  Mon Aug 18 23:33:42 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 19 Aug 2008 02:33:42 EDT
Subject: [gutvol-d] TwistEd
Message-ID: 

dkretz said:
>    TwistEd

ok, excellent progress.   enough so that i'll send you
over a copy of "banana cream"...     congratulations...

you need "prev page" and "next page" _buttons_
-- and to respond to cursor-key presses too --
so the user can just remain in the view-text mode.
(there's little reason to look at that listbox of files.)

other than that, you're on your way...   keep going...

for other people who will want to look at this app,
i've uploaded a 15-meg .zip file with images and
text-files that you can use to try out the program:
>    http://z-m-l.com/tnglw/twisted-dkretz.zip

folks, _this_ is the way to be constructive...

once again, great work, don...             :+)

-bowerbird




From Bowerbird at aol.com  Tue Aug 19 00:07:25 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 19 Aug 2008 03:07:25 EDT
Subject: [gutvol-d] tanglewood tales -- 07
Message-ID: 

we're looking at the new improved "tanglewood tales", e-text #976.

i took a look at the chapter-headers, and found one glitch there.

this header:
>    THE WAYSIDE. INTRODUCTORY.

should actually be split across two lines:
>    http://z-m-l.com/go/tnglw/tnglwp007.html

so that adds 1 more error to the total...

***

also worth noting is the fact that the _old_ version of this book was
consistent in the number of blank lines above each chapter-header,
while the new version is inconsistent.   specifically, 2 of the headers
have 3 blank lines above, instead of 4.   initially, i was a bit reluctant
to count these as errors, and was inclined to dismiss it with a simple
note that we really want to increase the consistency, not decrease it...

but then i looked at the .html version of the file, and i discovered that
this inconsistency in the number of blank lines above the headers had
caused some serious ramifications in the structure of the .html version.

the headers with insufficient blank lines above them had been demoted,
in the sense that they didn't have a link in the hotlinked table of contents,
and were formatted differently than other chapter-headers, even though
-- in the p-book -- there is no difference between them and the others...

>    http://www.gutenberg.org/files/976/976-h/976-h.htm

so i guess i have no choice but to consider those two headers to be errors.

so 10 total easily-found-by-the-computer errors in this just-updated book...
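
A check for this kind of blank-line inconsistency might look like the sketch below. It assumes, purely for illustration, that a chapter-header is an all-capitals line preceded by at least two blank lines; the real text may need a different test, and the sample lines are made up.

```python
from collections import Counter


def header_blank_counts(lines, min_blank=2):
    """Return (line_number, blank_lines_above, text) for each likely header."""
    results = []
    blanks = 0
    for n, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            blanks += 1
            continue
        has_letters = any(c.isalpha() for c in stripped)
        if has_letters and stripped == stripped.upper() and blanks >= min_blank:
            results.append((n, blanks, stripped))
        blanks = 0
    return results


def inconsistent_headers(lines):
    """Return the headers whose blank-line count differs from the usual one."""
    found = header_blank_counts(lines)
    if not found:
        return []
    usual, _ = Counter(count for _, count, _ in found).most_common(1)[0]
    return [header for header in found if header[1] != usual]


sample = (["intro"] + [""] * 4 + ["THE MINOTAUR.", "", "text"]
          + [""] * 4 + ["THE PYGMIES.", "", "text"]
          + [""] * 3 + ["THE DRAGON'S TEETH.", "", "text"])

for header in inconsistent_headers(sample):
    print(header)
```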

-bowerbird




From jayvdb at gmail.com  Tue Aug 19 00:21:52 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Tue, 19 Aug 2008 17:21:52 +1000
Subject: [gutvol-d] TwistEd
In-Reply-To: 
References: 
Message-ID: 

On Tue, Aug 19, 2008 at 4:33 PM,   wrote:
> dkretz said:
>>   TwistEd
>
> ok, excellent progress.  enough so that i'll send you
> over a copy of "banana cream"...    congratulations...

I've also had a quick look, and like what is coming of it.

> you need "prev page" and "next page" _buttons_
> -- and to respond to cursor-key presses too --
> so the user can just remain in the view-text mode.
> (there's little reason to look at that listbox of files.)

I couldn't see a save button.

When I make a change via the Replace button, the raw file is changed.
One day, I am going to regret hitting the Replace button; an undo
would be nice.

If I edit the text, it doesn't prompt me to save it and happily closes
without saving.

> other than that, you're on your way...  keep going...
>
> for other people who will want to look at this app,
> i've uploaded a 15-meg .zip file with images and
> text-files that you can use to try out the program:
>>   http://z-m-l.com/tnglw/twisted-dkretz.zip

404 Not Found.

--
John

From Morasch at aol.com  Tue Aug 19 00:51:29 2008
From: Morasch at aol.com (Morasch at aol.com)
Date: Tue, 19 Aug 2008 03:51:29 EDT
Subject: [gutvol-d] TwistEd
Message-ID: 

john said:
>   I couldn't see a save button.

don't need one.   it should save automatically on every change.


>    When I make a change via the Replace button, the raw file is changed.

not good.   each edit should be saved under a new time-stamped name,
with the latest revision also saved under the default name for that page.

the files should be merged originally into a single file, as i said earlier.
then, on an edit, the text for that individual page is saved independently.

so your original file -- with all the text -- might be named "tnglw.zml".
then when you make an edit to page 123, file "tnglwp123.txt" is saved.

if the text on that page is later edited again, the filenames might be:
>    tnglwp123-2008-08-12-14-22-03.txt
>    tnglwp123-2008-08-15-03-05-53.txt
>    tnglwp123-2008-08-18-23-52-52.txt

that way, you can easily tell -- just from looking at your file-directory --
exactly what pages have been edited, and exactly when they were edited.
(the filenames will sort nicely.)
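
As a rough sketch of that saving scheme in Python (the function name, the three-digit page padding, and the UTF-8 encoding are assumptions made for illustration):

```python
from datetime import datetime
from pathlib import Path


def save_page_edit(directory, book, page, text, when=None):
    """Save one edited page twice: under a time-stamped name and under the
    default name for that page, so older versions are never overwritten."""
    if when is None:
        when = datetime.now()
    stamp = when.strftime("%Y-%m-%d-%H-%M-%S")
    directory = Path(directory)
    base = f"{book}p{page:03d}"
    versioned = directory / f"{base}-{stamp}.txt"
    default = directory / f"{base}.txt"
    versioned.write_text(text, encoding="utf-8")
    default.write_text(text, encoding="utf-8")
    return versioned
```

Calling `save_page_edit("pages", "tnglw", 123, text)` would write both `tnglwp123.txt` and a time-stamped sibling. Because the stamp runs from year down to second with zero padding, a plain alphabetical sort of the directory is also a chronological one.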

and when you're ready to package up all your edits into a new merged file,
you can do it easily, retaining each of your "complete" files along the way...

it's also worth noting that automatic "robot" changes can honor these rules.

also worth noting is that _all_ of this is totally nondestructive, for rollback,
and for systematic tracing of exactly what was done, and when it was done.
this is very important.


>    One day, I am going to regret hitting the Replace button; 
>    an undo would be nice.

a "revert to previous file" button will be nice, yes,
but you can attain the same effect by deleting the
appropriate file with latest time-stamped name,
replacing the default file with the previous one...
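
The manual rollback described here could be automated along these lines. This is only a sketch under the same naming assumptions as the time-stamped filenames above; `revert_page` is a hypothetical helper, not part of any existing tool.

```python
import shutil
from pathlib import Path


def revert_page(directory, book, page):
    """Delete the newest time-stamped version of a page and restore the
    previous one as the default file. Returns the restored version's path,
    or None when there is no earlier version to fall back on."""
    directory = Path(directory)
    base = f"{book}p{page:03d}"
    # Time-stamped names sort chronologically when sorted alphabetically.
    versions = sorted(directory.glob(f"{base}-*.txt"))
    if len(versions) < 2:
        return None
    versions[-1].unlink()
    previous = versions[-2]
    shutil.copyfile(previous, directory / f"{base}.txt")
    return previous
```

Note that this variant really does delete the newest file; moving it to a separate folder instead would keep the scheme fully nondestructive.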


>   If I edit the text, it doesn't prompt me to save it 
>    and happily closes without saving.

don will fix that...        :+)


>    http://z-m-l.com/tnglw/twisted-dkretz.zip
>    404 Not Found.

oops!   sorry.   my fault.   try this instead:
>    http://z-m-l.com/go/tnglw/twisted-dkretz.zip

-bowerbird




From marcello at perathoner.de  Tue Aug 19 04:21:08 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 19 Aug 2008 13:21:08 +0200
Subject: [gutvol-d] TwistEd
In-Reply-To: <627d59b80808182158t3be94c3bra7302f1a01886b07@mail.gmail.com>
References: <627d59b80808182158t3be94c3bra7302f1a01886b07@mail.gmail.com>
Message-ID: <48AAACA4.70801@perathoner.de>

don kretz wrote:

> There's a new, minimally functional version to look at.

... for suitably small values of 'functional'.

When I go to

   http://get.adobe.com/air/

I get:

   Sorry, your platform is not supported.


0 points out of 10, for gratuitously using a proprietary gadget that 
does not run on people's systems.



From Bowerbird at aol.com  Tue Aug 19 11:30:52 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 19 Aug 2008 14:30:52 EDT
Subject: [gutvol-d] pretending to talk to dkretz about twisted
Message-ID: 

ok, i'm gonna _pretend_ to talk to dkretz here, about twisted.

in reality, i'm talking to any programmer who wants to listen,
either now or in the future.   but i'll pretend i'm talking to don.

so don, if you don't particularly care to hear from me, then
just pretend i'm talking to someone else.   or you could even
decline to listen.   but that wouldn't be your smartest choice.

and make no mistake about it, what i will be dishing here is
_good_stuff_.   that $300,000 pricetag for all my programs?
well, that's for _all_ my programs, across the full toolchain.
but a good part of the "secret sauce" -- the ability to create
kick-ass electronic-book apps -- i will give away right here.
in this and future posts.   so listen up if you want some of it.

***

don, you're on the verge of an extremely big breakthrough.

just as soon as you create cursor-key navigation from page
to page, while leaving the edit-field up on-screen, you will
experience a _rush_ as you hold down that right cursor-key
and go boom!-boom!-boom!-boom!-boom!-boom!-boom!
through the pages of the book, with dreamy fluid navigation.

at that time, you will also realize that, as simple as it all was,
you've already built an interface that challenges the d.p. g.u.i.
sure, you'll need to learn how to grab resources from online,
rather than locally, but that will be fairly simple enough to do.

and when you've accomplished that, you will have the means
(thanks to flex) to execute that interface in the web-browser
_or_ offline, which all by itself puts your app in a class above.

push to this stage.   once you hit it, you'll be highly rewarded.
the adrenaline will give you motivation to continue further...

***

more on the interface...

as i said, make the edit-field the default, not the files listbox.
that positioning is from the "twister" mindset, improper here.
you could also work on making the files listbox more useful.
(and delete all the twister buttons; they're just a distraction.)

the other major reworking is that you want to have the image
on-screen consistently.   the "thumbnails" tab is unnecessary.
(oh, it can be retained, yes, but your users won't use it much.)

more importantly in this regard is to re-do the "hunter" tab...

much of the time -- heck, perhaps even _most_ of the time --
your users will want to see the scan in order to verify a change.
so you can't have the "find" interface displacing the scan-view.

so move the find/replace boxes up to the top (or the bottom),
along with all the other controls related to the change process,
so the text _and_ the scan will continue to be visible at all times.

(the only caveat here is that i have a 23" monitor, so sometimes
i'm unaware of how crowded things can get on a smaller screen.
but the ability to view text and scan together is certainly critical.)

this is not to say you should eliminate the "hunter" tab totally.
but when the user _does_ have it showing, make it more useful.

specifically, when the user has executed a search for a keyword,
display a _list_ of the hits in a listbox -- the full text of each line.
this kind of list is _invaluable_ in a focused detection of glitches.
and if the user clicks on a specific hit, kick back to the scan-view,
with the hit highlighted.   plus, if the edit-field is then activated,
have the text of that hit pre-selected for most efficient editing.
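
The data behind such a hit-list could be gathered roughly as follows. This is a sketch, assuming the pages are already held as a list of strings; the dictionary keys are illustrative names, and the sample pages reuse lines quoted earlier in this thread.

```python
import re


def find_hits(pages, pattern):
    """Search every page for a regex and return one record per match,
    carrying the full text of the line so it can be shown in a listbox."""
    regex = re.compile(pattern)
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in regex.finditer(line):
                hits.append({
                    "page": page_no,
                    "line": line_no,
                    "start": match.start(),  # offsets allow pre-selecting the hit
                    "end": match.end(),
                    "text": line,
                })
    return hits


pages = [
    "If they took away the chair, Gordon knew, he wag",
    "darkly, Gordon, stood still, Meta Beggs fell be-",
]
for hit in find_hits(pages, r"\bGordon\b"):
    print(hit["page"], hit["line"], hit["text"])
```

The stored offsets are what would let an editor highlight the hit and pre-select it when the edit-field is activated.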

***

back to the navigation...

revel in it!

with z.m.l. coding, chapter-heads are indicated by 4 blank lines.
so i can search -- behind and ahead -- to find the previous and
next chapters, so i can provide the user buttons to jump to them.

and, even better, since i use the pagenumbers in the filenames,
i can let the user type in a pagenumber and then just press enter,
and -- boom! -- they are delivered right to that page.   it's magic.
once you zoom around a book this way, anything else is too slow.
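
A minimal sketch of the type-a-pagenumber jump, assuming the naming pattern seen in the links in this thread (book prefix, then "p", then a zero-padded three-digit page number); front-matter or other special pages may well be named differently:

```python
def page_filename(book, page_input, ext="html"):
    """Turn what the user typed into the filename for that page."""
    page = int(str(page_input).strip())
    if page < 0:
        raise ValueError("page number cannot be negative")
    return f"{book}p{page:03d}.{ext}"


print(page_filename("mount", "195"))
print(page_filename("tnglw", 7))
```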

back to chapter-heads, i also build a _menu_ of them, so the user
can use that to jump right to a specific one, if they need to do that.
it's also very handy in getting an overview of the book's structure...
(the utility of this is very high when you have multi-level headers.)
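
Here is one way the chapter search and chapter menu might be sketched in Python. It assumes a chapter-head is any non-blank line preceded by exactly 4 blank lines, which is a simplification of the z.m.l. rule described above; the function names are hypothetical.

```python
import bisect


def chapter_starts(lines, blank_lines=4):
    """Return the indexes of lines preceded by exactly `blank_lines` blanks."""
    starts = []
    blanks = 0
    for i, line in enumerate(lines):
        if line.strip():
            if blanks == blank_lines:
                starts.append(i)
            blanks = 0
        else:
            blanks += 1
    return starts


def next_chapter(starts, current):
    """Index of the first chapter-head after line `current`, or None."""
    pos = bisect.bisect_right(starts, current)
    return starts[pos] if pos < len(starts) else None


def previous_chapter(starts, current):
    """Index of the last chapter-head before line `current`, or None."""
    pos = bisect.bisect_left(starts, current)
    return starts[pos - 1] if pos > 0 else None


lines = (["title"] + [""] * 4 + ["chapter one", "text"]
         + [""] * 4 + ["chapter two", "text"])
starts = chapter_starts(lines)
menu = [lines[i] for i in starts]  # the entries for a chapter menu
print(menu)
```

Because the list of starts is sorted, a binary search finds the neighbouring chapter from any cursor position.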

i also build a lot of other menus, for a number of other purposes,
but since i don't know how easy it is for you to build menus in flex,
i won't dwell on that for now.   but if you can do it, do please tell me.

if you can't do menus easily, then listboxes (which i see you can do)
can also serve the purpose almost equally well...   (the reason menus
are typically better is that they do not displace text-scan viewability.)

what i did with "banana cream" was use _both_ menus and listboxes.

oh yeah, one of the most useful listboxes is on the "find" function.
there's no reason to automatically clear the listbox; let it cumulate,
because sometimes the user will want to go back to a previous hit...
put a "clear" button on it, so they can clear it if it gets too bulky, but
don't clear it automatically.

***

ok, aside from downloading resources from the web when needed,
i think the main thing i haven't discussed here is the text-file saving,
but i covered it pretty well in the previous message i posted, to john,
so i don't think i need to reiterate, unless you have specific questions.

one thing, though, is on the merged text-file.   on start-up, you will
read in the whole file, and then split it into an array on pagebreaks
-- in a .zml file they're indicated by " {{" -- for the individual pages.
but again, if you have any questions, do please feel free to fire away.
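
The start-up split could be sketched like this. The exact pagebreak syntax is not spelled out here, so the sketch assumes that any line beginning with "{{" opens a new page and keeps that marker line with the page it opens; the marker contents in the sample are made up.

```python
def split_pages(text, marker="{{"):
    """Split a merged file into a list of pages on pagebreak marker lines.

    The first element holds whatever precedes the first marker and may be
    an empty string.
    """
    pages = []
    current = []
    for line in text.splitlines():
        if line.lstrip().startswith(marker):
            pages.append("\n".join(current))
            current = [line]  # assumption: the marker belongs to the new page
        else:
            current.append(line)
    pages.append("\n".join(current))
    return pages


merged = "front matter\n{{mountp001.png}}\npage one\n{{mountp002.png}}\npage two"
for page in split_pages(merged):
    print(repr(page))
```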

***

you're right on the verge!   i'm excited for you, and rooting for you!   :+)

-bowerbird




From dakretz at gmail.com  Tue Aug 19 11:36:22 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 19 Aug 2008 11:36:22 -0700
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 45
In-Reply-To: 
References: 
Message-ID: <627d59b80808191136x694e2a9o92fdf5fe484aebde@mail.gmail.com>

So far no derogations not anticipated in the design.

It runs on any system supporting flash. You need to install the Adobe
installer. I changed the description to:

Minimal TwistEd - requires Adobe AIR installer for your OS, found at
http://get.adobe.com/air/ .

It saves every file as you leave it for another. The persistence mechanism
doesn't have multiversioning or undo yet. It might be nice to toss it into
that repo that john van has been referring to. But at least provide
cumulative version files (which is what the DPToo does, by the way.)

Undo is featured prominently near the top of my todo list. Also "Replace All
In Page", "Replace All", "Prev" and "Replace + Prev", "Highlight All in
Page" (with "Replace All"), Regex History, counters for "Replaced" and
"Skipped" (for each Regex), "Load Regex File", etc.

From Bowerbird at aol.com  Tue Aug 19 11:48:10 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 19 Aug 2008 14:48:10 EDT
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 45
Message-ID: 

don-

one more time:   do the navigation stuff first.
then have d.p. volunteers start looking at it...

the fluidity in the app's ability to address the
whole book at the _level_ of the book will be
extremely important toward breaking down
the prevailing one-page-at-a-time mentality,
and it'll take a while to overturn that mindset,
so start 'em getting familiar with the app now.

-bowerbird




From marcello at perathoner.de  Tue Aug 19 11:55:33 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 19 Aug 2008 20:55:33 +0200
Subject: [gutvol-d] gutvol-d Digest, Vol 49, Issue 45
In-Reply-To: <627d59b80808191136x694e2a9o92fdf5fe484aebde@mail.gmail.com>
References: 
	<627d59b80808191136x694e2a9o92fdf5fe484aebde@mail.gmail.com>
Message-ID: <48AB1725.9040202@perathoner.de>

don kretz wrote:

> It runs on any system supporting flash. You need to install the Adobe 
> installer. I changed the description to:
> 
> Minimal TwistEd - requires Adobe AIR installer for your OS, found at 
> http://get.adobe.com/air/ .

You may add to the description that Adobe does not support Linux.


From dakretz at gmail.com  Tue Aug 19 11:56:29 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 19 Aug 2008 11:56:29 -0700
Subject: [gutvol-d] Adobe AIR for Linux
Message-ID: <627d59b80808191156l40acd7e6ub88ee25a7abd4755@mail.gmail.com>

Pre-release version can be found here.

From dakretz at gmail.com  Tue Aug 19 11:58:53 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 19 Aug 2008 11:58:53 -0700
Subject: [gutvol-d] Proprietary ... Not.
Message-ID: <627d59b80808191158u1719bad3u1df6d0355a1ee7ec@mail.gmail.com>

Since it's written in Flex, it can also be jiggered a bit to run from a host
computer in a web page.

From walter.van.holst at xs4all.nl  Tue Aug 19 12:05:00 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Tue, 19 Aug 2008 21:05:00 +0200
Subject: [gutvol-d] Proprietary ... Not.
In-Reply-To: <627d59b80808191158u1719bad3u1df6d0355a1ee7ec@mail.gmail.com>
References: <627d59b80808191158u1719bad3u1df6d0355a1ee7ec@mail.gmail.com>
Message-ID: <48AB195C.6050400@xs4all.nl>

don kretz wrote:
> Since it's written in Flex, it can also be jiggered a bit to run from a 
> host computer in a web page.

Please do, I am not too fond of another proprietary plug-in that may or 
may not be supported on my system.

Regards,

  Walter

From marcello at perathoner.de  Tue Aug 19 12:30:05 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 19 Aug 2008 21:30:05 +0200
Subject: [gutvol-d] Proprietary crap
In-Reply-To: <627d59b80808191158u1719bad3u1df6d0355a1ee7ec@mail.gmail.com>
References: <627d59b80808191158u1719bad3u1df6d0355a1ee7ec@mail.gmail.com>
Message-ID: <48AB1F3D.3060209@perathoner.de>

don kretz wrote:

> Since it's written in Flex, it can also be jiggered a bit to run from a 
> host computer in a web page.

Since Flex is based on Flash, and Flash is still unavailable in 64 bit 
Linux, for me it will run neither from the web nor from the desktop.

Still 0 points out of 10 I guess ...

Why don't you program in JavaScript? If Google can do Google Maps in 
JavaScript it should be easy for you to pull a few page images alongside 
a text box.


From dakretz at gmail.com  Tue Aug 19 12:52:48 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 19 Aug 2008 12:52:48 -0700
Subject: [gutvol-d] Proprietary crap ... Not.
Message-ID: <627d59b80808191252r1c22aa64yddaef48084b0bb76@mail.gmail.com>

Marcello Perathoner wrote:

>> >> don kretz wrote:
>>
>> >> Since it's written in Flex, it can also be jiggered a bit to run from a
>> >> host computer in a web page.
>>
>> Since Flex is based on Flash, and Flash is still unavailable in 64 bit
>> Linux, for me it will run neither from the web nor from the desktop.
>>
>> Still 0 points out of 10 I guess ...
>>
>> Why don't you program in JavaScript? If Google can do Google Maps in
>> JavaScript it should be easy for you to pull a few page images alongside
>> a text box.

The language in Flex is Actionscript, and Actionscript is a (very nearly
proper) superset of javascript. I can take arbitrary javascript code and run
it as-is in Flex. So I am consciously, intentionally writing this to run in
a browser.

The problem is that, without AIR, you couldn't download it and run it unless
you were running your own web server. And handling the image and text files,
and managing their versions, would need yet another server application
(since that can't be done from the server's javascript sandbox) for obvious
security reasons that are built into the server and the javascript language.

Google may not be your best counter-example. They are increasingly using
Flex themselves for UI stuff (e.g. Google Finance). Especially stuff that's
heavy on text handling.

From Bowerbird at aol.com  Tue Aug 19 13:08:49 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 19 Aug 2008 16:08:49 EDT
Subject: [gutvol-d] a 10-point plan for a preprocessing program
Message-ID: 

don-

here's a post i made on july 15, before you got active on the listserve.

-bowerbird

--------------------------------------------------------------

here's what roger frank needs to do to 
get his preprocessing program going...

1. clean up paragraphs. (have to do it sooner or later; so do it sooner.)

2. put top-blank-line on appropriate pages. (so proofers don't have to.)

3. clear up spacey quotes. (there can be literally _hundreds_ of these.)

4. standardize ellipses. (so proofers skip the merry-go-round of changes.)

5. standardize em-dashes. (here too, skip the changes merry-go-round.)

6. dehyphenate. (or delay this until _after_ proofing. or skip altogether!)

7. "clothe" hyphens.   (better yet, just stop doing this stupid d.p. policy.)

8. run routines that find obvious o.c.r. errors. (as i've demonstrated.)

9. do a better job of formulating the "good words" list. (saves time!)

10. congratulate yourselves for a job well done...
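
Steps 3, 4, and 5 above are mechanical enough to sketch in a few lines of Python. This is only an illustration, not roger frank's program or any d.p. tool: the function names are hypothetical, and the target forms (a plain "..." for ellipses, a bare "--" for em-dashes, no space just inside a pair of double quotes) are assumed conventions that a real project would choose for itself.

```python
import re

def standardize_ellipses(text):
    # Collapse ". . ." and runs of three or more dots into a plain "...".
    # Assumption: a four-dot "period plus ellipsis" is also reduced to "...".
    return re.sub(r"\.(?:[ \t]?\.){2,}", "...", text)

def standardize_em_dashes(text):
    # Turn " -- ", "- -", "---" and the Unicode em-dash into a bare "--",
    # dropping spaces or tabs around it. Single hyphens are left untouched.
    return re.sub(r"[ \t]*(?:\u2014|-[ \t]?-+)[ \t]*", "--", text)

def fix_spacey_quotes(paragraph):
    # Treat straight double quotes as alternating open/close within one
    # paragraph and trim the whitespace just inside each pair.
    parts = paragraph.split('"')
    if len(parts) % 2 == 0:
        # Odd number of quote marks: unbalanced, so leave it for a human.
        return paragraph
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].strip()
    return '"'.join(parts)
```

Running each function over a paragraph at a time keeps the quote pairing local, so one missing quote mark cannot flip every pair after it in the book.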

-bowerbird




From Bowerbird at aol.com  Tue Aug 19 13:12:40 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 19 Aug 2008 16:12:40 EDT
Subject: [gutvol-d] Proprietary crap ... Not.
Message-ID: 

dkretz-

ignore the troll.   he'll go away.   i proved that.   and you've got work to do.

-bowerbird

p.s.   besides, he doesn't really want to use your program anyway...




From marcello at perathoner.de  Tue Aug 19 13:13:19 2008
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 19 Aug 2008 22:13:19 +0200
Subject: [gutvol-d] Proprietary crap ... Not.
In-Reply-To: <627d59b80808191252r1c22aa64yddaef48084b0bb76@mail.gmail.com>
References: <627d59b80808191252r1c22aa64yddaef48084b0bb76@mail.gmail.com>
Message-ID: <48AB295F.3000408@perathoner.de>

don kretz wrote:

> The problem is that, without AIR, you couldn't download it and run it 
> unless you were running your own web server.

JavaScript runs fine from a page stored on your hard disk.


> And handling the image and 
> text files, and managing their versions, would need yet another server 
> application (since that can't be done from the server's javascript 
> sandbox) for obvious security reasons that are built into the server and 
> the javascript language.

Try the File object.



From dakretz at gmail.com  Tue Aug 19 14:14:45 2008
From: dakretz at gmail.com (don kretz)
Date: Tue, 19 Aug 2008 14:14:45 -0700
Subject: [gutvol-d] javascript objects
Message-ID: <627d59b80808191414p620fba9di5a5430384b88b62@mail.gmail.com>

Marcello Perathoner wrote:

>>
>> Try the File object.
>>


I can't find it in the list. 

Do you have another list that applies to current browsers?

From Bowerbird at aol.com  Wed Aug 20 01:35:34 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 20 Aug 2008 04:35:34 EDT
Subject: [gutvol-d] tanglewood tales -- 08
Message-ID: 

we're examining the new improved "tanglewood tales", e-text #976.

today's check is for an unexpected uppercase that occurs mid-sentence,
after controlling for names, of course.

we get 3 hits here that are clearly wrong:
>    touch, disclosing an entrance just wide enough to admit them They crept
>    also, shed their brown leaves upon It, as often as the autumn came; and
>    weather At no great distance, they beheld a river gleaming in the

another 1 hit was found to be wrong (period, not colon) on checking:
>    subjects, and living in a very sorrowful way, all by himself: On Jason's

on 2 more, i'm unsure if the p-book is wrong, i'd have to consult a grammarian:
>    to teach them their A B C--which he invented for their benefit, and for
>    "But to resume: Shall we, my countrymen, suffer this wicked stranger to

there might be others, but i didn't want to do an extensive name-check,
or paw through a lot of false-alarms, as that's not the intent of this series,
which is to identify the errors that are _very_easy_to_detect_automatically_.

so we certainly won't count those last 2, but we still have 4 errors here...

so 14 total easily-found-by-the-computer errors in this just-updated book...
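
A check like the one described in this message can be approximated with one regular expression. The sketch below is a guess at the idea, not the routine actually used for these posts: the pattern, the function name, and the `known_names` set (which is where "controlling for names" happens, and where the pronoun "I" would also have to go) are all assumptions.

```python
import re

# A capitalized word that directly follows a lowercase letter, comma, or
# semicolon plus whitespace (the whitespace may be a line break).
MID_SENTENCE_CAP = re.compile(r"(?<=[a-z,;])\s+([A-Z][a-z]*)\b")

def find_unexpected_caps(text, known_names):
    """Return (line_number, word) for each capitalized word found mid-sentence."""
    hits = []
    for match in MID_SENTENCE_CAP.finditer(text):
        word = match.group(1)
        if word in known_names:
            continue  # proper names (and "I") are expected to be capitalized
        line_number = text.count("\n", 0, match.start(1)) + 1
        hits.append((line_number, word))
    return hits
```

Colons are deliberately left out of the lookbehind, so a case such as "all by himself: On Jason's" would need a separate check.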

-bowerbird




From ebooks at ibiblio.org  Wed Aug 20 15:23:55 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Wed, 20 Aug 2008 18:23:55 -0400
Subject: [gutvol-d] tanglewood tales -- 02
In-Reply-To: <48AA5107.4070501@xs4all.nl>
References:  <48AA4A1F.7020908@ibiblio.org>
	<48AA5107.4070501@xs4all.nl>
Message-ID: <48AC997B.5070005@ibiblio.org>

Walter van Holst wrote:

> Jose Menendez wrote:
> 
>> As for your not being finished with it yet, why it was back on June 
>> 27th--7 weeks ago!--that you posted this message:
> 
> Although you are a better man than I am for providing this long overdue 
> reality-check to our capitalisation-challenged 'performance poet', I 
> have to come to his defense: to my knowledge he has never claimed that 
> preprocessing (which incidentally, I feel should be fully automatic, 
> human oversight makes it proofing) would be able to catch all OCR errors.
> 
> Regards,
> 
>  Walter 


Here's a post Bowerbird made to this list about a month and a half ago:

"[gutvol-d] continued confusion over at distributed proofreaders"
http://lists.pglaf.org/private.cgi/gutvol-d/2008-June/008693.html

And here's an excerpt:


> we want to move the text as close to perfection as soon as possible.
> ideally, we would make it perfect in preprocessing, and then have the
> first p1 pass be the first no-change confirmation that it _was_ perfect,
> and the second p1 pass be the second no-change verification of that,
> in which case (in my opinion), we would be able to certify it as perfect.


And here's a post he made to the list this month:

"Re: [gutvol-d] gutvol-d Digest, Vol 49, Issue 15"
http://lists.pglaf.org/private.cgi/gutvol-d/2008-August/008991.html


> dkretz said:
> >   Mileage differs, but I've found that I have
> >   pretty high confidence in a good number of my
> >   regex alterations, without needing to visit each one.
> 
> i'm sure there are some in which you can have confidence,
> even without examining instances.  but those are the least
> interesting of the bunch.  and -- by themselves -- they will
> _not_ take a text to perfection, even if they do _improve_ it.

[snip]

> but again, those types of never-miss corrections are atypical.
> and, more importantly, they will not take a text to perfection.
> 
> to get to perfection, you need the routines i've been listing...
> (and a handful more, to account for glitches in other books.)


I'd say that "perfection" entails catching all OCR errors. :) But 
looking at Bowerbird's list of routines, I saw right away that they 
were inadequate for the task. For instance, let's look at this one:

"[gutvol-d] how to clean up ("preprocess") the o.c.r. for a book -- 028"
http://lists.pglaf.org/private.cgi/gutvol-d/2008-July/008872.html

> 28.  do a search for comma-space-uppercase, controlling for names.

There are two problems with that search. First, he only had "space" 
instead of "whitespace" after "comma." That's why he missed this error:

     Nothing further back was known in Greenstream,
     It was well known that the first George Gordon MacKimmon

Second, he had "controlling for names." That's why he missed this line:

     darkly, Gordon, stood still, Meta Beggs fell be-

The fact is that *many* sentences in novels begin with names, and it 
would be silly to assume that none of the periods preceding those 
names were turned into commas by the OCR. (It could also be that some 
of those OCR commas should have been semicolons, and some could have 
been caused by spots and specks on the pages.)

Now some might say that that would be too many commas to check--too 
many false alarms. Well, in Bowerbird's ZML copy of "Mountain Blood," 
a search for comma-whitespace-uppercase yields only 232 hits. And here 
are the first three:

1. Simmons' son, Buckley.

2. reins about the whipstock, Gordon swung out over

3. Nothing further back was known in Greenstream,
    It was well known that the first George Gordon MacKimmon

See how quickly he could have found that error, even with names 
included. :)
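
The difference between "space" and "whitespace" is easy to show with two patterns. This is a minimal sketch with hypothetical names, run on two of the lines quoted above rather than on the full "Mountain Blood" text:

```python
import re

COMMA_SPACE_CAP = re.compile(r", [A-Z]")          # literal space only
COMMA_WHITESPACE_CAP = re.compile(r",\s+[A-Z]")   # space, tab, or line break

def count_hits(pattern, text):
    # Number of comma-then-uppercase hits the pattern finds in the text.
    return len(pattern.findall(text))

sample = (
    "reins about the whipstock, Gordon swung out over\n"
    "Nothing further back was known in Greenstream,\n"
    "It was well known that the first George Gordon MacKimmon"
)
```

On this sample the space-only pattern sees one hit and the whitespace pattern sees two, because the comma after "Greenstream" is followed by a line break rather than a space.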

And if Bowerbird's preprocessing tool works the way he claims and as 
well as he claims, he should have been able to race through all 232 in 
minutes. By the way, I'm not saying whether those were the only errors 
I found with that search. Bowerbird will have to do it himself if he 
wants to find out. :)


Jose Menendez


P.S. By the way, Walter, after my first post on this topic, you wrote: 
"Schadenfreude is hardly ever a good thing. Nonetheless, I found this
rather amusing." Well, you may find this even more amusing. In this post,

"[gutvol-d] how to clean up ("preprocess") the o.c.r. for a book -- 015"
http://lists.pglaf.org/private.cgi/gutvol-d/2008-July/008755.html

Bowerbird's tip was this:

> 15.  search for all lines with a period-whitespace followed by lowercase,
> except controlling for cases where it was ellipse-whitespace-lowercase...

It occurred to me that he might have forgotten to make sure that his 
"whitespace" could include a new line. So I searched for single 
period-whitespace-lowercase and found these 6 incorrect lines:


1. been cropping the grass in. the broad, shallow gutter
2. himself pointedly in. its defiance.
3. Mrs. Hollidew in. the sitting room. He would wake
4. in. the return of the options to a county enhanced
5. quickly away; the. house was without a
6. him to where, on. the bureau, a lamp had been left.


What makes it especially amusing is that Bowerbird listed those 6 
incorrect lines in his post, but he apparently forgot to fix them! 
Perhaps he should change the name of his preprocessing tool from 
"banana cream" to "banana peel." ;)
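
For reference, the single period-whitespace-lowercase search from the P.S. can be sketched as follows. The pattern and function name are assumptions, and the ellipsis handling only covers unspaced "..."; a spaced ". . ." or an abbreviation such as "e.g." would still produce false alarms.

```python
import re

# A period that is not preceded by another period, followed by whitespace
# (including a line break) and then a lowercase letter.
PERIOD_LOWERCASE = re.compile(r"(?<!\.)\.\s+[a-z]")

def find_period_lowercase(text):
    """Return the line number of every suspicious period."""
    return [
        text.count("\n", 0, match.start()) + 1
        for match in PERIOD_LOWERCASE.finditer(text)
    ]
```

Because `\s` matches a newline, a stray period at the end of one line followed by a lowercase word on the next is reported as well.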

From Bowerbird at aol.com  Wed Aug 20 17:17:26 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 20 Aug 2008 20:17:26 EDT
Subject: [gutvol-d] tanglewood tales -- 02
Message-ID: 

now jose, you know i called you on it, _and_ that i said i would not reply to
any more of your posts under this subject-header, so don't try to tempt me.   :+)

but hey, let's turn your pigheadedness to a laudable cause...

i just uploaded the latest copy of my "mountain blood" text.
you know where to find it.   so... make yourself useful and
go find the errors in it.

i changed quite a number of hyphenates -- including a large number
which begin with "half-" -- so please make an extensive list of those, too.
i won't consider them as "errors" -- after all, i changed 'em on purpose --
but it will give you something to do while you're searching for real errors...

-bowerbird




From Bowerbird at aol.com  Thu Aug 21 01:18:24 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 21 Aug 2008 04:18:24 EDT
Subject: [gutvol-d] this just in, make of it what you will
Message-ID: 

http://www.sciencedaily.com/releases/2008/08/080814154329.htm

Aug. 19, 2008 | ScienceDaily

Computer Users Are Digitizing Books Quickly And Accurately With New Method

Millions of computer users collectively transcribe the equivalent of 160
books each day with better than 99 percent accuracy, despite the fact
that few spend more than a few seconds on the task and that most do not
realize they are doing valuable work, Carnegie Mellon University
researchers reported recently in Science Express.

They can work so prodigiously because Carnegie Mellon computer
scientists led by Luis von Ahn have taken a widely used Web site
security measure, called a CAPTCHA, and given it a second purpose --
digitizing books produced prior to the computer age. When Web visitors
solve one of the distorted-letter puzzles so they can register for email
or post a comment on a blog, they simultaneously help turn the printed
word into machine-readable text.

More than a year after implementing their version, called reCAPTCHA,
http://recaptcha.net/ on thousands of Web sites worldwide, the
researchers conclude that their word deciphering process achieves the
industry standard for human transcription services -- better than 99
percent accuracy. Their report, published online today, will appear in
an upcoming issue of the journal Science.

Furthermore, the amount of work that can be accomplished is herculean.
More than 100 million CAPTCHAs are solved every day and, though each
puzzle takes only a few seconds to solve, the aggregate amount of time
translates into hundreds of thousands of hours of human effort that can
potentially be tapped. During the reCAPTCHA system's first year of
operation, more than 1.2 billion reCAPTCHAs have been solved and more
than 440 million words have been deciphered. That's the equivalent of
manually transcribing more than 17,600 books.

"More Web sites are adopting reCAPTCHAs each day, so the rate of
transcription keeps growing," said von Ahn, an assistant professor in
the School of Computer Science's Computer Science Department. "More than
4 million words are being transcribed every day. It would take more than
1,500 people working 40 hours a week at a rate of 60 words a minute to
match our weekly output."

Von Ahn said reCAPTCHAs are being used to digitize books for the
Internet Archive and to digitize newspapers for The New York Times.
Digitization allows older works to be indexed, searched, reformatted and
stored in the same way as today's online texts.

Old texts are typically digitized by photographically scanning pages and
then transforming the text using optical character recognition (OCR)
software. But when ink has faded and paper has yellowed, OCR sometimes
can't recognize some words -- as many as one out of every five,
according to the Carnegie Mellon team's tests. Without reCAPTCHA, these
words must be deciphered manually at great expense.

Conventional CAPTCHAs, which were developed at Carnegie Mellon, involve
letters and numbers whose shapes have been distorted or backgrounds
altered so that computers can't recognize them, but humans can. To
create reCAPTCHAs, the researchers use images of words from old texts
that OCR systems have had trouble reading.

Helping to make old books and newspapers more accessible to a
computerized world is something that the researchers find rewarding, but
is only part of a larger goal. "We are demonstrating that we can take
human effort -- human processing power -- that would otherwise be wasted
and redirect it to accomplish tasks that computers cannot yet solve,"
von Ahn said.

For instance, he and his students have developed online games, available
at http://www.gwap.com, that analyze photos and audio recordings --
tasks beyond the capability of computers. Similarly, University of
Washington biologists recently built Fold It, http://fold.it/, a game in
which people compete to determine the ideal structure of a given protein.

In addition to von Ahn, authors of the new report include computer
science undergraduate Benjamin Maurer, graduate students Colin McMillen
and David Abraham, and Manuel Blum, professor of computer science.

Adapted from materials provided by Carnegie Mellon University.




From Bowerbird at aol.com  Thu Aug 21 01:35:24 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 21 Aug 2008 04:35:24 EDT
Subject: [gutvol-d] tanglewood tales -- 09
Message-ID: 

we're looking at the new improved "tanglewood tales", e-text #976.

this check is on paragraphing.

we want every p-book paragraph to cohere together in the o.c.r, and
we want a blank line between adjacent paragraphs to separate them.

since i had 2 versions of this book -- the p.g. e-text and the text from
archive.org (i'm using the book named "tanglewoodtalesf00hawt6") --
i could compare the _paragraphing_ of the two text-files, an important
intermediate step on the way to comparing the text of the two versions.

where paragraphing agreed, i assumed it was correct (maybe wrongly).
but where it differed, i checked the scan to see which version was right.

what this revealed is that the p.g. e-text has incorrect paragraphing in
12 different spots, 6 of 'em being paragraphs improperly run together,
and the other 6 being paragraphs that had been wrongly split into two...
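
One way such a paragraphing comparison could be done is sketched below, using Python's standard `difflib`. It is not the method used for this series: the marker token, the function names, and the choice to align the two texts word by word are all assumptions. Each paragraph start is turned into a marker token, the two token streams are aligned, and any spot where a marker appears on one side only is reported for a check against the scan.

```python
import difflib
import re

PARA = "\u00b6"  # pilcrow, used here as a stand-in token for "paragraph starts"

def tokens(text):
    """Split text into words, with a PARA token before each paragraph."""
    out = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        words = paragraph.split()
        if words:
            out.append(PARA)
            out.extend(words)
    return out

def paragraph_differences(text_a, text_b):
    """Return (which_side, following_words) wherever paragraph breaks disagree."""
    ta, tb = tokens(text_a), tokens(text_b)
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        in_a = PARA in ta[i1:i2]
        in_b = PARA in tb[j1:j2]
        if in_a != in_b:
            context = " ".join(ta[i2:i2 + 4])  # words in A just after the spot
            diffs.append(("only in A" if in_a else "only in B", context))
    return diffs
```

The word-level alignment means ordinary o.c.r. differences between the two versions do not throw the comparison off, though whole-book alignment with `SequenceMatcher` can be slow and may be better done chapter by chapter.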

i went back and forth on whether to count all of these glitches as errors,
or just some of them (if one tries to invent extenuating circumstances),
but in the end, they _are_ clearly all errors, so i decided to count all 12...

of course, these _could_ be edition differences.   but since al didn't give us
a pointer to a publicly-available alternative scan-set, we can't know that...

so 26 total easily-found-by-the-computer errors in this just-updated book...

(and recall that i was fairly lenient in not counting 15 hyphenation errors,
and 4 improper quoting errors, and those would have put the total at 45.)

***

now, we still haven't gotten down yet to the _real_ nitty-gritty, namely a
comparison of the p.g. e-text with another digitization, word-by-word.

however, i ain't gonna do that, because some people would "disagree"
that such a detailed analysis should be done by a whitewasher who is
"merely" updating an e-text to its location in the new naming structure.

and i can see that argument.

of course, that brings up the next obvious question:   who _will_ do it?
if the whitewashers aren't going to do it when they move the old texts,
who _is_ going to update the old texts?   nobody?   kinda looks that way.

isn't that a problem?   well, yeah, probably...

but this series has sharply focused on errors that are _easily_detected_,
and i don't wanna dilute that focus at the end with harder-to-find errors.

so we're gonna stay with the solidly-established number of 26-45 errors.

we'll wrap this series up tomorrow...

-bowerbird




From Bowerbird at aol.com  Thu Aug 21 12:59:31 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 21 Aug 2008 15:59:31 EDT
Subject: [gutvol-d] time to step in and exercise some leadership
Message-ID: 

time for someone to step into this thread over at d.p.:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=15949&start=390
and tell the people there that it is _unacceptable_ to use
lousy o.c.r. programs that waste the time of the proofers...

-bowerbird




From donovan at abs.net  Thu Aug 21 13:26:37 2008
From: donovan at abs.net (D Garcia)
Date: Thu, 21 Aug 2008 16:26:37 -0400
Subject: [gutvol-d] time to step in and exercise some leadership
In-Reply-To: 
References: 
Message-ID: <200808211626.37380.donovan@abs.net>

On Thursday 21 August 2008 15:59:31 Bowerbird at aol.com wrote:
> time for someone to step into this thread over at d.p.:
> >    http://www.pgdp.net/phpBB2/viewtopic.php?t=15949&start=390
>
> and tell the people there that it is _unacceptable_ to use
> lousy o.c.r. programs that waste the time of the proofers...
>
> -bowerbird

Perhaps you could donate a few hundred dollars to the homeschooled minor (the 
person asking about what free/open OCR alternatives exist) so they can 
purchase a copy of one of your "approved" commercial OCR packages.

From Bowerbird at aol.com  Thu Aug 21 15:42:53 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 21 Aug 2008 18:42:53 EDT
Subject: [gutvol-d] time to step in and exercise some leadership
Message-ID: 

or, i suppose, you could instead decide to send marcello on a mission
to harangue abbyy to make their app "open source" and give it away...

and to make a version for linux, of course.

(and even one for us mac people too, while they're at it.)

-bowerbird




From ebooks at ibiblio.org  Thu Aug 21 15:59:11 2008
From: ebooks at ibiblio.org (Jose Menendez)
Date: Thu, 21 Aug 2008 18:59:11 -0400 (EDT)
Subject: [gutvol-d] tanglewood tales -- 02
In-Reply-To: 
References: 
Message-ID: 

Bowerbird wrote:

> i just uploaded the latest copy of my "mountain blood" text.
> you know where to find it.   so... make yourself useful and
> go find the errors in it.


Now, Bowerbird, I'm sure you recall how long and how much you taunted Jon
Noring about there still being a couple of errors in his "My Ántonia" demo
project, so I hope you don't expect that I'm going to reveal to you all
the errors that are left in your "Mountain Blood" text.

But just to show you that I don't have trouble finding errors in your
work, here's one more for you from page 341:

    the prospect of release from, its bewildering fullness.

The comma after "from" shouldn't be there.

http://z-m-l.com/go/mount/mountp341.html

http://z-m-l.com/go/mount/mountp341.png

Let me know when you're finally finished with it. (Sometime next year,
perhaps?) And I'll see how many errors are left. Of course, I probably
won't tell what they are. :)


Jose Menendez


P.S. I was going to add that 3 rounds of proofreading at DP (two parallel
P1 proofings, then a P2 round) hadn't caught the errors I've pointed out.
But before making that claim, I decided to check these two pages you put
online to show their proofing results:

http://z-m-l.com/go/mount/129_differences_a-vs-b_total.html
http://z-m-l.com/go/mount/mount-c-p2results.html

And look what I found:


#>   http://z-m-l.com/go/mount/mountp341.png

mount-a>   the prospect of release from its bewildering fullness.
mount-b>   the prospect of release from, its bewildering fullness.


So that OCR error *was* found by one group of P1 proofers, but somehow you
neglected to count it as one of the OCR errors that your preprocessing
missed. OOPS! I wonder whether they found any other errors that you
neglected to count.
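
A listing like the mount-a/mount-b one above can be produced mechanically from two parallel proofings. The following is a minimal sketch using Python's standard `difflib`; the function name is hypothetical and nothing here reflects how the pages linked above were actually generated.

```python
import difflib

def proofing_differences(text_a, text_b):
    """Return (lines_from_a, lines_from_b) for every place the proofings differ."""
    a_lines = text_a.splitlines()
    b_lines = text_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return [
        (a_lines[i1:i2], b_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Every pair returned marks a line where at least one of the two proofings is wrong, so each pair is a candidate o.c.r. error to check against the scan.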


From jayvdb at gmail.com  Thu Aug 21 17:17:00 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Fri, 22 Aug 2008 10:17:00 +1000
Subject: [gutvol-d] time to step in and exercise some leadership
In-Reply-To: 
References: 
Message-ID: 

On Fri, Aug 22, 2008 at 5:59 AM,   wrote:
> time for someone to step into this thread over at d.p.:
>>   http://www.pgdp.net/phpBB2/viewtopic.php?t=15949&start=390
> and tell the people there that it is _unacceptable_ to use
> lousy o.c.r. programs that waste the time of the proofers...

If the scans are uploaded to pgdp, it should be possible to push the
scans through the WeOCR grid, which released a new version 3 days ago.

http://weocr.ocrgrid.org/

On Wikisource we have an OCR button, which when invoked by a user,
sends a bot off to OCR the image using tesseract.

For example, here is the history of a page which used this feature:

http://en.wikisource.org/w/index.php?title=Page:Speeches_And_Writings_MKGandhi.djvu/996&action=history

This is the OCR result:

http://en.wikisource.org/w/index.php?title=Page:Speeches_And_Writings_MKGandhi.djvu/996&oldid=734806

And the cleaned up results:

http://en.wikisource.org/wiki/Page:Speeches_And_Writings_MKGandhi.djvu/996

The OCR is not always so good:

http://en.wikisource.org/w/index.php?title=Page:Robertson_Scottish_Gaelic_Dialects0104.png&oldid=712678

--
John Vandenberg

From Bowerbird at aol.com  Thu Aug 21 20:41:03 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 21 Aug 2008 23:41:03 EDT
Subject: [gutvol-d] time to step in and exercise some leadership
Message-ID: 

john said:
>   If the scans are uploaded to pgdp, it should be possible to push the
>    scans through the WeOCR grid, which released a new version 3 days ago.
>    http://weocr.ocrgrid.org/

it's cool that the web has gotten around to doing o.c.r. for people for free...


>    On Wikisource we have an OCR button, which when invoked by a user,
>    sends a bot off to OCR the image using tesseract.

...but tesseract is a long way from the state of the art.

you weren't on the listserve when i demonstrated it conclusively, with data,
but tesseract is so clearly inferior to abbyy finereader that it is a huge waste
of proofer's time to use it, to the point that its use is _strongly_discouraged_.
it can make hundreds -- thousands! -- of errors that finereader reads fine...

also, d.p. has an o.c.r. pool, where people who have good o.c.r. programs
are willing to do o.c.r. for anyone who doesn't have a good o.c.r. program.

so not only is the use of tesseract a waste of time and energy, but it's also
totally unnecessary, so it's a totally unnecessary waste of time and energy.
that's why someone should enlighten people's thinking in that d.p. forum...

-bowerbird




From jayvdb at gmail.com  Thu Aug 21 23:00:03 2008
From: jayvdb at gmail.com (John Vandenberg)
Date: Fri, 22 Aug 2008 16:00:03 +1000
Subject: [gutvol-d] time to step in and exercise some leadership
In-Reply-To: 
References: 
Message-ID: 

On Fri, Aug 22, 2008 at 1:41 PM,   wrote:
> john said:
>>   If the scans are uploaded to pgdp, it should be possible to push the
>>   scans through the WeOCR grid, which released a new version 3 days ago.
>>   http://weocr.ocrgrid.org/
>
> it's cool that the web has gotten around to doing o.c.r. for people for
> free...
>
>
>>   On Wikisource we have an OCR button, which when invoked by a user,
>>   sends a bot off to OCR the image using tesseract.
>
> ...but tesseract is a long way from the state of the art.

Tesseract is excellent for many purposes, and getting better with each release.

http://groundstate.ca/ocr

> you weren't on the listserve when i demonstrated it conclusively, with data,

Don't worry, I hear ya ... in lots of places

http://mail.archive.org/pipermail/ol-discuss/2008-July/000130.html

> but tesseract is so clearly inferior to abbyy finereader

No surprise there.  They charge a high price for good reason.

> that it is a huge waste
> of proofer's time to use it, to the point that its use is
> _strongly_discouraged_.
> it can make hundreds -- thousands! -- of errors that finereader reads
> fine...

Agreed.  On Wikisource we are in agreement that the OCR button
shouldn't be so prominent, because newcomers often ask Tesseract to
perform miracles.

> also, d.p. has an o.c.r. pool, where people who have good o.c.r. programs
> are willing to do o.c.r. for anyone who doesn't have a good o.c.r. program.

While on the topic of free OCR, Scribd are accepting paper based
documents via regular post, doing OCR, and putting them online.  The
offer was made April 1, 2008 and is still going, ...

http://blog.scribd.com/2008/04/convert-your-paper-to-ipaper.html
http://www.boingboing.net/2008/04/01/free-bulkscanning-oc.html

It's hard to navigate their document collection to work out if any
texts have appeared via this route.

PDF uploads don't appear to be OCR'd:

http://www.scribd.com/doc/491637/Chymical-Natural-and-Physical-Magic

I think Internet Archive might also do OCR on uploads, but I've not
tried it.  What software are they using?

> so not only is the use of tesseract a waste of time and energy, but it's
> also
> totally unnecessary, so it's a totally unnecessary waste of time and energy.
> that's why someone should enlighten people's thinking in that d.p. forum...

Someone did point out in the forum that pgdp has an OCR Pool.

--
John Vandenberg

From Bowerbird at aol.com  Fri Aug 22 00:23:51 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 22 Aug 2008 03:23:51 EDT
Subject: [gutvol-d] time to step in and exercise some leadership
Message-ID: 

john said:
>   Tesseract is excellent for many purposes, and getting better with each release.

thousands of unnecessary errors.   i've proven it.   with data from several books.
a chain is only as strong as its weakest link...   and tesseract is a very weak link...

if you're gonna use tesseract, there's no need to develop preprocessing tools,
because most of the time, you'll be working with garbage from the get-go.

do solid research comparing tesseract with abbyy.   the results will be obvious.
it was so striking, it really surprised me.


>   Dont worry, I hear ya ... in lots of places
>   http://mail.archive.org/pipermail/ol-discuss/2008-July/000130.html

ok, well, for an "update" on that, look here:
>    http://mail.archive.org/pipermail/ol-tech/2008-August/

the upshot is that early this week the o.c.a. people seemed responsive
to this problem which i have been bringing to their attention for _years_,
quite literally, but by the middle of this week, they had reverted into their
defensive ignore-the-message-and-kill-the-messenger mode.   sigh...

you know, having done it a lot, i know that when you point out to people
that their work-product is defective, they're not going to take it very well.

but my goodness, when the flaw is both totally obvious and clearly fatal,
it seems like they would swallow their foolish pride and _go_fix_the_bug_.
but i guess not...

anyway, the problem is not with the o.c.r. that is done by the o.c.a. people.
they're using abbyy, and they get pretty good results.   the problem is that
they mishandle abbyy's output, so it's fatally flawed by the time users get it,
i.e., it's missing its em-dashes, and sometimes curly-quotes and other stuff.

i've worked on apps that make it easy to replace those characters if you are
looking at the scan, but even with the tools, it's still faster to re-do the o.c.r.

and it's just so _senseless_.   basically because some incompetent person has
introduced some glitch into the workflow, and is too proud -- or whatever --
to face up to the fact that he has his head up his butt, there are a ton of 
books
for which we have scans but no workable o.c.r.   it's enough to make you 
cry...


>    While on the topic of free OCR, Scribd are accepting 
>    paper based documents via regular post, doing OCR, 
>    and putting them online. The offer was made
>    April 1, 2008 and is still going

and i posted about it to this listserve on april 8th, and i quote:
>    "they're probably not thinking what you and i are thinking..."

but, to be truthful, i haven't yet submitted anything to them, so they _might_
cheerfully accept and act fast on 10,000 scan-sets i snagged from the o.c.a.

or maybe not.         :+)

anyway, there was, as you might suspect, no response from anyone to my post.

just like there was no response to today's message about the von ahn article,
where they say they've proofed 440 million words via the recaptcha method.
talk about "distributed proofreaders"!   you'd think that someone here would
have _something_ to say about it, wouldn't you?   or am i the only person who
can see its relevance?               :+)


>   Someone did point out in the forum that pgdp has an OCR Pool.

right, and if there was any leadership over there, that leadership would have
reinforced that message, and said explicitly that tesseract was unacceptable,
at least in the general case, and that exceptions would be made only when
the content-provider could show unequivocally that its output was excellent,
because otherwise the drain on proofers' time and energy was too wasteful...
it's simply unethical to waste the time and energy being donated by volunteers
in good faith to a good cause.   their expectation is that you wouldn't do that...

the data is overwhelming on the number of unnecessary errors in each book.

-bowerbird



**************
It's only a deal if it's where you want to go. Find your travel 
deal here.
      
(http://information.travel.aol.com/deals?ncid=aoltrv00050000000047)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From walter.van.holst at xs4all.nl  Fri Aug 22 00:48:12 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Fri, 22 Aug 2008 09:48:12 +0200
Subject: [gutvol-d] time to step in and exercise some leadership
In-Reply-To: 
References: 
	
Message-ID: <48AE6F3C.9080307@xs4all.nl>

John Vandenberg wrote:
> On Fri, Aug 22, 2008 at 5:59 AM,   wrote:
>> time for someone to step into this thread over at d.p.:
>>>   http://www.pgdp.net/phpBB2/viewtopic.php?t=15949&start=390
>> and tell the people there that it is _unacceptable_ to use
>> lousy o.c.r. programs that waste the time of the proofers...
> 
> If the scans are uploaded to pgdp, it should be possible to push the
> scans through the WeOCR grid, which released a new version 3 days ago.
> 
> http://weocr.ocrgrid.org/

That is a nifty concept. Have any of the people involved already looked 
at the GPL version of Cuneiform, which is probably the most advanced open 
source OCR package around (more advanced than Tesseract)?

The Linux port is under way at https://launchpad.net/~cuneiform

Regards,

  Walter


From joshua at hutchinson.net  Fri Aug 22 05:54:24 2008
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri, 22 Aug 2008 12:54:24 +0000 (GMT)
Subject: [gutvol-d] time to step in and exercise some leadership
Message-ID: <1199370355.337091219409664219.JavaMail.mail@webmail02>

I know a few years back DP looked at setting up something like that with FineReader running on the server.  There were two main gotchas.

1 - It had to run on a Windows server.  We could probably come up with something there (either a second box or virtualization).  The main DP server runs Linux, as I recall.

2 - The price was prohibitive.  It wasn't a one-time cost for the server license, which we might have handled through a donation drive or something.  It had a per-page price, which meant we'd have to budget and come up with money, make payments, etc.  I think it was right after that that we came up with the OCR pool concept, which is still in use today.

Josh

On Aug 22, 2008, walter.van.holst at xs4all.nl wrote: 

That is a nifty concept. Have any of the people involved already looked 
at the GPL version of Cuneiform, which is probably the most advanced open 
source OCR package around (more advanced than Tesseract)?

The Linux port is under way at https://launchpad.net/~cuneiform



From Bowerbird at aol.com  Fri Aug 22 09:13:35 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 22 Aug 2008 12:13:35 EDT
Subject: [gutvol-d] tanglewood tales -- 02
Message-ID: 

so, do i even bother to open this latest e-mail from jose?   or not?

on the one hand, it's probably just more of the same diversionary crap.

on the other hand, maybe he's got a good juicy list of errors in my file...

i don't know what to do, it's a hard decision.

oh, ok, darn it, i'll open it and see.

one error.   one error!   that's all you got for me, jose, just one error?

c'mon, we both know if you were really trying, you'd have a list of six,
or even a dozen or two.   1 error?   that says you're just toying with us.

jeez, _i_ can find 1 error in the file, and i just posted it yesterday:
>    hundred per cent, increase."
>    hundred per cent increase."
>    http://z-m-l.com/go/mount/mountp219.html

(the scan shows a period instead of a comma, but i would "modernize"
"per cent." by deleting the period altogether, so it's an error either way.)

jose is correct (in his typical small-minded way) that the error he found
would not have been detected by my preprocessing routines as given,
so yes, that does increase the number of such errors -- from _3_ to _4_.

however, in his typical misses-the-big-picture way, jose fails to see that
a grand total of 4 o.c.r. errors missed in _preprocessing_ is still fantastic.

heck, even if he found a dozen more errors that were likewise undetectable
-- or two or three dozen! -- it still wouldn't change that big-picture view...

judgment:   same diversionary crap.

starting with this post, i'm keeping a thumbs up/down tally on jose's posts:
>    jose's scorecard
>    thumbs up -- 0
>    thumbs down -- 1

let's hope jose improves his signal-to-noise ratio...

-bowerbird




From Bowerbird at aol.com  Fri Aug 22 09:33:40 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 22 Aug 2008 12:33:40 EDT
Subject: [gutvol-d] tanglewood tales -- 10 -- the wrap-up
Message-ID: 

we've been looking at the new improved "tanglewood tales", e-text #976.

the old version of this book -- dating back to 2001 -- had 42 errors that
were corrected in the new version.   (that counts a number of instances of
consecutive-spaces that were changed to a single space; otherwise, the
number of corrections was 36.   i'm trying to give as much credit as i can.)

as i said earlier, the errors in the old version were _not_ all that serious...
it's good to keep that in mind.   still, we're glad those 42 errors were fixed.

however, on top of those 42, my simple routines here were able to detect
26 more easily-found-by-the-computer errors in this just-updated book.
(and recall that i was fairly lenient in not counting 15 hyphenation errors,
and 4 improper quoting errors, and those would have put my total at 45.)

this means there were at least 68 errors in the old version. (more like 87.)

the person doing the updated version caught 42 of those 68, or about 62%.
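
The tallies above can be re-derived in a few lines. This is only a sketch of the arithmetic, using the counts quoted in this post; the variable and function names are invented for the illustration.

```python
# Counts quoted in the post; the names are illustrative only.
fixed_in_update = 42        # errors corrected in the new version
found_by_routines = 26      # further errors flagged by the simple routines
uncounted_hyphenation = 15  # leniently left out of the 26
uncounted_quoting = 4       # leniently left out of the 26


def tally():
    """Return (strict remaining, low total, high total, share caught)."""
    strict_remaining = found_by_routines + uncounted_hyphenation + uncounted_quoting
    low_total = fixed_in_update + found_by_routines
    high_total = fixed_in_update + strict_remaining
    caught_share = fixed_in_update / low_total
    return strict_remaining, low_total, high_total, caught_share


strict_remaining, low_total, high_total, caught_share = tally()
print(strict_remaining, low_total, high_total, f"{caught_share:.0%}")
# prints: 45 68 87 62%
```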

if the 42-68 errors that were in the older e-text were not all that significant,
the 26 errors which remain in the updated e-text are not significant either...

but nonetheless, since the 26 errors that i've uncovered were so dirt-simple
to detect, shouldn't we raise the bar -- and improve the tools -- to the point
that we _do_ detect them, and _fix_ them?   i cannot see any reason why not...

especially since the whitewashers typically decline to act on error-reports
on pre-10000 e-texts _until_ they're ready to move 'em to their new spot.

so when they actually _do_ the move, it would be reassuring to think that
they were correcting the errors therein to the very best of their abilities...

these are _obvious_ errors.   if you know how, you can find 'em in _seconds_.
there's absolutely no reason to allow these errors to remain in _any_ e-text.

clearly, the tools need to be improved.

these were simple routines, which should be in _any_ cleaning program.
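
For a sense of how small such checks can be, here is a minimal sketch of two of the error classes mentioned in this post (consecutive spaces and improper quoting). These are not the actual routines used for the counts above; the function names, the regex, and the odd-quote heuristic are all assumptions made for the illustration.

```python
import re


def find_consecutive_spaces(text):
    """Return (line_number, column) for each run of 2+ spaces inside a line.

    Leading indentation and trailing whitespace are ignored. Texts that
    deliberately put two spaces after a period would need an exception.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"(?<=\S) {2,}(?=\S)", line.rstrip()):
            hits.append((line_no, match.start() + 1))
    return hits


def find_odd_quote_paragraphs(text):
    """Return 1-based numbers of paragraphs with an odd count of straight
    double quotes. Multi-paragraph speeches trigger this legitimately, so
    a hit is a candidate for review, not a confirmed error.
    """
    paragraphs = re.split(r"\n\s*\n", text)
    return [n for n, p in enumerate(paragraphs, start=1) if p.count('"') % 2]


sample = 'He said  "hello.\n\nShe said "yes" twice.'
print(find_consecutive_spaces(sample))
print(find_odd_quote_paragraphs(sample))
```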

let's hope it doesn't take another 7 years to fix the remaining 26-45 errors...

-bowerbird




From Bowerbird at aol.com  Fri Aug 22 14:35:28 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 22 Aug 2008 17:35:28 EDT
Subject: [gutvol-d] an interesting contrast, concerning o.c.r. at d.p.
Message-ID: 

ok, here's an interesting contrast.   see this message:
>    http://www.pgdp.net/phpBB2/viewtopic.php?p=482389#482389

that gives you a good idea of the kind of experience and expertise
that is available at d.p., centered around the use of abbyy finereader.

contrast that with the naivety of the people looking for freeware o.c.r.

it's not bad that they're naive -- everyone who's starting out is naive --
but it is bad that experienced people are not guiding them elsewhere...

notice even in that regard that some people waved off some others about
simple-ocr, saying it was a waste of time.   that's great!   now do the same
about tesseract.   oh, sure, you can fill them in with even more solid details
-- about how tesseract is being supported by google as open-source, and
how that means it's likely to improve as time goes by, and if they would like
to _experiment_ with it so that they can track that progress as it occurs that
would be a good thing, and how maybe d.p. can even help get it improved,
and so on -- just as long as you highlight the _main_ point most strongly,
namely that "we don't use tesseract for stuff that's going to go to proofers,
because abbyy saves their time by making considerably fewer o.c.r. errors."

or even shorter: "don't use tesseract.   have the o.c.r. pool do it with abbyy."

that is, wave them off of tesseract like you waved them off of simple-ocr.
not quite so far off, it's true, but at least far enough from using tesseract
on text that will go in front of proofers.

-bowerbird




From dakretz at gmail.com  Fri Aug 22 17:22:53 2008
From: dakretz at gmail.com (don kretz)
Date: Fri, 22 Aug 2008 17:22:53 -0700
Subject: [gutvol-d] TwistEd
Message-ID: <627d59b80808221722o25719df4id3280222ec785c0f@mail.gmail.com>

I've posted the next iteration of TwistEd. It's minimally less minimal.

http://code.google.com/p/dp50/downloads/list

There are several regex search-and-replace patterns hardcoded into it. (The
next step is to be able to read and write them to a file, and to create new
ones interactively.)

The regex for detecting quoted strings may be of interest. It does a
mediocre job of detecting them, highlighting them, and suggesting a revision
to fix spacey-quotes. There are lots of ways to improve how it's used, but a
good next suggestion would be a better regex pattern. Especially one that
can differentiate ones with and without problems. (Please test before
posting, though.)
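
One way to separate problem quotes from clean ones is to pair the quote marks by position instead of matching them with a single pattern. The sketch below is not TwistEd's hardcoded regex and is not a regex at all; the function name and the assumption that straight double quotes are balanced within a paragraph are mine.

```python
def find_spacey_quotes(paragraph):
    """Return (index, problem) pairs for straight double quotes with a stray space.

    Quotes are paired by parity: the 1st, 3rd, ... are treated as opening
    and the 2nd, 4th, ... as closing. That assumes the quotes are balanced
    within the paragraph, which real texts do not always honor.
    """
    problems = []
    opening = True
    for i, ch in enumerate(paragraph):
        if ch != '"':
            continue
        prev_ch = paragraph[i - 1] if i > 0 else ""
        next_ch = paragraph[i + 1] if i + 1 < len(paragraph) else ""
        if opening and next_ch.isspace():
            problems.append((i, "space after opening quote"))
        elif not opening and prev_ch.isspace():
            problems.append((i, "space before closing quote"))
        opening = not opening
    return problems


# A naive pattern such as "\s.*?\s" would wrongly match the gap between
# two well-formed quotations here; the parity walk reports nothing.
print(find_spacey_quotes('"Hi," she said. "Bye."'))
print(find_spacey_quotes('He said, " hello there " and left.'))
```

The parity walk trades one failure mode for another: a single missing quote mark flips every later pairing in the paragraph, so an odd quote count is worth checking first.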

There is a second Hunter control subpanel at the bottom of the page image
tab, so you can stay in text/image view and move around. I've tested "Next"
and "Replace + Next" a bit, but "Prev" and "Replace + Prev" not much.

Without an "Undo" feature yet, the "Replace All" should be avoided. Don't
run with scissors.

I'm mainly looking for feedback on the primary workflow at this point.

I'll be out of the country next week, so unfortunately won't be responding
after tomorrow.

From Bowerbird at aol.com  Fri Aug 22 20:55:22 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 22 Aug 2008 23:55:22 EDT
Subject: [gutvol-d] TwistEd
Message-ID: 

dkretz said:
>   I've posted the next iteration of TwistEd.

congratulations, my friend.   you did it...

very nice.   you've crested over the verge.
be responsible for cursor-keys and you'll
start picking up the speed of momentum.


>    and suggesting a revision to fix spacey-quotes.
>   There are lots of ways to improve how it's used, but 
>    a good next suggestion would be a better regex pattern.

wrong.   a good next suggestion is to
use the methodology i told you about,
which has a far greater accuracy rate...

use a screwdriver for screws, not a hammer.

***

>    I'm mainly looking for feedback on the primary workflow at this point.

i'm not sure what that means, but i am sure
that i will have plenty of advice to give you...        ;+)

***

have a nice trip outside the country...

-bowerbird




From gbnewby at pglaf.org  Sat Aug 23 13:38:35 2008
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat, 23 Aug 2008 13:38:35 -0700
Subject: [gutvol-d] PG article
Message-ID: <20080823203835.GB10958@mail.pglaf.org>

FYI:

----- Forwarded message from Kelly Sonora  -----

 From: Kelly Sonora 
 To: gbnewby at pglaf.org
 Subject: Project Gutenberg

Hi Dr. Gregory,

We just posted an article, "100 Awesome, Free Web Tools for Elementary
Teachers"
(http://www.smartteaching.org/blog/2008/08/100-awesome-free-web-tools-for-elementary-teachers/).
I thought I'd bring it to your attention in case you think your readers
would find it interesting.

I am happy to let you know that your site has been included in this list.

Either way, thanks for your time!

Kelly Sonora

----- End forwarded message -----

From Bowerbird at aol.com  Sat Aug 23 14:20:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 23 Aug 2008 17:20:08 EDT
Subject: [gutvol-d] Q&A with Developer Who Turns Ebooks into iPhone
	Applications
Message-ID: 

>    Q&A with Developer Who Turns Ebooks into iPhone Applications
>    http://toc.oreilly.com/2008/08/qa-with-developer-who-turns-eb.html

question:
>    Some App Store reviewers complain that 
>    you're making money off of public domain content. 
>    How do you address these complaints?

answer:
>    The Project Gutenberg license clearly allows people 
>    to sell works based on the Gutenberg files. 
>    I am following the license, and I do send 20 percent 
>    of the revenue earned to the Project Gutenberg Foundation. 
>    Mobipocket, eReader and Amazon Kindle 
>    all sell public domain works for much more than $0.99.

and then he continues with this:
>    Each book requires a lot of manual work. 
>    The Project Gutenberg text files are a good starting point, 
>    but I have to edit each one to add information about 
>    chapter starts, poems, songs, emphasized text, etc. 
>    Many files have extra data like page numbers that 
>    have to be cleaned up. I tried to automate this part, 
>    but there is so much variety in the files that 
>    only hand editing can get the correct results.

every developer who tries to work with the p.g. library says this same thing.

every single developer.

every one.

without exception.

every single one.

when is this problem going to be addressed?

when?

i really want to know.

so i'm asking for the 493rd time, when is this problem going to be fixed?

-bowerbird




From ajhaines at shaw.ca  Sat Aug 23 15:41:51 2008
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Sat, 23 Aug 2008 15:41:51 -0700
Subject: [gutvol-d] Q&A with Developer Who Turns Ebooks into
	iPhoneApplications
References: 
Message-ID: <000a01c90571$6f719d90$6401a8c0@ahainesp2400>

If he's trying to clean up the older etexts, e.g., those done before 2000 (the year, not the etext 
number), that David Widger and I are working through, he's got his work cut out for him--some of 
them take hours to clean up.  (Some of them could benefit from a complete re-do from scratch. 
Volunteers?)  (Please--no yammering about consistency or content loss--it isn't fair or reasonable 
to impose today's hardware and software standards on PG's pioneer volunteers.  Some of them didn't 
have scanners, and for those that did, I can only assume that current OCR software is vastly 
superior to that of the 1990's.  They also didn't have the benefit of Gutcheck/Jeebies/Gutspell to 
check their texts.)

In my WW experience, DP's text files are generally consistent with PG's standards in their 
between-chapter spacing; the presence of indentation to indicate poems, blockquotes, etc.; the use 
of underscores to indicate italics, and so forth.  (Has this developer read PG's FAQ's?  If not, why 
not?)
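
As an illustration of how one of those conventions can be consumed by a program, here is a minimal sketch that turns underscore-marked italics into HTML. It assumes the underscores are paired on a single line and are used only for emphasis, which is exactly the kind of consistency under discussion; the function name is made up.

```python
import html
import re


def underscores_to_italics(text):
    """Escape HTML-special characters, then map _word_ to <i>word</i>.

    Assumes each emphasized span opens and closes on the same line and
    that underscores never appear for any other purpose.
    """
    escaped = html.escape(text, quote=False)
    return re.sub(r"_([^_\n]+)_", r"<i>\1</i>", escaped)


print(underscores_to_italics("Tom & _Jerry_ at sea"))
# prints: Tom &amp; <i>Jerry</i> at sea
```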

If the iPhone can't display a given text file in some particular way, is it not smart enough to 
downshift gracefully and display raw text?  If not, why not?

The iPhone supports valid HTML, so why does this developer not ignore the text versions of etexts in 
favor of the HTML versions?






From bruce at zuhause.org  Sat Aug 23 19:14:43 2008
From: bruce at zuhause.org (Bruce Albrecht)
Date: Sat, 23 Aug 2008 21:14:43 -0500
Subject: [gutvol-d] Q&A with Developer Who Turns Ebooks
	into	iPhoneApplications
In-Reply-To: <000a01c90571$6f719d90$6401a8c0@ahainesp2400>
References: 
	<000a01c90571$6f719d90$6401a8c0@ahainesp2400>
Message-ID: <18608.50196.293.882963@celery.zuhause.org>

Al Haines (shaw) writes:
 > The iPhone supports valid HTML, so why does this developer not ignore the text versions of etexts in 
 > favor of the HTML versions?

If the iPhone doesn't support CSS, which is true of some other
smartphone browsers I've used, then using the HTML versions that use
CSS to do "fancy stuff" like mark page breaks, is part of the problem:

 > and then he continues with this:
 > >   Each book requires a lot of manual work.
 > >   The Project Gutenberg text files are a good starting point,
 > >   but I have to edit each one to add information about
 > >   chapter starts, poems, songs, emphasized text, etc.
 > >   Many files have extra data like page numbers that
 > >   have to be cleaned up. I tried to automate this part,
 > >   but there is so much variety in the files that
 > >   only hand editing can get the correct results.

I've found that GutenMark tends to work fairly well to create HTML
editions from text-only files, but I've only used it on novels, and
GutenMark doesn't handle table of contents, or links around chapter
starts. 

Of course, there are a number of volunteers who would argue that the
solution is to use some sort of markup language that can be used to
generate any number of formats (e.g., PG-TEI, or in BB's case, ZML; my
impression of ZML is that it's not capable of handling embedded
images, and if I'm wrong, I'm sure BB will correct me), so that we
could have standard translators for all sorts of file formats,
including whatever level of HTML the iPhone supports.

From Bowerbird at aol.com  Sat Aug 23 19:39:38 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 23 Aug 2008 22:39:38 EDT
Subject: [gutvol-d] Q&A with Developer Who Turns Ebooks into
	iPhoneApplications
Message-ID: 

al said:
>    If he's trying to clean up the older etexts

this isn't just a problem with the "older" e-texts.


>    Please--no yammering about consistency

"yammering"?   do you really think that inconsistency
can be excused by accusing critics of "yammering"?


>    it isn't fair or reasonable to impose today's hardware 
>    and software standards on PG's pioneer volunteers.

oh please.   do you think _consistency_ is "today's standard", and that 
it had never occurred to anyone to think about it in ye olden dayes?


>    Some of them didn't have scanners

again, this has nothing to do with consistency.


>    and for those that did, I can only assume that 
>    current OCR software is vastly superior to that of the 1990's.

also, nothing to do with consistency.


>    They also didn't have the benefit of Gutcheck/Jeebies/Gutspell 
>    to check their texts.)

and yet the e-texts, even new ones, are still inconsistent.   go figure...

if a _commitment_ is not made to attain consistency, you won't get it.


>    In my WW experience, DP's text files are generally consistent 
>    with PG's standards in their between-chapter spacing; 
>    the presence of indentation to indicate poems, blockquotes, etc.; 
>    the use of underscores to indicate italics, and so forth.

they are indeed "generally consistent", emphasis on "generally".

and "generally consistent" is another phrase for "inconsistent".

what part of this do you not understand?


>    (Has this developer read PG's FAQ's? If not, why not?)

you have this childlike innocence that the _reality_ of the e-texts is 
represented by the f.a.q.   the inconsistency problem is that it is _not_.

if the e-texts followed the f.a.q., there would _be_ no inconsistencies!

that's exactly the problem i have asked -- for years -- to have solved...


>    If the iPhone can't display a given text file in some particular way, 
>    is it not smart enough to downshift gracefully and display raw text?
>    If not, why not?

yeah, right, the problem is the _iphone_ isn't "smart enough" to do it right;
i'm sure steve jobs and company will get right down to solving that matter.

you seriously don't know what the problem is, do you?
which means we have a meta-problem.   and you're not
willing to do any work to _find_out_ what the problem is,
are you?   so we have a meta-problem on the meta-problem.
and you don't "generally" read posts about the problem, do you?
so we have meta-problem on meta-problem on meta-problem...
and when you _do_ read those posts, you wonder why they're so
_rude_, don't you?, like the person had been "yammering" for years,
and nobody had done anything about it, or even acknowledged it.
so you say, "hey, that's no way to win friends or influence people."
don't you?   and you set out to kill the rude yammering messenger...


>    The iPhone supports valid HTML, so why does this developer 
>    not ignore the text versions of etexts in favor of the HTML versions?

never written an e-book program, have you?

never talked to a programmer who has written an e-book program, have you?

don't have the slightest idea about the dynamics at work in this situation, do you?

but yet you're _convinced_ that the problem lies anywhere except in your yard...

well, al, i must say, you're a perfect fit as a whitewasher.   yep, a perfect fit.

except for the fact that you haven't kill-filed me yet.   but you will.   soon...

-bowerbird




From Bowerbird at aol.com  Sat Aug 23 20:04:51 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 23 Aug 2008 23:04:51 EDT
Subject: [gutvol-d] Q&A with Developer Who Turns Ebooks into
	iPhoneApplications
Message-ID: 

bruce said:
>    my impression of ZML is that it's not capable of handling embedded images,
>    and if I'm wrong, I'm sure BB will correct me)

yes, you're wrong.        :+)


>    there are a number of volunteers who would argue that 
>    the solution is to use some sort of markup language 
>    that can be used to generate any number of formats

in order to _get_ to "some sort of markup language" like that,
you will need to _remove_the_inconsistencies_ in the library...

but if you just removed the inconsistencies, you could be well on
the way to having _any_ sort of markup language, any sort at all.

so instead of engaging in the senseless discussion of "markup"
-- one that has a long proven history here of going nowhere --
settle on the _simple_ step of _removing_the_inconsistencies_...


>    so that we could have standard translators for all sorts of file formats,
>    including whatever level of HTML the iPhone supports.

the problem here is not the "level of html the iphone supports".

that's totally irrelevant, because this developer isn't using .html.
and there are very good reasons why he's not.

project gutenberg needs to _listen_ to developers, carefully,
with respect, and a real desire to understand what they need,
and a commitment to give it to them.   it's really not that hard.

if you want it in a nutshell, it's as simple as "follow your own f.a.q.".

-bowerbird




From Bowerbird at aol.com  Sun Aug 24 10:59:27 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 24 Aug 2008 13:59:27 EDT
Subject: [gutvol-d] september 23 is four weeks from today
Message-ID: 

on 9/23, four weeks from today, i will release "banana cream" to the public.

it is already in the hands of a few people who are giving me feedback...

i pulled back earlier releases of this program because flak was directed
at me by a small number of people on this list.   i will likewise pull back
this release if those same people again direct flak at me in the interim...

i refuse to give my tools to a community who doesn't deserve them...

so if you would like to see this program, and they start to send out flak,
i suggest you let them know -- in no uncertain terms -- to knock it off.

not that i'm worried about those people.   they have lost their credibility,
so are much less vocal.   (they now do their backstabbing backchannel.)

i just want you to know how to prevent their sabotage of your toolchest.
don't let their bad behavior deprive you of a chance to evaluate this tool.

-bowerbird




From bruce at zuhause.org  Mon Aug 25 06:51:01 2008
From: bruce at zuhause.org (Bruce Albrecht)
Date: Mon, 25 Aug 2008 08:51:01 -0500
Subject: [gutvol-d] Q&A with Developer Who Turns Ebooks
	into	iPhoneApplications
In-Reply-To: 
References: 
Message-ID: <18610.47301.781455.593867@celery.zuhause.org>

Bowerbird at aol.com writes:
 > bruce said:
 > >    my impression of ZML is that it's not capable of handling embedded 
 > images, 
 > >    and if I'm wrong, I'm sure BB will correct me)
 > 
 > yes, you're wrong.        :+)

So how do you handle embedded images with ZML?

From Bowerbird at aol.com  Mon Aug 25 10:05:20 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 25 Aug 2008 13:05:20 EDT
Subject: [gutvol-d] Q&A with Developer Who Turns Ebooks into
	iPhoneApplications
Message-ID: 

bruce said:
>    So how do you handle embedded images with ZML?

basically the exact same way it's handled in .html.

you put the filename of the graphic wherever you want it displayed, and
the viewer-program picks up the name, gets the file, and shows it there...

that's why i ask p.g. to put the graphic filenames in their .txt versions, and
continue asking again and again when they refuse this reasonable request.

the z.m.l. viewer-app will float the graphic to the nearest bit of whitespace,
where it is presented as big as possible, and automatically becomes a _link_
to a separate window where it is presented as large as the monitor can go...

(this separate window is also a dedicated graphics-viewer that will show a
slideshow of all of the graphics -- and movies -- in the entire document.)

now, of course, you don't have to put any _markup_ around the filename...

but an especially nifty part of the convention is that you need not include
the _extension_ of the graphic either.   so if you have an image-file named
"bruce.jpg", then it will be displayed on any page that has the word "bruce".

(it's the same for other media-files, so a movie named "zuhause.mov" will be
presented on any page with the string "zuhause".   and likewise for an .mp3.)
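
The naming convention described above can be sketched roughly as follows. This is not the z.m.l. viewer's code: the function name, the list of media extensions, and the choice of case-insensitive whole-word matching are all assumptions, since the post does not spell out the exact matching rules.

```python
import re
from pathlib import PurePath

# Assumed set of media types; the post names .jpg, .mov and .mp3 only.
MEDIA_EXTENSIONS = {".jpg", ".png", ".gif", ".mov", ".mp3"}


def media_for_page(page_text, folder_files):
    """Return the media files whose base name appears as a word on the page."""
    words = {w.lower() for w in re.findall(r"[A-Za-z0-9_-]+", page_text)}
    matches = []
    for name in folder_files:
        path = PurePath(name)
        if path.suffix.lower() in MEDIA_EXTENSIONS and path.stem.lower() in words:
            matches.append(name)
    return matches


files = ["bruce.jpg", "zuhause.mov", "content.jpg", "notes.txt"]
print(media_for_page("bruce wrote from zuhause today", files))
# prints: ['bruce.jpg', 'zuhause.mov']
```

Under this reading, any common word that happens to share a name with a file in the folder pulls that file onto the page, so file names would need to be chosen with care.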

this means that document-creation is radically simplified...

not only that, but -- to facilitate the creation of "mash-ups" -- _any_ file
that is contained in the current folder is considered part of the document.

so by simply placing aptly-named files in the folder, an end-user can "inject"
content into the book.   for example, if they put in a file named "content.jpg",
the file will be displayed on any page which contains the word "content" on it.
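
to make the naming convention concrete, here is a minimal sketch in python.
it is not the z.m.l. viewer's actual code.   the function names, the list of
extensions (only .jpg, .mov and .mp3 are named above), and the choice of
whole-word, case-insensitive matching are all assumptions for illustration.

```python
import re
from pathlib import Path

# assumption: only the three extensions mentioned in the post count as media
MEDIA_EXTENSIONS = {".jpg", ".mov", ".mp3"}


def media_for_page(page_text, filenames):
    """Return the media filenames whose name (minus extension) is a word on the page."""
    # assumption: matching is whole-word and case-insensitive
    words = set(re.findall(r"[a-z0-9_-]+", page_text.lower()))
    matches = []
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() in MEDIA_EXTENSIONS and path.stem.lower() in words:
            matches.append(name)
    return matches


def media_in_folder(page_text, folder):
    """Treat every file in the document's folder as a candidate, per the mash-up rule."""
    names = sorted(p.name for p in Path(folder).iterdir() if p.is_file())
    return media_for_page(page_text, names)
```

under these assumptions, dropping a new "content.jpg" into the folder changes
what media_in_folder() returns for every page with the word "content" on it,
without touching the text of the book itself.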

or by replacing the "content.jpg" file with another image-file with that name,
they can have the new image displayed instead of the old one.   (but of course,
they can do the same type of replacement with an .html version of the document.)

as expected, when .zml is converted to .html or .pdf, their conventions are used.

-bowerbird




From Bowerbird at aol.com  Tue Aug 26 14:09:10 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 26 Aug 2008 17:09:10 EDT
Subject: [gutvol-d] tanglewood tales -- 11 -- clean-up
Message-ID: 

we examined the new improved "tanglewood tales", e-text #976.

here are some clean-up notes...

at one time, al suggested the p.g. text might have been based
on a different edition than the one that i used for checking it...

while i don't think it accounts for many of the errors i located
-- perhaps none at all, and certainly not the dozens i found --
there were a huge number of punctuation differences, and that's
a pretty good indication that they were indeed different editions,
although much sleuthing on my part failed to discover it online...

here are other points-of-difference that might help you locate
the edition that might have been used to make the p.g. e-text:

pg>    By this time, the whole nation of the Pygmies
ao>    By this time, the whole nation of Pygmies
me>    http://z-m-l.com/go/tnglw/tnglwp085.html

pg>    "Alas!" he cried, "I greatly fear that we shall
ao>    "Alas!" cried he, "I greatly fear that we shall
me>    http://z-m-l.com/go/tnglw/tnglwp194.html

pg>    "Dearest mother," exclaimed Proserpina,
ao>    "Dearest mother," answered Proserpina,
me>    http://z-m-l.com/go/tnglw/tnglwp270.html

also, here are a few more errors inside the text of the new improved version:

pg>    And whence could this bull have com? Europa and her brothers had been
ao>    And whence could this bull have come? Europa and her brothers had been
me>    http://z-m-l.com/go/tnglw/tnglwp109.html

pg>    for all those weary wonderings in quest of her since he left King
ao>    for all those weary wanderings in quest of her since he left King
me>    http://z-m-l.com/go/tnglw/tnglwp158.html

pg>    father. I ought not to forget the prophets and conjurors, of whom there
ao>    father. I ought not to forget the prophets and conjurers, of whom there
me>    http://z-m-l.com/go/tnglw/tnglwp298.html
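
for anyone who wants to hunt for more points-of-difference like the pg>/ao>
pairs above, here is a minimal sketch using python's standard difflib module.
it is not the tool that produced these notes.   the function name is made up,
and comparing whitespace-separated words is just one assumed way to do it.

```python
import difflib


def word_differences(pg_line, ao_line):
    """Return (pg_words, ao_words) pairs for each spot where the two lines differ."""
    pg_words = pg_line.split()
    ao_words = ao_line.split()
    matcher = difflib.SequenceMatcher(a=pg_words, b=ao_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # an empty string on one side means the words exist only on the other side
            differences.append((" ".join(pg_words[i1:i2]), " ".join(ao_words[j1:j2])))
    return differences
```

run on the "bull" pair above, this reports the single pair ("com?", "come?").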

-bowerbird


