From Bowerbird at aol.com Thu Jan 5 13:42:08 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jan 5 13:42:27 2006 Subject: [gutvol-d] 36,000 and counting Message-ID: <82.35f5d394.30eeecb0@aol.com> distributed proofreaders just hit 36,000 registered users. congratulations! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060105/35e97f53/attachment.html From charlzzf at heritagewifi.com Thu Jan 5 18:39:25 2006 From: charlzzf at heritagewifi.com (Charles Franks) Date: Thu Jan 5 18:47:23 2006 Subject: [gutvol-d] 36,000 and counting In-Reply-To: <82.35f5d394.30eeecb0@aol.com> Message-ID: Actually, due to a previous programmer removing inactive usernames from the database the count is much higher...37,437 at the moment. The way to figure out the correct number is to go to the forums and at the bottom of the main page and hover over the "The newest registered user is" link and look for the &u=(some number). That number will be the 'correct' number of users. Apparently their code for the "We have 36001 registered users" line actually counts lines in the user table versus looking at the highest userid number in use. Thanks though! Charles Franks Founder, Distributed Proofreaders -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Bowerbird@aol.com Sent: Thursday, January 05, 2006 2:42 PM To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com Subject: [gutvol-d] 36,000 and counting distributed proofreaders just hit 36,000 registered users. congratulations! -bowerbird From Bowerbird at aol.com Thu Jan 5 19:32:34 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jan 5 19:32:55 2006 Subject: [gutvol-d] 36,000 and counting Message-ID: <1c5.37e058be.30ef3ed2@aol.com> charles said: > Thanks though! no, thank _you_. 
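Charles's explanation above -- the forum's "registered users" line counts rows in the user table, while the &u= number on the "newest registered user" link reflects the highest user id ever assigned -- can be sketched in a few lines of SQL. This is only an illustration under assumed table and column names (the actual forum schema is not shown in the thread):

```python
# Sketch of the user-count discrepancy: deleting inactive accounts
# shrinks the row count, but the highest user id keeps every account
# ever registered. Table/column names here are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " username TEXT)"
)

# Register ten users, then purge three "inactive" accounts.
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [(f"user{i}",) for i in range(1, 11)],
)
conn.execute("DELETE FROM users WHERE user_id IN (2, 5, 9)")

# What the "We have N registered users" line computes:
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
# What the &u= number on the newest-user link reflects:
highest_id = conn.execute("SELECT MAX(user_id) FROM users").fetchone()[0]

print(row_count, highest_id)  # 7 10
```

With AUTOINCREMENT, SQLite never reuses a deleted id, so the two numbers diverge exactly as described: seven rows remain, but the newest user is still number ten.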
it is a remarkable achievement to have motivated so many to enlist in the cause, with a good number (in the hundreds!) devoting _significant_ time and energy -- 10-40 hours a week! -- from busy lives... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060105/83a49dd9/attachment.html From jon at noring.name Thu Jan 5 20:10:32 2006 From: jon at noring.name (Jon Noring) Date: Thu Jan 5 20:10:53 2006 Subject: [gutvol-d] 36,000 and counting In-Reply-To: References: <82.35f5d394.30eeecb0@aol.com> Message-ID: <612299901.20060105211032@noring.name> Charles wrote: > Bowerbird wrote: >> distributed proofreaders just hit 36,000 registered users. > Actually, due to a previous programmer removing inactive usernames from the > database the count is much higher...37,437 at the moment. The way to figure > out the correct number is to go to the forums and at the bottom of the main > page and hover over the "The newest registered user is" link and look for > the &u=(some number). That number will be the 'correct' number of users. > > Apparently their code for the "We have 36001 registered users" line actually > counts lines in the user table versus looking at the highest userid number > in use. If the number of users was 1/10 of that, it would be a remarkable achievement. But a number rapidly approaching 40,000 is hard to fathom. A remarkable achievement! Kudos to Charles for founding DP, and for Juliet and many of the others who keep the system going. And of course a lot of praise to all the ordinary folk who, page-by-page, proof the scan sets. (I need to revisit DP and do a few pages myself.) Maybe it's time to hold an annual DP picnic. Considering the number of people, it probably needs to be a potluck. Can you imagine having to buy and prepare 40,000 hot dogs? 
Jon From tstowell at chattanooga.net Sat Jan 7 13:49:58 2006 From: tstowell at chattanooga.net (Tim Stowell) Date: Sat Jan 7 16:59:29 2006 Subject: [gutvol-d] 36,000 and counting In-Reply-To: <612299901.20060105211032@noring.name> References: <82.35f5d394.30eeecb0@aol.com> Message-ID: <3.0.5.32.20060107164958.0254d100@mail.chattanooga.net> At 09:10 PM 1/5/06 -0700, Jon wrote: >Charles wrote: >A remarkable achievement! Kudos to Charles for founding DP, and for >Juliet and many of the others who keep the system going. And of course >a lot of praise to all the ordinary folk who, page-by-page, proof the >scan sets. (I need to revisit DP and do a few pages myself.) > >Maybe it's time to hold an annual DP picnic. Considering the number of >people, it probably needs to be a potluck. Can you imagine having to >buy and prepare 40,000 hot dogs? > >Jon What is DP? Tim no hot dogs thanks From ajhaines at shaw.ca Sat Jan 7 17:28:14 2006 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Sat Jan 7 17:28:25 2006 Subject: [gutvol-d] 36,000 and counting References: <82.35f5d394.30eeecb0@aol.com> <3.0.5.32.20060107164958.0254d100@mail.chattanooga.net> Message-ID: <000301c613f2$cbe0ab70$6401a8c0@ahainesp2600> Distributed Proofreaders - http://www.pgdp.net/c/default.php ----- Original Message ----- From: "Tim Stowell" To: "Project Gutenberg Volunteer Discussion" Sent: Saturday, January 07, 2006 1:49 PM Subject: Re: [gutvol-d] 36,000 and counting > At 09:10 PM 1/5/06 -0700, Jon wrote: >>Charles wrote: >>A remarkable achievement! Kudos to Charles for founding DP, and for >>Juliet and many of the others who keep the system going. And of course >>a lot of praise to all the ordinary folk who, page-by-page, proof the >>scan sets. (I need to revisit DP and do a few pages myself.) >> >>Maybe it's time to hold an annual DP picnic. Considering the number of >>people, it probably needs to be a potluck. Can you imagine having to >>buy and prepare 40,000 hot dogs? >> >>Jon > > What is DP? 
> > Tim > no hot dogs thanks > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jon at noring.name Mon Jan 9 12:38:20 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 9 12:45:19 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov Message-ID: <1675677963.20060109133820@noring.name> Everyone, In doing some research on ebook history, which naturally will prominently include Project Gutenberg because of the obvious impact PG and Michael Hart has had on etexts and ebooks, I'm trying to reconstruct a fact database which includes the who, what, when, where, why and how of the various seminal events. So, I've just created a specialized Yahoo Group to collect/archive the snippets of facts that come up: http://groups.yahoo.com/group/ebook-history/ You're welcome to join and post any information you know on ebook history. Especially wanted is the pre-1990 period: commercial, academic, and public (free). We should collect this information before it is lost to the mists of time. ***** Anyway, in doing research on what is found in the early Google Groups archive on PG, Bowerbird recommended that I dig through textfiles.com, which appears to have archived a large number of ASCII texts that existed on various BBS systems of the 1980's and early 1990's (its coverage/completeness is unknown, however.) So, focusing on the first "modern" classic book that PG issued, "Alice's Adventures in Wonderland" (text #11, officially released 01 Jan 1991), I dug through textfiles.com to see what they had. The oldest copy I found there was "alice11.txt", Millenium Fulcrum Edition 1.1, dated 1990 (by copyright claim.) See: http://www.textfiles.com/etext/FICTION/alice11.txt That was issued in the very early era when PG was still affiliated with Duncan Research. 
As an aside, it is interesting to note the huge differences in that early text's boilerplate compared with the present boilerplate. PG has evolved quite a bit in handling the legal aspects (particularly copyright) of its texts, which is to be fully expected. So this text is a nice historical reminder of how far PG has come in the last 16 years. I was hoping, though, to find a much older text version of "Alice". Bowerbird stated his belief that Michael Hart keypunched Alice a lot earlier than 1990/91, but so far I have not found that version, assuming it was distributed and ended up in some BBS or online archive. So, Michael, if you're reading this, did you keypunch "Alice" well before 1990, where did you distribute it, and does a copy exist somewhere? That would truly be a historical work, especially if it was "digiscribed" in the 1970's or early 80's (BBS systems began maturing in the mid- to late-1980's.) A question for others reading this: where else should I search for information on digitized books placed on BBS in the 1980's? Were there others besides Michael who "digiscribed" public domain books and texts in the 1980's and placed them online? (I plan to dig through more of the textfiles area to see what book texts are dated in the 1980's, if any, and who did them.) ***** Another interesting thing I discovered in my research -- and which some of you undoubtedly know about -- is the "Brown Corpus": http://en.wikipedia.org/wiki/Brown_Corpus In the late-1960's, the partial/full texts of a variety of 500 works published in 1961 were keypunched for computer use (a maximum of 2000 words for each work), totalling a little over 1,000,000 words. The purpose was solely for lexicostatistics and not for direct reading. For this purpose the Brown Corpus is quite famous (enough to rate its own wikipedia article.) 
Only a few years later, in 1971 Michael Hart keypunched into a computer "The Declaration of Independence" for the purpose of electronic "distribution" and direct reading by others, so Michael is, as far as is now known, the first person documented to experiment with electronic distribution of readable, published digital texts. (I plan to contact the Brown Corpus people, if any are still alive, to see if there were experiments at Brown, or elsewhere, on this in the 1960's.) But nevertheless, to see major portions of published texts and books being keypunched and processed by computers in the 1960's is truly remarkable. ***** Another really cool thing, I found a Usenet message from 1987 which, in turn, is a fairly comprehensive description of an Atlantic Monthly article written by Vannevar Bush in July 1945, entitled "As We May Think". It is beyond amazing the insights Vannevar Bush had relevant to ebooks, to elibraries (like PG's) and the role of individuals and volunteers. Again, some of you have probably read Vannevar Bush's article, but for those who haven't... Usenet summary: http://groups.google.com/group/comp.sys.mac.hypercard/msg/660f72a6e3b5f7a2?hl=en& And the actual article is reproduced here: http://www.theatlantic.com/doc/194507/bush ***** Finally, Mark Bernstein, the founder of Eastgate, which in 1987 issued several contemporary hypertext fiction ebooks on floppy disk and CD-ROM, mentioned to me Asimov's mid-50's book "Foundation" where ebooks are implicit. Has anyone read this book and can comment on Asimov's 1950's vision for ebooks? Thanks! Jon Noring From sly at victoria.tc.ca Mon Jan 9 13:38:01 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon Jan 9 13:38:33 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov In-Reply-To: <1675677963.20060109133820@noring.name> References: <1675677963.20060109133820@noring.name> Message-ID: Thanks for providing the links, fascinating reading. 
In answer to one of your questions, the website:

http://www.aston.ac.uk/lss/english/02_msc/02_diss/mward.jsp

mentions using alice10.txt, as well as a few other early PG texts, from the Walnut Creek CD ROM.

Any chance of finding one of those still floating around somewhere?

Andrew

On Mon, 9 Jan 2006, Jon Noring wrote:

> Everyone,
>
> In doing some research on ebook history, which naturally will
> prominently include Project Gutenberg because of the obvious impact PG
> and Michael Hart has had on etexts and ebooks, I'm trying to
> reconstruct a fact database which includes the who, what, when, where,
> why and how of the various seminal events.
> [snip]

From joshua at hutchinson.net Mon Jan 9 13:56:17 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Jan 9 13:56:47 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar
Message-ID: <20060109215617.70460EE901@ws6-1.us4.outblaze.com>

----- Original Message -----
From: "Jon Noring"

> Finally, Mark Bernstein, the founder of Eastgate, which in 1987 issued
> several contemporary hypertext fiction ebooks on floppy disk and
> CD-ROM, mentioned to me Asimov's mid-50's book "Foundation" where
> ebooks are implicit. Has anyone read this book and can comment on
> Asimov's 1950's vision for ebooks?
>
> Thanks!
>
> Jon Noring

Asimov mentions "film-books" in many of his works, but it is unclear whether they are film strips of text running through a special reader, some kind of interactive medium, or perhaps a "moving pictures" version of books. I think he leaves it deliberately vague. NOTE: I'm pulling this from memory, as I don't have any of my Asimov books here at work to flip through.
Josh From jon at noring.name Mon Jan 9 14:11:34 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 9 14:12:10 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov In-Reply-To: References: <1675677963.20060109133820@noring.name> Message-ID: <6132705.20060109151134@noring.name> Andrew wrote: > Thanks for providing the links, fascinating reading. You're welcome. I found my quickie search to yield fascinating stuff. Unfortunately, I have little time these days to pursue the level of research I'd like to (which involves talking to a lot of the old-timers by phone, digging through obscure archives, even doing some library research to get paper copies of old articles and books that I can't get online.) > In answer to one of your questions, > > The website: > http://www.aston.ac.uk/lss/english/02_msc/02_diss/mward.jsp > > Mentions using alice10.txt, as well as a few other early > PG texts, from the the Walnut Creek CD ROM. > > Any chance of finding one of those still floating around > somewhere? Interesting. It is unknown whether this is the original one (version 1.0) which appeared before the version 1.1 edition linked in my prior message. This might have been version 10.0, for example (Alice is currently at version 30). Until we find it, we won't know for sure. (The date of the WC CD-ROM is 1997, and I would surmise they would have kept up with the latest PG texts, but then maybe not.) The ultimate authority on whether Michael Hart keyed-in and/or released "Alice" well before 1990 is Michael himself. Hopefully he will reply and clarify matters. Better yet, to point out which BBS had archived it, so we can try to locate a copy with a timestamp. 
Jon From Bowerbird at aol.com Mon Jan 9 14:35:52 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jan 9 14:36:28 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov Message-ID: <2d0.161067f.30f43f48@aol.com> andrew said: > a few other early PG texts, from the Walnut Creek CD ROM. > Any chance of finding one of those still floating around somewhere? i did some searching for a walnut-creek c.d. a long time back, but got nowhere. but maybe an ebay expert could help you... however, mr. noring already has solid evidence that predates that cd-rom. what he's looking for is even _earlier_ evidence... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060109/a927b75a/attachment.html From Bowerbird at aol.com Mon Jan 9 14:46:44 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jan 9 14:47:29 2006 Subject: [gutvol-d] my impression Message-ID: <1fc.103cdcc2.30f441d4@aol.com> my impression is that mr. noring would like to knock michael hart down a few notches, and that's why he's doing his "historical research"... still, those of us who were promoting e-books back in the '80s know who was leading the pack. it was michael hart. and not only was he not getting any _credit_ then, he was actually derided as something of a kook... which he _is_, of course, but the kooks are often the people who end up transforming our world... :+) so to try and strip him of his credit here, now that we finally have come to realize his genius, well... it is downright cruel. small-minded and cruel... with due respect to the dreamers who came before and handed to us the _idea_ of electronic-books, including alan kaye, h.g. wells, and douglas adams, there is no question who _invented_ the e-book, by virtue of sitting down and actually entering one: it's michael hart... 
as one of the greatest inventors who ever lived said, only 1% is inspiration, the other 99% is perspiration. plus michael hart gave us something even better -- the concept of "unlimited distribution" of e-books... compared to the _commercial_ e-book efforts, which somehow noring wants on equal footing, look how many more riches _that_ idea gave us. -bowerbird p.s. along these lines, consider this from alan kaye: > "We're running on fumes technologically today," > he says. "The sad truth is that 20 years or so of > commercialization have almost completely missed > the point of what personal computing is about." -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060109/035b7817/attachment.html From gbnewby at pglaf.org Mon Jan 9 15:38:15 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Jan 9 15:38:17 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov In-Reply-To: <1675677963.20060109133820@noring.name> References: <1675677963.20060109133820@noring.name> Message-ID: <20060109233815.GB21426@pglaf.org> > ... > I was hoping, though, to find a much older text version of "Alice". > Bowerbird stated his belief that Michael Hart keypunched Alice a lot > earlier than 1990/91, but so far I have not found that version, > assuming it was distributed and ended up in some BBS or online > archive. I got my first copy of Alice via email in about 1988, or possibly a little earlier (it was no later than June of 1988, because I got it while still at SUNY Albany). Unfortunately the file I got is lost, among some other old emails on 9-track tapes that didn't make one of my institutional transitions. It was the same (if earlier) Millenium Fulcrum edition that was in the PG collection, though at that time I had never heard of PG. (When I arrived at UIUC in 1991, I read a local newspaper article about Michael & PG and looked him up. 
The rest, as they say, is history!). -- Greg From jon at noring.name Mon Jan 9 15:48:35 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 9 15:49:15 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov In-Reply-To: <20060109233815.GB21426@pglaf.org> References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> Message-ID: <80308709.20060109164835@noring.name> Greg wrote: >> I was hoping, though, to find a much older text version of "Alice". >> Bowerbird stated his belief that Michael Hart keypunched Alice a lot >> earlier than 1990/91, but so far I have not found that version, >> assuming it was distributed and ended up in some BBS or online >> archive. > I got my first copy of Alice via email in about 1988, or possibly > a little earlier (it was no later than June of 1988, because > I got it while still at SUNY Albany). Thanks! This is useful information. I'll re-search in Google and see if someone archived this older version. The version 1.1 I found is dated (copyrighted) 1990, so I assume the version you had may have been the original version 1.0. You might ask Michael to think back when he first placed Alice online. It's a very historical fact. Besides the KJV of the Bible, did Michael post to the Internet before 1990 any other classic works? (I'm not referring to the nine political documents which form PG text #1-9, but more recognized book works like "Alice".) > Unfortunately the file I got is lost, among some other old > emails on 9-track tapes that didn't make one of my institutional > transitions. Understood. I saved very little from the late 1980's when I first got online (Usenet and various BBS). > It was the same (if earlier) Millenium Fulcrum edition that > was in the PG collection, though at that time I had never > heard of PG. Was the early GUTNBERG mailing list archived as well? When was that list formed? The first mention I see of GUTNBERG is January 1990. 
I don't believe Google has it archived, unfortunately. That would be a treasure trove of information on the early days of PG, and probably will mention, in passing, things that happened before GUTNBERG was started.

Thanks again.

Jon

From gbnewby at pglaf.org Mon Jan 9 16:22:56 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Jan 9 16:22:56 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <80308709.20060109164835@noring.name>
References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name>
Message-ID: <20060110002256.GA27181@pglaf.org>

On Mon, Jan 09, 2006 at 04:48:35PM -0700, Jon Noring wrote:
> Was the early GUTNBERG mailing list archived as well? When was that
> list formed? The first mention I see of GUTNBERG is January 1990. I
> don't believe Google has it archived, unfortunately. That would be a
> treasure trove of information on the early days of PG, and probably
> will mention, in passing, things that happened before GUTNBERG was
> started.

I don't think it was.

I believe our first automated list was on a LISTSERV run at uiuc (listserv.cso.uiuc.edu). That was from about 1992 or so through 1997 or early 1998. Then we moved to UNC, which used similar software: listserv.unc.edu. I do have the archives from that period, somewhere. It was only in 2003 or so that we moved to mailman on lists.pglaf.org.

The number & makeup of lists changed over that time. Originally, the main (or only) purpose was for a monthly newsletter or similar, plus announcements of new titles.

I am not aware of any archives we got from the UIUC LISTSERV lists. I'm sure some folks have their own personal copies, though.
-- Greg From jon at noring.name Mon Jan 9 16:47:51 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 9 16:48:30 2006 Subject: [gutvol-d] my impression In-Reply-To: <1fc.103cdcc2.30f441d4@aol.com> References: <1fc.103cdcc2.30f441d4@aol.com> Message-ID: <237975147.20060109174751@noring.name> Bowerbird wrote: > my impression is that mr. noring would like to > knock michael hart down a few notches, and > that's why he's doing his "historical research"... The important thing is to gather the facts as best as can be done from primary sources, and the recollections of those who lived then. Let the facts speak for themselves. I welcome, and want, Michael Hart, for example, to provide specific information of the 1971 to 1989-90 period, such as what texts he keyed in and distributed on various networks (the fledgling Internet and BBS), besides the nine "political" ones which form PG texts #1-9 (like the Declaration of Independence -- not that I have anything against these texts, but they are not book length works.) I hope that a few PGers will take an interest in this evolving project and submit their tidbits of ebook lore and history (and hopefully references) to the ebook-history group. Bowerbird, you are welcome to add your own knowledge of the old-days. Anyway, if I'm guilty of anything, it's that I want to get to the bottom of the truth, so everything can be put into its proper and correct perspective, whatever that may turn out to be. > still, those of us who were promoting e-books > back in the '80s know who was leading the pack. > it was michael hart. What I'm still having trouble finding is any pre-1989 references to Michael Hart in relation to his text activities. On Usenet, it is a total blank (according to Google groups, which has archived Usenet and Bitnet back before 1985.) 
The first mention I've found of "Project Gutenberg" on Usenet is a message posted 15 Jan 1990, which contains a few messages written by Michael Hart on 20 Dec 1989 talking about Project Gutenberg, and that it is #1 in a series: http://groups.google.com/group/soc.culture.esperanto/msg/5e7ebc23a2866f1b?dmode=source&hl=en People like Bowerbird who were plying the BBS back in the late 80's may vaguely remember some things, but I'd like to know specifics. Were there newspaper and magazine articles covering MH? Were any of his texts, besides the short political ones that form PG texts #1-9, being distributed around BBS and ftp sites in the late 1980's? I simply see a dearth of information. Even Michael Hart's various bios, including his wikipedia, are fuzzy on what he did between 1971 and 1988/9 other than to say that technology was not quite there to do anything major, including with volunteers. When we see the first mention of PG in 1990, PG rapidly grew after that, and the rest is history. I can only surmise that even if Michael were thinking about launching PG (in a "let's get thousands of volunteers to type in books" way) in the mid-80's, he did not do so -- he waited for technology to hit a critical point. This is understandable. Paraphrasing Ecclesiastes: "there's a time and place for everything under the sun" -- and at least from the evidence I've seen, he really didn't go gangbusters on the volunteer-driven Project Gutenberg vision until very late in 1989. Yet I know there were commercial ebook projects in the 1986 time frame, and I think even before. A Turkish scholar digitized the complete works of a famous Turkish poet in 1986. Lots and lots of stuff, but nothing about Michael Hart in the same mid- to late- 80's time frame. Help me, people, let me know what's missing... Would love to get the original GUTNBERG Bitnet list archive, if it exists before 1990. That should be placed online (if not already). 
> so to try and strip him of his credit here, now that
> we finally have come to realize his genius, well...
> it is downright cruel. small-minded and cruel...

So you are saying the history of ebooks should not be studied in full because it might clarify everyone's actual role in the history of the ebook?

As I noted, I welcome *everyone* to submit ebook lore tidbits, especially for the pre-1990 period. I'm not really planning to write the actual history, but to help collect the bits and pieces for either a historian to write, or a *group* of us to write for the wikipedia. The truth is out there, as Mulder would say. What really happened is what should be documented. No more, no less.

> with due respect to the dreamers who came before
> and handed to us the _idea_ of electronic-books,
> including alan kaye, h.g. wells, and douglas adams,
> there is no question who _invented_ the e-book,
> by virtue of sitting down and actually entering one:
> it's michael hart...

He may have been the first to experiment with placing a short text on a computer for the purpose that others may electronically access it for reading. But I'm not so sure of that. With Brown University keypunching in over 1,000,000 words in the mid-1960's, plus the various types of pioneering electronic text research they and others were doing at the time, someone might have experimented with this, and possibly even written about it in journal articles. That it is not reported may be due to them not seeing the importance of it right away (it was the 1960's -- even by 1971 visualization hardware on computers had greatly improved -- my wife worked in the early computer hardware days in an image processing lab.) When I worked as a research associate at the University of Minnesota, I investigated a lot of odd things that never got published, but on rare occasion have shared them 20 years after the fact with researchers who *were* interested.
I believe Michael Hart when he says he did what he did back in 1971. But did he publish it? When was that information first made "public"? And this is not proof that he was the first to experiment with it. The only thing we are sure of is that today, no one else has stepped forward to claim having done it earlier. That's why I'm hoping to talk to those involved with the Brown research in the 1960's and 70's since they probably know a lot of interesting tidbits of who did what around the world in the late 60's and early 70's regarding electronic text research and ideas related to what we today define to be an "ebook". ***** It is interesting that in my search, the first use of the phrase "Project Gutenberg" as found in Google Groups was in 1987 by the Atari Corporation! I just talked with the guy, Art Morgan, who led the project at Atari at the time using that project name. He wrote, in part, "...I used "Project Gutenberg" as the internal code name for the Atari SLM804 Laser Printer. We (Atari) never trademarked the name, and used it for briefings to user groups and the press shortly before the product launch of the Atari desktop publishing system. Since we didn't commercially use the name, Mr. Hart probably had no knowledge that Atari was using it. That is, unless he owned an Atari ST and was plugged into the Atari user community at the time." Mr. Morgan also seemed to imply that he never even heard of Michael Hart's Project Gutenberg even up to today (have another email in with him to clarify that. Definitely it does not seem like there was any communication between Atari and Michael at that time.) > plus michael hart gave us something even better -- > the concept of "unlimited distribution" of e-books... The biggest contribution Michael Hart gave is that he promoted with a zealousness unmatched by anyone else the need to digitize public domain books and distribute them for free to everyone and anyone, and organizing volunteers to help make this a reality. 
This is his legacy and place in the ebook universe, and what a wonderful legacy it is. I currently believe that in public discourse Barry Shein and his "KiloMonkeys" (later OBI) proposal (from Sept 1989) beat MH to the punch in the public airing of the idea which includes volunteerism (subject to change as new evidence surfaces.) But Michael Hart made it happen. (To be fair to Barry, he was diverted in running the world.std.com ISP, while Michael threw himself full-time into PG, which is necessary to run a network of volunteers, so that's why the Online Book Initiative never gained the same traction as PG did in the early 1990's. Michael probably has more interesting info to share about Barry Shein and his KiloMonkeys proposal in 1989. Maybe Michael did publicly propose the PG idea earlier than Barry Shein's "KiloMonkeys", but I've not found any mention in the Google database, nor in any of the bios on Michael.) > compared to the _commercial_ e-book efforts, > which somehow noring wants on equal footing, > look how many more riches _that_ idea gave us. All aspects of digital publications, both copyrighted and public domain, are important when gathering the history of ebooks and digital publications. It's a complex, multi-faceted area with many players. I do believe when the final history is written, it will be very much like the automobile in complexity, seminal events and individuals. Jon Noring (p.s., doing a quick check on Google looking for the archive of the GUTNBERG list, bit.listserv.gutnberg -- Google groups has 717 messages for this group, and the oldest, a cross-post to the rec.arts.books, dated 17 July 1990, is a request for an online copy of the "Taming of the Shrew". So, again, I somehow believe this group, intended for use by the volunteers, was not around before 1990 or so. But let me know if it was!) 
From jon at noring.name Mon Jan 9 16:52:29 2006
From: jon at noring.name (Jon Noring)
Date: Mon Jan 9 16:53:07 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <20060110002256.GA27181@pglaf.org>
References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name> <20060110002256.GA27181@pglaf.org>
Message-ID: <369937765.20060109175229@noring.name>

Greg wrote:
> Jon Noring wrote:
>> Was the early GUTNBERG mailing list archived as well? When was that
>> list formed? The first mention I see of GUTNBERG is January 1990. I
>> don't believe Google has it archived, unfortunately. That would be a
>> treasure trove of information on the early days of PG, and probably
>> will mention, in passing, things that happened before GUTNBERG was
>> started.
>
> I don't think it was.
>
> I believe our first automated list was on a LISTSERV run
> at uiuc (listserv.cso.uiuc.edu). That was from about
> 1992 or so through 1997 or early 1998.

Oh well. As noted in another message I just posted, a quick Google Groups search shows that it has archived 717 messages for the group bit.listserv.gutnberg. The oldest is from July 1990, and the rest start in 1991. I'm sure the early GUTNBERG list would be fascinating to follow, and would itself contain historical information. Do you know when the GUTNBERG list was actually started?

> I am not aware of any archives we got from the UIUC
> LISTSERV lists. I'm sure some folks have their own
> personal copies, though.

Well, hopefully someone saved the early GUTNBERG archive. If so, be sure to let Greg know.
Jon

From greg at durendal.org Mon Jan 9 16:34:22 2006
From: greg at durendal.org (Greg Weeks)
Date: Mon Jan 9 17:00:38 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID:

On Mon, 9 Jan 2006, Andrew Sly wrote:
> Mentions using alice10.txt, as well as a few other early
> PG texts, from the Walnut Creek CD ROM.

I believe I have a copy of this, but I'm not sure what vintage. I'll have to look.

-- Greg Weeks http://durendal.org:8080/greg/

From jon at noring.name Mon Jan 9 19:05:41 2006
From: jon at noring.name (Jon Noring)
Date: Mon Jan 9 19:06:22 2006
Subject: [gutvol-d] Further comments by the Atari "Project Gutenberg" team leader
Message-ID: <11610376783.20060109200541@noring.name>

Everyone,

In a prior message this afternoon I noted that the first mention of the phrase "Project Gutenberg" in Google Groups appeared in 1987, and had nothing to do with the PG we know. It was used internally by the Atari Corporation to describe a new laser printer.

I asked Art Morgan, who headed up that project, and whose name is associated with the 1987 message, to clarify what he wrote, and why he chose the name "Project Gutenberg", which I know everyone here will relate to. With his permission, he said:

"Yes, I personally came up with the name since I was team lead on the SLM804 project. We didn't assign the product a model number until fairly late in its development. I had to give a "sneak peek" talk on it, so I dubbed it Project Gutenberg. In 1987, Apple had the only true desktop publishing system around, way before HP started selling laser printers for PCs and commoditized them. Unfortunately, Apple's laser printer was a computer in itself, and they had to charge a premium for this redundancy.

"Atari's CEO, Jack Tramiel, gave us the edict to create a laser printer "for the masses, not the classes".
I came up with the idea to have the host system perform all the RIP (raster image processing) functions of the printer, and just "pump" the final bitmap image of the page to the printer. This would require only a "dumb" laser printer engine to get the job done. The Atari ST was the perfect printer host - it was based on the Motorola 68000 and had gobs of memory - exactly the platform found in Apple's laser printers.

"It was easy to talk to Adobe to port their code to the ST since it was developed on the same 68000 platform. But, to lower the costs further, we went with a PostScript clone from Imagen, and a printer engine from TEC (not Canon). Anyway, we should have patented the whole lot because NeXT later used the host-based laser printer idea for their system, and now Dell offers one for PCs.

"Why did I pick Gutenberg? Gutenberg's moveable type technology enabled the printing of low-cost bibles and brought the word of God to everyone, just as Atari's unique RIP technology brought desktop publishing to the masses ..."

(In a followup reply, Art mentioned:)

"Sorry to admit it, but I didn't know about Michael Hart until you told me about him. He sounds like quite a visionary fellow - like Ted Nelson or Alan Kay.

"Feel free to use my text - I'm honored and flattered! Take care!"

For those interested...

Jon

From tb at baechler.net Mon Jan 9 23:58:31 2006
From: tb at baechler.net (Tony Baechler)
Date: Tue Jan 10 00:04:41 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID: <7.0.1.0.2.20060109235503.037acc60@baechler.net>

Hello. I have the following. Write me off list if interested. I think as many of the early files as possible should be saved. I don't have alice10 but it might be floating around online somewhere.
etext90: ALL11.ZIP ALL7011.ZIP BILL11.ZIP CONST11.ZIP GETTY11.ZIP JFK11.ZIP KJV10.ZIP LIBER11.ZIP LINC111.ZIP LINC211.ZIP MAYFL11.ZIP WHEN11.ZIP

etext91: AESOP10.ZIP AESOP11.ZIP ALICE30.ZIP feder16.zip HISONG12.ZIP LGLASS18.ZIP lglass19.zip moby.zip MOBYNO.ZIP PETER15A.ZIP PETER16.ZIP PLBOSS10.ZIP ROGET13.ZIP ROGET13A.ZIP roget14.zip ROGET14A.ZIP roget15a.zip SNARK12.ZIP WORLD12.ZIP

The plboss10.zip is particularly interesting because it contains a brief note from Judy Boss but none of the usual PG headers. As far as the standard PG headers go, this seems to be the oldest. It came from my Walnut Creek CD-ROM. I am not sure about the other titles, but I used to have an older copy of Alice around somewhere.

From tb at baechler.net Tue Jan 10 00:07:36 2006
From: tb at baechler.net (Tony Baechler)
Date: Tue Jan 10 00:07:05 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <1675677963.20060109133820@noring.name>
References: <1675677963.20060109133820@noring.name>
Message-ID: <7.0.1.0.2.20060110000514.037d8eb0@baechler.net>

Hello. One other source of old books is OBI, or the Online Book Initiative. PG borrowed some books from them, including Moby. Also there was Wiretap. OBI used to be at ftp://world.std.com/ but I think it's long gone. However, it was reasonably famous for its time, so it might be archived somewhere. I remember browsing it when PG was much smaller, even in 1996 or so. Also, ftp.ibiblio.org has quite a few old articles and such.

From sly at victoria.tc.ca Tue Jan 10 00:36:57 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Jan 10 00:37:37 2006
Subject: [gutvol-d] Pre-1990 ebook history
In-Reply-To: <80308709.20060109164835@noring.name>
References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name>
Message-ID:

Jon: Here's a very promising lead for you to follow up.
Check out the notes at the beginning of Paradise Lost:
http://www.gutenberg.org/etext/26

"This etext was originally created in 1964-1965 according to Dr. Joseph Raben of Queens College, NY..."

On closer look, "edition 12" appears to be the only one right now in the main PG archive:
http://www.gutenberg.org/dirs/etext92/plrabn12.txt

However, a search for "plrabn10.txt" and "plrabn11.txt" finds some sites that still have them.

Andrew

From gbnewby at pglaf.org Tue Jan 10 00:52:19 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue Jan 10 00:52:20 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <7.0.1.0.2.20060110000514.037d8eb0@baechler.net>
References: <1675677963.20060109133820@noring.name> <7.0.1.0.2.20060110000514.037d8eb0@baechler.net>
Message-ID: <20060110085219.GA1275@pglaf.org>

On Tue, Jan 10, 2006 at 12:07:36AM -0800, Tony Baechler wrote:
> Hello. One other source of old books is OBI, or the Online Books
> Initiative. PG borrowed some books from them including Moby. Also
> there was Wiretap. OBI used to be at ftp://world.std.com/ but I
> think it's long gone. However, it was reasonably famous for its time
> so it might be archived somewhere. I remember browsing it when PG
> was much smaller, even in 1996 or so. Also ftp.ibiblio.org has quite
> a bit of old articles and such.

Spies in the wire: http://wiretap.area.com/Gopher/

I have a local copy from a few years ago, but it looks about the same. When Wiretap was active, they took some stuff from PG, and vice-versa. Most of the eBook content from Wiretap is now in textfiles.org, I think.
-- Greg

From greg at durendal.org Tue Jan 10 04:39:14 2006
From: greg at durendal.org (Greg Weeks)
Date: Tue Jan 10 05:00:04 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID:

On Mon, 9 Jan 2006, Greg Weeks wrote:
> On Mon, 9 Jan 2006, Andrew Sly wrote:
>
>> Mentions using alice10.txt, as well as a few other early
>> PG texts, from the Walnut Creek CD ROM.
>
> I believe I have a copy of this, but I'm not sure what vintage. I'll have to look.

The one I have is dated 1992. It probably doesn't have any of the older stuff on it. I also found my copy of "The Library of the Future".

-- Greg Weeks http://durendal.org:8080/greg/

From radicks at bellsouth.net Mon Jan 9 17:51:56 2006
From: radicks at bellsouth.net (Dick Adicks)
Date: Tue Jan 10 07:07:34 2006
Subject: [gutvol-d] my impression
Message-ID:

See the introduction to PG etext #26, Paradise Lost, identified there as "the oldest etext known to Project Gutenberg (ca. 1964-1965)".

Dick Adicks

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060109/2bad90a2/attachment.html

From hart at pglaf.org Tue Jan 10 07:46:32 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Jan 10 07:46:33 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <2d0.161067f.30f43f48@aol.com>
References: <2d0.161067f.30f43f48@aol.com>
Message-ID:

On Mon, 9 Jan 2006 Bowerbird@aol.com wrote:
> andrew said:
>> a few other early PG texts, from the Walnut Creek CD ROM.
>> Any chance of finding one of those still floating around somewhere?
>
> i did some searching for a walnut-creek c.d. a long time back,
> but got nowhere. but maybe an ebay expert could help you...
>
> however, mr. noring already has solid evidence that predates
> that cd-rom.
> what he's looking for is even _earlier_ evidence...
>
> -bowerbird

I still have some of the old Simtel and Walnut Creek CDROMS, and even some of the floppies that predated those. . .somewhere.

Michael

From hart at pglaf.org Tue Jan 10 08:01:59 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Jan 10 08:02:01 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID:

On Tue, 10 Jan 2006, Greg Weeks wrote:
> On Mon, 9 Jan 2006, Greg Weeks wrote:
>
>> On Mon, 9 Jan 2006, Andrew Sly wrote:
>>
>>> Mentions using alice10.txt, as well as a few other early
>>> PG texts, from the Walnut Creek CD ROM.
>>
>> I believe I have a copy of this, but I'm not sure what vintage. I'll have
>> to look.
>
> The one I have is dated 1992. It probably doesn't have any of the older stuff
> on it. I also found my copy of "The Library of the Future"

Alice was possibly the first widely distributed eBook, but my recollection doesn't seem to match everyone else's.

I recall completing it in 1988, and many of the files contain a notice of the 1988 Millennium Fulcrum Edition. . .that was me, thinking outside the box of the then current limitations of Project Gutenberg, which had been mostly a History of Democracy sort of thing in the 1970's.

Some people, including our CEO Greg Newby, tell me that I released Alice a few years earlier than that, perhaps in 1984, as Greg says he first saw it around 1985 and that's how he first became aware of me and PG.

This IS possible, since I was a BBS Sysop in 1984-1985 and did put a LOT of the early Project Gutenberg works on the Champaign County Computer Club BBS during those years for free download, which might have included the earliest versions of Alice.
My own recollection is that I did Alice no earlier than 1985, after I moved into this house in September, because I seem to recall doing the typing and proofreading at this very desk on an early incarnation of this same computer system.

60,900,000 hits for "e-book" OR ebook OR ebooks.
60,900,000 hits for bomb.
Give eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

From jon at noring.name Tue Jan 10 08:10:19 2006
From: jon at noring.name (Jon Noring)
Date: Tue Jan 10 08:10:34 2006
Subject: [gutvol-d] Intro from PG text #26 (source text from 1964-5)
In-Reply-To:
References:
Message-ID: <1291928104.20060110091019@noring.name>

Dick wrote:
> See the introduction to PG etext #26, Paradise Lost, identified
> there as "the oldest etext known to Project Gutenberg (ca. 1964-1965)"

Wow!

Here's the Intro of that text detailing the source of the original etext. Undoubtedly Michael Hart wrote this introduction, since it is nicely right-justified. A comment and question below.

*******************************************************************

Introduction (one page)

This etext was originally created in 1964-1965 according to Dr. Joseph Raben of Queens College, NY, to whom it is attributed by Project Gutenberg. We had heard of this etext for years but it was not until 1991 that we actually managed to track it down to a specific location, and then it took months to convince people to let us have a copy, then more months for them actually to do the copying and get it to us. Then another month to convert to something we could massage with our favorite 486 in DOS. After that it was only a matter of days to get it into this shape you will see below. The original was, of course, in CAPS only, and so were all the other etexts of the 60's and early 70's.
Don't let anyone fool you into thinking any etext with both upper and lower case is an original; all those original Project Gutenberg etexts were also in upper case and were translated or rewritten many times to get them into their current condition. They have been worked on by many people throughout the world.

In the course of our searches for Professor Raben and his etext we were never able to determine where copies were or which of a variety of editions he may have used as a source. We did get a little information here and there, but even after we received a copy of the etext we were unwilling to release it without first determining that it was in fact Public Domain and finding Raben to verify this and get his permission. Interestingly enough, in a totally unrelated action to our searches for him, the professor subscribed to the Project Gutenberg listserver and we happened, by accident, to notice his name. (We don't really look at every subscription request as the computers usually handle them.) The etext was then properly identified, copyright analyzed, and the current edition prepared.

To give you an estimation of the difference in the original and what we have today: the original was probably entered on cards commonly known at the time as "IBM cards" (Do Not Fold, Spindle or Mutilate) and probably took in excess of 100,000 of them. A single card could hold 80 characters (hence 80 characters is an accepted standard for so many computer margins), and the entire original edition we received in all caps was over 800,000 chars in length, including line enumeration, symbols for caps and the punctuation marks, etc., since they were not available keyboard characters at the time (probably the keyboards operated at baud rates of around 113, meaning the typists had to type slowly for the keyboard to keep up).

*******************************************************************

Am I right to assume that this etext was originally punched in for lexical (text) analysis?
That time frame corresponds to when the Brown Corpus was started.

What other complete texts of books were rumored (or known) to be "digitized" (such as it is, on punch cards) in the 1960's and early 70's?

Thanks.

Jon

From jon at noring.name Tue Jan 10 08:20:00 2006
From: jon at noring.name (Jon Noring)
Date: Tue Jan 10 08:20:11 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID: <641093536.20060110092000@noring.name>

Michael Hart wrote:
> Alice was possibly the first widely distributed eBook, but my recollection
> doesn't seem to match everyone else's.
>
> I recall completing it in 1988, and many of the files contain a notice
> of the 1988 Millennium Fulcrum Edition. . .that was me, thinking outside
> the box of the then current limitations of Project Gutenberg, which had
> been mostly a History of Democracy sort of thing in the 1970's.
>
> Some people, including our CEO Greg Newby, tell me that I released Alice
> a few years earlier than that, perhaps in 1984, as Greg says he first saw
> it around 1985 and that's how he first became aware of me and PG.
>
> This IS possible, since I was a BBS Sysop in 1984-1985 and did put a LOT
> of the early Project Gutenberg works on the Champaign County Computer Club
> BBS during those years for free download, which might have included the
> earliest versions of Alice. My own recollection is that I did Alice
> no earlier than 1985, after I moved into this house in September,
> because I seem to recall doing the typing and proofreading at this
> very desk on an early incarnation of this same computer system.

Thanks for the details, Michael. Very good historical information. I've cross-posted this reply to the ebook-history list so it may be preserved.

What other books, besides the "history of democracy type texts", do you recall placing on the Champaign County Computer Club BBS in 1984-85?
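[As an aside, the card arithmetic in the Paradise Lost introduction quoted earlier in this thread can be sanity-checked in a few lines of Python. This is a rough sketch: the 80-characters-per-card capacity and the 800,000-character total are the intro's own figures, while the effective rate of a 113-baud line (roughly 11 characters per second, assuming about 10 bits per character) is an assumption.]

```python
# Back-of-the-envelope check of the punch-card figures in the
# Paradise Lost (PG etext #26) introduction. Capacity and text-size
# numbers come from the intro; the ~11 cps figure for a 113-baud
# line (assuming ~10 bits per character) is an assumption.

chars_per_card = 80          # one "IBM card" held 80 characters
total_chars = 800_000        # intro: "over 800,000 chars in length"
chars_per_second = 113 / 10  # ~11.3 cps at 113 baud

cards_needed = total_chars / chars_per_card
entry_hours = total_chars / chars_per_second / 3600

print(f"minimum cards at full 80-column use: {cards_needed:,.0f}")
print(f"raw transmission time at 113 baud: {entry_hours:.0f} hours")
```

[Straight division gives on the order of 10,000 fully packed cards and roughly 20 hours of raw keying time; punching one line of verse per card, as was common, would land in the same ten-thousand range for a poem of roughly ten thousand lines, noticeably below the intro's "in excess of 100,000" estimate.]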
Jon

From hart at pglaf.org Tue Jan 10 08:33:24 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Jan 10 08:33:25 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <20060110002256.GA27181@pglaf.org>
References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name> <20060110002256.GA27181@pglaf.org>
Message-ID:

On Mon, 9 Jan 2006, Greg Newby wrote:
> On Mon, Jan 09, 2006 at 04:48:35PM -0700, Jon Noring wrote:
>> Was the early GUTNBERG mailing list archived as well? When was that
>> list formed? The first mention I see of GUTNBERG is January 1990. I
>> don't believe Google has it archived, unfortunately. That would be a
>> treasure trove of information on the early days of PG, and probably
>> will mention, in passing, things that happened before GUTNBERG was
>> started.
>
> I don't think it was.
>
> I believe our first automated list was on a LISTSERV run
> at uiuc (listserv.cso.uiuc.edu). That was from about
> 1992 or so through 1997 or early 1998.
>
> Then, we moved to UNC, which used similar software,
> listserv.unc.edu. I do have the archives from that
> period, somewhere.
>
> It was only in 2003 or so that we moved to mailman
> on lists.pglaf.org
>
> The number & makeup of lists changed over that time.
> Originally, the main (or only) purpose was for a
> monthly newsletter or similar, plus announcements
> of new titles.
>
> I am not aware of any archives we got from the UIUC
> LISTSERV lists. I'm sure some folks have their own
> personal copies, though.
> -- Greg

I think the first PG Newsletters went out from one of the IBM mainframes at the UI in 1988 or 1989.

We moved several times. . .I remember at least two: vmd.cso.uiuc.edu and vme.cso.uiuc.edu before we moved to the UNIX machines. I'm not sure if we ever ran from vmc.cso.uiuc.edu.
Michael

From schultzk at uni-trier.de Wed Jan 11 00:22:13 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Wed Jan 11 01:05:50 2006
Subject: [gutvol-d] Intro from PG text #26 (source text from 1964-5)
In-Reply-To: <1291928104.20060110091019@noring.name>
References: <1291928104.20060110091019@noring.name>
Message-ID:

Hi Everybody,

I hate to disappoint everybody, but there are even older "etexts" than this! Though I have to admit that back then they were not called etexts. They were called corpora. They were not stored on disks or other mass storage systems, but on punch cards and such.

"Ebooks" have been around since the mid 80s. They were programs that were dedicated to one book and its display. I have "The Hitchhiker's Guide to the Galaxy" somewhere in a box. Anybody remember the Apple Newton (also mid 80s)? It was also what would be termed an ebook today. As a matter of fact, I used to read the first PG etexts on my Newton.

Just my 2 Euro cents worth,

Keith.

On 10.01.2006 at 17:10, Jon Noring wrote:
> Dick wrote:
>
>> See the introduction to PG etext #26, Paradise Lost, identified
>> there as "the oldest etext known to Project Gutenberg (ca. 1964-1965)"
>
> Wow!
>
> Here's the Intro of that text detailing the source of the original
> etext. Undoubtedly Michael Hart wrote this introduction since it is
> nicely right-justified. A comment and question below.
>
> *******************************************************************
>
> Introduction (one page)
>
> This etext was originally created in 1964-1965 according to Dr.
> Joseph Raben of Queens College, NY, to whom it is attributed by
> Project Gutenberg. We had heard of this etext for years but it
> was not until 1991 that we actually managed to track it down to
> a specific location, and then it took months to convince people
> to let us have a copy, then more months for them actually to do
> the copying and get it to us.
> Then another month to convert to
> something we could massage with our favorite 486 in DOS. After
> that it was only a matter of days to get it into this shape you
> will see below. The original was, of course, in CAPS only, and
> so were all the other etexts of the 60's and early 70's. Don't
> let anyone fool you into thinking any etext with both upper and
> lower case is an original; all those original Project Gutenberg
> etexts were also in upper case and were translated or rewritten
> many times to get them into their current condition. They have
> been worked on by many people throughout the world.
>
> In the course of our searches for Professor Raben and his etext
> we were never able to determine where copies were or which of a
> variety of editions he may have used as a source. We did get a
> little information here and there, but even after we received a
> copy of the etext we were unwilling to release it without first
> determining that it was in fact Public Domain and finding Raben
> to verify this and get his permission. Interestingly enough, in
> a totally unrelated action to our searches for him, the professor
> subscribed to the Project Gutenberg listserver and we happened,
> by accident, to notice his name. (We don't really look at every
> subscription request as the computers usually handle them.) The
> etext was then properly identified, copyright analyzed, and the
> current edition prepared.
>
> To give you an estimation of the difference in the original and
> what we have today: the original was probably entered on cards
> commonly known at the time as "IBM cards" (Do Not Fold, Spindle
> or Mutilate) and probably took in excess of 100,000 of them.
> A single card could hold 80 characters (hence 80 characters is an
> accepted standard for so many computer margins), and the entire
> original edition we received in all caps was over 800,000 chars
> in length, including line enumeration, symbols for caps and the
> punctuation marks, etc., since they were not available keyboard
> characters at the time (probably the keyboards operated at baud
> rates of around 113, meaning the typists had to type slowly for
> the keyboard to keep up).
>
> *******************************************************************
>
> Am I right to assume that this etext was originally punched in for
> lexical (text) analysis? That time frame corresponds to when the Brown
> Corpus was started.
>
> What other complete texts of books were rumored (or known to be)
> "digitized" (such as it is on punch cards) in the 1960's and early
> 70's?
>
> Thanks.
>
> Jon
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From prosfilaes at gmail.com Wed Jan 11 02:36:33 2006
From: prosfilaes at gmail.com (David Starner)
Date: Wed Jan 11 02:36:55 2006
Subject: [gutvol-d] Intro from PG text #26 (source text from 1964-5)
In-Reply-To:
References: <1291928104.20060110091019@noring.name>
Message-ID: <6d99d1fd0601110236y65437f09te5cdccb79e25869d@mail.gmail.com>

On 1/11/06, Keith J. Schultz wrote:
> Hi Everybody,
>
> I hate to disappoint everybody, but there are even older
> "etexts" than this! Though I have to admit that back then they
> were not called etexts. They were called corpora. They were not
> stored on disks or other mass storage systems, but on punch cards
> and such.

According to the intro, etext #26 was probably entered on cards in 64-65. The Brown Corpus (I'm guessing 1962, since the Wikipedia article doesn't really say) didn't really include etexts, since it was 2,000-word samples, not entire texts.
Given the memory size and cost of early computers, and the fact that Wikipedia says the "Brown Corpus pioneered the field of corpus linguistics", I'd like some evidence that there were older etexts.

From tb at baechler.net Wed Jan 11 02:46:46 2006
From: tb at baechler.net (Tony Baechler)
Date: Wed Jan 11 02:45:55 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <2d0.161067f.30f43f48@aol.com>
Message-ID: <7.0.1.0.2.20060111024416.031ca6e0@baechler.net>

Hello all, and especially Michael. I am very interested in old Simtel CDs. I am especially looking for Simtel-20 archives, which would be from the early 1990's. Also it would be nice to find an old mirror of oak.oakland.edu, which had the early PC-SIG, PC-BLUE, COMUG and large CP/M collections. If anyone has any PC-SIG CDs, especially edition 12 or earlier, I am interested. Contact me off list since this is off topic.

At 07:46 AM 1/10/2006, you wrote:
> I still have some of the old Simtel and Walnut Creek CDROMS,
> and even some of the floppies that predated those. . .somewhere.
>
> Michael

From radicks at bellsouth.net Wed Jan 11 07:44:05 2006
From: radicks at bellsouth.net (Dick Adicks)
Date: Wed Jan 11 07:44:11 2006
Subject: [gutvol-d] Early ebook history info wanted
Message-ID:

When I was teaching at Georgia Tech from 1965 to 1968, Professor William Mullen in the English department there was transferring the Episcopal psalter to IBM cards, using ALGOL. Bill inspired me to take an introductory course in that computer language so that I could initiate a similar project, but I went no further with it. I don't know what became of his work, but he had accumulated many boxes of cards, with one line to every card.

Dick Adicks

A small group of thoughtful people could change the world. Indeed, it's the only thing that ever has. --Margaret Mead

. . . if vicious people are united and form a power, honest people must do the same.
--Leo Tolstoy

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060111/ce25442f/attachment.html

From traverso at dm.unipi.it Wed Jan 11 11:22:56 2006
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Wed Jan 11 11:10:44 2006
Subject: [gutvol-d] Early ebook history info wanted
In-Reply-To: (message from Dick Adicks on Wed, 11 Jan 2006 10:44:05 -0500)
References:
Message-ID: <200601111922.k0BJMuA21996@pico.dm.unipi.it>

In Pisa, there is an institute of computational linguistics, http://www.ilc.cnr.it/ which originated from a research group of the national university computing center in 1967.

I think that there had been work even earlier, probably 1965, and I remember that they began with the input of Dante's Commedia and the creation of concordances. I can retrieve more accurate information if needed.

Carlo Traverso

From robsuth at robsuth.plus.com Thu Jan 12 08:50:18 2006
From: robsuth at robsuth.plus.com (Robert Sutherland)
Date: Thu Jan 12 11:15:10 2006
Subject: [gutvol-d] Ebook reading devices
Message-ID: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>

Just to bring my enquiry of 8 June 05 up to date, I continued my email/online enquiries as far as I could, but found no product among the DVD portable readers that would deal with text, nor was any of the manufacturers interested in producing one. Apart from the French reader Cybook (which I think is too expensive for the task and market, although otherwise for the most part ideal) there seems still to be no special ebook reading device available in the UK or the EU generally: laptops are still too big and heavy, and PDAs still have far too small a screen. My enquiries confirmed my strong impression that there are protectionist interests holding this back, presumably in the interests of proprietary issues of ebooks.
That is really rather to bury the head in the sand, especially now that Google are to set up their Library - if PG and Google are going to be fully useful, a new device is inevitable. Can PG and Google not take a hand themselves to enable production of something they undoubtedly will need?

Robert Sutherland
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060112/74e87244/attachment.html

From hart at pglaf.org Thu Jan 12 12:00:35 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jan 12 12:00:37 2006
Subject: !@!Re: [gutvol-d] Ebook reading devices
In-Reply-To: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>
References: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>
Message-ID:

On Thu, 12 Jan 2006, Robert Sutherland wrote:
> Just to bring my enquiry of 8 June 05 up to date, I continued my email/online
> enquiries as far as I could, but found no product among the DVD portable
> readers that would deal with text, nor was any of the manufacturers
> interested in producing one. Apart from the French reader Cybook (which I
> think is too expensive for the task and market although otherwise for the
> most part ideal) there seems still to be no special ebook reading device
> available in UK or the EU generally: laptops are still too big and heavy,

Don't they have "notebook" computers that weigh hardly more than books?

> and PDAs still have far too small a screen

I've seen products such as Blackberrys and Treos that seem to have screens at least twice as large as most PDAs; perhaps one of those would be better.

> My enquiries confirmed my strong impression that there are protectionist
> interests holding this back, presumably in the interests of proprietary
> issues of ebooks.
Yes, it is all too obvious that virtually all the multibillion-dollar participants, from Google to Yahoo to Amazon to HarperCollins, and even to The Library of Congress, do not seem to have "easy access" in mind, but that is merely because they have no concept other than traditional "business plans."

I think that just as Google came up with a different kind of business plan based on a free product, others will do so with free eBooks, or eBooks so inexpensive that no one will worry about the cost.

> That is really rather to bury the head in the sand, especially now that
> Google are to set up their Library - if PG and Google are going to be fully
> useful a new device is inevitable. Can PG and Google not take a hand
> themselves to enable production of something they undoubtedly will need?

Need?

But first, I can't put Google and PG in the same group, for several reasons. Google is worth over $100 billion; PG is barely worth an account sheet.

Google, after 13 months of high-visibility press releases, still has not taken over more than a few percent of the eBook marketplace, and I still haven't seen anything new from Yahoo, Amazon, HarperCollins, or even The Library of Congress that makes me think they will be responsible for a million eBooks between the lot of them before a million eBooks are made simply and easily available by people beneath their radar.

Back to need. . . .

Of course, iPods and cellphones don't have large screens, but that didn't stop people from reading eBooks on them, and even Apple has to admit that they are selling 10 times as many iPods as computers. 1.25 million Apple computers sold in the last quarter. 14 million iPods. But that barely scratches the surface of cellphone sales, which are in the range of 1 billion per year versus only 100 million computers.

People are going to use what they have.
I just don't see a market for that dedicated eBook reader we have heard talk about for ever so long, particularly at prices we must say are equal to that of a cheap computer and usually filled with stuff to keep us from doing much with free eBooks.

Michael

From hart at pglaf.org Thu Jan 12 14:29:34 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jan 12 14:29:36 2006
Subject: [gutvol-d] Early ebook history info wanted
In-Reply-To: <200601111922.k0BJMuA21996@pico.dm.unipi.it>
References: <200601111922.k0BJMuA21996@pico.dm.unipi.it>
Message-ID: 

On Wed, 11 Jan 2006, Carlo Traverso wrote:
>
> In Pisa, there is an institute of computational linguistics,
> http://www.ilc.cnr.it/ which originated from a research group of the
> national university computing center in 1967.
>
> I think that there has been work even earlier, probably 1965, and I
> remember that they began with the input of Dante's Commedia and the
> creation of concordances. I can retrieve more accurate information if
> needed.

Our first Webmaster, Pietro di Miceli, is Italian, living in Rome, and I think he heard of this but that it was so walled off from any public consumption that no one he knew could get to it, so it pretty much remained in the area of rumor.
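For anyone who hasn't run into the term, a concordance is simply an index from every word of a text to the passages in which it occurs. A minimal sketch in Python -- purely an illustration, with nothing of the Pisa group's actual software in it:

```python
import re
from collections import defaultdict

def concordance(text, width=20):
    """Map each word (lowercased) to the contexts in which it appears."""
    index = defaultdict(list)
    for m in re.finditer(r"[A-Za-z']+", text):
        left = max(0, m.start() - width)
        # Keep a snippet of surrounding text as the "context" for this occurrence.
        index[m.group().lower()].append(text[left:m.end() + width])
    return dict(sorted(index.items()))

opening = "Nel mezzo del cammin di nostra vita mi ritrovai per una selva oscura"
for word, contexts in concordance(opening).items():
    print(word, "->", contexts)
```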
Michael

From imaclean at gmail.com Fri Jan 13 00:54:27 2006
From: imaclean at gmail.com (Ian MacLean)
Date: Fri Jan 13 01:57:47 2006
Subject: [gutvol-d] Ebook reading devices
In-Reply-To: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>
References: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>
Message-ID: <3156339d0601130054p1cf6d6f9o4e8c1926528ce805@mail.gmail.com>

Have you seen the just announced ebook reader from Sony using a high res e-ink display?:

http://blogs.reuters.com/2006/01/04/glimpse-at-new-sony-reader/
http://products.sel.sony.com/pa/prs/reader_features.html

They claim to have done away with the ridiculous restrictions built into their previous reader (the Librie) and it *should* be able to read txt, pdf and html, although these will need to be converted into Sony's proprietary BBeB format before being transferred to the device.

Another product out soon that will use the same e-ink technology is the iRex ER0100:

http://www.epaper.org.uk/index.php?option=com_content&task=view&id=56&Itemid=2

which apparently will be able to display PDF, XHTML, or TXT without conversion.

Both these devices will probably be as expensive as the Cybook but with much higher readability of e-ink displays.

Ian

On 1/13/06, Robert Sutherland wrote:
> Just to bring my enquiry of 8 June 05 uptodate, I continued my
> email/online enquiries as far as I could, but found no product among the DVD
> portable readers that would deal with text, nor was any of the manufacturers
> interested in producing one. Apart from the French reader Cybook (which I
> think is too expensive for the task and market although otherwise for the
> most part ideal) there seems still to be no special ebook reading device
> available in UK or the EU generally: laptops are still too big and heavy,
> and PDAs still have far too small a screen My enquiries confirmed my strong
> impression that there are protectionist interests holding this back,
> presumably in the interests of proprietary issues of ebooks.
That is really > rather to bury the head in the sand, especially now that Google are to set > up their Library - if PG and Google are going to be fully useful a new > device is inevitable. Can PG and Google not take a hand themselves to enable > production of something they undoubtedly will need? > > Robert Sutherland > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > From joshua at hutchinson.net Fri Jan 13 06:44:02 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Jan 13 06:44:06 2006 Subject: [gutvol-d] Ebook reading devices Message-ID: <20060113144402.B17F6EE36C@ws6-1.us4.outblaze.com> Anyone heard if the BBeB format is open/documented? And even better, if anyone has created an open source converter? If these things take off, it would be nice to have the ability to generate files for them from our collection. If there is an open source converter, there is a chance we could do such a thing right on the server. Josh ----- Original Message ----- From: "Ian MacLean" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] Ebook reading devices Date: Fri, 13 Jan 2006 17:54:27 +0900 > > Have you seen the just announced ebook reader from Sony using a high > res e-ink display ?: > > http://blogs.reuters.com/2006/01/04/glimpse-at-new-sony-reader/ > http://products.sel.sony.com/pa/prs/reader_features.html > > They claim to have done away with the ridculous restrictions built > into their previous reader ( the libre ) and it *should* be able to > ready txt and pdf, html although these will need to be converted into > Sony's proprietry BBeB format before being transferred to the device. 
> > Another product out soon that will use the same e-ink technology is > the irex ER0100 : > http://www.epaper.org.uk/index.php?option=com_content&task=view&id=56&Itemid=2 > > which apparently will be able to display PDF, XHTML, or TXT without conversion. > > Both these devices will probably be as expensive as the Cybook but > with much higher readability of e-ink displays. > > Ian > > > On 1/13/06, Robert Sutherland wrote: > > Just to bring my enquiry of 8 June 05 uptodate, I continued my > > email/online enquiries as far as I could, but found no product among the DVD > > portable readers that would deal with text, nor was any of the manufacturers > > interested in producing one. Apart from the French reader Cybook (which I > > think is too expensive for the task and market although otherwise for the > > most part ideal) there seems still to be no special ebook reading device > > available in UK or the EU generally: laptops are still too big and heavy, > > and PDAs still have far too small a screen My enquiries confirmed my strong > > impression that there are protectionist interests holding this back, > > presumably in the interests of proprietary issues of ebooks. That is really > > rather to bury the head in the sand, especially now that Google are to set > > up their Library - if PG and Google are going to be fully useful a new > > device is inevitable. Can PG and Google not take a hand themselves to enable > > production of something they undoubtedly will need? 
> > > > Robert Sutherland > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From jon at noring.name Fri Jan 13 07:32:33 2006 From: jon at noring.name (Jon Noring) Date: Fri Jan 13 07:32:49 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <20060113144402.B17F6EE36C@ws6-1.us4.outblaze.com> References: <20060113144402.B17F6EE36C@ws6-1.us4.outblaze.com> Message-ID: <1754496216.20060113083233@noring.name> Joshua asked: > Anyone heard if the BBeB format is open/documented? And even > better, if anyone has created an open source converter? If these > things take off, it would be nice to have the ability to generate > files for them from our collection. If there is an open source > converter, there is a chance we could do such a thing right on the server. My best understanding from following the Librie list, talking with the "librie guy", and a few snippets of news releases, is that the BBeB Xylog DTD/schema/spec is still unpublished, but that Sony plans to publish (and maybe release as an "open standard") the format. Looking at the incomplete Xylog schema used in the Librie which has been reverse engineered, as well as a couple of Xylog XML documents, has revealed some interesting tidbits: 1) It's an all-in-one XML document -- everything is dumped inside a single document, including images, metadata, etc. (I vaguely remember Microsoft trying to patent this idea. Anyone know?) 2) All the examples I've seen are text-encoded in UTF-16. This means either that UTF-16 is supported (along with hopefully UTF-8), or that it is required. 
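As a quick aside on point 2 (my own illustration in Python, not anything taken from the Librie list), the size trade-off between the two encodings is easy to demonstrate:

```python
# Encoded sizes of Han text vs ASCII text in UTF-8 and UTF-16 (little-endian, no BOM).
han = "\u66f8\u7c4d"   # two Han characters
latin = "books"        # five ASCII characters

print(len(han.encode("utf-8")))        # 6: Han characters take 3 bytes each in UTF-8
print(len(han.encode("utf-16-le")))    # 4: but only 2 bytes each in UTF-16
print(len(latin.encode("utf-8")))      # 5: ASCII is 1 byte per character in UTF-8
print(len(latin.encode("utf-16-le")))  # 10: but 2 bytes per character in UTF-16
```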
This makes sense for the Japanese origin of the format where, I gather, UTF-16 is more efficient than UTF-8 when encoding Han characters and such. 3) It does NOT use CSS -- rather it uses its own styling scheme which does not appear to completely map to CSS (or the mapping is very complex). The core model may not be the same as the CSS box model. It is troubling that they chose their own styling language rather than fully embrace some subset of CSS. Part of this may stem from the core layout model, which is faintly reminiscent of PDF. 4) The document structure is dirt simple. There are two types of "text blocks" supported, which are sort of analogous to a
<div> box. Within a text block one can have one or more <p> (paragraphs), and there is a small supported set of inline tags. There does not appear, but I'm not certain (I can only go by what I've seen so far), to be support for defined structures such as tables, lists, blockquotes, and even headers. All these things have to be fitted within the text block/paragraph. This appears to make accessibility more difficult since there's no predefined semantics one can assign to the various structures (which could include, I suppose, sidebars and stuff), so those using text-to-speech may have to figure out what's what without any machine-recognizable cues.

Definitely, the Xylog vocabulary is not suitable for use as a "master" format for etexts. It's more of a derivative format for primarily visual presentation purposes.

Anyway, these are my impressions from incomplete information. Once the BBeB Xylog schema is published, we'll know for sure. And it is possible the schema used for the U.S. Sony may be updated from the one used in the Librie.

It is sad that they ignored established standards (such as HTML, OEBPS, CSS, TEI, etc.) and decided to roll their own. And so far I don't see any innovations that make it better for representing digital publications. I see it as a step backwards. (To be fair, the motivation for developing it may have been to minimize hardware resource requirements, so for that it may be innovative, but I see no other advantages, not even in document conversion.)

We'll see... YMMV.

Jon

From Bowerbird at aol.com Fri Jan 13 09:36:04 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jan 13 09:36:17 2006
Subject: [gutvol-d] Ebook reading devices
Message-ID: 

ian said:
> Have you seen the just announced ebook reader
> from Sony using a high res e-ink display ?

i might pay $350 for a handheld with web access.
but for a machine that's isolated from cyberspace?
um, no thanks. next?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060113/33ab2c07/attachment.html

From marcello at perathoner.de Fri Jan 13 14:34:49 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Jan 13 14:34:47 2006
Subject: [gutvol-d] PG Website Report for 2005
Message-ID: <43C82B09.2050205@perathoner.de>

User base and server load

The PG website is ranked at position 3,842 by Alexa (Dec 31, 2005). Note that Alexa is an Amazon company and thus bookish people are overrepresented in the Alexa user base. While the absolute Alexa rank value is skewed in our favor, the trend Alexa gives is quite objective.

Since Jan 1, 2005:

- 2.5 times as many people are using PG and
- PG is serving 2 times as many web pages.

To accommodate the increased load at better response times the site was redesigned for more static pages and better caching. Rarely used features like skins were dropped in favor of better performance.

Where do new users come from?

Most visitors find us thru search engines. But 10% come from wikipedia. The catalog team has been active in editing appropriate wikipedia articles to point to our ebooks. If you want more people to come to PG, help us editing more wikipedia articles.

Dec 2005

Visitors   %       Referrer URL
236011     24.20%  www.google.com/
97795      10.03%  en.wikipedia.org/
78527      8.05%   search.yahoo.com/
34796      3.57%   www.promo.net/
28997      2.97%   www.google.co.uk/
24810      2.54%   www.stumbleupon.com/
20355      2.09%   www.google.ca/
16598      1.70%   search.msn.com/
15289      1.57%   www.google.de/
13714      1.41%   www.google.co.in/

Search Terms

Visitors found us searching for these terms:

Dec 2005

Visitors   Search Term
24923      project gutenberg
18297      gutenberg
9391       gutenberg project
9176       free ebooks
6964       ebooks
4303       e books
4059       free e books
3807       free books
3489       online books
3348       project gutenburg

While most of these people knew beforehand what they were looking for, some of them found us by searching for "free ebooks" or similar generic terms.
I have rewritten our main page to push our ranking for the "free ebooks" search term. We are also well positioned in google for the following generic search terms:

Dec 2005

Pos.  Search Term
2     free ebooks
2     free books
5     ebooks
15    books

Some other sites rank better than PG because they have "ebook" in the domain name. This way all links to those sites must contain the target word "ebook". If you want to help push our rank, set links in your web pages pointing to the PG main page with the link text "free ebooks" like this: My favorite <a href="http://www.gutenberg.org">free ebooks</a> site.

Pretty Pictures

Snapshots of Alexa statistics on Dec 31, 2005:

Pageviews: http://www.gutenberg.org/internal/reports/2005/alexa.6yp.png
Reach: http://www.gutenberg.org/internal/reports/2005/alexa.6yr.png
Rank: http://www.gutenberg.org/internal/reports/2005/alexa.6yt.png

User: internal
Pass: books

-- 
Marcello Perathoner webmaster@gutenberg.org

From imaclean at gmail.com Fri Jan 13 21:07:13 2006
From: imaclean at gmail.com (Ian MacLean)
Date: Fri Jan 13 21:07:33 2006
Subject: [gutvol-d] Ebook reading devices
In-Reply-To: References: Message-ID: <3156339d0601132107g41a89d08j9b2be0b92c9bb62@mail.gmail.com>

On 1/14/06, Bowerbird@aol.com wrote:
> ian said:
> > Have you seen the just announced ebook reader
> > from Sony using a high res e-ink display ?
>
> i might pay $350 for a handheld with web access.
> but for a machine that's isolated from cyberspace?
> um, no thanks. next?

Fair enough - then the iRex device might be a better bet. And less proprietary.

http://www.cryptonomicon.net/msh/2005/12/eink-based-ebook-reader-to-ship-in.html
http://www.irextechnologies.com/downloads/Productleaflet-Iliad.pdf

"According to iRex, the Illiad will come with a digitizer and stylus allowing the user to input comments on digital documents. It directly supports PDF, XHTML and Text (Unicode?) formats. Like traditional eBook readers, the device will connect to a host PC via USB.
Unlike traditional eBook readers, however, the unit will also sport WiFi and wired ethernet network interfaces."

Ian

From Bowerbird at aol.com Fri Jan 13 21:53:06 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jan 13 21:53:32 2006
Subject: [gutvol-d] Ebook reading devices
Message-ID: <2e5.8ddc4c.30f9ebc2@aol.com>

ian said:
> Fair enough - then the iRex device
> might be a better bet. And less proprietry.

might be. except i heard it will be $300-$500.
with no deep pockets to subsidize and back it.
way too much for a dumb terminal in this age...
next?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060114/938d4868/attachment.html

From hyphen at hyphenologist.co.uk Fri Jan 13 23:13:35 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Fri Jan 13 23:14:05 2006
Subject: [gutvol-d] PG Website Report for 2005
In-Reply-To: <43C82B09.2050205@perathoner.de>
References: <43C82B09.2050205@perathoner.de>
Message-ID: 

On Fri, 13 Jan 2006 23:34:49 +0100, Marcello Perathoner wrote:

>Most visitors find us thru search engines. But 10% come from wikipedia.
>The catalog team has been active in editing appropriate wikipedia
>articles to point to our ebooks. If you want more people to come to PG,
>help us editing more wikipedia articles.

How easy would it be to add my Yorkshire dialect authors to Wikipedia?

-- 
Dave Fawthrop
17,000 free e-books at Project Gutenberg! http://www.gutenberg.net
For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/

From sly at victoria.tc.ca Fri Jan 13 23:27:25 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Fri Jan 13 23:27:56 2006
Subject: [gutvol-d] PG Website Report for 2005
In-Reply-To: References: <43C82B09.2050205@perathoner.de>
Message-ID: 

Hi Dave

On Sat, 14 Jan 2006, Dave Fawthrop wrote:
>
> How easy would it be to add my Yorkshire dialect authors to Wikipedia?
>
Fairly easy, as long as you have the information to put in. Wikipedia is very much a learn-as-you-go type of environment, so you might want to just take a look at a few other author biographies and then create an initial article. The easiest might be to just create what's called a "stub" article to start with, then flesh it out. I could try to anticipate any of a number of different questions you might have, but perhaps I'll just say feel free to ask if you have any. An alternate solution would be to send to me what information you have, and I could rework it when I find the time (and inclination) :)

Andrew

From imaclean at gmail.com Sat Jan 14 08:20:09 2006
From: imaclean at gmail.com (Ian MacLean)
Date: Sat Jan 14 08:20:15 2006
Subject: [gutvol-d] Ebook reading devices
In-Reply-To: <2e5.8ddc4c.30f9ebc2@aol.com>
References: <2e5.8ddc4c.30f9ebc2@aol.com>
Message-ID: <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com>

On 1/14/06, Bowerbird@aol.com wrote:
> ian said:
> > Fair enough - then the iRex device
> > might be a better bet. And less proprietry.
>
> might be. except i heard it will be $300-$500.
> with no deep pockets to subsidize and back it.

hmm iRex is a spin off of Phillips - thats fairly deep pockets.

> way too much for a dumb terminal in this age...

Sure its quite expensive - but its only the 2nd or third e-ink based device out there and that is of course its biggest selling point - the readability of the screen. And besides - an ipod is $300 and it only plays music ...
From hart at pglaf.org Sat Jan 14 11:06:59 2006
From: hart at pglaf.org (Michael Hart)
Date: Sat Jan 14 11:07:01 2006
Subject: [gutvol-d] Ebook reading devices
In-Reply-To: <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com>
References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com>
Message-ID: 

On Sun, 15 Jan 2006, Ian MacLean wrote:

> On 1/14/06, Bowerbird@aol.com wrote:
>> ian said:
>> > Fair enough - then the iRex device
>> > might be a better bet. And less proprietry.
>>
>> might be. except i heard it will be $300-$500.
>> with no deep pockets to subsidize and back it.
>
> hmm iRex is a spin off of Phillips - thats fairly deep pockets.

>> way too much for a dumb terminal in this age...
>
> Sure its quite expensive - but its only the 2nd or third e-ink based
> device out there and that is of course its biggest selling point - the
> readability of the screen.

"The Medium Is The Massage!"

> And besides - an ipod is $300 and it only plays music ...

But it plays a LOT of music!!!

eBook readers should certainly be able to hold as many books as MP3 players hold tunes!!!

Not to mention that iPods will do eBooks, right from the 1st week they were ever on the market, but I'll bet that the new eBook readers won't do iTunes. . .at least for now. . . . ;-)

Michael

From Bowerbird at aol.com Sat Jan 14 11:27:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Jan 14 11:27:33 2006
Subject: [gutvol-d] Ebook reading devices
Message-ID: <6d.53d0baa4.30faaa9b@aol.com>

ian said:
> hmm iRex is a spin off of Phillips - thats fairly deep pockets.

oh, right. why do you think they didn't put the big name on it?

ian said:
> And besides - an ipod is $300 and it only plays music ...

"only"?

i'd guess people are willing to pay a lot more for music than for books. i don't have figures on music, but the number of books read annually by the average american is _one_.
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060114/d9b7cb68/attachment.html From mattsen at arvig.net Sat Jan 14 11:09:30 2006 From: mattsen at arvig.net (Chuck MATTSEN) Date: Sat Jan 14 11:28:49 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> Message-ID: On Sat, 14 Jan 2006 13:06:59 -0600, Michael Hart wrote: > "The Medium Is The Massage!" Thank god, someone who quotes that one correctly... :-) -- Chuck Mattsen (Mahnomen, MN) mattsen@arvig.net http://eot.com/~mattsen/mtsearch.htm From hart at pglaf.org Sat Jan 14 12:06:16 2006 From: hart at pglaf.org (Michael Hart) Date: Sat Jan 14 12:06:17 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <6d.53d0baa4.30faaa9b@aol.com> References: <6d.53d0baa4.30faaa9b@aol.com> Message-ID: On Sat, 14 Jan 2006 Bowerbird@aol.com wrote: > ian said: >> hmm iRex is a spin off of Phillips - thats fairly deep pockets. > > oh, right. why do you think they didn't put the big name on it? > > > ian said: >> And besides - an ipod is $300 and it only plays music ... > > "only"? > > i'd guess people are willing to pay a lot more for music > than for books. iTunes alone is closing in on a million sales already. We've already seen the first million selling download. > i don't have figures on music, but the > number of books read annually by the average american > is _one_. Where can I look up things like that? 
> > -bowerbird > mh From hart at pglaf.org Sat Jan 14 12:07:38 2006 From: hart at pglaf.org (Michael Hart) Date: Sat Jan 14 12:07:40 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> Message-ID: On Sat, 14 Jan 2006, Chuck MATTSEN wrote: > On Sat, 14 Jan 2006 13:06:59 -0600, Michael Hart wrote: > >> "The Medium Is The Massage!" > > Thank god, someone who quotes that one correctly... :-) Well, there WERE censored editions [Texas?] that required the book to be "The Medium Is The Message" or so I have been told. mh From Bowerbird at aol.com Sat Jan 14 12:14:12 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Jan 14 12:14:28 2006 Subject: [gutvol-d] Ebook reading devices Message-ID: <1f0.4a97b395.30fab594@aol.com> michael said: > Where can I look up things like that? i think it was in the recent report on reading out of the n.e.a. which was a hatchet job, so i'm not sure how trustworthy it is, so... but i think it's no longer a major secret that americans don't really read a lot... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060114/f132568a/attachment.html From walter.van.holst at xs4all.nl Sat Jan 14 12:49:09 2006 From: walter.van.holst at xs4all.nl (Walter van Holst) Date: Sat Jan 14 12:49:19 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> Message-ID: <43C963C5.4080400@xs4all.nl> Ian MacLean wrote: > >> might be. except i heard it will be $300-$500. >> with no deep pockets to subsidize and back it. >> >> > >hmm iRex is a spin off of Phillips - thats fairly deep pockets. 
> >
How much of a pipe dream would it be to try to get iRex to introduce a version with a 1 GB CF2 card included with a selection from Gutenberg? I mean, having to pay about 400 E for an eReader that already includes a few thousand classics changes the equation quite a bit.

Regards, Walter

From sly at victoria.tc.ca Sun Jan 15 00:23:27 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Jan 15 00:23:49 2006
Subject: [gutvol-d] Chinese names in PG catalog
In-Reply-To: References: <43C82B09.2050205@perathoner.de>
Message-ID: 

I've put this message here, rather than on the low-traffic catalogers list, so as to reach more people--see request for help at the end.

I've been editing the author headings for Chinese names in the catalog over the last few days, in an effort to get closer to some kind of consistency. You can see some of the results by looking towards the bottom of the pages:

http://www.gutenberg.org/browse/authors/other.html
http://www.gutenberg.org/browse/languages/zh

For anyone interested in the details, I'm aiming to have the main form using Pinyin romanization, with no tone marks, as used at the Library of Congress, since their Pinyin conversion (of which day 1 was Oct. 1, 2000). Then I'd like to have the names in Chinese characters as a secondary form, if possible. I'll also include other romanized forms if they seem wide-spread enough.

However, I've about reached the limit of what I can do. (and there are no guarantees that I have not made any blatant errors, although I've tried to be careful). So this is a request for anyone more familiar with the language who might be able to help check what I have done and do the same for the remaining Chinese authors...

Andrew

From hyphen at hyphenologist.co.uk Mon Jan 16 09:13:12 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Mon Jan 16 09:20:11 2006
Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To: <43C963C5.4080400@xs4all.nl> References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> Message-ID: I am working on Yorkshire dialect poems and text, by John Hartley etext No 17472 and have previously done some of F W Moorman's 3232, 2888 work. There never was and never will be grammar or dictionaries for Yorkshire dialect, and there were *many* variations extant in late 19th/early 20th centuries. I was brought up in the West Riding and am doing another book about North Riding dialect, only 100km away and find it difficult to understand. Conventionally there are three variations for the three Yorkshire Ridings extant at the present day. My mother a teacher in the 1920s could detect several variations in a single *town*. Think about English before Dr Johnson, or American before Noah Webster. I am told by the whitewashers that it is *essential* that all text for PG pass guiguts. Because this assumes that the language scanned is American it gives 90% plus false positive errors, on my books, which is totally unsatisfactory for any piece of test software. Is there a language free version of Guiguts? -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/ From hiddengreen at gmail.com Mon Jan 16 10:45:25 2006 From: hiddengreen at gmail.com (Cori) Date: Mon Jan 16 10:53:10 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> Message-ID: <910fee4a0601161045t71d44beci1f5476bbc079bc3a@mail.gmail.com> Hullo Dave, and all. > Is there a language free version of Guiguts? 
I'm guessing you mean language-free version of Gutcheck, since Guiguts (one of the custom-written eText processors used at Distributed Proofreaders) is essentially language-free (its interface is in English, but it copes with all sorts of odd characters in other languages.) > I am told by the whitewashers that it is *essential* that all text for PG > pass guiguts. Because this assumes that the language scanned is American > it gives 90% plus false positive errors, on my books, which is totally > unsatisfactory for any piece of test software. This is just my thought, so I expect a WWer will reply shortly and far more authoritatively. But Gutcheck flags are warnings, not necessarily errors. It *is* necessary to *check* all of them, but unnecessary to *fix* all of them. For example, in a "quoted sentence ending in a footnote marker,"[1] ... Gutcheck will grouse about unspaced quotes, whereas obviously this is quite fine. However, in other places in the text, the"spacing of quotes might well have gone astray," and that would be a fixable error. Some warnings, such as for non-ASCII characters, may be rather redundant in a Latin-1 or UTF-8 file. I use Guiguts to check the Character Counts present (to make sure there aren't any unexpected characters) and then turn off this warning for Gutcheck with a clear conscience. As long as the check is done at an appropriate point in processing, **fully**, the Gutcheck warnings are duplicating what you already know about the file. Hope this helps - it's non-official, but informed through many, many cheery hours of Gutchecking :) Cori From jtinsley at pobox.com Mon Jan 16 12:54:37 2006 From: jtinsley at pobox.com (Jim Tinsley) Date: Mon Jan 16 12:55:12 2006 Subject: [gutvol-d] Language free version of guiguts? 
In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> Message-ID: <20060116205437.GA3423@panix.com> On Mon, 16 Jan 2006 17:13:12 +0000, Dave Fawthrop wrote: >I am told by the whitewashers that it is *essential* that all text for PG >pass guiguts. Because this assumes that the language scanned is American >it gives 90% plus false positive errors, on my books, which is totally >unsatisfactory for any piece of test software. > >Is there a language free version of Guiguts? I'm not quite sure which question you're asking, and about which checking tool, but I think there is some confusion somewhere, of emphasis if not of fact, and I'm continually surprised by people who don't know the origins of really quite recent procedures I remember vividly, and I've had several threads recently about this general subject of checking, so please bear with me while I regurgitate history. I hope you'll find a satisfactory answer in here somewhere. Anybody can use any programs they like to make texts, and different people do use different tools, according to their own needs or the needs of the individual texts. Considering that we get French and German and Esperanto and Chinese texts, not to mention older English, there is no one-size-fits-all solution for language. Once, there were no checking tools at all, except for spellcheckers built into Word Perfect and Word, which is what most people used, and I could tell you some stories about having to convert those! David Price and Martin Ward and I made checkers that we used for ourselves. There may have been others, but those are the ones I'm aware of. Everything else was Mark One Eyeball. I had done a lot of cleaning-up work on a lot of texts for various people, and I would then send those on to Michael for posting. They would commonly take hours of work each. In self-defense, I wrote a checker I (later) renamed to gutcheck. 
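(To give a flavor of the kind of thing such a checker looks for -- this is a toy sketch of just one check, the unspaced quote Cori mentioned earlier, and not gutcheck's actual code:)

```python
import re

def find_unspaced_quotes(line):
    """Return 1-based columns where a double quote is jammed between word characters."""
    return [m.start() + 2 for m in re.finditer(r'\w"\w', line)]

print(find_unspaced_quotes('the"spacing of quotes might well have gone astray,"'))  # [4]
print(find_unspaced_quotes('"quoted sentence ending in a footnote marker,"[1]'))    # []
```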
When the WWs were formed in 2001, I brought gutcheck with me, and we all used it to find errors quickly in incoming texts. It was still standard, at that time, for gutcheck to find anything up to 50 or 100 errors in a typical incoming text. Checking and fixing could still take hours, and often involve long threads with the submitter. Up till then, there was really no difference between DP and Other texts, though because the people who mostly submitted from DP were experienced, and because DP favored simple texts (! yes, it's true), they were easier than the usual. When DP hit Slashdot, in late 2002, I was still posting the majority of texts, and both the quantity and quality of texts coming from DP went nuts. And so did I. To put it mildly, I got mediaeval on peoples' asses about the quality of incoming texts. I still wince when I remember some of the things I said then. But the point is that the few WWs couldn't possibly handle the amount of work now being spewed at us. What happened next was a kind of arms race between submitters and WWs. Submitters didn't want to have their texts bounced, or go through a long re-checking thread, so they adopted the checking tools we used to ensure that we wouldn't easily find errors. (Which, in a way was kind of a bad thing. It used to be that I knew that gutcheck would find about _half_ of the errors in an incoming text, but if the submitter had used gutcheck, I would find none, but would have no idea how many more I had to look for. I used to have lots of fun when I found a new check to add but hadn't released the new version yet. Heh. Anyway...) The most significant feature of DP, I often think, is that because of the need for multiple people to work on the same text, new information and methods propagate and are assimilated much faster there than elsewhere. In March 2003, Charlz set up the PPV system to meet the new pressures. 
New producers/PPs would have their file checked by more experienced people, who have come to do, at least for DP, most of the work that the WWs did pre-Slashdot. I burned out, and had to go away on an extended business trip anyhow. David Widger started actively WWing other peoples' submissions, and between the new PPV system and David, things became stable again, but at a higher volume than before. A couple of months later, Steve Schulze (thundergnat) responded to the need for people who couldn't easily work with command-line tools to use gutcheck, and wrote GuiGuts, which uses gutcheck to create a list of things to check, and does a whole lot of other things as well, in a GUI. It has become the standard "Swiss Army Knife" for preparing texts in DP. I will be forever grateful to him for saving me from having to write a cross-platform GUI for gutcheck! :-) And GuiGuts and gutcheck have accreted features ever since. If you have GuiGuts, then you have gutcheck, since Steve bundles it with GuiGuts -- and you also have a large number of other tools that may or may not be useful for the particular text you're working on. There are many other checkers available as well, and I'd love to ramble on about them, but this is too long already, and it doesn't bear on your question. This is how it comes -- by evolution, not by fiat -- that incoming texts are checked with _several_ tools, according to what seems appropriate for the text, but most commonly with gutcheck and/or GuiGuts. Of course, we don't catch all the errors, but we mostly don't have to spend hours on each one anymore either. With texts from DP, we know that usually two people have gone through more-or-less the same list of checks that we do, so mostly we don't find much that needs querying. But still we give each one a once-over. Now, _which_ tools are going to get used by a WW will depend on the person and the text. "Text-checking" (scannos, letter-combinations, etc.) 
in gutcheck is pretty useless outside "normal" modern English prose, because of the false positives. You can switch it off by using the -t switch from the command line. Or, running through GuiGuts, in Fixup/Gutcheck options, just tick the -t option to disable. But there are also other checks like scannos and regexes in GuiGuts that may give a lot of false positives when run against a text heavy in dialect. So when you say "pass GuiGuts", I don't know exactly what you mean. The things that GuiGuts and gutcheck (and the various other checkers) note are _queries_, not pass/fail items. If the author wrote "beear", then that's what he wrote. Some functions (but I couldn't offhand give you a list of which) in GuiGuts may query it, and so might gutcheck, or GutAxe, or gutspell, or check-punct, or whatever. In fact, I'm surprised you got a comment about it at all, unless there were real errors in the text that could have been caught by the commonest of checks used today. Getting into discussion threads with submitters is a HUGE burner of time that, for the most part, the WWs don't have, so we don't start one except when we must. It's still a bit of an arms race between the producers and the checkers, whether those are WWs or PPVs. It doesn't matter whether you use one tool or another, so long as the result is at least good enough that whoever checks your file won't find any problems. I had a thread with a submitter recently in which I bounced a text, saying that I had spent 18 minutes to find the first error, and the submitter asked what I do and I said something like "Well, I run the standard checks, and I look at those and call up any extra checks I think might apply and I actually _read_ paragraphs from the text for about half an hour, and if I can't find any problems in that time, I consider it goes clean," and he said "OK, then next time, I just have to hold you off for 12 more minutes! :-)" The thing about this particular arms race is that it is beneficial. 
Because the producers are always trying to get it past the checkers clean, and the checkers are always trying to catch something wrong in the incoming texts, the overall quality level goes relentlessly up. If every checker could spend hours and hours on every text, it would go up more, but as many people on this list know, checking is hard and tiresome work, and people who are willing and experienced and good at it are always in demand, and there are always more texts coming in -- which is a GOOD thing! -- so we have to accept that there is only so much we can do in any given case. jim (Now tell me that all you wanted was the -t switch. :-)

From Bowerbird at aol.com Mon Jan 16 13:16:47 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jan 16 13:17:22 2006 Subject: [gutvol-d] Language free version of guiguts? Message-ID: <254.4cc4a1d.30fd673f@aol.com> seems like you should welcome new tools... anyway, when are scans going to go online? until the general public can compare e-texts to the page-scans, you simply aren't using the best "checker" at your disposal -- all of their eyeballs. "debugging is parallelizable" (a.k.a. distributable), but hey, _only_ if you actually _set_it_up_ that way... of course, if you _want_ to do all the checking yourself... -bowerbird

From darrenburnhill at hotmail.com Tue Jan 17 04:15:27 2006 From: darrenburnhill at hotmail.com (Darren Burnhill) Date: Tue Jan 17 04:16:13 2006 Subject: [gutvol-d] Yorkshire Dialect In-Reply-To: <20060116200003.40CF68C28F@pglaf.org> Message-ID: Hi, > There never was and never will be grammar or dictionaries for Yorkshire > dialect, Forgive me for pointing you toward things you already know. There are a few others (A Grammar of the Dialect of Windhill by Joseph Wright, etc.), but Folklore and Customs of the North Riding of Yorkshire by Richard Blakeborough is probably right up your street as it contains a substantial glossary. I do have a (reprint) copy that I will get round to OCRing, but I'm working my way through the 'Old Yorkshire' series first. Meanwhile there are a few copies in the Bradford system; http://www.biskit.yorks.com/ This may also be of interest: http://www.yorksj.ac.uk/dialect/ IME the dialectal variants still exist to this day. From sunny Shipley ;)

From hyphen at hyphenologist.co.uk Tue Jan 17 04:53:44 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Jan 17 04:54:40 2006 Subject: [gutvol-d] Yorkshire Dialect In-Reply-To: References: <20060116200003.40CF68C28F@pglaf.org> Message-ID: On Tue, 17 Jan 2006 12:15:27 +0000, "Darren Burnhill" wrote: |Hi, | |> There never was and never will be grammar or dictionaries for Yorkshire |> dialect, | |Forgive me for pointing you toward things you already know; | |There are a few others (A grammar of the dialect of Windhill by Joseph |Wright, etc.), but Folklore and Customs of the North Riding of Yorkshire by |Richard Blakeborough is probably right up your street as it contains a |substantial glossary. As I explained, but you snipped, there are massive differences between the dialects of the three Ridings. :-( not to mention the changes with time.
I would not consider any of the existing glossaries definitive. I generally use Kellett's The Yorkshire Dictionary. |I do have a (reprint) copy that I will get round to |OCRing, but I'm working my way through the 'Old Yorkshire' series first. |Meanwhile there are a few copies in the Bradford system; |http://www.biskit.yorks.com/ Great, let me know what else you are considering doing, so we do not start the same things. I have almost finished Ben Preston's "Dialect and other poems". I have partially done Yorksher Puddin' by John Hartley, and Yorkshire Folk Talk by Morris, harvested from http://www.genuki.org.uk/big/eng/YKS/Misc/Books/FolkTalk/ I hope eventually to do all John Hartley's work, but I doubt I will ever finish it. |From sunny Shipley ;) From windy Shelf ;-) -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/

From hyphen at hyphenologist.co.uk Tue Jan 17 06:13:09 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Jan 17 06:14:23 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <254.4cc4a1d.30fd673f@aol.com> References: <254.4cc4a1d.30fd673f@aol.com> Message-ID: On Mon, 16 Jan 2006 16:16:47 EST, Bowerbird@aol.com wrote: | |seems like you should welcome new tools... If they work on my projects |anyway, when are scans going to go online? When I get tools which work half way OK, and can get the submission process sorted to my satisfaction. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/

From Bowerbird at aol.com Tue Jan 17 09:57:58 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Jan 17 09:58:08 2006 Subject: [gutvol-d] Language free version of guiguts? Message-ID: <9d.6f74a718.30fe8a26@aol.com> dave said: > If they work on my projects ...
> When I get tools which work half way OK, and can > get the submission process sorted to my satisfaction. actually, dave, my post was directed at project gutenberg. :+) but i'm glad you're thinking along similar lines... -bowerbird

From hyphen at hyphenologist.co.uk Wed Jan 18 18:55:17 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Wed Jan 18 18:55:32 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: References: <254.4cc4a1d.30fd673f@aol.com> Message-ID: On Tue, 17 Jan 2006 14:13:09 +0000, Dave Fawthrop wrote: | |Finally guiguts is as it stands unusable on my texts. No doubt I will find |other equally drastic problems

I have now played with gutcheck a bit more and worked out which of the specifically *American* tests are not true in Yorkshire dialect, or indeed in *English* (which translates to the Queen's English in American). Without the code, or indeed a working knowledge of Perl, these are based purely on the gutcheck false errors found. My computer languages are C, Fortran 77, Basic, some Pascal, and a little Cobol.

1. Gutcheck objects to words consisting of one or more consonants without a vowel. These are common in Yorkshire dialect: Th', T', etc. etc.; the apostrophe indicates missing letters.

2. Gutcheck objects to words of three vowels or more; eea and similar words occur in Yorkshire dialect.

3. Gutcheck gets its knickers completely in a twist on single quotes ('). It assumes that the single quote is speech, whereas in English (the Queen's English in American) single quotes are uncommon and double quotes indicate speech. Hartley especially uses double quotes for speech. It misinterprets apostrophes indicating missing unsounded letters as single quotes.

4.
Gutcheck also gets its knickers in a twist about double quotes and objects to " " I have not worked out why, but no doubt it will give me more sleepless nights; the reason will come to me at about 3 am :-(

5. Gutcheck also assumes that a line end is a paragraph end, which is not true in poetry, even American poetry. Speech commonly spans lines in all poetry, many lines in the poetry I work on, and may well span stanzas in Hartley's work.

No doubt I will find other systematic errors in the way Gutcheck works on my text. Sorry about the threading. Agent is sending things by email which I had meant to send to the list. :-( -- Dave Fawthrop "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". Sig (C) Copyright Public Domain

From donovan at abs.net Wed Jan 18 19:36:27 2006 From: donovan at abs.net (D Garcia) Date: Wed Jan 18 20:14:28 2006 Subject: [dp-pg] Re: [gutvol-d] Language free version of guiguts? In-Reply-To: References: <254.4cc4a1d.30fd673f@aol.com> Message-ID: <200601182236.27702.donovan@abs.net> On Wednesday 18 January 2006 09:55 pm, Dave Fawthrop wrote: > On Tue, 17 Jan 2006 14:13:09 +0000, Dave Fawthrop > > wrote: > |Finally guiguts is as it stands unusable on my texts. No doubt I will find > |other equally drastic problems > > I have now played with gutcheck a bit more and worked out what some of the > specifically *American* tests are which are not true in Yorkshire dialect > are, or indeed *English* which translates to Queens English in American. > Without the code or indeed a working knowledge of Perl, these are based > purely on the gutcheck false errors found. My computer languages are C > Fortran77, Basic, some Pascal, and a little Cobol. Dave, perhaps you're labouring under a misapprehension here. Your comments seem to indicate some confusion. Jim Tinsley's gutcheck is pretty much 100% plain vanilla C code.
Steve (thundergnat's) guiguts is 100% perl, with the ability to act as a front end interface to external programs such as gutcheck. Source code to both is readily available and included in the downloads last I checked. I'm not sure what you expect either of those developers to do about your situation. It seems to me that you are trying to use a hammer when what you really need is a 4mm Torx driver. Obviously, both programs were developed with the intent of checking the most common texts submitted to PG--English. Others with specialised needs (such as decrufting poorly OCR'ed Fraktur, or old long-ess texts) have developed their own specialised tools for those specific purposes. They had the subject matter expertise and the technical skills to implement these. Quick Google searches will reveal other similar tools directed at their niches. I completely support the preservation of strongly localized texts such as those you are working with. Have you considered applying your skills in C and Yorkshire to create a customised version of gutcheck for your needs? From hyphen at hyphenologist.co.uk Thu Jan 19 01:32:28 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Jan 19 01:32:42 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <20060119011933.GA16170@panix.com> References: <20060119011933.GA16170@panix.com> Message-ID: <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> On Wed, 18 Jan 2006 20:19:33 -0500, Jim Tinsley wrote: |On Wed, 18 Jan 2006 11:44:46 +0000, Dave Fawthrop wrote: | |>On Mon, 16 Jan 2006 15:54:37 -0500, Jim Tinsley |>wrote: |> |>|On Mon, 16 Jan 2006 17:13:12 +0000, Dave Fawthrop wrote: |>| |>| |>|>I am told by the whitewashers that it is *essential* that all text for PG |>|>pass guiguts. Because this assumes that the language scanned is American |>|>it gives 90% plus false positive errors, on my books, which is totally |>|>unsatisfactory for any piece of test software. |>|> |>|>Is there a language free version of Guiguts? 
|>| |>|I'm not quite sure which question you're asking, and about which |>|checking tool, but I think there is some confusion somewhere, of |>|emphasis if not of fact, and I'm continually surprised by people who |>|don't know the origins of really quite recent procedures I remember |>|vividly, and I've had several threads recently about this general |>|subject of checking, so please bear with me while I regurgitate |>|history. I hope you'll find a satisfactory answer in here somewhere. |> |>I could only find one tool which shows on my Win XP computer that is |>guiguts. This as far as I can ascertain has various subroutines which |>are very badly tied together, and in no way at all follow the Windoze |>interface. |> |>|Anybody can use any programs they like to make texts, and different |>|people do use different tools, according to their own needs or the |>|needs of the individual texts. Considering that we get French and |>|German and Esperanto and Chinese texts, not to mention older English, |>|there is no one-size-fits-all solution for language. |> |>To get things past whitewashers one apparently must use this, or things get |>rejected. Your assertion is therefore clearly theoretically correct, but |>in reality absolutely wrong | |Now, this I can flatly deny. There is no such thing as a standard without a test to show that the standard has been met. Schools teach to the exam, which some think wrong, but it happened to me and judging by the media brouhaha still happens in the UK. I was an engineer, and there a draughtsman who failed to put a test (tolerance) on anything in a drawing was exposed to public ridicule. If such a drawing got onto the shop floor the production departments would deliberately fail to follow any reasonable tolerance. | I can think of half-a-dozen people |offhand, regular producers, who don't use gutcheck in any form. |They don't need to. Their quality standards are high enough that |it won't find any real errors.
I do run it on their texts, as a |matter of form, but I know in advance what the result will be. |For all I know, there are others, equally good, but I don't know |that they don't use gutcheck because the subject never comes up. |Most of us, of course, are not that good. You have just admitted that gutcheck is the standard on PG. | |Bill Flis, who wrote the GutWrench package, uses his own checkers |exclusively, and I know equally well that I won't find any errors |that can sanely be caught by automation in his texts either. You can |find them, if you're interested, at http://www.pgdp.net/tools/GW.zip | |>|Once, there were no checking tools at all, except for spellcheckers |>|built into Word Perfect and Word, which is what most people used, and |>|I could tell you some stories about having to convert those! |>| |>|David Price and Martin Ward and I made checkers that we used for |>|ourselves. There may have been others, but those are the ones I'm |>|aware of. Everything else was Mark One Eyeball. |>| |>|I had done a lot of cleaning-up work on a lot of texts for various |>|people, and I would then send those on to Michael for posting. They |>|would commonly take hours of work each. In self-defense, I wrote a |>|checker I (later) renamed to gutcheck. When the WWs were formed in |>|2001, I brought gutcheck with me, and we all used it to find errors |>|quickly in incoming texts. |> |>But gutcheck gives 90% plus false positive errors, many hundreds on my |>texts in Yorkshire Dialect, mostly poems. It enforces the American |>language, and American punctuation conventions. It objects to most |>Yorkshire abbreviated words such as t' which occur dozens of times |>in the poems I work on. It also objects to non standard punctuation |>which occur in my texts as an example "? whereas American convention |>apparently is ?" . |> |>Writing as one who has designed, written and sold language software for |>some 20 years (see my web site). 
The *first* stage in the design of any |>software involving language is how other languages will be treated. This |>is usually done by putting all the features of one language, in a specific |>data structure(s) and/or subroutine(s) which can be used or not as |>required. |> |>All I asked for was a copy of gutcheck with the features specific to |>American removed which should be a very short editing and recompiling job. | |I'm not sure how you define "American", but ALL gutcheck features are |language-specific, one way or another. You really appreciate this when |checking Hebrew or Tagalog! Even the relatively familiar French, |German and Spanish have various punctuation features quite |incompatible with gutcheck's assumptions. I'm talking with various |LOTE producers about language-specific versions, but have not yet |decided to take any action. Then gutcheck should be modified to have versions for many languages. If you read the Subject of this thread, you will find: "Language free version of guiguts?" | |>Worse the only way to view output is on a screen. Copy does not work so it |>is impossible to copy the output to a text file and edit the repeated false |>positives out of the list. It is totally unacceptable to distribute a GUI |>program where the standard Copy and Paste functions do not work |> |>Worse still and absolutely ***unforgivable*** in any GUI program the |>settings places the settings file on ***THE DESKTOP***. Deleting it loses |>all settings. |> | |I can't comment on GuiGuts. As a command-line guy, I don't use it all |that much, except sometimes, when I find some specific feature |invaluable. If you want to comment, the appropriate place is in the |GuiGuts thread of the Tool Development forum at DP, which Steve reads |and answers questions and requests in. |http://www.pgdp.net/phpBB2/viewforum.php?f=13 I have asked the question here. I do not do forums. 
|>|Up till then, there was really no difference between DP and Other |>|texts, though because the people who mostly submitted from DP were |>|experienced, and because DP favored simple texts |> |>DP is by its nature not suitable for my texts, because the language is as |>different from American as say French. A non-Tyke (Yorkshireman), as has |>been shown in the past, has extreme difficulty understanding the text. | |Well, considering that they regularly do several languages, I doubt if |Yorkshire dialect would stand out much. Right now, in round 1, I find: |English, German (math, with LaTeX), Finnish, French with Scots, Middle |English, Middle French, Portuguese, English with Ancient Greek, |Spanish, Italian, Dutch, German, English with Breton, French, Tagalog, |Latin, and I just know there's some Esperanto around somewhere. I know |they've also done Irish (sean-litriú?), because I had a hell of a time |finding all the correct characters for the UTF-8 version (and I'm |still not convinced about Tironian-et). Of course, if you want real |variety, you need to hit the European DP. | |>|And GuiGuts and gutcheck have accreted features ever since. If you |>|have GuiGuts, then you have gutcheck, since Steve bundles it with |>|GuiGuts -- and you also have a large number of other tools that may |>|or may not be useful for the particular text you're working on. |>| |>|There are many other checkers available as well, and I'd love to |>|ramble on about them, but this is too long already, and it doesn't |>|bear on your question. |> |>|This is how it comes -- by evolution, not by fiat -- |> |>Untrue! |>I am *forced* to use guiguts/gutcheck by the Whitewashers. | |I say again: not everyone does. Just eradicate all mistakes and nobody |will ever know what you used. | |>Gutcheck does not work on Windoze. | |It runs in a Win32 command prompt, but it doesn't have a GUI on any |platform.
"You have to be joking MAN" | |>| that incoming |>|texts are checked with _several_ tools, according to what seems |>|appropriate for the text, but most commonly with gutcheck and/or |>|GuiGuts. |> |> |>Finally guiguts is as it stands unusable on my texts. No doubt I will find |>other equally drastic problems |> |>As all my work goes on my own web site, and gets copied from there onto |>many other sites, PG is just a nice add on and could be ditched if it were |>to take too much effort. |> |>The text which WW objected to so strongly has been on my site for a couple |>of years, and absolutely *nobody* has noticed the ?errors? People read it |>for the dialect, not the punctuation. I have however had several |>appreciative emails. | |Well, I'm very familiar with that condition, but that's a whole |'nother argument. A text does not have to be perfect to be valuable. |We have many older texts, especially, that have many errors. That |doesn't make them useless. I handle most of the errata reports for PG, |and nearly all of then express appreciation for the availability of |the text, along with their handful of reported errors. I may find |another hundred or so problems when I check the text out, but these |readers never noticed them. Two million downloads a month, with (I |estimate) about one million errors among 17,000 books, and we get |about one errata report per day. | |And there are many people who do want to make etexts but don't want |to live within the constraints of PG -- some don't want the |quality-checking, some complain that we don't quality-check enough, |some don't want to work in plain text, some don't want to go through |the clearance procedures, and so on. | |We have 40 to 60 submitted texts in the average week, and three WWs |active to take them at the moment. If everything in an incoming text |is perfect, one of us will spend about an hour on it. Plus a load of |time on other activities. 
We can't accommodate everyone on everything, |and there is no doubt that the quality gets higher as time goes on, |because of the processing that we do. This is what we have to do, |to keep the operation moving and the quality high. Not everyone is |going to be happy with the process. Some will choose not to send their |texts to PG. I'm sorry about that. | |>|(Now tell me that all you wanted was the -t switch. :-) |> |>I am not going back to the bad old Unix days, when each program had to be |>learned individually. Come back Bill Gates. All is forgiven. | |Well, I say again, if you don't want to use it, you don't have to; |not everyone does, and especially not everyone does for all texts. |It's essentially a collection of regexes, selected to give, on |average, the best results for the most common type of PG files. |Many DPers who work on other types of texts just put together their own set |of regexes, and run them through GuiGuts or GutWrench or from a *nix |command line, whichever they prefer.

I do not do windoze programming. You are essentially saying that a non-programmer cannot work for PG. :-( Did you really mean that? You have agreed with me above that gutcheck is the standard which must be passed to get texts posted. I am just trying to find a version of that standard which will run on my machine, with the text. As I understand it, the answer to a perfectly reasonable request (see Subject) from PG was: ****************** ***GET STUFFED.*** ****************** I will look for a workaround. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/

From traverso at dm.unipi.it Thu Jan 19 02:21:45 2006 From: traverso at dm.unipi.it (Carlo Traverso) Date: Thu Jan 19 02:08:27 2006 Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To: <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> (message from Dave Fawthrop on Thu, 19 Jan 2006 09:32:28 +0000) References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> Message-ID: <200601191021.k0JALjx02910@pico.dm.unipi.it>

Some time ago I did some work to build a multilingual form of gutcheck (and I still think that it is a very reasonable aim), but I stopped when Jim refused the very idea that this should be done. I am still using my (now obsolete) version of gutcheck with the French customization. My idea was that some constants (for example, the list of vowels and the list of strings suspicious inside a word), instead of being hard-wired in the code, should be defined in header files included at compile time. If you want, I can try to update my version, and discuss extensions to other languages. Carlo Traverso

From sly at victoria.tc.ca Thu Jan 19 02:58:16 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Jan 19 02:58:23 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> Message-ID:

First, some general musing here... It's interesting to see how as the number of people involved with PG in one way or another keeps growing, so does a general misunderstanding about the nature of the project. Somehow, the impression of some people is that it all runs like clockwork, and all little ambiguities are swiftly and efficiently dealt with. I can understand that when someone sees the sheer amount of what has been accomplished so far, it can be easy to assume that "of course, _this_ has been done--it wouldn't make sense otherwise." Realistically, the processes that are in place grew up over time, with volunteers doing their best to deal with the demands of the moment and string something together that would work.
And it is not static either; it keeps changing. I'm not kidding when I say that the few people who do the majority of the back-end stuff that keeps PG growing have a backlog of years of PG-related tasks to tackle. So, on the specific topic at hand... On Thu, 19 Jan 2006, Dave Fawthrop wrote: > You have just admitted that gutcheck is the standard on PG. Yes, it is used a lot. Often it's a very useful tool (sometimes even for non-English texts.) However, I would not call it a standard in the sense of being a "test" that a given text has to "pass" (such as a test for valid markup on an HTML file). Rather, it is a tool which just about every text being added to the collection is run through, as a way of 1) assessing the over-all level of the text, and 2) guarding against last-minute gremlins that do unexpected things to a text (and yes, interesting things do happen sometimes.) I have submitted some German and French texts to PG which I have reformatted from other sources, and, as expected, a run through gutcheck resulted in many places being questioned that were just fine in the given languages. So, if I thought it needed, I just added a note when submitting the texts that "gutcheck flags a lot of false positives on this one." It looks like the source for gutcheck is available at http://gutcheck.sourceforge.net/ if you are interested in modifying it for your own uses. (If you are just dealing with one or two texts, it might not be worth the bother, but if you foresee working through lots of Yorkshire text, it could be more worthwhile.) ...... So, will the conditions I discussed above change? Well, PG is certainly more organized in some ways than it used to be, and I could see it going further in that direction. However, I don't realistically see it ceasing to be run by volunteers, which does set some of the tone. I'm not pretending that I think PG is perfect here.
Like anyone else who is involved, I have my own issues (one of my pet peeves is if the stated character encoding in the header does not match what is actually in the text), but I know they will not likely be dealt with unless I go ahead and try to work on them. I've found a good approach is building consensus with others. From the cataloging point of view, I've regularly had help from native speakers of various languages (Finnish and Tagalog spring to mind) which has helped me to make bibliographic data more precise than I ever could have managed on my own. As well, I've occasionally sent queries to the reference desks of libraries in many corners of the world. If I can get it organized, I'm hoping to make a sub-project where I can target a few Wikipedia users who have indicated they have fluency in both English and Chinese, and give them a way to help improve the consistency of the author data for our Chinese texts. I'd better stop now, before I meander off-topic too much... But I hope this has helped somewhat. And thanks for caring about Project Gutenberg. :) Andrew

From blondeel at clipper.ens.fr Thu Jan 19 02:25:47 2006 From: blondeel at clipper.ens.fr (Sebastien Blondeel) Date: Thu Jan 19 02:58:26 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <200601191021.k0JALjx02910@pico.dm.unipi.it> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> Message-ID: <20060119102547.GA10893@clipper.ens.fr>

I have developed programs to help me proof faster/better. I work mainly in French but they seem to work well in other Latin-alphabet languages (I tried them a little in English, Spanish).
http://www.pgdp.net/phpBB2/viewtopic.php?p=158673#158673 (get in touch with me if you want to give them a try; the CVS-committed version is not the very latest one) I use them to do R1/R2, P1/P2, and, recently, P0, that is to say quick preparation of OCR'd texts before publication on PGDP Int'l. I define language-related things (constants, suffixes, prefixes). Right now, as I am apparently the only user and developer of these programs, there are many special cases for French. But it could be easy to add things for other languages. As an example, a French rule is: the word is accepted if it starts with "j'" and continues with a vowel and the rest is an accepted word. For example: "j'aime" (I love) is accepted because "aime" (love) is. "j'arbre" (I tree) is accepted because "arbre" (tree) is. This means nothing, of course, but a proofer is bound to spot that: it is not a scanno (and not likely to happen in OCR anyway). Kicking some grammatical checks in would be the next step. Right now the programs are just working on a syntactical basis. I have a list of French words with all their possible grammatical natures (noun / adjective / conjugated verb for this tense and this person...) but unfortunately it was published by ABU under a restrictive license which makes it difficult for me to repackage and reuse. The free list of words I found in Debian packages is very incomplete (it is missing many passé simple conjugated verbs, and most if not all imparfait du subjonctif forms...) In English we could for example decide "X's" is accepted if "X" is (and "X" does not finish with an "s"). I am planning to think and develop or reuse things to do PM later on, probably focusing more or less on producing XML TEI. From joshua at hutchinson.net Thu Jan 19 05:32:36 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Jan 19 05:32:38 2006 Subject: [gutvol-d] Language free version of guiguts?
Message-ID: <20060119133236.3E2F19E851@ws6-2.us4.outblaze.com> > > I do not do windoze programming. > You are essentially saying that a non programmer can work for PG. :-( > Did you really mean that? > > You have agreed with me above that gutcheck is the standard which must be > passed to get. I am just trying to find a version of that standard which > will run on my machine, with the text > Well, I'll try to be nice even though you are being very confrontational... No one but you ever said any of the above. 1 - GutCheck is *not* required. Jim, who is probably the end-all on the subject since he wrote the thing, flat out said it is not required. 2 - GuiGuts and GutCheck are *not* the same thing. GuiGuts is a text editor written in Perl. GutCheck is a text checker written in C. You can run GutCheck from GuiGuts (among many, many other things). 3 - Asking for GuiGuts support here is a waste of time. The developer of GuiGuts isn't here; he is on the DP forums. Which you've flat out refused to go to. Fine, just don't expect help for a Dell laptop when you call HP tech support, either. 4 - No one said you had to work/create tools for PG. But if you want a tool for something that doesn't currently exist, you either create it yourself or do without (or wait until someone else needs it and decides to do the work you don't want to do). Personally, I've done all three over the years. 5 - *AND MOST IMPORTANT!* GutCheck is not a test that must be passed. It is better thought of as a checker that will flag things that are wrong more often than right. You should run it, because it *will* help you find mistakes that exist in your text. You, as an intelligent, thinking human being, must check each item and verify whether it is correct. Sometimes it is right. Sometimes it is wrong. If the system was 100% infallible, we wouldn't need humans anywhere in the process.
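The elision rule Sebastien describes earlier in this thread can be sketched roughly as follows. This is only an illustrative sketch: the tiny lexicon and the `accepts` function name are invented here, and a real checker would load a full French wordlist.

```python
# Sketch of the rule: a token such as "j'aime" is accepted when it starts
# with "j'", continues with a vowel, and the remainder is itself an
# accepted word. The lexicon below is a stand-in for a real wordlist.

VOWELS = set("aeiouyàâéèêëîïôùû")

def accepts(word, lexicon):
    """Accept a word directly, or as a j'-elision of an accepted word."""
    word = word.lower()
    if word in lexicon:
        return True
    if word.startswith("j'"):
        rest = word[2:]
        return bool(rest) and rest[0] in VOWELS and rest in lexicon
    return False

lexicon = {"aime", "arbre", "porte"}
print(accepts("j'aime", lexicon))   # True
print(accepts("j'arbre", lexicon))  # True -- nonsense, but a proofer spots that
print(accepts("j'porte", lexicon))  # False -- "p" is not a vowel
```

The same shape would cover the English "'s" idea mentioned in the thread: strip the suffix and look up what remains.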
Josh From Gutenberg9443 at aol.com Fri Jan 20 15:31:58 2006 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Jan 20 15:32:04 2006 Subject: [gutvol-d] Ebook reading devices Message-ID: <1c8.39607e27.3102ccee@aol.com> In a message dated 1/12/2006 12:15:48 P.M. Mountain Standard Time, robsuth@robsuth.plus.com writes: enquiries as far as I could, but found no product among the DVD portable readers that would deal with text, nor was any of the manufacturers interested in producing one. Apart from the French reader Cybook (which I think is too expensive for the task and market although otherwise for the most part ideal) there seems still to be no special ebook reading device available in UK or the EU generally: laptops are still too big and heavy, and PDAs still have far too small a screen My enquiries confirmed my strong impression that there are protectionist interests holding this back, presumably in the interests of proprietary issues of ebooks. I suggest you go to eBookWise and FictionWise. Although they do not deal with DVD portable readers, they are seeking input into what people need and want, and would be very glad to hear from you. My husband, two of my daughters, several of my friends, and I all use the eBookWise reader which you can find for sale for about $100 at eBookWise.com. By using extra memory cards, you can build immense libraries, or you can use only the memory built into the device and build an immense library on your computer, realizing that you'll have to download often. I have six memory cards and would like to have about sixty more, but they aren't free. Anne Wingate -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060120/21ebeb56/attachment.html From Gutenberg9443 at aol.com Fri Jan 20 15:40:06 2006 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Jan 20 15:40:12 2006 Subject: [gutvol-d] Ebook reading devices Message-ID: <22b.51ffa60.3102ced6@aol.com> In a message dated 1/13/2006 10:07:27 P.M. Mountain Standard Time, imaclean@gmail.com writes: It directly supports PDF, XHTML and Text (Unicode?) formats. An ebook that will support PDF is a good thing, but I wouldn't want it if it will support ONLY these formats. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060120/7132f2fb/attachment.html From bubblegirl at optusnet.com.au Fri Jan 20 15:55:25 2006 From: bubblegirl at optusnet.com.au (Season BubbleGirl - bubblegirl.net) Date: Fri Jan 20 19:09:21 2006 Subject: [gutvol-d] Ebook reading devices References: <1c8.39607e27.3102ccee@aol.com> Message-ID: <001801c61e1c$fced3740$0a01a8c0@bubblegirl> If you want a portable reading device, you can get a PDA and then use a screen magnifier. They are very good and magnify the typical screen by 5x. It's still portable, but a lot easier to read. I use one for everything. Here is my review of one: "This product is called a Screen Magnifier. Officeonthegogo.com are the only makers and retailers of this part, but I think it's worth it. At a price of $29.95US, you can view everything twice the size. How? Designers made a cradle-type structure that slots onto the back of the Pocket PC. A magnifying plate then hovers over the front of it on long arms. Users read and work through this plate, which does not actually touch the screen. It's a lightweight, easily handled product, compared to a normal magnifier you could drop and break. For Palm users, they customise them to the right size.
Because Office On The GoGo make them, they can alter them to the customers' needs. There are two types of stands the magnifying plate is put on. One being a backpack for the PDA, the other a proper stand. The backpack, as mentioned above, surrounds the PDA shell, where the stand props up the PDA for those using clip-on or infra-red keyboard. (I know you're wondering about keyboards - stay tuned for that information in an article coming soon). If you want both of these stands you can buy the "Magnifico Combo" for $39.95US. Tell Mike Sirius that Bubble Girl sent you. " PDA Patrol by Season BubbleGirl Hope this helps! Season BubbleGirl www.bubblegirl.net Where individuality truly shines! Author of A Doggy Diary and the coming autobiography, Life in a Bubble Creator of Music Mash, PDA Patrol, and other free literature at bubblegirl.net ----- Original Message ----- From: Gutenberg9443@aol.com To: gutvol-d@lists.pglaf.org Sent: Saturday, January 21, 2006 10:01 AM Subject: Re: [gutvol-d] Ebook reading devices In a message dated 1/12/2006 12:15:48 P.M. Mountain Standard Time, robsuth@robsuth.plus.com writes: enquiries as far as I could, but found no product among the DVD portable readers that would deal with text, nor was any of the manufacturers interested in producing one. Apart from the French reader Cybook (which I think is too expensive for the task and market although otherwise for the most part ideal) there seems still to be no special ebook reading device available in UK or the EU generally: laptops are still too big and heavy, and PDAs still have far too small a screen My enquiries confirmed my strong impression that there are protectionist interests holding this back, presumably in the interests of proprietary issues of ebooks. I suggest you go to eBookWise and FictionWise. Although they do not deal with DVD portable readers, they are seeking input into what people need and want, and would be very glad to hear from you. 
My husband, two of my daughters, several of my friends, and I all use the eBookWise reader which you can find for sale for about $100 at eBookWise.com. By using extra memory cards, you can build immense libraries, or you can use only the memory built into the device and build an immense library on your computer, realizing that you'll have to download often. I have six memory cards and would like to have about sixty more, but they aren't free. Anne Wingate ------------------------------------------------------------------------------ _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060121/3ab0b4f9/attachment.html From robsuth at robsuth.plus.com Fri Jan 20 17:58:48 2006 From: robsuth at robsuth.plus.com (Robert Sutherland) Date: Fri Jan 20 19:30:45 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <22b.51ffa60.3102ced6@aol.com> References: <22b.51ffa60.3102ced6@aol.com> Message-ID: <6.2.3.4.1.20060121015639.02ce3010@mail.plus.net> I entirely agree - I believe the minimum should be .txt, .rtf, .htm & .pdf, and of course that still leaves the proprietary formats if one is likely to want to use them. But at least conversion from other formats to .pdf is usually straightforward. Robert Sutherland ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ At 23:40 20-01-06, you wrote: >In a message dated 1/13/2006 10:07:27 P.M. Mountain Standard Time, >imaclean@gmail.com writes: >It directly >supports PDF, XHTML and Text (Unicode?) formats. > >An ebook that will support PDF is a good thing, but I wouldn't want >it if it will support ONLY these formats. >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > >No virus found in this incoming message. 
>Checked by AVG Free Edition. >Version: 7.1.375 / Virus Database: 267.14.21/236 - Release Date: 20-01-06 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060121/d259e112/attachment.html From robsuth at robsuth.plus.com Fri Jan 20 18:20:21 2006 From: robsuth at robsuth.plus.com (Robert Sutherland) Date: Fri Jan 20 19:30:48 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <1c8.39607e27.3102ccee@aol.com> References: <1c8.39607e27.3102ccee@aol.com> Message-ID: <6.2.3.4.1.20060121015910.02ce3740@mail.plus.net> Ah, But!! I live furth of the USA and Canada (in Scotland), and those readers are not available here (or anywhere in the EU, as far as I can make out). No one can tell me why - the providers all talk about incompatibilities, but don't explain what they are; others say the market is too small to justify it, which is plainly nonsense. However, I pin my hopes on the iLiad, which Philips (Netherlands) are likely to bring out in April, but the price is not yet announced. In the meantime I hump my ancient ThinkPad up and down to bed! The DVD player inquiry was just the tail end of an earlier search - in case one could be found that ran text - their screens are mostly big enough, they are much lighter than laptops and some are very cheap. But alas, none does text! The aggravation of the US/Canada restriction is that it is highly probable that, whatever the incompatibility is, it could be overcome by a simple adapter or by quite simple software of some kind. It is incomprehensible to me that the providers of these devices have not made some kind of modification which would open up this potential market to them - what is the population of the European Union? And India and China, even more: a huge market waiting. Robert Sutherland ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ At 23:31 20-01-06, you wrote: >In a message dated 1/12/2006 12:15:48 P.M.
Mountain Standard Time, >robsuth@robsuth.plus.com writes: >enquiries as far as I could, but found no product among the DVD >portable readers that would deal with text, nor was any of the >manufacturers interested in producing one. Apart from the French >reader Cybook (which I think is too expensive for the task and >market although otherwise for the most part ideal) there seems still >to be no special ebook reading device available in UK or the EU >generally: laptops are still too big and heavy, and PDAs still have >far too small a screen My enquiries confirmed my strong impression >that there are protectionist interests holding this back, presumably >in the interests of proprietary issues of ebooks. > >I suggest you go to eBookWise and FictionWise. Although they do not >deal with DVD portable readers, they are seeking input into what >people need and want, and would be very glad to hear from you. > >My husband, two of my daughters, several of my friends, and I all >use the eBookWise reader which you can find for sale for about $100 >at eBookWise.com. By using extra memory cards, you can build immense >libraries, or you can use only the memory built into the device and >build an immense library on your computer, realizing that you'll >have to download often. I have six memory cards and would like to >have about sixty more, but they aren't free. > >Anne Wingate >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > >No virus found in this incoming message. >Checked by AVG Free Edition. >Version: 7.1.375 / Virus Database: 267.14.21/236 - Release Date: 20-01-06 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060121/62c9c6b0/attachment-0001.html From jtinsley at pobox.com Sun Jan 22 10:28:33 2006 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Jan 22 11:42:33 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <200601191021.k0JALjx02910@pico.dm.unipi.it> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> Message-ID: <20060122182833.GA25329@panix.com> On Thu, 19 Jan 2006 11:21:45 +0100, Carlo Traverso wrote: > >I have times ago done some work to build a multilingual form of >gutcheck, (and I still think that it is a very reasonable aim) but I >stopped when Jim refused the very idea that this should be done. "Opinions about Tolstoy and his work differ, but on one point there surely might be unanimity. A writer of world-wide reputation should be at least allowed to know how to spell his own name. Why should any one insist on spelling it "Tolstoi" (with one, two or three dots over the "i"), when he himself writes it "Tolstoy"? The only reason I have ever heard suggested is, that in England and America such outlandish views are attributed to him, that an outlandish spelling is desirable to match those views." Love that quote. From Louise Maude's Translator's Preface to "Resurrection". I really must re-scan that, if only to capture the image of Tolstoy's signature -- with a "y" -- above those words. I'm not a writer of world-wide reputation, of course, but I've recently heard such outlandish views attributed to me that I'm beginning to think of signing myself "Jim Tinslei", with, possibly, three decadent dots over the "i". So, did I ever "refuse the very idea that this should be done"? The society that is one of the referents of "Project Gutenberg", as I understand it -- and I'm not at all sure that I do -- is a pretty good model of a Libertarian society. 
It's even better than a Real Life Libertarian society, since anyone can opt out; try doing that next time your local tax-collector sends you a letter. People do (a) what they want to do and (b) what they think should be done strongly enough that they're willing to spend the hours of their lives doing it. Some people also do (c) what Other People want them to do. Occasionally, or usually. PPVs and WWs spend a lot of time on projects in which they, personally, have no interest. Toolmakers try to accommodate the people who use their tools. DP admins solve problems. Gravediggers move difficult projects along. Like that. The complex society that exists today in and around PG would all fall apart if some people didn't offer themselves as something of a "public utility" in some limited sphere. So I'm used to the idea that people write to me out of the blue asking for help or advice, just as I ask other "public utilities" within the project for help and advice. But in such cases I, or they, are free to refuse, or do something different. Project Gutenberg, however you define it, doesn't sign anybody's paycheck, or make anybody do anything at all. Neither do Michael or Greg as individuals. In fact, those two worthies would walk a country mile in tight shoes to assure you -- gesticulatingly -- that they have about as much influence over what I do as Uri Geller's daily horoscope has over the shape of Reese Witherspoon's toenails. What's more, having the experience of being a public utility in PG yourself, you know this better than most, which is why, when I dug your original email on the subject in July 2002 out of the dumpster of my archives, I was a little annoyed all over again that you had copied Michael and Greg on it. 'Sfunny: I didn't remember the thread, but when I saw the e-mail, I did remember that little sting of annoyance at the assumption that either of them had anything at all to do with my decisions about gutcheck. 
Which is, of course, as nothing to the annoyance gutcheck has inflicted over the years on various producers, so I guess, karmically, I have it comin'. People who "grew up" in DP were "born" with others looking over their shoulders, and so expect their homework to be corrected unmercifully. Everybody there has fully internalized the knowledge that they, and everyone else, makes mistakes, or, as Juliet more correctly and insightfully remarked, _overlooks_ mistakes. Your own recent excellent work on quantifying that will be invaluable in several ways. Producers who had been making certain kinds of mistakes for years without being aware of it, or having anyone correct them, though, fully appreciated the pun in the name. It is no fun at all having these things pointed out to you for the first time by someone else. Dave's comments are really quite temperate compared to many of the love-notes I received back in 2000-2002. My favorite was "DON'T YOU DARE RUN THIS THING OVER MY OLD TEXTS!!" I was more than somewhat sick myself when I first exposed some of my old work to jeebies, and saw the full extent of my own heebieness, but at least in that case, nobody else saw my shame, and I had no-one to be annoyed at but myself. I mention my annoyance because on re-reading what I wrote in response to your proposal, it does jump off the screen at me, and I apologize belatedly for that. It doesn't, however, have any bearing on my decisions then or now. What you actually proposed was that you should carve up gutcheck into separate files, dealing with separate languages. If there had ever been a day when I decided to sit down and write gutcheck, that's what I might well have done from Day One, but there never was such a day. To me, it's just a handy platform into which I can plug checks that I find useful. 
As I said at the time, and so often before and since, I don't actually think that the language-specific typo-checking functions should really be in there at all; every text needs a spellcheck, and for texts that have been spellchecked, these functions are only a source of false positives. That's why I added the -t switch when I started sending it out to other people. For me, they were handy as a quick way of getting a hint whether an incoming file had been spellchecked or not. Unfortunately, some producers lulled into a false sense of security by not seeing typos flagged in gutcheck didn't do the spellcheck, which was a problem I had to address around that time, but it seems to be resolved now. That's what I said, and that's what I still believe. I do think that punctuation checks for LOTE are an appropriate add-in, but as a devout monoglot, I'm in no position to define them. I don't have the experience of finding certain error-patterns by hand in LOTE texts. People have suggested specific changes like this from time to time, and I have usually incorporated them, where they don't cause problems somewhere else. A few days ago, I asked any PPVs who want certain punctuation type checks (or removal of existing checks) for LOTE to define some for me. We'll see what comes out of that. Until I see what the requested checks are, I'm not going to decide how to make the changes. I'm certainly not going to refactor the code, or commit myself to working with somebody else's refactored code, in advance of knowing in what way it needs to be changed. Reading over old emails is weird; it brings back context. You wrote to me when I was just setting up the SF site, to get it installed before I released the FAQ and to give the Software Site a permanent link. Up until then, people had got gutcheck directly from me, and often asked for individualized versions, which I mostly made for them. If the checks seemed good by my usual tests, I added them to "my" gutcheck as well. 
That was the way it worked in that era. I looked forward, at that time, to getting the damthing OUT, so that people could do their own customizing, and I would be free. Free!! Bwahahahah!! Heh. I wished you well in your own customization, and I still do. The volume of LOTE is much greater than it was then, and maybe somebody working in that area (those areas?) will do their own thing. Great! Maybe they'll ask me to customize some specific checks. Very occasionally, people do that still. Maybe some PPVs in specific languages will get together and suggest a coherent agenda to make gutcheck (or some variant thereof) friendly to those languages. I hope they do. They haven't yet. Until that happens, I have more than enough things I want to do, and think should be done, not to spend my limited PG time chasing Other People to tell me things they want me to do . . . or, for that matter, self-indulging in writing long posts to the vandalized wasteland that was once a productive resource for people making etexts. I really have been very lazy since the Christmas break. Back to the grindstone. jim From Bowerbird at aol.com Sun Jan 22 12:21:35 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Jan 22 12:21:40 2006 Subject: [gutvol-d] Language free version of guiguts? Message-ID: anyway, it's a good thing command-line programs are back "in" these days... thus gutcheck was able to just "skip over" that messy graphical-user-interface period, thank heaven... -bowerbird p.s. and no, i am _not_ suggesting that it was anyone's _individual_ responsibility or failure. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060122/3f793c38/attachment.html From hyphen at hyphenologist.co.uk Sun Jan 22 13:16:08 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sun Jan 22 13:16:20 2006 Subject: [gutvol-d] Language free version of guiguts? 
In-Reply-To: References: Message-ID: <8ft7t19c2cileed5otbp4bvctgqk18948u@4ax.com> On Sun, 22 Jan 2006 15:21:35 EST, Bowerbird@aol.com wrote: |anyway, it's a good thing |command-line programs |are back "in" these days... Not with me. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/ From hyphen at hyphenologist.co.uk Sun Jan 22 13:26:24 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sun Jan 22 13:26:37 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <20060122182833.GA25329@panix.com> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> <20060122182833.GA25329@panix.com> Message-ID: On Sun, 22 Jan 2006 13:28:33 -0500, Jim Tinsley wrote: |What you actually proposed was that you should carve up gutcheck into |separate files, dealing with separate languages. If there had ever |been a day when I decided to sit down and write gutcheck, that's what |I might well have done from Day One, but there never was such a day. |To me, it's just a handy platform into which I can plug checks that I |find useful. IMO with the advent of huge memory in even the entry level computers, All tests should be in the one program, and the different language versions should be handled by simple switches/radio buttons, as with the various sorts of angle brackets ATM. OK the switches will inevitably become complex and difficult. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/ From jtinsley at pobox.com Sun Jan 22 13:39:44 2006 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Jan 22 13:39:46 2006 Subject: [gutvol-d] Language free version of guiguts? 
In-Reply-To: References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> <20060122182833.GA25329@panix.com> Message-ID: <20060122213944.GA21967@panix.com> On Sun, Jan 22, 2006 at 09:26:24PM +0000, Dave Fawthrop wrote: > >IMO with the advent of huge memory in even the entry level computers, All >tests should be in the one program, and the different language versions >should be handled by simple switches/radio buttons, as with the various >sorts of angle brackets ATM. OK the switches will inevitably become >complex and difficult. You're right, of course. I _think_ you might even go one better. I've used the occurrence of 50 instances of something recognizable as the English word "the" as an indicator that a file is (at least partly) in English, and a high number of certain types of characters to suggest that the file is in ISO-8859 or UTF-8, and a high number of strings within <> to indicate some flavor of *ML. I suspect that a similar technique might be useful in multilingual checkers in general, and if I wrote one I would certainly consider it. jim From hyphen at hyphenologist.co.uk Mon Jan 23 00:12:02 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Mon Jan 23 00:12:16 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <20060122213944.GA21967@panix.com> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> <20060122182833.GA25329@panix.com> <20060122213944.GA21967@panix.com> Message-ID: On Sun, 22 Jan 2006 16:39:44 -0500, Jim Tinsley wrote: |On Sun, Jan 22, 2006 at 09:26:24PM +0000, Dave Fawthrop wrote: |> |>IMO with the advent of huge memory in even the entry level computers, All |>tests should be in the one program, and the different language versions |>should be handled by simple switches/radio buttons, as with the various |>sorts of angle brackets ATM. 
OK the switches will inevitably become |>complex and difficult. | |You're right, of course. | |I _think_ you might even go one better. I've used the occurrence |of 50 instances of something recognizable as the English word "the" |as an indicator that a file is (at least partly) in English, and |a high number of certain types of characters to suggest that the |file is in ISO-8859 or UTF-8, and a high number of strings within |<> to indicate some flavor of *ML. | |I suspect that a similar technique might be useful in multilingual |checkers in general, and if I wrote one I would certainly consider it. There has been a lot of academic work on detecting language by counting frequently used short words. All languages have a different set of frequently used short words. IIRC it is not particularly accurate, and naturally falls down on text in two or more languages; I have a book in Yorkshire and English on my desk ATM. IMO, asking the user which language he/she is using would be easier and more reliable. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/ From kouhia at nic.funet.fi Wed Jan 25 11:39:46 2006 From: kouhia at nic.funet.fi (Juhana Sadeharju) Date: Wed Jan 25 12:43:58 2006 Subject: [gutvol-d] Re: Ebook reading devices Message-ID: The most important property of such a reader would be that there is no need to install any upload software on the computer. Only then could I go to a public library, browse, and download free ebooks to the reader. If the device has a USB cable, then the device should be seen by the computer as a USB/portable disk. The books saved to this USB/portable disk must then also be readable in the reader. Note, some MP3 players are seen as a USB/portable disk, but the music files saved to the disk are not playable. Some proprietary formats may have been reverse-engineered, but that is illegal in the USA and Europe.
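The short-word detection technique Jim and Dave discuss above might look something like the following sketch. The marker-word sets and the 50-hit threshold are illustrative only, not taken from gutcheck or any actual checker.

```python
# Sketch: count frequent short function words per candidate language and
# guess the language with the most hits. Below a minimum hit count, give
# up -- as Dave notes, asking the user is more reliable, and mixed-language
# texts defeat this kind of counting anyway.

import re
from collections import Counter

MARKERS = {
    "en": {"the", "and", "of", "to", "in"},
    "fr": {"le", "la", "les", "de", "et"},
    "de": {"der", "die", "das", "und", "ist"},
}

def guess_language(text, min_hits=50):
    """Return a language code, or None if no language scores enough hits."""
    words = Counter(re.findall(r"[a-zàâéèêëîïôùûäöüß']+", text.lower()))
    scores = {lang: sum(words[w] for w in markers)
              for lang, markers in MARKERS.items()}
    lang, hits = max(scores.items(), key=lambda kv: kv[1])
    return lang if hits >= min_hits else None

print(guess_language("the cat and the dog sat in the sun " * 20))  # en
print(guess_language("short sample"))  # None -- too little evidence
```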
In Europe, one is not even allowed to tell what unofficial software could be used to convert the text to the proprietary format. And reverse engineering would not help at all, because one cannot install anything in public libraries. Does eBookWise require a software installation? Sony devices are known for requiring a software installation. Note, the software-installation requirement hits digital cameras even harder. Most popular cameras require a software installation, and therefore one is not able, e.g., to transfer one's photos to safety, to home, via a public library, an internet cafe, or a hotel. Think about travelling and the possibility that your camera gets stolen or otherwise damaged, or that you run short of memory cards. Juhana -- http://music.columbia.edu/mailman/listinfo/linux-graphics-dev for developers of open source graphics software From imaclean at gmail.com Wed Jan 25 19:16:42 2006 From: imaclean at gmail.com (Ian MacLean) Date: Wed Jan 25 19:16:46 2006 Subject: [gutvol-d] Re: Ebook reading devices In-Reply-To: References: Message-ID: <3156339d0601251916x63e7b46ai884ecadac1f7c0fc@mail.gmail.com> On 1/26/06, Juhana Sadeharju wrote: > > The most important property of such a reader would be that > there is no need to install any upload software on the > computer. Only then could I go to a public library, browse, > and download free ebooks to the reader. If the device has > a USB cable, then the device should be seen by the > computer as a USB/portable disk. The books saved to this > USB/portable disk must then also be readable in the reader. > Both the Sony Reader and the iRex have an SD card slot. So you would only need to connect a card reader to the library computer to download. Whether the reader supports the format you choose to copy to the SD card is another matter.
Ian From Bowerbird at aol.com Fri Jan 27 12:22:51 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jan 27 12:23:07 2006 Subject: [gutvol-d] blah blah blog Message-ID: <90.6e2d8c3d.310bdb1b@aol.com> as promised, i'm morphing myself to my blah blah blog. don't want to talk about you behind your back, though... > http://journals.aol.com/bowerbird/bowerbirdseyeview -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060127/ae920118/attachment.html From hart at pglaf.org Sat Jan 28 10:17:08 2006 From: hart at pglaf.org (Michael Hart) Date: Sat Jan 28 10:17:12 2006 Subject: [gutvol-d] blah blah blog In-Reply-To: <90.6e2d8c3d.310bdb1b@aol.com> References: <90.6e2d8c3d.310bdb1b@aol.com> Message-ID: On Fri, 27 Jan 2006 Bowerbird@aol.com wrote: > as promised, i'm morphing myself to my blah blah blog. > don't want to talk about you behind your back, though... > >> http://journals.aol.com/bowerbird/bowerbirdseyeview > > -bowerbird > In re: to the top note and your reply: I read the PG eBooks with all sorts of plain text viewers and have no problems with inconsistencies, much less in the various browsers that have a wider range of options. Michael From Bowerbird at aol.com Sat Jan 28 12:00:23 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Jan 28 12:00:27 2006 Subject: [gutvol-d] blah blah blog Message-ID: <202.11261918.310d2757@aol.com> michael said: > I read the PG eBooks with all sorts of plain text viewers > and have no problems with inconsistencies, much less in > the various browsers that have a wider range of options. the inconsistencies are ones that a person "wouldn't notice", but which trip up any automated processing by a program... an obvious example would be that most section-headings (e.g., chapter headings) are preceded by four blank lines, but the occasional one might have three, or five, instead... 
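Bowerbird's own routines were never posted, but the blank-line heading heuristic being described is simple enough to sketch. This is a hypothetical illustration, assuming the convention that a heading is any non-blank line preceded by at least three blank lines:

```python
def find_headings(text, min_blanks=3):
    """Return (line_number, blank_run, text) for every non-blank line
    preceded by at least `min_blanks` blank lines."""
    headings = []
    blanks = 0
    for i, line in enumerate(text.splitlines()):
        if not line.strip():
            blanks += 1
            continue
        if blanks >= min_blanks:
            # record how many blank lines preceded this heading
            headings.append((i, blanks, line.strip()))
        blanks = 0
    return headings
```

When the e-text is consistent, the `blank_run` count can be mapped directly onto a heading level; when the count wobbles between three, four, and five, this simple pass silently misfiles headings, which is exactly the fragility being complained about.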
nobody would claim that, in terms of a human reader, this inconsistency is meaningful -- it's not -- but when it comes to a program analyzing the file, it might make a difference... if there's only one level of header, then 3 or 4 or 5 blank lines might be equally good at signaling that there is a new section. but if a book has three different levels of headers, as some do, then you could use 5 blank lines to indicate the major sections, 4 to indicate regular sections, and 3 to indicate the subsections. if the number of blank lines isn't consistent, the program has to become much more sophisticated (and thus prone to failure) to try and determine the _actual_ level of each header. another example involves lines which should not be rewrapped, such as the lines in a table, or the lines in a letter's address-block. if these are consistently prefaced with one or more leading spaces, then a rewrap routine is easy to _write_ and easy to _comprehend_, and a programmer can spend time on more productive pursuits that add value and functionality, not ones that just resolve inconsistencies. lots of programmers have _started_ programs for the p.g. library. the vast majority of them have given up before long, in frustration. the inconsistencies in the formatting are the main source of difficulty. someday someone will set up a shadow version of the p.g. library where all the inconsistencies are resolved, and you will see then how much value is added by the ingenuity of programmers who are able to take consistent formatting of the e-texts for granted... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060128/0c77c7d4/attachment.html

From hart at pglaf.org Mon Jan 30 13:34:09 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Jan 30 13:34:11 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: <202.11261918.310d2757@aol.com>
References: <202.11261918.310d2757@aol.com>
Message-ID: 

So, what you are telling me here is that while a human can muddle
through OK, it takes a computer to really mess things up.

On Sat, 28 Jan 2006 Bowerbird@aol.com wrote:

> michael said:
>> I read the PG eBooks with all sorts of plain text viewers
>> and have no problems with inconsistencies, much less in
>> the various browsers that have a wider range of options.
>
> the inconsistencies are ones that a person "wouldn't notice",
> but which trip up any automated processing by a program...
>
> an obvious example would be that most section-headings
> (e.g., chapter headings) are preceded by four blank lines,
> but the occasional one might have three, or five, instead...
>
> nobody would claim that, in terms of a human reader, this
> inconsistency is meaningful -- it's not -- but when it comes
> to a program analyzing the file, it might make a difference...
>
> if there's only one level of header, then 3 or 4 or 5 blank lines
> might be equally good at signaling that there is a new section.
>
> but if a book has three different levels of headers, as some do,
> then you could use 5 blank lines to indicate the major sections,
> 4 to indicate regular sections, and 3 to indicate the subsections.
>
> if the number of blank lines isn't consistent, the program has to
> become much more sophisticated (and thus prone to failure) to
> try and determine the _actual_ level of each header.
>
> another example involves lines which should not be rewrapped,
> such as the lines in a table, or the lines in a letter's address-block.
> if these are consistently prefaced with one or more leading spaces,
> then a rewrap routine is easy to _write_ and easy to _comprehend_,
> and a programmer can spend time on more productive pursuits that
> add value and functionality, not ones that just resolve inconsistencies.
>
> lots of programmers have _started_ programs for the p.g. library.
> the vast majority of them have given up before long, in frustration.
> the inconsistencies in the formatting are the main source of difficulty.
>
> someday someone will set up a shadow version of the p.g. library
> where all the inconsistencies are resolved, and you will see then
> how much value is added by the ingenuity of programmers who
> are able to take consistent formatting of the e-texts for granted...
>
> -bowerbird
>

From hart at pglaf.org Mon Jan 30 14:16:31 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Jan 30 14:16:33 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: <43DE8E8E.5090900@novomail.net>
References: <202.11261918.310d2757@aol.com> <43DE8E8E.5090900@novomail.net>
Message-ID: 

On Mon, 30 Jan 2006, Lee Passey wrote:

> Michael Hart wrote:
>
>>
>> So, what you are telling me here is that while a human can muddle
>> through OK,
>> it takes a computer to really mess things up.
>
>
> I think what he is saying is that the human brain is a highly capable,
> general purpose computing device, highly capable of resolving ambiguity,
> whereas computers and their associated software are still rather primitive
> devices. I believe that at some point in the future computers will be capable
> of resolving all the ambiguities inherent in PG e-texts, but that day is not
> yet here, and until then the software is going to require some human help.

Still, I don't see why the computer has to make all those decisions.

Can't it just lay there, out of the process, and just let me read?
;-)

From lee at novomail.net Mon Jan 30 14:09:18 2006
From: lee at novomail.net (Lee Passey)
Date: Mon Jan 30 14:45:33 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: 
References: <202.11261918.310d2757@aol.com>
Message-ID: <43DE8E8E.5090900@novomail.net>

Michael Hart wrote:

>
> So, what you are telling me here is that while a human can muddle
> through OK,
> it takes a computer to really mess things up.

I think what he is saying is that the human brain is a highly capable,
general purpose computing device, highly capable of resolving ambiguity,
whereas computers and their associated software are still rather primitive
devices. I believe that at some point in the future computers will be
capable of resolving all the ambiguities inherent in PG e-texts, but that
day is not yet here, and until then the software is going to require some
human help.

From lee at novomail.net Mon Jan 30 14:54:23 2006
From: lee at novomail.net (Lee Passey)
Date: Mon Jan 30 14:54:24 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: 
References: <202.11261918.310d2757@aol.com> <43DE8E8E.5090900@novomail.net>
Message-ID: <43DE991F.5070801@novomail.net>

Michael Hart wrote:

>
> On Mon, 30 Jan 2006, Lee Passey wrote:
>
>> Michael Hart wrote:
>>
>>>
>>> So, what you are telling me here is that while a human can muddle
>>> through OK,
>>> it takes a computer to really mess things up.
>>
>>
>>
>> I think what he is saying is that the human brain is a highly
>> capable, general purpose computing device, highly capable of
>> resolving ambiguity, whereas computers and their associated software
>> are still rather primitive devices. I believe that at some point in
>> the future computers will be capable of resolving all the ambiguities
>> inherent in PG e-texts, but that day is not yet here, and until then
>> the software is going to require some human help.
>
>
> Still, I don't see why the computer has to make all those decisions.
> > Can't it just lay there, out of the process, and just let me read? > > ;-) It can, but it can also do more. Personally, my reading experience is improved if new chapters always start at the top of the screen, and if chapter and section headings are rendered in a way that makes it _obvious_ that they are chapter and section headings. When I read I like to become so engrossed that I don't have to stop and think about the mechanics of the layout. I _can_ do so, I just don't _want_ to. Obviously, Project Gutenberg e-texts are insufficient for me, just as they are adequate for you. But it is a fallacy to assume that because they are sufficient for you that they are sufficient for everyone. From jon at noring.name Mon Jan 30 14:56:26 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 30 14:56:26 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? Message-ID: <913752685.20060130155626@noring.name> Everyone, Even though D.H. Lawrence's "Lady Chatterly's Lover" was published in 1928, some of the publishing details in the following web site indicate the possibility it may be public domain in the U.S.: http://web.ukonline.co.uk/rananim/lawrence/lcl.html Anyone have more knowledge on the U.S. public domain status of this work? Jon From Bowerbird at aol.com Mon Jan 30 15:21:16 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jan 30 15:21:25 2006 Subject: [gutvol-d] blah blah blog Message-ID: <2cc.27f269c.310ff96c@aol.com> lee said: > I believe that at some point in the future computers will be > capable of resolving all the ambiguities inherent in PG e-texts, > but that day is not yet here, and until then the software is > going to require some human help. well, my computer routines can already resolve _most_ ambiguities, as i pointed out in the blog entry. i've even offered to do this for p.g., providing p.g. makes a commitment to stay consistent in the future... 
but there seemed to me no recognition for the need for consistency, let alone any desire to make a commitment to attain it... *** michael said: > Can't it just lay there, out of the process, and just let me read? sure, it could. but sometimes -- in the course of reading, and even outside of it -- there are things you want to know, or find out, or have the computer do for you, or figure out for you. take my headings example, for instance. it's nice if the computer can figure out the headings for you, because then it can produce a nice "table of contents" menu, which you can look at to get a quick overview on the e-text, or use to jump directly to the "dance of the quadrille" chapter (even if you hadn't yet known that there was such a chapter)... or, as lee has remarked, if the program knows the headers, it can do nice things like start a chapter on a new screen, and make the header big and bold. these nice touches are... nice. or take my text-wrapping example, as yet another instance. sometimes you want to reflow the text to a narrow window. if the non-rewrap lines have been clearly indicated -- e.g., with the leading space that i suggested -- then the computer can do the rewrap for you quickly and easily, with no errors... (and this goes beyond "a nice touch" into basic functionality.) consistency also goes a _long_ way in making _conversions_. surely you must realize that many of your e-texts are finding their highest popularity as a result of some sort of conversion, whether over at blackmask.com or manytexts.com or wherever. i think it behooves you to groom your texts for such conversions, whether you do those conversions or someone else does them... (heck, the time that has been spent doing .html conversions alone could have been reduced _significantly_ by improving consistency and then depending upon automatic .html generation by computer. and then all of the .html versions would have been consistent too!) 
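The rewrap convention described above is simple enough to sketch: flush-left prose reflows, while any line that begins with whitespace (tables, verse, address blocks) passes through untouched. This is a hypothetical illustration of the idea, not anything PG or bowerbird actually ships:

```python
import textwrap

def rewrap(text, width=60):
    """Reflow flush-left prose to `width`, passing through blank lines
    and any line that starts with whitespace (the no-rewrap marker)."""
    out, para = [], []

    def flush():
        # emit the accumulated paragraph, reflowed to the target width
        if para:
            out.extend(textwrap.wrap(" ".join(para), width))
            para.clear()

    for line in text.splitlines():
        if not line:
            flush()
            out.append("")
        elif line[0].isspace():
            flush()
            out.append(line)  # protected: leading space means "do not touch"
        else:
            para.append(line.strip())
    flush()
    return "\n".join(out)
```

One leading space is all the markup such a routine needs, which is the point being made: the convention costs the transcriber almost nothing, and the consuming code stays trivial.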
unless you have a _commitment_ to removing the inconsistencies, though, you won't understand the depth of the processes required. for instance, staying with the rewrap example, you wouldn't realize that if you're creating a book of poetry, where _none_ of the lines should be rewrapped, you should preface _all_ lines with a space. other examples abound. if footnotes are marked consistently, the computer can be programmed to treat them appropriately. a "table of illustrations" can become a more-useful hotlinked list, or the launching point for a slideshow of all the book's pictures... basically, it means thinking in terms of the _library_, not the _book_. consistency is the foundation that can allow _all_kinds_ of neat new features, limited only by the imaginations of programmers. the reason you haven't seen this exhibited already is because programmers have been flustered by massive inconsistencies. even your own p.g. people can't produce the best possible tools to help them do their jobs because of all the parsing difficulties. *** so yeah, the computer could just lay there. but why not put it to work? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060130/d01e4bd2/attachment.html From grythumn at gmail.com Mon Jan 30 15:29:54 2006 From: grythumn at gmail.com (Robert Cicconetti) Date: Mon Jan 30 15:36:08 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? In-Reply-To: <913752685.20060130155626@noring.name> References: <913752685.20060130155626@noring.name> Message-ID: <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com> AFAICT, it's still under copyright in the US. Lawrence was British (which rules out Rule 5), and it was still under copyright (LIFE+70) in the UK in 1996, so GATT prevents Rule 6. 1930+95=2025 for the US. 
http://www.gutenberg.org/howto/copyright-howto

It appears to be clearable in LIFE+50 and LIFE+70 countries, though.

Anyone want to chip in with expansions or corrections?

R C

On 1/30/06, Jon Noring wrote:
>
> Everyone,
>
> Even though D.H. Lawrence's "Lady Chatterly's Lover" was published in
> 1928, some of the publishing details in the following web site indicate
> the possibility it may be public domain in the U.S.:
>
> http://web.ukonline.co.uk/rananim/lawrence/lcl.html
>
>
> Anyone have more knowledge on the U.S. public domain status of this
> work?
>
> Jon
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060130/ece74dc0/attachment.html

From grythumn at gmail.com Mon Jan 30 15:57:01 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Mon Jan 30 15:57:07 2006
Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.?
In-Reply-To: <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com>
References: <913752685.20060130155626@noring.name>
 <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com>
Message-ID: <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com>

Actually, that would be 1928+95=2023, provided you find one of the
first few printings. Sorry. Still not PD in the US for quite some time.

R C

On 1/30/06, Robert Cicconetti wrote:
>
> AFAICT, it's still under copyright in the US. Lawrence was British (which
> rules out Rule 5), and it was still under copyright (LIFE+70) in the UK in
> 1996, so GATT prevents Rule 6. 1930+95=2025 for the US.
>
> http://www.gutenberg.org/howto/copyright-howto
>
> It appears to be clearable in LIFE+50 and LIFE+70 countries, though.
>
> Anyone want to chip in with expansions or corrections?
> > R C > > On 1/30/06, Jon Noring wrote: > > > > Everyone, > > > > Even though D.H. Lawrence's "Lady Chatterly's Lover" was published in > > 1928, some of the publishing details in the following web site indicate > > the possibility it may be public domain in the U.S.: > > > > http://web.ukonline.co.uk/rananim/lawrence/lcl.html > > > > > > Anyone have more knowledge on the U.S. public domain status of this > > work? > > > > Jon > > > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060130/3ddffcc8/attachment-0001.html From jon at noring.name Mon Jan 30 17:12:43 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 30 17:12:45 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? In-Reply-To: <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com> References: <913752685.20060130155626@noring.name> <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com> <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com> Message-ID: <13372441.20060130181243@noring.name> Robert Cicconetti wrote: > Actually, that would 1928+95=2023, provided you find one of the > first few printings. Sorry. Still not PD in the US for quite some time. However, if one reads the link I provided previously, see: http://web.ukonline.co.uk/rananim/lawrence/lcl.html It says: "...As it was published privately [in 1928], no copyright was issued, so the novel was free game to any and all who wished to pirate their own editions, and the pirates could sell them for as much as possible." So, given this, is the original text public domain in the U.S.? If not, who is the "rights holder"? 
Jon From grythumn at gmail.com Mon Jan 30 19:39:17 2006 From: grythumn at gmail.com (Robert Cicconetti) Date: Mon Jan 30 19:39:22 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? In-Reply-To: <13372441.20060130181243@noring.name> References: <913752685.20060130155626@noring.name> <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com> <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com> <13372441.20060130181243@noring.name> Message-ID: <15cfa2a50601301939q7a7acd10y6b3561bdae3dbeee@mail.gmail.com> On 1/30/06, Jon Noring wrote: > > Robert Cicconetti wrote: > > > Actually, that would 1928+95=2023, provided you find one of the > > first few printings. Sorry. Still not PD in the US for quite some time. > > However, if one reads the link I provided previously, see: > > http://web.ukonline.co.uk/rananim/lawrence/lcl.html > > It says: > > "...As it was published privately [in 1928], no copyright was issued, > so the novel was free game to any and all who wished to pirate their > own editions, and the pirates could sell them for as much as possible." > > So, given this, is the original text public domain in the U.S.? If > not, who is the "rights holder"? > Then you'll have to establish that it wasn't under copyright in the UK in 1996 (Instead of using the blanket Life+70), and further search for renewals under US law for a Rule 6 clearance. An article would not suffice. If you wish to try to clear the book, feel free. It's not something that I'm particularly interested in. I would expect it to take a lot of time and effort both on your part and for Mr. Newby (who as we all know has gobs of free time). R C -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060130/af069358/attachment.html From gbnewby at pglaf.org Mon Jan 30 21:44:52 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Jan 30 21:44:54 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? In-Reply-To: <15cfa2a50601301939q7a7acd10y6b3561bdae3dbeee@mail.gmail.com> References: <913752685.20060130155626@noring.name> <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com> <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com> <13372441.20060130181243@noring.name> <15cfa2a50601301939q7a7acd10y6b3561bdae3dbeee@mail.gmail.com> Message-ID: <20060131054452.GA1424@pglaf.org> On Mon, Jan 30, 2006 at 10:39:17PM -0500, Robert Cicconetti wrote: > On 1/30/06, Jon Noring wrote: > > > > Robert Cicconetti wrote: > > > > > Actually, that would 1928+95=2023, provided you find one of the > > > first few printings. Sorry. Still not PD in the US for quite some time. > > > > However, if one reads the link I provided previously, see: > > > > http://web.ukonline.co.uk/rananim/lawrence/lcl.html > > > > It says: > > > > "...As it was published privately [in 1928], no copyright was issued, > > so the novel was free game to any and all who wished to pirate their > > own editions, and the pirates could sell them for as much as possible." > > > > So, given this, is the original text public domain in the U.S.? If > > not, who is the "rights holder"? > > > > Then you'll have to establish that it wasn't under copyright in the UK in > 1996 (Instead of using the blanket Life+70), and further search for renewals > under US law for a Rule 6 clearance. An article would not suffice. > > If you wish to try to clear the book, feel free. It's not something that I'm > particularly interested in. I would expect it to take a lot of time and > effort both on your part and for Mr. Newby (who as we all know has gobs of > free time). 
> R C

I don't see any indication that this isn't covered by GATT, as
described in our Rule 6 "howto" at
http://www.gutenberg.org/howto/copyright-howto

No "registration" for copyright was/is required in the UK nor, indeed,
anywhere...it's a US invention that didn't catch on much elsewhere,
and in fact is no longer required in the US.

The GATT actually rolled back copyrights (or created enforcement
opportunities, anyway) for items previously published in the US
written by non-US authors. The Grove Press was one of several
publishers that specialized in these types of items. From what I
understand, it was legal then (or at least not very enforceable, if
illegal), but GATT made things crystal clear.

Our "Rule 6" howto is a highly distilled version of GATT prepared &
endorsed by some of our volunteer legal experts...

As mentioned earlier, this should be copyright-free in life+70
countries, as well as life+50.

BTW, the unpublished (or largely undistributed) manuscripts mentioned
might be even worse off! See our copyright howto for the dates. But
the limited distribution of those might be enough to not treat them
as unpublished manuscripts.

To clear this item as public domain, we'd need confirmation from one
of our legal experts (which I won't ask for, based on what I've seen
so far), or a letter from a qualified lawyer, or something similar
(like a letter from the Librarian of Congress, or Lawrence's estate).
Yes, a pretty high barrier.

Unless, of course, I'm missing something...
-- Greg

From hyphen at hyphenologist.co.uk Tue Jan 31 06:55:55 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Jan 31 07:00:06 2006
Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To: 
References: <2e5.8ddc4c.30f9ebc2@aol.com>
 <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com>
 <43C963C5.4080400@xs4all.nl>
Message-ID: 

On Mon, 16 Jan 2006 17:13:12 +0000, Dave Fawthrop wrote:

|I am working on Yorkshire dialect poems and text, by John Hartley etext No
|17472 and have previously done some of F W Moorman's 3232, 2888 work.

I have just finished another of Hartley's dialect books, Hartley's
Yorkshire Ditties Second Series, and put it through Gutcheck. It is
a tiny book, 4 1/2 ins * 6 1/2 ins and only 143 pages, 3062 lines of
PG etext. Gutcheck throws 526 errors, all of which are wrong except
about 10 that are trying to correct errors in the original text. It
only found about a dozen real errors.

No more comment required.

-- 
Dave Fawthrop
"Intelligent Design?" my knees say *not*.
"Intelligent Design?" my back says *not*.
More like "Incompetent design".
Sig (C) Copyright Public Domain

From bzg at altern.org Tue Jan 31 05:44:50 2006
From: bzg at altern.org (Bastien)
Date: Tue Jan 31 07:08:21 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: <2cc.27f269c.310ff96c@aol.com> (Bowerbird@aol.com's message of
 "Mon, 30 Jan 2006 18:21:16 EST")
References: <2cc.27f269c.310ff96c@aol.com>
Message-ID: <87fyn43419.fsf@tallis.ilo.ucl.ac.uk>

Hi,

Bowerbird@aol.com writes:

> well, my computer routines can already resolve _most_ ambiguities,

Can we see the code? Can we use/test/improve it?

> whether over at blackmask.com or manytexts.com or wherever.

I can't find manytexts.com. Typo?

> so yeah, the computer could just lay there. but why not put it to
> work?

I wish I could put my computer to work testing your routines. Maybe I
missed a link, but I can't find any relevant entry in the archives.
Cheers,

-- 
Bastien

From matthew at mc.clintock.com Tue Jan 31 07:33:48 2006
From: matthew at mc.clintock.com (Matthew McClintock)
Date: Tue Jan 31 07:55:15 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: <87fyn43419.fsf@tallis.ilo.ucl.ac.uk>
References: <2cc.27f269c.310ff96c@aol.com>
 <87fyn43419.fsf@tallis.ilo.ucl.ac.uk>
Message-ID: <03BD8B4A-9204-417F-A35B-5FE27938DE5E@mc.clintock.com>

Maybe he means http://manybooks.net ? Sorry, I don't know the context
- I filter his posts, and only see everyone else's replies.

-Matt

On Jan 31, 2006, at 7:44 AM, Bastien wrote:

> I can't find manytexts.com. Typo ?

-- 
http://mc.clintock.com
http://manybooks.net
--

From hart at pglaf.org Tue Jan 31 08:00:02 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Jan 31 08:00:05 2006
Subject: !@!Re: [gutvol-d] blah blah blog
In-Reply-To: <43DE991F.5070801@novomail.net>
References: <202.11261918.310d2757@aol.com> <43DE8E8E.5090900@novomail.net>
 <43DE991F.5070801@novomail.net>
Message-ID: 

On Mon, 30 Jan 2006, Lee Passey wrote:

> Michael Hart wrote:
>
>>
>> On Mon, 30 Jan 2006, Lee Passey wrote:
>>
>>> Michael Hart wrote:
>>>
>>>>
>>>> So, what you are telling me here is that while a human can muddle
>>>> through OK,
>>>> it takes a computer to really mess things up.
>>>
>>>
>>>
>>> I think what he is saying is that the human brain is a highly capable,
>>> general purpose computing device, highly capable of resolving ambiguity,
>>> whereas computers and their associated software are still rather
>>> primitive devices. I believe that at some point in the future computers
>>> will be capable of resolving all the ambiguities inherent in PG e-texts,
>>> but that day is not yet here, and until then the software is going to
>>> require some human help.
>>
>>
>> Still, I don't see why the computer has to make all those decisions.
>>
>> Can't it just lay there, out of the process, and just let me read?
>>
>> ;-)
>
>
> It can, but it can also do more.
> Personally, my reading experience is
> improved if new chapters always start at the top of the screen, and if
> chapter and section headings are rendered in a way that makes it _obvious_
> that they are chapter and section headings. When I read I like to become so
> engrossed that I don't have to stop and think about the mechanics of the
> layout. I _can_ do so, I just don't _want_ to.
>
> Obviously, Project Gutenberg e-texts are insufficient for me, just as they
> are adequate for you. But it is a fallacy to assume that because they are
> sufficient for you that they are sufficient for everyone.

Obviously this is a conversation from those who are demanding something
in terms of eBook preparation that they do not demand of paper books.

I've spoken with librarians who would prefer that all books be made out
of the same kinds of materials, the same kinds of paper, bindings, with
them all cut to the same size. . .just think how much that would help a
library with shelving, cart design, drop off slots, mailing boxes, etc.

Then again, I've spoken to library patrons who would prefer that all of
the libraries buy the same edition of the same books and shelve all the
books in exactly the same manner, so they can walk into ANY library and
just grab the first red book on the left and know what it will be.

Once again the major point is that most of the work has already been in
the system for you before you came along, and it is up to you to "take
matters into your own hands," as one put it, and do the minuscule work
that is required to make the books completely consistent with your own
philosophy of how eBooks should be created.

There's nothing wrong with what such people are asking, other than that
they are asking someone else to do it for them, free of charge.

"An Unfunded Mandate" as the politicians often refer to such things.
Those of us who have been on this list, and others, for very long
these days have no trouble remembering any number of people who have
had variety upon variety of requests that Project Gutenberg should be
run in such a fashion as to meet with their demands.

The response is always to invite the creation of some examples, along
a suggested pathway for future efforts, to be accompanied by your
request for others to assist you in making such future efforts.

It's one thing to ask for help with something you are doing, whether
it be a dozen examples per month or per year, until you finally get
all of the volunteers you need to make things happen the way you
would like.

It's totally something else to insist that rules should be made to
make others do things your way, whether they want to or not,
especially when they are already doing most of what you want.

Of course, there is a middle ground: write up the rules/suggestions
in such a way that anyone creating eBooks is likely to find them
before an eBook is created, and provide them with encouragement
through examples.

I don't think anyone would have an objection to each of the
participant elements in these discussions having a URL posted to link
to suggestions they have about eBook creation standards, as long as
all such suggestion files come complete with an ever increasing set
of examples; after all, if YOU are not convinced enough of your own
suggestions to carry them a short way every so often, how can you
expect others to carry them every single time they make an eBook?

The preferred solution at Project Gutenberg is to lead by example;
make your preferences known and provide a continuing set of examples.

Otherwise how will anyone know what you are encouraging them to do,
and the reasons you have for your requests?

In terms of making eBooks the way you want them to be:

"It is better to light a candle than to curse the darkness."

Provide examples.
Describe how these examples are better than previous examples, and how the previous examples are better than those that came before them. Make sure you describe the evolution of eBooks as a process, a process that can be improved from time to time. Then, if people like your new eBooks better than the old ones, you are very likely to find a dozen volunteers to help you provide even better eBook collections that will help you find even more volunteers. Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg From ciesiels at bigpond.net.au Tue Jan 31 07:18:08 2006 From: ciesiels at bigpond.net.au (Michael Ciesielski) Date: Tue Jan 31 09:48:40 2006 Subject: [gutvol-d] Comments on "blah blah blog" In-Reply-To: <2cc.27f269c.310ff96c@aol.com> References: <2cc.27f269c.310ff96c@aol.com> Message-ID: <43DF7FB0.9050200@bigpond.net.au> Please keep comments on issues raised in bowerbird's "blah blah blog" off this list. There appears to be a facility for comments on the blog itself, or I'm sure bowerbird would be happy to hear from you directly by email. Thank you! From hiddengreen at gmail.com Tue Jan 31 10:23:04 2006 From: hiddengreen at gmail.com (Cori) Date: Tue Jan 31 10:23:12 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> Message-ID: <910fee4a0601311023h7a826156ke08a5e683f40e983@mail.gmail.com> On 1/31/06, Dave Fawthrop wrote: > I have just finished another of Hartley's Dialect Books, Hartley's > Yorkshire ditties Second Series and put it through Gutcheck. It is a tiny > book, 4 1/2 ins * 6 1/2 Ins and only 143 pages 3062 lines of PG Etext. > Gutcheck throws 526 errors. All of which are wrong, except about 10 are > trying to correct errors in the original text. It only found about a > dozen real errors. > > No more comment required. Indeed :) Catching a dozen real errors is a definite win! 
Plus, though it doesn't sound like it happened for you, I find that checking the false errors gives me a different view of the text, and I thus occasionally spot other things in it (usually sneaky scannos of the lie/he, ago/age type.) Thanks again for Gutcheck, Jim!

Cori

From hyphen at hyphenologist.co.uk Tue Jan 31 11:32:57 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Jan 31 11:48:22 2006
Subject: [gutvol-d] Anyone recommend a good free WYSIWYG html editor to PG standards
Message-ID:

Just turning my Hartley books into html. The generate-html function of guiguts did not do a bad job of generating html, but the poetry needs a lot of tweaking. It expects me to insert flags by hand :-( I gave up doing that in the bad old nroff days, way back in (OMG) 1986.

Anyone recommend a good free windoze html editor to PG standards? I normally use an old copy of M$ Front Page for html.

--
Dave Fawthrop
"Intelligent Design?" my knees say *not*.
"Intelligent Design?" my back says *not*.
More like "Incompetent design".
Sig (C) Copyright Public Domain

From hyphen at hyphenologist.co.uk Tue Jan 31 11:49:13 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Jan 31 11:58:38 2006
Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To: <910fee4a0601311130i65126631l97a9c6ef51c6ab4a@mail.gmail.com>
References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> <910fee4a0601311023h7a826156ke08a5e683f40e983@mail.gmail.com> <910fee4a0601311130i65126631l97a9c6ef51c6ab4a@mail.gmail.com>
Message-ID:

On Tue, 31 Jan 2006 19:30:55 +0000, Cori wrote:
|On 1/31/06, Dave Fawthrop wrote:
|>
|Indeed :) Catching a dozen real errors is a definite win!
|>
|> 12 in 500-plus is a resounding failure.
|
|I think there might be a misunderstanding about the purpose of
|Gutcheck ...
if a text checking tool was provided that never gave me
|any false errors, I'd be convinced that it wasn't catching all it
|should be. Spellcheckers, or the barrage of regex checks that DP has
|developed, all flag up false positives on my books, but they couldn't
|be made much more effective without personalising them to each and
|every text -- which would take more time than just clicking through
|the false alarms..? The point of all these checks is to (hopefully)
|be over-sensitive to problems, rather than under-sensitive (thus
|leaving errors.)
|
|Or have I missed something in turn..? Do you have text checking tools
|that only ever signal real errors..? Can they be shared..?

With my other hat on, I write "intelligent" language software. Low 90% correct is very bad; above 99% correct is acceptable. For a voluntary organisation I would accept 50% correct.

--
Dave Fawthrop
"Intelligent Design?" my knees say *not*.
"Intelligent Design?" my back says *not*.
More like "Incompetent design".
Sig (C) Copyright Public Domain

From prosfilaes at gmail.com Tue Jan 31 12:04:30 2006
From: prosfilaes at gmail.com (David Starner)
Date: Tue Jan 31 12:04:33 2006
Subject: !@!Re: [gutvol-d] blah blah blog
In-Reply-To:
References: <202.11261918.310d2757@aol.com> <43DE8E8E.5090900@novomail.net> <43DE991F.5070801@novomail.net>
Message-ID: <6d99d1fd0601311204g15258afcxe6e276df9543eed8@mail.gmail.com>

On 1/31/06, Michael Hart wrote:
> Once again the major point is that most of the work has already been in
> the system for you before you came along, and it is up to you to, "Take
> matters into your own hands," as one put it, and do the minuscule works
> that are required to make the books completely consistent with your own
> philosophy of how eBooks should be created.
>
> There's nothing wrong with what such people are asking, other than that
> they are asking someone else to do it for them, free of charge.
>
> "An Unfunded Mandate" as the politicians often refer to such things.
Please don't bite people because they don't work the way you do. All he said is that Gutenberg books aren't laid out optimally for him. There's nothing wrong with someone discussing how things could be better, in their opinion. He didn't ask anyone to do anything for him.

From Bowerbird at aol.com Tue Jan 31 13:34:43 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jan 31 13:34:50 2006
Subject: !@!Re: [gutvol-d] blah blah blog
Message-ID: <12b.6d4ce464.311131f3@aol.com>

mikey said:
> Please keep comments on issues
> raised in bowerbird's "blah blah blog"
> off this list.

sorry to disturb your sleep, mikey. but you can go back to bed now...

***

bastien said:
> Can we see the code?

no.

> Can we use/test/improve it?

i'll release an app sooner or later that you can use. as far as the routines themselves, creating them took a _persistent_ application of common-sense and a lot of elbow-grease. that's all. which means you can generate them yourself, if you really want. i'll also be willing to give you some pointers if you should happen to hit a wall at any point, once you have shown me you are willing to take on the job.

or, if you prefer, you can take the shortcut, and buy my source code. the price is in the 6 figures. i can't afford to give away something that valuable. if you can, i suggest you buy it and give it away...

but really, what i want you to take away is that you _can_ do this, and it's really not even all that _hard_. for instance, i've talked about using the number of preceding blank lines as an indicator of heading level. assuming that such consistency has been maintained, even a beginning programmer could write the code to ascertain the structure of a book. care to try it? write it in pseudo-code first, and then in a language, any language with which you are comfortable, be it perl or python or ruby or basic or whatever you like.
it might be better to take it over to gutvol-p, so we don't put a coding session in all the gutvol-d boxes, but if you show me you're willing to do some work, i'll be more than happy to show you what will pay off. if you're not, then i'll refer you to the fables of aesop, specifically the one about the little red hen...

***

michael said:
> There's nothing wrong with
> what such people are asking,
> other than that they are asking
> someone else to do it for them,
> free of charge.

i know that michael is talking about "such people" as a general class, and i'm fairly confident he doesn't lump me in that class. but other people here might. so let me be absolutely clear here for them...

i'm not asking anyone to do anything "for" me. and i am more than willing to do the job myself. i said so, directly, on my blog, and i repeat it here. what i am doing is suggesting you make these changes for _yourselves_ and for _your_readers_, simply because consistency in the e-texts leads to greater functionality... my experience is that this greater functionality would benefit you with increased efficiency in the preparation of the e-texts, and benefit your readers in greater total usability of the e-texts.

> Those of us who have been on this list, and others,
> for very long these days have no trouble remember
> any number of people who have had variety upon variety
> of requests that Project Gutenberg should be run in
> such a fashion as to meet with their demands.

again, i make no "demands". just giving you tips. for your own good.

> The response is always to invite the creation of some examples,
> along a suggested pathway for future efforts, to be accompanied by
> your request for others to assist you in making such future efforts.

it is also the case that i _am_ putting examples online. i've already pointed to some, and more will come soon.
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.zml
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.html
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.pdf
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01b.pdf

other examples will be "my antonia", "books and culture", "free culture", "the universe or nothing", "the secret garden", and a handful of books by cory doctorow, for starters. as z.m.l. requires just a smattering of changes to the markup-free version of an e-text, hundreds of examples will follow before long, and thousands once my processes are refined... if i don't lose interest, i might have most of the library converted in a year.

from these plain-text .zml "masters" will emerge automatic .html versions -- from which a plethora of other formats will be able to be generated -- and automatic creation of .pdf versions according to user specifications... other sweetnesses might follow too, like ipod versions and p.s.p. versions. and last but not least, the z.m.l. viewer-app will create a kick-ass powerful high-functionality electronic-book experience using these .zml "masters". enough so that you'll wonder why you ever thought you needed "markup". (there's more -- scansets and banana-cream -- but that's enough for now.)

> Describe how these examples are better than previous examples

i have done, and will continue to do so, as appropriate times arrive...

-bowerbird

From Bowerbird at aol.com Tue Jan 31 14:59:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jan 31 14:59:33 2006
Subject: [gutvol-d] blah blah blog
Message-ID:

a concrete example might help...
here's the table of contents from "free culture" by lawrence lessig, in zen markup language format, generated automatically from a simple straightforward analysis in about one-half of a second...

even though there are 3 levels of headers, they are very clear, indicated by varying indentation (which represents, at the headers themselves, a varying number of preceding blank lines, of course.) text-structures even more complex than the one shown in this outline can be communicated easily by the number of preceding blank lines -- _if_ the rule is followed _consistently_ -- and grokked by routines consisting of just a few lines of dirt-simple code...

by the way, just to say something "obvious" that lee probably had not considered before, one of the many ways my routines determine the headers in a digitized text is to look for a "table of contents" section -- usually toward the start of the file, and usually marked with "contents" or "table of contents" as a header -- and then examine that section quite carefully. it ends up doing a very good job of telling you what specific phrases "might be" header-lines. and if you're cleaning up the o.c.r. of a p-book, for instance, there are usually _page-numbers_ there too, telling you what _page_ each header is on. pretty handy, eh?

indeed, in the .pdf of this book, which you can download at http://www.lessig.org, you will see that the page-numbers _are_ there, and chapter 11, chimera, for instance, starts on page 177. like i said, if you know what a header is likely to be, and on what page it is located, it's fairly easy to find. indeed, people have been using the "table of contents" for precisely that reason for several hundred years now.

this is just one of the reasons why it ain't that hard to write routines to ascertain the headers in a book. like i said, it sounds very obvious when you hear it. but have you ever heard anyone say it here before?
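The blank-line heuristic described in this thread can indeed be sketched in a few lines of Python. One caveat: the posts never pin down exactly how many blank lines mark each level, so the "two or more blanks means a heading" threshold below is an assumption, not a z.m.l. rule.

```python
def outline(text):
    """Collect candidate headings from plain text using the blank-line
    convention: a non-blank line preceded by two or more blank lines is
    treated as a heading, and the blank count hints at its level (more
    preceding blanks = higher-level heading).  The threshold of two is
    an assumption; adjust it to match the convention actually used.
    Returns a list of (preceding_blank_count, heading_text) pairs."""
    headings = []
    blanks = 0
    for line in text.splitlines():
        if line.strip():
            if blanks >= 2:
                headings.append((blanks, line.strip()))
            blanks = 0  # reset after any non-blank line
        else:
            blanks += 1
    return headings
```

Run on a chapter opening formatted this way, `outline` reports the chapter title (four preceding blanks) above its section titles (two preceding blanks), which is the whole "beginning programmer" exercise bowerbird proposes.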
-bowerbird

---------------------------------------------

TABLE OF CONTENTS

 Free Culture
    Table of Contents
    License
    Publisher Page
    Library of Congress Cataloging
    Dedication
    Preface
    Introduction
    'Piracy'
       Chapter 1: Creators
       Chapter 2: "Mere Copyists"
       Chapter 3: Catalogs
       Chapter 4: "Pirates"
          Film
          Recorded Music
          Radio
          Cable TV
       Chapter 5: "Piracy"
          Piracy I
          Piracy II
    'Property'
       Chapter 6: Founders
       Chapter 7: Recorders
       Chapter 8: Transformers
       Chapter 9: Collectors
       Chapter 10: "Property"
          Why Hollywood Is Right
          Beginnings
          Law: Duration
          Law: Scope
          Law and Architecture: Reach
          Architecture and Law: Force
          Market: Concentration
          Together
    Puzzles
       Chapter 11: Chimera
       Chapter 12: Harms
          Constraining Creators
          Constraining Innovators
          Corrupting Citizens
    Balances
       Chapter 13: Eldred I
       Chapter 14: Eldred II
    Conclusion
    Afterword
       Us, Now
          Rebuilding Freedoms Previously Presumed: Examples
          Rebuilding Free Culture: One Idea
       Them, Soon
          More Formalities
          Shorter Terms
          Free Use Vs. Fair Use
          Liberate the Music -- Again
          Fire Lots of Lawyers
    Footnotes
    Hyperlinks
    Acknowledgments
    Index
    About the Author
    Jacket
    Typos Corrected
    Permissions
    The Dead-Tree Hardback Version of this Work

zero markup language -- z.m.l. -- the future of electronic-books

---------------------------------------------

p.s. extra points for everyone who realized that -- since the lines in the table of contents section are not to be rewrapped -- that is the reason that all are prefaced with at least one leading space...
From Bowerbird at aol.com Tue Jan 31 15:32:46 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jan 31 15:32:54 2006
Subject: [gutvol-d] coold nothav e
Message-ID: <1c4.390ca2f5.31114d9e@aol.com>

mihceal siad:
> So, what you are telling me hre, ist hat
> while a human can muddle through ok,
> it takes a computer to really maess things up.

i coold nothav esaidit bettter myyyself. :+)

-bowrebrid

From prosfilaes at gmail.com Tue Jan 31 16:27:13 2006
From: prosfilaes at gmail.com (David Starner)
Date: Tue Jan 31 16:27:16 2006
Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To:
References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> <910fee4a0601311023h7a826156ke08a5e683f40e983@mail.gmail.com> <910fee4a0601311130i65126631l97a9c6ef51c6ab4a@mail.gmail.com>
Message-ID: <6d99d1fd0601311627m4059cef9xf9dccb3ea4870627@mail.gmail.com>

On 1/31/06, Dave Fawthrop wrote:
> 12 in 500plus is a resounding failure.

Then don't use it. For many users, 12 out of 500 is good enough to make the program useful. No program is ever going to handle dialect well, because dialect doesn't follow the normal rules.

> With my other hat on I write "intelligent" language software, Low 90%
> correct is very bad, above 99% correct is acceptable. For a voluntary
> organisation I would be accept 50% correct.

Intelligent language software is too broad; you write code to automatically hyphenate words. A concrete problem like that is significantly easier than a problem like "find errors in this text document". 50% of the errata sent by humans to errata@pglaf.org are wrong; how do you expect a computer to do better?
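The precision trade-off being argued in this thread is easy to make concrete with a toy line checker in the spirit of gutcheck. To be clear, the rules and the scanno list below are stand-ins of my own invention, not gutcheck's actual checks; the point is only that deliberately over-sensitive rules will flag real words like "lie" alongside real errors.

```python
import re

# Hypothetical OCR confusions ("scannos") of the kind mentioned above.
# This tiny table is illustrative, not a real scanno database.
SCANNOS = {"lie": "he", "ago": "age", "arid": "and"}

def check_line(line):
    """Return (rule, message) warnings for one line of text.
    Over-sensitive by design: it flags possibilities, not certainties,
    which is why a human must review every warning it emits."""
    warnings = []
    if re.search(r"\s[,;.!?]", line):
        warnings.append(("spacey-punct", "space before punctuation"))
    if line.count('"') % 2:
        warnings.append(("unbalanced-quote", "odd number of double quotes"))
    for scanno, intended in SCANNOS.items():
        if re.search(r"\b%s\b" % scanno, line):
            warnings.append(("scanno",
                             "'%s' may be a scanno for '%s'" % (scanno, intended)))
    return warnings
```

On the line `He told a lie , she said "go` this flags three things, and only two are genuine problems: "lie" is a legitimate word here, which is exactly the kind of false positive the thread is weighing against the errors such checks do catch.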
From Morasch at aol.com Tue Jan 31 17:02:40 2006
From: Morasch at aol.com (Morasch@aol.com)
Date: Tue Jan 31 17:02:48 2006
Subject: [gutvol-d] Language free version of guiguts?
Message-ID: <2ea.9b2402.311162b0@aol.com>

it's fair to say that gutcheck is an excellent piece of software. it's also fair to say that it returns far too many false positives. it's also fair to say that we _could_ have hoped that gutcheck's status as "open source" would have led it to be _improved_ far more often than it has been, given its level of importance. it's also fair to say that it wouldn't be that hard to improve it, if someone _decided_ to, as the problems are not intractable. it's also fair to say that if nobody improves it, it won't be improved.

all those things are fair to say.

-bowerbird

From Bowerbird at aol.com Tue Jan 31 17:10:05 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jan 31 17:10:13 2006
Subject: [gutvol-d] Language free version of guiguts?
Message-ID: <1ef.4b0b5db7.3111646d@aol.com>

i said:
> it's fair to say

yep, that's me, and not some rogue impostor, just mailed from my girlfriend's account, so you know.

-bowerbird

From imaclean at gmail.com Tue Jan 31 21:20:57 2006
From: imaclean at gmail.com (Ian MacLean)
Date: Tue Jan 31 21:21:00 2006
Subject: !@!Re: [gutvol-d] blah blah blog
In-Reply-To: <12b.6d4ce464.311131f3@aol.com>
References: <12b.6d4ce464.311131f3@aol.com>
Message-ID: <3156339d0601312120l3d6060cao114047a55d42f13b@mail.gmail.com>

On 2/1/06, Bowerbird@aol.com wrote:
> bastien said:
> > Can we see the code?
>
> no.

We'll just have to take your word for how good these "routines" are then.
> > Can we use/test/improve it?
>
> i'll release an app sooner or later that you can use.
> or, if you prefer, you can take the shortcut, and
> buy my source code. the price is in the 6 figures.
> i can't afford to give away something that valuable.
> if you can, i suggest you buy it and give it away...

You're kidding, right? Your code might be fantastic, but you're deluding yourself if you think anyone will pay you six figures for a text-processing app.

> what i am doing is suggesting you make these changes
> for _yourselves_ and for _your_readers_, simply because
> consistency in the e-texts leads to greater functionality...
> my experience is that this greater functionality would benefit
> you with increased efficiency in the preparation of the e-texts,
> and benefit your readers in greater total usability of the e-texts.

This I agree with, although with an all-volunteer project it's hard to define and enforce the format. Tools like gutcheck are a start along this road.

> it is also the case that i _am_ putting examples online.
> i've already pointed to some, and more will come soon.
>
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.zml
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.pdf
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01b.pdf

I assume the pdf there is generated from the zml? Via what mechanism? A single conversion step, or via something like LaTeX? Any reason not to go with one of the existing plain-text markup languages that already have conversion tools? reST, asciidoc and others. Is there a format description for z.m.l.? Do you have existing tools/parsers for z.m.l. with freely available code? Or is that another 6 figures? :)

> from these plain-text .zml "masters" will emerge automatic .html versions
> -- from which a plethora of other formats will be able to be generated --
> and automatic creation of .pdf versions according to user specifications...
And how is this different from what gutenmark does? Apart from hopefully fewer errors in the html or latex output?

> other sweetnesses might follow too, like ipod versions and p.s.p. versions.
> and last but not least, the z.m.l. viewer-app will create a kick-ass
> powerful high-functionality electronic-book experience

Surely this is completely orthogonal to the choice of markup language.

> using these .zml "masters".
> enough so that you'll wonder why you ever thought you needed "markup".

Uh -- just because you use whitespace instead of tags to indicate headings doesn't mean it's not markup. The very fact that you are pushing for a consistent format for conversion means that it *is* markup -- oh, and the fact that you've named it Zen *markup* language :)

Ian
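Ian's closing point can be made concrete: a convention like "n preceding blank lines marks a heading" converts mechanically to explicit tags, which is precisely what makes it markup. A sketch of such a converter follows; the mapping of blank counts to heading levels (4 blanks to h1, 3 to h2, 2 to h3) is an assumption of mine, since no public z.m.l. specification pins it down.

```python
def blank_lines_to_html(text):
    """Convert text using the blank-line heading convention into minimal
    HTML: a line preceded by n >= 2 blank lines becomes a heading tag,
    everything else a paragraph.  The 4->h1, 3->h2, 2->h3 mapping is an
    assumed convention, not a documented z.m.l. rule."""
    out = []
    blanks = 0
    for line in text.splitlines():
        if not line.strip():
            blanks += 1
            continue
        if blanks >= 2:
            level = max(1, 5 - blanks)  # 4 blanks -> h1, 3 -> h2, 2 -> h3
            out.append("<h%d>%s</h%d>" % (level, line.strip(), level))
        else:
            out.append("<p>%s</p>" % line.strip())
        blanks = 0  # reset the count after emitting any element
    return "\n".join(out)
```

That the converter is this short supports both sides of the argument: the whitespace convention really is trivial to parse, and it really is a markup language, just one whose tags are invisible.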