From Bowerbird at aol.com  Wed Nov  1 15:50:04 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  1 15:50:09 2006
Subject: [gutvol-d] gvd061101 -- the niceties of book typography
Message-ID: <4ad.3840eb24.327a8cac@aol.com>

here's your next issue in our "open-source" project, "babelfish".

first, a little more background on our sample book, "my antonia".

you can download an .html version from project gutenberg at...

oh wait...   there _is_ no .html version from project gutenberg, 
just the plain-text version.   gee, that's too bad, isn't it?   

that means all the people who would like an .html version are
out of luck.   and without an .html version, we cannot easily get
automatic conversions to the various e-book formats.   even the
offered plucker conversion will likely be a straight text-dump...

a straight text-dump is _not_ an "electronic-book",   not in my book.

oh well, you _can_ download it in .html from manybooks.net:
>    http://manybooks.net/titles/catherwietext95myant11.html

indeed, the "custom .html" lets you specify certain parameters,
like the fonts used (ok, you get a choice of 5, but it _is_ a choice),
size (10pt-16pt), leading (1x, 1.25x, 1.5x, 1.75x), justification,
and various indentation and margin parameters.   quite nifty!
it'd be neat if project gutenberg offered something like this...

(i should add i was unable to get a working download from this.)

i give huge props to matthew mcclintock, who runs manybooks.net.
since the sidelining of blackmask.com, he is the go-to guy for those
people who want to find p.g. e-texts in the various handheld formats.

he's put a converter on his site that can export out to all these formats:
>    PDF
>    PDF Large Print
>    eReader
>    Doc
>    Plucker
>    iSilo
>    zTXT
>    Rocketbook
>    iPod Notes
>    Sony
>    TCR
>    iRex iLiad PDF
>    Custom PDF
>    Custom HTML
>    RTF
>    Newton
>    Mobipocket

wow.   that's very impressive.   it seems that we don't even have to build
any conversion capability at all, we just have to feed matthew some files!
i appreciate all the hard work that he's done in providing such a service!

there is a glitch, though.   and it's the same one we experienced above:
a straight text-dump is _not_ an "electronic-book",   not in my book,

some manybooks.net conversions are simply a straight text-dump.
(that's not matthew's fault, i'm just stating it as a pure observation.)

i won't dwell on this, i'll just give a few examples.

i downloaded the regular .pdf version, so you can download that
if you want to look at the exact thing that i saw when i wrote this...

here's the "titlepage" of "my antonia", as shown in the .pdf:
>    http://snowy.arsc.alaska.edu/bowerbird/misc/anttitle.jpg

ouch.   not very pretty.   the original titlepage looked like this:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantf003.png
that's what we expect a title page to look like.

and here's the scan of page #181 of the book:
>    http://snowy.arsc.alaska.edu/bowerbird/myant/myantp181.png

but here's what that same page looks like in the .pdf.
>    http://snowy.arsc.alaska.edu/bowerbird/misc/weevil.jpg

not only is the new chapter not at the head of a page,
neither is it bigger or bold like we expect of headers.

even worse, the poem in the epigraph is not just 
unformatted, but it is even incorrectly wrapped...

typographical niceties like these have been the hallmark
of paper-books for over 100 years, so it's embarrassing
when our newfangled e-books fail to clear that standard.
and many people report that it is a huge turn-off to them.
(and it's hard to tell 'em not to be so picky, because frankly,
when something falls so far below expectations, it _is_ bad.)

nor are some of the _advances_ that we expect of e-books
present here (e.g., there is no hotlinked table of contents).

in these areas, we want our open-source project to do better.
we want it to be able to make a first-rate e-book in .html form,
to the extent that such an animal is possible, so that the various
converter-programs have optimal input for best possible output.

(in this regard, one of the first things i do to a p.g. e-text is to
strip off the header and footer.   sorry, chaps, but they're ugly.
and besides, the very first item in an e-book file should be
_the_title_of_the_book_, and the next should be the author.
again, sorry, but that's just the way that it should be, period.)

***

before today's exercise, let me remind people once again that...

...i am a beginner with perl...

my code is _not_ something you should emulate.
(kids, do not try this at home.   you might get hurt.)

my formatting of that code, especially, is "unusual",
and will not look very familiar to most perl people.
so be it.   i hate those stupid curly braces.   hate 'em.

i'll repeat:   i'm a beginner with perl.

moreover, _that_is_the_point_.   (cue the ring of a bell here.)

you don't need to do anything more than copy sample code
out of a programming primer to get some good functionality,
_providing_ that the file-format of your e-book is dirt-simple.

if your format is complex, like docbook or .tei or x.m.l., then
you're gonna need a sophisticated programmer to get _any_
functionality out of your e-texts, and it'll be slow in coming...

simple is better.

so i thank my "critics" who characterized my perl as elementary.
they've done a better job of making my point than i could have...

***

for your reference:
>    http://www.greatamericannovel.com/myant/myantp123.html

you will remember that i am still looking for a contribution to our
open-source thing, in the form of c.s.s., but here goes anyway...

so today's assignment is: churn out the code for that page 123.

>    #!/usr/bin/perl
>    use CGI::Carp qw(fatalsToBrowser);
>    
>    ########## read the file in...
>    $filename="/home2/yoursiteinfohere/public_html/myant/myant-lf.zml";
>    open (inf,"$filename") or print "that file was not available...<p>\n";
>    read (inf,$thebook,2222222); close inf;
>    
>    ########## changes made here include the c.s.s. stylesheet...
>    print "content-type: text/html\n\n";
>    print '<!DOCTYPE html public "-//w3c//dtd html 4.01 transitional//en">'; 
print "\n"; print "\n";
>    print '<html lang="en"><html><head><title>my antonia!';
>    print "\n";
>    print '</title><meta http-equiv="content-type" content="text/html; 
charset=iso-8859-1">';
>    print "\n"; print '<style type="text/css"><!--'; print "\n";
>    print 'p { color: rgb(99,99,49); text-indent: 2em; margin-top: .25em; 
margin-bottom: .25em; }'; print "\n";
>    print 'h3 { color: rgb(11,11,255); }'; print "\n";
>    print 'h6 { color: rgb(255,11,11); margin-top: .15em; margin-bottom: 
.05em; }'; print "\n";
>    print '--></style></head><body>'; print "\n";
>    print "\n";
>
>    ########## ok, do page 123...
>    ########## first the navigation headers...
>    print '<a href="http://www.greatamericannovel.com/myant/myantp124.html">
'; print "\n";
>    print '<img border="2" width="49%" align="right" alt="p123.png"'; print 
"\n";
>    print 'src="http://snowy.arsc.alaska.edu/bowerbird/myant/myantp123.png" 
/>'; print "\n";
>    print '</a>'; print "\n";
>    print "\n"; print '<small><small>'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp122.html"> 
p122 </a>_'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp119.html"> 
-chap- </a>_'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantc-1.html"> 
toc-1 </a>_'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp123w.html">
 p123w </a>_'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantc-2.html"> 
toc-2 </a>_'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp131.html"> 
+chap+ </a>_'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp124.html">
p124 </a>'; print "\n";
>    print "\n";
>    
>    ########## split the pages and pull out the correct one...
>    @thepage=split("{{",$thebook); foreach $thepage (@thepage) {
>    $pp++; if ($pp eq "148") { 
>    ########## now split out the lines on that page
>    @oneline=split("\n",$thepage); $maxline=@oneline; 
$maxminustwo=$maxline-2;
>    
>    ########## output the runhead...
>    print '<font color="#AAAAAA">'; print "<br>"; print '------ '; print 
"{{"; 
>    print $oneline[0]; print "</font>"; print ' </small></small></pre><p><br>
';
>    
>    ########## and all the lines on the page...
>    foreach $oneline (@oneline) {
>    $nn++; if ($nn ne "1" and $nn ne "2" and $nn < $maxminustwo) {
>    print $oneline;
>    if ($oneline ne "") {print ' <br>'; print "\n";}
>    if ($oneline eq "") {print ' <p>'; print "\n";}
>    }}}}
>    
>    ########## then the pagenumber...
>    print '<h6 align="center">[[123]]</h6><br>'; print "\n";
>    print '<small><small>'; print "\n";
>
>    ########## now repeat the navitgational links...
>    print '<a href="http://www.greatamericannovel.com/myant/myantp122.html"> 
p122 </a>_'; 
>    print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp119.html"> 
-chap- </a>_'; 
>    print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantc-1.html"> 
toc-1 </a>_';
>    print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp123w.html">
 p123w </a>_'; 
>    print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantc-2.html"> 
toc-2 </a>_'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp131.html">
';
>    print ' +chap+ </a>_'; print "\n";
>    print '<a href="http://www.greatamericannovel.com/myant/myantp124.html">
p124 </a>'; 
>    print "\n";
>    print '<hr><p>'; print "\n";
>    print "\n";
>
>    ########## now put in the error-reporting form...
>    print '<form method="post" ';
>    print 
'action="http://www.greatamericannovel.com/scgi-bin/appendcomment.pl">'; 
>    print "\n";
>    print 'name <input type="text" name="name" size="50"><br>'; print "\n";
>    print 'e-mail <input type="text" name="email" size="50"><p>'; print 
"\n";
>    print 'bad <input type="text" name="old" size="50"></p><p>'; print "\n";
>    print 'new <input type="text" name="new" size="50"></p>'; print "\n";
>    print '<textarea name="comment" rows="10" cols="50"></textarea>'; print 
"\n";
>    print '<input type="hidden" name="filenameis" 
value="myant/myantp123.html"><br>'; 
>    print "\n";
>    print '<input type="submit"> or <input type="reset">'; print "\n";
>    print '</form><hr>'; print "\n";
>    print "\n";

you can see the results of this code by running this script:
>    http://www.greatamericannovel.com/scgi-bin/babelfish10.pl

there are a number of things to notice about this particular routine,
all of which will be dealt with in further detail in coming days...

first, i've reworked the .html so as to make use of a .css stylesheet.
(this lets me indent paragraphs, instead of using blank lines between.
it also allows me to have a proportional-spaced font, not that dreadful
monospaced font that is the default whenever you use the "pre" tag.
and of course the c.s.s. will help us in the future, on the pages which
have various structural features that we will want to display properly.)

second, i have included some links to help the user with navigation,
with one set of them at the top, and an identical set at the bottom...

third, i've pulled in the scan of the page, for easy comparison.
this is necessary when we want to do "continuous proofreading".

fourth, i've added a form that readers can use to report errors,
another essential aspect of our "continuous proofing" system...

i was going to add in each of these things on a separate day,
but i figured you could absorb the shock of all of them at once.

still, at the heart of this routine, we're displaying the text of a page,
something we had already worked out previously.   and indeed, this
routine to display a page is the main "engine" in an e-book program.

as to this code, it does a good job of presenting one page, #123.

the links to the surrounding pages (like page 122 and 124)
are hardwired, however, so tomorrow's exercise will require
that we turn them into variables, so that this routine will be
able to present _any_ page in the book, not just page 123...

go ahead, feel free to have a pass at modifying this routine.

after all, that's the point of open-source, that people can
just jump in and join the coding fun any time they want to!

***

now, for some other commentary...

***

oh geez, part 4...

in other news today, some people are unhappy with
the iliad e-book-machine, because it takes 40 seconds
to boot up, and you have to shut it down if you're not
reading because otherwise the battery will run down...

it ends up that people consider this slow boot-up time
to be very "unpaperlike", which is the main claim to fame
that e-ink has been bragging.   that's not all, either, since
a relatively slow page-turning time is another liability...
and we won't even talk about a price that is over $700.

our good friend david rothman has this to say:
>    Shortcomings like this should long have been solved

only an idiot would have had the expectation that
an early version of this product would be _free_ of
such "shortcomings" as this one...

and only a _pure_ idiot would have led other people on,
in terms of creating that stupid expectation in them...

and only the most _extreme_ of pure idiots would then
lash out at the product-maker for failing to live up to
the unreasonable expectations that the idiot had created.

the unmitigated bile of unrealized hype can be very nasty.

-bowerbird

p.s.   above, i commented on the lack of formatting on an epigraph.
as you can see, by referring to my version of that same page,
>    http://www.greatamericannovel.com/myant/myantp181.html
i have chosen to format the poem differently than it was formatted
in the paper-book, which is my prerogative as a re-publisher...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061101/95ae46c6/attachment-0001.html
From jon at noring.name  Wed Nov  1 18:11:09 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  1 18:17:49 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
Message-ID: <73794793.20061101191109@noring.name>

Everyone,

For quite a while I've been wondering if line-by-line proofing of
OCR text will result in more accurate results, with higher efficiency
compared to the side-by-side page editing used in the initial
proofing stage at Distributed Proofreaders.

The difficulty I have with page-by-page editing is that the OCR text
and the original page scans are big blocks that sit side-by-side, and
I find during proofing that I have to move my eyes back and forth,
which to me is fairly tiring as I try to realign my view -- it
definitely slows me up and I always sense I may be missing something.

Now, if instead we had the following in our proofing window display:

original scan line   -->  development. He is always able to raise capi-
OCR text/edit window -->  developmenl, He is always.able to ra6e capi-

[Of course the original scan line is an actual image of the line, not
ASCII text as shown above. It is scaled as close as possible to the
OCR text line below which is user-editable. And the OCR text example
is something I made up, so don't criticize the choice of OCR errors!
Certainly the standard PG/DP scripts can be run to remove some to most
of the OCR errors before the line-by-line human proofing stage.]

This alignment allows me to do a vertical comparison, which I think
may make it easier to spot any OCR errors. It should, at least for
some people, increase the speed and accuracy of proofing. Well,
that's the hypothesis at least.

Now, certainly it will be argued that the proofer should be able to
see the entire page scan, such as for context, simple pleasure, and to
see if there were errors in generating the page image line. I agree!
But the original page can certainly be displayed to the proofer in a
separate window or to the side. So it is possible to have both (as well
as offer both proofing methods -- there are definitely pages with odd
text layouts where page-level proofing may be more appropriate.)

So, asking the proofing mavens here, has this been tested? What are
the fatal flaws in this? I can't help but think that this has already
been thought of and discarded by Charles Franks when he started DP.
But then, technology has changed the last few years, and maybe this
idea may again be considered.

Jon


From grythumn at gmail.com  Wed Nov  1 18:41:05 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Wed Nov  1 18:47:24 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
In-Reply-To: <73794793.20061101191109@noring.name>
References: <73794793.20061101191109@noring.name>
Message-ID: <15cfa2a50611011841x5abc4bacxa00c2b468827825c@mail.gmail.com>

I've used similar techniques when single-proofing in an OCR program,
and the trouble is one often needs to zoom out for context.. plus the
fact that'd we'd need to extract character or line position
information from the OCR engines to automate matching the text to the
image.

However, what you've asked for can be manually approximated using the
horizontal interface at DP.. enlarge the font size in the text window,
and increase the zoom level in the image area. You'll get three or
four lines of text, one above the other.

R C

On 11/1/06, Jon Noring <jon@noring.name> wrote:
> Everyone,
>
> For quite a while I've been wondering if line-by-line proofing of
> OCR text will result in more accurate results, with higher efficiency
> compared to the side-by-side page editing used in the initial
> proofing stage at Distributed Proofreaders.
From schultzk at uni-trier.de  Thu Nov  2 00:05:28 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu Nov  2 00:05:35 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <45477F3D.70205@perathoner.de>
References: <c14.79d1e63.3277cf5e@aol.com>
	<1162248212.5857.1.camel@localhost.localdomain>
	<45477F3D.70205@perathoner.de>
Message-ID: <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>

Hi Marcello,

	I will ask my question again: Do you know what you are doing?


	Why do not you can your comments to another list, please.

	Your are worse than a kindergarden kid.

	Personally, I do not know any of the systems, but from what I
	heard and seen they are all primitive and none do the JOB and will.
	
	I know what it takes top do the job! It is my profession: Linguistics.

	Have you heard of SGML? If that is to complex( not complicated) then  
use
	XML. But, the problem is not the format, but getting the formatting  
done.
	According to my analysis so far automatic formating can only be done  
to a
	max of 80%. The rest has to be proofed manually.

	In the early days of PG I have discussed the matter with Micheal  
Hart that
	plain ASCII is not enough. Today, computers have advance and  
computing power
	is abundant. My opinion is that PG start using a markup language  
from the start.
	Sure the scanning and especially the proofing of the text will take  
a little longer,
	but the benifits are far greater.
	
	The markup should conatin:
		Chapter
		section,
		Character formating,
		PG Header,
		picture, sound, etc
	tags.

	Hey XML can do all that. All we need is a common xml template. One  
format! a known straucture
	a few filters and voila. a neat package exactly what everybody is  
trying to create.
	If you scan into word, and use a few macros(or one big one) you can  
get 90-95% of the mark-up done.
	Now add 10% mor time for proofing and you guys and gals have just  
what you will ever need.


	 regards
		Keith.
	
Am 31.10.2006 um 17:52 schrieb Marcello Perathoner:

> David A. Desrosiers wrote:
>
>> Its obvious from reading the snippets, that it is indeed copied  
>> out of a
>> rudimentary Perl primer, and not touched by anyone who has a strong
>> grasp of the power of the language at hand.
>
> He's a baby that makes poo in the chamberpot for the first time and
> thinks his parents are watching him because they want poo.
>
>
>> Exactly what is it you are trying to prove with this anyway? We  
>> know how
>> to write parsers that can chew up and spit out a Gutenberg etext into
>> other formats, I don't think that's the core of the problem here.
>
> He's just inventing warm water (and trying to get credit for it).
>
> This parser is online. It converts any PG text into a plucker  
> database.
> And it is open source and written in gasp! python. We have served
> 130,000 plucker texts in October this way. The only guy who hasn't
> noticed yet is him who notices everything.
>
> There are a few other PG parsers around like GutenMark and my PG to  
> TEI
> converter. All of them are open source and working today. So its only
> natural that you-know-who will hold his non-working
> at-the-rate-its-going-never-to-be-released zml parser against them,  
> just
> for the fun of causing confusion. Ever wondered who pays him to  
> fuzz and
> fudge?
>
>
>
> -- 
> Marcello Perathoner
> webmaster@gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From hyphen at hyphenologist.co.uk  Thu Nov  2 00:55:49 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu Nov  2 00:56:22 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
References: <c14.79d1e63.3277cf5e@aol.com>
	<1162248212.5857.1.camel@localhost.localdomain>
	<45477F3D.70205@perathoner.de>
	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
Message-ID: <bgcjk2dpc5vaq4m57fgjkup1402ef8a029@4ax.com>

On Thu, 2 Nov 2006 09:05:28 +0100,  "Schultz Keith J."
<schultzk@uni-trier.de> wrote:

|Hi Marcello,
|
|	I will ask my question again: Do you know what you are doing?
|
|
|	Why do not you can your comments to another list, please.
|
|	Your are worse than a kindergarden kid.
|
|	Personally, I do not know any of the systems, but from what I
|	heard and seen they are all primitive and none do the JOB and will.
|	
|	I know what it takes top do the job! It is my profession: Linguistics.
|
|	Have you heard of SGML? If that is to complex( not complicated) then  
|use
|	XML. But, the problem is not the format, but getting the formatting  
|done.
|	According to my analysis so far automatic formating can only be done  
|to a
|	max of 80%. The rest has to be proofed manually.
|
|	In the early days of PG I have discussed the matter with Micheal  
|Hart that
|	plain ASCII is not enough. Today, computers have advance and  
|computing power
|	is abundant. My opinion is that PG start using a markup language  
|from the start.
|	Sure the scanning and especially the proofing of the text will take  
|a little longer,
|	but the benifits are far greater.
|	
|	The markup should conatin:
|		Chapter
|		section,
|		Character formating,
|		PG Header,
|		picture, sound, etc
|	tags.
|
|	Hey XML can do all that. All we need is a common xml template. One  
|format! a known straucture
|	a few filters and voila. a neat package exactly what everybody is  
|trying to create.
|	If you scan into word, and use a few macros(or one big one) you can  
|get 90-95% of the mark-up done.
|	Now add 10% mor time for proofing and you guys and gals have just  
|what you will ever need.


ROTFLMAO

When you learn to format things in plain text someone might listen.

-- 
Dave Fawthrop <dave hyphenologist co uk> For Yorkshire Dialect 
http://www.gutenberg.org/author/John_Hartley
http://www.gutenberg.org/author/F_W_Moorman
19,000 free e-books at Project Gutenberg! http://www.gutenberg.org

From marcello at perathoner.de  Thu Nov  2 03:54:07 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  2 03:54:11 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>
	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
Message-ID: <4549DC5F.8010002@perathoner.de>

Schultz Keith J. wrote:

>     Hey XML can do all that. All we need is a common xml template. One
> format! a known straucture
>     a few filters and voila. a neat package exactly what everybody is
> trying to create.

This is the reason why we have to shut up BB: People reading this list
will think that the most vociferous person represents the consensus in
PG research. Not so. BB just pesters eveybody who doesn't want to hear
with *his* at best half-baked ideas about text representation and
delivery. Nobody takes BB seriously, and you shouldn't too.


The state of PG research is:

Consensus has been reached about using a subset of TEI as master format
for PG texts (since PGXML seems to be dead). Which subset is still being
discussed.

There are at least 2 different working toolchains to convert subsets of
TEI to end user formats. Files produced with these toolchains have been
posted.

Of course, everything is still in active research and can change a lot.
But nobody seriously considers using anything other than TEI or XML as
master format.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From mattsen at arvig.net  Thu Nov  2 03:57:26 2006
From: mattsen at arvig.net (Chuck MATTSEN)
Date: Thu Nov  2 04:11:57 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <4549DC5F.8010002@perathoner.de>
References: <c14.79d1e63.3277cf5e@aol.com>
	<1162248212.5857.1.camel@localhost.localdomain>
	<45477F3D.70205@perathoner.de>
	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
	<4549DC5F.8010002@perathoner.de>
Message-ID: <op.tid0t0iw989pjw@notebook>

On Thu, 02 Nov 2006 05:54:07 -0600, Marcello Perathoner  
<marcello@perathoner.de> wrote:

> This is the reason why we have to shut up BB: People reading this list
> will think that the most vociferous person represents the consensus in
> PG research. Not so. BB just pesters eveybody who doesn't want to hear
> with *his* at best half-baked ideas about text representation and
> delivery. Nobody takes BB seriously, and you shouldn't too.

Oh, I dunno ... I think any thinking person reading the list will quickly  
be able to discern the intent behind, and value of, any frequent flyer.   
:-)

-- 
Chuck Mattsen (Mahnomen, MN)
mattsen@arvig.net
From joshua at hutchinson.net  Thu Nov  2 05:29:55 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  2 05:30:02 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
Message-ID: <15656271.1162474195305.JavaMail.?@fh1038.dia.cp.net>

At first blush, it seems like small return of investment.  The 
programming required (as well as the difference in how scans/ocr are 
prepared) would be very significant, while the increase in quality 
would be minuscule.  DP gets very good results with their current 
method and I think a better return on the programming investment would 
be to implement a "roundless" system, where each page is proofed again 
and again until a certain "confidence" level is reach.  Easy pages may 
only be seen a couple times, while a particularly nasty page might get 
seen by scores of people.  (See DP forums for length discussions of how 
this system might work.)

But, as always, the bottleneck is developer time.  We ALWAYS have more 
work than we have volunteers to do it.

Josh

>----Original Message----
>From: jon@noring.name
>Date: Nov 1, 2006 21:11 
>To: <gutvol-d@lists.pglaf.org>
>Subj: [gutvol-d] Line by line proofing of OCR text?
>
>Everyone,
>
>For quite a while I've been wondering if line-by-line proofing of
>OCR text will result in more accurate results, with higher efficiency
>compared to the side-by-side page editing used in the initial
>proofing stage at Distributed Proofreaders.
>
>The difficulty I have with page-by-page editing is that the OCR text
>and the original page scans are big blocks that sit side-by-side, and
>I find during proofing that I have to move my eyes back and forth,
>which to me is fairly tiring as I try to realign my view -- it
>definitely slows me up and I always sense I may be missing something.
>
>Now, if instead we had the following in our proofing window display:
>
>original scan line   -->  development. He is always able to raise 
capi-
>OCR text/edit window -->  developmenl, He is always.able to ra6e 
capi-
>
>[Of course the original scan line is an actual image of the line, not
>ASCII text as shown above. It is scaled as close as possible to the
>OCR text line below which is user-editable. And the OCR text example
>is something I made up, so don't criticize the choice of OCR errors!
>Certainly the standard PG/DP scripts can be run to remove some to 
most
>of the OCR errors before the line-by-line human proofing stage.]
>
>This alignment allows me to do a vertical comparison, which I think
>may make it easier to spot any OCR errors. It should, at least for
>some people, increase the speed and accuracy of proofing. Well,
>that's the hypothesis at least.
>
>Now, certainly it will be argued that the proofer should be able to
>see the entire page scan, such as for context, simple pleasure, and 
to
>see if there were errors in generating the page image line. I agree!
>But the original page can certainly be displayed to the proofer in a
>separate window or to the side. So it is possible to have both (as 
well
>as offer both proofing methods -- there are definitely pages with odd
>text layouts where page-level proofing may be more appropriate.)
>
>So, asking the proofing mavens here, has this been tested? What are
>the fatal flaws in this? I can't help but think that this has already
>been thought of and discarded by Charles Franks when he started DP.
>But then, technology has changed the last few years, and maybe this
>idea may again be considered.
>
>Jon
>
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From joshua at hutchinson.net  Thu Nov  2 05:33:18 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  2 05:33:21 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <24690987.1162474398658.JavaMail.?@fh1038.dia.cp.net>

Are you sure you meant to address those comments to Marcello?

What you are talking about is what Marcello has done.  bowerbird is 
the kindergarten kid you seem to be talking about...

Josh

>----Original Message----
>From: schultzk@uni-trier.de
>Date: Nov 2, 2006 3:05 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d@lists.pglaf.
org>
>Subj: Re: [gutvol-d] gvd061030 -- let&#39;s get it started in here
>
>Hi Marcello,
>
>	I will ask my question again: Do you know what you are doing?
>
>
>	Why do not you can your comments to another list, please.
>
>	Your are worse than a kindergarden kid.
>
>	Personally, I do not know any of the systems, but from what I
>	heard and seen they are all primitive and none do the JOB and will.
>	
>	I know what it takes top do the job! It is my profession: 
Linguistics.
>
>	Have you heard of SGML? If that is to complex( not complicated) 
then  
>use
>	XML. But, the problem is not the format, but getting the 
formatting  
>done.
>	According to my analysis so far automatic formating can only be 
done  
>to a
>	max of 80%. The rest has to be proofed manually.
>
>	In the early days of PG I have discussed the matter with Micheal  
>Hart that
>	plain ASCII is not enough. Today, computers have advance and  
>computing power
>	is abundant. My opinion is that PG start using a markup language  
>from the start.
>	Sure the scanning and especially the proofing of the text will 
take  
>a little longer,
>	but the benifits are far greater.
>	
>	The markup should conatin:
>		Chapter
>		section,
>		Character formating,
>		PG Header,
>		picture, sound, etc
>	tags.
>
>	Hey XML can do all that. All we need is a common xml template. One  
>format! a known straucture
>	a few filters and voila. a neat package exactly what everybody is  
>trying to create.
>	If you scan into word, and use a few macros(or one big one) you 
can  
>get 90-95% of the mark-up done.
>	Now add 10% mor time for proofing and you guys and gals have just  
>what you will ever need.
>
>
>
>	 regards
>		Keith.
>	
>Am 31.10.2006 um 17:52 schrieb Marcello Perathoner:
>
>> David A. Desrosiers wrote:
>>
>>> Its obvious from reading the snippets, that it is indeed copied  
>>> out of a
>>> rudimentary Perl primer, and not touched by anyone who has a 
strong
>>> grasp of the power of the language at hand.
>>
>> He's a baby that makes poo in the chamberpot for the first time and
>> thinks his parents are watching him because they want poo.
>>
>>
>>> Exactly what is it you are trying to prove with this anyway? We  
>>> know how
>>> to write parsers that can chew up and spit out a Gutenberg etext 
into
>>> other formats, I don't think that's the core of the problem here.
>>
>> He's just inventing warm water (and trying to get credit for it).
>>
>> This parser is online. It converts any PG text into a plucker  
>> database.
>> And it is open source and written in gasp! python. We have served
>> 130,000 plucker texts in October this way. The only guy who hasn't
>> noticed yet is him who notices everything.
>>
>> There are a few other PG parsers around like GutenMark and my PG 
to  
>> TEI
>> converter. All of them are open source and working today. So its 
only
>> natural that you-know-who will hold his non-working
>> at-the-rate-its-going-never-to-be-released zml parser against 
them,  
>> just
>> for the fun of causing confusion. Ever wondered who pays him to  
>> fuzz and
>> fudge?
>>
>>
>>
>> -- 
>> Marcello Perathoner
>> webmaster@gutenberg.org
>>
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From bill at williamtozier.com  Thu Nov  2 05:36:41 2006
From: bill at williamtozier.com (William Tozier)
Date: Thu Nov  2 05:36:52 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
In-Reply-To: <73794793.20061101191109@noring.name>
References: <73794793.20061101191109@noring.name>
Message-ID: <680E209C-05DF-476C-B31F-F47D08FB88BD@williamtozier.com>


On Nov 1, 2006, at 9:11 PM, Jon Noring wrote:

> So, asking the proofing mavens here, has this been tested? What are
> the fatal flaws in this? I can't help but think that this has already
> been thought of and discarded by Charles Franks when he started DP.
> But then, technology has changed the last few years, and maybe this
> idea may again be considered.

Far from being flawed, it's how I proof as well. As another  
respondent already pointed out, the DP proofing interface can be  
restructured to do something like this. Unfortunately, the diversity  
of proofers abilities, habits and interface preferences make it hard  
to standardize this sort of thing. Even fonts differ from platform to  
platform; if we're not working in Flash or some other typographically  
fixed standard, this sort of thing is sunk.

I'd say, though, that more important than presenting single lines to  
the reader, the act of forcing the gaze of the proofer to *follow*  
lines is what you're really looking for.

When proofing, I always ensure that the insertion cursor in the  
page's text field touches every character -- essentially I click  
before the first letter, and right-arrow through the entire text. Not  
least because this spell-checks every word (client-side, on my Mac),  
but also because the result is a word-by-word serial visit to every  
portion of the page. Even without Flash, we could imagine a number of  
interface elements that do this sort of thing: Something that  
serially highlights every word, two per second; an audible reader; a  
requirement that the cursor visit each letter before a page is  
considered done.

When I was a professional proofreader in a large academic printer,  
there were a number of old tried-and-true tricks we were taught:  
reading the text backwards, reading it aloud to a partner complete  
with punctuation, &c. But they all boiled down to getting the reader  
to look at the typeset page as a proofer, not a reader.  Slowing them  
down to the point where their eyes' habits were no longer  
comfortable, and they saw more of everything. Prohibiting saccades,  
among other things, and allowing them pay attention to short- and  
medium-scale textual patterns at the same time.

There are nearsighted little old ladies and 24-inch monitor-users  
among us at DP, and their ability to customize the interface and the  
presentation of the work is probably much more a boon than a threat:  
it invites more people to work. What we might consider is changing  
what that work is, to make it more obvious that it is not the kind of  
reading they're used to.
-----
Bill Tozier
AIM:    vaguery@mac.com
blog:   http://williamtozier.com/slurry
plazes: http://beta.plazes.com/user/BillTozier
skype:  vaguery

"Nature, however picturesque, never yet made a poet of a dullard."
   --Hjalmar Hjorth Boyesen


From desrod at gnu-designs.com  Thu Nov  2 06:20:32 2006
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Thu Nov  2 06:20:54 2006
Subject: [gutvol-d] gvd061101 -- the niceties of book typography
In-Reply-To: <4ad.3840eb24.327a8cac@aol.com>
References: <4ad.3840eb24.327a8cac@aol.com>
Message-ID: <1162477232.10976.36.camel@localhost.localdomain>

On Wed, 2006-11-01 at 18:50 -0500, Bowerbird@aol.com wrote:

> ...i am a beginner with perl... 
            ^^^^^^^^

You spelled "dangerous" wrong. ;) 


-- 
David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061102/7ae9287e/attachment.bin
From jon at noring.name  Thu Nov  2 07:54:35 2006
From: jon at noring.name (Jon Noring)
Date: Thu Nov  2 07:54:54 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
In-Reply-To: <680E209C-05DF-476C-B31F-F47D08FB88BD@williamtozier.com>
References: <73794793.20061101191109@noring.name>
	<680E209C-05DF-476C-B31F-F47D08FB88BD@williamtozier.com>
Message-ID: <187839134.20061102085435@noring.name>

I'll answer both Joshua and Bill in this message...


Joshua Hutchinson wrote:

> At first blush, it seems like small return of investment.  The 
> programming required (as well as the difference in how scans/ocr are 
> prepared) would be very significant, while the increase in quality 
> would be minuscule.  DP gets very good results with their current 
> method and I think a better return on the programming investment would 
> be to implement a "roundless" system, where each page is proofed again 
> and again until a certain "confidence" level is reach.  Easy pages may 
> only be seen a couple times, while a particularly nasty page might get 
> seen by scores of people.  (See DP forums for length discussions of how 
> this system might work.)

A roundless approach definitely is smarter. Compare a page edit with
the prior edit, and when one does not see any new corrections, maybe
twice or three times in a row, there's high confidence the page has
been proofed to zero errors.

Since it seems like the real bottleneck at present in DP (at least
this is my understanding) is the latter stages, not the initial
proofing, then there should be no loss in throughput by implementing
this page edit comparison to get, hopefully, very high accuracy.


> But, as always, the bottleneck is developer time.  We ALWAYS have more 
> work than we have volunteers to do it.

Yep, this is one of the Laws of the Universe: There will never be
enough developers to do the job as one wants. <laugh/>


Bill Tozier wrote:

> I'd say, though, that more important than presenting single lines to  
> the reader, the act of forcing the gaze of the proofer to *follow*  
> lines is what you're really looking for.

Yes, this is definitely one of the problems I have with the current
system, knowing where one is on both the original page scan and the
text edit box. It requires effort for the mere mortal to realign
oneself as one goes back and forth between the original and the
proofed text. This realignment is, for a mere mortal like me at least,
pretty uncomfortable and quite inefficient.

For those with photographic memories (I am not of this elite), the
page-by-page approach probably works well for them. So, yes,
everyone is different in their abilities and preferences to proof
pages. I think the line-by-line approach should at least be
experimented with, and I'll look into doing so.


> When proofing, I always ensure that the insertion cursor in the  
> page's text field touches every character -- essentially I click  
> before the first letter, and right-arrow through the entire text. Not  
> least because this spell-checks every word (client-side, on my Mac),  
> but also because the result is a word-by-word serial visit to every  
> portion of the page. Even without Flash, we could imagine a number of  
> interface elements that do this sort of thing: Something that  
> serially highlights every word, two per second; an audible reader; a  
> requirement that the cursor visit each letter before a page is  
> considered done.

Interesting.


> When I was a professional proofreader in a large academic printer,  
> there were a number of old tried-and-true tricks we were taught:  
> reading the text backwards, reading it aloud to a partner complete  
> with punctuation, &c. But they all boiled down to getting the reader  
> to look at the typeset page as a proofer, not a reader.

Again interesting. The line-by-line approach definitely forces this
naturally, because usually there's little interesting content-wise in
a single line to distract -- it also eliminates reading since one is
doing a vertical comparison, rather than horizontal.


> There are nearsighted little old ladies and 24-inch monitor-users  
> among us at DP, and their ability to customize the interface and the  
> presentation of the work is probably much more a boon than a threat:  
> it invites more people to work. What we might consider is changing  
> what that work is, to make it more obvious that it is not the kind of  
> reading they're used to.

One thing I like with the line-by-line system is that it might even
allow proofing on limited hardware, like PDA's. Here we might not even
allow the proofer to make any edits -- but simply to flag whether the
text is right or not. (Hmmm, this is interesting). If the line gets
flagged 2-3 times that no edits occured, we assume it is proofed to
zero errors. If flagged as having an error, then someone else can
actually do the edit. I surmise that with the quality of OCR today,
plus the PG/DP tools to pre-process an OCR text, that in an *average*
book the percentage of lines with errors will be fairly low (less than
10% ???). Anyway...

*****

Now, it is my understanding that most advanced OCR packages can
produce an XML document of the raw OCR text, and the XML data includes
the bounding box information (the coordinates on the original page
scan where a word occurs) and line information. (I'm sure what I just
described is well-known among most of the PG/DP OCR experts, but I'm
sharing it with the others here who may not be aware.)

For example, here's a link which Branko Collin posted a few months ago
in a comment to the TeleRead blog. It points to one of these XML
documents, produced by DJVU OCR:

   http://ia201107.eu.archive.org/2/items/englishbookbindings00davenuoft/englishbookbindings00davenuoft_djvuxml.xml

(depending upon one's browser, you may have to look at the source to
see the bare document.)

This XML document contains all the raw OCR text associated with each
scanned page in the DJVU book.

Here's a snippet of the markup from somewhere in the middle, for
"page 36":


<PARAM name="PAGE" value="englishbookbindings00davenuoft_0036.djvu" />
  <HIDDENTEXT>
    <PAGECOLUMN>
      <REGION>
        <PARAGRAPH>
          <LINE>
            <WORD coords="644,531,904,473">XXVIII</WORD>
          </LINE>
        </PARAGRAPH>
        <PARAGRAPH>
          <LINE>
            <WORD coords="1168,537,1785,452">GENERAL</WORD>
          </LINE>
        </PARAGRAPH>
        <PARAGRAPH>
          <LINE>
            <WORD coords="1868,543,2868,457">INTRODUCTION</WORD>
          </LINE>
        </PARAGRAPH>
        <PARAGRAPH>
          <LINE>
            <WORD coords="646,747,811,652">the</WORD>
            <WORD coords="890,780,1457,653">eighteenth</WORD>
            <WORD coords="1534,782,1939,670">century</WORD>
            <WORD coords="2004,748,2062,689">a</WORD>
            <WORD coords="2128,751,2343,689">new</WORD>
            <WORD coords="2414,783,2703,689">grace</WORD>
            <WORD coords="2778,754,2980,692">was</WORD>
            <WORD coords="3058,754,3375,660">added</WORD>
          </LINE>
          <LINE>
            <WORD coords="644,922,775,794">by</WORD>
            <WORD coords="832,892,999,798">the</WORD>
            <WORD coords="1067,927,1513,800">inlaying</WORD>
            <WORD coords="1580,893,1689,801">of</WORD>
            <WORD coords="1745,893,1803,832">a</WORD>
            <WORD coords="1871,894,2235,799">leather</WORD>
            <WORD coords="2301,893,2410,801">of</WORD>
            <WORD coords="2466,892,2524,833">a</WORD>
            <WORD coords="2592,896,2951,806">second</WORD>
            <WORD coords="3015,897,3374,804">colour.</WORD>
          </LINE>
<!-- snip -->


For each line, we can easily determine the top line and bottom line
coordinates so the "strip" of the page scan associated with the line
can be displayed (as well as where the first word in the line starts
and where the final word ends -- useful for alignment of the strip
with the editable text.)

[We have a knotty problem if changes are made to the text in a line,
in rewriting the edits back into the XML (I won't explain why.) So we
only use the XML bounding box information to give us the coordinates
of the 'strip' in the image associated with a line, but we won't
update the original XML document. We might produce a different XML doc
with the edited results, though, viz.

<paragraph>
   <line tcoord="652"
         bcoord="783"
         lcoord="646"
         rcoord="3375">the eighteenth century a new grace was added</line>
   <line tcoord="794"
         bcoord="927"
         lcoord="644"
         rcoord="3374">by the inlaying of a leather of a second colour.</line>
   ...


Jon Noring

From marcello at perathoner.de  Thu Nov  2 08:55:46 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  2 08:55:51 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
In-Reply-To: <187839134.20061102085435@noring.name>
References: <73794793.20061101191109@noring.name>	<680E209C-05DF-476C-B31F-F47D08FB88BD@williamtozier.com>
	<187839134.20061102085435@noring.name>
Message-ID: <454A2312.7080700@perathoner.de>

Jon Noring wrote:

> Yes, this is definitely one of the problems I have with the current
> system, knowing where one is on both the original page scan and the
> text edit box. It requires effort for the mere mortal to realign
> oneself as one goes back and forth between the original and the
> proofed text. This realignment is, for a mere mortal like me at least,
> pretty uncomfortable and quite inefficient.

The quick fix would be to implement a function that puts a horizontal
ruler on the image window if you click on it. (And scrolls the window so
the ruler is in the vertical middle.)

A few lines of JavaScript will do that. Firefox will even support
opacity so you can highlight a portion of text.


>           <LINE>
>             <WORD coords="646,747,811,652">the</WORD>
>             <WORD coords="890,780,1457,653">eighteenth</WORD>

Why not break the whole text down into words and use it as captcha
(http://en.wikipedia.org/wiki/Captcha) for the PG website? Everybody who
wants to download a file has to decipher a word. Haha, only serious.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Nov  2 11:49:02 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  2 11:49:09 2006
Subject: [gutvol-d] gvd061102 -- thoughts on a thursday
Message-ID: <d02.cb720f.327ba5ae@aol.com>

today i'll wait for someone else to contribute to babelfish,
our little open-source project here...

but i have some other thoughts...

***

jon noring said:
>    I can't help but think that this has already been thought of 
>    and discarded by Charles Franks when he started DP.

well, that's probably because charles floated this very idea at 
the meetings held in san francisco for the 10,000th e-text...
so that's where you "got" the idea, jon.

i've tested the method, and yes, it would work just fine, except
there's no need to do line-by-line proofing at _all_ these days,
so finding a "better" way to go about doing it is irrelevant...

besides, i'm not sure how this would fit in the d.p. interface,
what with the slicing of each scan into dozens of files...

as for the coordinates of each line or word...

although it's simple enough to get that coordinate information
from an o.c.r. program, it is also very simple to write a routine
that collects the information just by examining the actual scan.

a screenshot of output from such a routine can be seen here:
>   http://snowy.arsc.alaska.edu/bowerbird/misc/line-determination.jpg

the number to the left of each line gives its topmost pixel, while
the number to its right indicates the height of its bottom pixels.
considering that the setting of this type was a _manual_ process,
the leading is amazingly consistent throughout, as you'll notice.
those typesetters really had their craft down...

i use this routine to highlight a line -- as shown in the graphic --
where a possible error might exist.   my program also selects the
questionable text -- in the editfield displayed next to the scan --
because automating this boring manual work of doing a correction
makes the process go much more quickly.   the proofer's attention is
drawn to the red-highlighted line on the scan so they can read that,
and then focus on the text-in-question to correct it when necessary.

***

jon said:
>    A roundless approach definitely is smarter.

gee, when both jon and josh agree with me,
i figure it won't be long before things change.

unfortunately "not long" is not the same thing as 
"soon" over in the land of distributed proofreaders.

meanwhile, any more reaction to the duguid article?

heck, noring's little "idea" about a proofing wrinkle
has pulled more commentary than duguid's piece...

so it's a good think duguid took his piece to the public,
instead of letting it get buried by taking it to d.p. alone.

***

jon said:
>   Yep, this is one of the Laws of the Universe: 
>    There will never be enough developers to 
>    do the job as one wants. <laugh/>

you should try the "open-source community",
where there are scads of programmers who
will happily do your programming for free...

***

marcello said:
>    Nobody takes BB seriously

wishful thinking!

the .tei folks have been touting their "solution"
for 5 years now, and nothing has yet materialized.

and -- in the 3 years i've been on this listserve --
the size of the library doubled to 20,000 e-texts.

when i've mirrored the whole thing in z.m.l. format,
and can maintain the entire library in my spare time,
while p.g. is still trying to figure out what kind of .tei
they're gonna settle on, and then goes begging for
the expertise needed to maintain that complex format,
let alone get any useful functionality out of it, we'll see
who takes whom seriously...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061102/4d3f06bc/attachment-0001.html
From jon at noring.name  Thu Nov  2 12:21:11 2006
From: jon at noring.name (Jon Noring)
Date: Thu Nov  2 12:21:31 2006
Subject: [gutvol-d] gvd061102 -- thoughts on a thursday
In-Reply-To: <d02.cb720f.327ba5ae@aol.com>
References: <d02.cb720f.327ba5ae@aol.com>
Message-ID: <1455561431.20061102132111@noring.name>

jon noring said:

>>??  I can't help but think that this has already been thought of
>>??  and discarded by Charles Franks when he started DP.

>  well, that's probably because charles floated this very idea at 
>  the meetings held in san francisco for the 10,000th e-text...
>  so that's where you "got" the idea, jon.

Is that the meeting held at the Internet Archive which you and I
attended? I don't remember Charles mentioning this technique, nor
again when I met him in Las Vegas a few months later. So if he did,
it has bounced around in my subconscious for a while and only now
is emerging as I see a need for it.

Charles (if you're still there), and Juliet, was the line-by-line
editing method mentioned at the PG/IA bash?


>  you should try the "open-source community",
>  where there are scads of programmers who
>  will happily do your programming for free...

Agreed, but even there, there's never enough volunteers to do all that
is often needed.

Jon

From Bowerbird at aol.com  Thu Nov  2 14:44:25 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  2 14:44:31 2006
Subject: [gutvol-d] sometimes i read the funniest things
Message-ID: <c34.5188007.327bcec9@aol.com>

sometimes i read the funniest things.         :+)

like when carlo said this, about me, over on the d.p. boards:
>   he regurarly googles himself, and might come back if he 
>    is named in an open forum. This one is currently not open, 
>    hence it is not indexed, but it is better to edit the posts 
>    with the name anyway.

don't be silly, carlo.   i haven't done a vanity search in ages,
mostly because "bowerbird" turns up too many false alarms.
(it seems there are some birds called that, in australia and
new guinea.   who knew?)          ;+)

you really think i care that i'm mentioned over there?
especially since the mentions are uniformly asinine?

i read the d.p. boards to learn stuff about digitizing.
i like to do my homework.

***

for instance, laura said this, yesterday:
>    For what it's worth, this is how Wikipedia handles equations 
>    on the pages where the need exists. Any mathematical equation 
>    is enclosed in <math> tags, and the rest of the page is dealt with 
>    normally. The server gives the reader a .png of the result, 
>    when the page is viewed.

that's interesting.   wikipedia serves up a graphic of an equation.
some of you people should go tell them how inadequate that is.

***

and here's what josh said just the other day:
>   PGTEI is extremely useful for one big reason. ?
>    You create one master document and then 
>    the system creates the other files automatically.

of course, the only two formats they are outputting to
-- even when they _do_ create an occasional .tei text --
is .pdf and .html, which is straightforward enough that
it can be done with the much-easier-to-maintain z.m.l.

because of this, the "multiple-formats" rationale will not fly
here, because i'll shoot it down, but over on the d.p. boards,
they're still giving it as the reason volunteers need to love .tei.

***

and carlo, speaking of you, here's another thing you just said:
>   The guiprep user guide that you quoted gives the wrong 
>    impression that you have to use rtf files. This is necessary 
>    only if you extract italics and bold markup, that should be 
>    avoided. For manual pre-processing txt is much better.

you're actually _recommending_ save people o.c.r. to plain-ascii,
which means all the formatting has to be reapplied by volunteers.
man, how laughable and backward is _that_?   like i've said before,
it's a good thing those volunteers don't realize how much of their
time and energy you are _wasting_, or they wouldn't stick around.

it's 2006, we've got full-on word-processing on a web-page, and
distributed proofreaders still strips everything back to raw ascii...

and telling people they have to do .tei markup to get .html and .pdf.

so hey, i'd have _plenty_ to talk about if i _was_ on the d.p. boards.

***

however, lucy's right, i have much better things to do than waste my
time and energy over there on the d.p. boards.   but if i _do_ decide
it might be fun to come and start in again, y'all will be blessed with
256 more posts -- on top of the 256 posts i've made so far, which
evidently made an impression, since you're still talking about me --
before i call another halt, so yeah, maybe you better not mention me.

(believe me, elisa, i don't need to go anywhere near 32,768 posts;
512 will be _more_ than sufficient to be remembered a long time.
especially since i've now started delivering the pudding...)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061102/8ad08564/attachment.html
From cannona at fireantproductions.com  Thu Nov  2 17:27:47 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Nov  2 17:27:56 2006
Subject: [gutvol-d] sometimes i read the funniest things
References: <c34.5188007.327bcec9@aol.com>
Message-ID: <003901c6fee7$495ba0c0$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bowerbird wrote:

for instance, laura said this, yesterday:
>    For what it's worth, this is how Wikipedia handles equations
>    on the pages where the need exists. Any mathematical equation
>    is enclosed in <math> tags, and the rest of the page is dealt with
>    normally. The server gives the reader a .png of the result,
>    when the page is viewed.

that's interesting.   wikipedia serves up a graphic of an equation.
some of you people should go tell them how inadequate that is.


Hmmm...  Let's think about this.  Wikipedia renders latex into images and
serves those images by default.  Your program just displays the images and
leaves it up to the ebook producer to render/scan/draw them.  Wikipedia lets
you change the default behavior and have the latex sent to the browser
directly.  Your program gives the user no choice in the matter.  So, once
again, the only thing that's inadequate is your zml.

Aaron Cannon

- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFFSpsbI7J99hVZuJcRAvp+AJoCozh5nXEGTvEIB1HpDIsyQRVNGACeIJZh
VVctWUw0IOAsoDVHRo3/yGI=
=iCTH
-----END PGP SIGNATURE-----

From schultzk at uni-trier.de  Fri Nov  3 00:33:52 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri Nov  3 00:33:58 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <4549DC5F.8010002@perathoner.de>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>
	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
	<4549DC5F.8010002@perathoner.de>
Message-ID: <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>

Hi Marcello,

	I YOU DO NOT take Bowerbird seriously, then why do you
	even make the effort to be absolutely nasty to him.
	If he is not to be taken seriously THEN IGNORE him and
	with all so called pests, he will go away.

	First off,  his method is not any better than anything or worse
	than your socalled TEI ! As you said it is not finished nor
	anywhere near it. The TEI movement has been around for at least
	5 years. As far as I am concerned it is vaporware. So if Bowerbird
	wants to get something going let him do it.

	THE ACTUAL PROBLEM PG has that there is no official specification
	to an official format or markup!! If there was one it would only take
	at the max 2 months to get something working. BUT, for the past 10
	years all I see is bits and pieces and a lot of discussion.

	Since the begining of PG I have discuss and ask Micheal Hart for a  
change
	in concept that PG use something else than PLAIN VANILLA ASCII as  
the BASE
	format for PG. To do conversion after scanning and especially  
proofing IS
	NOT the way to GO! It is a waste of resources. Furthermore, everbody  
wants
	to reinvent the wheel. Micheal Hart has to step in and allow for a  
change.
	We can always still make plain vanilla texts availible.

	It is not hard for the people doing the scanning and proofing to  
learn a official
	structure and use the tools that we can create.

	How about we all sitting done and discussing an official format and  
specifing it and
	geting the tools implemented? Instead of everybody doing there own  
thing !!
	The implementation will naturally have to be multi-platform and able  
to be
	web-based, too. We will have to use freely availible programming  
tools and languages.
	
Am 02.11.2006 um 12:54 schrieb Marcello Perathoner:

> Schultz Keith J. wrote:
>
>>     Hey XML can do all that. All we need is a common xml template.  
>> One
>> format! a known straucture
>>     a few filters and voila. a neat package exactly what everybody is
>> trying to create.
>
> This is the reason why we have to shut up BB: People reading this list
> will think that the most vociferous person represents the consensus in
> PG research. Not so. BB just pesters eveybody who doesn't want to hear
> with *his* at best half-baked ideas about text representation and
> delivery. Nobody takes BB seriously, and you shouldn't too.
>
>
> The state of PG research is:
>
> Consensus has been reached about using a subset of TEI as master  
> format
> for PG texts (since PGXML seems to be dead). Which subset is still  
> being
> discussed.
>
> There are at least 2 different working toolchains to convert  
> subsets of
> TEI to end user formats. Files produced with these toolchains have  
> been
> posted.
>
> Of course, everything is still in active research and can change a  
> lot.
> But nobody seriously considers using anything other than TEI or XML as
> master format.
>
>
>
> -- 
> Marcello Perathoner
> webmaster@gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From schultzk at uni-trier.de  Fri Nov  3 00:45:06 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri Nov  3 00:45:10 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <24690987.1162474398658.JavaMail.?@fh1038.dia.cp.net>
References: <24690987.1162474398658.JavaMail.?@fh1038.dia.cp.net>
Message-ID: <B6863162-BAD2-4025-93D1-2DD246303B84@uni-trier.de>

Hi Josh,

	Yes, I did and still direct that question at
	him. It has to do with his remarks and a post
	he made about two weeks ago, as with continuous
	posts he is making.

	Bowerbird is synical (spelling). He has his ways and ideas.
	Not all are bad. Just like his points. But, he has not gotten
	rude before somebody else has.

	Just because you do not like someone, do not jump on him
	everytime you see him. The proper way is to ignore him and
	not scream at him!!!

	regards
		Keith.
	
Am 02.11.2006 um 14:33 schrieb joshua@hutchinson.net:

> Are you sure you meant to address those comments to Marcello?
>
> What you are talking about is what Marcello has done.  bowerbird is
> the kindergarten kid you seem to be talking about...
>
> Josh
>
>> ----Original Message----
>> From: schultzk@uni-trier.de
>> Date: Nov 2, 2006 3:05
>> To: "Project Gutenberg Volunteer Discussion"<gutvol-d@lists.pglaf.
> org>
>> Subj: Re: [gutvol-d] gvd061030 -- let&#39;s get it started in here
>>
>> Hi Marcello,
>>
>> 	I will ask my question again: Do you know what you are doing?
>>
>>
>> 	Why do not you can your comments to another list, please.
>>
>> 	Your are worse than a kindergarden kid.
>>
>> 	Personally, I do not know any of the systems, but from what I
>> 	heard and seen they are all primitive and none do the JOB and will.
>> 	
[snip, snip]
From Bowerbird at aol.com  Fri Nov  3 02:05:38 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov  3 02:05:42 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <cfe.da3a8d.327c6e72@aol.com>

keith said:
>    he has not gotten rude before somebody else has.

thanks for noticing.          :+)

and actually, i hope i haven't gotten rude even then.

i _will_ admit to heavy sarcasm, but that at least has
an element of humor in it, which my detractors lack,
so they are left with nothing but meanspiritedness...

and i regret that i bring that out in them, i do.
and if i didn't have a message that i firmly believe
needs to be heard here, then i would probably let 
their bullying carry the day, and go away.   but i 
have something to say.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061103/78ac4ee4/attachment.html
From marcello at perathoner.de  Fri Nov  3 03:47:50 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov  3 03:47:53 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>	<4549DC5F.8010002@perathoner.de>
	<73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>
Message-ID: <454B2C66.5020100@perathoner.de>

Schultz Keith J. wrote:

>     First off,  his method is not any better than anything or worse
>     than your socalled TEI ! As you said it is not finished nor
>     anywhere near it. The TEI movement has been around for at least
>     5 years. As far as I am concerned it is vaporware. So if Bowerbird
>     wants to get something going let him do it.

zml is worse than TEI because it cannot represent text features that are
widely used in the library. If a simple text format could do that, major
universities would use it. They all use TEI instead.

But if you mention that to BB, he just tells you to do his homework for him.

The TEI markup language is long since ready to use, it is undergoing its
5th revision right now. What is still in development are the tools to
convert TEI to PG end-user formats. But this doesn't stop you from
marking up any text using the full TEI right now.


>     THE ACTUAL PROBLEM PG has that there is no official specification
>     to an official format or markup!! If there was one it would only take
>     at the max 2 months to get something working. BUT, for the past 10
>     years all I see is bits and pieces and a lot of discussion.

Your judgement of development time is very poor. BB is working nearly 4
years on his very easy to do zml thingie and has nothing to show. But
even if he knew how to program, he couldn't do anything in 2 months.


>     How about we all sitting done and discussing an official format and
> specifing it and
>     geting the tools implemented? Instead of everybody doing there own
> thing !!
>     The implementation will naturally have to be multi-platform and able
> to be
>     web-based, too. We will have to use freely availible programming
> tools and languages.

Design by committee will never work in PG. It is against Michael Hart's
expressed policy to step down and give directions.

So, if you want an "official" format, the way to go is to create it
yourself and make it so good it will blow your opponents' formats away.
Maybe BB will team up with you. Good luck!


-- 
Marcello Perathoner
webmaster@gutenberg.org

From schultzk at uni-trier.de  Fri Nov  3 04:38:00 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri Nov  3 04:38:08 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <454B2C66.5020100@perathoner.de>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>	<4549DC5F.8010002@perathoner.de>
	<73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>
	<454B2C66.5020100@perathoner.de>
Message-ID: <DBB7DBB0-D2AA-46BF-ABAB-0AC2A76D198E@uni-trier.de>

Hi Marcello,

	
Am 03.11.2006 um 12:47 schrieb Marcello Perathoner:

> Schultz Keith J. wrote:
>
>>     First off,  his method is not any better than anything or worse
>>     than your socalled TEI ! As you said it is not finished nor
>>     anywhere near it. The TEI movement has been around for at least
>>     5 years. As far as I am concerned it is vaporware. So if  
>> Bowerbird
>>     wants to get something going let him do it.
>
> zml is worse than TEI because it cannot represent text features  
> that are
> widely used in the library. If a simple text format could do that,  
> major
> universities would use it. They all use TEI instead.
	The only place I heard about TEI is here. I also do care to
	know where it is used, so do not even think about telling me
	where.
	Whether zml or TEI is better I personally could careless.
	IMHO neither is worth the effort. There are already better systems
	out there that work and have tools for them. As I have said time and
	time again people like to reinvented the wheel to get a so called
	simpler method, but at the same time getting a system that have serious
	lackings.

>
> But if you mention that to BB, he just tells you to do his homework  
> for him.
	To quote a friend: "One million flies can not err!! Eat Shit!!"
	This quote is NOT directed at BB.
>
> The TEI markup language is long since ready to use, it is  
> undergoing its
> 5th revision right now. What is still in development are the tools to
> convert TEI to PG end-user formats. But this doesn't stop you from
> marking up any text using the full TEI right now.
>
>
>>     THE ACTUAL PROBLEM PG has that there is no official specification
>>     to an official format or markup!! If there was one it would  
>> only take
>>     at the max 2 months to get something working. BUT, for the  
>> past 10
>>     years all I see is bits and pieces and a lot of discussion.
>
> Your judgement of development time is very poor. BB is working  
> nearly 4
> years on his very easy to do zml thingie and has nothing to show. But
> even if he knew how to program, he couldn't do anything in 2 months.
	I was NOT talking about Bowerbird!!!! I was talking about PG
	in general !!!
>
>
>>     How about we all sitting done and discussing an official  
>> format and
>> specifing it and
>>     geting the tools implemented? Instead of everybody doing there  
>> own
>> thing !!
>>     The implementation will naturally have to be multi-platform  
>> and able
>> to be
>>     web-based, too. We will have to use freely availible programming
>> tools and languages.
>
> Design by committee will never work in PG. It is against Michael  
> Hart's
> expressed policy to step down and give directions.
	It would, if he allowed it. It works everywhere else in the world!!!

>
> So, if you want an "official" format, the way to go is to create it
> yourself and make it so good it will blow your opponents' formats  
> away.
> Maybe BB will team up with you. Good luck!
	Nice contradiction here !! I Keith J. Schultz can make and force
	an official PG format on everybody ???!!!

	Come on, Marcello. I get the feeling you have a big ego problem.
	
	I would have already set up a working system 10 years ago, if Micheal
	would have allowed it. But, I will not start doing anything unless
	it gets an official go ahead and will be used by PG officially.
	My time is to precious to waste on anything else.

	regards
		Keith.
From marcello at perathoner.de  Fri Nov  3 04:48:20 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov  3 04:48:24 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <cfe.da3a8d.327c6e72@aol.com>
References: <cfe.da3a8d.327c6e72@aol.com>
Message-ID: <454B3A94.3000301@perathoner.de>

Bowerbird@aol.com wrote:

> but i have something to say.

New *updated* edition! Illustrated! Get it now! Run!

  "The Proof is in the Poo"

  The Collected Sayings of Bowerbird.


As HTML:

  http://www.gnutenberg.de/bowerbird/


Also available as TEI master:

  http://www.gnutenberg.de/bowerbird/poo.tei


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Fri Nov  3 05:33:10 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov  3 05:33:13 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <DBB7DBB0-D2AA-46BF-ABAB-0AC2A76D198E@uni-trier.de>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>	<4549DC5F.8010002@perathoner.de>	<73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>	<454B2C66.5020100@perathoner.de>
	<DBB7DBB0-D2AA-46BF-ABAB-0AC2A76D198E@uni-trier.de>
Message-ID: <454B4516.3070605@perathoner.de>

Schultz Keith J. wrote:

>     I know what it takes top do the job! It is my profession: Linguistics.

but he also wrote:

>     The only place I heard about TEI is here.

"The TEI was originally sponsored by the Association of Computers in the
Humanities (ACH), the Association for Computational Linguistics (ACL),
and the Association of Literary and Linguistic Computing (ALLC). Major
support has been received from the U.S. National Endowment for the
Humanities (NEH), the European Community, the Mellon Foundation, and the
Social Science and Humanities Research Council of Canada."

  http://www.tei-c.org/


A list of 134 projects using TEI can be found here:

  http://www.tei-c.org/Applications/


>     I would have already set up a working system 10 years ago, if Micheal
>     would have allowed it. But, I will not start doing anything unless
>     it gets an official go ahead and will be used by PG officially.

I'm sure you would! Especially with the great competence you have
already shown in your very own professional field.

Bottom line: You never will get PG to officially endorse any one format.

Your only chance is to create a format that is so much better than any
other format, that PG and DP volunteers don't want to use anything else.
At that point it will automatically become the "official" format.

Tip: don't go to DP and piss off everybody in sight like BB did. DP
people will have a very loud say in this question because its they that
create the books.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From hyphen at hyphenologist.co.uk  Fri Nov  3 06:15:42 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Fri Nov  3 06:16:17 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <454B3A94.3000301@perathoner.de>
References: <cfe.da3a8d.327c6e72@aol.com> <454B3A94.3000301@perathoner.de>
Message-ID: <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com>

On Fri, 03 Nov 2006 13:48:20 +0100,  Marcello Perathoner
<marcello@perathoner.de> wrote:

|  http://www.gnutenberg.de/bowerbird/poo.tei
>>>
The XML page cannot be displayed 
Cannot view XML input using style sheet. Please correct the error and then
click the Refresh button, or try again later. 


--------------------------------------------------------------------------------

The system cannot locate the object specified. Error processing resource
'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent...

<!ENTITY % TEI.extensions.ent '' >%TEI.extensions.ent;
<<<
 

So modern I can not view it with latest IE  6.0.2900.2180


What use is something which *ordinary* people can not read?

-- 
Dave Fawthrop <hyphen@hyphenologist.co.uk> 

From joshua at hutchinson.net  Fri Nov  3 06:31:13 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Fri Nov  3 06:31:24 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net>


>----Original Message----
>From: marcello@perathoner.de
>
>Schultz Keith J. wrote:
>
>>     I would have already set up a working system 10 years ago, if 
Micheal
>>     would have allowed it. But, I will not start doing anything 
unless
>>     it gets an official go ahead and will be used by PG officially.
>
>
>Bottom line: You never will get PG to officially endorse any one 
format.
>
>Your only chance is to create a format that is so much better than 
any
>other format, that PG and DP volunteers don't want to use anything 
else.
>At that point it will automatically become the "official" format.
>

Marcello is right.  Greg and Michael have both said, in private 
communications and in public messages, that they will not dictate 
direction in PG.  Michael, especially, likes the "throw it against the 
wall and see if it sticks" method of management.  While it can be 
frustrating at times, because a more decisive leadership can often 
"make things happen," this is not something that is likely to change.  
Ever.  So, we have to plan with that in mind.

That said, the problem with TEI as a master format isn't that the 
format itself isn't ready.  It isn't even really the toolchain that 
converts TEI to other formats (and it is more formats than bb gives us 
credit for ... HTML, PDF, UTF-8, Latin-1, and US-ASCII are directly 
created, then a background server script creates a plucker document 
after posting ... so 6 formats are created from one TEI master), the 
problem is during the earlier steps.  Creating the TEI doc itself.  We 
have a good set of guidelines, but no tools specifically designed to 
help in that process.  

Personally, I use the GuiGuts editor developed by thundergnat over at 
DP (a wonderful little perl-based editor!) and a series of Regular 
Expressions that do the heavy lifting.  But I still spend quite a bit 
of time manually tweaking the results.  It's nice that once I'm done 
and it validates in TEI, I *know* the results will validate in HTML, 
create a good PDF and follow PG guidelines in the text documents, but 
it'd be nicer to have a better interface/toolset for creating the TEI.

Unfortunately, it is a bit of a chicken and egg thing.  Until I can 
make TEI more popular with folks, the developers don't make the tools.  
And until I have the tools, I can't get enough people to use it to 
reach critical mass.  I can create texts (and do) in TEI, but I don't 
have the skills to make tools for helping in the creation of said TEI 
docs.

As Jon Noring agreed earlier, there are NEVER enough developers to go 
around!  ;)

Josh
From desrod at gnu-designs.com  Fri Nov  3 06:39:29 2006
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Fri Nov  3 06:40:18 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com>
References: <cfe.da3a8d.327c6e72@aol.com> <454B3A94.3000301@perathoner.de>
	<0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com>
Message-ID: <1162564769.18628.11.camel@localhost.localdomain>

On Fri, 2006-11-03 at 14:15 +0000, Dave Fawthrop wrote:

> What use is something which *ordinary* people can not read? 

"ordinary" people use the HTML version, and those with the proper TEI
environment set up, use the TEI version. 


-- 
David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061103/588a40ad/attachment.bin
From hart at pglaf.org  Fri Nov  3 08:37:14 2006
From: hart at pglaf.org (Michael Hart)
Date: Fri Nov  3 08:37:16 2006
Subject: !@!Re: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net>
References: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net>
Message-ID: <Pine.LNX.4.60.0611030808520.8971@pglaf.org>


Neither Joshus Hutchinson, Marcello Perathoner, or Keith Shultz are correct,
and, in two of these three cases, this is part of a long running pattern, so
it is likely an intentional error.

Refutations applied to each comment below:


On Fri, 3 Nov 2006, joshua@hutchinson.net wrote:

>> ----Original Message---- From: marcello@perathoner.de
>>
>> Schultz Keith J. wrote:
>>
>>>     I would have already set up a working system 10 years ago, if Micheal
>>>     would have allowed it. But, I will not start doing anything unless
>>>     it gets an official go ahead and will be used by PG officially.

1.  Project Gutenberg encourages all such system setups and always has.

2.  You are now, and always have been, welcome to your own directories
on the official Project Gutenberg servers to work with.  Greg Newby
will be only too glad to set up any such services you like, and even
to help recruit volunteers to help you.


>> Bottom line: You never will get PG to officially endorse any one format.

To the extent of an exclusivve endorsement that would disallow other formats
that is is most likely true, but if you don't have faith in your own format,
then you can't expect anyone else to have it.

There will be no government sponsored religions here.


>> Your only chance is to create a format that is so much better than any other 
>> format, that PG and DP volunteers don't want to use anything else. At that 
>> point it will automatically become the "official" format.

As it should be.


> Marcello is right.  Greg and Michael have both said, in private communications
> and in public messages, that they will not dictate direction in PG.

We will not dictate one direction at the expense of other similar efforts.

This is not a race to create the official exclusive Project Gutenberg format.

We will present eBooks in lots of formats, particularly those request by
those to read our books.


> Michael, especially, likes the "throw it against the wall and see if 
> it sticks" method of management.

The actual quotation is:

"We encourage you to run your ideas up the flagpole and see who salutes."

If you can't get anyone to use your format, we are hardly going to force
your ideas through anyone's alimentary canal, either.


> While it can be frustrating at times, because a more decisive leadership can 
> often "make things happen," this is not something that is likely to change. 
> Ever.  So, we have to plan with that in mind.


"Make things happen" is exactly what Project Gutenberg encourages.

What you seem to want is for someone else to "make things happen" for you.

We'll help, but we won't do it for you.

And we won't declare your or your format the "official" winner.

There will always be be room for improvements.

Project Gutenberg is a dyanmic process to maximize the eBook potential,
not a static system to be once achieved and then left as a fossil record.

"Make Things Happen"

Don't wait for someone else to tell you that your idea has happened.


> Unfortunately, it is a bit of a chicken and egg thing.  Until I can
> make TEI more popular with folks, the developers don't make the tools.
> And until I have the tools, I can't get enough people to use it to
> reach critical mass.  I can create texts (and do) in TEI, but I don't
> have the skills to make tools for helping in the creation of said TEI docs.

Just start by posting your examples and pointing to them.

That's how YouTube, MySpace, Google, Yahoo, and Project Gutenberg started.

Don't expect to start at the end, it helps to start at the beginning.


> As Jon Noring agreed earlier, there are NEVER enough developers to go
> around!  ;)
>
> Josh


Or, there are too many developers creating not enough example eBooks
to generate any interest.

Over the 10 years mentioned at the top of this commentary, if you had
created just one eBook per month for any particular new style of format
then you would have a collection of well over 100 eBooks to demonstrate.

Without such an initial collection, it's hard to expect anyone to come.

"Build it, and they will come."


Thanks!!!

Give the world eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From marcello at perathoner.de  Fri Nov  3 08:51:30 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov  3 08:51:33 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com>
References: <cfe.da3a8d.327c6e72@aol.com> <454B3A94.3000301@perathoner.de>
	<0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com>
Message-ID: <454B7392.1060201@perathoner.de>

Dave Fawthrop wrote:
> On Fri, 03 Nov 2006 13:48:20 +0100,  Marcello Perathoner
> <marcello@perathoner.de> wrote:
> 
> |  http://www.gnutenberg.de/bowerbird/poo.tei
> The XML page cannot be displayed 
> Cannot view XML input using style sheet. Please correct the error and then
> click the Refresh button, or try again later. 
> 
> 
> --------------------------------------------------------------------------------
> 
> The system cannot locate the object specified. Error processing resource
> 'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent...
> 
> <!ENTITY % TEI.extensions.ent '' >%TEI.extensions.ent;
> <<<
> 
> So modern I can not view it with latest IE  6.0.2900.2180
> 
> What use is something which *ordinary* people can not read?

The error is in your browser. The object is served as "text/plain". That
means: It is to be displayed to the user without further ado. IE has no
business trying to interpret it in any way.


$ wget -S http://www.gnutenberg.de/bowerbird/poo.tei
--17:26:32--  http://www.gnutenberg.de/bowerbird/poo.tei
           => `poo.tei'
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:8118... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 03 Nov 2006 16:26:32 GMT
  Server: Apache/2.0.55 (Debian) mod_jk/1.2.18 PHP/5.1.6-1
  Last-Modified: Fri, 03 Nov 2006 12:50:56 GMT
  ETag: "27400a-7c6c-2fe12c00"
  Accept-Ranges: bytes
  Content-Length: 31852
  Content-Type: text/plain; charset=utf-8
  Connection: close
Length: 31,852 (31K) [text/plain]

100%[====================================>] 31,852       123.50K/s

17:26:32 (123.38 KB/s) - `poo.tei' saved [31852/31852]

$

-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Fri Nov  3 08:58:09 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov  3 08:58:14 2006
Subject: [gutvol-d] The Proof is in the Poo
Message-ID: <bd0.77e4cde.327ccf21@aol.com>

marcello said:
>    Also available as TEI master:
>    http://www.gnutenberg.de/bowerbird/poo.tei

ok, that's kind of funny...          :+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061103/39877901/attachment-0001.html
From Bowerbird at aol.com  Fri Nov  3 10:10:54 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov  3 10:11:04 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <cf0.e468b3.327ce02e@aol.com>

josh said:
>    it is more formats than bb gives us credit for ... 
>    HTML, PDF, UTF-8, Latin-1, and US-ASCII are directly created

surely you don't count different encodings as different "formats".


>    then a background server script creates a plucker document
>    after posting ... so 6 formats are created from one TEI master

well, at least plucker is an honest-to-goodness e-book format,
but creating plucker from .html isn't all that earth-shaking, is it?
plucker can pull down any set of associated pages on a site, not?


>    the problem is during the earlier steps.? 
>    Creating the TEI doc itself.? 

right. applying the markup.   that's always the hard part.
the part that invokes costs -- human time and energy.
(not to mention expertise.)

and if the only reason for paying these costs is to get
the benefit of multiple formats, and that benefit can be
obtained by use of simpler (and thus less costly) means,
then there's no sense in applying the markup.

you're gonna have to come up with different benefits
if you really want to be convincing.   follow the lead that
david gave a few weeks back, based on network effects.


>    We have a good set of guidelines, but no tools 
>    specifically designed to help in that process.?

and here the question of costs raises its ugly head again.
a complex format requires complex tools to deal with it,
and such tools are difficult and time-consuming to create.
and that's precisely why you don't have the tools you need.


>    Personally, I use the GuiGuts editor developed by thundergnat 
>    over at DP (a wonderful little perl-based editor!) and a series of 
>    Regular Expressions that do the heavy lifting.? 

i encourage everyone to take a good hard look at guiguts
and delve into regular expressions to see if that approach
will work for you.   it might.   or it might not.   you will know.


>    But I still spend quite a bit of time manually tweaking the results.?

right.   more time and energy and expertise.

all to get out a .pdf and an .html version.

which i can also return, with a simple .zml file,
which can be edited together by a 4th-grader.

and yes, i haven't yet demonstrated those capabilities, not fully,
but i have begun the process of doing so, and will be continuing.

(actually, i did show output from my process some time ago,
for "alice in wonderland", but it was just shouted down and
then forgotten about, the modus operandi of my detractors.)

***

marcello said:
>    

d.p. is currently averaging about 2000 books a year.
that's a healthy number.   it makes me happy to see it.
since p.g. took 10 years to get its first 10,000 books
(and 20 years before that to get its first _100_ books),
i have the historical perspective to appreciate 2000/year.

yet still, google's contract with the university of california
calls for u.c. to provide 3000 books per _day_ for scanning.
and who knows how many more are being done each day
from michigan and stanford and all of the other partners?

so let us hope we can open up more digitization channels
than d.p. to handle this huge avalanche of scanned books,
or p.g. will become little more than a former waystation...

yet the .tei move challenges even the dedicated technoids
over at d.p.   so how will "ordinary people" be able to deal?
and if we can't use the input from "ordinary people", then
how will we be able to keep up with the deluge?   we won't.

so you're being squeezed in two directions.   at the same time
that you need to make things easier, to get more volunteers,
you're moving to a system that makes digitization _harder_...
something's got to give...

***

michael said:
>    "Build it, and they will come."

as it should be.

i want a nice little contest, me alone against the .tei gang.

i've said it before, and i'll say it again:   i know i have a winning hand.
and i'm not gonna show it until the other people at the table _fold_,
or throw all their money in the pot.   i want the .tei people to invest
_lots_ of their precious time and energy in their format, so they see
with their own eyes, and understand from their own experience, that
the cost-benefit ratio cannot be justified.   depending on how stubborn
they are, they might have to spend a lot of time and energy before they
get that realization.   but that's putting your money where your mouth is.
when they have put all their money in the pot, i'll show y'all _my_ hand...

which is not to say that i'm not willing to throw a few cards down now.

so...   i'm sorry i got suckered -- even just a little bit -- back into this
"argument", because we're into pudding time now, my friends, where
you will begin to find the proof.   more on "babelfish" starting monday...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061103/af790b18/attachment.html
From joshua at hutchinson.net  Fri Nov  3 10:28:33 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Fri Nov  3 10:29:00 2006
Subject: !@!Re: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <85688.1162578513975.JavaMail.?@fh1063.dia.cp.net>


>----Original Message----
>From: hart@pglaf.org
>
>Neither Joshus Hutchinson, Marcello Perathoner, or Keith Shultz are 
correct,
>and, in two of these three cases, this is part of a long running 
pattern, so
>it is likely an intentional error.

It looks like you were taking my comments as disparaging against you 
or the way PG does things.  I apologize.  It wasn't meant that way.  
There are, however, a few things I want to clear up, so I'll add 
comments to your comments.

>
>Refutations applied to each comment below:

>
>2.  You are now, and always have been, welcome to your own 
directories
>on the official Project Gutenberg servers to work with.  Greg Newby
>will be only too glad to set up any such services you like, and even
>to help recruit volunteers to help you.
>

Yes, this is absolutely true.  I've had a directory on the PG server 
for a couple years now (as long as I've been working actively on the 
PGTEI stuff).  Greg set me, almost no questions asked, and has given me 
lots of space to work.  I've since started using the same account for 
audio posting duties, so it isn't *just* for PGTEI anymore, but that is 
what it was originally intended for.

>
>> Marcello is right.  Greg and Michael have both said, in private 
communications
>> and in public messages, that they will not dictate direction in PG.
>
>We will not dictate one direction at the expense of other similar 
efforts.
>
>This is not a race to create the official exclusive Project Gutenberg 
format.
>
>We will present eBooks in lots of formats, particularly those request 
by
>those to read our books.
>

We just said the same thing, Michael.  You make it sound like I am 
looking for official backing.  I'm not.  I was explaining to Keith 
*why* such backing is never likely to be forthcoming.

Personally, I can see why a benevolent dictatorship style of managing 
has its appeal for Keith.  Things, right or wrong, tend to get done.  
But the more chaotic and *democratic* approach that PG has always 
followed allows for a survival of the fittest approach to new ideas.  
This has its merits too.   My intention, however, was not to argue or 
hint at a preference for either, but rather explain why (in my opinion) 
things work the way they do. 

>
>> Michael, especially, likes the "throw it against the wall and see 
if 
>> it sticks" method of management.
>
>The actual quotation is:
>
>"We encourage you to run your ideas up the flagpole and see who 
salutes."
>

*sings* You say tomato, I say tomato ...

Ahem, sorry, my voice stinks.  ;)

>
>What you seem to want is for someone else to "make things happen" for 
you.
>

Not at all.  As I explained above, my comments were not a whine or 
call for help from above (and I apologize if I was unclear in that 
regard), but rather an explanation of how PG works.

>
>> Unfortunately, it is a bit of a chicken and egg thing.  Until I can
>> make TEI more popular with folks, the developers don't make the 
tools.
>> And until I have the tools, I can't get enough people to use it to
>> reach critical mass.  I can create texts (and do) in TEI, but I 
don't
>> have the skills to make tools for helping in the creation of said 
TEI docs.
>
>Just start by posting your examples and pointing to them.
>

I do.  A lot of them. In fact ...

>
>Over the 10 years mentioned at the top of this commentary, if you had
>created just one eBook per month for any particular new style of 
format
>then you would have a collection of well over 100 eBooks to 
demonstrate.
>

I've been doing this for just over 2 years and have 81 PGTEI encoded 
documents posted to the PG archive.  I work my butt off, in fact.  
However, I'm not a good evangelist.  I'll be the first to admit it.  
I've tried to make the information known in a low-key manner and some 
people will come to me with questions because of it, but I've found the 
lack of utilities is the biggest stumbling block for new people.  They 
like the potential, but the tools to help them get there are not 
robust.  Yet.

>
>"Build it, and they will come."
>

Yes.  I agree.  The problem is that I can't build the tools.  I can do 
examples, I can clarify guidelines and I can provide advice and 
feedback (all of which I do), but I can't do much more than basic 
scripting in the way of tools.  (I created a script for David Widger 
that helps him quickly run a TEI doc through the TEI toolchain to 
create the posting files, then zip them up in a file ready for posting, 
but that is the extent of my abilities).

Marcello, who does wonderful work on the PGTEI toolchain, also has 
lots of other demands on his volunteer time since he is our webmaster, 
too (not to mention his life outside PG).  I can't rely on him to 
provide tools for editing/generating the TEI as well.  Hence, my 
original "chicken and egg" comment.

Josh
From lee at novomail.net  Fri Nov  3 10:31:00 2006
From: lee at novomail.net (Lee Passey)
Date: Fri Nov  3 10:31:10 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net>
References: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net>
Message-ID: <454B8AE4.4000603@novomail.net>

joshua@hutchinson.net wrote:

>  ... the problem is during the earlier steps. Creating the TEI doc
>  itself. We have a good set of guidelines, but no tools specifically
>  designed to help in that process.


Is this really true? I don't mean about the tools, because there clearly 
are no good tools to help with TEI doc creation (although if Mr. 
Noring's push to create an e-book-oriented XML editor ever comes to 
fruition, that could help). I mean about the good set of guidelines. The 
last time I looked (admitedly about a year ago) I couldn't find any. Can 
you provide a link?

How were these guidelines created? Are they merely a pointer to the TEI 
documentation, which is actually more obtuse than BB's ZML 
"specification"? Are they a result of discussion and consensus, and is 
that discussion archived somewhere? Or did you and Marcello simply put 
something together and say "here are the guidelines?" (I'm not saying 
that's a bad thing; consensus on the Internet is almost impossible to 
acheive, guidelines by experts and stake-holders have more relevance 
than popular opinion, and almost any guidelines, no matter how derived, 
are better than no guidelines at all.)

Just curious.
From joshua at hutchinson.net  Fri Nov  3 10:42:40 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Fri Nov  3 10:42:44 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <31978970.1162579360874.JavaMail.?@fh1063.dia.cp.net>

We do have guidelines.  I suppose whether they are good or not is in 
the eye of the beholder.  ;)

What we have is here:  http://pgtei.pglaf.org/marcello/0.4/

(Warning: Due to the migration Greg is currently doing to a newer and 
faster pglaf server, the page is down.)

It is deliberately structured like the TEI documentation and does 
contain many links back to their much more extension documentation.  
There have been changes made, both due to my feedback and due to others 
that have tried to use it.

However, it could definitely be improved and flesh out.

As far as community feedback, that tends to happen more in the DP 
forums where folks are more actively putting together and talking about 
new etexts.

Josh

>----Original Message----
>From: lee@novomail.net
>Date: Nov 3, 2006 13:31 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d@lists.pglaf.
org>
>Subj: Re: [gutvol-d] gvd061030 -- let&#39;s get it started in here
>
>joshua@hutchinson.net wrote:
>
>>  ... the problem is during the earlier steps. Creating the TEI doc
>>  itself. We have a good set of guidelines, but no tools 
specifically
>>  designed to help in that process.
>
>
>Is this really true? I don't mean about the tools, because there 
clearly 
>are no good tools to help with TEI doc creation (although if Mr. 
>Noring's push to create an e-book-oriented XML editor ever comes to 
>fruition, that could help). I mean about the good set of guidelines. 
The 
>last time I looked (admitedly about a year ago) I couldn't find any. 
Can 
>you provide a link?
>
>How were these guidelines created? Are they merely a pointer to the 
TEI 
>documentation, which is actually more obtuse than BB's ZML 
>"specification"? Are they a result of discussion and consensus, and 
is 
>that discussion archived somewhere? Or did you and Marcello simply 
put 
>something together and say "here are the guidelines?" (I'm not 
saying 
>that's a bad thing; consensus on the Internet is almost impossible 
to 
>acheive, guidelines by experts and stake-holders have more relevance 
>than popular opinion, and almost any guidelines, no matter how 
derived, 
>are better than no guidelines at all.)
>
>Just curious.
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From marcello at perathoner.de  Fri Nov  3 11:23:36 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov  3 11:23:39 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <454B8AE4.4000603@novomail.net>
References: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net>
	<454B8AE4.4000603@novomail.net>
Message-ID: <454B9738.4050603@perathoner.de>

Lee Passey wrote:

> Is this really true? I don't mean about the tools, because there clearly
> are no good tools to help with TEI doc creation

Sebastian Rahtz of Oxford University Computing Services has created a
set of stylesheets to enable OpenOffice to load and save TEI documents.
I never checked them out though.


> did you and Marcello simply put
> something together and say "here are the guidelines?"

Yes.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Fri Nov  3 12:28:51 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov  3 12:28:55 2006
Subject: [gutvol-d] Challenge
In-Reply-To: <cf0.e468b3.327ce02e@aol.com>
References: <cf0.e468b3.327ce02e@aol.com>
Message-ID: <454BA683.9030303@perathoner.de>

Bowerbird@aol.com wrote:

> and if the only reason for paying these costs is to get
> the benefit of multiple formats, and that benefit can be
> obtained by use of simpler (and thus less costly) means,
> then there's no sense in applying the markup.

Strawman!

Nobody is disputing that *if* you can get there cheaper you should do
it. We are just saying that ZML cannot get us there. Therefore it is
immaterial how cheap it is.

All you have "demonstrated" until today is, that you can select *one*
text that is simple enough to make ZML look good. Instead you should
demonstrate that ZML is mighty enough to handle *every* text in the
library. Say: footnotes, endnotes, tables, lists, titles, subtitles,
images, equations,


I propose this contest:

I will select one PG text for you to mark up in ZML with all textual
features preserved.

You will select one PG text for me to mark up in TEI with all textual
features preserved.

In one week we will present our results online with the end-user formats
we generated, the markup source and the open-source source code of all
tools used in the process. If the end-user formats cannot be regenerated
using only the master format and the tools provided in source code, the
entrant is disqualified.

This will assure that both of us operate with the tools available today
on a real-world text of some difficulty and that the tools used are
freely available today to everybody else.

Chicken out?


> a complex format requires complex tools to deal with it,
> and such tools are difficult and time-consuming to create.
> and that's precisely why you don't have the tools you need.

Misrepresentation of facts!


I have all the editors I need. I have emacs and I have nxml-mode which
makes applying TEI tags a breeze and validates while I type so I don't
have to waste time looking for markup errors. I have vi, I have
OpenOffice and a couple commercial ones. All those handle TEI.

All you have is some generic text editor that helps nothing to get your
format right. And nobody is going to develop an editor that understands
ZML for you.


I have all the libraries I need to read, transform and write my format.
For all programming languages I might choose. Good and time-tested ones.
All open-source.

You have not one library for your format. Even if you write one in perl,
you still have none in java. If you write one in java you still have
none in python. ...


I have all the support I need. There is a dedicated mailing list for TEI
chock full of knowledgeable people who want to help. These people all
work in humanities and if I have a question eg. concerning TEI and Thai
script I'll get an answer.

You have no support at all. If a problem surfaces your only choice is to
 resort to the time-tested method of denying there is a problem at all.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From hart at pglaf.org  Fri Nov  3 13:00:26 2006
From: hart at pglaf.org (Michael Hart)
Date: Fri Nov  3 13:00:28 2006
Subject: !@!Re: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <85688.1162578513975.JavaMail.?@fh1063.dia.cp.net>
References: <85688.1162578513975.JavaMail.?@fh1063.dia.cp.net>
Message-ID: <Pine.LNX.4.60.0611031244090.14238@pglaf.org>

On Fri, 3 Nov 2006, joshua@hutchinson.net wrote:

>
>> ----Original Message----
>> From: hart@pglaf.org
>>
>> Neither Joshus Hutchinson, Marcello Perathoner, or Keith Shultz are correct,
>> and, in two of these three cases, this is part of a long running pattern, so
>> it is likely an intentional error.
>
> It looks like you were taking my comments as disparaging against you
> or the way PG does things.

I responded to the content only, of all three voices.

Perhaps I should have included Jon Noring, but he was only a 3rd party,
so I didn't.


> I apologize.  It wasn't meant that way.

Since I responded to content only, no need to apologize.


> There are, however, a few things I want to clear up, so I'll add
> comments to your comments.
>
>>
>> Refutations applied to each comment below:
>
>>
>> 2.  You are now, and always have been, welcome to your own directories
>> on the official Project Gutenberg servers to work with.  Greg Newby
>> will be only too glad to set up any such services you like, and even
>> to help recruit volunteers to help you.
>>
>
> Yes, this is absolutely true.  I've had a directory on the PG server
> for a couple years now (as long as I've been working actively on the
> PGTEI stuff).  Greg set me, almost no questions asked, and has given me
> lots of space to work.  I've since started using the same account for
> audio posting duties, so it isn't *just* for PGTEI anymore, but that is
> what it was originally intended for.

However, there is no reason you coulnd't get other directories for other
purposes than just your original purpose.


>
>>
>>> Marcello is right.  Greg and Michael have both said, in private communications
>>> and in public messages, that they will not dictate direction in PG.
>>
>> We will not dictate one direction at the expense of other similar efforts.
>>
>> This is not a race to create the official exclusive Project Gutenberg format.
>>
>> We will present eBooks in lots of formats, particularly those request by
>> those to read our books.
>>
>
> We just said the same thing, Michael.  You make it sound like I am
> looking for official backing.  I'm not.  I was explaining to Keith
> *why* such backing is never likely to be forthcoming.

Again, I was responding to all three voices, no need to apologize
for what someone else said.


> Personally, I can see why a benevolent dictatorship style of managing
> has its appeal for Keith.  Things, right or wrong, tend to get done.

Benevolent dictatorships can be very effective, and have been,
even inside PG there are subgroups that are very disciplined,
and that is fine with me, as long as they don't try to take over
the entire Project Gutenberg structure, which it seems they try
to do every year or so.


> But the more chaotic and *democratic* approach that PG has always
> followed allows for a survival of the fittest approach to new ideas.
> This has its merits too.

No reason you can't have both, and should not have both, when we have
the world world as potential volutneers.


>  My intention, however, was not to argue or hint at a preference for either, 
> but rather explain why (in my opinion) things work the way they do.

However, there is no reason we can't have both. . .which would require,
by structural definition, that the very highest level accept both,
then the sub-levels can define themselves as they please. . .but with
a top level without such flexibility, this cannot happen, other than
by a volunteer revolt.

>>> Michael, especially, likes the "throw it against the wall and see if
>>> it sticks" method of management.
>>
>> The actual quotation is:
>>
>> "We encourage you to run your ideas up the flagpole and see who salutes."
>>
>
> *sings* You say tomato, I say tomato ...

If you are going to use quotes, you should actually be guoting what was said,
or identify some other source.

Since you used my name, you are responsible to use my quotation,
unless otherwise specified, and then it is still a red herring.


> Ahem, sorry, my voice stinks.  ;)

[comment withheld]


>>
>> What you seem to want is for someone else to "make things happen" for you.
>>
>
> Not at all.  As I explained above, my comments were not a whine or
> call for help from above (and I apologize if I was unclear in that
> regard), but rather an explanation of how PG works.

Not all I wrote was in response to comments of one person,
as mentioned in the first line.

And

How PG works is to allow pretty much everyone to do their own things.

It is how PG does NOT work to create bosses that bothers some here.


>>> Unfortunately, it is a bit of a chicken and egg thing.  Until I can
>>> make TEI more popular with folks, the developers don't make the tools.
>>> And until I have the tools, I can't get enough people to use it to
>>> reach critical mass.  I can create texts (and do) in TEI, but I don't
>>> have the skills to make tools for helping in the creation of said
> TEI docs.
>>
>> Just start by posting your examples and pointing to them.
>>
>
> I do.  A lot of them. In fact ...
>
>>
>> Over the 10 years mentioned at the top of this commentary, if you had
>> created just one eBook per month for any particular new style of format
>> then you would have a collection of well over 100 eBooks to demonstrate.
>>
>
> I've been doing this for just over 2 years and have 81 PGTEI encoded
> documents posted to the PG archive.  I work my butt off, in fact.
> However, I'm not a good evangelist.  I'll be the first to admit it.

Greg, and I, and the Newseltter editors would be only to happy to help here.


> I've tried to make the information known in a low-key manner and some
> people will come to me with questions because of it, but I've found the
> lack of utilities is the biggest stumbling block for new people.  They
> like the potential, but the tools to help them get there are not robust.  Yet.

Perhaps we could be less low key on your behalf than you would be.


>>
>> "Build it, and they will come."
>>
>
> Yes.  I agree.  The problem is that I can't build the tools.  I can do
> examples, I can clarify guidelines and I can provide advice and
> feedback (all of which I do), but I can't do much more than basic
> scripting in the way of tools.  (I created a script for David Widger
> that helps him quickly run a TEI doc through the TEI toolchain to
> create the posting files, then zip them up in a file ready for posting,
> but that is the extent of my abilities).

This is why we should make an issue of promoting your project
and getting volunteers to help further teh tool development!


> Marcello, who does wonderful work on the PGTEI toolchain, also has
> lots of other demands on his volunteer time since he is our webmaster,
> too (not to mention his life outside PG).  I can't rely on him to
> provide tools for editing/generating the TEI as well.  Hence, my
> original "chicken and egg" comment.

Hencemy comments on doing some PR for your.

>
> Josh
>

Michael

From Bowerbird at aol.com  Fri Nov  3 14:26:12 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov  3 14:26:29 2006
Subject: [gutvol-d] Challenge
Message-ID: <c19.8b29ec8.327d1c04@aol.com>

marcello said:
>   The error is in your browser.

standard technoid answer:   it's _your_ fault.

man i'm glad i'm from the mac world,
where the mantra is "...it just works..."

***

marcello said:
>   A list of 134 projects using TEI can be found here:
>    http://www.tei-c.org/Applications/

and guess what...

in terms of societal mindshare as a cyberspace library,
project gutenberg kicks the ass out of _all_ of them,
not just individually, but _combined_.

and that's because, instead of the goop of tag soup,
michael hart had the vision to give people _the_words_.

and -- in the invention of "remixing" before anyone had
even coined the term for it -- people took those words
and they _ran_ with them.

that's the reason why project gutenberg, with no money,
ended up outcompeting all of those well-funded projects,
which continue to cost money while providing few benefits.

and now you want project gutenberg to mimic _their_ methods?

you're not the solution.   you're the problem.

***

marcello said:
>    Nobody is disputing that *if* you can get there cheaper 
>    you should do it. 

ok, good, i'm glad we can agree on economics 101 anyway.


>    We are just saying that ZML cannot get us there. 

since "there" is a relatively ambiguous place,
i'm not sure how you can say that so casually.

i was talking about .html and .pdf conversions.
i can pull that off.   that and a whole lot more...


>    Therefore it is immaterial how cheap it is.

it's not quite so easy to cast cost-benefit in those terms.
you have to specify the exact benefits and the exact costs,
and compare it to other combinations, and then decide that.

it's _never_ "immaterial" how cheap something is,
unless it returns absolutely _no_ benefits...


>    All you have "demonstrated" until today is, that you can 
>    select *one* text that is simple enough to make ZML look good. 

don't be silly.   i didn't select that text.   jon noring did.

and anyway, i've had several other books up, for some time now.

here's "books and culture", by hamilton wright mabie:
>    http://www.greatamericannovel.com/mabie/mabiec001.html
i didn't select this one either; it was google's first example book.

and here's "the secret garden", by frances hodgson burnett:
>    http://www.greatamericannovel.com/sgfhb/sgfhbc001.html
i didn't "select" this one, either, but i can't remember why i have it;
it might have been compare-independent-digitizations research.

and here's "the open library", by brewster kahle:
>    http://www.greatamericannovel.com/tolbk/tolbkp001.html
i did this for reasons that should be obvious, namely its relevance.

and then of course there's the "alice in wonderland":
>    http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.html
this one has been up for over a year now (2005/09/20).

now, it's certainly true that all of these books were fairly simple.
but here's a newsflash: there are tons and tons of simple books.
most of the books in the library already are relatively simple books.

of course, even a simple book becomes complicated in a complex format.
so, hey, if you think you can sell your complex format, go right ahead...

but i'd suggest you get cracking, because in just a couple weeks from now,
after i've shown people here how a simple format can handle simple books,
and maybe a good percentage of the complicated ones as well, it's gonna be
a lot harder for you to pull off that con job.


>    Instead you should demonstrate that ZML is mighty enough 
>    to handle *every* text in the library.    Say: 
>    footnotes, endnotes, tables, lists, titles, subtitles, images, 
equations,

everything in your list is in my test-suite (with equations as images):
>    http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.zml
>    http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.html

by the way, even though aaron's latest post was just the same old schtick,
it reminded me that since tex is just ascii (right?), there's no reason that
i can't bundle it in.   so just save your tex file(s) alongside the main 
text,
and my viewer will send them to the tex processor of the user's choice.
this -- as i have told you -- is a method that can be used for _any_ type
of auxiliary file(s) that you might want to bundle inside of your e-book.
so they wouldn't even have to be ascii files; any file-format would do...


>    I propose this contest:
>    I will select one PG text for you to mark up in ZML 
>    with all textual features preserved.
>    You will select one PG text for me to mark up in TEI 
>    with all textual features preserved.

one text?

right up above you wanted me to handle _every_ text in the library.

now you want to construe the contest in terms of _one_text_?

sorry, the contest has already been framed:   it's the entire library.

whenever you're ready to start making .tei texts in earnest, do it.
i'm ready to start matching you on a text-for-text basis any time.
i'll maintain my .zml mirror of the library by my widdle wonesome.
you can get all your .tei buddies to help you maintain your mirror.

and the decision as to the winner will be made by _the_people_...
the same people who have voted project gutenberg #1 so far...

and hey, if you wanna point me to a certain text now, i'll take a look,
and see how .zml would approach it.   i'll remind people that i have
been asking for this kind of pointer for some time now.   you should
even give me a half-dozen such pointers, because i'm sure you will
point to something esoteric, and i'll have to wave off a few of 'em,
since i've always said there will be some texts that .zml can't do...

and if you want me to point you to one text, it'll be my test-suite.
i will likewise remind people that i've been asking to see that done
in .tei for well over a year already now, and nobody ever did it...


>   and the open-source source code of all tools used in the process

you really wanna see the cards in my hand before 
you throw all of your money in the pot, don't you?

i don't blame you.   why waste your time away, eh?

unfortunately for you, that's not how the game is played.

if you were totally confident that you were gonna win,
you'd be as disinterested in my toolset as i am in yours.

(emacs and those linux apps only a technoid could love;
yeah, right, you're gonna make those palatable to people.
here's another newsflash:   writely, zoho, jotspot, pbwiki...)


>    nobody is going to develop an editor that understands ZML for you.

"nobody" doesn't have to do that "for me", because i've already done it.
i'm not a mere script kiddie; i can make honest-to-goodness applications.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061103/586b1187/attachment-0001.html
From marcello at perathoner.de  Fri Nov  3 15:02:05 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov  3 15:02:09 2006
Subject: [gutvol-d] Challenge
In-Reply-To: <c19.8b29ec8.327d1c04@aol.com>
References: <c19.8b29ec8.327d1c04@aol.com>
Message-ID: <454BCA6D.3040903@perathoner.de>

Bowerbird@aol.com wrote:

>>    I propose this contest:
>>    I will select one PG text for you to mark up in ZML 
>>    with all textual features preserved.
>>    You will select one PG text for me to mark up in TEI 
>>    with all textual features preserved.
> 
> sorry, the contest has already been framed:   it's the entire library.

Chickenheart!


>>   and the open-source source code of all tools used in the process
> 
> unfortunately for you, that's not how the game is played.

Nothing to show after 4 years of hype? That's too bad.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jeroen.mailinglist at bohol.ph  Fri Nov  3 14:57:35 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Fri Nov  3 15:15:20 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>	<4549DC5F.8010002@perathoner.de>
	<73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>
Message-ID: <454BC95F.90807@bohol.ph>

Schultz Keith J. wrote:
>
>     First off,  his method is not any better than anything or worse
>     than your socalled TEI ! As you said it is not finished nor
>     anywhere near it. The TEI movement has been around for at least
>     5 years. As far as I am concerned it is vaporware. 
I have been using TEI for 8 years. I have posted over 200 ebooks
produced using TEI.
My tools are available on-line, and the TEI master files can be had for
the asking. The reason
I haven't posted the TEI themselves is simply because I haven't taken
enough care to
make my tools ready for prime time. TEI isn't vaporware, but a format
that is actually and
widely used, however, since it is fairly flexible and very rich, and
since it approaches text from the semantic,
not the visual appearance edge, tools for it are not always up-to-task
to generate a good-looking
or even acceptable rendered results in a multitude of formats, such as
HTML, PDF, plain text, etc.
With increasing complexity and features of texts, some programming and
fine-tuning of processes
will be required...

Jeroen.


From Bowerbird at aol.com  Fri Nov  3 16:13:41 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov  3 16:13:53 2006
Subject: [gutvol-d] Challenge
Message-ID: <cbd.291ebfe.327d3535@aol.com>

marcello said:
>    Chickenheart!

um, did i miss a pointer to an e-text somewhere?
and is my pointer to my test-suite being ignored?


>    Nothing to show after 4 years of hype? That's too bad.

well, i'm not showing my source code to you, nope.
but i invite you to join the open-source project here.

and i'm still ready to match you text-for-text, any time,
whenever you think you're ready to start getting started.

i'm gonna make sure you've wasted a _lot_ of your time
before i show other people that you're wasting your time.
(but i'll make sure they see it before they waste _their_ time.)

meanwhile, every day, more and more e-texts are posted
that are amazingly close to being totally-finished .zml files.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061103/39b46dda/attachment.html
From gbnewby at pglaf.org  Fri Nov  3 17:34:59 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Nov  3 17:35:01 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <31978970.1162579360874.JavaMail.?@fh1063.dia.cp.net>
Message-ID: <20061104013459.GA21049@pglaf.org>

On Fri, Nov 03, 2006 at 06:42:40PM +0000, joshua@hutchinson.net wrote:
> We do have guidelines.  I suppose whether they are good or not is in 
> the eye of the beholder.  ;)
> 
> What we have is here:  http://pgtei.pglaf.org/marcello/0.4/
> 
> (Warning: Due to the migration Greg is currently doing to a newer and 
> faster pglaf server, the page is down.)

Sorry, I didn't realize...fixed now.  I put it in to place
a week or two ago, but our free nameserver host (xname.org)
had a major DDoS last week, and I moved our DNS to a new
host.

Next, I'll make our entries active on *both* hosts :)

Let me know if anything still seems amiss.
  -- Greg
From gbnewby at pglaf.org  Fri Nov  3 17:35:50 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Nov  3 17:35:50 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <DBB7DBB0-D2AA-46BF-ABAB-0AC2A76D198E@uni-trier.de>
Message-ID: <20061104013550.GA21170@pglaf.org>

On Fri, Nov 03, 2006 at 01:38:00PM +0100, Schultz Keith J. wrote:
> ...
> 	I would have already set up a working system 10 years ago, if Micheal
> 	would have allowed it. But, I will not start doing anything unless
> 	it gets an official go ahead and will be used by PG officially.
> 	My time is to precious to waste on anything else.

Keith, here is your official "go ahead."
As Michael responded, we'll use essentially any reasonable format.
So if you have something in mind, go for it.

If you were thinking that your official format would exclude other
formats, that's a different issue.  I don't think that will happen
too soon...

Officially yours,
  -- Greg

Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org
From joey at joeysmith.com  Fri Nov  3 18:09:23 2006
From: joey at joeysmith.com (joey)
Date: Fri Nov  3 18:11:39 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <cf0.e468b3.327ce02e@aol.com>
References: <cf0.e468b3.327ce02e@aol.com>
Message-ID: <20061104020923.GA9638@joeysmith.com>

On Fri, Nov 03, 2006 at 01:10:54PM -0500, Bowerbird@aol.com wrote:
> i want a nice little contest, me alone against the .tei gang.

Based on what you've told us about available "ZML marked up etexts"
and a little bit of Google-mojo (see http://tinyurl.com/y6kfhl),
it would appear you're about 100 books behind.
From cannona at fireantproductions.com  Fri Nov  3 18:48:59 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri Nov  3 18:51:12 2006
Subject: [gutvol-d] Challenge
References: <c19.8b29ec8.327d1c04@aol.com>
Message-ID: <00a801c6ffbc$1666acc0$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bowerbird wrote:
>
> everything in your list is in my test-suite (with equations as images):
>>    http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.zml
>>    http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.html
>
> by the way, even though aaron's latest post was just the same old schtick,
> it reminded me that since tex is just ascii (right?), there's no reason
> that
> i can't bundle it in.   so just save your tex file(s) alongside the main
> text,
> and my viewer will send them to the tex processor of the user's choice.
> this -- as i have told you -- is a method that can be used for _any_ type
> of auxiliary file(s) that you might want to bundle inside of your e-book.
> so they wouldn't even have to be ascii files; any file-format would do...

On an earlier occasion, he also wrote:
">   Sure we do. We use TeX (or pseudo-TeX fragments).

and that's why that's what i'll probably do as well,
when the time comes that i feel that it's necessary,
because that's my modus operandi, to utilize the
existing conventions, to best leverage current work."

"but for now, i'm not at all worried about this 'problem'."


So, I'm curious.  can we expect support for equations some time before the
death of the universe?  What other plain-as-day modifications to ZML can we
lead you to?

Don't worry, those were rhetorical questions.  I wouldn't want to trick you
in to accidentally answering me after you promised not to.

Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFFTAAgI7J99hVZuJcRAmzXAKChO19RmWqe0yvFI/WlOC1IWutu0gCgqZdN
aICB3d+EUkmgMf2mBDiQxic=
=XLMv
-----END PGP SIGNATURE-----

From schultzk at uni-trier.de  Sat Nov  4 11:28:28 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Sat Nov  4 11:28:38 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <454B4516.3070605@perathoner.de>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>	<4549DC5F.8010002@perathoner.de>	<73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>	<454B2C66.5020100@perathoner.de>
	<DBB7DBB0-D2AA-46BF-ABAB-0AC2A76D198E@uni-trier.de>
	<454B4516.3070605@perathoner.de>
Message-ID: <3FEB672A-57D7-4F44-972C-7D8F78BFE843@uni-trier.de>

Hi Marcello,

Am 03.11.2006 um 14:33 schrieb Marcello Perathoner:

> Schultz Keith J. wrote:
>
>>     I know what it takes top do the job! It is my profession:  
>> Linguistics.
>
> but he also wrote:
>
>>     The only place I heard about TEI is here.
>
> "The TEI was originally sponsored by the Association of Computers  
> in the
> Humanities (ACH), the Association for Computational Linguistics (ACL),
> and the Association of Literary and Linguistic Computing (ALLC). Major
> support has been received from the U.S. National Endowment for the
> Humanities (NEH), the European Community, the Mellon Foundation,  
> and the
> Social Science and Humanities Research Council of Canada."
>
>   http://www.tei-c.org/
	A lot of things get sponsered in the acedemic world, but that does
	not mean it gets very far in the WORLD and it still does not mean
	that TEI is useful in linguistics!!? Like I said there are a lot better
	systems around for doing work in linguistics.
>
>
> A list of 134 projects using TEI can be found here:
>
>   http://www.tei-c.org/Applications/
>
>
>>     I would have already set up a working system 10 years ago, if  
>> Micheal
>>     would have allowed it. But, I will not start doing anything  
>> unless
>>     it gets an official go ahead and will be used by PG officially.
>
> I'm sure you would! Especially with the great competence you have
> already shown in your very own professional field.
>
> Bottom line: You never will get PG to officially endorse any one  
> format.
	That is why you still have gotten your tools finished, or have
	someone like me step and program it for you in no time flat.
>
> Your only chance is to create a format that is so much better than any
> other format, that PG and DP volunteers don't want to use anything  
> else.
> At that point it will automatically become the "official" format.
	I am not on an ego trip and do not reinvent the wheel.
>
> Tip: don't go to DP and piss off everybody in sight like BB did. DP
> people will have a very loud say in this question because its they  
> that
> create the books.
>
	If I go to DP. I would look at their concept and specifications and
	ask how I can help or offer and discuss better ways of doing things.

	As far as TEI and zml are concerned if you a specific question on how
	implement something -- just ask.

	Keith

From schultzk at uni-trier.de  Sat Nov  4 11:46:23 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Sat Nov  4 11:46:32 2006
Subject: !@!Re: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <Pine.LNX.4.60.0611030808520.8971@pglaf.org>
References: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net>
	<Pine.LNX.4.60.0611030808520.8971@pglaf.org>
Message-ID: <20ED1141-1350-4E89-9DC5-C23043E04F4C@uni-trier.de>

Hi Micheal,

	Glad you step-in. I have to disgree with you one one point though
	If PG had as a base format a minimalistic markup then all would benefit
	from that. It would be easier if in the scanning and proofing process
	information about type, chapters, paragraphs, footnotes, graphics were
	preserved !! With this information then Plain Vanilla Texts, TEI,  
zml, html,
	etc can be easily created ! I am not saying one is better than the  
other or
	if one wish to develop something new he should not. The exact  
opposite is the
	case. But, if one has to reedit the files from scratch we are  
wasting valuable
	resources!! THIS IS WHY I SAY PG NEEDS A BETTER BASE FORMAT THAN PVT.

	regards
		Keith

	
Am 03.11.2006 um 17:37 schrieb Michael Hart:

>
> Neither Joshus Hutchinson, Marcello Perathoner, or Keith Shultz are  
> correct,
> and, in two of these three cases, this is part of a long running  
> pattern, so
> it is likely an intentional error.
>
> Refutations applied to each comment below:
>
>
> On Fri, 3 Nov 2006, joshua@hutchinson.net wrote:
>
>>> ----Original Message---- From: marcello@perathoner.de
>>>
>>> Schultz Keith J. wrote:
>>>
>>>>     I would have already set up a working system 10 years ago,  
>>>> if Micheal
>>>>     would have allowed it. But, I will not start doing anything  
>>>> unless
>>>>     it gets an official go ahead and will be used by PG officially.
>
> 1.  Project Gutenberg encourages all such system setups and always  
> has.
>
> 2.  You are now, and always have been, welcome to your own directories
> on the official Project Gutenberg servers to work with.  Greg Newby
> will be only too glad to set up any such services you like, and even
> to help recruit volunteers to help you.
>
>
>>> Bottom line: You never will get PG to officially endorse any one  
>>> format.
>
> To the extent of an exclusivve endorsement that would disallow  
> other formats
> that is is most likely true, but if you don't have faith in your  
> own format,
> then you can't expect anyone else to have it.
>
> There will be no government sponsored religions here.
>
>
>>> Your only chance is to create a format that is so much better  
>>> than any other format, that PG and DP volunteers don't want to  
>>> use anything else. At that point it will automatically become the  
>>> "official" format.
>
> As it should be.
>
>
>
>> Marcello is right.  Greg and Michael have both said, in private  
>> communications
>> and in public messages, that they will not dictate direction in PG.
>
> We will not dictate one direction at the expense of other similar  
> efforts.
>
> This is not a race to create the official exclusive Project  
> Gutenberg format.
>
> We will present eBooks in lots of formats, particularly those  
> request by
> those to read our books.
>
>
>> Michael, especially, likes the "throw it against the wall and see  
>> if it sticks" method of management.
>
> The actual quotation is:
>
> "We encourage you to run your ideas up the flagpole and see who  
> salutes."
>
> If you can't get anyone to use your format, we are hardly going to  
> force
> your ideas through anyone's alimentary canal, either.
>
>
>> While it can be frustrating at times, because a more decisive  
>> leadership can often "make things happen," this is not something  
>> that is likely to change. Ever.  So, we have to plan with that in  
>> mind.
>
>
> "Make things happen" is exactly what Project Gutenberg encourages.
>
> What you seem to want is for someone else to "make things happen"  
> for you.
>
> We'll help, but we won't do it for you.
>
> And we won't declare your or your format the "official" winner.
>
> There will always be be room for improvements.
>
> Project Gutenberg is a dyanmic process to maximize the eBook  
> potential,
> not a static system to be once achieved and then left as a fossil  
> record.
>
> "Make Things Happen"
>
> Don't wait for someone else to tell you that your idea has happened.
>
>
>
>> Unfortunately, it is a bit of a chicken and egg thing.  Until I can
>> make TEI more popular with folks, the developers don't make the  
>> tools.
>> And until I have the tools, I can't get enough people to use it to
>> reach critical mass.  I can create texts (and do) in TEI, but I don't
>> have the skills to make tools for helping in the creation of said  
>> TEI docs.
>
> Just start by posting your examples and pointing to them.
>
> That's how YouTube, MySpace, Google, Yahoo, and Project Gutenberg  
> started.
>
> Don't expect to start at the end, it helps to start at the beginning.
>
>
>> As Jon Noring agreed earlier, there are NEVER enough developers to go
>> around!  ;)
>>
>> Josh
>
>
> Or, there are too many developers creating not enough example eBooks
> to generate any interest.
>
> Over the 10 years mentioned at the top of this commentary, if you had
> created just one eBook per month for any particular new style of  
> format
> then you would have a collection of well over 100 eBooks to  
> demonstrate.
>
> Without such an initial collection, it's hard to expect anyone to  
> come.
>
> "Build it, and they will come."
>
>
>
> Thanks!!!
>
> Give the world eBooks in 2006!!!
>
> Michael S. Hart
> Founder
> Project Gutenberg
>
> Blog at http://hart.pglaf.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From schultzk at uni-trier.de  Sat Nov  4 11:52:00 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Sat Nov  4 11:52:10 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <454B7392.1060201@perathoner.de>
References: <cfe.da3a8d.327c6e72@aol.com> <454B3A94.3000301@perathoner.de>
	<0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com>
	<454B7392.1060201@perathoner.de>
Message-ID: <6E736CFF-8A2B-493C-BB02-DE7F24D44860@uni-trier.de>

Hi There,

	Just loaded it in camino came up fine. I also displayed the source
	it xml and IE thinks it is xml so it tries to interpret it as xml !
	BTW. no mention in your source about "text/plain"

	Sorry, whose mistake?

	Keith.

Am 03.11.2006 um 17:51 schrieb Marcello Perathoner:

> Dave Fawthrop wrote:
>> On Fri, 03 Nov 2006 13:48:20 +0100,  Marcello Perathoner
>> <marcello@perathoner.de> wrote:
>>
>> |  http://www.gnutenberg.de/bowerbird/poo.tei
>> The XML page cannot be displayed
>> Cannot view XML input using style sheet. Please correct the error  
>> and then
>> click the Refresh button, or try again later.
>>
>>
>> --------------------------------------------------------------------- 
>> -----------
>>
>> The system cannot locate the object specified. Error processing  
>> resource
>> 'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent...
>>
>> <!ENTITY % TEI.extensions.ent '' >%TEI.extensions.ent;
>> <<<
>>
>> So modern I can not view it with latest IE  6.0.2900.2180
>>
>> What use is something which *ordinary* people can not read?
>
> The error is in your browser. The object is served as "text/plain".  
> That
> means: It is to be displayed to the user without further ado. IE  
> has no
> business trying to interpret it in any way.
>
>
> $ wget -S http://www.gnutenberg.de/bowerbird/poo.tei
> --17:26:32--  http://www.gnutenberg.de/bowerbird/poo.tei
>            => `poo.tei'
> Resolving localhost... 127.0.0.1
> Connecting to localhost|127.0.0.1|:8118... connected.
> Proxy request sent, awaiting response...
>   HTTP/1.1 200 OK
>   Date: Fri, 03 Nov 2006 16:26:32 GMT
>   Server: Apache/2.0.55 (Debian) mod_jk/1.2.18 PHP/5.1.6-1
>   Last-Modified: Fri, 03 Nov 2006 12:50:56 GMT
>   ETag: "27400a-7c6c-2fe12c00"
>   Accept-Ranges: bytes
>   Content-Length: 31852
>   Content-Type: text/plain; charset=utf-8
>   Connection: close
> Length: 31,852 (31K) [text/plain]
>
> 100%[====================================>] 31,852       123.50K/s
>
> 17:26:32 (123.38 KB/s) - `poo.tei' saved [31852/31852]
>
> $
>
> -- 
> Marcello Perathoner
> webmaster@gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From schultzk at uni-trier.de  Sat Nov  4 12:06:44 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Sat Nov  4 12:06:52 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <454BC95F.90807@bohol.ph>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>	<4549DC5F.8010002@perathoner.de>
	<73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>
	<454BC95F.90807@bohol.ph>
Message-ID: <41FAA4BF-8415-4C4B-8235-A564955CC64C@uni-trier.de>

Hi Jereon,

	 I was not talking about TEI being used by people, but by PG
	itsself. Marcello has said they do not have the tools!!

	Keith

Am 03.11.2006 um 23:57 schrieb Jeroen Hellingman (Mailing List Account):

> Schultz Keith J. wrote:
>>
>>     First off,  his method is not any better than anything or worse
>>     than your socalled TEI ! As you said it is not finished nor
>>     anywhere near it. The TEI movement has been around for at least
>>     5 years. As far as I am concerned it is vaporware.
> I have been using TEI for 8 years. I have posted over 200 ebooks
> produced using TEI.
> My tools are available on-line, and the TEI master files can be had  
> for
> the asking. The reason
> I haven't posted the TEI themselves is simply because I haven't taken
> enough care to
> make my tools ready for prime time. TEI isn't vaporware, but a format
> that is actually and
> widely used, however, since it is fairly flexible and very rich, and
> since it approaches text from the semantic,
> not the visual appearance edge, tools for it are not always up-to-task
> to generate a good-looking
> or even acceptable rendered results in a multitude of formats, such as
> HTML, PDF, plain text, etc.
> With increasing complexity and features of texts, some programming and
> fine-tuning of processes
> will be required...
>
> Jeroen.
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From schultzk at uni-trier.de  Sat Nov  4 12:15:56 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Sat Nov  4 12:16:04 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <20061104013550.GA21170@pglaf.org>
References: <20061104013550.GA21170@pglaf.org>
Message-ID: <91C6CECB-0128-4730-B1B0-51CEE6E0B158@uni-trier.de>

Hi Greg,

	What I am talking about is a base format which would require
	that all PG texts be scanned and process to contain information
	on chapter, pages, paragraphs, type, etc..

	It would not exclude conversion to other formats, but greatly simplify
	the process. Also, the format could contain information to be imported
	the databases use by PG-Website.

	As I am only one person, who may not see all implecation or aspects
	others would be needed. Also, it would have to be fully endorsed by PG.
	If PG is interested in developing such a standard I am your man.

	 regards
		Keith.

Am 04.11.2006 um 02:35 schrieb Greg Newby:

> On Fri, Nov 03, 2006 at 01:38:00PM +0100, Schultz Keith J. wrote:
>> ...
>> 	I would have already set up a working system 10 years ago, if  
>> Micheal
>> 	would have allowed it. But, I will not start doing anything unless
>> 	it gets an official go ahead and will be used by PG officially.
>> 	My time is to precious to waste on anything else.
>
> Keith, here is your official "go ahead."
> As Michael responded, we'll use essentially any reasonable format.
> So if you have something in mind, go for it.
>
> If you were thinking that your official format would exclude other
> formats, that's a different issue.  I don't think that will happen
> too soon...
>
> Officially yours,
>   -- Greg
>
> Dr. Gregory B. Newby
> Chief Executive and Director
> Project Gutenberg Literary Archive Foundation http://gutenberg.net
> A 501(c)(3) not-for-profit organization with EIN 64-6221541
> gbnewby@pglaf.org
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From marcello at perathoner.de  Sat Nov  4 12:42:26 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Nov  4 12:42:31 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <3FEB672A-57D7-4F44-972C-7D8F78BFE843@uni-trier.de>
References: <c14.79d1e63.3277cf5e@aol.com>	<1162248212.5857.1.camel@localhost.localdomain>	<45477F3D.70205@perathoner.de>	<064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>	<4549DC5F.8010002@perathoner.de>	<73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de>	<454B2C66.5020100@perathoner.de>	<DBB7DBB0-D2AA-46BF-ABAB-0AC2A76D198E@uni-trier.de>	<454B4516.3070605@perathoner.de>
	<3FEB672A-57D7-4F44-972C-7D8F78BFE843@uni-trier.de>
Message-ID: <454CFB32.2080308@perathoner.de>

Schultz Keith J. wrote:

>     A lot of things get sponsered in the acedemic world, but that does
>     not mean it gets very far in the WORLD and it still does not mean
>     that TEI is useful in linguistics!!? Like I said there are a lot better
>     systems around for doing work in linguistics.

It would be much more helpful if you did say *which* systems are better
than TEI.


>     That is why you still have gotten your tools finished, or have
>     someone like me step and program it for you in no time flat.

Your words are big enough. Now if you could show me some of your big
deeds maybe I could start to take you seriously.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Sat Nov  4 12:46:12 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Nov  4 12:46:15 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <6E736CFF-8A2B-493C-BB02-DE7F24D44860@uni-trier.de>
References: <cfe.da3a8d.327c6e72@aol.com>
	<454B3A94.3000301@perathoner.de>	<0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com>	<454B7392.1060201@perathoner.de>
	<6E736CFF-8A2B-493C-BB02-DE7F24D44860@uni-trier.de>
Message-ID: <454CFC14.1000307@perathoner.de>

Schultz Keith J. wrote:

>     Just loaded it in camino came up fine. I also displayed the source
>     it xml and IE thinks it is xml so it tries to interpret it as xml !
>     BTW. no mention in your source about "text/plain"

The web server tells you that in the "Content-Type" field of the HTTP
response header. To see the header say this:

  wget -S http://www.gnutenberg.de/bowerbird/poo.tei


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Sat Nov  4 19:14:22 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Sat Nov  4 19:14:24 2006
Subject: [gutvol-d] The Proof is in the Poo
Message-ID: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>

The problem is that IE is not following the "proper" web standards.  
The server sends information that the file is text/plain.  Which means 
IE should treat it like a plain text file and display it accordingly, 
ignoring any and all markup.  IE, which as we all know, is notorious 
for playing fast and loose with the w3c specs.  In this case, it 
ignores the server and just looks directly at the file and tries to 
parse it alone, which, as you can see from the results, it is unable to 
do properly.

The gist is, the TEI files are not meant to be parse by a web browser, 
so the fact that they DON'T display properly basically means everything 
is working according to design.

Josh

>----Original Message----
>From: schultzk@uni-trier.de
>Date: Nov 4, 2006 14:52 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d@lists.pglaf.
org>
>Subj: Re: [gutvol-d] The Proof is in the Poo
>
>Hi There,
>
>	Just loaded it in camino came up fine. I also displayed the source
>	it xml and IE thinks it is xml so it tries to interpret it as xml !
>	BTW. no mention in your source about "text/plain"
>
>	Sorry, whose mistake?
>
>	Keith.
>
>Am 03.11.2006 um 17:51 schrieb Marcello Perathoner:
>
>> Dave Fawthrop wrote:
>>> On Fri, 03 Nov 2006 13:48:20 +0100,  Marcello Perathoner
>>> <marcello@perathoner.de> wrote:
>>>
>>> |  http://www.gnutenberg.de/bowerbird/poo.tei
>>> The XML page cannot be displayed
>>> Cannot view XML input using style sheet. Please correct the 
error  
>>> and then
>>> click the Refresh button, or try again later.
>>>
>>>
>>> 
--------------------------------------------------------------------- 
>>> -----------
>>>
>>> The system cannot locate the object specified. Error processing  
>>> resource
>>> 'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent...
>>>
>>> <!ENTITY % TEI.extensions.ent '' >%TEI.extensions.ent;
>>> <<<
>>>
>>> So modern I can not view it with latest IE  6.0.2900.2180
>>>
>>> What use is something which *ordinary* people can not read?
>>
>> The error is in your browser. The object is served as 
"text/plain".  
>> That
>> means: It is to be displayed to the user without further ado. IE  
>> has no
>> business trying to interpret it in any way.
>>
>>
>> $ wget -S http://www.gnutenberg.de/bowerbird/poo.tei
>> --17:26:32--  http://www.gnutenberg.de/bowerbird/poo.tei
>>            => `poo.tei'
>> Resolving localhost... 127.0.0.1
>> Connecting to localhost|127.0.0.1|:8118... connected.
>> Proxy request sent, awaiting response...
>>   HTTP/1.1 200 OK
>>   Date: Fri, 03 Nov 2006 16:26:32 GMT
>>   Server: Apache/2.0.55 (Debian) mod_jk/1.2.18 PHP/5.1.6-1
>>   Last-Modified: Fri, 03 Nov 2006 12:50:56 GMT
>>   ETag: "27400a-7c6c-2fe12c00"
>>   Accept-Ranges: bytes
>>   Content-Length: 31852
>>   Content-Type: text/plain; charset=utf-8
>>   Connection: close
>> Length: 31,852 (31K) [text/plain]
>>
>> 100%[====================================>] 31,852       123.50K/s
>>
>> 17:26:32 (123.38 KB/s) - `poo.tei' saved [31852/31852]
>>
>> $
>>
>> -- 
>> Marcello Perathoner
>> webmaster@gutenberg.org
>>
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From sam.bretheim at gmail.com  Sat Nov  4 22:14:16 2006
From: sam.bretheim at gmail.com (Sam Bretheim)
Date: Sat Nov  4 22:14:45 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
Message-ID: <454D8138.6040307@gmail.com>

joshua@hutchinson.net wrote:
> The gist is, the TEI files are not meant to be parse by a web browser, 
> so the fact that they DON'T display properly basically means everything 
> is working according to design.
>   

It's worth mentioning that modern Web browsers are quite capable of 
displaying TEI reasonably well, though some work on the relevant TEI and 
XSL stylesheets is necessary before they're ready to be widely used.

For instance, if the author had inserted the following near the 
beginning of the document, it would have rendered quite tolerably in 
recent versions of Firefox/Mozilla/Camino, Konqueror/Safari/OmniWeb, 
Opera, and iCab.  (IE and Amaya have trouble with some of the code in 
this CSS file; I'll try to figure out how to make them display TEI 
properly.)

<?xml-styleheet type="text/css" 
href="http://www.shinparam.org/Sam/Projects/TEI-CSS/prettynovel.css"?>


Here are two books I'm in the midst of proofing and marking up, both of 
which look fairly good when viewed with that CSS stylesheet:

http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml

http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml
From joshua at hutchinson.net  Sun Nov  5 05:29:17 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Sun Nov  5 05:29:20 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <12274920.1162733357578.JavaMail.?@fh1039.dia.cp.net>

Yep, you *can* make tei render.  And maybe someday it is something we 
will work on, but right now, the design decision is that the tei is 
master file and NOT an end-user document.  (And some of the elements in 
some tei elements will never render directly in a web document with 
just a bit of css due to its dynamic nature)

Josh

>----Original Message----
>From: sam.bretheim@gmail.com
>Date: Nov 5, 2006 1:14 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d@lists.pglaf.
org>
>Subj: [gutvol-d] TEI rendering in Web browsers
>
>joshua@hutchinson.net wrote:
>> The gist is, the TEI files are not meant to be parse by a web 
browser, 
>> so the fact that they DON'T display properly basically means 
everything 
>> is working according to design.
>>   
>
>It's worth mentioning that modern Web browsers are quite capable of 
>displaying TEI reasonably well, though some work on the relevant TEI 
and 
>XSL stylesheets is necessary before they're ready to be widely used.
>
>For instance, if the author had inserted the following near the 
>beginning of the document, it would have rendered quite tolerably in 
>recent versions of Firefox/Mozilla/Camino, Konqueror/Safari/OmniWeb, 
>Opera, and iCab.  (IE and Amaya have trouble with some of the code 
in 
>this CSS file; I'll try to figure out how to make them display TEI 
>properly.)
>
><?xml-styleheet type="text/css" 
>href="http://www.shinparam.org/Sam/Projects/TEI-CSS/prettynovel.css"?
>
>
>
>Here are two books I'm in the midst of proofing and marking up, both 
of 
>which look fairly good when viewed with that CSS stylesheet:
>
>http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml
>
>http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.
xml
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From desrod at gnu-designs.com  Sun Nov  5 06:08:56 2006
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Sun Nov  5 06:09:36 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
Message-ID: <Pine.LNX.4.64.0611050906050.30530@aphrodite.gnu-designs.com>


> The problem is that IE is not following the "proper" web standards. 
> The server sends information that the file is text/plain.  Which 
> means IE should treat it like a plain text file and display it 
> accordingly, ignoring any and all markup.

 	Does PG have TEI as a registered mime type?

 	http://www.iana.org/assignments/media-types/

 	In our own space, Plucker has one[1].. and it might further 
adoption if TEI had one registered as well, so webserver maintainers 
could set the proper AddType directive and serve TEI documents with a 
mime type that browsers (even broken ones like MSIE) could render 
properly.


[1] http://www.iana.org/assignments/media-types/application/prs.plucker

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From jon at noring.name  Sun Nov  5 06:48:16 2006
From: jon at noring.name (Jon Noring)
Date: Sun Nov  5 06:46:28 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <12274920.1162733357578.JavaMail.?@fh1039.dia.cp.net>
References: <12274920.1162733357578.JavaMail.?@fh1039.dia.cp.net>
Message-ID: <865311856.20061105074816@noring.name>

Joshua wrote:

> Yep, you *can* make tei render.  And maybe someday it is something we
> will work on, but right now, the design decision is that the tei is 
> master file and NOT an end-user document.  (And some of the elements in
> some tei elements will never render directly in a web document with 
> just a bit of css due to its dynamic nature)

As Joshua notes, it is certainly possible to make TEI, and most any
XML vocabualary, render using CSS.

However, TEI may include structures which HTML never had to support,
so if these structures are used in TEI markup, it becomes more
difficult to properly render that document in web browsers.

The prime example of such a TEI construct is the <note> tag, where a
note can be embedded right within the main flow of the book. For a
couple TEI examples of using the <note> element, see:

   http://www.tei-c.org/P4X/ref-NOTE.htmlIn
   
In direct TEI rendering we want the note to be extracted from the main
flow and rendered separately. Unfortunately the HTML vocabulary never
included this construct, so web browsers never have had to natively
develop something to handle inline notes. Thus, without special CSS,
the inline note simply merges with the main text when rendered. Not
pretty.

Here's a demo of how an inline TEI <note> might be rendered using CSS
(my aesthetic skill with CSS sucks royally, but you get the picture):

   http://www.windspun.com/demoxml/demonote.xml

The above demo works best in Opera (Opera has the best and widest CSS
conformance, mainly because the Opera CTO, Hakon Lie, is the co-inventor
of CSS), but also renders o.k. in Firefox (looks pretty good in the
recent Firefox 2.0). The demo will NOT work in IE6 -- I haven't yet
tried the new IE7.

Most processing with TEI these days is to use it as a master format,
and then use XSLT to convert it to XHTML optimized for web browsers.
Things like <note> are yanked and moved somewhere else during the
transformation.

It is the intent of the OpenReader format to someday natively support
TEI, or some "ebook subset" of TEI (been in touch with Sebastian Rahtz
on this). Since OpenReader supports the OEBPS out-of-spine feature, all
OpenReader reading clients will already have a built-in means to
natively handle the TEI <note> and do so in innovative ways not requiring
special user-provided CSS.

Jon Noring


From jon at noring.name  Sun Nov  5 07:14:34 2006
From: jon at noring.name (Jon Noring)
Date: Sun Nov  5 07:12:43 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <454D8138.6040307@gmail.com>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<454D8138.6040307@gmail.com>
Message-ID: <1912936803.20061105081434@noring.name>

Sam Bretheim wrote:

> It's worth mentioning that modern Web browsers are quite capable of 
> displaying TEI reasonably well, though some work on the relevant TEI and
> XSL stylesheets is necessary before they're ready to be widely
> used....
>
> Here are two books I'm in the midst of proofing and marking up, both of
> which look fairly good when viewed with that CSS stylesheet:
>
> http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml
>
> http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml


Cool! Good work!

Referring to my prior message a few minutes ago, where I showed how
CSS may be used to handle TEI <note> used inline, I notice from
looking at your CSS commentary that PGTEI does not support <note> used
inline, which is understandable...


Jon


From jon at noring.name  Sun Nov  5 08:20:17 2006
From: jon at noring.name (Jon Noring)
Date: Sun Nov  5 08:18:28 2006
Subject: [gutvol-d] 
	Oops, got it wrong... (was TEI rendering in Web browsers)
In-Reply-To: <1912936803.20061105081434@noring.name>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name>
Message-ID: <558740280.20061105092017@noring.name>

Jon Noring previously wrote:

> Referring to my prior message a few minutes ago... I notice from
> looking at [Sam's] CSS commentary that PGTEI does not support <note>
> used inline, which is understandable...

Oops, got this totally wrong. Sorry.

I just referred to Marcello's PGTEI guide (PG text #20000) and
misinterpreted what Sam meant by non-support for inline notes.

He does not refer to the placement of the note in the PGTEI document
(a note certainly can be placed inline at the point of reference),
but rather to how it is to be dealt with in conversion and rendering.

Here's a PGTEI markup example from the Marcello's PGTEI guide:

*********************************************************************
<p> When I was a boy, there was but one permanent ambition among my
comrades in our village<note place="foot"><p>

Hannibal, Missouri.

</p></note> on the west bank of the Mississippi River. That was, to
be a steamboatman. ... </p>


Will be rendered as:

When I was a boy, there was but one permanent ambition among my
comrades in our village(3) on the west bank of the Mississippi River.
That was, to be a steamboatman. ...
*********************************************************************

Indeed in PGTEI the note can be placed inline in the paragraph at the
point of reference.


Btw, aren't there a few old books where notes do appear inline at the
point of reference?

Jon


From Bowerbird at aol.com  Sun Nov  5 09:53:52 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Nov  5 09:54:13 2006
Subject: [gutvol-d] gvd061104 -- another commentary on consistency
Message-ID: <cc9.22699c9.327f7f30@aol.com>

keith said:
>    If I go to DP. I would look at their concept and specifications and
>    ask how I can help or offer and discuss better ways of doing things.

and when you do that, you will
"piss off everybody in sight",
as marcello so delicately put it...

unless you're meek and obsequious,
in which case you will be tolerated...
and ignored...

***

keith said:
>   If PG had as a base format 
>    a minimalistic markup 
>    then all would benefit from that.

just to let you know, keith, p.g. _does_ have
"a minimalistic markup", at least "officially"...

for instance, the rule is that four blank lines
should precede a header, and two follow it.

the problem -- especially bad in the past -- 
is they are too casual about quality-control,
so the e-texts are _inconsistent_ in regard to
their adherence to this minimalistic markup...

so, to use our example, you will find _some_
headers occasionally have _three_ blank lines
preceding them, or only one following them...

_if_ the e-texts _were_ consistent, it would be
good, because then tools could be created to
bring about a wide variety of benefits for users.

again with the headers, it would be easy as pie
to create a _library-wide_ "table of contents" --
each header hotlinked to its respective chapter.
i've already demonstrated how to make that work.

z.m.l. merely codifies p.g.'s "minimalistic markup"
-- adding a few extensions to it, where required --
which is why so many e-texts are "z.m.l.-ready"
right out of the box.

the "extension" that was necessary for headers
was to specify that different "levels" of headers
can be indicated by using _more_than_four_
blank lines before the header, with more blank
lines indicated a higher priority of the heading...

what i'd say to you, keith, is that you already
have what you need -- at least "officially" --
but you'll find when you get "on the ground"
(inside the e-texts), the "official" standard is
not being applied consistently, unfortunately.

that's what i've found.   and when i traced it,
that's what _every_ programmer has found.

this is why i would be very happy if all of the
e-texts were marked up in .tei.   not because i
think .tei is the answer -- it's not -- but because
i would finally be getting _consistent_ e-texts...

so i wish josh would get to work on that .tei stuff.

(of course, taking on the complexity of .tei so you
can get the easy benefit of _consistency_ is overkill,
but as long as it's _somebody_else_ paying the cost
in terms of their time and energy, _i_ don't mind...)

***

on an encouraging note, the level of consistency
_has_ been increasing in recent years, due to d.p.

(for the text versions, anyway.   now it's the .html
versions that are the rat's nest of inconsistencies.)

but, on a discouraging note, there are still _huge_
holes in the text versions that are being created...

the first is that image-names are not being included.
the _caption_ for the image is included, but the actual
_name_ of the image-file is not.   i have _requested_
that the image-file-name be included, but this request
was _turned_down_.   indeed, turned down _repeatedly_.

it's as if the "powers that be" are _deliberately_ trying to
make the text-file as impotent as possible.   astounding.

another problem is that linebreak and pagebreak
information is routinely tossed out of the text files.
again, it's tragic this information is being discarded.

the pagebreak information, at least, is being retained
in the .html version, at least by _some_ postprocessors.

and of course the .html versions have the information
on the filenames of the graphics that were in the book.

thus i've written routines that scour the .html file to
get this information and restore it to the text version.

but it is a shame that work has to be done just to restore
the information that distributed proofreaders tossed out;
still, if they're gonna be _stubborn_ about their stupidity,
at least it's good that the information _can_ be recovered.

the information about _linebreaks_, for instance, is gone,
and cannot be recovered.

what that means, ultimately, is that all the books that are
in the p.g. library will have to be subjected to o.c.r. again,
this time _retaining_ the linebreaks.   (thank _goodness_
that google is doing all the scanning, so we can be sure
that all of the books in p.g. will be scanned, eventually...)

then the p.g. version can be used to _proof_ that new o.c.r.

after that, though, the p.g. version will simply be discarded.

and the d.p. volunteers, who _thought_ they were working
against the backdrop of all time, will be quite saddened (and
perhaps angered as well) to learn that they had been misled.

all because they didn't save linebreak information.  sad, eh?

they thought linebreaks were "superfluous" in an e-book.

well, maybe they are, but since it will be simple for people
in the future to toss the linebreaks _if_they_want_to_do_so_,
why not keep 'em, in case those future people _want_ them?

besides, in the near future, when we're still making the shift
between paper-books and their electronic cousins, anything
that helps maintain synergy between the two (like linebreaks)
is something that we cannot afford to be tossing out casually.

i'm still waiting for the postprocessor who is brave enough to
make the decision to keep the linebreaks in the books they do.

(in that vein, big congratulations to chuck greif, who recently
posted #19703, whose chapter-headings link _back_ to the
table-of-contents, which i've recommended for a long time.)

the other main shortcoming in the text versions these days is
a failure to indicate which lines should _not_ be rewrapped...

this is a terrible problem that causes the most havoc with the
conversions to other formats, since poetry, lists, and tables 
-- even tables of contents and tables of illustrations -- and
sundry other stuff (such as address blocks) get mangled...

this problem could be easily solved with a dirt-simple rule.

the one i use in z.m.l. is that any line with leading whitespace
(a space or a tab) is not to be wrapped.   so all a person has to
do is put a leading space on such lines.   this little rule makes a
huge difference by making sure lines aren't incorrectly wrapped,
and programmers can write simple routines to do rewrapping...

for books where _nothing_ is to be rewrapped (e.g., poetry),
you can just do a global change at the end of all your editing
to change every newline into a newline followed by a space...

also, this rule nicely delineates a _block_, such as a table,
because all of the lines in the block are indented, while
the empty lines above and below them set off the block...

again, just one more demonstration where a very simple rule
-- easy enough for an average 4th-grader to grok and use --
can give us tremendous power if we only leverage it correctly...

but of course, all the good rules in the world don't matter much
if you ain't willing to spend the time to make sure you follow 'em.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061105/b5db8b06/attachment.html
From gbnewby at pglaf.org  Sun Nov  5 11:39:41 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Nov  5 11:39:42 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <91C6CECB-0128-4730-B1B0-51CEE6E0B158@uni-trier.de>
Message-ID: <20061105193940.GA5654@pglaf.org>

On Sat, Nov 04, 2006 at 09:15:56PM +0100, Schultz Keith J. wrote:
> Hi Greg,
> 
> 	What I am talking about is a base format which would require
> 	that all PG texts be scanned and process to contain information
> 	on chapter, pages, paragraphs, type, etc..

Your use of the phrase of "require that all PG texts..." indicates
you're not paying attention.

What you are asking for, will not be granted.  What is granted, to you
or anyone, is in my email below.  Both what will and will not be granted
has already been stated by many people in this discussion thread, and is
in the About articles I mentioned.  

It sounds like you need to find or start another project to meet your
goals (also described in the About articles, and similarly encouraged).

  -- Greg

> 	It would not exclude conversion to other formats, but greatly 
> 	simplify
> 	the process. Also, the format could contain information to be 
> 	imported
> 	the databases use by PG-Website.
> 
> 	As I am only one person, who may not see all implecation or aspects
> 	others would be needed. Also, it would have to be fully endorsed by 
> 	PG.
> 	If PG is interested in developing such a standard I am your man.
> 
> 	 regards
> 		Keith.
> 
> Am 04.11.2006 um 02:35 schrieb Greg Newby:
> 
> >On Fri, Nov 03, 2006 at 01:38:00PM +0100, Schultz Keith J. wrote:
> >>...
> >>	I would have already set up a working system 10 years ago, if  
> >>Micheal
> >>	would have allowed it. But, I will not start doing anything unless
> >>	it gets an official go ahead and will be used by PG officially.
> >>	My time is to precious to waste on anything else.
> >
> >Keith, here is your official "go ahead."
> >As Michael responded, we'll use essentially any reasonable format.
> >So if you have something in mind, go for it.
> >
> >If you were thinking that your official format would exclude other
> >formats, that's a different issue.  I don't think that will happen
> >too soon...
> >
> >Officially yours,
> >  -- Greg
> >
> >Dr. Gregory B. Newby
> >Chief Executive and Director
> >Project Gutenberg Literary Archive Foundation http://gutenberg.net
> >A 501(c)(3) not-for-profit organization with EIN 64-6221541
> >gbnewby@pglaf.org
> >_______________________________________________
> >gutvol-d mailing list
> >gutvol-d@lists.pglaf.org
> >http://lists.pglaf.org/listinfo.cgi/gutvol-d
From traverso at dm.unipi.it  Sun Nov  5 12:19:38 2006
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Sun Nov  5 12:15:03 2006
Subject: [gutvol-d] 
	Oops, got it wrong... (was TEI rendering in Web browsers)
In-Reply-To: <558740280.20061105092017@noring.name> (message from Jon Noring
	on Sun, 5 Nov 2006 09:20:17 -0700)
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name>
	<558740280.20061105092017@noring.name>
Message-ID: <200611052019.kA5KJc414884@pico.dm.unipi.it>

>>>>> "Jon" == Jon Noring <jon@noring.name> writes:

    Jon> Btw, aren't there a few old books where notes do appear
    Jon> inline at the point of reference?

Many old books have side notes in the margins, not exactly inline,
although on the same line. I have never seen one with notes exactly
inline. 

But when it is inline, how to you tell that it is a note and not an
incidental remark?

Carlo
From jon at noring.name  Sun Nov  5 13:26:42 2006
From: jon at noring.name (Jon Noring)
Date: Sun Nov  5 13:24:52 2006
Subject: [gutvol-d]  Oops,
	got it wrong... (was TEI rendering in Web browsers)
In-Reply-To: <200611052019.kA5KJc414884@pico.dm.unipi.it>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name>
	<558740280.20061105092017@noring.name>
	<200611052019.kA5KJc414884@pico.dm.unipi.it>
Message-ID: <556830562.20061105142642@noring.name>

Carl wrote:

> Many old books have side notes in the margins, not exactly inline,
> although on the same line. I have never seen one with notes exactly
> inline. 
>
> But when it is inline, how to you tell that it is a note and not an
> incidental remark?

Good point.

Contentwise, the distinction between a note and an incidental remark
may sometimes be difficult to discern. It depends upon a bunch of
factors, such as if the snippet of text is typographically
distinguished from the surrounding text, or if it was inserted by the
editor (not the author) and noted thusly, etc. In some cases it indeed
can be murky whether or not the inline snippet can be "reassigned" to
be a note, footnote, endnote, etc.

I suppose if the book contains lots of referenced notes, but then has
this inline snippet, the snippet should not be considered a note but
instead simply part of the main text. But what if otherwise the book
has no other notes? It is a case-by-case basis, I suppose.

I assume the decision by PGTEI not to support "inline" among the
attribute values for type of <note> is based on a lot of experience
with real-world texts already worked on by PG and DP, which includes a
lot of older books wherein we find all kinds of difficult conventions
no longer in use today.

Jon

From marcello at perathoner.de  Sun Nov  5 13:52:10 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Nov  5 13:52:15 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <Pine.LNX.4.64.0611050906050.30530@aphrodite.gnu-designs.com>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<Pine.LNX.4.64.0611050906050.30530@aphrodite.gnu-designs.com>
Message-ID: <454E5D0A.90904@perathoner.de>

David A. Desrosiers wrote:

>     Does PG have TEI as a registered mime type?

TEI is a format maintained by the TEI Consortium:

  http://www.tei-c.org

I can ask them what their plans are ...


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Sun Nov  5 14:02:13 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Nov  5 14:02:24 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <1912936803.20061105081434@noring.name>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>
	<1912936803.20061105081434@noring.name>
Message-ID: <454E5F65.2080909@perathoner.de>

Jon Noring wrote:

> I notice from
> looking at your CSS commentary that PGTEI does not support <note> used
> inline, which is understandable...

PGTEI supports inlined <note> elements and can render them as run-in,
page footnote, any section endnote, book endnote or page margin note.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From sam.bretheim at gmail.com  Sun Nov  5 14:14:23 2006
From: sam.bretheim at gmail.com (Sam Bretheim)
Date: Sun Nov  5 14:15:29 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <454E5F65.2080909@perathoner.de>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>	<1912936803.20061105081434@noring.name>
	<454E5F65.2080909@perathoner.de>
Message-ID: <454E623F.3010107@gmail.com>

Marcello Perathoner wrote:
> Jon Noring wrote:
>
>   
>> I notice from
>> looking at your CSS commentary that PGTEI does not support <note> used
>> inline, which is understandable...
>>     
>
> PGTEI supports inlined <note> elements and can render them as run-in,
> page footnote, any section endnote, book endnote or page margin note.
>   

 From the PGTEI 0.4 guide's section on the <note> element:

"The *place* attribute supports only the values of *foot*, *end* and 
*margin*."

Jon's comment was referring to <note place="inline">, not the general 
notion of inlined <note> elements.
From marcello at perathoner.de  Sun Nov  5 14:39:07 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Nov  5 14:39:12 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <454E623F.3010107@gmail.com>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>	<1912936803.20061105081434@noring.name>	<454E5F65.2080909@perathoner.de>
	<454E623F.3010107@gmail.com>
Message-ID: <454E680B.9010305@perathoner.de>

Sam Bretheim wrote:

>> PGTEI supports inlined <note> elements and can render them as run-in,
>> page footnote, any section endnote, book endnote or page margin note.
>>   
> 
> From the PGTEI 0.4 guide's section on the <note> element:
> 
> "The *place* attribute supports only the values of *foot*, *end* and
> *margin*."
> 
> Jon's comment was referring to <note place="inline">, not the general
> notion of inlined <note> elements.

You can collect endnotes at any given point in your text. If you place
<divGen type="endnotes" target="id"> someplace in your text it will
output all notes in the section identfied by id.

In the newest version (to be released) a <note place="inline"> will just
be converted into a <hi> with no special formatting attached.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Sun Nov  5 15:15:28 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Nov  5 15:15:36 2006
Subject: [gutvol-d] gvd061105 -- to mimic the competitors left back in the
	dust
Message-ID: <bef.80638a8.327fca90@aol.com>

sam said:
>   modern Web browsers are quite capable
>    of displaying TEI reasonably well, though 
>    some work on the relevant TEI and XSL stylesheets 
>    is necessary before they're ready to be widely used.
...
>   http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml
>    http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml

first, nice work, on both of them.

the headers are good -- big, and bold, and
start on a new page when printed to .pdf...

i woulda liked a hotlinked table-of-contents,
as a bare minimum.   and the dictionary just
cries out for hotlinks in the "see:" entries and
elsewhere, but as i said, overall, they look nice.

***

second, we don't need a 1.3-meg example,
or a 2.3-meg one either.   how about you 
mark up my nice little 17k test-suite, eh?
>    http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.zml

there are also interesting philosophical issues
that are raised by the dictionary example, but
i won't jump in to that can of worms quite yet...

(the lead question, if you want to start preparing:
is a 2.3-meg scrolling-field the best way to display
a dictionary, where most people, most of the time,
likely want to view just one or a few specified terms?
that segues to a critical look at browser as e-book.)

***

third, examination of your markup shows a very high
level of expertise.   congratulations on attaining that.
how much time are you willing to volunteer to p.g.?

the answer will be telling.   we can't expect volunteers to
develop expertise like this.   conversely, we can't expect
people who have such expertise to volunteer much time.

so a big question is this: exactly who does the markup?

i'm guessing you'll agree that botched markup is worse
than none at all.   but botched markup is all you will get
from 89% of the people even willing to take on the task.
so how much time would you have to supervise people?
and do you have a generous tolerance for incompetence?

another big question:   where is the demand for markup?
do average readers really need it?   i seriously doubt it...
even if it came for free (and it doesn't), who needs it?

***

fourth, the ability to load .tei directly into the browser
is where the .tei needs to head, because it is pointless
to spend the resources to create a high-quality format
and then convert that to lesser-and-dumber formats...

but that conflict is telling too, because project gutenberg
is targeted squarely and directly at "the ordinary people",
so the expectation of "modern browsers" cannot be made.

to the contrary, to hit that target, we are forced to assume
_trailing-edge_machinery_ that only runs _old_software_,
or (at best) apps we make that execute on low resources.

yeah, i know, that's no fun, is it?   we like to be all modern,
play with the fast, new, shiny toys, don't we?   sure we do...

...but...

3 out of 4 computers are still running internet explorer 6.
yeah, it's a piece-of-crap browser, you'll get no argument
from me on that, i threw that shit program away long ago,
along with just about every other microsoft app i ever had.

but you and i both know that, if that 75% penetration for ie6
changes substantially in the future, the gains will go to ie7,
because the people running ie6 now are microsoft zombies.
(so, do you know if ie7 will be better?   i know what i'd bet...)

and there's absolutely no question that the number one reason
project gutenberg is known as the premiere cyberspace library
is that it made itself available to every computer on the planet.
(and yes, that does in fact mean even the microsoft zombies,
because if you don't have them, you don't have critical mass.)

it's the guys that required the high-end equipment that got
left in the dust, back at the starting gate, never to catch up...

so i'll ask the question again:   do you want to mimic _them_?

-bowerbird

p.s.   of course, even when .tei displays in the trailing browsers,
which will be what, about 5-10 years from now, that won't be
an _end_ point, as much as it is the time to _start_ reflecting
on the philosophical questions hinted at in an earlier point;
but hey, let's hope we start asking those questions sooner...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061105/194b4f57/attachment.html
From jon at noring.name  Sun Nov  5 15:33:50 2006
From: jon at noring.name (Jon Noring)
Date: Sun Nov  5 15:32:01 2006
Subject: [gutvol-d] Oops,
	forgot to mention hypertext linking and embedding images
	(was gvd061105...)
In-Reply-To: <bef.80638a8.327fca90@aol.com>
References: <bef.80638a8.327fca90@aol.com>
Message-ID: <291263687.20061105163350@noring.name>

Bowerbird wrote:
> sam said:

>>?modern Web browsers are quite capable of displaying TEI reasonably
>> well, though some work on the relevant TEI and XSL stylesheets is
>> necessary before they're ready to be widely used....
>>
>>  ? http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml
>>??  http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml

> i woulda liked a hotlinked table-of-contents,
> as a bare minimum.?  and the dictionary just
> cries out for hotlinks in the "see:" entries and
> elsewhere, but as i said, overall, they look nice.

Ah yes, in addition to inline notes, I forgot to mention the problems
in getting links and embedded images to work when natively rendering
TEI documents in web browsers. (Linking is needed for a hypertext-
enabled table of contents as Bowerbird is suggesting.)

Unfortunately, to do both, generically-speaking, requires XLink. And
unfortunately, only Firefox recognizes some XLink (why Opera still
doesn't implement XLink is a mystery -- and forget IE). And
unfortunately again, Firefox 1.x (haven't checked 2.0) will only
enable hypertext links and not embed images and multimedia.

Here's a demo of XLink-based linking to try in Firefox:

   http://www.windspun.com/demoxml/demolink.xml


All in all, web browsers still have a little ways to go to be able to
natively render the full-power of TEI or any similar XML vocabulary.
That's why TEI is mostly used for "mastering" which is then
transformed into other XML vocabularies as needed for specific
applications, such as rendering in browsers.

Jon Noring


From sam.bretheim at gmail.com  Sun Nov  5 16:48:05 2006
From: sam.bretheim at gmail.com (Sam Bretheim)
Date: Sun Nov  5 17:12:47 2006
Subject: [gutvol-d] Oops, forgot to mention hypertext linking and embedding
	images	(was gvd061105...)
In-Reply-To: <291263687.20061105163350@noring.name>
References: <bef.80638a8.327fca90@aol.com>
	<291263687.20061105163350@noring.name>
Message-ID: <454E8645.7010306@gmail.com>

Jon Noring wrote:
>
> Ah yes, in addition to inline notes, I forgot to mention the problems
> in getting links and embedded images to work when natively rendering
> TEI documents in web browsers. (Linking is needed for a hypertext-
> enabled table of contents as Bowerbird is suggesting.)
>
> Unfortunately, to do both, generically-speaking, requires XLink. And
> unfortunately, only Firefox recognizes some XLink (why Opera still
> doesn't implement XLink is a mystery -- and forget IE). And
> unfortunately again, Firefox 1.x (haven't checked 2.0) will only
> enable hypertext links and not embed images and multimedia.
>   

XLink isn't actually the only way to accomplish that: you can get links, 
images, and a number of other things by importing the XHTML namespace 
into your XML document.  (Of course, browser compatibility is limited, 
and you'd still need some XSL to transform images from the bizarre 
entity construct that regular TEI uses, or even the more sane PGTEI 
image syntax.)

I've put up a quickie example at: 
http://shinparam.org/Sam/Projects/TEI-CSS/tei-xmlns-demo.xml

I'm certainly not advocating that in-browser TEI is the One True Way way 
to distribute eTexts; it's just a trick that I found useful for 
previewing files while authoring them.

From bruce at zuhause.org  Sun Nov  5 17:07:33 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Sun Nov  5 17:35:43 2006
Subject: [gutvol-d] PG disclaimer obsolete?
Message-ID: <17742.35541.262189.857767@celery.zuhause.org>

And now for something completely different.

One of the things that has irked me for some time is the following
paragraph of the disclaimer:

Project Gutenberg-tm eBooks are often created from several printed
editions, all of which are confirmed as Public Domain in the U.S.
unless a copyright notice is included.  Thus, we do not necessarily
keep eBooks in compliance with any particular paper edition. 

All of Distributed Proofreader projects where I have been the content
provider have been from specific editions, and most, if not all of
them, have HTML editions which preserve page breaks.  Is there any way
to have this paragraph removed from the boilerplate, at least by
request of the submitter?
From jon at noring.name  Sun Nov  5 18:46:20 2006
From: jon at noring.name (Jon Noring)
Date: Sun Nov  5 18:44:32 2006
Subject: [gutvol-d] PG disclaimer obsolete?
In-Reply-To: <17742.35541.262189.857767@celery.zuhause.org>
References: <17742.35541.262189.857767@celery.zuhause.org>
Message-ID: <1626627843.20061105194620@noring.name>

Bruce Albrecht wrote:

> One of the things that has irked me for some time is the following
> paragraph of the disclaimer:

Well, this issue has irked a lot of people. The number of rabid
supporters of this policy of "source mystery" is very few, maybe even
one or two. I have my theories as to the origin of this policy, but
will refrain since hopefully all the PG "mystery source" texts will be
properly redone in the future.


> Project Gutenberg-tm eBooks are often created from several printed
> editions, all of which are confirmed as Public Domain in the U.S.
> unless a copyright notice is included.  Thus, we do not necessarily
> keep eBooks in compliance with any particular paper edition. 
>
> All of Distributed Proofreader projects where I have been the content
> provider have been from specific editions, and most, if not all of
> them, have HTML editions which preserve page breaks.  Is there any way
> to have this paragraph removed from the boilerplate, at least by
> request of the submitter?

Well, definitely if a text comes from a single source, then that
*should* be specifically noted in the text, and to mention which text
it comes from. Also called "source metadata".

I think one has to take a consumer's advocate view of this: Would you,
as a consumer:

1) *prefer* the etext you get is a faithful reproduction of a known
   source?

or

2) Are you content knowing the PG text you read may be:

   a) a composite from several sources, none of which are given, and

   b) that there may have been editing done on the text, the extent of
      which is not stated, nor do you even know who did the editing or
      on what basis editing was done.


Unfortunately, many of the most popular texts in the PG corpus were
put together before DP came on the scene. This is why several of us
have urged DP to redo each of these works using a known and reasonably
authoritative source. I'm all for DP doing all those obscure texts
(like 19th century pigeon recipe cookbooks <justkidding/>), but the
most popular works need to be put on a sound footing. If DP doesn't,
someone else will, and these new texts will be outside of PG. Of
course, as Greg just noted, PG encourages outside projects to digitize
the public domain. We're all in this together.

Jon Noring


From joshua at hutchinson.net  Sun Nov  5 18:49:11 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Sun Nov  5 18:49:18 2006
Subject: [gutvol-d] The Proof is in the Poo
Message-ID: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net>


>----Original Message----
>From: desrod@gnu-designs.com
>
> 	Does PG have TEI as a registered mime type?
>
> 	http://www.iana.org/assignments/media-types/
>
> 	In our own space, Plucker has one[1].. and it might further 
>adoption if TEI had one registered as well, so webserver maintainers 
>could set the proper AddType directive and serve TEI documents with 
a 
>mime type that browsers (even broken ones like MSIE) could render 
>properly.
>
>

The problem is that a browser CAN'T render a TEI document correctly 
(see Jon Noring's posts on how close it can come in certain browsers).  
So having it SAY it is a TEI document gives no benefit.  We have it 
specifically say a TEI document is plain text hoping the browser will 
do what web standards say it should do and treat it as plain text.  
Unfortunately, IE thinks it knows better and proceeds to fail 
miserably.  :)

Josh
From traverso at dm.unipi.it  Sun Nov  5 21:19:46 2006
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Sun Nov  5 21:15:07 2006
Subject: [gutvol-d] PG disclaimer obsolete?
In-Reply-To: <17742.35541.262189.857767@celery.zuhause.org> (message from
	Bruce Albrecht on Sun, 5 Nov 2006 19:07:33 -0600)
References: <17742.35541.262189.857767@celery.zuhause.org>
Message-ID: <200611060519.kA65Jkd04782@pico.dm.unipi.it>

>>>>> "Bruce" == Bruce Albrecht <bruce@zuhause.org> writes:

    Bruce> And now for something completely different.

    Bruce> One of the things that has irked me for some time is the
    Bruce> following paragraph of the disclaimer:

    Bruce> Project Gutenberg-tm eBooks are often created from several
    Bruce> printed editions, all of which are confirmed as Public
    Bruce> Domain in the U.S.  unless a copyright notice is included.
    Bruce> Thus, we do not necessarily keep eBooks in compliance with
    Bruce> any particular paper edition.

    Bruce> All of Distributed Proofreader projects where I have been
    Bruce> the content provider have been from specific editions, and
    Bruce> most, if not all of them, have HTML editions which preserve
    Bruce> page breaks.  Is there any way to have this paragraph
    Bruce> removed from the boilerplate, at least by request of the
    Bruce> submitter?

A custom boilerplate is impossible, since it may be changed when the
PG standard changes and an ebook is revised. But a small change of
wording might be adopted: instead of "often" say "sometimes". Then the
source informations (that now are no longer removed from the text)
might be sufficient. Although including metadata stating the source
would be much much better.

And I believe that the sentence as it stands is a big lie: how many PG
books have a filed clearance for more than one source, and these have
been used for the preparation? No more than 0.1% I believe, and this
is not "often".

Carlo
From cannona at fireantproductions.com  Sun Nov  5 21:25:31 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Nov  5 21:25:36 2006
Subject: [gutvol-d] PG disclaimer obsolete?
References: <17742.35541.262189.857767@celery.zuhause.org>
	<200611060519.kA65Jkd04782@pico.dm.unipi.it>
Message-ID: <002401c70163$fc2e20a0$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You beat me to it.  However, I would personally suggest the following
wording:

In some rare cases, Project Gutenberg-tm eBooks are created from several
printed editions, all of which are confirmed as Public Domain in the U.S.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Carlo Traverso" <traverso@dm.unipi.it>
To: <gutvol-d@lists.pglaf.org>
Cc: <gutvol-d@lists.pglaf.org>
Sent: Sunday, November 05, 2006 11:19 PM
Subject: Re: [gutvol-d] PG disclaimer obsolete?


>>>>>> "Bruce" == Bruce Albrecht <bruce@zuhause.org> writes:
>
>    Bruce> And now for something completely different.
>
>    Bruce> One of the things that has irked me for some time is the
>    Bruce> following paragraph of the disclaimer:
>
>    Bruce> Project Gutenberg-tm eBooks are often created from several
>    Bruce> printed editions, all of which are confirmed as Public
>    Bruce> Domain in the U.S.  unless a copyright notice is included.
>    Bruce> Thus, we do not necessarily keep eBooks in compliance with
>    Bruce> any particular paper edition.
>
>    Bruce> All of Distributed Proofreader projects where I have been
>    Bruce> the content provider have been from specific editions, and
>    Bruce> most, if not all of them, have HTML editions which preserve
>    Bruce> page breaks.  Is there any way to have this paragraph
>    Bruce> removed from the boilerplate, at least by request of the
>    Bruce> submitter?
>
> A custom boilerplate is impossible, since it may be changed when the
> PG standard changes and an ebook is revised. But a small change of
> wording might be adopted: instead of "often" say "sometimes". Then the
> source informations (that now are no longer removed from the text)
> might be sufficient. Although including metadata stating the source
> would be much much better.
>
> And I believe that the sentence as it stands is a big lie: how many PG
> books have a filed clearance for more than one source, and these have
> been used for the preparation? No more than 0.1% I believe, and this
> is not "often".
>
> Carlo
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFFTsdOI7J99hVZuJcRAs1ZAJ9bcKO/31fTLKwjLF3PVV7Vo4rCbwCgnp3o
KRx9RNW9Mqh23XUj65E4ls0=
=AxKX
-----END PGP SIGNATURE-----

From schultzk at uni-trier.de  Mon Nov  6 01:05:50 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon Nov  6 01:05:57 2006
Subject: [gutvol-d] gvd061104 -- another commentary on consistency
In-Reply-To: <cc9.22699c9.327f7f30@aol.com>
References: <cc9.22699c9.327f7f30@aol.com>
Message-ID: <69A833B0-CFB2-415E-ABC6-DEFCD0F8384B@uni-trier.de>

Hi BB,
	PG has a recomended format! I would not call it mark-up
	per-se.

Am 05.11.2006 um 18:53 schrieb Bowerbird@aol.com:

> keith said:
> [snip, snip]

> keith said:
> >   If PG had as a base format
> >   a minimalistic markup
> >   then all would benefit from that.
>
> just to let you know, keith, p.g. _does_ have
> "a minimalistic markup", at least "officially"...
>
> for instance, the rule is that four blank lines
> should precede a header, and two follow it.
	[snip, snip]
>
> the problem -- especially bad in the past --
> is they are too casual about quality-control,
> so the e-texts are _inconsistent_ in regard to
> their adherence to this minimalistic markup...
	That is why I wish for a base format with mark-up.

	regards,
		Keith.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061106/eb8bd381/attachment.html
From schultzk at uni-trier.de  Mon Nov  6 01:15:30 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon Nov  6 01:15:36 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <20061105193940.GA5654@pglaf.org>
References: <20061105193940.GA5654@pglaf.org>
Message-ID: <B8A5E1CB-6911-4995-ABD9-5E28B28C42A8@uni-trier.de>

Hi Greg,

	I will make a proposal and explain the advantages and
	possible use.

	I ask only to reflect on the concept. Then decided if it
	is feasible and maybe if accepted have it implemented.

	I do know and understand perfectly well that I may be
	just barking up a tree for most people.

	But, it has been more than 5 years since I tried last.


	regards
		Keith.

Am 05.11.2006 um 20:39 schrieb Greg Newby:

> On Sat, Nov 04, 2006 at 09:15:56PM +0100, Schultz Keith J. wrote:
>> Hi Greg,
>>
>> 	What I am talking about is a base format which would require
>> 	that all PG texts be scanned and process to contain information
>> 	on chapter, pages, paragraphs, type, etc..
>
> Your use of the phrase of "require that all PG texts..." indicates
> you're not paying attention.
>
> What you are asking for, will not be granted.  What is granted, to you
> or anyone, is in my email below.  Both what will and will not be  
> granted
> has already been stated by many people in this discussion thread,  
> and is
> in the About articles I mentioned.
>
> It sounds like you need to find or start another project to meet your
> goals (also described in the About articles, and similarly  
> encouraged).
>

	[snip, snip]
From prosfilaes at gmail.com  Mon Nov  6 05:08:26 2006
From: prosfilaes at gmail.com (David Starner)
Date: Mon Nov  6 05:08:31 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net>
References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net>
Message-ID: <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com>

On 11/5/06, joshua@hutchinson.net <joshua@hutchinson.net> wrote:
>
>
> >----Original Message----
> >From: desrod@gnu-designs.com
> >
> >       Does PG have TEI as a registered mime type?
> >
> >       http://www.iana.org/assignments/media-types/
> >
> >       In our own space, Plucker has one[1].. and it might further
> >adoption if TEI had one registered as well, so webserver maintainers
> >could set the proper AddType directive and serve TEI documents with
> a
> >mime type that browsers (even broken ones like MSIE) could render
> >properly.
> >
> >
>
> The problem is that a browser CAN'T render a TEI document correctly
> (see Jon Noring's posts on how close it can come in certain browsers).
> So having it SAY it is a TEI document gives no benefit.

That's not true. Saying that it is a TEI document means that most
browsers would say that I don't know what this is and it would offer
to load it in another application. Anybody who actually wants to view
the TEI version probably has a better program to view it in; if all
else fails, Wordpad is about as good a tool as opening it up in the
browser. When a TEI viewer is written, the browser can be told to open
it in the viewer, and it shouldn't be that hard to make it a browser
plugin and view it in browser just like a PDF document.
From marcello at perathoner.de  Mon Nov  6 05:39:18 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Nov  6 05:39:23 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com>
References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net>
	<6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com>
Message-ID: <454F3B06.3010802@perathoner.de>

David Starner wrote:

> That's not true. Saying that it is a TEI document means that most
> browsers would say that I don't know what this is and it would offer
> to load it in another application. Anybody who actually wants to view
> the TEI version probably has a better program to view it in; if all
> else fails, Wordpad is about as good a tool as opening it up in the
> browser. When a TEI viewer is written, the browser can be told to open
> it in the viewer, and it shouldn't be that hard to make it a browser
> plugin and view it in browser just like a PDF document.

I have just ascertained on the TEI list that there is no registered IANA
MIME type for TEI documents. We thus have following alternatives:


text/plain: browser displays doc. User might save and open in app if she
wants.

application/octet-stream: browser prompts user to execute or save doc.
There are no native TEI viewers yet, so user will probably open in
editor, gaining nothing.

application/tei+xml: and don't we all love breaking standards for no
good reason?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Mon Nov  6 06:35:59 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov  6 06:41:17 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com>
References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net>
	<6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com>
Message-ID: <1110693392.20061106073559@noring.name>

David Starner wrote:
> Joshua wrote:

>> The problem is that a browser CAN'T render a TEI document correctly
>> (see Jon Noring's posts on how close it can come in certain
>> browsers). So having it SAY it is a TEI document gives no benefit.

> That's not true. Saying that it is a TEI document means that most
> browsers would say that I don't know what this is and it would offer
> to load it in another application. Anybody who actually wants to
> view the TEI version probably has a better program to view it in; if
> all else fails, Wordpad is about as good a tool as opening it up in
> the browser. When a TEI viewer is written, the browser can be told
> to open it in the viewer, and it shouldn't be that hard to make it a
> browser plugin and view it in browser just like a PDF document.

I'm intrigued with the possibility of using a browser plug-in to view
a full-blown PGTEI document. The plug-in would do something so that
notes placed inline, embedded images, and hypertext links would work
as we want them to.

I know nothing about browser plug-ins, so I'm not certain this is
possible to do. We should not feel compelled to provide omni-browser
support -- Firefox and/or Opera are sufficient as a start. I'm
pessimistic about IE6/7 anyway because of its lack of full CSS
standards support.

An alternative to a plug-in would be to adapt an open-source browser,
like Firefox, to render PGTEI documents. Again, I'm not sure of the
amount of time it would take to adapt the code. Handling links and
embedded images should be fairly straightforward since it may only
require an internal transformation in the DOM, but handling notes
placed inline may be a little more knotty. One would like to yank and
present each note placed inline (regardless of what the attribute type
says) into a separate browser window (in popup fashion) like Microsoft
Reader's pagelet feature. That would be very innovative.

Assuming the OpenReader format gets traction, we plan to provide some
form of native TEI support in the future. Since OpenReader supports
the out-of-spine feature, the ability to yank an inline note and place
it into a popup or elsewhere will already be there.

Jon Noring


From lee at novomail.net  Mon Nov  6 06:57:52 2006
From: lee at novomail.net (Lee Passey)
Date: Mon Nov  6 06:57:21 2006
Subject: [gutvol-d] PG disclaimer obsolete?
In-Reply-To: <1626627843.20061105194620@noring.name>
References: <17742.35541.262189.857767@celery.zuhause.org>
	<1626627843.20061105194620@noring.name>
Message-ID: <454F4D70.1000001@novomail.net>

Jon Noring wrote:

>  I think one has to take a consumer's advocate view of this: Would
>  you, as a consumer:
>
>  1) *prefer* the etext you get is a faithful reproduction of a known
>  source?
>
>  or
>
>  2) Are you content knowing the PG text you read may be:
>
>  a) a composite from several sources, none of which are given, and
>
>  b) that there may have been editing done on the text, the extent of
>  which is not stated, nor do you even know who did the editing or on
>  what basis editing was done.

As a consumer, I frankly don't care, so long as there is no deception. 
That is, I don't want an e-text to be labeled as "a faithful 
reproduction of a known source" unless it is. And this should be true 
whether the label is explicit, as in your example #1 or implicit, as in 
calling a collection a "literary archive," which implies some attempt to 
preserve the state of a document.

In addition to the language referred to by the original poster, the 
standard PG boilerplate also contains language attempting to exercise 
some legal control over the distribution of its documents. This control 
is based on the law of trademark. The sine qua non of trademark law is 
that practices which would tend to cause confusion in the minds of 
consumers as to the origin of a product are prohibited. Thus you cannot 
create silverware and label it "Revere," you cannot fill oil cans with 
oil and label them "Quaker State," and you cannot create an arbitrary 
file called "Alice's Adventures in Wonderland" and label it "Project 
Gutenberg." These prohibitions are not designed so much to protect a 
manufacturer or tradesman from unfair competition as much as they are 
designed to protect consumers from deceptive business practices.

Because PG e-texts are ostensibly in the public domain, Project 
Gutenberg cannot control the use of the text itself, but it /can/ 
control use of the Project Gutenberg trademark, for example, by 
requiring certain disclaimers on an e-text if it is labeled as coming 
from Project Gutenberg. Among other things, the Project Gutenberg 
license requires that if you discover errors in a Project Gutenberg 
e-text, and correct them, you must strip of file of all references to 
Project Gutenberg if you wish to distribute the corrected file.

As Mr. Hart and Mr. Newby have made abundantly clear over the past few 
days, the leadership of Project Gutenberg is committed to avoiding the 
establishment of /any/ standards for Project Gutenberg e-texts. Because 
the use of certain words on the PG web site (such as "literary archive" 
and "guidelines") may imply the existence of a standard I think it is 
important that /every/ PG e-text continue to contain the disclaimer that 
no attempt has been made to assure that any e-text conforms to any 
particular existing version or edition. Indeed, the disclaimer should 
probably be made stronger, more in line with Mr. Noring's option number 
2, above.

Of course, as Project Gutenberg is devoid of standards, there is nothing 
which would prevent a contributer from adding his or her own information 
to any text, in addition to the standard Project Gutenberg boilerplate, 
claiming conformance to a specific edition or making other assurances of 
quality. But only by the proper use of the Project Gutenberg trademark 
and disclaimers, can consumers be assured that a Project Gutenberg 
e-text may contain unintentional or intentional errors or omissions and 
is unreliable for any purpose other than casual reading.
From marcello at perathoner.de  Mon Nov  6 07:00:43 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Nov  6 07:00:48 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <1110693392.20061106073559@noring.name>
References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net>	<6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com>
	<1110693392.20061106073559@noring.name>
Message-ID: <454F4E1B.5010108@perathoner.de>

Jon Noring wrote:

> I know nothing about browser plug-ins, so I'm not certain this is
> possible to do. We should not feel compelled to provide omni-browser
> support -- Firefox and/or Opera are sufficient as a start. I'm
> pessimistic about IE6/7 anyway because of its lack of full CSS
> standards support.

A browser plug-in is an independent peace of code that just happens to
display inside the browser window. It does not need any support on
behalf of the browser's rendering engine although it may elect to use some.

To build this on a reasonable cross-platform (IE / FF / Opera) basis
will take a good programmer as little as 3 or 4 years (BB: 2500 years).

As I see no earthly reason why anybody might want his high-end browser
to display the TEI version, when the HTML version displays right
out-of-the-box on every browser, I will invest no time in doing any of this.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Mon Nov  6 07:16:46 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov  6 07:15:01 2006
Subject: [gutvol-d] PG disclaimer obsolete?
In-Reply-To: <454F4D70.1000001@novomail.net>
References: <17742.35541.262189.857767@celery.zuhause.org>
	<1626627843.20061105194620@noring.name> <454F4D70.1000001@novomail.net>
Message-ID: <11554190.20061106081646@noring.name>

Lee Passey wrote:

> Of course, as Project Gutenberg is devoid of standards, there is
> nothing which would prevent a contributer from adding his or her own
> information to any text, in addition to the standard Project
> Gutenberg boilerplate, claiming conformance to a specific edition or
> making other assurances of quality. But only by the proper use of
> the Project Gutenberg trademark and disclaimers, can consumers be
> assured that a Project Gutenberg e-text may contain unintentional or
> intentional errors or omissions and is unreliable for any purpose
> other than casual reading.

Agreed!

Jon Noring

From jon at noring.name  Mon Nov  6 07:25:49 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov  6 07:24:03 2006
Subject: [gutvol-d] The Proof is in the Poo
In-Reply-To: <454F3B06.3010802@perathoner.de>
References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net>
	<6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com>
	<454F3B06.3010802@perathoner.de>
Message-ID: <685910051.20061106082549@noring.name>

Marcello wrote:

> I have just ascertained on the TEI list that there is no registered IANA
> MIME type for TEI documents. We thus have following alternatives:
>
>
> text/plain: browser displays doc. User might save and open in app if she
> wants.
>
> application/octet-stream: browser prompts user to execute or save doc.
> There are no native TEI viewers yet, so user will probably open in
> editor, gaining nothing.
>
> application/tei+xml: and don't we all love breaking standards for no
> good reason?

If one wants to "roll their own", maybe for temporary use until one
secures IANA registration, there is another syntax, e.g.,

application/x-pgtei+xml

or simply

application/x-tei+xml

Also, there's a move away from the above to a "dot" kind of structure
for mimetypes. I wrote up a summary of this for the IDPF OCF working
group, and will dig it out if anyone is interested.


Jon


From hart at pglaf.org  Mon Nov  6 11:01:51 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Nov  6 11:01:53 2006
Subject: [gutvol-d] PG disclaimer obsolete?
In-Reply-To: <17742.35541.262189.857767@celery.zuhause.org>
References: <17742.35541.262189.857767@celery.zuhause.org>
Message-ID: <Pine.LNX.4.60.0611061059230.29377@pglaf.org>


No reason not to indicate that any particular was intended to be
an exact copy of a certain paper edition, though you should be
prepared for various other editions to be added to the mix in
new efforts in the future, so make some kind of statement that
you want at least one eBook to remain faithful to the original.

You can decide yourself if you want [sic] to indicate canononical errors.

Michael S. Hart
Founder
Project Gutenberg

On Sun, 5 Nov 2006, Bruce Albrecht wrote:

> And now for something completely different.
>
> One of the things that has irked me for some time is the following
> paragraph of the disclaimer:
>
> Project Gutenberg-tm eBooks are often created from several printed
> editions, all of which are confirmed as Public Domain in the U.S.
> unless a copyright notice is included.  Thus, we do not necessarily
> keep eBooks in compliance with any particular paper edition.
>
> All of Distributed Proofreader projects where I have been the content
> provider have been from specific editions, and most, if not all of
> them, have HTML editions which preserve page breaks.  Is there any way
> to have this paragraph removed from the boilerplate, at least by
> request of the submitter?
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From Bowerbird at aol.com  Mon Nov  6 12:13:16 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Nov  6 12:13:35 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
Message-ID: <594.41c6177d.3280f15c@aol.com>

keith said:
>   PG has a recomended format! 
>    I would not call it mark-up per-se.

i would call it "zen markup".          :+)

***

marcello said:
>   There are no native TEI viewers yet

"yet"?

you mean none of those 134 .tei projects
has seen fit to create a viewer-program
for their major format?   surprising, eh?

***

jon said:
>    I'm intrigued with the possibility
>    of using a browser plug-in to 
>    view a full-blown PGTEI document.

well, if none of those 134 major projects
can do it for you, remember you can always
use the open-source world for programmers!

***

lee said:
>   As a consumer, I frankly don't care

jon, you can't even get your buddy lee
to say that he cares about your cause...
give it up, dude, it was too old long ago.

and frankly, if "consumers" _did_ object,
then project gutenberg would have never
become the premiere cyberspace library...

at least duguid was able to come up with a

_good_example_.   you have yet to do that...

***

a seasick page:
>    http://www.pgdp.org/ols/tools/display.php?nextpage=111.png&
book=400b970ba22f4

how can a scan be twisted one way at the top,
and another way at the bottom?   that's weird!

***

so, let's ruminate for a little bit on "open-source", ok?

to any dedicated fan of cooperation and collaboration, 
open-source has to be one of the neatest concepts ever.

the idea is that a whole lot of people will come together,
each of them making a small contribution, and that these
small individual contributions cumulate and coalesce into
a bigger, social good to which each contributor has access.

"many hands make light work" is the applicable proverb...

since each contributor has made only a small contribution,
none of them was excessively inconvenienced, yet they now
have use of something none could have created all alone...

given the great benefits against a backdrop of small cost,
it's almost as if people are getting "something for nothing".
(it's more like "something for _almost_ nothing", but still...)

of course, into an idyllic situation like this, you know there
must be an evil presence which will surely introduce itself.

in this case, it's people who make use of the end-product
without contributing anything themselves to its creation...
in a phrase, these people are "taking advantage".

there's many different ways that people who _do_ contribute
can think about people who are "taking advantage" of them...

one might be that you made your contribution as a _gift_ and
therefore don't care if people "take advantage" of what you did.
indeed, you might even _want_ people to "take advantage" of it.
this is generally the orientation of most open-source software...

a second reaction might be that you try to prevent other people
from "taking advantage" of the social good, by controlling access.
public laws against homeless people sleeping in a park are just
one example of this approach.   (otherwise we'd _all_ sleep there!)
this approach doesn't really work too well with software, however,
because if you can get access to a program, then you can use it...

a third reaction might be that you just don't care about the people
who are "taking advantage" of the social good without contributing.
like a shopowner who knows that there's gonna be _some_amount_
of shoplifting s/he can't prevent, and therefore simply _accepts_ it.
in terms of software, this approach is probably like "donation-ware",
where the programmer _asks_ for a "donation", but accepts the fact
that most people are gonna use the program without paying for it...

other attitudes can be adopted about people who "take advantage
without contributing", but you get the point -- it's an open issue...

for another take on the matter, consider the story of the little red hen.
i've appended it to this e-mail.   you can find it as p.g. e-text #18735.

as a brief recap, the little red hen has the idea to bake some bread.

she goes around to all the other animals to solicit their help in
the various tasks necessary to the process of baking the bread,
including the growing and harvesting of the wheat, and so on...

at every stage, all of the other animals are "too busy" to help out.

but once the bread is baked, all the other animals want to eat it...

but the little red hen says "nope", because they didn't help her with
any of the intermediate steps.   they just wanted the end-benefit of
_eating_ the bread, without paying any of the cost of _producing_ it.

sometimes, from some people here, i feel they think of "open-source"
as a magical way they can get other people to do their work for them.

like the other animals in the story of the little red hen, they don't want to
do any of the work to _produce_ the social good, they just want to use it
after somebody else has done all the work to create it.

moreover, these people don't seem to understand that a key concept 
of open-source is that _many_people_contribute_, with none of them
required to do more than a little bit.

if it's _one_ person, doing all the work, that's a different phenomenon.

and if the other people who are taking advantage of that one person
are using his/her work in direct opposition to the _core_philosophy_
of the person who has done the work, then that's _entirely_different_.

so, like the little red hen, i've decided i'm not going to let those people
"take advantage" of my work.

***

thus, until i get a little bit of _help_, i've decided not to share
the _code_ that i'm writing.   i'll still let y'all see the _output_
-- because, after all, as you'll remember, it's pudding-time --
but if you want to eat the bread you'll have to help make it...
(i guess that would make this _bread_pudding_, which i love!)

the last assignment in our "babelfish" open-source project
-- hey, we've got the biblical "bread and fishes" going now! --
was to generalize the page-writing code to produce any page.
for an example, output page 83.

you can see the results of this code by running the new script:
>?? http://www.greatamericannovel.com/scgi-bin/babelfish11.pl

ok, nothing remarkable there in the output that we didn't
see from babelfish10.pl.   but at least the underlying code
is more versatile now...

***

so now that we've got general code that can produce any page,
we're able to take the project to the next level, and make our
script start to behave like something that deserves the label of
"an electronic-book viewer-program".

specifically, we need to have the script navigate the pages of
the e-book, based on feedback it collects from the reader...

from any one page, there are a number of pages to which
the reader will typically want to navigate.   these are:
1.   the next page.
2.   the previous page.
3.   the page prior to this one where a chapter starts.
4.   the page following this one where a chapter starts.
5.   the table-of-contents page.
6.   other table-of-contents pages, if there are others.
7.   auxiliary views on the page, if there are any.

in addition, of course, you'll want to give the user the ability
to enter _any_ pagenumber and jump directly to that page...

assignment:   let the user determine what page will be displayed.

you can see the results of this code by running the new script:
>?? http://www.greatamericannovel.com/scgi-bin/babelfish12.pl

basically, because we had to produce an "html form",
to collect responses from the end-user, this is a rather
significant addition to the code we've produced before.

still, it only added about 100 lines of code to the 100 lines
that we had thus far, so we're still under 200 lines _total_...

but yet we've got an impressive e-book engine already.
indeed, the only major shortcoming is the absence of a
book-wide "search" capability (and that's coming soon.)

and again, we produced it from a _plain-text_.zml_file_,
with a perl script that was composed by a perl _beginner_
-- one who admits that his code is "hacked together" and
who has been described as "dangerous" by david desrod.
(if you only knew, david, if you only knew...)

notice as well that -- when you use the buttons on top --
the script generates these pages _on-the-fly_, meaning
you don't even have to _store_ them if you don't want to.

of course, to make sure they get in the search engines,
you'll probably _want_ to have them exist as static pages,
at least until they get crawled.   but you wouldn't _have_to_
store them, if you didn't want to, as you can generate 'em.

(the error-reporting feature on these pages puts another
wrinkle in the dynamic/static decision, but it's not one that
can't be dealt with, if you decide to go the dynamic route.)

***

anyway, to recap, this is what a crude white-hat "hacker"
can do if you hand him a dirt-simple plain-text format...

***

since marcello has been especially contemptuous of my
"mad perl coding skillz", i invite you to compare my hack
with his online viewer-program for this "my antonia" book:

>    http://www.gutenberg.org/etext/242

click on "read online", at the top of the page.   as usual,
i like to jump to p.123, to get in the middle of a book.
when i did that, and chose "next page" for page 124,
i came to a page that had a major header -- the one for
"book 4" -- at the very bottom of the page.   kinda funny.
and the headers aren't big or bold like headers should be.
and of course there's no table-of-contents, or _anything_
that gives users a notion about the structure of the book.

and what _is_ the deal with this "pagination" anyway?
first of all, it doesn't match any existing pagination, so
what good is it really?   all it does is make it impossible
to search the entire book for a keyword.   the reason to
chunk a book into pages is to avoid scrolling, but these
"pages" have so much text that you must scroll anyway.
so this is the worst of both worlds.

i dunno, marcello, seems to me like you don't have
much ground to stand on in terms of being critical.

***

anyway, more pudding later.   but if you want to see
the secret recipes, you'll need to start collaborating!

-bowerbird

p.s.   here's .zml of the little red hen, from p.g. #18735:


 The Little Red Hen


 by Florence White Williams


A Little Red Hen lived in a barnyard. She spent almost all of her
time walking about the barnyard in her picketty-pecketty fashion,
scratching everywhere for worms.

She dearly loved fat, delicious worms and felt they were absolutely
necessary to the health of her children. As often as she found a
worm she would call "Chuck-chuck-chuck!" to her chickies.

When they were gathered about her, she would distribute choice
morsels of her tid-bit. A busy little body was she!

A cat usually napped lazily in the barn door, not even bothering
herself to scare the rat who ran here and there as he pleased.
And as for the pig who lived in the sty -- he did not care what
happened so long as he could eat and grow fat.

One day the Little Red Hen found a Seed. It was a Wheat Seed, but
the Little Red Hen was so accustomed to bugs and worms that she
supposed this to be some new and perhaps very delicious kind
of meat. She bit it gently and found that it resembled a worm in
no way whatsoever as to taste although because it was long and
slender, a Little Red Hen might easily be fooled by its appearance.

Carrying it about, she made many inquiries as to what it might be.
She found it was a Wheat Seed and that, if planted, it would grow
up and when ripe it could be made into flour and then into bread.

When she discovered that, she knew it ought to be planted. She
was so busy hunting food for herself and her family that, naturally,
she thought she ought not to take time to plant it.

So she thought of the Pig -- upon whom time must hang heavily
and of the Cat who had nothing to do, and of the great fat Rat
with his idle hours, and she called loudly:

"Who will plant the Seed?"

But the Pig said, "Not I," and the Cat said, "Not I," and the Rat
said, "Not I."

"Well, then," said the Little Red Hen, "I will."

And she did.

Then she went on with her daily duties through the long summer
days, scratching for worms and feeding her chicks, while the Pig
grew fat, and the Cat grew fat, and the Rat grew fat, and the
Wheat grew tall and ready for harvest.

So one day the Little Red Hen chanced to notice how large the
Wheat was and that the grain was ripe, so she ran about calling
briskly: "Who will cut the Wheat?"

The Pig said, "Not I," the Cat said, "Not I," and the Rat said, "Not I."

"Well, then," said the Little Red Hen, "I will."

And she did.

She got the sickle from among the farmer's tools in the barn
and proceeded to cut off all of the big plant of Wheat.

On the ground lay the nicely cut Wheat, ready to be gathered
and threshed, but the newest and yellowest and downiest of
Mrs. Hen's chicks set up a "peep-peep-peeping" in their most
vigorous fashion, proclaiming to the world at large, but most
particularly to their mother, that she was neglecting them.

Poor Little Red Hen! She felt quite bewildered and hardly knew
where to turn.

Her attention was sorely divided between her duty to her children
and her duty to the Wheat, for which she felt responsible.

So, again, in a very hopeful tone, she called out, "Who will thresh
the Wheat?"

But the Pig, with a grunt, said, "Not I,"
and the Cat, with a meow, said, "Not I," 
and the Rat, with a squeak, said, "Not I."

So the Little Red Hen, looking, it must be admitted, rather
discouraged, said, "Well, I will, then."

And she did.

Of course, she had to feed her babies first, though, and when
she had gotten them all to sleep for their afternoon nap, she
went out and threshed the Wheat. Then she called out: "Who
will carry the Wheat to the mill to be ground?"

Turning their backs with snippy glee, that Pig said, "Not I,"
and that Cat said, "Not I," and that Rat said, "Not I."

So the good Little Red Hen could do nothing but say, "I will then."

And she did.

Carrying the sack of Wheat, she trudged off to the distant mill.
There she ordered the Wheat ground into beautiful white flour.
When the miller brought her the flour she walked slowly back all
the way to her own barnyard in her own picketty-pecketty fashion.

She even managed, in spite of her load, to catch a nice juicy worm
now and then and had one left for the babies when she reached
them. Those cunning little fluff-balls were _so_ glad to see their
mother. For the first time, they really appreciated her.

After this really strenuous day Mrs. Hen retired to her slumbers
earlier than usual -- indeed, before the colors came into the sky
to herald the setting of the sun, her usual bedtime hour.

She would have liked to sleep late in the morning, but her chicks,
joining in the morning chorus of the hen yard, drove away all
hopes of such a luxury.

Even as she sleepily half opened one eye, the thought came to
her that today that Wheat must, somehow, be made into bread.

She was not in the habit of making bread, although, of course,
anyone can make it if he or she follows the recipe with care, and
she knew perfectly well that she could do it if necessary.

So after her children were fed and made sweet and fresh for the
day, she hunted up the Pig, the Cat and the Rat.

Still confident that they would surely help her some day she sang
out, "Who will make the bread?"

Alas for the Little Red Hen! Once more her hopes were dashed!
For the Pig said, "Not I," the Cat said, "Not I," and the Rat said, "Not I."

So the Little Red Hen said once more, "I will then."

And she did.

Feeling that she might have known all the time that she would have
to do it all herself, she went and put on a fresh apron and spotless
cook's cap. First of all she set the dough, as was proper. When it
was time she brought out the moulding board and the baking tins,
moulded the bread, divided it into loaves, and put them in the oven
to bake. 

All the while the Cat sat lazily by, giggling and chuckling.

And close at hand the vain Rat powdered his nose and admired
himself in a mirror.

In the distance could be heard the long-drawn snores of the
dozing Pig.

At last the great moment arrived. A delicious odor was wafted
upon the autumn breeze. Everywhere the barnyard citizens
sniffed the air with delight.

The Red Hen ambled in her picketty-pecketty way toward the
source of all this excitement.

Although she appeared to be perfectly calm, in reality she
could only with difficulty restrain an impulse to dance and sing,
for had she not done all the work on this wonderful bread?

Small wonder that she was the most excited person in the barnyard!

She did not know whether the bread would be fit to eat, but --
joy of joys! -- when the lovely brown loaves came out of the oven,
they were done to perfection.

Then, probably because she had acquired the habit, the Red Hen
called: "Who will eat the Bread?"

All the animals in the barnyard were watching hungrily and
smacking their lips in anticipation, and the Pig said, "I will,"
the Cat said, "I will," the Rat said, "I will."

But the Little Red Hen said,

"No, you won't. I will."

And she did.

 --- the end ---
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061106/101ff2b3/attachment-0001.html
From jon at noring.name  Mon Nov  6 12:38:26 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov  6 12:42:46 2006
Subject: [gutvol-d] PG disclaimer obsolete?
In-Reply-To: <Pine.LNX.4.60.0611061059230.29377@pglaf.org>
References: <17742.35541.262189.857767@celery.zuhause.org>
	<Pine.LNX.4.60.0611061059230.29377@pglaf.org>
Message-ID: <558816798.20061106133826@noring.name>

Michael wrote:

> No reason not to indicate that any particular was intended to be
> an exact copy of a certain paper edition, though you should be
> prepared for various other editions to be added to the mix in
> new efforts in the future, so make some kind of statement that
> you want at least one eBook to remain faithful to the original.

Well, this is great! I hope that texts submitted which fall under the
category of "faithful/accurate to a particular source edition which is
given in the text" will have stricter requirements as to updating/
editing, namely that edits will only be to correct transcription errors
with reference to the original source (hopefully the scans will be
around to "backup" the digital version, almost like gold backing up
paper money, if one wants some weak analogy.)

Certainly if someone wants to produce a derivative edition from the
source text, such as correcting errors in the original source, that's
fine, too, so long as it is identified as such. In fact, my OpenReader
version of "My Antonia" includes corrections of errors in the original
first edition (the error corrections come from both Jose Menendez and
from the Cather Project):

http://www.openreader.org/myantonia/My_Antonia_OpenReader_1.0_03-Oct-2006.zip

In this version, I go through pains in the metadata to state the general
nature of the changes, and that the original authentic source is available
which includes markup indicating the exact errors found and what the
corrections should be. A future OpenReader version will include both texts
in the distribution (as two "user sets") so the reader can choose either
the more scholarly, accurate-to-the-1st-edition version (including errors
which are highlighted), or the "corrected" edition for casual reading.

It's all a matter of identifying for each text what it exactly is, not
what it may be. I have no problem with derivative/composite texts so
long as they describe what they are and hopefully state what sources were
used in their production, and some general comments of what changes were
made.

The argument that many books from major publishers are themselves some
sort of edited versions of older works and that they as a rule don't
state much about the sources or what they did, thus we don't have to do
the same is a poor argument. Do we really want PG to emulate the
thinking of the "top down" companies (as Michael would describe them)?
Shouldn't PG be better than commercial publishers and record right in
the text whatever metadata it can about the source(s), transcription,
and editing process?

After all, it is easy to include that data in the text, so why not?
Certainly PG has a point in not overly burdening those who transcribe
texts and submit them -- on the other hand, a few *recommendations*,
not requirements, which don't overly burden the volunteers is not bad,
either -- simply state why it is a good idea to do this and that, and
most of the volunteers will be happy to oblige. In fact, many
volunteers will appreciate that PG cares about quality.

Jon

From jon at noring.name  Mon Nov  6 12:47:37 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov  6 12:45:49 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
In-Reply-To: <594.41c6177d.3280f15c@aol.com>
References: <594.41c6177d.3280f15c@aol.com>
Message-ID: <1009313713.20061106134737@noring.name>

Bowerbird wrote:
> Lee said:

>>?As a consumer, I frankly don't care

>  jon, you can't even get your buddy lee
>  to say that he cares about your cause...
>  give it up, dude, it was too old long ago.
>  
>  and frankly, if "consumers" _did_ object,
>  then project gutenberg would have never
>  become the premiere cyberspace library...
>  
>  at least duguid was able to come up with a
>  _good_example_.?  you have yet to do that...

LOL, well, I guess you don't know Lee as I do. Reread the last
paragraph in his message -- it is brilliant. Here, I'll repeat what
Lee wrote:

> Of course, as Project Gutenberg is devoid of standards, there is
> nothing which would prevent a contributer from adding his or her own
> information to any text, in addition to the standard Project
> Gutenberg boilerplate, claiming conformance to a specific edition or
> making other assurances of quality. But only by the proper use of
> the Project Gutenberg trademark and disclaimers, can consumers be
> assured that a Project Gutenberg e-text may contain unintentional or
> intentional errors or omissions and is unreliable for any purpose
> other than casual reading.

What more can I say about this except that I agree!

Jon

From jon at noring.name  Mon Nov  6 13:05:58 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov  6 13:04:09 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <1009313713.20061106134737@noring.name>
References: <594.41c6177d.3280f15c@aol.com>
	<1009313713.20061106134737@noring.name>
Message-ID: <1582116939.20061106140558@noring.name>

I previously wrote:
> Lee wrote:

>> Of course, as Project Gutenberg is devoid of standards, there is
>> nothing which would prevent a contributer from adding his or her own
>> information to any text, in addition to the standard Project
>> Gutenberg boilerplate, claiming conformance to a specific edition or
>> making other assurances of quality. But only by the proper use of
>> the Project Gutenberg trademark and disclaimers, can consumers be
>> assured that a Project Gutenberg e-text may contain unintentional or
>> intentional errors or omissions and is unreliable for any purpose
>> other than casual reading.

> What more can I say about this except that I agree!

Hmmm, I guess I should have also asked the obvious questions:

Is everybody cool with Lee's conclusion?

Should PG's collection become known as the collection that can't be
relied upon for anything but casual reading?

Jon

From Bowerbird at aol.com  Mon Nov  6 13:05:41 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Nov  6 13:05:48 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
Message-ID: <bd7.876a4dc.3280fda5@aol.com>

jon quoted lee, who said:
>    a Project Gutenberg e-text may contain unintentional
>    or intentional errors or omissions and is unreliable for 
>    any purpose other than casual reading.

probably true of almost every book in almost every bookstore.

nobody seems to mind.   none of us is perfect either.

and since michael's only target was "casual reading", it's fine...

***

but more to the point, out of that whole post, _this_ was the
only thing to which you responded?   that makes me smile, jon.
i assure you that there is more there for you to think about...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061106/f5972a03/attachment.html
From Bowerbird at aol.com  Mon Nov  6 13:14:41 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Nov  6 13:14:53 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <c6b.45424f4.3280ffc1@aol.com>

jon said:
>   Is everybody cool with Lee's conclusion?

i think when you are the top dog,
there is no shortage of mutts who
will try to take a cheap shot at you.

if you can't be "cool" toward that,
you don't deserve to be top dog...

***

having said all that, i'm in favor of
doing a clean-up of all the e-texts.

but distributed proofreaders seems
more inclined to produce new books
than clean up all of the older e-texts
(including ones that _they_ produced).

since d.p. lays loud claim to now being 
the biggest digitizer in the land of p.g.,
i'd say the burden is on their shoulders.

if they will not do the job, why should i?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061106/dde5fe80/attachment.html
From jon at noring.name  Mon Nov  6 13:19:27 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov  6 13:17:40 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
In-Reply-To: <bd7.876a4dc.3280fda5@aol.com>
References: <bd7.876a4dc.3280fda5@aol.com>
Message-ID: <168768566.20061106141927@noring.name>

Bowerbird wrote:
> jon quoted lee, who said:

>>??  a Project Gutenberg e-text may contain unintentional
>>??  or intentional errors or omissions and is unreliable for
>>??  any purpose other than casual reading.

>  probably true of almost every book in almost every bookstore.

Yes, but we are not talking about paper books sold by commercial
interests. We are talking about the PG collection which has completely
different goals.


>  nobody seems to mind.?  none of us is perfect either.

Who is 'nobody'? Many people do mind, and for good reasons. Most of
those people are not here in gutvol-d.


>  and since michael's only target was "casual reading", it's fine...

That's not his vision. His vision is to get the books to every person
on every continent in the world. A big vision like this also requires
great responsibility with respect to the integrity of the texts.


>  but more to the point, out of that whole post, _this_ was the
>  only thing to which you responded??  that makes me smile, jon.
>  i assure you that there is more there for you to think about...

You seem to be happy with Lee's conclusion in his last paragraph, am
I right?

Jon

From jon at noring.name  Mon Nov  6 13:21:45 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov  6 13:19:57 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <c6b.45424f4.3280ffc1@aol.com>
References: <c6b.45424f4.3280ffc1@aol.com>
Message-ID: <456909141.20061106142145@noring.name>

Bowerbird wrote:
> jon said:

>>?? Is everybody cool with Lee's conclusion?

>  i think when you are the top dog,
>  there is no shortage of mutts who
>  will try to take a cheap shot at you.
>  
>  if you can't be "cool" toward that,
>  you don't deserve to be top dog...

Again, do you accept Lee's conclusion? If not, with which point do you
disagree?

Jon

From Bowerbird at aol.com  Mon Nov  6 14:59:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Nov  6 14:59:58 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
Message-ID: <37e.7163b54b.3281185b@aol.com>

jon said:
>    we are not talking about paper books 
>    sold by commercial interests.

no, we're talking about e-books that are
created by volunteers for cost-free use
by anyone who can get a hold of them...

this cost-free use even extends to people
like me -- and you, presumably -- who
wants to improve them in various ways...

the message i have for those volunteers
-- and thus the main reason i am here --
is that there is an easier way for _them_ to
do the work that _they_ do.   i am trying to
_save_ them some of their time and energy.

the message you seem to want to deliver is
that they are doing things the "wrong" way,
and you want them to do it differently, and
expend even _more_ time and energy to
do it the way that _you'd_ prefer they do it...

at any rate...

i am gonna do the work -- myself! --
to convert the e-texts to .zml format...

i suggest you should do the work yourself
in bringing the e-texts up to the standard
that _you_ seem to think they should have.

after all, p.g. has done all the _hard_ work of
actually _digitizing_ the books, so all you have
to do now is make sure they actually conform
to some hard-copy paper version out there...

take it and run with it, jon...


>    Who is 'nobody'? 

00.01% of the population.   (1 out of 10,000.)

(and heck, most people out there don't give a
tinker's damn about books in the first place...)


>    His vision is to get the books to every person
>    on every continent in the world. A big vision 
>    like this also requires great responsibility 
>    with respect to the integrity of the texts.

i'm happy to let _michael_ define his "responsibilities".

more to the point, i don't define "integrity to the text"
as a slavish reproduction of some paper-copy version.

indeed, i think you're a bit nuts for even suggesting it,
since publishers routinely rip authorial intent to shreds.

and moreover, since -- thanks to google -- the scans
_will_ be available for any p-book that we might want,
we can always ascertain what that p-book looked like,
if we need to.   so i feel no call to slavishly reproduce it;
my interest is maximizing the usefulness of the e-book.


>    You seem to be happy with Lee's conclusion 
>    in his last paragraph, am I right?

in case i wasn't perfectly clear, i think it's a cheap shot.

so it doesn't surprise me one bit you'd repeat it.   twice...

further, i expect to see your friend david rothman echo it
over on his teleblahg within the next day or two...

-bowerbird

p.s.   and yes, folks, i know jon is trying to divert attention
from babelfish12, but my pudding ain't going away now...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061106/5954b0fe/attachment-0001.html
From Catenacci at Ieee.Org  Mon Nov  6 15:26:18 2006
From: Catenacci at Ieee.Org (Onorio Catenacci)
Date: Mon Nov  6 15:32:35 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <456909141.20061106142145@noring.name>
References: <c6b.45424f4.3280ffc1@aol.com>
	<456909141.20061106142145@noring.name>
Message-ID: <c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>

On 11/6/06, Jon Noring <jon@noring.name> wrote:
> Bowerbird wrote:
> > jon said:
>
> >> Is everybody cool with Lee's conclusion?
>
> >  i think when you are the top dog,
> >  there is no shortage of mutts who
> >  will try to take a cheap shot at you.
> >
> >  if you can't be "cool" toward that,
> >  you don't deserve to be top dog...
>
> Again, do you accept Lee's conclusion? If not, with which point do you
> disagree?

Jon,

Just to be sure I'm clear: "casual reading" implies good enough to
read and follow but not good enough for a scholarly dissertation?  :-)
 I may be off-base but it seems to me that PG is much like Wikipedia;
that is, take it for a good place to start but if technical accuracy
is a major concern, double check with other sources.  I guess I'm
saying, yes, I accept Lee's conclusion.


-- 
Onorio
From Bowerbird at aol.com  Tue Nov  7 00:49:19 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov  7 00:49:28 2006
Subject: [gutvol-d] gvd061107 -- i got a little bit ahead of myself
Message-ID: <3bd.1145eab1.3281a28f@aol.com>

oops!   i got a little bit ahead of myself in yesterday's post...

i said:
>    from any one page, there are a number of pages to which
>    the reader will typically want to navigate.   these are:
>    1.   the next page.
>    2.   the previous page.
>    3.   the page prior to this one where a chapter starts.
>    4.   the page following this one where a chapter starts.
>    5.   the table-of-contents page.
>    6.   other table-of-contents pages, if there are others.
>    7.   auxiliary views on the page, if there are any.

i also said:
>    notice as well that -- when you use the buttons on top --
>    the script generates these pages _on-the-fly_, meaning
>    you don't even have to _store_ them if you don't want to.

while all that is true, the actual code that _implements_ those
particular features and which thus forms the solid core of our
e-book engine wasn't there yesterday.   it's in _today's_ script.

you can see the results of this code by running the new script:
>?? http://www.greatamericannovel.com/scgi-bin/babelfish14.pl

_now_ you can compare this script with marcello's online reader.

***

oh, marcello, i'm sure you'll be _happy_ to know that the .html
for at least _some_ of these pages actually _validate_!   oh wow!

so ok, kids, let's put those famous open-source eyeballs to work.

review all the pages, to see which ones are rendered incorrectly...

and marcello, for extra credit, you can tell us about any pages that
_fail_ validation!   we'll be sure to give those pages extra attention!

***

that's it for today, since i wanna make sure you have enough time
to rouse all those evil right-wing republican assholes out of office...

-bowerbird

p.s.   the term "assholes" is _not_ meant to disparage those evangelicals
who have had affairs with gay "escorts" and buy-but-do-not-use meth,
as some of my best friends (fortunately) are gay, and some of my other
best friends (unfortunately) are tweakers.   (i like drugs, but meth is bad.)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061107/e989aaac/attachment.html
From Bowerbird at aol.com  Tue Nov  7 09:20:05 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov  7 09:20:10 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
Message-ID: <c3b.6292001.32821a45@aol.com>

i said:
>   further, i expect to see your friend david rothman echo it
>    over on his teleblahg within the next day or two...

bingo.

>    http://www.teleread.org/blog/?p=5757

-bowerbird

p.s.   of course, david might have done it just to yank my chain.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061107/bf7b48b9/attachment.html
From jon at noring.name  Tue Nov  7 11:28:34 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 11:26:48 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
In-Reply-To: <c3b.6292001.32821a45@aol.com>
References: <c3b.6292001.32821a45@aol.com>
Message-ID: <125222089.20061107122834@noring.name>

Bowerbird wrote:
> Bowerbird wrote:

>>?further, i expect to see your friend david rothman echo it
>>?over on his teleblahg within the next day or two...

>  bingo.
>
>??  http://www.teleread.org/blog/?p=5757
>  
>  -bowerbird
>  
>  p.s.?  of course, david might have done it just to yank my chain.

Nope, he did it because Lee's comment was excellent. We, frankly,
don't have much time these days to try to yank your chain -- we have
more important fish to fry and pudding to make. Maybe when things
settle down we will return to make your life miserable. <laugh/>

Jon

From Bowerbird at aol.com  Tue Nov  7 12:16:49 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov  7 12:16:54 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
Message-ID: <ced.13e34ee.328243b1@aol.com>

jon said:
>    Nope, he did it because Lee's comment was excellent.

well, it was an excellent cheap shot, i will give you that...

but it's good to know your position on project gutenberg.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061107/4dc61164/attachment.html
From lee at novomail.net  Tue Nov  7 15:20:44 2006
From: lee at novomail.net (Lee Passey)
Date: Tue Nov  7 15:19:13 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
References: <c6b.45424f4.3280ffc1@aol.com>	<456909141.20061106142145@noring.name>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
Message-ID: <455114CC.3060203@novomail.net>

Onorio Catenacci wrote:

[snip]

> Just to be sure I'm clear: "casual reading" implies good enough to
> read and follow but not good enough for a scholarly dissertation?  :-)

This is not a bad summation of my position, although, as usual, the 
devil is in the details. You have posited two extremes: 1. good enough 
to read and follow (to which I would add "given a modicum of effort") 
and 2. good enough for a scholarly dissertation.

Now I would agree that the vast majority of Project Gutenberg e-texts 
are probably good enough to read and follow given a modicum of effort. 
And I suspect that you would agree that the vast majority (and perhaps 
even the totality) of Project Gutenberg e-texts are inadequate for a 
scholarly dissertation. But what about those situations which fall 
between the extremes?

The fundamental problem is that Project Gutenberg is totally lacking in 
standards, so it is impossible to judge how well any given e-text 
matches any given use. And we probably don't have enough data to 
determine how well Project Gutenberg e-texts, on the whole, satisfy any 
external standards. But personally, I find that generally Project 
Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 2. 
are inadequate for assigned reading at a high-school level; AND 3. are 
inadequate for inclusion in any public or school library; AND 4. are 
inadequate for any type of automated data processing; AND 5. are mostly 
inadequate for /effortless/ reading.

I suspect that there /may/ be some gems in the PG corpus which are 
adequate for any or all of the above uses, but again, because Project 
Gutenberg has no standards, it is virtually impossible to identify those 
e-texts except on a case-by-case basis. Given the lack of indications of 
quality, I must accept as a default position that Project Gutenberg 
e-texts are good enough to read and follow by a human being (and not a 
computer) given a modicum of effort; but nothing more.

-- 
Nothing of significance below this line.

From Bowerbird at aol.com  Tue Nov  7 15:48:04 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov  7 15:48:15 2006
Subject: [gutvol-d] let's take stock
Message-ID: <ca8.2382477.32827534@aol.com>

p.g. e-texts are good enough to sell to libraries.

how do i know this?

because a company called "netlibrary" is already
making sales of various p.g. e-texts to libraries.

jon and lee and rothman know this too, and they 
want a piece of that action, so they have to find 
a way to make p.g. e-texts look bad.

and that's what you're seeing here...

they can't point to many badly-flawed e-texts,
but it doesn't matter if they stick to honest facts;
the only thing that counts is if they can make up
a story that is "believable" enough to fool some
"investors" into forking over some cash to them,
so that they can "digitize major books correctly".

the story they are spinning to the investors is that
they will be able to displace the netlibrary sales by
positioning their e-books as "faithful to the original",
while condemning p.g. e-texts as "ridden with errors",
which of course is going to be anathema to librarians.

again, it's a cheap shot, so they can line their pockets...

the sad thing is, they probably _will_ find some suckers.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061107/73b620ce/attachment.html
From donovan at abs.net  Tue Nov  7 16:40:06 2006
From: donovan at abs.net (D Garcia)
Date: Tue Nov  7 16:40:30 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	=?iso-8859-1?q?PG=09collection_will_become_known?= for...
In-Reply-To: <455114CC.3060203@novomail.net>
References: <c6b.45424f4.3280ffc1@aol.com>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net>
Message-ID: <200611071940.06539.donovan@abs.net>

On Tuesday 07 November 2006 06:20 pm, Lee Passey wrote:
[heinous snippage]
> But personally, I find that generally Project
> Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 2.
> are inadequate for assigned reading at a high-school level; AND 3. are
> inadequate for inclusion in any public or school library; AND 4. are
> inadequate for any type of automated data processing; AND 5. are mostly
> inadequate for /effortless/ reading.

While some of the above may be true for some of the PG texts, you're using an 
awfully broad brush there. See below.

> I suspect that there /may/ be some gems in the PG corpus which are
> adequate for any or all of the above uses, but again, because Project
> Gutenberg has no standards, it is virtually impossible to identify those
> e-texts except on a case-by-case basis. Given the lack of indications of
> quality, I must accept as a default position that Project Gutenberg
> e-texts are good enough to read and follow by a human being (and not a
> computer) given a modicum of effort; but nothing more.

Project Gutenberg does have standards (albeit minimal), but PG is not the 
source of the texts. The volunteers are. Distributed Proofeaders is a major 
(perhaps "the major") source of PG texts, and DP has standards. PG benefits 
from these standards, in spite of their essentially passive role as 
repository. I'm even willing to bet that many (or even most) of the PG texts 
produced by DP are on average the best of the lot. (Except for the ones that 
aren't.) *grin*

From vze3rknp at verizon.net  Tue Nov  7 16:47:04 2006
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Tue Nov  7 16:49:18 2006
Subject: [gutvol-d] The interesting comment by Lee and what the	PG
	collection will become known for...
In-Reply-To: <455114CC.3060203@novomail.net>
References: <c6b.45424f4.3280ffc1@aol.com>
	<456909141.20061106142145@noring.name>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net>
Message-ID: <45512908.9070805@verizon.net>

Lee Passey wrote:

> The fundamental problem is that Project Gutenberg is totally lacking 
> in standards, so it is impossible to judge how well any given e-text 
> matches any given use. And we probably don't have enough data to 
> determine how well Project Gutenberg e-texts, on the whole, satisfy 
> any external standards. But personally, I find that generally Project 
> Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 
> 2. are inadequate for assigned reading at a high-school level; AND 3. 
> are inadequate for inclusion in any public or school library; AND 4. 
> are inadequate for any type of automated data processing; AND 5. are 
> mostly inadequate for /effortless/ reading.

The accuracy of the text being produced by Distributed Proofreaders is 
now very high. At least as good as that of the original printers and 
usually better. Almost all DP books are produced in nicely laid out, 
html formats, many of them xhtml 1.0 strict, which I'm told can be 
easily turned into XML for those who need that for scholarly uses. 
Whether the editions prepared by DP are the ones that would be chosen 
for a scholarly dissertation is another question, but all edition 
information that appears in the book itself is included so that the 
reader can make that judgement. The early DP texts were not up to the 
current standards, but are reasonably error free. PG's standards may not 
be explicitly understood by outsiders (they do exist, as enforced by the 
white-washers, but they can be quite flexible in many areas) but DP's 
are reasonably clearly stated in the documentation associated with the site.

It disturbs me to see people continuing to bash PG for the quality of 
its earlier works without acknowledging that more recent material is 
significantly better. Two examples, chosen from ones posted to PG (from 
DP) in the last couple of days are The Cliff Ruins of Canyon de Chelly, 
Arizona by Cosmos Mindeleff <http://www.gutenberg.org/etext/19723> from 
the series of publications produced by the Bureau of American Ethnology 
of the Smithsonian and The History of Rome, Books 01 to 08, by Titus 
Livius <http://www.gutenberg.org/etext/19725>. Whether or not this 
translation of Titus Livius is the "best" one, certainly the quality of 
these ebooks in terms of transcription quality, presentation, etc. is 
easily good enough for the purposes you cite.

Juliet Sutherland
Distributed Proofreaders

From jon at noring.name  Tue Nov  7 17:07:51 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 17:06:08 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <45512908.9070805@verizon.net>
References: <c6b.45424f4.3280ffc1@aol.com>
	<456909141.20061106142145@noring.name>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net> <45512908.9070805@verizon.net>
Message-ID: <387594685.20061107180751@noring.name>

Juliet wrote:

> It disturbs me to see people continuing to bash PG for the quality of 
> its earlier works without acknowledging that more recent material is 
> significantly better. Two examples, chosen from ones posted to PG (from
> DP) in the last couple of days are The Cliff Ruins of Canyon de Chelly,
> Arizona by Cosmos Mindeleff <http://www.gutenberg.org/etext/19723> from
> the series of publications produced by the Bureau of American Ethnology
> of the Smithsonian and The History of Rome, Books 01 to 08, by Titus 
> Livius <http://www.gutenberg.org/etext/19725>. Whether or not this 
> translation of Titus Livius is the "best" one, certainly the quality of
> these ebooks in terms of transcription quality, presentation, etc. is 
> easily good enough for the purposes you cite.

The problem is that, as the saying goes, a rotten apple spoils the
whole barrel.

We know the quality of the DP product is very good, while PG itself
admits the loosey-goosey nature of the pre-DP corpus. So once the DP stuff
goes into the same barrel, it gets the PG label and boilerplate.

Jon


From jon at noring.name  Tue Nov  7 17:32:51 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 17:31:10 2006
Subject: [gutvol-d] let's take stock
In-Reply-To: <ca8.2382477.32827534@aol.com>
References: <ca8.2382477.32827534@aol.com>
Message-ID: <1339485088.20061107183251@noring.name>

Bowerbird wrote:

> p.g. e-texts are good enough to sell to libraries.
>  
>  how do i know this?
>  
>  because a company called "netlibrary" is already
>  making sales of various p.g. e-texts to libraries.

The marketability of netLibrary is not the public domain texts it
contains, but the copyrighted works. The corpus numbers (last I heard,
which was a while ago) about 70,000 texts.

Next?


>  they can't point to many badly-flawed e-texts,
>  but it doesn't matter if they stick to honest facts;
>  the only thing that counts is if they can make up
>  a story that is "believable" enough to fool some
>  "investors" into forking over some cash to them,
>  so that they can "digitize major books correctly".

If a PG text does not state what the source is, that alone is a major
flaw.

And PG itself admits its books may be composites with editings done at
the whim of the compiler(s), without alerting the reader whether
it is a composite or not, and with no stated commitment to faithful
reproduction. This is right in the PG BOILERPLATE. And this is a major
flaw.

Those are the honest facts.

I don't need to talk about textual errors. In fact, not knowing the
source for books which have multiple Manifestations (which are most of
the great classics) means that truly knowing the textual errors is
indeterminable (and yes, I know about your super-power "toolz" to
hunt down and kill textual errors, but textual errors go beyond just
OCR or key entry errors.)


>  the story they are spinning to the investors is that
>  they will be able to displace the netlibrary sales by
>  positioning their e-books as "faithful to the original",
>  while condemning p.g. e-texts as "ridden with errors",
>  which of course is going to be anathema to librarians.
>  
>  again, it's a cheap shot, so they can line their pockets...
>  
>  the sad thing is, they probably _will_ find some suckers.

Shhhh, you are under NDA. You're not supposed to be telling the world
about our "sooper sekrit" plan to steal the Public Domain and forever
lock it up under the control of Halliburton and the Trilateral
Commission.

And in our nefarious plans we only need to point out that the pre-DP
portion of the collection is simply untrustworthy for the reasons
cited above, which PG itself admits. In fact, the PG BOILERPLATE is
all that we need to seize total control of the Public Domain and earn
billions of dollars.

Jon


From jon at noring.name  Tue Nov  7 17:45:18 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 17:43:35 2006
Subject: =?iso-8859-1?B?UmU6IFtndXR2b2wtZF0gVGhlIGludGVyZXN0aW5nIGNvbW1lbnQgYnkg?=
	=?iso-8859-1?B?TGVlIGFuZCB3aGF0IHRoZSBQRwljb2xsZWN0aW9uIHdpbGwgYmVjb21l?=
	=?iso-8859-1?B?IGtub3duIGZvci4uLg==?=
In-Reply-To: <200611071940.06539.donovan@abs.net>
References: <c6b.45424f4.3280ffc1@aol.com>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net>
Message-ID: <665744627.20061107184518@noring.name>

D. Garcia wrote:

> Project Gutenberg does have standards (albeit minimal), but PG is not the
> source of the texts. The volunteers are. Distributed Proofeaders is a major
> (perhaps "the major") source of PG texts, and DP has standards. PG benefits
> from these standards, in spite of their essentially passive role as 
> repository. I'm even willing to bet that many (or even most) of the PG texts
> produced by DP are on average the best of the lot. (Except for the ones that
> aren't.) *grin*

Agreed! DP does great work, and I love the dedication of those there.
And no doubt many of the lone PG contributors also do great work, but
the question is how do we know?

The most important thing is DP's commitment to faithful reproduction
of a known source work, and to preserve the source information. They
also do much of their work in a public setting.

Certainly the DP process can be improved, but the issue has more to do
with a commitment to improve the quality of the work product and address
the considerations of authenticity, faithfulness, trustworthiness, and
all those good things.

Maybe Lee's comment painted too wide of a brush to include the DP
stuff (I know he did not intend that). But note that Lee's comment was
on the PG trademark and its Boilerplate, which is slapped onto all DP
texts when incorporated into the PG collection. This is the "wide
brush" we really need to be concerned about.

Jon


From jon at noring.name  Tue Nov  7 17:58:09 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 17:56:26 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <455114CC.3060203@novomail.net>
References: <c6b.45424f4.3280ffc1@aol.com>
	<456909141.20061106142145@noring.name>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net>
Message-ID: <22911606.20061107185809@noring.name>

Lee Passey wrote:

> The fundamental problem is that Project Gutenberg is totally lacking in
> standards, so it is impossible to judge how well any given e-text 
> matches any given use. And we probably don't have enough data to 
> determine how well Project Gutenberg e-texts, on the whole, satisfy any
> external standards. But personally, I find that generally Project 
> Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 2.
> are inadequate for assigned reading at a high-school level; AND 3. are
> inadequate for inclusion in any public or school library; AND 4. are 
> inadequate for any type of automated data processing; AND 5. are mostly
> inadequate for /effortless/ reading.
>
> I suspect that there /may/ be some gems in the PG corpus which are 
> adequate for any or all of the above uses, but again, because Project 
> Gutenberg has no standards, it is virtually impossible to identify those
> e-texts except on a case-by-case basis. Given the lack of indications of
> quality, I must accept as a default position that Project Gutenberg 
> e-texts are good enough to read and follow by a human being (and not a
> computer) given a modicum of effort; but nothing more.

Good point.

Of course, with a nod to Juliet who is right to be concerned, Lee's
statement paints a broad brush to include the good work done by DP.

But again note that for the average consumer, who doesn't know what we
know, the DP stuff gets mixed in with all the other stuff, with a
common boilerplate added which doesn't exactly instill warm fuzzies
("this text may be a composite...")

Another important point is that, by and large, the most popular texts
in the PG collection are the well-known classics, most of which were
not done by DP, and done a while back when things were much worse.
(The source(s) not known, ASCIIization of extended Latin characters,
etc.)

I suspect that Lee's experience has been skewed, as has most other
readers, by this more popular and older portion of the PG collection.
The stuff DP is doing now is mostly much lesser known material (which
needs to be properly digitized, of course) that will always be less
popular than the classics.

Jon Noring


From Bowerbird at aol.com  Tue Nov  7 18:21:19 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov  7 18:21:31 2006
Subject: [gutvol-d] juliet responds
Message-ID: <cf4.145ac4f.3282991f@aol.com>

juliet said:
>    The accuracy of the text being produced 
>    by Distributed Proofreaders is now very high. 

from one extreme to the other, we swing.


>    The early DP texts were not 
>    up to the current standards,
>    but are reasonably error free. 

"reasonably" being a very slippery word here.

it would   be nice to know, moreover, _when_ d.p. 
texts crossed the line into "the current standards".

and what plans exist to revisit the "early texts"
and improve them "up to the current standards".


>    It disturbs me to see people continuing to bash PG 
>    for the quality of its earlier works without acknowledging 
>    that more recent material is significantly better. 

the line between "early" and "more recent" texts
doesn't mesh so neatly along the _quality_ grid.

neither does the d.p./other segregation line...

some of the early e-texts were quite excellent.
and many of the early ones done by d.p. are not.
(and the line between "early" and "recent' is unclear.)

also, though it probably shouldn't need to be said,
given the advances over the years in o.c.r. quality,
many of the books that are done _outside_ of d.p.
these days are also of excellent, error-free quality.


>    Two examples, chosen from ones posted to PG 
>    (from DP) in the last couple of days

make no mistake about what i'm saying, there are
some very good e-books coming out of d.p. now...

i haven't looked closely at some recent examples
to see what i can find, but given juliet's confidence,
perhaps it's time for me to do another assessment.

(oh, and keep in mind what i said the other day:
that all p.g. e-texts -- including the d.p. ones --
will eventually be tossed, because you discarded
the _linebreak_ information.   even on the e-texts
that you assure us were based on a single source,
without the linebreaks in place, it is so difficult to
_verify_ it, that we have no choice but it do it over.)

***

noring said:
>   And PG itself admits its books may be composites 

"admits"?   as if this is some kind of confession?


>    This is right in the PG BOILERPLATE. 
>    And this is a major flaw.

no, it isn't.   just because it's not what _you_ would do,
it's not a "flaw", let alone "a major flaw".   but of course,
you keep trying to _spin_ it that way, and maybe some
naive fools will come to believe you.   what a cheap shot!

but hey, the bush administration has proven repeatedly
that fear-mongering is the modus operandi of the day...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061107/18310b30/attachment.html
From lee at novomail.net  Tue Nov  7 19:55:18 2006
From: lee at novomail.net (Lee Passey)
Date: Tue Nov  7 19:54:48 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <200611071940.06539.donovan@abs.net>
References: <c6b.45424f4.3280ffc1@aol.com>	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>	<455114CC.3060203@novomail.net>
	<200611071940.06539.donovan@abs.net>
Message-ID: <45515526.4000708@novomail.net>

D Garcia wrote:

[more major snippage]

>  While some of the above may be true for some of the PG texts, you're
>  using an awfully broad brush there. See below.

You are absolutely correct. But without some means of discrimination, a 
broad brush is the only one I have. Going out to the Project Gutenberg 
web site, I can't seem to find any way to perform a search that would 
return /only/ those e-texts produced by Distributed Proofreaders.

Picking an e-text at random, say http://www.gutenberg.org/etext/19725, I 
see nothing that would indicate that it was produced by Distributed 
Proofreaders. Looking at the text itself, I see a sentence (in a <pre> 
block) that states "Produced by Ted Garvin, Taavi Kalju and the Online 
Distributed Proofreading Team at http://www.pgdp.net," so as a human I 
could probably safely conclude that it came from DP, but it doesn't look 
like it's in a format where I could come up with an automated way to, 
say, download all of the PG corpus and build an index of only the DP texts.

Later in that same file it says "Project Gutenberg-tm eBooks are often 
created from several printed editions, all of which are confirmed as 
Public Domain in the U.S. unless a copyright notice is included.  Thus, 
we do not necessarily keep eBooks in compliance with any particular 
paper edition." So even if the original e-text was produced consistent 
with DP's published standards I have no way of knowing whether it was 
altered in some way that would make it /inconsistent/ with those standards.

> > I suspect that there /may/ be some gems in the PG corpus which are
> > adequate for any or all of the above uses, but again, because
> > Project Gutenberg has no standards, it is virtually impossible to
> > identify those e-texts except on a case-by-case basis. Given the
> > lack of indications of quality, I must accept as a default position
> > that Project Gutenberg e-texts are good enough to read and follow
> > by a human being (and not a computer) given a modicum of effort;
> > but nothing more.
>
>  Project Gutenberg does have standards (albeit minimal), but PG is not
>  the source of the texts. The volunteers are. Distributed Proofeaders
>  is a major (perhaps "the major") source of PG texts, and DP has
>  standards. PG benefits from these standards, in spite of their
>  essentially passive role as repository. I'm even willing to bet that
>  many (or even most) of the PG texts produced by DP are on average the
>  best of the lot. (Except for the ones that aren't.) *grin*

There is no doubt in /my/ mind that the DP texts are the best of the 
lot. And without getting into any discussion here about whether DP 
standards are sufficient for most purposes, my observation is that the 
DP standards are consistently evolving for the better. Merely having 
stated standards, no matter how trivial, is a major indication of 
quality. I am quite looking forward to the time when DP begins the task 
of re-scanning and re-creating the most popular public domain books 
which make up the early part of the PG corpus. If Project Gutenberg has 
any data which may indicate the most popular downloads over the past 4 
or 5 years, I'm sure that would be very useful.

But how do I find DP e-texts, and when I've found them how do I know 
they still meet the standard? A simple commitment from the Powers That 
Be at Project Gutenberg that e-texts produced by Distributed 
Proofreaders will not be altered without a change log or revision 
control would be a /major/ advance. But that is a degree of control 
which seems inconsistent with the stated PG preference for anarchy.

Now, while all this may sound like a criticism of Project Gutenberg, it 
is not. I am merely trying to illustrate the reality of the situation. 
Project Gutenberg and its directors can choose to be whatever they want. 
Apparently, what PG wants is to be a producer of mediocre PVA e-texts. 
There is nothing wrong with that decision, and I believe it fills a very 
real market niche. Project Gutenberg's support for other projects, with 
other standards (such as PGTEI) is admirable. I have never seen a 
comment from Mr. Hart or Mr. Newby, reiterating PG's stand on freedom 
from rules, that did not also include an invitation to start a new 
project with different standards, frequently with the explicit support 
of Project Gutenberg. The problem is not that PG refuses to accept 
proposed quality standards, it is that those proposing the changes 
refuse to accept the invitation.

So, to reiterate the point that started this thread: I don't care what 
Project Gutenberg chooses to be; I only hope that it does not hold 
itself out to be something it is not, and the current disclaimer is an 
important part in helping consumers understand what they are, and are 
not, getting when they download a Project Gutenberg e-text.

From jon at noring.name  Tue Nov  7 21:15:40 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 21:14:03 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <45515526.4000708@novomail.net>
References: <c6b.45424f4.3280ffc1@aol.com>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net>
Message-ID: <127476658.20061107221540@noring.name>

Lee wrote:

> ... So even if the original e-text was produced consistent with DP's
> published standards I have no way of knowing whether it was altered
> in some way that would make it /inconsistent/ with those standards.

This is a good point which I had not thought of before. Is there an
agreement of some sort between PG and DP that DP provided texts will
not be altered in the PG archive except with the cooperation and
consent of DP?

And in the bigger picture, if I submitted a text to PG, will my
specific requirements regarding the alterability of the text in the
PG archive be respected? Are there any policies to dictate this, or
does that copy of the etext become the property of PG to do with as
it pleases (a form of no-strings-attached donation)?


> ... I am quite looking forward to the time when DP begins the task
> of re-scanning and re-creating the most popular public domain books 
> which make up the early part of the PG corpus. If Project Gutenberg has
> any data which may indicate the most popular downloads over the past 4
> or 5 years, I'm sure that would be very useful.

For this to happen, DP would have to commit some percentage of their
work flow to a re-make project of the most popular pre-DP works. I
hope Juliet will discuss the feasibility (as well as interest in the
ranks of the DP volunteers) of this possibility.


> Now, while all this may sound like a criticism of Project Gutenberg, it
> is not. I am merely trying to illustrate the reality of the situation.
> ... I have never seen a comment from Mr. Hart or Mr. Newby,
> reiterating PG's stand on freedom from rules, that did not also
> include an invitation to start a new project with different
> standards, frequently with the explicit support of Project
> Gutenberg. The problem is not that PG refuses to accept proposed
> quality standards, it is that those proposing the changes refuse to
> accept the invitation.

The obvious question is if PG's enthusiastic encouragement to others
to start new public domain digitization projects is only for projects
that will submit their work product primarily to PG, or if the
encouragement is truly ecumenical. I hope it is the latter...


> So, to reiterate the point that started this thread: I don't care what
> Project Gutenberg chooses to be; I only hope that it does not hold 
> itself out to be something it is not, and the current disclaimer is an
> important part in helping consumers understand what they are, and are 
> not, getting when they download a Project Gutenberg e-text.

Agreed.

This has been my main point -- to communicate the status of the PG
collection from various consumer perspectives, and this is not
restricted to just casual individual readers. Different perspectives
allow readers, as consumers, to make informed decisions as to how
and with what they will spend their time (time is just as valuable as
money).

The non-scientific poll I conducted a while back on TeBC suggested
that once readers are informed of the deficiencies in many of the PG
texts (which Mr. Passey has wonderfully summarized), a significant
number do express varying levels of concern -- and in my opinion
they should because they are spending something valuable, their time,
to read them.

One of these days I plan to rerun the poll at TeleRead, with improved
wording, and which I will ask for feedback before the poll is started.
The goal is two-fold: 1) to bring the issue to the consciousness of
the ebook reading community, and 2) to gauge within a reasonable
level of certainty how that community views the issue.

Jon Noring


From sam.bretheim at gmail.com  Tue Nov  7 22:39:10 2006
From: sam.bretheim at gmail.com (Sam Bretheim)
Date: Tue Nov  7 22:39:55 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <45515526.4000708@novomail.net>
References: <c6b.45424f4.3280ffc1@aol.com>	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>	<455114CC.3060203@novomail.net>	<200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net>
Message-ID: <45517B8E.10203@gmail.com>

Lee Passey wrote:
> You are absolutely correct. But without some means of discrimination, 
> a broad brush is the only one I have. Going out to the Project 
> Gutenberg web site, I can't seem to find any way to perform a search 
> that would return /only/ those e-texts produced by Distributed 
> Proofreaders.

http://pgdp.net/c/list_etexts.php?x=g&sort=5

It would be easy to add this data as a flag in the PG catalog, but being 
produced by DP isn't an automatic rubber stamp on quality: just like PG 
as a whole, DP took some time to arrive at good standards and practices, 
and a number of the early (2-round) DP texts are not up to DP's present 
quality level.

> Picking an e-text at random, say http://www.gutenberg.org/etext/19725, 
> I see nothing that would indicate that it was produced by Distributed 
> Proofreaders. Looking at the text itself, I see a sentence (in a <pre> 
> block) that states "Produced by Ted Garvin, Taavi Kalju and the Online 
> Distributed Proofreading Team at http://www.pgdp.net," so as a human I 
> could probably safely conclude that it came from DP, but it doesn't 
> look like it's in a format where I could come up with an automated way 
> to, say, download all of the PG corpus and build an index of only the 
> DP texts.

I have a script that fuses the PG standard RDF catalog, the legacy 
GUTINDEX file, the PG archive file listing, and the DP done list to 
produce a number of metadata products and reports, and it has a 
command-line flag to include only texts that did / didn't come from DP.  
I'm hesitant to just bundle the script into a front-end program because 
it downloads tens of megabytes of data each time it updates its catalog, 
and I've got far too much going on at the moment to work on it, but I'll 
try to set up a site to host it and its output sometime this year.

> So even if the original e-text was produced consistent with DP's 
> published standards I have no way of knowing whether it was altered in 
> some way that would make it /inconsistent/ with those standards.

I don't think you need to worry about the PG Errata team making work for 
themselves by arbitrarily changing things in DP texts.  Since they 
aren't leaping to defend themselves, I'll speak up: they do check 
suggested errata against the source pages, and when it comes to DP texts 
I believe they rely on DP's archived page scans for reference.
From Bowerbird at aol.com  Wed Nov  8 04:01:35 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 04:01:43 2006
Subject: [gutvol-d] let's close this up
Message-ID: <c22.8e66618.3283211f@aol.com>

lee said:
>    But without some means of discrimination, 
>    a broad brush is the only one I have. 

well then be a decent human being and
find "some means of discrimination", lee!

but don't just pop off, making cheap shots...


>    Going out to the Project Gutenberg web site, 
>    I can't seem to find any way to perform 
>    a search that would return /only/ those 
>    e-texts produced by Distributed Proofreaders.

all you needed to do was ask.

i can make you a list of such texts, if you want.
and you can spread it far and wide, if you want.

or you can do it yourself; just download the first 10k
of every e-text and search for the word "distributed".
if you find it, then that text was digitized by d.p.

and i'd guess that the d.p. people could help you too.


>    Picking an e-text at random, 
>    say http://www.gutenberg.org/etext/19725, 
>    I see nothing that would indicate that it was 
>    produced by Distributed Proofreaders. 
>    Looking at the text itself, I see a sentence (in a block) 
>    that states "Produced by Ted Garvin, Taavi Kalju and the 
>    Online Distributed Proofreading Team at http://www.pgdp.net,"

bingo.   "distributed".


>    so as a human I   could probably safely conclude that it came from DP, 
>    but it doesn't look   like it's in a format where I could come up with 
>    an automated way to, say, download all of the PG corpus and 
>    build an index of only the DP texts.

it's a simple scraping task, lee.   really.

your computer can do it a lot faster than you.


>    even if the original e-text was produced consistent 
>    with DP's published standards I have no way of knowing 
>    whether it was altered in some way that would make it 
>    /inconsistent/ with those standards.

it's seemed to me that the postprocessors and project managers
get pretty attached to their books, and would notice if something
like that were to start happening, and complain about it _loudly_,
so the absence of such loud complaints would lead me to believe
that something like that is not happening.   and -- in general --
i think project gutenberg leans too far in the _opposite_ direction;
namely, they are too willing to let the volunteers do it _however_
the volunteers want, even when maybe it's not really the best way.


>    There is no doubt in /my/ mind that the DP texts are the best of the 
lot. 

well then maybe you need to get out a little more, lee.

i can give you more if you want, but for starters, read this:
>    http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&
post=2005-09-30,3

jose menendez compared a digitization he did with the one
for the same book that was done by distributed proofreaders.

he found that his version had _zero_ errors in it, while the d.p. one
had _50_ errors in it.   and that was counting an entire missing page
as _1_ error, even though a good many people seemed to believe that
_every_missing_word_ from that page should be counted as an error.
(i thought that was kind of ridiculous.   to me, it was simply one error.
sure, it was a _major_ error, you have to agree.   but still, just 1 error.)

no matter how you count, though, the comparison was still _stark_.
jose -- all by himself -- had outperformed d.p. in stunning fashion.

not only that, but his version was finished long before the d.p. one,
even though they both started when google released the scan-set...

more astonishingly, as i noted a week later in a follow-up post:
>    http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&
post=2005-10-05,4
the missing page had been fixed, but not any of the other 49 errors!

they had a perfect text to compare against, and they _still_ blew it!

as i put it in that follow-up, "the lights are on, but nobody's home..."

nor is this an anomaly.   there are _lots_ of p.g. e-texts that were 
produced by one person that are nice models of high accuracy...

and conversely, there was an estimate a while back, from jim tinsley --
who, as the main whitewasher at the time was the most likely to know
-- that the average d.p.-posted e-text probably had about 50 errors.
(notice how closely that number dovetails with what jose had found.)

jim considered this to be roughly equivalent to the non-d.p. e-texts.

in other words, in his opinion, there was little difference between the
quality of the non-d.p. e-texts and the d.p. ones.   this might not be
too surprising, once you realize that he probably considered it to be
the job of the whitewashers to take the error-rate down to "about 50".

(whether it was more work to attain this rate for the d.p. texts versus
the non-d.p. texts, jim wasn't clear.   i would assume that, because of
the longer-term relationship with the d.p. folks, it would've become
easier to work with them, attain more efficient communication, etc.)


>    my observation is that the DP standards
>    are consistently evolving for the better. 

well, their quality certainly is.   jim's post estimating 50 errors/book
was a wake-up call to d.p. about their quality, and they put into place
a number of changes meant to improve their error-rate, and it worked.

basically, it involved putting additional eyeballs on each book.
whether the benefit of fewer errors offsets the increased cost
of more human time and energy is an unanswered question, but
the fact is the error-rate _is_ at least somewhat better these days.

before then, however, there were rather embarrassing shortcomings;
there would be errors in posted texts that wouldn't pass a spellcheck!

(that's right.   one of the changes was the requirement of a spellcheck
by the postprocessor.   up until then, spellcheck had been "optional".
of such stuff are d.p. standards "consistently evolving for the better".)


>    I am quite looking forward to the time when DP begins the task of 
>    re-scanning and re-creating the most popular public domain books 
>    which make up the early part of the PG corpus.

i've been waiting for that since p.g. hit the 10,000 e-text mark.
i thought once michael's long-sought goal had been realized,
there would be a reshifting of attention to quality-control ideals.

instead, there was just a continuing rush to hit the 20,000 mark.

once again, i'm hoping when _that_ mark is hit, there will be a shift.

but this time, i'm not holding my breath...


>    A simple commitment from the Powers That   Be at Project Gutenberg 
>    that e-texts produced by Distributed Proofreaders will not be altered 
>    without a change log or revision control would be a /major/ advance. 

i long ago said changelogs are an essential aspect of any e-library.
again, nada.   so my advice to you is to not hold your breath either...

but i think your underlying concern here -- that the e-texts are _fine_
when d.p. turns them over, and that then the project gutenberg side
is gonna turn all nefarious -- is disingenuous and unfair all around...

the general tone, especially from noring, is that p.g. acts in bad faith.
and frankly, that's offensive to me.   everyone here is a volunteer, and
they seem to have very big hearts, and even though i see ineptitude,
i do believe that _everyone_ from p.g. or d.p. is acting in good faith...
(i'm not sure they're all using their heads, but that's a different issue.)


>    Apparently, what PG wants is to be a producer of mediocre PVA e-texts. 

for those of you who aren't hip to lee-speak and his abbreviations,
please let me inform you that "p.v.a." stands for "plain vanilla ascii".

given the way most people think about project gutenberg e-texts,
"plain vanilla ascii" seems like a fair description of the uni-sized,
unstyled files, often viewed with a nonproportionally-spaced font.

however, if you've been following my "babelfish" messages lately,
where i'm using simple perl scripts to turn a plain-vanilla ascii file
into a nicely-formatted, high-quality set of .html pages, you know
that this blinkered way of thinking about "p.v.a." files is outdated...

if your "p.v.a." files are _mediocre_, it's because you're not letting 'em
live up to the full potential they can attain from zen markup language.


>    I have never seen a comment from Mr. Hart or Mr. Newby, 
>    reiterating PG's stand on freedom from rules, that did not also 
>    include an invitation to start a new project with different standards, 
>    frequently with the explicit support   of Project Gutenberg. 
>    The problem is not that PG refuses to accept proposed quality standards, 

>    it is that those proposing the changes refuse to accept the invitation.

wow, that's the most even-handed thing you've said in this whole exchange.

jon, are you paying attention to lee?


>    So, to reiterate the point that started this thread: I don't care what 
>    Project Gutenberg chooses to be; I only hope that it does not hold 
>    itself out to be something it is not, and the current disclaimer is an 
>    important part in helping consumers understand what they are, and 
>    are not, getting when they download a Project Gutenberg e-text.

project gutenberg does _not_ "hold itself out to be something it is not".
and i don't think you need to worry that it ever will.   hart's got 
integrity.

and bruce albrecht's point (that the boilerplate does not seem to apply
to a good many of the e-texts that are being produced these days) is,
i would think, something that could be taken into new consideration.

even if the language stays as it is, though, i don't think it's misleading.
as has been pointed out, there is ample indication in the file itself that
points to the source-text.   people are smart enough to figure it out...

also, if there _is_ new consideration about this language, then i'd think
that you'll need to ponder the other side of the teeter-totter, because
if you're gonna say that an e-text is "a faithful rendition of" its p-book,
you might then incur upon yourself some kind of good-faith obligation
to mount the scans, so end-users can verify your statement themselves.
otherwise, you're still essentially saying "trust me", and furthermore you
have elevated the degree of "trust" to a much higher and different level.

(and i'll point out that d.p. has been terribly reluctant to mount scans,
perhaps because it will immediately make their errors _discoverable_.)

also, i think you would need to document, _meticulously_, the changes
that you have made from the original.   (and to a much greater degree,
and in a more formal way, than such changes are currently elucidated.
plus, your inadvertent errors would then become "violations of trust".)

and even with _all_that_, you're still gonna have to cover your ass with
"we're doing the best we can, as unpaid volunteers", which is _exactly_
what p.g. has said all along, so i'm not sure you've added much value...
it's not like you're gonna accept any fiduciary responsibility, are you?

so i don't know if d.p. really wants to jump in this stream quite yet...
it might carry them to some places that they'd really rather not go...

***

in closing -- and just because this fact rarely seems to get mentioned --
let me remind people that the legal advice that project gutenberg got,
back in the early days, was that it should _strip_identifying_information_,
like the original publisher, date of publication, copyright statement, etc.
the legal situation of the public-domain wasn't always cut-and-dried.

it was only december of 2003, during the 10,000th e-text celebration,
this legal advice changed.   before, since we weren't eager to put our ass
on the line for legal damages that p.g. might have made itself liable for,
and i know _i_ wasn't, then i really don't think we can complain about it.
the legal situation of the public-domain _still_ ain't so cut-and-dried
that rich estates ain't willing to hire big lawyers to try to intimidate p.g.
how many of you would put your savings into a safety escrow for p.g.?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061108/fc58c1b8/attachment-0001.html
From greg at durendal.org  Wed Nov  8 04:29:39 2006
From: greg at durendal.org (Greg Weeks)
Date: Wed Nov  8 05:00:07 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <127476658.20061107221540@noring.name>
References: <c6b.45424f4.3280ffc1@aol.com>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net> <127476658.20061107221540@noring.name>
Message-ID: <Pine.LNX.4.63.0611080728300.17297@durendal.durendal.org>

On Tue, 7 Nov 2006, Jon Noring wrote:

> For this to happen, DP would have to commit some percentage of their
> work flow to a re-make project of the most popular pre-DP works. I
> hope Juliet will discuss the feasibility (as well as interest in the
> ranks of the DP volunteers) of this possibility.

Not true. All you need is somebody willing to volunteer to CP and PM the 
content. No commitment is needed by anyone else.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From joshua at hutchinson.net  Wed Nov  8 05:46:30 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 06:05:45 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <33431707.1162993590632.JavaMail.?@fh1037.dia.cp.net>

Hell, someone scan in the "classics" they want redone and I'll start 
running them through DP myself.  I'll even mark 'em up in TEI when it 
is all done so we have a consistent markup and a built-in changelog 
system for 'em.

Josh

>----Original Message----
>From: greg@durendal.org
>Date: Nov 8, 2006 7:29 
>To: "Jon Noring"<jon@noring.name>, "Project Gutenberg Volunteer 
Discussion"<gutvol-d@lists.pglaf.org>
>Subj: Re: [gutvol-d] The interesting comment by Lee and what the PG	
collection will become known for...
>
>On Tue, 7 Nov 2006, Jon Noring wrote:
>
>> For this to happen, DP would have to commit some percentage of 
their
>> work flow to a re-make project of the most popular pre-DP works. I
>> hope Juliet will discuss the feasibility (as well as interest in 
the
>> ranks of the DP volunteers) of this possibility.
>
>Not true. All you need is somebody willing to volunteer to CP and PM 
the 
>content. No commitment is needed by anyone else.
>
>-- 
>Greg Weeks
>http://durendal.org:8080/greg/
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From bruce at zuhause.org  Wed Nov  8 06:06:46 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Wed Nov  8 06:06:48 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <22911606.20061107185809@noring.name>
References: <c6b.45424f4.3280ffc1@aol.com>
	<456909141.20061106142145@noring.name>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net>
	<22911606.20061107185809@noring.name>
Message-ID: <17745.58486.373181.347740@celery.zuhause.org>

Jon Noring writes:
 > But again note that for the average consumer, who doesn't know what we
 > know, the DP stuff gets mixed in with all the other stuff, with a
 > common boilerplate added which doesn't exactly instill warm fuzzies
 > ("this text may be a composite...")

Which is why I brought it up in the first place.  If the submitter
attests that this statement is not true, then why include it in the
disclaimer?  I know that with very few exceptions, it is not true for
DP works in PG (and in those cases it's mentioned in the transcriber's
notes), which is why I'd like to see an option to not have that
particular disclaimer in *my* books.  My books, and nearly all of the
rest of the DP output include the original publisher and publication
or copyright information, and it has not been stripped by the
whitewashers.
From jon at noring.name  Wed Nov  8 06:56:39 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 06:55:03 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <17745.58486.373181.347740@celery.zuhause.org>
References: <c6b.45424f4.3280ffc1@aol.com>
	<456909141.20061106142145@noring.name>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net> <22911606.20061107185809@noring.name>
	<17745.58486.373181.347740@celery.zuhause.org>
Message-ID: <7310670187.20061108075639@noring.name>

Bruce wrote:
> Jon Noring writes:

>> But again note that for the average consumer, who doesn't know what we
>> know, the DP stuff gets mixed in with all the other stuff, with a
>> common boilerplate added which doesn't exactly instill warm fuzzies
>> ("this text may be a composite...")

> Which is why I brought it up in the first place.  If the submitter
> attests that this statement is not true, then why include it in the
> disclaimer?  I know that with very few exceptions, it is not true for
> DP works in PG (and in those cases it's mentioned in the transcriber's
> notes), which is why I'd like to see an option to not have that
> particular disclaimer in *my* books.  My books, and nearly all of the
> rest of the DP output include the original publisher and publication
> or copyright information, and it has not been stripped by the
> whitewashers.

Good point.

I am confused by your last sentence; does PG strip out any information
from the DP files before inclusion in the PG collection? This would
include metadata information.

Jon

From joshua at hutchinson.net  Wed Nov  8 07:08:24 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 07:08:30 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: jon@noring.name
>
>I am confused by your last sentence; does PG strip out any 
information
>from the DP files before inclusion in the PG collection? This would
>include metadata information.
>

Once upon a time, they used to.  Now, if a text comes through with 
that meta data, it stays.

Josh

PS The stripping hasn't happened for quite some time (ie, years).
From jon at noring.name  Wed Nov  8 07:36:24 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 07:34:48 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <33431707.1162993590632.JavaMail.?@fh1037.dia.cp.net>
References: <33431707.1162993590632.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <65889479.20061108083624@noring.name>

Josh wrote:

> Hell, someone scan in the "classics" they want redone and I'll start
> running them through DP myself.  I'll even mark 'em up in TEI when it 
> is all done so we have a consistent markup and a built-in changelog 
> system for 'em.

Well, one classic you could start with is "My Antonia" by Willa Cather.
A while back I purchased the 2nd printing of the 1918 First Edition,
chopped it, and scanned it at 600 dpi (with derivatives at lower
resolution.) See http://www.openreader.org/myantonia/

Independently, Jose Menendez (who had feedback from Bowerbird) and I
(with the help of the Cather project and Lori), produced high quality
text versions. I believe the error rate is zero based on back and forth
machine comparisons between Jose's and my versions (well mostly I
compared mine to his!) I have produced a version totally faithful to the
original 1918 edition, including marking up the errors in that original
book (error source: Jose and the Cather project) and included the line
breaks (at the exact point where the linebreak occurred, including at
the end-of-line-hyphen.)

Now one may ask why should DP "redo" this book? Here's my reasons:

1) It is a classic!

2) Unless Jose or someone else resubmitted "My Antonia" recently to PG,
   the version that is there is one that desperately needs replacing.

3) The existence of a probably zero error, faithful transcription can
   serve as a useful benchmark to see how well the DP process does in
   finding OCR errors. Machine comparisons can be made.

4) The DP process will add trustworthiness to the final work product.


Jon Noring

From jon at noring.name  Wed Nov  8 07:46:25 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 07:44:46 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <Pine.LNX.4.63.0611080728300.17297@durendal.durendal.org>
References: <c6b.45424f4.3280ffc1@aol.com>
	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>
	<455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net> <127476658.20061107221540@noring.name>
	<Pine.LNX.4.63.0611080728300.17297@durendal.durendal.org>
Message-ID: <1224947266.20061108084625@noring.name>

Greg wrote:
> Jon Noring wrote:

>> For this to happen, DP would have to commit some percentage of their
>> work flow to a re-make project of the most popular pre-DP works. I
>> hope Juliet will discuss the feasibility (as well as interest in the
>> ranks of the DP volunteers) of this possibility.

> Not true. All you need is somebody willing to volunteer to CP and PM the
> content. No commitment is needed by anyone else.

Well, I do understand this.

However, if the DP volunteer "leadership" publicly discusses this, and
then comes to a collective decision that everyone should consider
doing a 'classic' every once in a while, that would go a long ways
towards redoing the classics.

In addition, the leadership should discuss, and if it makes any sense,
*suggest* certain procedures to go through with the classics, such as

1) Before picking an edition to transcribe, talk to one or two experts
   on the Work (including published bibliographies) to get their feedback
   as to the differences between editions and what they recommend should
   be digitized first. It might be their suggestion is to digitize two or
   more different editions due to various types of differences.

   Note that unlike many of the more obscure books DP now does, many of the
   classics (especially those which are translations) have been issued in
   multiple editions at different times from different publishers with
   different levels of editing. Thus it makes sense to digitize the
   edition(s) which are considered reasonably authoritative or
   recommended by the scholars and enthusiasts.

2) Do high quality scans of these books. Consider for mastering
   purposes 600 dpi full color of the text pages, 1200 dpi for any
   illustrations (which can be submitted to IA. Heck, burn them on DVD and
   I'll take a copy.) For OCR, of course, the master scans can be resampled
   as needed. (Note, if the book is chopped, I'll gladly take and
   store away the original pages so the book may be retrieved as
   needed, or if pages need rescanning.)

   Of course, make sure the scan sets (either the master or reasonable
   derivatives) are publicly available right away -- e.g., submit the page
   scans not only to IA but to PG the same time the finished text is.
   Btw, the master text itself should also reference the page scans so
   they may be correlated as has been done with my version of My
   Antonia at http://www.openreader.org/myantonia/

3) Yes, the classics should be mastered in PGTEI. And I do suggest, as
   Bowerbird suggests, that line break information be preserved. This
   is useful information to aid future error correction, plus the
   Internet Archive/OCA may find it useful for their proposed study on
   the quality of OCR engines (something Brewster has talked to DP and
   me about in the past.)


Just a few thoughts.

Jon

From jon at noring.name  Wed Nov  8 07:48:47 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 07:47:09 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
References: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <1884299832.20061108084847@noring.name>

Joshua wrote:
> Jon asked:

>> I am confused by your last sentence; does PG strip out any
>> information from the DP files before inclusion in the PG
>> collection? This would include metadata information.

> Once upon a time, they used to.  Now, if a text comes through with 
> that meta data, it stays.

Did any metadata stripping happen with the first DP texts submitted to
PG? And if so, has that been corrected since?

Jon

From jon at noring.name  Wed Nov  8 08:32:13 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 08:30:38 2006
Subject: [gutvol-d] let's close this up
In-Reply-To: <c22.8e66618.3283211f@aol.com>
References: <c22.8e66618.3283211f@aol.com>
Message-ID: <817072195.20061108093213@noring.name>

Jon wrote:
> lee said:

>>?But without some means of discrimination, a broad brush is the
>> only one I have.

>  well then be a decent human being and
>  find "some means of discrimination", lee!
>  
>  but don't just pop off, making cheap shots...

Btw, Bowerbird you forgot to add "in my opinion":

"but don't just pop off, making (in my opinion) cheap shots..."

I don't see what Lee said as a cheap shot, but a very well reasoned
argument. I think you know it was a very strong argument, so you
simply label it a "cheap shot". That is a very emotionally-laden
phrase meant not for rational discourse, but tarring and feathering.


>  and bruce albrecht's point (that the boilerplate does not seem to apply
>  to a good many of the e-texts that are being produced these days) is,
>  i would think, something that could be taken into new consideration.

It proves that it is valuable to discuss the integrity, authenticity,
trustworthiness, etc., etc. of the PG texts. It helps people see the
various issues, and suggest improvements.


>  (and i'll point out that d.p. has been terribly reluctant to mount scans,
>  perhaps because it will immediately make their errors _discoverable_.)

Good thing you added "perhaps" -- even so you are making a serious
charge that DP is intentionally hiding the scans for reasons of
embarrassment. Do you have any evidence, or are you just fishing? I
wonder how the DP folk will take your charge?

About the old scan sets, I too hope DP will soon release them. From
what I gather, it simply is a matter of someone's time to straighten
them out (and it will require a lot of time.) I do know one thing, it
is *their* intention to release them to the public. DP is simply a
volunteer project with essentially zero bucks that has submitted over
half of the texts to PG. They are now overwhelmed by their own
success.

Why you have this seeming undercurrent obsession against DP is
disturbing. I would think that Michael considers DP a rousing success.
Even if the transcription error of the DP texts is only the same as
that submitted by individuals (on average), then I would surmise that
Michael considers that good enough. DP has literally doubled PG's
collection in a very short time. They are definitely doing something
right.

Jon Noring


From sly at victoria.tc.ca  Wed Nov  8 08:56:49 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov  8 08:56:54 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
References: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <Pine.GSO.4.58.0611080847150.17023@vtn1.victoria.tc.ca>


I'm not sure that I would use the word "metadata" here.
To me, that seems to imply something that is structured,
that can be extracted and/or processed by a computer.

In this case, we are just talking about transcription
of items, such as a publisher's name, which appear on
a title page.

Andrew

On Wed, 8 Nov 2006, joshua@hutchinson.net wrote:

> >From: jon@noring.name
> >
> >I am confused by your last sentence; does PG strip out any
> information
> >from the DP files before inclusion in the PG collection? This would
> >include metadata information.
> >
>
> Once upon a time, they used to.  Now, if a text comes through with
> that meta data, it stays.
>
> Josh
>
> PS The stripping hasn't happened for quite some time (ie, years).
> _______________________________________________
From jon at noring.name  Wed Nov  8 09:12:03 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 09:10:31 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <Pine.GSO.4.58.0611080847150.17023@vtn1.victoria.tc.ca>
References: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
	<Pine.GSO.4.58.0611080847150.17023@vtn1.victoria.tc.ca>
Message-ID: <803308369.20061108101203@noring.name>

Andrew wrote in reply to my use of the word metadata:

> I'm not sure that I would use the word "metadata" here.
> To me, that seems to imply something that is structured,
> that can be extracted and/or processed by a computer.

Understandable. In the broadest sense, metadata is simply information
(or data) about data, with no structuring implied.

What does PGTEI do with respect to metadata?


Jon Noring


From joshua at hutchinson.net  Wed Nov  8 09:15:49 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 09:16:11 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: jon@noring.name
>
>What does PGTEI do with respect to metadata?
>

Metadata is stored in the <teiHeader> section of the document.

See http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html#toc22 for an 
example from the guidelines.

Josh
From jon at noring.name  Wed Nov  8 09:27:55 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 09:26:17 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>
References: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <895085912.20061108102755@noring.name>

Joshua wrote:
> Jon asked:

>> What does PGTEI do with respect to metadata?

> Metadata is stored in the <teiHeader> section of the document.
>
> See http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html#toc22 for an 
> example from the guidelines.

Oops, I should have qualified my original request with what metadata
is recorded. I'm very interested in the source metadata, which the
above metadata does not include (since it is a PGTEI remake of an
older pre-DP work.)

I would assume that the PGTEI metadata header will include source
metadata when that is known?

Jon

From marcello at perathoner.de  Wed Nov  8 09:36:41 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov  8 09:36:49 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <803308369.20061108101203@noring.name>
References: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>	<Pine.GSO.4.58.0611080847150.17023@vtn1.victoria.tc.ca>
	<803308369.20061108101203@noring.name>
Message-ID: <455215A9.9050601@perathoner.de>

Jon Noring wrote:


> What does PGTEI do with respect to metadata?

Look here:

  http://www.tei-c.org/P4X/HD.html

and also here:

  http://www.tei-c.org/Lite/U5-header.html


You can store in the PGTEI header anything that is valid in TEI.

Not everything will be evaluated while generating the user formats but
nothing will be lost for future generations (and later implementations).


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Nov  8 09:42:18 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 09:42:25 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <15127149.1163007738524.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: jon@noring.name
>
>I would assume that the PGTEI metadata header will include source
>metadata when that is known?
>

Here is an example from a fairly recent one I did:

 <sourceDesc>
  <bibl>
  <title>The Cathedral Church of Hereford</title>
  <author>A. Hugh Fisher</author>
  <imprint>
   <pubPlace>London</pubPlace>
   <publisher>George Bell and Sons</publisher>
   <date>1898</date>
  </imprint>
  </bibl>
 </sourceDesc>

Josh

PS If you want lots of wordy documentation, here is a link the the TEI-
Lite doc area:  http://www.tei-c.org/Lite/U5-header.html

From sly at victoria.tc.ca  Wed Nov  8 09:57:04 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov  8 09:57:10 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>
References: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <Pine.GSO.4.58.0611080946490.27031@vtn1.victoria.tc.ca>


An interesting point to make is that TEI makes provision for
two separate sections which at first look similar (based on
content). One for a description of the source, and one for
a description of the current transcription.

Although there can easily be overlap between these two,
it still makes a lot of sense to keep them separate, because
there is information that is quite distinctly applicable
to each.

For example, some bits of information that may describe
the original book would be publisher, lccn, number of
pages, perhaps a title statement including a typo, etc.

And some bits of information specific to the PG text would
be release date, PG number, etc.

Andrew

On Wed, 8 Nov 2006, joshua@hutchinson.net wrote:

>
>
> >----Original Message----
> >From: jon@noring.name
> >
> >What does PGTEI do with respect to metadata?
> >
>
> Metadata is stored in the <teiHeader> section of the document.
>
> See http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html#toc22 for an
> example from the guidelines.
>
> Josh
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From kth at srv.net  Wed Nov  8 09:36:01 2006
From: kth at srv.net (Kevin Handy)
Date: Wed Nov  8 10:06:47 2006
Subject: [gutvol-d] let's close this up
In-Reply-To: <817072195.20061108093213@noring.name>
References: <c22.8e66618.3283211f@aol.com>
	<817072195.20061108093213@noring.name>
Message-ID: <45521581.8080906@srv.net>

Jon Noring wrote:

>Good thing you added "perhaps" -- even so you are making a serious
>charge that DP is intentionally hiding the scans for reasons of
>embarrassment. Do you have any evidence, or are you just fishing? I
>wonder how the DP folk will take your charge?
>  
>
I believe one major problem is finding enough disk space
to hold the scans. One simple b&w books images are likely
to be in the 10Mb range. This increases greatly if the images
are color or greyscale. 10,000(+) books will need serious
amounts of disk space, terra-bytes likely, and also needs
enough network bandwidth to serve this amount of data.

From lee at novomail.net  Wed Nov  8 10:08:29 2006
From: lee at novomail.net (Lee Passey)
Date: Wed Nov  8 10:06:58 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <454D8138.6040307@gmail.com>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<454D8138.6040307@gmail.com>
Message-ID: <45521D1D.10801@novomail.net>

Sam Bretheim wrote:

> joshua@hutchinson.net wrote:
> 
>> The gist is, the TEI files are not meant to be parse by a web browser, 
>> so the fact that they DON'T display properly basically means 
>> everything is working according to design.
> 
> It's worth mentioning that modern Web browsers are quite capable of 
> displaying TEI reasonably well, though some work on the relevant TEI and 
> XSL stylesheets is necessary before they're ready to be widely used.

Indeed, it has been well known for some time that many major browsers 
are capable of displaying TEI reasonably well. See 
http://lists.pglaf.org/private.cgi/gutvol-d/2005-September/003135.html.

If they are constructed correctly, there is no technical reason why TEI 
files cannot be used directly in web browsers. Web browsers cannot 
display 100% of the richness captured by TEI, but the display would 
still be better than the simplified ASCII text version. And there is no 
reason why a single TEI file could not be created that would satisfy the 
needs of both direct rendering and transformations via an XSL script. It 
is not done simply because those involved in the PGTEI project have 
chosen not to do it.

> For instance, if the author had inserted the following near the 
> beginning of the document, it would have rendered quite tolerably in 
> recent versions of Firefox/Mozilla/Camino, Konqueror/Safari/OmniWeb, 
> Opera, and iCab.  (IE and Amaya have trouble with some of the code in 
> this CSS file; I'll try to figure out how to make them display TEI 
> properly.)

FWIW, I tested both your documents and my documents in IE7 and the 
result is basically identical to the behavior of IE6. Opening your 
stylesheet in Microsoft Visual Studio dotnet, with validation turned on 
indicated a number of errors of the type "'content' is not a known CSS 
property name." My suspicion is that Microsoft has simply not yet gotten 
beyond support for CSS1 ('content' is a new addition in CSS2), so your 
document may look good in IE if you can figure out how to style using 
only CSS1 selectors.

> <?xml-styleheet type="text/css" 
> href="http://www.shinparam.org/Sam/Projects/TEI-CSS/prettynovel.css"?>

My own preference is for /two/ CSS declarations: one which refers to a 
generic CSS file which would make /all/ PGTEI books look acceptable (I 
have suggested PGTEI.CSS) and a second where an end user could place all 
his personal preferences as overrides (I suggest PGTEI-USER.CSS) (I 
/always/ include statements which will replace serifed fonts with 
sans-serifed fonts). If the second stylesheet does not exist, User 
Agents will simply ignore it and use only the first, generic stylesheet.

> Here are two books I'm in the midst of proofing and marking up, both of 
> which look fairly good when viewed with that CSS stylesheet:
> 
> http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml
> 
> http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml

These are really great! Hopefully they will become worthwhile additions 
to PG's TEI collection. When they do I hope you will insist that those 
constructs which make the documents viewable in a web browser must be 
retained.

-- 
Nothing of significance below this line.

From Bowerbird at aol.com  Wed Nov  8 10:07:14 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 10:07:32 2006
Subject: [gutvol-d] re: let's close this up
Message-ID: <3fc.59fc8c9.328376d2@aol.com>

jon, i ain't gonna go 'round your merry-go-round again.
not here, anyway, and not now.   maybe elsewhere, later...

i do my homework, on d.p. and everything else i talk about,
so if anyone questions what i say, i'll be happy to back it up.
plus i know a cheap shot when i see it.   so do other people.

as for appreciating d.p., i appreciate them as much as you do.
and that's despite the fact that i grok their flaws better than you.
and i love them enough to actually _tell_ them about those flaws,
rather than slather them in a slobbery wad of adoring fangirl goo.

as for the d.p. scans, here's some help for you on your homework:
>    http://www.pgdp.org/ols/tools/
congratulations to donovan on whipping them into shape...
as soon as d.p. is ready to release them, i'm ready to scrape.

***

josh said:
>   The stripping hasn't happened for quite some time (ie, years).

3 years ago, next month, to be precise.

(y'all really _should_ know your history.)

as i said, the decision was made at the
december, 2003, celebration of 10,000.

jon was there, making his standard pitch,
so i'm not sure why he's so surprised now
to find that this "meta-data" was "stripped"
as a routine matter of policy until that time.
(oh, i get it.   his "shock" now is the faux kind.)

juliet and charlez were arguing for retention.
_i_ even said that -- in cases where an e-text
was generated on the basis of a single p-book
-- that the meta-data could/should be kept...

but, you know, none of _us_ are _lawyers_, eh?

so hey, how much would _you_ listen to us, eh?

it was only the presence of new legal advice
that assured michael and greg to change the
longstanding policy to drop identifying data.

and it's probably only proper to recognize too
that it was _only_ because brewster kahle was
_seeking_ a legal case to set a good precedent
that made steve harris, p.g.'s pro bono lawyer
-- and working for kahle as well at the time --
believe he could give such advice _and_ make it
stick in court, with brewster's money behind him.

michael or greg should feel free to correct me if
i'm wrong about this, but it was clear at the time.
brewster was unafraid of an unambiguous case...

so you've got brewster's backing to thank for this.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061108/ef912f4f/attachment-0001.html
From Bowerbird at aol.com  Wed Nov  8 10:16:57 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 10:17:59 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <4be.a867516.32837919@aol.com>

lee said:
>    Web browsers cannot display 
>    100% of the richness captured by TEI, 
>    but the display would still be 
>    better than the simplified ASCII text version.

bigger costs to code the document into .tei, with
no benefits greater than those delivered by .zml.

let's stop with the ascii strawman.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061108/9dd79c6d/attachment.html
From jon at noring.name  Wed Nov  8 10:37:28 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 10:35:49 2006
Subject: [gutvol-d] Kahle versus Gonzales on orphan works soon to be heard
Message-ID: <1011748809.20061108113728@noring.name>

Nov. 13 to be specific. For more info, see:

http://www.archive.org/iathreads/post-view.php?id=76756

From j.boelaert at skynet.be  Wed Nov  8 10:40:56 2006
From: j.boelaert at skynet.be (Johan Boelaert)
Date: Wed Nov  8 11:04:05 2006
Subject: [gutvol-d] Project Gutenberg
Message-ID: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be>

Hello

Every now and then I visit Project Gutenberg Europe, hoping something has
changed to improve it. But today also, it hasn't. 

On the homepage (http://pge.rastko.net/), we read: "Project Gutenberg Europe
is follower of Project Gutenberg philosophy, focused primarily on digitizing
European cultures, under European copyright laws."  If this were true, it
would be an excellent opportunity to publish European books online, which
can't be published in Gutenberg U.S., because of the much more severe
copyright laws over there.

But alas, on the page "Copyright HOWTO",
(http://pge.rastko.net/howto/copyright-howto) we read: "Project Gutenberg's
copyright rules are for Project Gutenberg eBooks, and apply only to works we
release on our main servers. Project Gutenberg is US-based, follows US
rules, and the only servers on which the collection is directly maintained
are in the US." If you try to ask for a clearance, you are redirected to
Getenberg US.

Is there any possibility to improve this site?

 
Greetings from Belgium, Europe (copyright: life +70)

Johan

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061108/f88052f9/attachment.html
From Bowerbird at aol.com  Wed Nov  8 12:11:07 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 12:11:23 2006
Subject: [gutvol-d] gvd061108 -- one preliminary thought on duguid
Message-ID: <c28.55e3a6d.328393db@aol.com>

the article by paul duguid that was published
in the influential "first monday" series recently
was quite insightful on a number of dimensions.

i've written up a good many observations on it,
but i think i will begin with a very simple one...

duguid analyzes the proposition that open-source
_software_development_ might not necessarily apply
-- philosophically -- to other _distributed_ processes.

he specifically includes project gutenberg in his paper,
as well as gracenote (which collects musical meta-data),
and the big cahuna of distributed work today, wikipedia.

at the most basic of levels, duguid argues that software 
has a fundamental check of its worthiness, in the form
of whether the compiler will actually _execute_ the code.

errors that would cause the compiler to abort the code
are quickly identified on that basis, without any analysis.

as duguid puts it:
>    solutions must compile and run. 
>    Hence, while Open Source software has relied heavily 
>    on peer production and to a lesser extent on peer review, 
>    for quality, it relies as heavily though perhaps less obviously, 
>    on the chip and the compiler as ultimate arbiters. 
>    These two both identify problems with the code 
>    and reject inadequate solutions

duguid surmises that other types of distributed projects
-- especially the three that are under his microscope --
do not have this immediate and telling check for errors...

in a letter she sent to duguid responding to his article,
juliet from distributed proofreaders made this claim:
>    By successive revision, in comparing text to image 
>    3-5 times for each page, and then by applying 
>    standard tools, checks, etc (quite parallel, in a way, 
>    to compiling code) DP achieves transcription accuracy

you really should read the entirety of juliet's letter.   it's at:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=22839&start=0

but for right now, i want to focus in on her statement that d.p.'s
"standard checks" are "quite parallel, in a way, to compiling code".

in my experience, both as a person who executes source code
and as a person who has examined the d.p. output very closely,
this statement strains credulity, and even borders on ridiculous.

and i will give an example right here to back up that assessment.

i'm using the e-text entitled "the hawaiian romance of laieikawai":
>    http://www.gutenberg.org/etext/13603

as you can see, this e-text is #13603, which means it is in the
"later half" of the p.g. library.   the files are dated 2004/10/5.
this e-text was included in the bundle prepared to celebrate
the 5,000th e-text created by d.p., so it's in the "later half" of
all e-texts digitized by d.p. as well.   (their total now is 9,320.)

this file predates the switch from a 2-round to a 4-round system,
but i'm not going to focus on _proofing_ errors right now, merely
on mistakes that dispute juliet's "similar to compiling code" claim.

i've appended to this post the first line of each footnote, in order.

there are all kinds of comments i could make (and will make later)
about the footnote numbering, but again, remember the focus here.

to instantiate the kind of checks that are similar to "chip and compiler",
these numbers should be prepended or appended with another string, 
so each separate one becomes _uniquely_identified_ within the book.
i like to use page-number, or the chapter-number or chapter-name,
but even a consecutive bookwide series of numbers will do fine, like:

>    [Footnote 1-01: Compare the Fijian story quoted by Thomson (p. 6).]
>    [Footnote 2-02: Daggett calls the story "a supernatural folklore legend 
of
>    [Footnote 3-04: The changes introduced by these editors have not been
>    [Footnote 4-05: Dr. N. B. Emerson's rendering of the myth of _Pele and
>    [Footnote 5-06: The most important of these chants translated from the
>    [Footnote 1-07: Bastian In Samoanische Sch?pfungssage (p. 8) says:
>    [Footnote 2-08: Lesson says of the Polynesian groups (I, 378): "On sait 
...
>    [Footnote 3-09: Compare: Stair, Old Samoa, p. 271; White, I, 176; Fison,

but the problems go even deeper than a failure to relabel the footnotes
to make them unambiguous.   they cut to the fundamental cornerstone of 
_actually_having_the_correct_numbers_in_the_e-text_in_the_first_place_.

look at the lines marked with "xxxxxx" and you'll see that:
a)   footnote 24 is incorrectly labeled as footnote 21.
b)   footnote 26 is incorrectly labeled as footnote 28.
c)   footnote 76 is incorrectly labeled as footnote 78.
d)   footnote 2 (toward the bottom) is incorrectly labeled as footnote 3.

these mistakes were located with a basic consistency-checking program.

if errors this _blatant_ -- which could've been identified as easily by d.p.
as they were by me, simply by using a program as elementary as mine --
were allowed to pass through the quality-control checks, what confidence
can we have that there are not _many_ more, and far more subtle in nature?

nor were these the only errors, even in the easy task of labeling footnotes.

take a look at these lines, again in order, this time from the _body_ of the 
e-text:
>    became Kekalukaluokewa's, and he portioned out the land[61] and set up
>    Then quickly he went to consult his sister, to Malio.[62]
>    they ask you what long waves you surf on say on the _Huia_.[63] If they
>    over to the coast where Kumukahi[64] swims in the billows, then this is
>    xxxxxx          get the foster child of Kapukaihaoa, Laielohelohe,[66] 
who is like
>    took his foster child's umbilical cord[66] and wore it about his neck.
>    snails[67] singing, then do you two meet apart from the assembly.
>    "The marriage of the chiefs! The marriage of the chiefs!"[68]
>    back and stopped at their sources, no water flowed into the sea.[69]
>    After this the seer took Laieikawai's skirt[70] and laid it down on the
>    Kaeloikamalama's neck.[71]
>    Tahiti."[72]
>    Moanalihaikawaokele and Laukieleula."[73]
>    xxxxxx          Then that bird[71] drooped its wings down and its body 
remained aloft,
>    month bad weather closes down,[75] when the storm clears, there I am
>    place, and they resemble evil spirits in their nature.[76]

as you will notice, the anchor for footnote #65 is incorrectly labeled as 
"66",
and the anchor for footnote #74 is incorrectly labeled as "71".   
embarrassing,
that something as simple as a sequential number-set could contain such flaws.

again, there are _6_ errors in this e-text, just in the footnote numbering 
alone!

and that's a series of _sequential_ numbers, and thus easily checked by 
machine.

further, it's a check that _should_ have been in place many, many years ago, 
by 
both distributed proofreaders _and_ project gutenberg proper.   yet this 
e-text
-- with these _obvious_ errors -- was posted two years ago, and never 
updated.

i will have more to say about duguid later.   but for now, one thing is 
certain...

the checks that are done are _clearly_very_far_ from being "chip/compiler" 
checks.

-bowerbird

>    xxxxxx          [Footnote A: The titles of chapters are added for
>    [Footnote 1: Compare the Fijian story quoted by Thomson (p. 6).]
>    [Footnote 2: Daggett calls the story "a supernatural folklore legend of
>    [Footnote 3: The changes introduced by these editors have not been
>    [Footnote 4: Dr. N. B. Emerson's rendering of the myth of _Pele and
>    [Footnote 5: The most important of these chants translated from the
>    [Footnote 1: Bastian In Samoanische Sch?pfungssage (p. 8) says:
>    [Footnote 2: Lesson says of the Polynesian groups (I, 378): "On sait ...
>    [Footnote 3: Compare: Stair, Old Samoa, p. 271; White, I, 176; Fison,
>    [Footnote 4: Lesson (II, 190) enumerates eleven small islands, covering
>    [Footnote 5: _Kahiki_, in Hawaiian chants, is the term used to designate
>    [Footnote 6: Lesson, II, 152.]
>    [Footnote 7: Ibid., 170.]
>    [Footnote 8: Ibid., 178.]
>    [Footnote 1: In the Polynesian picture of the universe the wall of
>    [Footnote 2: The Rarotongan world of spirits is an underworld. (See
>    [Footnote 3: White, I, chart; Gill, Myths and Songs, pp. 3, 4; Ellis,
>    [Footnote 4: Gill says of the Hervey Islanders (p. 17 of notes): "The
>    [Footnote 5: Bastian, Samoanische Sch?pfungs-Sage; Ellis, I, 321; White,
>    [Footnote 6: Moerenhout translates (I, 419): "He was, _Taaroa_ (Kanaloa)
>    [Footnote 7: Moerenhout, I, 423: "_Taaroa_ slept with the woman called
>    [Footnote 8: Grey, pp. 38-45; Kr?mer, Samoa Inseln, pp. 395-400; Fison,
>    [Footnote 9: In Fornander's collection of origin chants the Hawaiian
>    [Footnote 1: Mariner, II, 103; Turner, Nineteen Tears in Polynesia, pp.
>    [Footnote 2: When a Polynesian invokes a god he prays to the spirit of
>    [Footnote 3: Bird-bodied gods of low grade in the theogony of the
>    [Footnote 4: With the stories quoted from Fornander may be compared such
>    [Footnote 1: Grey, pp. 1-15; White, I, 46; Baessler, Neue S?dsee-Bilder,
>    [Footnote 2: Compare Kr?mer's Samoan story (in Samoa Inseln, p. 413) of
>    [Footnote 3: Kr?mer, Samoa Inseln, pp. 44, 115; Fison, pp. 16,
>    [Footnote 1: As such Paliuli occurs in other Hawaiian folk tales:
>    [Footnote 2: The gods Kane and Kanaloa, who live in the mountains of
>    [Footnote 3: Although the earthly paradise has the same location in both
>    [Footnote 4: First generation: Waka, Kihanuilulumoku,
>    [Footnote 1: J.A. Macculloch (in Childhood of Fiction, p. 2) says,
>    [Footnote 2: Moerenhout, II, 4, 265.]
>    [Footnote 3: Gracia (p. 47) says that the taboo consists in the
>    [Footnote 4: Compare Kr?mer, Samoa Inseln, p. 31; Stair, p. 75; Turner,
>    [Footnote 5: In certain groups inheritance descends on the mother's side
>    [Footnote 6: Kr?mer (p. 32 et seq.) tells us that in Samoa the daughter
>    [Footnote 7: Rivers, I, 374; Malo, p. 80.
>    [Footnote 8: Keaulumoku's description of a Hawaiian chief (Islander,
>    [Footnote 9: Stair, p. 220; Gracia, p. 59; Alexander, History, chap. IV;
>    [Footnote 10: Gracia, p. 46; Mariner, II, 87, 101, 125; Gill, Myths and
>    [Footnote 11: Malo, p. 69.]
>    [Footnote 12: Ellis (III, 36) describes the art of medicine in
>    [Footnote 1: Jarves says: "Songs and chants were common among all
>    [Footnote 2: Moerenhout, I, 411.]
>    [Footnote 3: Andrews, Islander, 1875, p. 35; Emerson, Unwritten
>    [Footnote 4: In Fornander's story of _Lonoikamakahiki_, the chief
>    [Footnote 5: Compare with Ellis, I, 286, and Williams and Calvert, I,
>    [Footnote 6: Gill, Myths and Songs, pp. 268 et seq.]
>    [Footnote 7: See Fornander's stories of _Lonoikamakahiki, Halemano_, and
>    [Footnote 1: In the Hawaiian Annual, 1890, Alexander translates some 
notes
>    [Footnote 2: Moerenhout (I, 501-507) says that the Areois society in
>    [Footnote 3: Emerson, Unwritten Literature, p. 24 (note).]
>    [Footnote 4: This is well illustrated in Fornander's story of
>    [Footnote 5: Thomson says that the Fijians differ from the Polynesians
>    [Footnote 1: Turner, Samoa, p. 220.]
>    [Footnote 2: Ibid.; Moerenhout, I, 407-410.]
>    [Footnote 3: Turner, Samoa, pp. 216-221; Williams and Calvert, I, p.
>    [Footnote 4: Williams and Calvert, I, 118.]
>    [Footnote 5: Moerenhout, II, 146.]
>    [Footnote 1: See Moerenhout, II, 210; Jarves, p. 34; Alexander in
>    [Footnote 2: Fison, p. 100.]
>    [Footnote 1: The following examples are taken from the Laieikawai, where
>    [Footnote 2: In the course of the story of _Laieikawai_ occur more than
>    [Footnote 1: _Kuakoa_, iv, No. 31, translated also in _Hawaiian Annual_,
>    [Footnote 1: Title pages.
>    [Footnote 1: For the translation of Haleole's foreword, which is in a
>    [Footnote 1: Haleole uses the foreign form for wife, _wahine mare_,
>    [Footnote 2: The chief's vow, _olelo paa_, or "fixed word," to slay all
>    [Footnote 3: The phrase _nalo no hoi na wahi huna_, which means 
literally
>    [Footnote 4: Prenatal infanticide, _omilomilo_, was practiced in various
>    [Footnote 5: The _manini_ (_Tenthis sandvicensis_, Street) is a
>    [Footnote 6: The month _Ikuwa_ is variously placed in the calendar year.
>    [Footnote 7: The adoption by their grandparents and hiding away of the
>    [Footnote 8: The _iako_ of a canoe are the two arched sticks which hold
>    [Footnote 9: The verb _hookuiia_ means literally "cause to be pierced"
>    [Footnote 10: Hawaiian challenge stories bring out a strongly felt
>    [Footnote 11: In his invocation the man recognizes the two classes of
>    [Footnote 12: With this judgment of beauty should be compared
>    [Footnote 13: The building of a _heiau_, or temple, was a common means
>    [Footnote 14: The nights of Kane and of Lono follow each other on the
>    [Footnote 15: By _kahoaka_ the Hawaiians designate "the spirit or soul
>    [Footnote 16: The feathers of the _oo_ bird (_Moho nobilis_), with which
>    [Footnote 17: The reference to the temple of Pahauna is one of a number
>    [Footnote 18: The whole treatment of the Kauakahialii episode suggests 
an
>    [Footnote 19: These are all wood birds, in which form Gill tells us 
(Myths
>    [Footnote 20: _Moaulanuiakea_ means literally "Great-broad-red-cock,"
>    [Footnote 21: Compare Gill's story of the first god, Watea, who dreams
>    [Footnote 22: In the song the girl is likened to the lovely _lehua_,
>    [Footnote 23: No other intoxicating liquor save _awa_ was known to the
>    xxxxxx          [Footnote 21: In the Hawaiian form of checkers, called 
_konane_, the
>    25              [Footnote 25: The _malo_ is a loin cloth 3 or 4 yards 
long and a foot
>    xxxxxx          [Footnote 28: In Hawaiian warfare, the biggest boaster 
was the best man,
>    27              [Footnote 27: The idiomatic passages "_aohe puko momona 
o Kohala_,"
>    [Footnote 28: This boast of downing an antagonist with a single blow is
>    [Footnote 29: Shaking hands was of foreign introduction and marks one of
>    [Footnote 30: Famous Hawaiian boxing teachers kept master strokes in
>    [Footnote 31: Few similes are used in the story. This figure of the
>    [Footnote 32: The Polynesians, like the ancient Hebrews, practiced
>    [Footnote 33: The gods invoked by Aiwohikupua are not translated with
>    [Footnote 34: The _laau palau_, literally "wood-that-cuts," which Wise
>    [Footnote 35: The Hawaiian cloak or _kihei_ is a large square, 2 yards
>    [Footnote 36: The meaning of the idiomatic boast _he lala kamahele no ka
>    [Footnote 37: The _puloulou_ is said to have been introduced by Paao
>    [Footnote 38: Long life was the Polynesian idea of divine blessing. Of
>    [Footnote 39: Chickens were a valuable part of a chief's wealth, since
>    [Footnote 40: Mr. Meheula suggested to me this translation of the
>    [Footnote 41: A peculiarly close family relation between brother and
>    [Footnote 42: For the translation of this dialogue I am indebted, to the
>    [Footnote 43: To express the interrelation between brothers and sisters
>    [Footnote 44: The line translated "Fed upon the fruit of sin" contains
>    [Footnote 45: This _ti_-leaf trumpet is constructed from the thin, dry,
>    [Footnote 46: In the story of _Kapuaokaoheloai_ we read that the
>    [Footnote 47: A strict taboo between man and woman forbade eating
>    [Footnote 48: The place of surf riding in Hawaiian song and story
>    [Footnote 49: _Honi_, to kiss, means to "touch" or "smell," and
>    [Footnote 50: The abrupt entrance of the great _moo_, as of its
>    [Footnote 51: The _ieie_ vine and the sweet-scented fern are, like the
>    [Footnote 52: The fight between two _kupua_, one in lizard form, the
>    [Footnote 53: The season for the bird catcher, _kanaka kia manu_, lay
>    [Footnote 54: For the cloud sign compare the story of Kualii's battles
>    [Footnote 55: Of Hawaiians at prayer Dibble says: "The people were in
>    [Footnote 56: The three mountain domes of Hawaii rise from 13,000 to
>    [Footnote 57: The games of _kilu_ and _ume_, which furnished the popular
>    [Footnote 58: In the story of Kauakahialii, his home at Pihanakalani is
>    [Footnote 59: The Hawaiian custom of group marriages between brothers or
>    [Footnote 60: The Hawaiian flute is believed to be of ancient origin. It
>    [Footnote 61: At the accession of a new chief in Hawaii the land is
>    [Footnote 62: The names of Malio and Halaaniani are still to be found in
>    [Footnote 63: The _huia_ is a specially high wave formed by the meeting
>    [Footnote 64: Kumukahi is a bold cape of black lava on the extreme
>    [Footnote 65: The name of Laieikawai occurs in no old chants with which
>    [Footnote 66: To preserve the umbilical cord in order to lengthen the
>    [Footnote 67: More than 470 species of land snails of a single genus,
>    [Footnote 68: This incident is unsatisfactorily treated. We never know
>    [Footnote 69: This episode of the storm is another inconsistency in the
>    [Footnote 70: The _pa-u_ is a woman's main garment, and consists of five
>    [Footnote 71: In mythical quest stories the hero or heroine seeks, by
>    [Footnote 72: According to the old Polynesian system of age groups, the
>    [Footnote 73: The name Laukieleula means "Red-kiele-leaf." The kiele,
>    [Footnote 74: The story of the slaying of Halulu in the legend of
>    [Footnote 75: The divine approach marked by thunder and lightning,
>    xxxxxx          [Footnote 75: Kaonohiokala, Mr. Emerson tells me, is the 
name of one of
>    [Footnote 1: Compare Westervelt's Gods and Ghosts, p. 66.]
>    [Footnote 1: The rock called Kaneaukai, "Man-floating-on-the-sea," on
>    [Footnote 1: See _Kamapuaa_, where the same feat is described.]
>    [Footnote 1: Compare the fishhook Pahuhu in _Nihoalaki_; the _leho_
>    [Footnote 1: Compare _Kalelealuaka_.]
>    [Footnote 1: This is not the Olopana of Hawaii.]
>    [Footnote 1: This is only a fragment of the very popular story of the
>    [Footnote 2: Rev. A.O. Forbes's version of this story is printed in
>    [Footnote 1: See Daggett's account, who places Moikeha's role in the
>    [Footnote 1: Kaulu meets the wizard Makalii in rat form and kills him by
>    xxxxxx          [Footnote 3: Daggett tells the story of _Hua_, priest of 
Maui.]
>    [Footnote 1: This story Fornander calls "the most famous in Hawaiian
>    [Footnote 1: One of the most popular heroes of the Puna, Kau, and Kona
>    [Footnote 1: Mr. Stokes found on the rocks at Kahaluu, near the _heiau_
>    [Footnote 1: This story is much amplified by Mrs. Nakuina in Thrum, p.
>    [Footnote 1: See Thrum, p. 43.]
>    [Footnote 1: Daggett tells this story.]
>    [Footnote 1: Gill tells this same story from the Hervey group. Myths and
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061108/3d8978ff/attachment-0001.html
From Bowerbird at aol.com  Wed Nov  8 12:17:59 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 12:18:08 2006
Subject: [gutvol-d] gvd061108 -- one preliminary thought on duguid
Message-ID: <cc5.2748ab2.32839577@aol.com>

the article by paul duguid that was published
in the influential "first monday" series recently
was quite insightful on a number of dimensions.

i've written up a good many observations on it,
but i think i will begin with a very simple one...

duguid analyzes the proposition that open-source
_software_development_ might not necessarily apply
-- philosophically -- to other _distributed_ processes.

he specifically includes project gutenberg in his paper,
as well as gracenote (which collects musical meta-data),
and the big cahuna of distributed work today, wikipedia.

at the most basic of levels, duguid argues that software 
has a fundamental check of its worthiness, in the form
of whether the compiler will actually _execute_ the code.

errors that would cause the compiler to abort the code
are quickly identified on that basis, without any analysis.

as duguid puts it:
>    solutions must compile and run. 
>    Hence, while Open Source software has relied heavily 
>    on peer production and to a lesser extent on peer review, 
>    for quality, it relies as heavily though perhaps less obviously, 
>    on the chip and the compiler as ultimate arbiters. 
>    These two both identify problems with the code 
>    and reject inadequate solutions

duguid surmises that other types of distributed projects
-- especially the three that are under his microscope --
do not have this immediate and telling check for errors...

in a letter she sent to duguid responding to his article,
juliet from distributed proofreaders made this claim:
>    By successive revision, in comparing text to image 
>    3-5 times for each page, and then by applying 
>    standard tools, checks, etc (quite parallel, in a way, 
>    to compiling code) DP achieves transcription accuracy

you really should read the entirety of juliet's letter.   it's at:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=22839&start=0

but for right now, i want to focus in on her statement that d.p.'s
"standard checks" are "quite parallel, in a way, to compiling code".

in my experience, both as a person who executes source code
and as a person who has examined the d.p. output very closely,
this statement strains credulity, and even borders on ridiculous.

and i will give an example right here to back up that assessment.

i'm using the e-text entitled "the hawaiian romance of laieikawai":
>    http://www.gutenberg.org/etext/13603

as you can see, this e-text is #13603, which means it is in the
"later half" of the p.g. library.   the files are dated 2004/10/5.
this e-text was included in the bundle prepared to celebrate
the 5,000th e-text created by d.p., so it's in the "later half" of
all e-texts digitized by d.p. as well.   (their total now is 9,320.)

this file predates the switch from a 2-round to a 4-round system,
but i'm not going to focus on _proofing_ errors right now, merely
on mistakes that dispute juliet's "similar to compiling code" claim.

i've appended to this post the first line of each footnote, in order.

there are all kinds of comments i could make (and will make later)
about the footnote numbering, but again, remember the focus here.

to instantiate the kind of checks that are similar to "chip and compiler",
these numbers should be prepended or appended with another string, 
so each separate one becomes _uniquely_identified_ within the book.
i like to use page-number, or the chapter-number or chapter-name,
but even a consecutive bookwide series of numbers will do fine, like:

>    [Footnote 1-01: Compare the Fijian story quoted by Thomson (p. 6).]
>    [Footnote 2-02: Daggett calls the story "a supernatural folklore legend 
of
>    [Footnote 3-04: The changes introduced by these editors have not been
>    [Footnote 4-05: Dr. N. B. Emerson's rendering of the myth of _Pele and
>    [Footnote 5-06: The most important of these chants translated from the
>    [Footnote 1-07: Bastian In Samoanische Sch?pfungssage (p. 8) says:
>    [Footnote 2-08: Lesson says of the Polynesian groups (I, 378): "On sait 
...
>    [Footnote 3-09: Compare: Stair, Old Samoa, p. 271; White, I, 176; Fison,

but the problems go even deeper than a failure to relabel the footnotes
to make them unambiguous.   they cut to the fundamental cornerstone of 
_actually_having_the_correct_numbers_in_the_e-text_in_the_first_place_.

look at the lines marked with "xxxxxx" and you'll see that:
a)   footnote 24 is incorrectly labeled as footnote 21.
b)   footnote 26 is incorrectly labeled as footnote 28.
c)   footnote 76 is incorrectly labeled as footnote 78.
d)   footnote 2 (toward the bottom) is incorrectly labeled as footnote 3.

these mistakes were located with a basic consistency-checking program.

if errors this _blatant_ -- which could've been identified as easily by d.p.
as they were by me, simply by using a program as elementary as mine --
were allowed to pass through the quality-control checks, what confidence
can we have that there are not _many_ more, and far more subtle in nature?

nor were these the only errors, even in the easy task of labeling footnotes.

take a look at these lines, again in order, this time from the _body_ of the 
e-text:
>    became Kekalukaluokewa's, and he portioned out the land[61] and set up
>    Then quickly he went to consult his sister, to Malio.[62]
>    they ask you what long waves you surf on say on the _Huia_.[63] If they
>    over to the coast where Kumukahi[64] swims in the billows, then this is
>    xxxxxx          get the foster child of Kapukaihaoa, Laielohelohe,[66] 
who is like
>    took his foster child's umbilical cord[66] and wore it about his neck.
>    snails[67] singing, then do you two meet apart from the assembly.
>    "The marriage of the chiefs! The marriage of the chiefs!"[68]
>    back and stopped at their sources, no water flowed into the sea.[69]
>    After this the seer took Laieikawai's skirt[70] and laid it down on the
>    Kaeloikamalama's neck.[71]
>    Tahiti."[72]
>    Moanalihaikawaokele and Laukieleula."[73]
>    xxxxxx          Then that bird[71] drooped its wings down and its body 
remained aloft,
>    month bad weather closes down,[75] when the storm clears, there I am
>    place, and they resemble evil spirits in their nature.[76]

as you will notice, the anchor for footnote #65 is incorrectly labeled as 
"66",
and the anchor for footnote #74 is incorrectly labeled as "71".   
embarrassing,
that something as simple as a sequential number-set could contain such flaws.

again, there are _6_ errors in this e-text, just in the footnote numbering 
alone!

and that's a series of _sequential_ numbers, and thus easily checked by 
machine.

further, it's a check that _should_ have been in place many, many years ago, 
by 
both distributed proofreaders _and_ project gutenberg proper.   yet this 
e-text
-- with these _obvious_ errors -- was posted two years ago, and never 
updated.

i will have more to say about duguid later.   but for now, one thing is 
certain...

the checks that are done are _clearly_very_far_ from being "chip/compiler" 
checks.

-bowerbird

>    xxxxxx          [Footnote A: The titles of chapters are added for
>    [Footnote 1: Compare the Fijian story quoted by Thomson (p. 6).]
>    [Footnote 2: Daggett calls the story "a supernatural folklore legend of
>    [Footnote 3: The changes introduced by these editors have not been
>    [Footnote 4: Dr. N. B. Emerson's rendering of the myth of _Pele and
>    [Footnote 5: The most important of these chants translated from the
>    [Footnote 1: Bastian In Samoanische Sch?pfungssage (p. 8) says:
>    [Footnote 2: Lesson says of the Polynesian groups (I, 378): "On sait ...
>    [Footnote 3: Compare: Stair, Old Samoa, p. 271; White, I, 176; Fison,
>    [Footnote 4: Lesson (II, 190) enumerates eleven small islands, covering
>    [Footnote 5: _Kahiki_, in Hawaiian chants, is the term used to designate
>    [Footnote 6: Lesson, II, 152.]
>    [Footnote 7: Ibid., 170.]
>    [Footnote 8: Ibid., 178.]
>    [Footnote 1: In the Polynesian picture of the universe the wall of
>    [Footnote 2: The Rarotongan world of spirits is an underworld. (See
>    [Footnote 3: White, I, chart; Gill, Myths and Songs, pp. 3, 4; Ellis,
>    [Footnote 4: Gill says of the Hervey Islanders (p. 17 of notes): "The
>    [Footnote 5: Bastian, Samoanische Sch?pfungs-Sage; Ellis, I, 321; White,
>    [Footnote 6: Moerenhout translates (I, 419): "He was, _Taaroa_ (Kanaloa)
>    [Footnote 7: Moerenhout, I, 423: "_Taaroa_ slept with the woman called
>    [Footnote 8: Grey, pp. 38-45; Kr?mer, Samoa Inseln, pp. 395-400; Fison,
>    [Footnote 9: In Fornander's collection of origin chants the Hawaiian
>    [Footnote 1: Mariner, II, 103; Turner, Nineteen Tears in Polynesia, pp.
>    [Footnote 2: When a Polynesian invokes a god he prays to the spirit of
>    [Footnote 3: Bird-bodied gods of low grade in the theogony of the
>    [Footnote 4: With the stories quoted from Fornander may be compared such
>    [Footnote 1: Grey, pp. 1-15; White, I, 46; Baessler, Neue S?dsee-Bilder,
>    [Footnote 2: Compare Kr?mer's Samoan story (in Samoa Inseln, p. 413) of
>    [Footnote 3: Kr?mer, Samoa Inseln, pp. 44, 115; Fison, pp. 16,
>    [Footnote 1: As such Paliuli occurs in other Hawaiian folk tales:
>    [Footnote 2: The gods Kane and Kanaloa, who live in the mountains of
>    [Footnote 3: Although the earthly paradise has the same location in both
>    [Footnote 4: First generation: Waka, Kihanuilulumoku,
>    [Footnote 1: J.A. Macculloch (in Childhood of Fiction, p. 2) says,
>    [Footnote 2: Moerenhout, II, 4, 265.]
>    [Footnote 3: Gracia (p. 47) says that the taboo consists in the
>    [Footnote 4: Compare Kr?mer, Samoa Inseln, p. 31; Stair, p. 75; Turner,
>    [Footnote 5: In certain groups inheritance descends on the mother's side
>    [Footnote 6: Kr?mer (p. 32 et seq.) tells us that in Samoa the daughter
>    [Footnote 7: Rivers, I, 374; Malo, p. 80.
>    [Footnote 8: Keaulumoku's description of a Hawaiian chief (Islander,
>    [Footnote 9: Stair, p. 220; Gracia, p. 59; Alexander, History, chap. IV;
>    [Footnote 10: Gracia, p. 46; Mariner, II, 87, 101, 125; Gill, Myths and
>    [Footnote 11: Malo, p. 69.]
>    [Footnote 12: Ellis (III, 36) describes the art of medicine in
>    [Footnote 1: Jarves says: "Songs and chants were common among all
>    [Footnote 2: Moerenhout, I, 411.]
>    [Footnote 3: Andrews, Islander, 1875, p. 35; Emerson, Unwritten
>    [Footnote 4: In Fornander's story of _Lonoikamakahiki_, the chief
>    [Footnote 5: Compare with Ellis, I, 286, and Williams and Calvert, I,
>    [Footnote 6: Gill, Myths and Songs, pp. 268 et seq.]
>    [Footnote 7: See Fornander's stories of _Lonoikamakahiki, Halemano_, and
>    [Footnote 1: In the Hawaiian Annual, 1890, Alexander translates some 
notes
>    [Footnote 2: Moerenhout (I, 501-507) says that the Areois society in
>    [Footnote 3: Emerson, Unwritten Literature, p. 24 (note).]
>    [Footnote 4: This is well illustrated in Fornander's story of
>    [Footnote 5: Thomson says that the Fijians differ from the Polynesians
>    [Footnote 1: Turner, Samoa, p. 220.]
>    [Footnote 2: Ibid.; Moerenhout, I, 407-410.]
>    [Footnote 3: Turner, Samoa, pp. 216-221; Williams and Calvert, I, p.
>    [Footnote 4: Williams and Calvert, I, 118.]
>    [Footnote 5: Moerenhout, II, 146.]
>    [Footnote 1: See Moerenhout, II, 210; Jarves, p. 34; Alexander in
>    [Footnote 2: Fison, p. 100.]
>    [Footnote 1: The following examples are taken from the Laieikawai, where
>    [Footnote 2: In the course of the story of _Laieikawai_ occur more than
>    [Footnote 1: _Kuakoa_, iv, No. 31, translated also in _Hawaiian Annual_,
>    [Footnote 1: Title pages.
>    [Footnote 1: For the translation of Haleole's foreword, which is in a
>    [Footnote 1: Haleole uses the foreign form for wife, _wahine mare_,
>    [Footnote 2: The chief's vow, _olelo paa_, or "fixed word," to slay all
>    [Footnote 3: The phrase _nalo no hoi na wahi huna_, which means 
literally
>    [Footnote 4: Prenatal infanticide, _omilomilo_, was practiced in various
>    [Footnote 5: The _manini_ (_Tenthis sandvicensis_, Street) is a
>    [Footnote 6: The month _Ikuwa_ is variously placed in the calendar year.
>    [Footnote 7: The adoption by their grandparents and hiding away of the
>    [Footnote 8: The _iako_ of a canoe are the two arched sticks which hold
>    [Footnote 9: The verb _hookuiia_ means literally "cause to be pierced"
>    [Footnote 10: Hawaiian challenge stories bring out a strongly felt
>    [Footnote 11: In his invocation the man recognizes the two classes of
>    [Footnote 12: With this judgment of beauty should be compared
>    [Footnote 13: The building of a _heiau_, or temple, was a common means
>    [Footnote 14: The nights of Kane and of Lono follow each other on the
>    [Footnote 15: By _kahoaka_ the Hawaiians designate "the spirit or soul
>    [Footnote 16: The feathers of the _oo_ bird (_Moho nobilis_), with which
>    [Footnote 17: The reference to the temple of Pahauna is one of a number
>    [Footnote 18: The whole treatment of the Kauakahialii episode suggests 
an
>    [Footnote 19: These are all wood birds, in which form Gill tells us 
(Myths
>    [Footnote 20: _Moaulanuiakea_ means literally "Great-broad-red-cock,"
>    [Footnote 21: Compare Gill's story of the first god, Watea, who dreams
>    [Footnote 22: In the song the girl is likened to the lovely _lehua_,
>    [Footnote 23: No other intoxicating liquor save _awa_ was known to the
>    xxxxxx          [Footnote 21: In the Hawaiian form of checkers, called 
_konane_, the
>    25              [Footnote 25: The _malo_ is a loin cloth 3 or 4 yards 
long and a foot
>    xxxxxx          [Footnote 28: In Hawaiian warfare, the biggest boaster 
was the best man,
>    27              [Footnote 27: The idiomatic passages "_aohe puko momona 
o Kohala_,"
>    [Footnote 28: This boast of downing an antagonist with a single blow is
>    [Footnote 29: Shaking hands was of foreign introduction and marks one of
>    [Footnote 30: Famous Hawaiian boxing teachers kept master strokes in
>    [Footnote 31: Few similes are used in the story. This figure of the
>    [Footnote 32: The Polynesians, like the ancient Hebrews, practiced
>    [Footnote 33: The gods invoked by Aiwohikupua are not translated with
>    [Footnote 34: The _laau palau_, literally "wood-that-cuts," which Wise
>    [Footnote 35: The Hawaiian cloak or _kihei_ is a large square, 2 yards
>    [Footnote 36: The meaning of the idiomatic boast _he lala kamahele no ka
>    [Footnote 37: The _puloulou_ is said to have been introduced by Paao
>    [Footnote 38: Long life was the Polynesian idea of divine blessing. Of
>    [Footnote 39: Chickens were a valuable part of a chief's wealth, since
>    [Footnote 40: Mr. Meheula suggested to me this translation of the
>    [Footnote 41: A peculiarly close family relation between brother and
>    [Footnote 42: For the translation of this dialogue I am indebted, to the
>    [Footnote 43: To express the interrelation between brothers and sisters
>    [Footnote 44: The line translated "Fed upon the fruit of sin" contains
>    [Footnote 45: This _ti_-leaf trumpet is constructed from the thin, dry,
>    [Footnote 46: In the story of _Kapuaokaoheloai_ we read that the
>    [Footnote 47: A strict taboo between man and woman forbade eating
>    [Footnote 48: The place of surf riding in Hawaiian song and story
>    [Footnote 49: _Honi_, to kiss, means to "touch" or "smell," and
>    [Footnote 50: The abrupt entrance of the great _moo_, as of its
>    [Footnote 51: The _ieie_ vine and the sweet-scented fern are, like the
>    [Footnote 52: The fight between two _kupua_, one in lizard form, the
>    [Footnote 53: The season for the bird catcher, _kanaka kia manu_, lay
>    [Footnote 54: For the cloud sign compare the story of Kualii's battles
>    [Footnote 55: Of Hawaiians at prayer Dibble says: "The people were in
>    [Footnote 56: The three mountain domes of Hawaii rise from 13,000 to
>    [Footnote 57: The games of _kilu_ and _ume_, which furnished the popular
>    [Footnote 58: In the story of Kauakahialii, his home at Pihanakalani is
>    [Footnote 59: The Hawaiian custom of group marriages between brothers or
>    [Footnote 60: The Hawaiian flute is believed to be of ancient origin. It
>    [Footnote 61: At the accession of a new chief in Hawaii the land is
>    [Footnote 62: The names of Malio and Halaaniani are still to be found in
>    [Footnote 63: The _huia_ is a specially high wave formed by the meeting
>    [Footnote 64: Kumukahi is a bold cape of black lava on the extreme
>    [Footnote 65: The name of Laieikawai occurs in no old chants with which
>    [Footnote 66: To preserve the umbilical cord in order to lengthen the
>    [Footnote 67: More than 470 species of land snails of a single genus,
>    [Footnote 68: This incident is unsatisfactorily treated. We never know
>    [Footnote 69: This episode of the storm is another inconsistency in the
>    [Footnote 70: The _pa-u_ is a woman's main garment, and consists of five
>    [Footnote 71: In mythical quest stories the hero or heroine seeks, by
>    [Footnote 72: According to the old Polynesian system of age groups, the
>    [Footnote 73: The name Laukieleula means "Red-kiele-leaf." The kiele,
>    [Footnote 74: The story of the slaying of Halulu in the legend of
>    [Footnote 75: The divine approach marked by thunder and lightning,
>    xxxxxx          [Footnote 75: Kaonohiokala, Mr. Emerson tells me, is the 
name of one of
>    [Footnote 1: Compare Westervelt's Gods and Ghosts, p. 66.]
>    [Footnote 1: The rock called Kaneaukai, "Man-floating-on-the-sea," on
>    [Footnote 1: See _Kamapuaa_, where the same feat is described.]
>    [Footnote 1: Compare the fishhook Pahuhu in _Nihoalaki_; the _leho_
>    [Footnote 1: Compare _Kalelealuaka_.]
>    [Footnote 1: This is not the Olopana of Hawaii.]
>    [Footnote 1: This is only a fragment of the very popular story of the
>    [Footnote 2: Rev. A.O. Forbes's version of this story is printed in
>    [Footnote 1: See Daggett's account, who places Moikeha's role in the
>    [Footnote 1: Kaulu meets the wizard Makalii in rat form and kills him by
>    xxxxxx          [Footnote 3: Daggett tells the story of _Hua_, priest of 
Maui.]
>    [Footnote 1: This story Fornander calls "the most famous in Hawaiian
>    [Footnote 1: One of the most popular heroes of the Puna, Kau, and Kona
>    [Footnote 1: Mr. Stokes found on the rocks at Kahaluu, near the _heiau_
>    [Footnote 1: This story is much amplified by Mrs. Nakuina in Thrum, p.
>    [Footnote 1: See Thrum, p. 43.]
>    [Footnote 1: Daggett tells this story.]
>    [Footnote 1: Gill tells this same story from the Hervey group. Myths and
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061108/0cfb5586/attachment-0001.html
From phil at thalasson.com  Wed Nov  8 12:25:33 2006
From: phil at thalasson.com (Philip Baker)
Date: Wed Nov  8 12:28:48 2006
Subject: [gutvol-d] The interesting comment by Lee and what
	the	PG	collection will become known for...
In-Reply-To: <455114CC.3060203@novomail.net>
Message-ID: <EQrDAOA90jUFFw+F@thalasson.com>

In article <455114CC.3060203@novomail.net>, Lee Passey
<lee@novomail.net> writes
>Onorio Catenacci wrote:
>
>[snip]
>
>> Just to be sure I'm clear: "casual reading" implies good enough to
>> read and follow but not good enough for a scholarly dissertation?  :-)
>
>This is not a bad summation of my position, although, as usual, the 
>devil is in the details. You have posited two extremes: 1. good enough 
>to read and follow (to which I would add "given a modicum of effort") 
>and 2. good enough for a scholarly dissertation.
>
>Now I would agree that the vast majority of Project Gutenberg e-texts 
>are probably good enough to read and follow given a modicum of effort. 
>And I suspect that you would agree that the vast majority (and perhaps 
>even the totality) of Project Gutenberg e-texts are inadequate for a 
>scholarly dissertation. But what about those situations which fall 
>between the extremes?
>
>The fundamental problem is that Project Gutenberg is totally lacking in 
>standards, so it is impossible to judge how well any given e-text 
>matches any given use. And we probably don't have enough data to 
>determine how well Project Gutenberg e-texts, on the whole, satisfy any 
>external standards. But personally, I find that generally Project 
>Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 2. 
>are inadequate for assigned reading at a high-school level; AND 3. are 
>inadequate for inclusion in any public or school library; AND 4. are 
>inadequate for any type of automated data processing; AND 5. are mostly 
>inadequate for /effortless/ reading.
>
>I suspect that there /may/ be some gems in the PG corpus which are 
>adequate for any or all of the above uses, but again, because Project 
>Gutenberg has no standards, it is virtually impossible to identify those 
>e-texts except on a case-by-case basis. Given the lack of indications of 
>quality, I must accept as a default position that Project Gutenberg 
>e-texts are good enough to read and follow by a human being (and not a 
>computer) given a modicum of effort; but nothing more.
>


But what paper book publishers have explicitly stated standards? I don't
remember seeing any, but I commonly find fairly obvious errors in paper
books from all sorts of publishers all the way from large publishers
with a fair degree of prestige like Oxford University Press to tiny
publishers that have only published a handful of books. And when do
paper book editions of 'classic' out of copyright works meticulously
present the sources for their text? 

And looking outside North America and Europe the situation is even
worse. What standards are going to apply to a bootleg Chinese
translation of a Harry Potter book where each page has a different
translator?

And then there are audio books. The typical commercial audio book novel
has about one third of the content of the print edition. Do you want
every audio book sold to come with a paper book complete text with the
audio book omissions and changes marked up?

You have set up some rather ambiguous standards for ebooks but how many
paper books meet those standards? Do you just assume that most do?

-- 
Philip Baker
From jon at noring.name  Wed Nov  8 12:32:37 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 12:30:58 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
Message-ID: <754776753.20061108133237@noring.name>

A private communication by Joshua (who said I may share this here)
casually mentioned that the earliest DP texts submitted to PG were
stripped of their source information, and they remain that way today.
(This was before PG's policy change regarding source information as
outlined by BBI.)

If this is true, at least in the general sense, this indicates to me
a need for DP to resubmit those texts to PG. Are there plans to do
this, and how difficult would it be?

Jon

From jmdyck at ibiblio.org  Wed Nov  8 13:19:37 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed Nov  8 13:19:41 2006
Subject: [gutvol-d] DP's contribution to PG
In-Reply-To: <817072195.20061108093213@noring.name>
References: <c22.8e66618.3283211f@aol.com>
	<817072195.20061108093213@noring.name>
Message-ID: <455249E9.9080608@ibiblio.org>

Jon Noring wrote:
> DP ... has submitted over half of the texts to PG.

On the face of it, pgdp.net has contributed 9,321 of PG-US's
19,695 etexts, so only 47.3% so far.

At recent rates, it'll take about a year for that to get up to 50%.

-Michael

From jon at noring.name  Wed Nov  8 13:44:40 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 13:43:02 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <EQrDAOA90jUFFw+F@thalasson.com>
References: <455114CC.3060203@novomail.net> <EQrDAOA90jUFFw+F@thalasson.com>
Message-ID: <324722079.20061108144440@noring.name>

Philip wrote:

> But what paper book publishers have explicitly stated standards? I don't
> remember seeing any, but I commonly find fairly obvious errors in paper
> books from all sorts of publishers all the way from large publishers
> with a fair degree of prestige like Oxford University Press to tiny
> publishers that have only published a handful of books. And when do
> paper book editions of 'classic' out of copyright works meticulously
> present the sources for their text?

Very interesting question.

And I was composing a long reply differentiating what PG is doing
versus what paper book publishers do, the filtering mechanisms at the
scholarly level, the worry a major publisher has in not doing shoddy
work (if they seek quality), etc. But I decided not to discuss that
here, but instead ask:

"Shouldn't PG be better than commercial paper book publishers when it
comes to doing things right?"

and

"Why should a bottoms-up organization like PG blindly emulate the
practices of top-down publishers who only care about making money?"

The question (and this is not a criticism) also reminds me of the
teenager telling their parents "well, everyone else is doing it."
That sort of argument wears thin on any parent who knows (or should
know) what's best for their children.

Now, obviously, one can argue if the source materials which are
transcribed themselves were done shoddily, then why should we care
about faithful reproduction, warts and all. Well, let's parse that:

1) If there is only one edition of a work, that *is* the canonical
   expression of the author, no matter how badly the publisher mangled
   it from authorial intent. That's *all* we have to go by unless the
   original author's manuscript happens to turn up (which infrequently
   happens.)

2) If a work exists in several editions, then here a case-by-case
   bibliographic analysis should be done to determine which source
   should be used (if there is a choice.) This may include scholarly
   input by those who've studied the Work and its various editions,
   and looking at other things like if it is a first edition, if a
   subsequent edition was published by the same publishers, etc.

Of course, even if a source used has flaws, why would PG or anyone
else attempt to "fix" them (without marking what fixes were made) and
potentially add more problems?

Jon

From prosfilaes at gmail.com  Wed Nov  8 13:55:08 2006
From: prosfilaes at gmail.com (David Starner)
Date: Wed Nov  8 13:55:12 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <754776753.20061108133237@noring.name>
References: <754776753.20061108133237@noring.name>
Message-ID: <6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>

On 11/8/06, Jon Noring <jon@noring.name> wrote:
> If this is true, at least in the general sense, this indicates to me
> a need for DP to resubmit those texts to PG. Are there plans to do
> this, and how difficult would it be?

There's been discussions, but errata people have been really resistant
to changes that just add source information.
From prosfilaes at gmail.com  Wed Nov  8 14:01:20 2006
From: prosfilaes at gmail.com (David Starner)
Date: Wed Nov  8 14:01:23 2006
Subject: [gutvol-d] let's close this up
In-Reply-To: <45521581.8080906@srv.net>
References: <c22.8e66618.3283211f@aol.com>
	<817072195.20061108093213@noring.name> <45521581.8080906@srv.net>
Message-ID: <6d99d1fd0611081401g196f13beg4265b2066be47128@mail.gmail.com>

On 11/8/06, Kevin Handy <kth@srv.net> wrote:
> I believe one major problem is finding enough disk space
> to hold the scans.

I think Archive.org is happy to handle the scans. Some have already be
uploaded there, and they've been more than happy to handle all the
scans they can get.
From sam.bretheim at gmail.com  Wed Nov  8 14:30:29 2006
From: sam.bretheim at gmail.com (Sam Bretheim)
Date: Wed Nov  8 14:31:18 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <324722079.20061108144440@noring.name>
References: <455114CC.3060203@novomail.net> <EQrDAOA90jUFFw+F@thalasson.com>
	<324722079.20061108144440@noring.name>
Message-ID: <45525A85.4070803@gmail.com>

Jon Noring wrote:
> Now, obviously, one can argue if the source materials which are
> transcribed themselves were done shoddily, then why should we care
> about faithful reproduction, warts and all. Well, let's parse that:
>   

One of TEI's unique advantages is that it lets us preserve information 
about both the source document used for transcription and the author's 
presumed intent.  For instance, consider this sentence, authentically 
transcribed from a DP book ("French and oriental love in a harem") that 
we believe was typeset by aliens:

    "I am langhing at all those stories abont yonr harems which yon 
still make np for me jnst as yon nsed to do for that idiot Hadidje."

In TEI, we can mark this up as:

    <choice>
        <sic>"I am langhing at all those stories abont yonr harems which 
yon still make np for me jnst as yon nsed to do for that idiot 
Hadidje."</sic>
        <corr resp="sab">"I am laughing at all those stories about your 
harems which you still make up for me just as you used to do for that 
idiot Hadidje."</corr>
    </choice>

... where the "resp" attribute indicates the PG proofreader who 
suggested the correction. If it had just been one word, we could have 
tagged just that word (or even just the one incorrect letter):

    "I am <choice><sic>langhing</sic><corr 
resp="sab">laughing</corr></choice> at all those stories about your 
harems which you still make up for me just as you used to do for that 
idiot Hadidje."

However, that particular book is so awful that marking all the necessary 
corrections would be insane.  (The typesetter was mostly uninterested in 
fine distinctions like u/n, h/b, e/c/o, etc.)  For situations like that, 
the TEI header has a plethora of fields for describing the editorial 
intent of the people who made the digital version.

TEI also has similar inline mechanisms for indicating differences 
between editions; any number of editions of a text can be combined into 
a single TEI copy that preserves all of the information in the original 
books.

From joshua at hutchinson.net  Wed Nov  8 17:34:58 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 17:35:02 2006
Subject: [gutvol-d] let's close this up
Message-ID: <10954230.1163036098985.JavaMail.?@fh1063.dia.cp.net>


>----Original Message----
>From: prosfilaes@gmail.com
>On 11/8/06, Kevin Handy <kth@srv.net> wrote:
>> I believe one major problem is finding enough disk space
>> to hold the scans.
>
>I think Archive.org is happy to handle the scans. Some have already 
be
>uploaded there, and they've been more than happy to handle all the
>scans they can get.

PG will take scans, too, now.  I've been uploading image scans with my 
texts for a couple months.  Sometimes the files are large enough that 
it would timeout the upload window, so other arrangements are sometimes 
necessary.

In fact, if people want, I'd be happy to handle the image uploads (I 
have upload status for audio files and just upload page images isn't 
stretching my abilities much).  Then, you can upload texts normally and 
then once it is posted, give me a location where the image files are 
located and I can upload those.

If you want to do something like this, give me a holler.  I'd 
appreciate if you renumbered the files yourself so that each file 
matches the page number from the book so that I don't have to download 
the files, renumber them and reupload to the pglaf server..

Please follow the numbering proposal Marcello posted a while back 
(quote below):

***Numbering / Naming of page files***

A book usually contains 2 page number sequences, a roman one followed 
by
an arabic one. We considered the cover pages as yet another sequence.

A filename for a single-page png file MUST follow this pattern:

  <prefix><page number>.png

The prefix for the cover pages is: "c".

The prefix for the roman pages is: "f".

The prefix for the arabic pages is: "p".

If there are more page number sequences in the book, they MUST be
handled in a similar fashion, using an arbitrary free letter.

The <page number> is the true page number as seen on the physical page
(or inferred from the previous / next pages) expressed in arabic
numerals and left-padded with zeroes to a length of 4 digits.

For blank pages there should be no file and the page number should be
skipped. Optionally an image saying: "This page is blank in the
original." may be inserted. Missing pages MUST be replaced by an image
saying: "This page is missing."

If present, front cover, back cover and spine MUST be named as 
follows:

  front cover outside: c0001.png
  front cover inside:  c0002.png
  back cover inside:   c0003.png
  back cover outside:  c0004.png
  spine:               c0005.png


Example of file naming:

  front cover      c0001.png
  back cover       c0004.png
  spine            c0005.png

  i title page     f0001.png
  ii title verso   f0002.png
  iii dedication   f0003.png
  iv is blank
  v contents       f0005.png

  page 1           p0001.png
  page 2           p0002.png
  image on page 2  p0002-1.png
  image on page 2  p0002-2.png
  page 3           p0003.png
  page 4 is blank
  page 5           p0005.png
  ...              ...
  page 9999        p9999.png

***

If anyone wants to take me up on this, please do.  I think having page 
images in the PG archives is a "good thing" and just posting image 
files isn't that time intensive (well, the upload/download part can be, 
but that can be easily automated!)

Josh
From joshua at hutchinson.net  Wed Nov  8 17:44:00 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 17:44:02 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
Message-ID: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>

>----Original Message----
>From: prosfilaes@gmail.com
>On 11/8/06, Jon Noring <jon@noring.name> wrote:
>> If this is true, at least in the general sense, this indicates to 
me
>> a need for DP to resubmit those texts to PG. Are there plans to do
>> this, and how difficult would it be?
>
>There's been discussions, but errata people have been really 
resistant
>to changes that just add source information.

I think it is more that the errata folks are already overworked and 
this seems a small return on their invested time.

Perhaps if someone "trustworthy" on the DP side were to oversee the 
process so that the errata folks could just take the fixed file(s) and 
push it to the server, it wouldn't be a big deal.

I leave it as an exercise to those more interested in this to organize 
and quantify who is "trustworthy" enough to oversee this.

Josh
From donovan at abs.net  Wed Nov  8 18:19:55 2006
From: donovan at abs.net (D Garcia)
Date: Wed Nov  8 18:20:19 2006
Subject: [gutvol-d] DP's contribution to PG
In-Reply-To: <455249E9.9080608@ibiblio.org>
References: <c22.8e66618.3283211f@aol.com>
	<817072195.20061108093213@noring.name>
	<455249E9.9080608@ibiblio.org>
Message-ID: <200611082119.56116.donovan@abs.net>

On Wednesday 08 November 2006 04:19 pm, Michael Dyck wrote:
> Jon Noring wrote:
> > DP ... has submitted over half of the texts to PG.
>
> On the face of it, pgdp.net has contributed 9,321 of PG-US's
> 19,695 etexts, so only 47.3% so far.

I'm betting that's not excluding the audio books.
What's it look like with those excluded?
From jon at noring.name  Wed Nov  8 19:27:19 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 19:25:46 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
References: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
Message-ID: <24296608.20061108202719@noring.name>

Joshua wrote:

> There's been discussions, but errata people have been really
> resistant to changes that just add source information.
>
> [snip]
>
> Perhaps if someone "trustworthy" on the DP side were to oversee the
> process so that the errata folks could just take the fixed file(s)
> and push it to the server, it wouldn't be a big deal.
>
> I leave it as an exercise to those more interested in this to organize
> and quantify who is "trustworthy" enough to oversee this.

I first thought that all one had to do is to resubmit the DP versions,
which include the source information, to DP to replace the ones in the
PG catalog.

But as I see this discussion unfold, I sense that the DP versions at
PG may have been corrected (at the PG-side), and that the corrections
do not flow back to DP. Is this right? (If so, this is a little bit
troubling since I think DP should hold the "master" and do any
corrections, but I'll defer commenting further on this point.)

So from my limited vantage point, I see the following:

1) If a DP-provided text has never been corrected after it was
   stripped of its source info, DP simply resubmits the text as it
   originally was submitted.

2) If a DP-provided text had corrections, then maybe the insertion
   of the source info should be done by DP.


Anyway, I'm sort of operating in the dark on this particular issue, so
what sayeth the DP and PG folk?

Jon Noring


From joshua at hutchinson.net  Thu Nov  9 05:06:42 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 05:06:46 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
Message-ID: <5438407.1163077602820.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: jon@noring.name
>
>I first thought that all one had to do is to resubmit the DP 
versions,
>which include the source information, to DP to replace the ones in 
the
>PG catalog.
>

DP doesn't keep a "master" file.  In some cases, we've probably got 
something very close to the final product posted to PG, but ultimately, 
PG is the archive that holds the final product.

That being said, buried within old backups, all the metadata is THERE 
at DP.  It would be a matter of readding that information to the file
(s) on PG's archive.  Something non-trivial in scope.

Josh

PS If any of the DP admins are reading, please feel free to correct 
any mistakes or omissions.
From lee at novomail.net  Thu Nov  9 07:54:17 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 07:53:41 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <31978970.1162579360874.JavaMail.?@fh1063.dia.cp.net>
References: <31978970.1162579360874.JavaMail.?@fh1063.dia.cp.net>
Message-ID: <45534F29.7080309@novomail.net>

joshua@hutchinson.net wrote:
> However, it could definitely be improved and flesh out.
>
> As far as community feedback, that tends to happen more in the DP 
> forums where folks are more actively putting together and talking about 
> new etexts.
>   

So, I've made my first pass through the document, and I have a few 
comments. It would appear that most of the discussion on this topic 
occurs in the DP forums, so could you tell me which one would be the 
most appropriate? Or perhaps a thread subject to search for?

If most of this work is occurring under aegis of Distributed 
Proofreaders, and knowing as we do that Project Gutenberg eschews 
standards, wouldn't it be better to call this project DPTEI, and perhaps 
move the documentation and tools to the DP servers?

From lee at novomail.net  Thu Nov  9 07:59:35 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 07:58:57 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <5438407.1163077602820.JavaMail.?@fh1037.dia.cp.net>
References: <5438407.1163077602820.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <45535067.1030008@novomail.net>

joshua@hutchinson.net wrote:
> > ----Original Message---- From: jon@noring.name
> >
> > I first thought that all one had to do is to resubmit the DP
> > versions, which include the source information, to DP to replace
> > the ones in the PG catalog.
>
>  DP doesn't keep a "master" file.  In some cases, we've probably got
>  something very close to the final product posted to PG, but
>  ultimately, PG is the archive that holds the final product.

If this is true, I find it very troubling (I'm the kind of person who 
finds any loss of meaningful data troubling). I would think that the 
Internet Archive would be very interested in maintaining DP output. Is 
there any reason that DP would not be willing to publish to PG and IA 
simultaneously? Is there any reason that IA would not be willing to 
accept DP output?

From joshua at hutchinson.net  Thu Nov  9 08:14:57 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 08:15:16 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <9017688.1163088897747.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: lee@novomail.net
>
>So, I've made my first pass through the document, and I have a few 
>comments. It would appear that most of the discussion on this topic 
>occurs in the DP forums, so could you tell me which one would be the 
>most appropriate? Or perhaps a thread subject to search for?
>
>If most of this work is occurring under aegis of Distributed 
>Proofreaders, and knowing as we do that Project Gutenberg eschews 
>standards, wouldn't it be better to call this project DPTEI, and 
perhaps 
>move the documentation and tools to the DP servers?
>
>

I apologize.  I gave the impression that feedback should *only* occur 
at DP.  It was an off-hand comment just pointing out that most of the 
feedback has historically gone on there (as I'm sure most would agree, 
attempts in gutvol-d have a tendency to get side-track by a certain 
bird).  I'm very active here and very willing to hold any and all 
discussions on the topic here.

There haven't been any recent PGTEI discussions at DP, so resurrecting 
one of those old threads probably wouldn't be all that helpful.

Also, while *I* have a large involvement at DP, Marcello does not (he 
stops by occasionally to explain things from a technical level, but he 
doesn't spend a large portion of his life there like I do ;)  So 
renaming this to DPTEI would be inappropriate, I believe.

Sounds like you have some ideas.  Let 'er rip.

Josh
From sly at victoria.tc.ca  Thu Nov  9 08:52:05 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Nov  9 08:52:11 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <24296608.20061108202719@noring.name>
References: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
	<24296608.20061108202719@noring.name>
Message-ID: <Pine.GSO.4.58.0611090847180.28632@vtn1.victoria.tc.ca>


Hmm.... I had an excellent little write-up somewhere that
Jim Tinsley did a while back, describing what a white-washer
does with _all_ incoming texts. (Not all files coming out
of DP post-processing can be posted just as they are.)

For now, you could try reading through:

http://www.gutenberg.org/wiki/Gutenberg:Volunteers%27_FAQ

V.16. How does a text get produced?


Thanks,
Andrew

On Wed, 8 Nov 2006, Jon Noring wrote:

> I first thought that all one had to do is to resubmit the DP versions,
> which include the source information, to DP to replace the ones in the
> PG catalog.
>
> But as I see this discussion unfold, I sense that the DP versions at
> PG may have been corrected (at the PG-side), and that the corrections
> do not flow back to DP. Is this right? (If so, this is a little bit
> troubling since I think DP should hold the "master" and do any
> corrections, but I'll defer commenting further on this point.)
>
> So from my limited vantage point, I see the following:
>
> 1) If a DP-provided text has never been corrected after it was
>    stripped of its source info, DP simply resubmits the text as it
>    originally was submitted.
>
> 2) If a DP-provided text had corrections, then maybe the insertion
>    of the source info should be done by DP.
>
>
> Anyway, I'm sort of operating in the dark on this particular issue, so
> what sayeth the DP and PG folk?
>
> Jon Noring
>
>
From Bowerbird at aol.com  Thu Nov  9 09:22:36 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 09:23:03 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <d02.170475c.3284bddc@aol.com>

lee said:
>    wouldn't it be better to call this project DPTEI, and perhaps 
>    move the documentation and tools to the DP servers?

lee said:
>    I would think that the Internet Archive would
>    be very interested in maintaining DP output. 
>    Is there any reason that DP would not be willing 
>    to publish to PG and IA simultaneously?

i seem to detect a pattern here.

but maybe it's just me...

meanwhile, that banner-ad atop the p.g. site sure does
pull in a huge flock of new volunteers to d.p., doesn't it?

***

josh said:
>   attempts in gutvol-d have a tendency 
>    to get side-track by a certain bird

that's an ironic thing to say, when a quick glance at
the subject-header reveals my thread was hijacked
into a discussion about .tei...               :+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061109/06e31538/attachment.html
From marcello at perathoner.de  Thu Nov  9 09:47:34 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 09:47:45 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <45521D1D.10801@novomail.net>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>
	<45521D1D.10801@novomail.net>
Message-ID: <455369B6.3070203@perathoner.de>

Lee Passey wrote:

> If they are constructed correctly, there is no technical reason why TEI
> files cannot be used directly in web browsers. Web browsers cannot
> display 100% of the richness captured by TEI, but the display would
> still be better than the simplified ASCII text version. And there is no
> reason why a single TEI file could not be created that would satisfy the
> needs of both direct rendering and transformations via an XSL script. It
> is not done simply because those involved in the PGTEI project have
> chosen not to do it.

And they chose not do it for several good reasons:

1. PGTEI is designed as a high fidelity conversion chain. We support
most elements in TEI lite and many CSS styling attributes on every
element. The output quality of PGTEI is meant to be vastly superior to
anything you can attain with XSLT transforms only.

2. It is virtually impossible to produce up-to-spec "plain vanilla" and
PDF output thru XSLT only. (Try line rewrapping in XSLT as exercise.) So
there is no way around external tools for TXT (nroff) and PDF (LaTeX)
output.

3. Many utility functions built into the PGTEI conversion chain cannot
work without external tools. eg. embedded SVG graphics conversion to
PNG, embedded LaTeX and AmsTeX equation conversion to PNG, scaling and
thumbnailing of images thru ImageMagick, automatic validation with
HMTLTidy, ...

4. The ability to display TEI files directly in a browser is a
geek-feature that very few people could use to any advantage while it
will confuse end-users with their browser's inadequacies and the choice
between two different browser-enabled formats.

5. PGTEI is designed to have one master format and many end-user
formats. While the master format is not designed to be consumed by
end-users nothing prevents them doing so. Anybody is free to extend
their PGTEI master file with any valid CSS or XSLT files they might
fancy. Caveat emptor: you'll have to write those CSS and XSLT
stylesheets yourself and maintain them while PGTEI evolves.


OTOH I didn't hear a single compelling reason for the PGTEI masters to
be displayed in a browser. As you said: "there is no technical reason
why TEI files cannot be used directly in web browsers". But you didn't
say if there is a practical reason why we *should* do it. At the very
best we can call it a solution in quest of a problem.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Thu Nov  9 10:33:00 2006
From: jon at noring.name (Jon Noring)
Date: Thu Nov  9 10:31:25 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <455369B6.3070203@perathoner.de>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<454D8138.6040307@gmail.com> <45521D1D.10801@novomail.net>
	<455369B6.3070203@perathoner.de>
Message-ID: <389479139.20061109113300@noring.name>

Marcello wrote:
> Lee Passey wrote:

>> If they are constructed correctly, there is no technical reason why
>> TEI files cannot be used directly in web browsers. Web browsers
>> cannot display 100% of the richness captured by TEI, but the
>> display would still be better than the simplified ASCII text
>> version. And there is no reason why a single TEI file could not be
>> created that would satisfy the needs of both direct rendering and
>> transformations via an XSL script. It is not done simply because
>> those involved in the PGTEI project have chosen not to do it.

> And they chose not do it for several good reasons:

Before commenting on a couple of your individual points, let me say
that your reply is an excellent summary of the issues. Kudos...


> 3. Many utility functions built into the PGTEI conversion chain cannot
> work without external tools. eg. embedded SVG graphics conversion to
> PNG, embedded LaTeX and AmsTeX equation conversion to PNG, scaling and
> thumbnailing of images thru ImageMagick, automatic validation with
> HMTLTidy, ...

Are there any plans to actively or passively support embedded MathML,
either in lieu of, or as an option to, LaTeX?

Four reasons:

1) Presentational MathML is round-trippable to LaTeX, so I understand.
   So no loss there.

2) Content MathML allows many (but not all) mathematics expressions to
   be recognizable by mathematics software for processing (e.g.,
   analytical and numerical solving, graphing, etc.)

3) SVG and MathML have close ties, and likely over time will get closer.

4) MathML is XML, LaTeX is not.

This is not to say LaTeX is evil -- it is not -- but that MathML
(especially content MathML) should be allowed to have a role in PGTEI.


> OTOH I didn't hear a single compelling reason for the PGTEI masters to
> be displayed in a browser. As you said: "there is no technical reason
> why TEI files cannot be used directly in web browsers". But you didn't
> say if there is a practical reason why we *should* do it. At the very
> best we can call it a solution in quest of a problem.

Good point.

The only conceivable thing I can think of is visualization, to make
sure markup is applied properly. Here one doesn't have to worry about
the inline <note>, image embedding, hypertext links issues. The CSS
for visualization would be optimized not for final presentation, but
to simply help the publication author see if the markup was applied
properly.

Like in XHTML, validation to a DTD does not prove the markup was
properly applied to the content for publishing purposes.

Jon Noring


From joshua at hutchinson.net  Thu Nov  9 10:49:45 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 10:49:47 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <32862065.1163098185440.JavaMail.?@fh1037.dia.cp.net>

>----Original Message----
>From: jon@noring.name
>
>Are there any plans to actively or passively support embedded MathML,
>either in lieu of, or as an option to, LaTeX?
>


MathML is supported.  Due to it only working in HTML docs and not PDF, 
I've never used it.

>From the guidelines:

<formula>

Attribute notation can take following values:
tex

In PDF output mode this will pipe the contents of the <formula> 
element directly through to the TeX processor.

In HTML output mode the contents of the <formula> element will be 
passed to an instance of TeX and converted to an image. The resulting 
image file is inserted into the HTML file.

In all other output modes it will be ignored.
mathml

In HTML output mode the contents of the <formula> element will be 
inserted literally into the HTML file.

In all other output modes it will be ignored.
svg

In HTML and PDF output modes the SVG contents of the <formula> element 
will be converted to an image and inserted into the file.

In all other output modes it will be ignored.
From cannona at fireantproductions.com  Thu Nov  9 11:16:03 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Nov  9 11:16:45 2006
Subject: [gutvol-d] TEI rendering in Web browsers
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net><454D8138.6040307@gmail.com>
	<45521D1D.10801@novomail.net><455369B6.3070203@perathoner.de>
	<389479139.20061109113300@noring.name>
Message-ID: <004101c70433$95b4b640$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jon Noring wrote:

> Are there any plans to actively or passively support embedded MathML,
> either in lieu of, or as an option to, LaTeX?
>
> Four reasons:
>
> 1) Presentational MathML is round-trippable to LaTeX, so I understand.
>   So no loss there.
>
> 2) Content MathML allows many (but not all) mathematics expressions to
>   be recognizable by mathematics software for processing (e.g.,
>   analytical and numerical solving, graphing, etc.)
>
> 3) SVG and MathML have close ties, and likely over time will get closer.
>
> 4) MathML is XML, LaTeX is not.
>
> This is not to say LaTeX is evil -- it is not -- but that MathML
> (especially content MathML) should be allowed to have a role in PGTEI.
>


I would add that MathML is slightly more accessible simply because of a
plugin which has been designed which will allow MathML formulas to be
correctly read by a screenreader.  To my knowledge no such piece of software
exists for latex.  This is know problem for me personally, as I have learned
enough latex to get by.  But for the average blind user, MathML is better.

On the other hand, many more people know how to write latex than do MathML.
This is due, in large part, to the length of time it has been around, as
well as its wide acceptance by the academic community.

Aaron Cannon

- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFFU36YI7J99hVZuJcRAhGIAKDfp/44YN8HtOvtQLxWKQX6BnKX8ACdHCix
ysyaoyGkhNIeXwdby6+WGfY=
=Qmdk
-----END PGP SIGNATURE-----

From gbnewby at pglaf.org  Thu Nov  9 11:17:15 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Nov  9 11:17:19 2006
Subject: Seeking whitewashers,
	errata handlers (Re: [gutvol-d] Earliest DP texts stripped of source
	info?)
In-Reply-To: <6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>
References: <754776753.20061108133237@noring.name>
	<6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>
Message-ID: <20061109191715.GC26863@mail.pglaf.org>

On Wed, Nov 08, 2006 at 03:55:08PM -0600, David Starner wrote:
> On 11/8/06, Jon Noring <jon@noring.name> wrote:
> >If this is true, at least in the general sense, this indicates to me
> >a need for DP to resubmit those texts to PG. Are there plans to do
> >this, and how difficult would it be?
> 
> There's been discussions, but errata people have been really resistant
> to changes that just add source information.

We are quite short-staffed on the errata handlers and whitewashers.
Jim and Brett have had very little time to spend, leaving
most posting & errata to Tonya, David & Joe.  Joshua has stepped
up quite a bit, but is focused mainly on audio eBooks from Librivox,
and a few other projects you've seen mentioned on gutvol-d.  (There
are tons of other things going on, of course, but I wanted to
send an email about the current labor shortage for the posting/errata).

Mostly we're just trying to save the errata reports, because we don't
have enough staff to act on them.  Many will not get attention until
there are people willing to investigate and apply the changes.

Interested in working with errata reports or whitewashing?  Some
approximate qualifications:
 - commit at least 5 hours per week to labor (plus another
   1-2 for keeping track of email)

 - have outstanding detail-orientation

 - able to work with plain text files, among other formats (i.e., 
   you cannot do this work just within MS Word or Dreamweaver)

 - sufficient network bandwidth to download some pretty big files,
   including multi-MB emails -- OR, sufficient Unix/Linux chops to
   use remote online resources

 - extreme diplomatic skills, for interacting with people whose
   opinions might differ from yours, or who have made some errors
   and need gentle correction and guidance.

As to the question of whether DP can re-submit texts, with
source info added, page scans, etc.: yes, certainly.  In fact,
we could consider some sort of DP process that is more automated
than a "regular" error report or bug fix.

Personally, I'd rather see a little more "value added" than just
source information, for a reposted eBook.  But if someone is
willing to do the work, it's fine to have such an update.

  -- Greg

From sly at victoria.tc.ca  Thu Nov  9 11:24:08 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Nov  9 11:24:12 2006
Subject: [gutvol-d] Christmas-related texts in PG
Message-ID: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>


Would anyone be interested in helping to create a list of
Christmas related texts in the collection?

I've already got a start by just searching for texts
with the word "Christmas" in the title.

Any individual suggestions would be welcome.

Would there be a good place to ask about this on
the DP foums?

Andrew
From gbnewby at pglaf.org  Thu Nov  9 11:03:09 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Nov  9 11:34:48 2006
Subject: [gutvol-d] DP's contribution to PG
In-Reply-To: <200611082119.56116.donovan@abs.net>
References: <c22.8e66618.3283211f@aol.com>
	<817072195.20061108093213@noring.name>
	<455249E9.9080608@ibiblio.org> <200611082119.56116.donovan@abs.net>
Message-ID: <20061109190309.GA26863@mail.pglaf.org>

On Wed, Nov 08, 2006 at 09:19:55PM -0500, D Garcia wrote:
> On Wednesday 08 November 2006 04:19 pm, Michael Dyck wrote:
> > Jon Noring wrote:
> > > DP ... has submitted over half of the texts to PG.
> >
> > On the face of it, pgdp.net has contributed 9,321 of PG-US's
> > 19,695 etexts, so only 47.3% so far.
> 
> I'm betting that's not excluding the audio books.
> What's it look like with those excluded?

There are fewer than 700 audio books...  that leaves DP
still just under 50%.

People interested in "where to eBooks come from" can
subscribe to the posted list (http://lists.pglaf.org).

We still get quite a few from all over....  from the
whitewasher's point of view, the DP eBooks are generally
much more "conformant" to our processing stream than
those from other sources.  That said, there are some
individuals doing just as high-quality work as DP, but
working on their own.

  -- Greg
From grythumn at gmail.com  Thu Nov  9 11:50:45 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Thu Nov  9 11:50:57 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
Message-ID: <15cfa2a50611091150u291bb11cwa54839e0b7dbe897@mail.gmail.com>

On 11/9/06, Andrew Sly <sly@victoria.tc.ca> wrote:
>
> Would anyone be interested in helping to create a list of
> Christmas related texts in the collection?
>
> I've already got a start by just searching for texts
> with the word "Christmas" in the title.
>
> Any individual suggestions would be welcome.
>
> Would there be a good place to ask about this on
> the DP foums?

A project search on Special Day: Christmas will find some:
http://www.pgdp.net/c/tools/project_manager/projectmgr.php?show=search&title=&author=&language%5B%5D=&special_day%5B%5D=Christmas&projectid=&project_manager=&checkedoutby=&n_results_per_page=300

A keyword search on the project comments and forums might work better,
but I don't think it is possible at the moment...

Ask in General, I'd say, or the PM forums.

R C
From marcello at perathoner.de  Thu Nov  9 12:08:07 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 12:08:16 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <389479139.20061109113300@noring.name>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>
	<45521D1D.10801@novomail.net>	<455369B6.3070203@perathoner.de>
	<389479139.20061109113300@noring.name>
Message-ID: <45538AA7.70502@perathoner.de>

Jon Noring wrote:

> Are there any plans to actively or passively support embedded MathML,
> either in lieu of, or as an option to, LaTeX?

Is there some stand-alone open source tool (eg. like ImageMagick for
SVG) that converts MathML expressions into PNG (or LaTeX) ?


> The only conceivable thing I can think of is visualization, to make
> sure markup is applied properly.

Simply build your own CSS stylesheet (or steal one somewhere) and paste
it into the TEI file. But no reason to leave it there after proofing.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Nov  9 12:09:43 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 12:09:51 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <32862065.1163098185440.JavaMail.?@fh1037.dia.cp.net>
References: <32862065.1163098185440.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <45538B07.60401@perathoner.de>

joshua@hutchinson.net wrote:

> MathML is supported.  Due to it only working in HTML docs and not PDF, 
> I've never used it.

The browser has to support it too. Meaning: use Firefox or appropriate
plug-in.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From sly at victoria.tc.ca  Thu Nov  9 12:24:57 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Nov  9 12:25:01 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <24296608.20061108202719@noring.name>
References: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
	<24296608.20061108202719@noring.name>
Message-ID: <Pine.GSO.4.58.0611091218530.667@vtn1.victoria.tc.ca>


And I just ran into one more example of a thing I've
seen a few times before. Where an annonymous text
is in the DP system with the publisher listed as
"author". This is then (hopefully) corrected as
it enters the PG collection.

It is not too uncommon for what we're calling
"metadata" to be revised just before or shortly
after posting. (I do try to review all the messages
sent to posted, and frequently make a pain of
myself to white-washers, sending messages saying
"this doesn't look quite right".)

Andrew


On Wed, 8 Nov 2006, Jon Noring wrote:

> Joshua wrote:
>
> >
> > Perhaps if someone "trustworthy" on the DP side were to oversee the
> > process so that the errata folks could just take the fixed file(s)
> > and push it to the server, it wouldn't be a big deal.
From lee at novomail.net  Thu Nov  9 12:38:20 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 12:36:33 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <d02.170475c.3284bddc@aol.com>
References: <d02.170475c.3284bddc@aol.com>
Message-ID: <455391BC.1050700@novomail.net>

Bowerbird@aol.com wrote:
>    lee said:
>>   wouldn't it be better to call this project DPTEI, and perhaps
>>   move the documentation and tools to the DP servers?
> 
> lee said:
>>   I would think that the Internet Archive would
>>   be very interested in maintaining DP output.
>>   Is there any reason that DP would not be willing
>>   to publish to PG and IA simultaneously?
> 
> i seem to detect a pattern here.
> 
> but maybe it's just me...

What pattern do you think you see here? I'd be happy to tell you whether 
or not you're correct.

-- 
Nothing of significance below this line.

From lee at novomail.net  Thu Nov  9 12:38:28 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 12:36:42 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <455369B6.3070203@perathoner.de>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>	<45521D1D.10801@novomail.net>
	<455369B6.3070203@perathoner.de>
Message-ID: <455391C4.9030407@novomail.net>

Marcello Perathoner wrote:

[snip]

> And they chose not do it for several good reasons:

[specious arguments snipped]

Knowing that you are already deeply emotionally invested in the current 
TEI process, I recognize the futility of trying to counter any of your 
arguments.

Could you at least furnish us with your XSL scripts so we could use them 
as a starting point for our own transformations?

-- 
Nothing of significance below this line.

From joshua at hutchinson.net  Thu Nov  9 13:00:11 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 13:00:13 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: lee@novomail.net
>
>> And they chose not do it for several good reasons:
>
>[specious arguments snipped]
>
>Knowing that you are already deeply emotionally invested in the 
current 
>TEI process, I recognize the futility of trying to counter any of 
your 
>arguments.
>
>Could you at least furnish us with your XSL scripts so we could use 
them 
>as a starting point for our own transformations?
>

Maybe I'm too close to the issue, myself, but I thought Marcello's 
reasons were very well thought out.  I'd love to see counter arguments, 
so for my sake, maybe?

That said, the conversion scripts are all available here: http://pgtei.
pglaf.org/marcello/0.4/src/gnutenberg-press-0.4.tgz

Josh
From hiddengreen at gmail.com  Thu Nov  9 13:45:06 2006
From: hiddengreen at gmail.com (Cori)
Date: Thu Nov  9 13:51:49 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
Message-ID: <910fee4a0611091345v4afe69d6y8a53edaf4ff19117@mail.gmail.com>

Much as I hate volunteering other people for things ;) ... Librivoxers
would probably have some time for discussing this too ... I know we're
working on various Chrimbo-related projects there already.
http://librivox.org/forum/

Perhaps "Volunteers Wanted: Other Projects" would be a good place for
a question, and there's still lots of time before Christmas, so some
new text suggestions might get all-singing, all-dancing audio versions
in place in the next 6 weeks.  I suspect a number of readers will have
been looking at the topical book possibilities beyond the usual "A
Christmas Carol" (and literal carol-singing, too!)

Cori


On 09/11/06, Andrew Sly <sly@victoria.tc.ca> wrote:
>
> Would anyone be interested in helping to create a list of
> Christmas related texts in the collection?
>
> I've already got a start by just searching for texts
> with the word "Christmas" in the title.
>
> Any individual suggestions would be welcome.
>
> Would there be a good place to ask about this on
> the DP foums?
>
> Andrew
>


-- 
To Posterity - and Beyond!
From Bowerbird at aol.com  Thu Nov  9 14:32:06 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 14:32:12 2006
Subject: [gutvol-d] re: in praise of david widger
Message-ID: <ccd.291d18e.32850666@aol.com>

greg said:
>    there are some individuals doing 
>    just as high-quality work as DP, 
>    but working on their own.

in this regard, david widger deserves
a special shoutout, maybe a medal...

not only has he done a lot of books
on his own, with _excellent_ quality,
but david is also a whitewasher who
handles a ton of other submissions.

in addition, he often includes a little
"explanatory overview" in his books,
and not infrequently a collection of
interesting passages as well.

as a special bonus, in e-mail messages
to the posted list, he provides a one-line
weather report from his home in england.

david is a fine british gentleman...

here's a link to an overview of dr. widger's library:
>    http://www.gutenberg.net.au/widger/home.html

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061109/b8c0787a/attachment.html
From marcello at perathoner.de  Thu Nov  9 14:42:53 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 14:43:01 2006
Subject: [gutvol-d] re: in praise of david widger
In-Reply-To: <ccd.291d18e.32850666@aol.com>
References: <ccd.291d18e.32850666@aol.com>
Message-ID: <4553AEED.5090905@perathoner.de>

Bowerbird@aol.com wrote:

> in this regard, david widger deserves
> a special shoutout, maybe a medal...
...
> in addition, he often includes a little
> "explanatory overview" in his books,
> and not infrequently a collection of
> interesting passages as well.
> 
> as a special bonus, in e-mail messages
> to the posted list, he provides a one-line
> weather report from his home in england.

Oooops. Both Davids just dropped into one pudding ... :-)


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Nov  9 14:46:31 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 14:46:43 2006
Subject: [gutvol-d] Christmas-related texts in PG
Message-ID: <bf3.304d38e.328509c7@aol.com>

here's what richard seltzer of samizdat.com lists under "christmas":
>    Mr. Bamboo and the Honorable Little God -- a Christmas Story by Fannie 
Macaulay
>    Christmas, Its Origin, Celebration, and Significance by Robert Haven 
Schauffler
>    Christmas in Ritual and Tradition by Clement A. Niles
>    The Christmas Kalends of Provence by Thomas Janvier
>    Christmas Sermon by Robert Louis Stevenson, 1900
>    A Christmas Story by Samuel W. Francis
>    Christmas Tales and Christmas Verse by Eugene Field, 1912
>    Holiday Stories for Young People, compiled by Margaret Sangster
>    Holidays at the Grange by Emily Mayer Higgins
>    In the Yule-Log Glow: Christmas Tales Round the World, edited by 
Harrions Morris
>    Little Book for Christmas by Cyrus Townsend Brady
>    Nisby's Christmas by Jacob August Riis
>    On Christmas Day int he Evening by Grace Richmond
>    St. Nicholas [magazine]
>    Trifles for the Christmas Holidays by H.S. Armstrong, 1869
>    Twas the Night Before Christmas by Clement Moore, 1912
>    Yule-tide in Many Lands by Mary Pringle and Clara Urann

as with a good many categories, richard's work will give you a head-start...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061109/c60a3f70/attachment.html
From marcello at perathoner.de  Thu Nov  9 14:54:10 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 14:54:17 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <455391C4.9030407@novomail.net>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>	<45521D1D.10801@novomail.net>	<455369B6.3070203@perathoner.de>
	<455391C4.9030407@novomail.net>
Message-ID: <4553B192.10005@perathoner.de>

Lee Passey wrote:

> [specious arguments snipped]

I can see that snipping those arguments is much easier than refuting them.


> Could you at least furnish us with your XSL scripts so we could use them
> as a starting point for our own transformations?

Everything PGTEI is, and always has been, GPLed and online.

Start here:

  http://pgtei.pglaf.org/marcello/0.4/


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Nov  9 15:19:38 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 15:19:59 2006
Subject: [gutvol-d] re: gvd061108 -- one preliminary thought on duguid
Message-ID: <c8b.15d2683.3285118a@aol.com>

Skipped content of type multipart/alternative-------------- next part --------------
An embedded message was scrubbed...
From: Morasch@aol.com
Subject: Fwd: [gutvol-d] gvd061108 -- one preliminary thought on duguid
Date: Thu, 9 Nov 2006 18:03:56 EST
Size: 46776
Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061109/a01afb2f/attachment-0001.mht
From Bowerbird at aol.com  Thu Nov  9 15:29:03 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 15:29:12 2006
Subject: [gutvol-d] re: in praise of david widger
Message-ID: <c57.674a3f7.328513bf@aol.com>

marcello said:
>    Oooops. Both Davids just dropped into one pudding ... :-)

oh, that's right, i've blended david widger
and david price into one "super-david"...

well, they'll have to _share_ that medal,
because i'm not buying two...           ;+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061109/c28e120f/attachment.html
From lee at novomail.net  Thu Nov  9 16:02:58 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 16:01:17 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <4553C1B2.2060900@novomail.net>

joshua@hutchinson.net wrote:

> Maybe I'm too close to the issue, myself, but I thought Marcello's 
> reasons were very well thought out.  I'd love to see counter arguments, 
> so for my sake, maybe?

Marcello's arguments are not necessarily wrong or inaccurate, the are 
simply "non-responsive" (in lawyer-speak). That is to say, they don't 
really respond to the question.

I postulate that it is possible to create TEI-encoded files which can be 
used with all of the existing TEI tools you and Marcello have created 
AND can be used directly in browsers with CSS. In other words, we don't 
have to make a choice between the two rendering processes, we can have 
both simultaneously.

Now if you look and Marcello's arguments 1-3, you will see that while 
they are all valid (and in my opinion persuasive) arguments in favor of 
a robust TEI encoding for Project Gutenberg e-texts, none of them even 
address the issue of dual-use TEI files. And nowhere does he suggest, 
let alone demonstrate, that building CSS support into the TEI master 
files would in any way hinder the creation of other file formats using 
the existing tools.

Argument number 5 is not really an argument at all; it is merely a 
restatement of the conclusion: "We're not going to make /our/ TEI files 
CSS-compatible, but you're welcome to write an XSL script (or any other 
script for that matter -- BB would probably favor Perl at the moment) 
that will convert PGTEI to CSS-compatible TEI." In other words, all it 
does is give us permission to do that which we know we already have the 
right and capability to do (assuming, of course, that the TEI master 
files will always be available without post-processing).

Argument number 4 is the only valid argument of the 5 (this doesn't mean 
I agree with the conclusion, simply that I recognize that it raises 
valid issues). It can be summarized as:

a. nobody wants this capability, so any time we expend implementing 
CSS-enabled TEI files is wasted (this can also be reformulated as 
"nobody will want anything but ZML files, so any time we expend 
implementing PGTEI files is wasted"); and

b. if we provide CSS-enabled TEI files we will unduly confuse those 
people who can't read the caveat "TEI support in Internet Explorer and 
older web browsers is limited; if you can't read this file you should 
download the HTML version instead."

I don't find either of these arguments particularly persuasive. We know 
that at least 3 people (myself, Sam Bretheim and Keith Schultz) out of a 
fairly small sample (the gutvol-d mailing list) would like to see 
CSS-enabled TEI. Perhaps I flatter myself, but I think that sample is 
somewhat significant. I also choose to believe (totally without hard 
evidence) that other people would want it too, if they were to discover 
it. Further, from a purely cost/benefit analysis, the cost of 
implementing the solution (addition of two standard <?xml-stylesheet> 
lines and perhaps some refinement in the PGTEI guidelines, which are 
still undergoing evolution) is virtually zero. Even if the benefit 
derived is very small, the benefit/cost ratio still makes it worth doing.

Likewise, I don't find the "confused end user" argument very persuasive. 
Anyone who 1. finds Project Gutenberg, and 2. finds the TEI files 
offered by Project Gutenberg, has got to have a certain amount of 
technical sophistication. For those who don't know what TEI or XML is, 
even if the attempt to download the files into their browser if it 
doesn't display well they will simply pick a different version. And if 
users are confused by TEI files that render only 90% of the TEI markup, 
think how confused they will be by TEI files that render 0% of the 
markup. Really, the only way to avoid end user confusion is to not offer 
them TEI files at all.

Project Gutenberg is a volunteer-driven organization; it offers 
downloads "as-is" and doesn't provide technical support. If an end user 
can't figure out that TEI files may not be appropriate for them, it's 
not like the technical support phones are going to start ringing.

Now the really fun thing about CSS is that because the content and the 
style is separated, I can discover my own style preferences and apply 
them to all PGTEI files if only the files themselves will contain a 
reference to a well-known style sheet. Personally I am completely 
unimpressed by the HTML created by Marcello's HTML XSL script. At this 
point, my only option is to create a /new/ XSL script to transform the 
TEI into a file matching my tastes, and to run that script every time I 
download a file. If my own computer with my own XSLT engine and XSL 
scripts is not available. But with CSS-enabled TEI I could simply 
maintain my own CSS stylesheet in the local file system and I could 
download a new file and it would just work.

Personally, I think Mr. Bretheim's version of Shirley is /awesome/ -- 
but I'm not a fan of serifed fonts. And on my system the "Fantasy" font 
family is mapped to a deck of cards, so the title of Chapter 1 of 
_Shirley_ is queen of hearts, five of spades, nine of clubs, nine of 
spades, seven of clubs, nine of spades, etc. However, if the stylesheet 
declaration in that file referred first to PGTEI.CSS and then 
PGTEI-USER.CSS I could use PGTEI-USER.CSS to override any or all of the 
default choices with my own. I wouldn't have to create my own master CSS 
file, and I wouldn't have to edit the downloaded TEI to reference the 
standard CSS file. As a matter of fact, I wouldn't even have to create a 
personal CSS file at all: I could just surf the net and find alternative 
CSS files that other people had created and simply select the one I 
liked the best.

<aside>
In fact, creating a new XSL script that would create CSS-enabled XHTML 
would probably be a useful thing to do as well.
</aside>

I believe that empowering end users is always a good thing, even if it 
requires a little more effort or may create a little confusion. I 
certainly wouldn't suggest this if it would hinder in any way the 
automated tools used to transform PGTEI into other useful formats, but 
so far there has been no suggestion that it would.

Now to be quite honest, I'm not suggesting that you or Marcello change 
your position on this point. I have made the suggestion before, and I'm 
not the type of person who enjoys beating his head against the wall. The 
Project Gutenberg way (PGTAO?) has always been that if you don't like 
the way something is being done, just redo it differently; that is the 
approach I am attempting to take. Indeed, the only request I have of the 
PGTEI project is to not claim that CSS-enabled TEI /cannot/ be done, or 
that it /should not/ be done (how I choose to waste my time is my 
business). If there is a group here that wants to explore how to create 
e-texts using CSS-enabled TEI I would think that such an effort should 
be encouraged.

> That said, the conversion scripts are all available here: http://pgtei.
> pglaf.org/marcello/0.4/src/gnutenberg-press-0.4.tgz

Thanks.


-- 
Nothing of significance below this line.

From Bowerbird at aol.com  Thu Nov  9 16:17:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 16:17:50 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <24b.1211b237.32851f23@aol.com>

lee said:
>    Project Gutenberg is a volunteer-driven organization; it offers
>    downloads "as-is" and doesn't provide technical support.   If an 
>    end user can't figure out that TEI files may not be appropriate for 
them, 
>    it's not like the technical support phones are going to start ringing.

um...   as far as i know, e-mail to project gutenberg from end-users
_is_ answered, at least sometimes anyway, even if "technical support"
is not advertised per se.   and the questions asked can sometimes be
_basic_, as in "what does it mean to 'unzip' a file?"   just so you know...

anyway, until some way is found to speed the creation of .tei files,
it's putting the cart before the horse to argue about their display,
especially since the time when even _trailing-edge_ browsers will
be able to display .tei is at least 5 years down the line, maybe 10...

and by that time, markup will be all washed up...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061109/987d6e1d/attachment.html
From jon at noring.name  Thu Nov  9 17:21:21 2006
From: jon at noring.name (Jon Noring)
Date: Thu Nov  9 17:26:53 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <4553C1B2.2060900@novomail.net>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
	<4553C1B2.2060900@novomail.net>
Message-ID: <224991111.20061109182121@noring.name>

Lee wrote:
> joshua@hutchinson.net wrote:

>> Maybe I'm too close to the issue, myself, but I thought Marcello's 
>> reasons were very well thought out.  I'd love to see counter
>> arguments, so for my sake, maybe?

Marcello's answer definitely summarize his thoughts, and I agree were
pretty well presented. But I also see Lee's point.

I'll comment on a couple points Lee made.


> I postulate that it is possible to create TEI-encoded files which
> can be used with all of the existing TEI tools you and Marcello have
> created AND can be used directly in browsers with CSS. In other
> words, we don't have to make a choice between the two rendering
> processes, we can have both simultaneously.

This I agree with, but it requires knowing what "constraints" must be
followed when utilizing PGTEI. A sort of usage "subset."

I've noted three major areas of difficulty:

1) Handling of hypertext links.

2) Handling of embedded images.

3) Handling of <note> applied inline within a block of text (such as
   a paragraph.)


#1 and #2 might be handleable by using the XHTML namespaced elements
of <a> and <img> instead of the TEI equivalents. But now we are no
longer in pure TEI. As far as I know PGTEI has not been extended to
support these (am I right?)

I've demonstrated that #1 is achievable (but only in Mozilla/Firefox)
using XLink, but again the stuff we have to put into the TEI
document renders it no longer pure TEI (it is not sufficient to put
the stuff into the DTD.) Now, if PGTEI allows the extension to XLink,
then #1 (and #2 in principle, but not currently supported in browsers)
can be handled (interesting to see how difficult it would be to add
XLink support for images in Mozilla/Firefox.)

#3 is difficult and limited to handle in CSS. As I've demonstrated,
one can "float" the <note> and move it elsewhere in the proximity
of the paragraph it came from, but there are limitations (such as
not supported in IE6 -- I just tried IE7 and it does do something,
but still not pretty.)

#3 is solvable for all browsers if one simply constrains PGTEI
authoring to not put <note> inline at the point of reference, but
again this is limiting.

And finally, there is always the possibility of adding a plugin to
one of the browsers to handle one or more of the above, but I'm
not sure how feasible that is vis-a-vis developer time -- Marcello
believes it to be a major effort -- it probably depends upon what
one wants to do.

Of course, if we simply say our subset won't support inline <note>,
images and hypertext links, then applying CSS to TEI is eminently
doable. (There may be a couple other TEI constructs that give
problems as well -- some uses of table in TEI may not map correctly
using CSS display properties for table elements.)


> I don't find either of these arguments particularly persuasive. We know
> that at least 3 people (myself, Sam Bretheim and Keith Schultz) out of a
> fairly small sample (the gutvol-d mailing list) would like to see 
> CSS-enabled TEI.

Add me to the list!


> Perhaps I flatter myself, but I think that sample is 
> somewhat significant. I also choose to believe (totally without hard 
> evidence) that other people would want it too, if they were to discover
> it. Further, from a purely cost/benefit analysis, the cost of 
> implementing the solution (addition of two standard <?xml-stylesheet> 
> lines and perhaps some refinement in the PGTEI guidelines, which are 
> still undergoing evolution) is virtually zero. Even if the benefit 
> derived is very small, the benefit/cost ratio still makes it worth doing.

The only thing I can see that PGTEI could do is to extend PGTEI to
allow adding XLink for defining hypertext linking and image embedding
(mostly boilerplate additions) -- this will at least make hypertext
links workable in Firefox, and maybe images if someone will submit
some code to Firefox which maps the XLink into the equivalent <img>

The <note> placed inline is a knotty topic. I personally *like*, for
mastering reasons, putting the <note> inline at the point of first
reference. But it makes it hell to do something with it in a browser
(especially if the same note is referenced more than once in a book!)
And then consider the impracticality of handling a note inside of a
note, which occurs in scholarly books. Someone mentioned a book with
three levels of notes: a note inside a note inside a note. Fun.


> ... If there is a group here that wants to explore how to create
> e-texts using CSS-enabled TEI I would think that such an effort should
> be encouraged.

Well, I'll be glad to join that discussion. Maybe a separate group?

I know there are people in the TEI community who have explored this,
so it goes beyond PG. Sebastian Rahtz and I have privately
communicated in building an "ebook-oriented" TEI subset with various
element/attribute constraints. Nothing has come of this, but he did
express interest.


Jon Noring


(p.s, here's an early attempt at using CSS to visualize TEI
documents: http://math.ut.ee/~kaarel/NLP/TEI/visualization/
Not exactly for ebook use, but may help with the project.

Also see: http://www.tei-c.org.uk/Stylesheets/  )


From joshua at hutchinson.net  Thu Nov  9 18:00:07 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 18:00:15 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>

>----Original Message----
>From: lee@novomail.net
>
>I postulate that it is possible to create TEI-encoded files which can 
be 
>used with all of the existing TEI tools you and Marcello have 
created 
>AND can be used directly in browsers with CSS. In other words, we 
don't 
>have to make a choice between the two rendering processes, we can 
have 
>both simultaneously.

Yes, it is possible.

No, I don't believe it is all that useful.  (I'll explain in detail 
below)

>And nowhere does he suggest, 
>let alone demonstrate, that building CSS support into the TEI master 
>files would in any way hinder the creation of other file formats 
using 
>the existing tools.
>

Nope, wouldn't affect it in the least.  But that isn't where the 
problem with it lies, imo.

>
>a. nobody wants this capability, so any time we expend implementing 
>CSS-enabled TEI files is wasted

I would interpret it slightly differently.  No one gets any benefit 
from this capability that isn't already present, in a MUCH better way, 
in other formats (HTML/PDF).

>
>b. if we provide CSS-enabled TEI files we will unduly confuse those 
>people who can't read the caveat "TEI support in Internet Explorer 
and 
>older web browsers is limited; if you can't read this file you 
should 
>download the HTML version instead."
>

This is actually a HUGE thing.  If you had seen all the confused 
emails/comments I received just about text index files for audio books, 
you'd see that most of the folks that used this resource don't want to 
have to learn anything beyond, "click" and it loads.  If it loads a 
little, but the feature set is limited due to limited web browser 
capability ... people won't know that anything better is available and 
assume that text is just crap.  At least if the TEI file won't load at 
all, they will understand that something is wrong (and we can funnel 
them away from the TEI just by laying out the catalog screens to 
provide emphasis to other files).

>We know 
>that at least 3 people (myself, Sam Bretheim and Keith Schultz) out 
of a 
>fairly small sample (the gutvol-d mailing list) would like to see 
>CSS-enabled TEI.

It isn't that I don't think there isn't a significant number that 
would like it, rather that it is a better use of time and energy to 
make the HTML or PDF files better.


>Further, from a purely cost/benefit analysis, the cost of 
>implementing the solution (addition of two standard <?xml-
stylesheet> 
>lines and perhaps some refinement in the PGTEI guidelines, which are 
>still undergoing evolution) is virtually zero. Even if the benefit 
>derived is very small, the benefit/cost ratio still makes it worth 
doing.
>

Ok, now this is where I have to disagree 100%.  That much of an 
addition *would* make it render, but it would be a severely handicapped 
version.  For instance, footnotes would not look very good.  CSS is 
just not good at pulling information out of the text flow and placing 
it somewhere else.  Some data structures/formatting are of a dynamic 
nature, too.  For instance, table of contents, indexes, math equations, 
etc are created by the XSL and associated scripts.  In a CSS-enabled 
TEI file, that stuff would either be non-existent or basically 
gibberish (ie, LaTeX equations are gibberish in their native form to 
99.9% of the world).  

This is the biggest reason I have for not wanting CSS-enabled TEI 
files.  There is a lot of stuff that is not possible to display 
properly.

>Anyone who 1. finds Project Gutenberg, and 2. finds the TEI files 
>offered by Project Gutenberg, has got to have a certain amount of 
>technical sophistication. 

This one actually made me chuckle.  You've obviously never been in 
charge of answering the PG email!  ;)

>For those who don't know what TEI or XML is, 
>even if the attempt to download the files into their browser if it 
>doesn't display well they will simply pick a different version. 

No they won't.  They will either assume 1) PG's file is crap or 2) 
they did something wrong and slink away.  

Some people will try other things.  Most will give up.

>Really, the only way to avoid end user confusion is to not offer 
>them TEI files at all.

There is something to that statement.  It is one of the reason I am 
glad that TEI usually lists at or near the bottom in the PG catalog 
pages.  They have to pass up a lot of other GOOD files to pick the one 
that doesn't work like that.

>Personally I am completely 
>unimpressed by the HTML created by Marcello's HTML XSL script. 

Honestly, this is the feedback I'm most interested in from your 
message.  *What* isn't good?  *What* can we improve?  (And keep in 
mind, the system allows for a LOT of CSS styling that can be specified 
in the master document that I don't use.  I'm one that likes simple 
clean layouts, but you can get crazy).

>At this 
>point, my only option is to create a /new/ XSL script to transform 
the 
>TEI into a file matching my tastes, and to run that script every time 
I 
>download a file. But with CSS-enabled TEI I could simply 
>maintain my own CSS stylesheet in the local file system and I could 
>download a new file and it would just work.

Actually, if you're up to that, you also up to adding a personalized 
CSS to the HTML document, which would probably be easier.  But to each 
his own.  Everyone *is* free to do what they want with the files; that 
is the great thing about them being public domain files! :)

>As a matter of fact, I wouldn't even have to create a 
>personal CSS file at all: I could just surf the net and find 
alternative 
>CSS files that other people had created and simply select the one I 
>liked the best.

This is actually a feature that we've talked about a lot for the HTML 
files.  But it is something that we put on the back-burner.  Just felt 
other things had a higher priority.

>
><aside>
>In fact, creating a new XSL script that would create CSS-enabled 
XHTML 
>would probably be a useful thing to do as well.
></aside>

I assume you mean a XHTML file that references an external CSS file?  
We had that in older versions, but PG whitewashers did not like 
separate CSS files, so we moved to an integrated file approach.

>Indeed, the only request I have of the 
>PGTEI project is to not claim that CSS-enabled TEI /cannot/ be done, 
or 
>that it /should not/ be done (how I choose to waste my time is my 
>business).

Let me say in no uncertain terms.  It CAN be done.  And I DON'T want 
anyone to think it shouldn't be done by others.  I just don't believe 
it should be done by PG scripts.  For those, like you Lee, that *can* 
work with TEI files directly, the files are available.  

Seriously, I'm concerned with folks like my grandmother, who is 87 
years old. She would be horribly confused by a CSS-enabled TEI file 
that didn't display a Table of Contents or had footnotes splashed in 
the middle of the text, etc.

>If there is a group here that wants to explore how to create 
>e-texts using CSS-enabled TEI I would think that such an effort 
should 
>be encouraged.
>

Go for it.  There is benefit to it.  Just not as a general download 
option at this time.  Maybe we can revisit this when browsers are 
better at it and I'll have a completely different opinion.

Josh

PS  I am serious about wanting your feedback on the XHTML we generate 
and ideas for ways to improve the default CSS that is in those files.
From marcello at perathoner.de  Thu Nov  9 18:03:24 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 18:03:32 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <4553C1B2.2060900@novomail.net>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
	<4553C1B2.2060900@novomail.net>
Message-ID: <4553DDEC.7010007@perathoner.de>

Lee Passey wrote:

> Argument number 5 is not really an argument at all; it is merely a
> restatement of the conclusion: "We're not going to make /our/ TEI files
> CSS-compatible, but you're welcome to write an XSL script (or any other
> script for that matter -- BB would probably favor Perl at the moment)
> that will convert PGTEI to CSS-compatible TEI."

If you think you can make CSS work with PGTEI as it is, you have nothing
to complain. Just start working on it.

If you think you need changes to PGTEI to make it work with CSS, then I
will answer you that in my opinion the play isn't worth the candle
(unless the changes are absolutely trivial like inserting a link into
the header).


PGTEI is a very thin layer of additional standardization over standard
TEI. The virtually only thing that PGTEI defines is how to interpret the
"rend" attribute on TEI elements. This attribute's contents was
intentionally left undefined in the TEI standard because it was felt
that the needs of different markup projects were too far apart to make a
standardization of this attribute desirable or viable.

This very thin layer borrows from other well-known standards whenever
possible. PGTEI uses CSS2/3-compatible values for the "rend" attribute.
Thus it should not be difficult to process the PGTEI "rend" attribute
into an XML "style" attribute that any CSS-capable browser can understand.


> a. nobody wants this capability, so any time we expend implementing
> CSS-enabled TEI files is wasted

Moreover: if we try to implement a kitchen sink where anybody can stuff
their dirty plates in at will, we risk to fail. One of the more
important responsibilities of the software architect is complexity
management. Any added feature -- especially a feature that implements a
completely new usage model -- will make the project more complex, thus
increasing the risk of failure. It needs to be a very important feature
to make me change the design goals at this late stage.


> b. if we provide CSS-enabled TEI files we will unduly confuse those
> people who can't read the caveat "TEI support in Internet Explorer and
> older web browsers is limited; if you can't read this file you should
> download the HTML version instead."

"Can't read this file" is relative. The user may not be able to tell the
level of degradation she is just experiencing.


> I don't find either of these arguments particularly persuasive. We know
> that at least 3 people (myself, Sam Bretheim and Keith Schultz) out of a
> fairly small sample (the gutvol-d mailing list) would like to see
> CSS-enabled TEI.

Then I suggest those 3 people put their heads together and just do it.


> Further, from a purely cost/benefit analysis, the cost of
> implementing the solution (addition of two standard <?xml-stylesheet>
> lines and perhaps some refinement in the PGTEI guidelines, which are
> still undergoing evolution) is virtually zero. Even if the benefit
> derived is very small, the benefit/cost ratio still makes it worth doing.

If you have aready done a cost/benefit analysis, then please state
*exactly* what "refinements" you want, and I'll tell you if they fit in.


> Likewise, I don't find the "confused end user" argument very persuasive.
> Anyone who 1. finds Project Gutenberg, and 2. finds the TEI files
> offered by Project Gutenberg, has got to have a certain amount of
> technical sophistication.

Anybody with enough sophistication to type "free books" into Google will
find PG. If this person then just clicks on "TEI" and gets an
indifferently rendered version, he might not be able to tell that he's
viewing a degraded version.


> For those who don't know what TEI or XML is,
> even if the attempt to download the files into their browser if it
> doesn't display well they will simply pick a different version. And if
> users are confused by TEI files that render only 90% of the TEI markup,
> think how confused they will be by TEI files that render 0% of the
> markup. Really, the only way to avoid end user confusion is to not offer
> them TEI files at all.

Again. 0% is a far better choice than 90%. Users may not be able to tell
that they are getting 90%.

Users will recognize that a TEI source file dumped into the browser is
not what they want and seek alternatives, but they might not be able to
tell eg. that their browser just dropped all footnotes into the bit
bucket because of a buggy XSLT implementation.


> Personally, I think Mr. Bretheim's version of Shirley is /awesome/ --
> but I'm not a fan of serifed fonts. And on my system the "Fantasy" font
> family is mapped to a deck of cards, so the title of Chapter 1 of
> _Shirley_ is queen of hearts, five of spades, nine of clubs, nine of
> spades, seven of clubs, nine of spades, etc.

And now imagine Aunt Tillie confronted with this problem and then say
again that what you propose is user-friendly.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Nov  9 19:05:10 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 19:05:18 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <224991111.20061109182121@noring.name>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>	<4553C1B2.2060900@novomail.net>
	<224991111.20061109182121@noring.name>
Message-ID: <4553EC66.9070202@perathoner.de>

Jon Noring wrote:

> I've noted three major areas of difficulty:
> 
> 1) Handling of hypertext links.
> 
> 2) Handling of embedded images.
> 
> 3) Handling of <note> applied inline within a block of text (such as
>    a paragraph.)

Ever notice the different semantics of <p> in TEI and HTML?

A <p> in TEI might contain a lot of elements a <p> in HTML is not
allowed to contain:

  <p rend="fancy">The scrap of paper read:
    <q rend="block">GXLKDS</q>
  Holmes chuckled.</p>

Is perfectly valid TEI.

Now transform this naively into XHTML and you get:

  <p style="fancy">The scrap of paper read:
    <blockquote>GXLKDS</blockquote>
  Holmes chuckled.</p>

Perfectly invalid XHTML.

The browser will notice this and chicken out into "quirks" HTML. It will
close the <p>s for you and will render like this:

  <p style="fancy">The scrap of paper read:</p>
  <blockquote>GXLKDS</blockquote>
  <p>Holmes chuckled.</p>

Losing the "fancy" formatting on the rest of the snippet.


Did I mention a <p> in TEI can contain tables and lists?


> #1 and #2 might be handleable by using the XHTML namespaced elements
> of <a> and <img> instead of the TEI equivalents. But now we are no
> longer in pure TEI. As far as I know PGTEI has not been extended to
> support these (am I right?)

<figure> will not map gracefully to any XLink construct because <figure>
may contain <head> and <figDesc>.

Why not use browser XSLT to map <figure> to <img>?


> I've demonstrated that #1 is achievable (but only in Mozilla/Firefox)
> using XLink, but again the stuff we have to put into the TEI
> document renders it no longer pure TEI (it is not sufficient to put
> the stuff into the DTD.) Now, if PGTEI allows the extension to XLink,
> then #1 (and #2 in principle, but not currently supported in browsers)
> can be handled (interesting to see how difficult it would be to add
> XLink support for images in Mozilla/Firefox.)

Basically you are suggesting to make PGTEI incompatible to TEI and to
throw in all the complexity of XLink just to maybe make it displayable
in Mozilla (and nowhere else)?


> #3 is solvable for all browsers if one simply constrains PGTEI
> authoring to not put <note> inline at the point of reference, but
> again this is limiting.

So you support the feature by forbidding its use?


> Of course, if we simply say our subset won't support inline <note>,
> images and hypertext links, then applying CSS to TEI is eminently
> doable.

Lets simply say PGTEI doesn't support direct rendering in browsers.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Thu Nov  9 19:20:07 2006
From: jon at noring.name (Jon Noring)
Date: Thu Nov  9 19:20:18 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <4553EC66.9070202@perathoner.de>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
	<4553C1B2.2060900@novomail.net> <224991111.20061109182121@noring.name>
	<4553EC66.9070202@perathoner.de>
Message-ID: <1641943390.20061109202007@noring.name>

Marcello wrote:
> Jon Noring wrote:

>> Of course, if we simply say our subset won't support inline <note>,
>> images and hypertext links, then applying CSS to TEI is eminently
>> doable.

> Lets simply say PGTEI doesn't support direct rendering in browsers.

I overall agree with you, for the reasons you cite plus the ones I
brought up (e.g, difficulty of the links, images, <note>, etc.).

In addition, since PGTEI is simply TEI with a few added things and
constraints, it is pretty free-form as TEI is free-form (and as HTML
is pretty free-form), making it impossible to develop a library of
standardized CSS style sheets that apply to all PGTEI documents.

The only thing that can be done, and this is probably outside of
PGTEI, is to settle upon a type of "TEI-lighter" (not necessarily the
same as "TEI-Lite") which is so well constrained that it is possible
to develop a standardized set of style sheets for rendering, either
for the TEI vocabulary subset, or of the XHTML transform. But then
this "TEI-Lighter" will likely not be able to represent a certain
percentage of the texts PG and DP digitize. One thing I've learned,
as I know all the DPers and PGers have, is that the diversity of
structures used in old documents is very broad.

Jon Noring


From tb at baechler.net  Fri Nov 10 01:12:42 2006
From: tb at baechler.net (Tony Baechler)
Date: Fri Nov 10 01:34:15 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
Message-ID: <20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>

At 11:24 AM 11/9/06 -0800, you wrote:

>Would anyone be interested in helping to create a list of
>Christmas related texts in the collection?


Yes, I've already done a lot of this.  Here is my list of them.  To 
my knowledge, this includes all of them to date although I could be 
missing a few.  If there is a Christmas story included, it's listed 
even though the rest of the text may not have anything to do with 
Christmas or holidays.  Also there are others not listed because I've 
already read them, such as _A Kidnapped Santa Clause_ by Baum, 
etc.  If you need exact titles, please ask.  This was what I found 
with grep, so my apologies for the lack of decent formatting but this 
at least gives you a list to work from.

7abgh10.txt:Release Date: August, 2005  [EBook #8694]
7loui10.txt:Release Date: February, 2005 [EBook #7425]
cbcst10.txt:Release Date: February, 2004  [EBook #5061]
pg10813.txt:Release Date: January 23, 2004 [EBook #10813]
pg11014.txt:Release Date: February 10, 2004 [EBook #11014]
pg12881.txt:Release Date: July 11, 2004 [EBook #12881]
pg12974.txt:Release Date: July 21, 2004 [EBook #12974]
pg13158.txt:Release Date: August 10, 2004 [EBook #13158]
pg13213.txt:Release Date: August 18, 2004 [EBook #13213]
pg14572.txt:Release Date: January 3, 2005 [EBook #14572]
pg14624.txt:Release Date: January 6, 2005 [EBook #14624]
pg14629.txt:Release Date: January 7, 2005 [EBook #14629]
pg14667.txt:Release Date: January 11, 2005 [EBook #14667]
pg15034.txt:Release Date: February 13, 2005 [EBook #15034]
pg15044.txt:Release Date: February 14, 2005 [EBook #15044]
pg15078.txt:Release Date: February 16, 2005 [EBook #15078]
pg15343.txt:Release Date: March 12, 2005 [EBook #15343]
pg15552.txt:Release Date: April 5, 2005 [EBook #15552]
pg15709.txt:Release Date: April 25, 2005 [EBook #15709]
pg16498.txt:Release Date: August 9, 2005 [EBook #16498]
pg16648.txt:Release Date: September 4, 2005 [EBook #16648]
pg17006.txt:Release Date: November 5, 2005 [EBook #17006]
pg17456.txt:Release Date: January 4, 2006 [EBook #17456]
pg17510.txt:Release Date: January 13, 2006 [EBook #17510]
pg17562.txt:Release Date: January 21, 2006 [EBook #17562]
pg17630.txt:Release Date: January 29, 2006 [EBook #17630]
pg17743.txt:Release Date: February 10, 2006 [EBook #17743]
pg17770.txt:Release Date: February 16, 2006 [EBook #17770]
pg17937.txt:Release Date: March 6, 2006 [EBook #17937]
pg18570.txt:Release Date: June 12, 2006 [EBook #18570]
pg18720.txt:Release Date: June 29, 2006 [EBook #18720]
pg18725.txt:Release Date: July 1, 2006 [EBook #18725]
pg18770.txt:Release Date: July 6, 2006 [EBook #18770]
pg19014.txt:Release Date: August 9, 2006 [EBook #19014]
pg19084.txt:Release Date: August 20, 2006 [EBook #19084]
pg19098.txt:Release Date: August 21, 2006 [EBook #19098]
pg19337.txt:Release Date: September 20, 2006 [EBook #19337]
pg19384.txt:Release Date: September 26, 2006 [EBook #19384]
pg19587.txt:Release Date: October 19, 2006 [EBook #19587]
pg2556.txt:Release Date: May 18, 2006 [EBook #2556]
pg2597.txt:Release Date: May 21, 2006 [EBook #2597]
pg2731.txt:Release Date: May 25, 2006 [EBook #2731]
pg46.txt:Release Date: August 11, 2004 [EBook #46]
stngc10.txt:Release Date: April, 2004 [EBook #5403]
tlrco10.txt:Release Date: August, 2004  [EBook #6373]


-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.14.0/525 - Release Date: 11/9/06


From prosfilaes at gmail.com  Fri Nov 10 05:45:58 2006
From: prosfilaes at gmail.com (David Starner)
Date: Fri Nov 10 05:46:04 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <Pine.GSO.4.58.0611091218530.667@vtn1.victoria.tc.ca>
References: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
	<24296608.20061108202719@noring.name>
	<Pine.GSO.4.58.0611091218530.667@vtn1.victoria.tc.ca>
Message-ID: <6d99d1fd0611100545h31e8e617i5a85f3f7e9960887@mail.gmail.com>

On 11/9/06, Andrew Sly <sly@victoria.tc.ca> wrote:
> And I just ran into one more example of a thing I've
> seen a few times before. Where an annonymous text
> is in the DP system with the publisher listed as
> "author". This is then (hopefully) corrected as
> it enters the PG collection.

As a project manager at DP, I've never regarded the author and title
fields as real metadata, rather more as tools to keep the system from
spitting out the entire series at once and to communicate information
of interest to the proofer.

Slightly offtopic, but today's nightmare in metadata was my Chippeway
(and Chipeway) grammars, which I had loaded into the system under
Chippewayan. Of course, as is obvious to everyone, Chippewayan is
completely unrelated to Chippeway, which is known to the system (using
the ISO-639-2 list of names) as Objiway. At least we caught it before
we sent it on to DP that way.
From lee at novomail.net  Fri Nov 10 09:03:37 2006
From: lee at novomail.net (Lee Passey)
Date: Fri Nov 10 09:01:59 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
Message-ID: <4554B0E9.7080601@novomail.net>

joshua@hutchinson.net wrote:

[snip]

> Yes, it is possible.
> 
> No, I don't believe it is all that useful.  (I'll explain in detail 
> below)

[snip]

> It isn't that I don't think there isn't a significant number that 
> would like it, rather that it is a better use of time and energy to 
> make the HTML or PDF files better.

Hey, you asked. I'm just pointing out that there are no technical
issues; the only real issue is the cost/benefit of actually doing it,
and on that point you and I will simply have to agree to disagree.

[snip]

> I assume you mean a XHTML file that references an external CSS file?
>  We had that in older versions, but PG whitewashers did not like 
> separate CSS files, so we moved to an integrated file approach.

I want to expand on this so we know that we're both on the same page. I
am assuming from what you say here that you are talking about a
situation where each e-text has its own external CSS file. That is, file
19725.html contains a link to 19725.css and frank15.html contains a link
to frank15.css. To me, this method makes no sense, and I can understand
why PG whitewashers would be opposed. If there is a one-to-one
correspondence between a content file and a stylesheet file, the two
ought to be merged into a single file, for file maintenance reasons if
nothing else.

When it comes to HTML files I have only a single rule on which I am
unwilling to compromise: /you don't get to dictate how an e-text looks
on my system!/

Now it takes a lot of work to strip out someone else's ill-conceived
notion of presentation and replace it with my own, particularly if you
do really stupid things like Microsoft Word does and repeatedly attach
the same style to every paragraph. A process where I don't have to edit
/every file/ I download, sometimes non-trivially, is desirable.

So in support of my #1 rule I suggest three sub rules: a. No style rules
should be included in any XML content file (of any vocabulary) except
via CSS declarations; b. every XML content file should reference a
single, well-known CSS file that contains all the style declarations
used by a certain class of file (e.g. all Project Gutenberg HTML files);
and c. every XML content file should reference a single CSS file with a
well-known file name but which is reserved exclusively for end-user use.

In this scenario, Project Gutenberg would maintain one, count them,
/one/, CSS file (per XML vocabulary). Every XML content file would be
designed to have /adequate/ styling using only the default presentation
for that vocabulary (I'm thinking primarily XHTML here) without the
inclusion of /any/ CSS, but would include links to include the base CSS
file and the user CSS file if they exist on the local file system.

Thus, IMHO, content file + content specific CSS file == bad, content
file + universal CSS file == good.

>> Indeed, the only request I have of the PGTEI project is to not 
>> claim that CSS-enabled TEI /cannot/ be done, or that it /should 
>> not/ be done (how I choose to waste my time is my business).
> 
> Let me say in no uncertain terms.  It CAN be done.  And I DON'T want
> anyone to think it shouldn't be done by others.  I just don't
> believe it should be done by PG[TEI] scripts.

This is what I have always believed, and have stated many times. I'm not 
asking you to adopt my belief system any more than I am willing to adopt 
yours. I am simply exploring the possibility of creating a new set of 
scripts which compete with yours, and which, given Project Gutenberg's 
historical even-handed approach to innovation, I'm sure it would be 
willing to host.

-- 
Nothing of significance below this line.

From Bowerbird at aol.com  Fri Nov 10 09:05:47 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 09:05:55 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
Message-ID: <cd8.2a24fd5.32860b6b@aol.com>

david said:
>    As a project manager at DP, I've never regarded 
>    the author and title fields as real metadata

in .zml, the author and title are considered
as _part_of_the_data_, not as "metadata"...

indeed, they are considered to be
a very _important_ part of the data,
so important that the title is expected
to be the first thing listed in the book.

the author is usually the second thing,
expect in cases where something else
supersedes it in importance -- such as
the name of the editor of an anthology.

anything else that follows in importance,
but which is still important enough to list,
is listed next to that in this first section...

of course, this kind of _transparency_ is 
the hallmark of .zml.

which is not to say that you cannot have a 
"metadata" section in a .zml file.   you can...
you can have any kind of section you want.
just start a section, and label it accordingly.

the philosophy is that if it is important enough
information about the file to be saved elsewhere,
it's important enough to be put into the file itself,
and done so in a way that it can be easily harvested
from the file itself, and thus regenerated any time...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/6adf09fd/attachment-0001.html
From Bowerbird at aol.com  Fri Nov 10 09:16:57 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 09:17:07 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <c50.6f33df1.32860e09@aol.com>

hey folks, here's a suggestion for consideration:
take the .tei negotiations to a different listserve!

as an incentive for you, i promise i would _not_ join
that list, so you will have no one making fun of you.

as the .tei efforts move forward, it's likely that you
will have a boatload of .tei "experts" coming here
and saying "what if you did it _this_ way instead?",
and the second-guessing will drive everyone crazy,
especially people who don't like swimming through
the acronym soup.   so please do us the big favor, eh?

think of it as your own little technoid play-pen, where
you can impress your peers with your spiffy tech-talk...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/d1342f5f/attachment.html
From Bowerbird at aol.com  Fri Nov 10 09:18:37 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 09:18:48 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info? (typo flame)
Message-ID: <cf3.1886829.32860e6d@aol.com>

i said:
>    the author is usually the second thing,
>    expect in cases where something else

haha, i said "expect" where i meant "except".
what an idiot i am!          :+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/f89f4bc3/attachment.html
From Bowerbird at aol.com  Fri Nov 10 12:47:46 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 12:47:51 2006
Subject: [gutvol-d] gvd06110 -- first-pass on a 2-up text-template for
	"babelfish"
Message-ID: <3f8.2dc78637.32863f72@aol.com>

here was my first-pass on a 2-up text-template for "babelfish":
>    http://www.greatamericannovel.com/scgi-bin/twocol.html

and marcello, it validates!

it's in xhtml 1.0 strict, and i seem to remember
some problem mentioned with that, but anyway...

you might remember the point of the .html version
-- aside from the obvious of putting it on the web --
is to enable conversions to formats like mobipocket,
rocketbook, plucker, palm ereader, .lit, and such, so
eventually i'll probably have to notch this back so
that it will work with all of those converters, but
for now i'm satisfied with getting the look down...

it's fragile, but seems to be working based on these screenshots:
>    http://www.greatamericannovel.com/scgi-bin/twocolcamino.jpg
>    http://www.greatamericannovel.com/scgi-bin/twocolsafari.jpg
>    http://www.greatamericannovel.com/scgi-bin/twocolie5.jpg

if anyone has any feedback, let me know...

thanks!   have a good weekend!

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/5a5fe966/attachment.html
From Catenacci at Ieee.Org  Fri Nov 10 12:58:20 2006
From: Catenacci at Ieee.Org (Onorio Catenacci)
Date: Fri Nov 10 12:58:59 2006
Subject: [gutvol-d] gvd06110 -- first-pass on a 2-up text-template for
	"babelfish"
In-Reply-To: <3f8.2dc78637.32863f72@aol.com>
References: <3f8.2dc78637.32863f72@aol.com>
Message-ID: <c26320b80611101258n2d7f44a7l2e768ae538d2c74b@mail.gmail.com>

On 11/10/06, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
> here was my first-pass on a 2-up text-template for "babelfish":
>  >   http://www.greatamericannovel.com/scgi-bin/twocol.html
>
>  and marcello, it validates!
>
>  it's in xhtml 1.0 strict, and i seem to remember
>  some problem mentioned with that, but anyway...
>
>  you might remember the point of the .html version
>  -- aside from the obvious of putting it on the web --
>  is to enable conversions to formats like mobipocket,
>  rocketbook, plucker, palm ereader, .lit, and such, so
>  eventually i'll probably have to notch this back so
>  that it will work with all of those converters, but
>  for now i'm satisfied with getting the look down...
>
>  it's fragile, but seems to be working based on these screenshots:
>  >
> http://www.greatamericannovel.com/scgi-bin/twocolcamino.jpg
>  >
> http://www.greatamericannovel.com/scgi-bin/twocolsafari.jpg
>  >
> http://www.greatamericannovel.com/scgi-bin/twocolie5.jpg
>
>  if anyone has any feedback, let me know...
>
>  thanks!  have a good weekend!

You may want to look at http://www.csszengarden.com.  I'm pretty sure
it's possible to get a two-column layout like what you're looking for
without having to embed <br/> tags.

Just a thought.

-- 
Onorio
From Bowerbird at aol.com  Fri Nov 10 13:40:36 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 13:40:43 2006
Subject: [gutvol-d] gvd06110 -- first-pass on a 2-up text-template for
	"babelfish"
Message-ID: <cf0.18db178.32864bd4@aol.com>

onorio said:
>    You may want to look at http://www.csszengarden.com.

can you be more specific?         :+)


>    I'm pretty sure it's possible 
>    to get a two-column layout 
>    like what you're looking for
>    without having to embed <br/> tags.

oh, ok, well yes, that would be good feedback, onorio.

except i'm not using the break tags to create the columns.

they were just in there because -- up to this point -- i've
been maintaining the linebreaks from the p-book, to make
the comparison with the page-scan maximally easy to do...

but you're right that since this 2-up text-template is essentially
the next step _beyond_ that, the original linebreaks might be
thought of as unnecessary.

so here's an example where they are taken out:
>    http://www.greatamericannovel.com/scgi-bin/twocolnobr.html

this example also changes to book-style _indented paragraphs
from the blank-line-as-paragraph-separator model used before.
and, as you'd expect, the original end-line hyphenates are joined.
further changes would be the segue over to curly "smart" quotes
and proper typographic em-dashes, but i didn't bother with that.

of course, you could also argue from this standpoint that
now the original _pagebreaks_ are unnecessary as well,
and you'd be right about that too.   and indeed one of my
future examples will go on to take _that_ step as well, but
we're just doing one step at a time here now...              :+)

-bowerbird

p.s.   but, if you're interested, if a reader opts to abandon
original pagebreaks, then the text-chunking mechanism
becomes how much text can be displayed on one screen,
given the current window-size and their chosen text-size.

this starts to get complicated, to the point that we might
need to switch up from perl to java or flash or something.

unless, of course, we opt for the simple solution and just
display a whole section -- or even the whole book -- in
a single column and force the person to scroll the thing.

however, any respectable e-book designer would rather
throw up than resort to that inferior interface ugliness,
because proper paging increases readability immensely.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/1dda3ba4/attachment.html
From ke at gnu.franken.de  Fri Nov 10 23:50:59 2006
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri Nov 10 23:51:42 2006
Subject: Seeking whitewashers,
	errata handlers (Re: [gutvol-d] Earliest DP texts stripped of source
	info?)
In-Reply-To: <20061109191715.GC26863@mail.pglaf.org> (Greg Newby's message of
	"Thu\, 9 Nov 2006 11\:17\:15 -0800")
References: <754776753.20061108133237@noring.name>
	<6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>
	<20061109191715.GC26863@mail.pglaf.org>
Message-ID: <sh8xii4e1o.fsf@tux.gnu.franken.de>

Greg Newby <gbnewby@pglaf.org> writes:

> Mostly we're just trying to save the errata reports, because we don't
> have enough staff to act on them.  Many will not get attention until
> there are people willing to investigate and apply the changes.

That's a good thing.  It would be nice if you could publish the reports
as add-ons to the projects.

I hope you never ever apply reports without careful checking.  Often,
words that look wrong are actually written that way by the author.

-- 
http://www.gnu.franken.de/ke/                           |      ,__o
                                                        |    _-\_<,
                                                        |   (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4  AA7F C90A 35C3 E9D0 5D1C
From ke at gnu.franken.de  Fri Nov 10 23:38:49 2006
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri Nov 10 23:56:28 2006
Subject: [gutvol-d] Re: Project Gutenberg
In-Reply-To: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be> (Johan
	Boelaert's message of "Wed\, 8 Nov 2006 19\:40\:56 +0100")
References: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be>
Message-ID: <shd57u4ely.fsf@tux.gnu.franken.de>

"Johan Boelaert" <j.boelaert@skynet.be> writes:

> Every now and then I visit Project Gutenberg Europe, hoping something has
> changed to improve it. But today also, it hasn't. 
>
> On the homepage (http://pge.rastko.net/), we read: "Project Gutenberg Europe
> is follower of Project Gutenberg philosophy, focused primarily on digitizing
> European cultures, under European copyright laws."  If this were true, it
> would be an excellent opportunity to publish European books online, which
> can't be published in Gutenberg U.S., because of the much more severe
> copyright laws over there.

Finding these books from pge does not seem that easy, but for all I
know, the 350+ books already done at pg-eu are listed in the PG catalog.

I can go for the "gold" books listed at http://dp.rastko.net/ or simply
browse here:

    http://pge.rastko.net/dirs/pge/

At dp.rastko.net, I am pushing Friedrich Gundolf's "Romantiker" essays
through the rounds which were published 1930/32 in two books.  Gundolf
died in 1931.

-- 
http://www.gnu.franken.de/ke/                           |      ,__o
                                                        |    _-\_<,
                                                        |   (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4  AA7F C90A 35C3 E9D0 5D1C
From sly at victoria.tc.ca  Sat Nov 11 00:13:48 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Nov 11 00:13:51 2006
Subject: Seeking whitewashers, errata handlers (Re: [gutvol-d] Earliest
	DP texts stripped of source info?)
In-Reply-To: <sh8xii4e1o.fsf@tux.gnu.franken.de>
References: <754776753.20061108133237@noring.name>
	<6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>
	<20061109191715.GC26863@mail.pglaf.org>
	<sh8xii4e1o.fsf@tux.gnu.franken.de>
Message-ID: <Pine.GSO.4.58.0611110004080.18385@vtn1.victoria.tc.ca>

Karl:

Yes, that's very true. That's why the ideal person
for helping would be someone very detail oriented who
has some idea of the wide variety of texts that can
be found in PG. Also some patience... I remember Jim
Tinsley saying once that about 50% of the "suggested
corrections" he received were actually incorrect.

>From the catalog point of view, I remember that before
the catalog had its current form, I got more than a few
emails saying "Why do you have Winston Churchill's
dates wrong?", to which I would explain each time that
there was an American author called Winston Churchill,
who was not the same person as the British politician.

Another thing that can happen easily is somthing that
looks like it will be a simple, quick fix, leads to a
text that would take many hours of work, to get up to
the standard we expect from PG texts today.

Andrew

On Sat, 11 Nov 2006, Karl Eichwalder wrote:

> Greg Newby <gbnewby@pglaf.org> writes:
>
> > Mostly we're just trying to save the errata reports, because we don't
> > have enough staff to act on them.  Many will not get attention until
> > there are people willing to investigate and apply the changes.
>
> That's a good thing.  It would be nice if you could publish the reports
> as add-ons to the projects.
>
> I hope you never ever apply reports without careful checking.  Often,
> words that look wrong are actually written that way by the author.
>
>
From hyphen at hyphenologist.co.uk  Sat Nov 11 00:35:55 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Nov 11 00:36:06 2006
Subject: Seeking whitewashers,
	errata handlers (Re: [gutvol-d] Earliest DP texts stripped of source
	info?)
In-Reply-To: <Pine.GSO.4.58.0611110004080.18385@vtn1.victoria.tc.ca>
References: <754776753.20061108133237@noring.name>
	<6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>
	<20061109191715.GC26863@mail.pglaf.org>
	<sh8xii4e1o.fsf@tux.gnu.franken.de>
	<Pine.GSO.4.58.0611110004080.18385@vtn1.victoria.tc.ca>
Message-ID: <qh2bl25jjbo2ij9tnctnv3f7a3rttq8805@4ax.com>

On Sat, 11 Nov 2006 00:13:48 -0800 (PST),  Andrew Sly <sly@victoria.tc.ca>
wrote:

|I remember Jim
|Tinsley saying once that about 50% of the "suggested
|corrections" he received were actually incorrect.

Also 50% of suggested errors by the whitewashers are actually incorrect or
errors in the paper copy.

"Hiding to nothing" or "Rock and Hard Place" spring to mind :-(
-- 
Dave Fawthrop <dave hyphenologist co uk> 
"Intelligent Design?" my knees say *not*. 
"Intelligent Design?" my back says *not*.
More like "Incompetent design". Sig (C) Copyright Public Domain

From Bowerbird at aol.com  Sat Nov 11 11:44:36 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Nov 11 11:44:43 2006
Subject: [gutvol-d] gvd061111 -- erecting more dynamic intralinear tensions
Message-ID: <473.8f46abd.32878224@aol.com>


i'm going on a very dangerous mission.
they asked for volunteers.   it's very
dangerous, but someone has to do it.
i'll write you the instant i get back.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061111/086e49d9/attachment.html
From lee at novomail.net  Sat Nov 11 15:25:31 2006
From: lee at novomail.net (Lee Passey)
Date: Sat Nov 11 15:24:52 2006
Subject: [gutvol-d] TEI to XHTML transformations
In-Reply-To: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
Message-ID: <45565BEB.9020003@novomail.net>

joshua@hutchinson.net wrote:

>  Honestly, this is the feedback I'm most interested in from your
>  message.  *What* isn't good?  *What* can we improve?  (And keep in
>  mind, the system allows for a LOT of CSS styling that can be
>  specified in the master document that I don't use.  I'm one that
>  likes simple clean layouts, but you can get crazy)

For this analysis I started with two documents: _Alice in Wonderland_ 
(GutenText #11) available through Marcello's examples page at 
http://pgtei.pglaf.org/marcello/0.4/examples/alice/ (which, BTW, is not 
available via the regular download process at gutenberg.org). I also 
chose to examine _The Siouan Indians_ (GutenText #19628) on the 
assumption that as a very recent TEI document it more likely reflect the 
most recent evolution of the PGTEI XSL scripts. This assumption may have 
been unnecessary; if the XHTML files are generated dynamically using 
XSLT each time they are downloaded then an end user will /always/ get 
the result of the most recent iteration of the transformation tools no 
matter when the master TEI file was generated. But at least it does no harm.

This reply may seem a bit disjointed as I am composing it as I examine 
the files, and I am recording my notes as I encounter things.

Having downloaded the generated XHTML files I ran them through HTML Tidy 
to regularize the presentation for human analysis and to convert the 
native UTF-8 encoding to ASCII. Older versions of Tidy complained about 
"nested emphasis" when encountering the construct "<span 
class='whatever'><span style='whatever'>...</span></span>" In these 
cases it is more elegant to simply combine the two <span> elements into 
a single element when they are co-terminous (and I vaguely recall that I 
may have added code to Tidy to do that in a subsequent release; but I 
could be wrong). In any case, the construct, while inelegant, is not 
incorrect, and given the limitation of XSL transformations it is 
probably not objectionable.

The next thing I noticed when examining the XHTML file is that /every/ 
<p> tag is qualified with both a "class" attribute and a "style" 
attribute. As I think I have made clear by now, this is totally 
unacceptable. The amount of effort required to replace all of these 
style attributes with what it should really be ("style='margin: 0; 
text-indent: 2em") is simply too great. On closer examination you will 
see that every <p> element is also qualified with a "class='tei-p'" 
attribute, and the <style> declarations at the beginning of the file set 
the style for <p> elements having the 'tei-p' class exactly like the 
style attribute does. So not only do you make it very difficult for the 
end user to edit the file, and not only do you increase the file size 
unnecessarily by including the style attribute, in the end it doesn't 
make one bit of difference because the presentation will be identical 
with or without the attribute.

The W3C's CSS working group has prepared a sample CSS stylesheet that 
they believe "describes the typical formatting of all HTML 4.0 
([HTML40]) elements based on extensive research into current UA 
practice" (http://www.w3.org/TR/CSS2/sample.html). According to that 
sample, the default rendering for a <p> element is: "{ display: block; 
margin: 1.33em 0; }". Further, the CSS2 specification indicates that the 
initial value for a "text-indent" property is zero. So taking a step 
back in examining the PG-TEI-XHTML internal styles we see that the rule 
for "p.tei-p" is virtually identical to the default values for "current 
UA practice" for the <p> element. Thus we can see that not only is the 
"style" attribute attached to each paragraph redundant, identifying each 
<p> element as "class='tei-p'" is also redundant.

The same analysis applies to the ".tei" CSS rule in the internal 
stylesheet; in fact, the ".tei" rule can create unintended consequences. 
First, it should be noted that the ".tei" rule does not alter the 
default CSS presentation; it simply reiterates the default values set 
out in the CSS2 specification. So in the first instance it is 
irrelevant. It may additionally have subtle, unintended consequences.

For example, consider the situation where I have a multiple-paragraph 
quotation, which was all italicized in the original text. In TEI the 
passage could be (simplistically) marked as:

<quote rend="block italic">
<p>...</p>
<p>...</p>
<p>...</p>
</quote>

Presumably, the TEI2XHTML XSLT script will convert this to something like:

<blockquote style="font-style: italic">
<p class="tei tei-p">...</p>
<p class="tei tei-p">...</p>
<p class="tei tei-p">...</p>
</blockquote>

A semi-sophisticated end-user may wonder why the block that is supposed 
to be italicized, isn't.  The reason, of course, is that the font is set 
to italic when the style attribute on the <blockquote> element is 
encountered, but when the 'tei' value on the p[class] attribute is 
encountered it sets the font style back to normal.

IIRC, when we discussed TEI last year, Marcello was strongly opposed to 
the idea of creating a TEI file that was capable of "round-tripping," 
that is, being converted to XHTML then back to TEI such that the 
resultant TEI file was canonically equivalent to the original file. If 
you're not interested in "round-tripping" then specifying a 'tei' class 
has absolutely no value at all.

Thus, I would make the following suggestions:

1. Do a direct conversion of the TEI <p> element to the HTML <p> element 
with no additional attributes.
2. Remove the style definition for 'tei-p'.
3. Remove the style definition for 'tei'.
4. Review all the style definitions beginning with 'tei-', comparing 
them with the default HTML rendering described by the W3C. Remove all 
style definitions which are duplicates of the default styles. Examine 
very carefully those style definitions which are /not/ duplicates of the 
default HTML styles and retain only those for which there is a 
compelling reason to change the default (my guess is that you won't be 
able to find any, and all 'tei-' classes will be discarded).

I did note a fair amount of <p> element abuse in the resulting XHTML 
file, but it appears that it is a result of <p> element abuse in the 
original TEI file. Because this message is not designed to comment on 
the PGTEI guidelines, but only on the conversion tools, I will say only 
that blocks which are /not/ paragraphs but have paragraph-like 
presentation should be represented in TEI with the <ab> element and not 
the <p> element. The conversion tools should then convert <p> to <p> and 
<ab> to <div>.

Why you would want to force me to put up with justified text is 
incomprehensible. Get rid of all references to "text-align: justify."

I like the way you use lists in your generated Tables of Contents (so 
many people abuse the <table> element or the <p> element to create 
ToCs). What you have done, however, is to create a single list and have 
attempted to create a hierarchical structure by using styles to attempt 
to offset the list items. Not only is using styles to try and indicate 
structure a bad idea, it's going to fall apart if you're using a User 
Agent which doesn't support style sheets. Instead of a Table of Contents 
that looks like this:

<ul class="tei tei-index tei-index-toc">
<li><a href="#toc1">THE SIOUAN STOCK</a></li>
<li style="margin-left: 2em"><a href="#toc3">DEFINITION</a></li>
<li style="margin-left: 4em"><a href="#toc5">EXTENT OF THE STOCK</a></li>
<li style="margin-left: 4em"><a href="#toc7">TRIBAL NOMENCLATURE</a></li>
<li style="margin-left: 4em"><a href="#toc9">PRINCIPAL CHARACTERS</a></li>
<li style="margin-left: 6em"><a href="#toc11">PHONETIC AND GRAPHIC 
ARTS</a></li>
<li style="margin-left: 6em"><a href="#toc13">INDUSTRIAL AND ESTHETIC 
ARTS</a></li>
<li style="margin-left: 6em"><a href="#toc15">INSTITUTIONS</a></li>
<li style="margin-left: 6em"><a href="#toc17">BELIEFS</a></li>
<li style="margin-left: 8em"><a href="#toc19">THE DEVELOPMENT OF 
MYTHOLOGY</a></li>
<li style="margin-left: 8em"><a href="#toc21">THE SIOUAN MYTHOLOGY</a></li>
<li style="margin-left: 6em"><a href="#toc23">SOMATOLOGY</a></li>
<li style="margin-left: 2em"><a href="#toc25">HABITAT</a></li>
<li style="margin-left: 2em"><a href="#toc27">ORGANIZATION</a></li>
<li style="margin-left: 2em"><a href="#toc29">HISTORY</a></li>
<li style="margin-left: 4em"><a href="#toc31">DAKOTA-ASINIBOIN</a></li>
<li style="margin-left: 4em"><a href="#toc33">&#162;EGIHA</a></li>
<li style="margin-left: 4em"><a href="#toc35">&#647;&#390;IWE'RE</a></li>
<li style="margin-left: 4em"><a href="#toc37">WINNEBAGO</a></li>
<li style="margin-left: 4em"><a href="#toc39">MANDAN</a></li>
<li style="margin-left: 4em"><a href="#toc41">HIDATSA</a></li>
<li style="margin-left: 4em"><a href="#toc43">THE EASTERN AND SOUTHERN 
TRIBES</a></li>
<li style="margin-left: 4em"><a href="#toc45">GENERAL MOVEMENTS</a></li>
<li><a href="#toc47">SOME FEATURES OF INDIAN SOCIOLOGY</a></li>
</ul>

I would recommend creating a Table of Contents that looks like this:

<ul class="toc">
 <li><a href="#toc1">THE SIOUAN STOCK</a></li>
 <li><ul>
  <li><a href="#toc3">DEFINITION</a></li>
  <li><ul>
   <li><a href="#toc5">EXTENT OF THE STOCK</a></li>
   <li><a href="#toc7">TRIBAL NOMENCLATURE</a></li>
   <li><a href="#toc9">PRINCIPAL CHARACTERS</a></li>
   <li><ul>
    <li><a href="#toc11">PHONETIC AND GRAPHIC ARTS</a></li>
    <li><a href="#toc13">INDUSTRIAL AND ESTHETIC ARTS</a></li>
    <li><a href="#toc15">INSTITUTIONS</a></li>
    <li><a href="#toc17">BELIEFS</a></li>
    <li><ul>
     <li><a href="#toc19">THE DEVELOPMENT OF MYTHOLOGY</a></li>
     <li><a href="#toc21">THE SIOUAN MYTHOLOGY</a></li>
    </ul></li>
    <li style="margin-left: 6em"><a href="#toc23">SOMATOLOGY</a></li>
   </ul></li>
  </ul></li>
  <li><a href="#toc25">HABITAT</a></li>
  <li><a href="#toc27">ORGANIZATION</a></li>
  <li><a href="#toc29">HISTORY</a></li>
  <li><ul>
   <li><a href="#toc31">DAKOTA-ASINIBOIN</a></li>
   <li><a href="#toc33">&#162;EGIHA</a></li>
   <li><a href="#toc35">&#647;&#390;IWE'RE</a></li>
   <li><a href="#toc37">WINNEBAGO</a></li>
   <li><a href="#toc39">MANDAN</a></li>
   <li><a href="#toc41">HIDATSA</a></li>
   <li><a href="#toc43">THE EASTERN AND SOUTHERN TRIBES</a></li>
   <li><a href="#toc45">GENERAL MOVEMENTS</a></li>
  </ul></li>
 </ul></li>
 <li><a href="#toc47">SOME FEATURES OF INDIAN SOCIOLOGY</a></li>
</ul>

(The indentation was added just so I could keep track of closing tags).

In this case the structure of the document is reflected in the structure 
of the Table of Contents, and it is more robust when faced with various 
User Agents.

Many handheld User Agents, such as ?Book, Microsoft Reader, and 
MobiPocket Reader, understand the "page-break-before" attribute, so I 
wouldn't omit it as you appear to have done (personally I like a page 
break before every chapter, so each chapter begins at the top of the 
page). Unfortunately, not all of these User Agents understand the 
attribute when it is applied to elements other than <br> so I would 
recommend placing a "<br style='page-break-before:always' />" before 
every element which would otherwise contain the attribute.

Please, please, please get rid of all the style attributes on individual 
elements, unless you can clearly articulate a compelling reason for its use.

I don't like the use of Definition Lists for footnotes. It may give you 
the presentation you want, but it's not what it was designed for. I 
prefer the use of two <div>s; one for a footnote group and one for the 
note itself, such as this:

<div class="footnote">
<h3>Footnotes</h3>
<div class="note"><a id="note_1" name="note_1" 
href="#noteref_1">[1.]</a> Prepared as a complement and introduction to 
the following paper oil "Siouan Sociology," by the late James Owen 
Dorsey.</div>

<div class="note"><a id="note_2" name="note_2" 
href="#noteref_2">[2.]</a> "A synopsis of the Indian tribes ... in North 
America," Trans, and Coll. Am. Antiq. Soc., vol. II, p. 120.</div>

<div class="note"><a id="note_3" name="note_3" href="#noteref_3">3.</a> 
"Indian linguistic families of America north of Mexico," Seventh Annual 
Report of the Bureau of Ethnology, for 1885-86 (1891), pp. 111-118. 
Johnson's Cyclopedia, 1893-95 edition, vol. VII, p. 546, etc.</div>

<div class="note"><a id="note_4" name="note_4" href="#noteref_4">4.</a> 
Correspondence with the Bureau of Ethnology.</div>
...
</div>

I think if you follow these suggestions you will have a file which is 
presentationally as nice as what you had before (or perhaps even 
presentationally identical), which is more flexible, and which will 
retain an acceptable presentation even in those User Agents which have 
incomplete support for CSS.

To be continued....

From lee at novomail.net  Sat Nov 11 15:41:16 2006
From: lee at novomail.net (Lee Passey)
Date: Sat Nov 11 15:40:31 2006
Subject: [gutvol-d] here's the perl code for babelfish assignment 01
In-Reply-To: <4bf.2aa73da.3269eae7@aol.com>
References: <4bf.2aa73da.3269eae7@aol.com>
Message-ID: <45565F9C.6050404@novomail.net>

Bowerbird@aol.com wrote:
> here's the perl version for babelfish assignment 01,
> which was to read a file in and spit it to a webpage
> with [pre], a simple task accomplished in 11 lines...

You know, I haven't really been paying much attention to this babelfish 
thingey; a program to convert from ZML to HTML is really not that 
interesting or useful. But it occurs to me that since ZML is pretty much 
ASCII only, and the markup is subtle and generally unnoticeable, ZML 
documents ought to be acceptable to Project Gutenberg as the PVA 
alternative to well-structured text. Thus, an HTML to ZML convert 
/could/ be useful.

So how about it Bowerbird, can you give us that instead?

From lee at novomail.net  Sat Nov 11 16:49:40 2006
From: lee at novomail.net (Lee Passey)
Date: Sat Nov 11 16:49:08 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <4553B192.10005@perathoner.de>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>	<45521D1D.10801@novomail.net>	<455369B6.3070203@perathoner.de>	<455391C4.9030407@novomail.net>
	<4553B192.10005@perathoner.de>
Message-ID: <45566FA4.1020803@novomail.net>

Marcello Perathoner wrote:

>  Everything PGTEI is, and always has been, GPLed and online.
>
>  Start here:
>
>  http://pgtei.pglaf.org/marcello/0.4/
>

What XSLT script engine are you using? I've had the most experience with 
Xalan, is there one you think is better?


From sly at victoria.tc.ca  Sat Nov 11 20:18:23 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Nov 11 20:18:26 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
	<20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>
Message-ID: <Pine.GSO.4.58.0611112010140.4772@vtn1.victoria.tc.ca>

Thanks Tony (and others)

Right now, all I'm aiming for is a list of PG numbers,
so this is great. Before cross-checking with your list
here, I've got ~70 items.

For individual stories which occur in other collections, what
about the idea of making some kind of "PG Christmas anthology"?
including them together? Or is that too "1990s-PG"...

Andrew

On Fri, 10 Nov 2006, Tony Baechler wrote:

> Yes, I've already done a lot of this.  Here is my list of them.  To
> my knowledge, this includes all of them to date although I could be
> missing a few.  If there is a Christmas story included, it's listed
> even though the rest of the text may not have anything to do with
> Christmas or holidays.  Also there are others not listed because I've
> already read them, such as _A Kidnapped Santa Clause_ by Baum,
> etc.  If you need exact titles, please ask.  This was what I found
> with grep, so my apologies for the lack of decent formatting but this
> at least gives you a list to work from.
>

[snip list]
From sly at victoria.tc.ca  Sun Nov 12 00:11:08 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Nov 12 00:11:15 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
	<20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>
Message-ID: <Pine.GSO.4.58.0611112357590.6307@vtn1.victoria.tc.ca>


Ok, below is a sorted list of PG numbers, representing
76 christmas-related items in the collection.

Note this does not include texts that include one or
two christmas stories mixed with others. I will be
dealing with those separately.

Would someone be able to help generate a list of
titles from this I could use for a web page?

(I've spent hours on this already, and I'm going
to be stubborn now, and avoid manually copying out
the titles for each of these.)


   46
  519
  520
  644
  676
 1467
 2731
 4384
 5061
 6670
 7256
 8694
 9523
10524
10528
10723
10813
10825
11014
12610
12881
12974
13024
13158
13213
13244
14508
14534
14535
14572
14573
14606
14624
14629
14653
14667
14677
14713
14785
14786
14788
14829
14835
14946
15044
15133
15343
15552
15709
15826
16134
16454
17006
17135
17382
17456
17510
17562
17630
17679
17743
17761
17770
17937
18570
18720
18725
18770
19014
19084
19098
19337
19384
19425
19587
19608

From gbnewby at pglaf.org  Sun Nov 12 02:32:01 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Nov 12 02:32:04 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <Pine.GSO.4.58.0611112357590.6307@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
	<20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>
	<Pine.GSO.4.58.0611112357590.6307@vtn1.victoria.tc.ca>
Message-ID: <20061112103201.GA17913@mail.pglaf.org>

On Sun, Nov 12, 2006 at 12:11:08AM -0800, Andrew Sly wrote:
> 
> Ok, below is a sorted list of PG numbers, representing
> 76 christmas-related items in the collection.

Neat!  I went ahead and created an ISO via the online tool.

	http://snowy.arsc.alaska.edu/pgiso

If all's well it should be here shortly:

	ftp://snowy.arsc.alaska.edu/pgiso/isos/gbnewby/xmas.iso

But I'll check later to check that it arrived.  About
625MB, including these 76 titles in all available file
formats (290 files total).

Thanks!
  -- Greg

> Note this does not include texts that include one or
> two christmas stories mixed with others. I will be
> dealing with those separately.
> 
> Would someone be able to help generate a list of
> titles from this I could use for a web page?
> 
> (I've spent hours on this already, and I'm going
> to be stubborn now, and avoid manually copying out
> the titles for each of these.)
> 
> 
>    46
>   519
>   520
>   644
>   676
>  1467
>  2731
>  4384
>  5061
>  6670
>  7256
>  8694
>  9523
> 10524
> 10528
> 10723
> 10813
> 10825
> 11014
> 12610
> 12881
> 12974
> 13024
> 13158
> 13213
> 13244
> 14508
> 14534
> 14535
> 14572
> 14573
> 14606
> 14624
> 14629
> 14653
> 14667
> 14677
> 14713
> 14785
> 14786
> 14788
> 14829
> 14835
> 14946
> 15044
> 15133
> 15343
> 15552
> 15709
> 15826
> 16134
> 16454
> 17006
> 17135
> 17382
> 17456
> 17510
> 17562
> 17630
> 17679
> 17743
> 17761
> 17770
> 17937
> 18570
> 18720
> 18725
> 18770
> 19014
> 19084
> 19098
> 19337
> 19384
> 19425
> 19587
> 19608
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
From cannona at fireantproductions.com  Sun Nov 12 05:54:55 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Nov 12 05:58:15 2006
Subject: [gutvol-d] Christmas-related texts in PG
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca><20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net><Pine.GSO.4.58.0611112357590.6307@vtn1.victoria.tc.ca>
	<20061112103201.GA17913@mail.pglaf.org>
Message-ID: <009401c70662$95e3e290$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This should fit nicely on a cd.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Greg Newby" <gbnewby@pglaf.org>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Sunday, November 12, 2006 4:32 AM
Subject: Re: [gutvol-d] Christmas-related texts in PG


> On Sun, Nov 12, 2006 at 12:11:08AM -0800, Andrew Sly wrote:
>>
>> Ok, below is a sorted list of PG numbers, representing
>> 76 christmas-related items in the collection.
>
> Neat!  I went ahead and created an ISO via the online tool.
>
> http://snowy.arsc.alaska.edu/pgiso
>
> If all's well it should be here shortly:
>
> ftp://snowy.arsc.alaska.edu/pgiso/isos/gbnewby/xmas.iso
>
> But I'll check later to check that it arrived.  About
> 625MB, including these 76 titles in all available file
> formats (290 files total).
>
> Thanks!
>  -- Greg
>
>> Note this does not include texts that include one or
>> two christmas stories mixed with others. I will be
>> dealing with those separately.
>>
>> Would someone be able to help generate a list of
>> titles from this I could use for a web page?
>>
>> (I've spent hours on this already, and I'm going
>> to be stubborn now, and avoid manually copying out
>> the titles for each of these.)
>>
>>
>>    46
>>   519
>>   520
>>   644
>>   676
>>  1467
>>  2731
>>  4384
>>  5061
>>  6670
>>  7256
>>  8694
>>  9523
>> 10524
>> 10528
>> 10723
>> 10813
>> 10825
>> 11014
>> 12610
>> 12881
>> 12974
>> 13024
>> 13158
>> 13213
>> 13244
>> 14508
>> 14534
>> 14535
>> 14572
>> 14573
>> 14606
>> 14624
>> 14629
>> 14653
>> 14667
>> 14677
>> 14713
>> 14785
>> 14786
>> 14788
>> 14829
>> 14835
>> 14946
>> 15044
>> 15133
>> 15343
>> 15552
>> 15709
>> 15826
>> 16134
>> 16454
>> 17006
>> 17135
>> 17382
>> 17456
>> 17510
>> 17562
>> 17630
>> 17679
>> 17743
>> 17761
>> 17770
>> 17937
>> 18570
>> 18720
>> 18725
>> 18770
>> 19014
>> 19084
>> 19098
>> 19337
>> 19384
>> 19425
>> 19587
>> 19608
>>
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFFVyhyI7J99hVZuJcRAhjlAKC2A/+v9c600Eiw0ZVRN226kin63gCfTQy7
r6cEhnQhWsTAIZkGu8eDn/Y=
=m4zM
-----END PGP SIGNATURE-----

From cannona at fireantproductions.com  Sun Nov 12 06:45:35 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Nov 12 06:46:06 2006
Subject: [gutvol-d] Christmas-related texts in PG
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca><20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>
	<Pine.GSO.4.58.0611112357590.6307@vtn1.victoria.tc.ca>
Message-ID: <00a401c70669$46c9c6f0$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here you go.  The list will probably strip the attachment, so I'm sending it
to you directly as well.

If you amend your list, or if you want more catalog data on each book, such
as size, file name, ETC. let me know.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Andrew Sly" <sly@victoria.tc.ca>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Sunday, November 12, 2006 2:11 AM
Subject: Re: [gutvol-d] Christmas-related texts in PG


>
> Ok, below is a sorted list of PG numbers, representing
> 76 christmas-related items in the collection.
>
> Note this does not include texts that include one or
> two christmas stories mixed with others. I will be
> dealing with those separately.
>
> Would someone be able to help generate a list of
> titles from this I could use for a web page?
>
> (I've spent hours on this already, and I'm going
> to be stubborn now, and avoid manually copying out
> the titles for each of these.)
>
>
>   46
>  519
>  520
>  644
>  676
> 1467
> 2731
> 4384
> 5061
> 6670
> 7256
> 8694
> 9523
> 10524
> 10528
> 10723
> 10813
> 10825
> 11014
> 12610
> 12881
> 12974
> 13024
> 13158
> 13213
> 13244
> 14508
> 14534
> 14535
> 14572
> 14573
> 14606
> 14624
> 14629
> 14653
> 14667
> 14677
> 14713
> 14785
> 14786
> 14788
> 14829
> 14835
> 14946
> 15044
> 15133
> 15343
> 15552
> 15709
> 15826
> 16134
> 16454
> 17006
> 17135
> 17382
> 17456
> 17510
> 17562
> 17630
> 17679
> 17743
> 17761
> 17770
> 17937
> 18570
> 18720
> 18725
> 18770
> 19014
> 19084
> 19098
> 19337
> 19384
> 19425
> 19587
> 19608
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFFVzOrI7J99hVZuJcRAvZRAJ9UZI2n/+uXj+VgkK2RJSz++R3JPwCglkWe
8zHXXVU7jBcUWncAjsTeDlQ=
=IPj+
-----END PGP SIGNATURE-----
-------------- next part --------------
id	author	title
46	Dickens, Charles, 1812-1870	A Christmas Carol
519	Baum, L. Frank (Lyman Frank), 1856-1919	A Kidnapped Santa Claus
520	Baum, L. Frank (Lyman Frank), 1856-1919	Life and Adventures of Santa Claus
644	Dickens, Charles, 1812-1870	The Haunted Man and the Ghost's Bargain
676	Dickens, Charles, 1812-1870	The Battle of Life
1467	Dickens, Charles, 1812-1870	Some Christmas Stories
2731	Thackeray, William Makepeace, 1811-1863	The Christmas Books
4384	Van Dyke, Henry, 1852-1933	The Lost Word, Christmas stories
5061		The Children's Book of Christmas Stories
6670	Browning, Robert, 1812-1889	Christmas Eve
7256	Henry, O., 1862-1910	The Gift of the Magi
8694	Alcott, Louisa May, 1832-1888	The Abbot's Ghost, or Maurice Treherne's Temptation\nA Christmas Story
9523	Thackeray, William Makepeace, 1811-1863	The Christmas Books
10524	McGuinn, Roger	We Wish You a Merry Christmas
10528	McGuinn, Roger	I Heard the Bells on Christmas Day
10723	Stowe, Harriet Beecher, 1811-1896	Betty's Bright Idea; Deacon Pitkin's Farm; and the First Christmas of New England
10813	Boyd, Mary Stuart, -1937	A Versailles Christmas-Tide
10825	Altamirano, Ignacio Manuel, 1834-1893	La Navidad en las Monta?as
11014	Smith, Elva S.	Christmas in Legend and Story\nA Book for Boys and Girls
12610	Carr, Annie Roe	Nan Sherwood's Winter Holidays\nRescuing the Runaways
12881	Hughes, Rupert, 1872-1956	Mrs. Budlong's Chrismas Presents
12974	Kellogg, Alice Maude, 1862-1911	Christmas Entertainments
13024	Dandurand, Josephine Marchand, 1862-1925	Contes de No?l par Josette
13158	Hocking, Joseph	Weapons of Mystery
13213	Various	The Night Before Christmas and Other Popular Stories For Children
13244	Various	Punch Among the Planets
14508	Knapp, Shepherd	The Christmas Dinner
14534	Finley, Martha, 1828-1909	Christmas with Grandma Elsie
14535	Stevenson, Robert Louis, 1850-1894	A Christmas Sermon
14572	Van Dyke, Henry, 1852-1933	The Spirit of Christmas
14573	Rinehart, Mary Roberts, 1876-1958	The Truce of God
14606	Brady, Cyrus Townsend, 1861-1920	And Thus He Came\nA Christmas Fantasy
14624	Page, Thomas Nelson, 1835-1922	Santa Claus's Partner
14629	Snowden, James H. (James Henry), 1852-1936	A Wonderful Night; An Interpretation Of Christmas
14653	Bennett, Arnold, 1867-1931	The Feast of St. Friend
14667	Beerbohm, Max, Sir, 1872-1956	A Christmas Garland
14677	Mirmont, Madame Henri de la Ville de	Contes de No?l
14713	Chabot, Alphonse	No?l dans les pays ?trangers
14785	Knapp, Shepherd	Down the Chimney
14786	Knapp, Shepherd	Up the Chimney
14788	Chabot, Alphonse	La nuit de No?l dans tous les pays
14829	Various	Our Holidays\nTheir Meaning and Spirit; retold from St. Nicholas
14835	Miller, Alice Duer, 1874-1942	The Burglar and the Blizzard\nA Christmas Story
14946	Cutting, Mary Stewart Doubleday, 1851-1924	The Blossoming Rod
15044	Nicholson, Meredith, 1866-1947	A Reversible Santa Claus
15133	Francis, Stella M.	Campfire Girls in the Allegheny Mountains\nor, A Christmas Success against Odds
15343	Brady, Cyrus Townsend, 1861-1920	A Little Book for Christmas
15552	Dawson, Coningsby (Coningsby William), 1883-1959	Christmas Outside of Eden
15709	Brown, Abbie Farwell, 1871-1927	The Christmas Angel
15826	Dalrymple, Leona, 1884-	Uncle Noah's Christmas Inspiration
16134	Van Dyke, Henry, 1852-1933	The First Christmas Tree\nA Story of the Forest
16454	Barclay, Florence L. (Florence Louisa), 1862-1921	The Upas Tree\nA Christmas Story for all the Year
17006	Parker, Theodore, 1810-1860	Two Christmas Celebrations
17135	Moore, Clement Clarke, 1779-1863	Twas the Night before Christmas\nA Visit from St. Nicholas
17382	Moore, Clement Clarke, 1779-1863	A Visit From Saint Nicholas
17456	Wiggin, Kate Douglas Smith, 1856-1923	The Romance of a Christmas Card
17510	Dalrymple, Leona, 1884-	When the Yule Log Burns\nA Christmas Story
17562	Armstrong, H. S.	Trifles for the Christmas Holidays
17630	Field, Eugene, 1850-1895	Christmas Tales and Christmas Verse
17679	Hope, Laura Lee	The Story of a Nodding Donkey
17743	Williamson, Alice Muriel, -1933;Williamson, Charles Norris, 1859-1920	Rosemary\nA Christmas story
17761	Hope, Laura Lee	Six Little Bunkers at Grandpa Ford's
17770	Various	Christmas Stories And Legends
17937	Butler, Ellis Parker, 1869-1937	The Thin Santa Claus\nThe Chicken Yard That Was a Christmas Stocking
18570	Pringle, Mary Poague;Urann, Clara A.	Yule-Tide in Many Lands
18720		In the Yule-Log Glow, Book I\nChristmas Tales from 'Round the World
18725	Gunnison, Charles A., 1861-1897	A Napa Christchild; and Benicia's Letters
18770	Francis, Samuel W., 1835-1886	A Christmas Story\nMan in His Element: or, A New Way to Keep House
19014	Riis, Jacob A., 1849-1914	Nibsy's Christmas
19084		In the Yule-Log Glow, Book II\nChristmas Tales from 'Round the World
19098	Miles, Clement A.	Christmas in Ritual and Tradition, Christian and Pagan
19337	Dickens, Charles, 1812-1870	A Christmas Carol
19384	Richmond, Grace S. (Grace Smith), 1866-1959	On Christmas Day In The Evening
19425	Hope, Laura Lee	The Story of a Stuffed Elephant
19587	Janvier, Thomas A. (Thomas Allibone), 1849-1913	The Christmas Kalends of Provence\nAnd Some Other Proven?al Festivals
19608	Van Dyke, Henry, 1852-1933	The Story of the Other Wise Man
From prosfilaes at gmail.com  Sun Nov 12 08:10:31 2006
From: prosfilaes at gmail.com (David Starner)
Date: Sun Nov 12 08:16:55 2006
Subject: [gutvol-d] Project Gutenberg
In-Reply-To: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be>
References: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be>
Message-ID: <6d99d1fd0611120810g26640f39x959252bf7c20c941@mail.gmail.com>

On 11/8/06, Johan Boelaert <j.boelaert@skynet.be> wrote:
> If this were true, it
> would be an excellent opportunity to publish European books online, which
> can't be published in Gutenberg U.S., because of the much more severe
> copyright laws over there.

It's not exactly more severe laws; I was reading the Oxford Book of
British Prose and Poetry 1890-1905, and at least one of the authors
won't be in the public domain for another couple decades. PG holds a
couple books of Agatha Christie's, which won't be in the public domain
in Europe until 2050.

> If you try to ask for a clearance, you
> are redirected to Getenberg US?
>
> Is there any possibility to improve this site?

One of the major originators and directors behind this is currently
doing mandatory military service. However, there's quite a bit of
stuff still being done at DP-EU (<http://dp.rastko.net>) and you can
get a lot of help there, including the informal clearances they're
currently doing.
From jeroen.mailinglist at bohol.ph  Sun Nov 12 09:44:36 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Sun Nov 12 09:44:18 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <45515526.4000708@novomail.net>
References: <c6b.45424f4.3280ffc1@aol.com>	<c26320b80611061526g5de7e652lb5666ff08e9d37e8@mail.gmail.com>	<455114CC.3060203@novomail.net>	<200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net>
Message-ID: <45575D84.1060000@bohol.ph>

Lee Passey wrote:
>
> There is no doubt in /my/ mind that the DP texts are the best of the
> lot. And without getting into any discussion here about whether DP
> standards are sufficient for most purposes, my observation is that the
> DP standards are consistently evolving for the better. Merely having
> stated standards, no matter how trivial, is a major indication of
> quality. I am quite looking forward to the time when DP begins the
> task of re-scanning and re-creating the most popular public domain
> books which make up the early part of the PG corpus. If Project
> Gutenberg has any data which may indicate the most popular downloads
> over the past 4 or 5 years, I'm sure that would be very useful.
>
That is a nice proposal, unfortunately, due to the existence of many
mirrors, quite difficult to implement. The current web interface on
www.gutenberg.org already has some stats, which may serve your purposes,
though. Since this is the location most people will go to or be lead to
when searching for a text, I think those figures are a decent sample.
> But how do I find DP e-texts, and when I've found them how do I know
> they still meet the standard? A simple commitment from the Powers That
> Be at Project Gutenberg that e-texts produced by Distributed
> Proofreaders will not be altered without a change log or revision
> control would be a /major/ advance. But that is a degree of control
> which seems inconsistent with the stated PG preference for anarchy.
>
Version control and features similar to the view history feature in
wikipedia are great, and I would applaud at such a step. I think it
would be very nice idea to install a Subversion (or other suitable
version control system) repository, in which all etexts (sans the
compressed zip archives) will be stored, transparently mapped to the
current ftp and web based archives (with automatically generated zip
archives), with a few sites mirroring the entire repository (with
history). The white washers would have commit permissions, the rest of
the world can trace what has been done since the first posting.

I will give importing the entire bulk of PG into a Subversion server I
try if I get a chance. Interested whether it can handle a 25 Gigabyte
commit... First have to make space for that, then sync my personal
mirror of PG.

> So, to reiterate the point that started this thread: I don't care what
> Project Gutenberg chooses to be; I only hope that it does not hold
> itself out to be something it is not, and the current disclaimer is an
> important part in helping consumers understand what they are, and are
> not, getting when they download a Project Gutenberg e-text.
>
I hope you will care, and help. Many criticisms on Project Gutenberg
have an implied assumption that what the paper print industry has
produced in the past has some consistency or quality. For those who
don't want those believes falsified, please never try to compare two
editions of a well known classic -- the number of differences will shock
you. The number of mistakes (such as misspellings, inconsistencies, and
punctuation errors) in books published by many publishers is rather
high, as can be confirmed by the long lists of errata I currently add to
my html versions of etexts. To ruthlessly blame PG, and remain silent on
the produce of many traditional publishers is not fair. I personally
strive to what is considered a very high standard of encoding, known as
"five nines", that is one mistake every 100,000 opportunities. For an
average size book, that will mean 5 mistakes will slip through. PG is
not yet there, but tell me one publisher who is. A recent book (The
Koran in Dutch) I handled had about 1 mistake per 4000 characters, or
.99975 correctness.

Jeroen.

From phil at thalasson.com  Sun Nov 12 09:10:01 2006
From: phil at thalasson.com (Philip Baker)
Date: Sun Nov 12 09:48:12 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <324722079.20061108144440@noring.name>
Message-ID: <mGPrFDApV1VFFwyY@thalasson.com>

In article <324722079.20061108144440@noring.name>, Jon Noring
<jon@noring.name> writes
>Philip wrote:
>
>> But what paper book publishers have explicitly stated standards? I don't
>> remember seeing any, but I commonly find fairly obvious errors in paper
>> books from all sorts of publishers all the way from large publishers
>> with a fair degree of prestige like Oxford University Press to tiny
>> publishers that have only published a handful of books. And when do
>> paper book editions of 'classic' out of copyright works meticulously
>> present the sources for their text?
>
>Very interesting question.
>
>And I was composing a long reply differentiating what PG is doing
>versus what paper book publishers do, the filtering mechanisms at the
>scholarly level, the worry a major publisher has in not doing shoddy
>work (if they seek quality), etc. But I decided not to discuss that
>here, but instead ask:
>
>"Shouldn't PG be better than commercial paper book publishers when it
>comes to doing things right?"

May be PG already is better. The expanded PG mission statement asserts
that "most of our eBooks exceed these standards [99.95% level of
accuracy]". No evidence has been provided by you or Lee Passey that
this assertion is incorrect. What commercial paper book publisher has
made a similar claim? Some may have done so but I've never seen any.

PG is committed to "incrementally correct errors as times goes on".
How well and how quickly it proceeds on that course depends on the
number of volunteers there are who believe that this is an important
aim and who have enough time and energy and commitment to do the work
required. You widen the range of potential volunteers that can find
and report errors if you put texts into the public archive sooner
rather than later. There is a balance to be struck here. A balance
that will shift over time as 'best practice' evolves.
    
>
>and
>
>"Why should a bottoms-up organization like PG blindly emulate the
>practices of top-down publishers who only care about making money?"

It doesn't.
>
>The question (and this is not a criticism) also reminds me of the
>teenager telling their parents "well, everyone else is doing it."
>That sort of argument wears thin on any parent who knows (or should
>know) what's best for their children.

I like the image you conjure up, Jon Noring giving a parental guiding
hand to the wayward youth that is Project Gutenberg. But somehow it
doesn't quite convince.
-- 
Philip Baker
From sly at victoria.tc.ca  Sun Nov 12 09:57:38 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Nov 12 09:57:42 2006
Subject: [gutvol-d] Copyright terms
In-Reply-To: <6d99d1fd0611120810g26640f39x959252bf7c20c941@mail.gmail.com>
References: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be>
	<6d99d1fd0611120810g26640f39x959252bf7c20c941@mail.gmail.com>
Message-ID: <Pine.GSO.4.58.0611120948290.7267@vtn1.victoria.tc.ca>


Yes, I agree with David. Here are two contrasting examples
comparing US and Canadian copyright law.

PG#1612, "Poems By a Little Girl", was indeed written when
the author (Hilda Conkling, 1910-1986) was a young girl.
Because of its publishing date, it has been PD in the US
for some time now. In Canada, it won't be PD until 2037.

For an alternate example take the novel "Nineteen Eighty-Four".
George Orwell died in 1950, so his writings became PD
in Canada Jan. 2001. In the US, under current rules, ummm....
I think it will not be PD until Jan. 2045.

However, speaking in generalities, being under a life+50
regime does open up a vast number of titles from the 1920s,
30s, and 40s.

Andrew

On Sun, 12 Nov 2006, David Starner wrote:

> On 11/8/06, Johan Boelaert <j.boelaert@skynet.be> wrote:
> If this were true, it
> would be an excellent opportunity to publish European books online, which
> can't be published in Gutenberg U.S., because of the much more severe
> copyright laws over there.

It's not exactly more severe laws; I was reading the Oxford Book of
British Prose and Poetry 1890-1905, and at least one of the authors
won't be in the public domain for another couple decades. PG holds a
couple books of Agatha Christie's, which won't be in the public domain
in Europe until 2050.

From sly at victoria.tc.ca  Sun Nov 12 10:17:22 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Nov 12 10:17:26 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <mGPrFDApV1VFFwyY@thalasson.com>
References: <mGPrFDApV1VFFwyY@thalasson.com>
Message-ID: <Pine.GSO.4.58.0611121013270.3406@vtn1.victoria.tc.ca>


On Sun, 12 Nov 2006, Philip Baker wrote:

> In article <324722079.20061108144440@noring.name>, Jon Noring
> <jon@noring.name> writes

> >"Shouldn't PG be better than commercial paper book publishers when it
> >comes to doing things right?"
>
> May be PG already is better. The expanded PG mission statement asserts
> that "most of our eBooks exceed these standards [99.95% level of
> accuracy]". No evidence has been provided by you or Lee Passey that
> this assertion is incorrect. What commercial paper book publisher has
> made a similar claim? Some may have done so but I've never seen any.
>

Well, although they don't put it into quite as many words, the Centre
for Editing Early Canadian Texts does a pretty stunning job in preparing
its texts. They digitize multiple editions, and use that as a basis for
comparison, include a long introduction with lots of information about
background of the text, and plenty more tidbits in appendixes, including
in at least a few cases documentation of how words broken by line-end
hyphenation in their main source ws treated.

http://www.mqup.mcgill.ca/book_list.php?series=5

But then again, I suppose they are not actually a "book publisher".

Andrew
From lee at novomail.net  Sun Nov 12 10:35:11 2006
From: lee at novomail.net (Lee Passey)
Date: Sun Nov 12 10:34:27 2006
Subject: [gutvol-d] TEI to XHTML transformations
In-Reply-To: <45565BEB.9020003@novomail.net>
References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
	<45565BEB.9020003@novomail.net>
Message-ID: <4557695F.2020207@novomail.net>

Lee Passey wrote:
> I would recommend creating a Table of Contents that looks like this:
>
> <ul class="toc">
> <li><a href="#toc1">THE SIOUAN STOCK</a></li>
> <li><ul>
>  <li><a href="#toc3">DEFINITION</a></li>
>  <li><ul>
>   <li><a href="#toc5">EXTENT OF THE STOCK</a></li>
>   <li><a href="#toc7">TRIBAL NOMENCLATURE</a></li>
>   <li><a href="#toc9">PRINCIPAL CHARACTERS</a></li>
>   <li><ul>
>    <li><a href="#toc11">PHONETIC AND GRAPHIC ARTS</a></li>
>    <li><a href="#toc13">INDUSTRIAL AND ESTHETIC ARTS</a></li>
>    <li><a href="#toc15">INSTITUTIONS</a></li>
>    <li><a href="#toc17">BELIEFS</a></li>
>    <li><ul>
>     <li><a href="#toc19">THE DEVELOPMENT OF MYTHOLOGY</a></li>
>     <li><a href="#toc21">THE SIOUAN MYTHOLOGY</a></li>
>    </ul></li>
>    <li style="margin-left: 6em"><a href="#toc23">SOMATOLOGY</a></li>
>   </ul></li>
>  </ul></li>
>  <li><a href="#toc25">HABITAT</a></li>
>  <li><a href="#toc27">ORGANIZATION</a></li>
>  <li><a href="#toc29">HISTORY</a></li>
>  <li><ul>
>   <li><a href="#toc31">DAKOTA-ASINIBOIN</a></li>
>   <li><a href="#toc33">&#162;EGIHA</a></li>
>   <li><a href="#toc35">&#647;&#390;IWE'RE</a></li>
>   <li><a href="#toc37">WINNEBAGO</a></li>
>   <li><a href="#toc39">MANDAN</a></li>
>   <li><a href="#toc41">HIDATSA</a></li>
>   <li><a href="#toc43">THE EASTERN AND SOUTHERN TRIBES</a></li>
>   <li><a href="#toc45">GENERAL MOVEMENTS</a></li>
>  </ul></li>
> </ul></li>
> <li><a href="#toc47">SOME FEATURES OF INDIAN SOCIOLOGY</a></li>
> </ul>
>
> (The indentation was added just so I could keep track of closing tags).
I apologize for the error here; it's what happens when you create code 
off the top of your head without testing it out.

Conceptually, a heading and its sub-headings are all part of a single 
list item, so when creating a new list with <ul> the dependent <li> 
should not be closed. So instead of:

 <li><a href="#toc3">DEFINITION</a></li>
 <li><ul>
  <li><a href="#toc5">EXTENT OF THE STOCK</a></li>
  <li><a href="#toc7">TRIBAL NOMENCLATURE</a></li>
  <li><a href="#toc9">PRINCIPAL CHARACTERS</a></li>
  ...
  </ul></li>

it should be:

 <li><a href="#toc3">DEFINITION</a><ul> <-- no "</li><li>" here
  <li><a href="#toc5">EXTENT OF THE STOCK</a></li>
  <li><a href="#toc7">TRIBAL NOMENCLATURE</a></li>
  <li><a href="#toc9">PRINCIPAL CHARACTERS</a></li>
  </ul></li>    <-- this </li> closes the element containing "DEFINITION"

From sly at victoria.tc.ca  Sun Nov 12 11:43:28 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Nov 12 11:43:34 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <00a401c70669$46c9c6f0$0300a8c0@blackbox>
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca><20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>
	<Pine.GSO.4.58.0611112357590.6307@vtn1.victoria.tc.ca>
	<00a401c70669$46c9c6f0$0300a8c0@blackbox>
Message-ID: <Pine.GSO.4.58.0611121141570.23354@vtn1.victoria.tc.ca>

Thanks Aaron.

I see you've helped with some of the iso collections as well...

Ok, here's an attempt to do some further sorting with
the Christmas texts.

http://www.victoria.tc.ca/~sly/xmas1.htm

Use of one heading or another is perhaps somewhat
arbitrary, but it's better than nothing.

I plan to put this on the PG wiki once I've played around
with it a little more.

Suggestions, feedback?

Andrew

On Sun, 12 Nov 2006, Aaron Cannon wrote:

>
> Here you go.  The list will probably strip the attachment, so I'm sending it
> to you directly as well.
>
> If you amend your list, or if you want more catalog data on each book, such
> as size, file name, ETC. let me know.
>
> Sincerely
> Aaron Cannon
From gbnewby at pglaf.org  Sun Nov 12 11:46:02 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Nov 12 11:46:04 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <009401c70662$95e3e290$0300a8c0@blackbox>
References: <20061112103201.GA17913@mail.pglaf.org>
	<009401c70662$95e3e290$0300a8c0@blackbox>
Message-ID: <20061112194602.GB23625@mail.pglaf.org>

On Sun, Nov 12, 2006 at 07:54:55AM -0600, Aaron Cannon wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> This should fit nicely on a cd.

Yep.

Here's a more permanent link:

	http://snowy.arsc.alaska.edu/pgimages/xmas.iso
	http://snowy.arsc.alaska.edu/pgimages/xmas.iso.md5 for checksum

This is generic output from the ISO maker...I didn't make
add a lot of stuff like I do with our main ISOs.  
  -- Greg


> - --
> Skype: cannona
> MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
> address.)
> - ----- Original Message -----
> From: "Greg Newby" <gbnewby@pglaf.org>
> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
> Sent: Sunday, November 12, 2006 4:32 AM
> Subject: Re: [gutvol-d] Christmas-related texts in PG
> 
> 
> >On Sun, Nov 12, 2006 at 12:11:08AM -0800, Andrew Sly wrote:
> >>
> >>Ok, below is a sorted list of PG numbers, representing
> >>76 christmas-related items in the collection.
> >
> >Neat!  I went ahead and created an ISO via the online tool.
> >
> >http://snowy.arsc.alaska.edu/pgiso
> >
> >If all's well it should be here shortly:
> >
> >ftp://snowy.arsc.alaska.edu/pgiso/isos/gbnewby/xmas.iso
> >
> >But I'll check later to check that it arrived.  About
> >625MB, including these 76 titles in all available file
> >formats (290 files total).
> >
> >Thanks!
> > -- Greg
> >
> >>Note this does not include texts that include one or
> >>two christmas stories mixed with others. I will be
> >>dealing with those separately.
> >>
> >>Would someone be able to help generate a list of
> >>titles from this I could use for a web page?
> >>
> >>(I've spent hours on this already, and I'm going
> >>to be stubborn now, and avoid manually copying out
> >>the titles for each of these.)
> >>
> >>
> >>   46
> >>  519
> >>  520
> >>  644
> >>  676
> >> 1467
> >> 2731
> >> 4384
> >> 5061
> >> 6670
> >> 7256
> >> 8694
> >> 9523
> >>10524
> >>10528
> >>10723
> >>10813
> >>10825
> >>11014
> >>12610
> >>12881
> >>12974
> >>13024
> >>13158
> >>13213
> >>13244
> >>14508
> >>14534
> >>14535
> >>14572
> >>14573
> >>14606
> >>14624
> >>14629
> >>14653
> >>14667
> >>14677
> >>14713
> >>14785
> >>14786
> >>14788
> >>14829
> >>14835
> >>14946
> >>15044
> >>15133
> >>15343
> >>15552
> >>15709
> >>15826
> >>16134
> >>16454
> >>17006
> >>17135
> >>17382
> >>17456
> >>17510
> >>17562
> >>17630
> >>17679
> >>17743
> >>17761
> >>17770
> >>17937
> >>18570
> >>18720
> >>18725
> >>18770
> >>19014
> >>19084
> >>19098
> >>19337
> >>19384
> >>19425
> >>19587
> >>19608
> >>
> >>_______________________________________________
> >>gutvol-d mailing list
> >>gutvol-d@lists.pglaf.org
> >>http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >_______________________________________________
> >gutvol-d mailing list
> >gutvol-d@lists.pglaf.org
> >http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
> >
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
> Comment: Key available from all major key servers.
> 
> iD8DBQFFVyhyI7J99hVZuJcRAhjlAKC2A/+v9c600Eiw0ZVRN226kin63gCfTQy7
> r6cEhnQhWsTAIZkGu8eDn/Y=
> =m4zM
> -----END PGP SIGNATURE-----
From tb at baechler.net  Sun Nov 12 11:53:26 2006
From: tb at baechler.net (Tony Baechler)
Date: Sun Nov 12 11:52:59 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: <Pine.GSO.4.58.0611112010140.4772@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0611091116130.4469@vtn1.victoria.tc.ca>
	<20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>
	<Pine.GSO.4.58.0611112010140.4772@vtn1.victoria.tc.ca>
Message-ID: <20061112195252.ICAE6331.dukecmmtar02.coxmail.com@Tony.baechler.net>

At 08:18 PM 11/11/06 -0800, you wrote:

>For individual stories which occur in other collections, what
>about the idea of making some kind of "PG Christmas anthology"?
>including them together? Or is that too "1990s-PG"...
>
>Andrew

I would be very much in favor of this.  Now, I have a bunch of books 
that really don't relate to Christmas all thrown in a directory 
because of a short story here or there.  It would be nicer to have 
them all in one file instead of scattered over many files.  I would 
also like to see a similar anthology for the short stories of H. Beam Piper. 


-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.14.3/530 - Release Date: 11/11/06


From cannona at fireantproductions.com  Sun Nov 12 14:33:36 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Nov 12 14:36:17 2006
Subject: [gutvol-d] Christmas-related texts in PG
References: <20061112103201.GA17913@mail.pglaf.org><009401c70662$95e3e290$0300a8c0@blackbox>
	<20061112194602.GB23625@mail.pglaf.org>
Message-ID: <000f01c706aa$f5422bf0$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Since you included a lot of text files along with their zipped counterparts,
the ISO would probably zip nicely.  Not a big deal, just a thought.  It
might save you some bandwidth.

Sincerely
Aaron Cannon


- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)
- ----- Original Message -----
From: "Greg Newby" <gbnewby@pglaf.org>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Sunday, November 12, 2006 1:46 PM
Subject: Re: [gutvol-d] Christmas-related texts in PG


> On Sun, Nov 12, 2006 at 07:54:55AM -0600, Aaron Cannon wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> This should fit nicely on a cd.
>
> Yep.
>
> Here's a more permanent link:
>
> http://snowy.arsc.alaska.edu/pgimages/xmas.iso
> http://snowy.arsc.alaska.edu/pgimages/xmas.iso.md5 for checksum
>
> This is generic output from the ISO maker...I didn't make
> add a lot of stuff like I do with our main ISOs.
>  -- Greg
>
>
>> - --
>> Skype: cannona
>> MSN/Windows Messenger: cannona@hotmail.com (don't send email to the
>> hotmail
>> address.)
>> - ----- Original Message -----
>> From: "Greg Newby" <gbnewby@pglaf.org>
>> To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
>> Sent: Sunday, November 12, 2006 4:32 AM
>> Subject: Re: [gutvol-d] Christmas-related texts in PG
>>
>>
>> >On Sun, Nov 12, 2006 at 12:11:08AM -0800, Andrew Sly wrote:
>> >>
>> >>Ok, below is a sorted list of PG numbers, representing
>> >>76 christmas-related items in the collection.
>> >
>> >Neat!  I went ahead and created an ISO via the online tool.
>> >
>> >http://snowy.arsc.alaska.edu/pgiso
>> >
>> >If all's well it should be here shortly:
>> >
>> >ftp://snowy.arsc.alaska.edu/pgiso/isos/gbnewby/xmas.iso
>> >
>> >But I'll check later to check that it arrived.  About
>> >625MB, including these 76 titles in all available file
>> >formats (290 files total).
>> >
>> >Thanks!
>> > -- Greg
>> >
>> >>Note this does not include texts that include one or
>> >>two christmas stories mixed with others. I will be
>> >>dealing with those separately.
>> >>
>> >>Would someone be able to help generate a list of
>> >>titles from this I could use for a web page?
>> >>
>> >>(I've spent hours on this already, and I'm going
>> >>to be stubborn now, and avoid manually copying out
>> >>the titles for each of these.)
>> >>
>> >>
>> >>   46
>> >>  519
>> >>  520
>> >>  644
>> >>  676
>> >> 1467
>> >> 2731
>> >> 4384
>> >> 5061
>> >> 6670
>> >> 7256
>> >> 8694
>> >> 9523
>> >>10524
>> >>10528
>> >>10723
>> >>10813
>> >>10825
>> >>11014
>> >>12610
>> >>12881
>> >>12974
>> >>13024
>> >>13158
>> >>13213
>> >>13244
>> >>14508
>> >>14534
>> >>14535
>> >>14572
>> >>14573
>> >>14606
>> >>14624
>> >>14629
>> >>14653
>> >>14667
>> >>14677
>> >>14713
>> >>14785
>> >>14786
>> >>14788
>> >>14829
>> >>14835
>> >>14946
>> >>15044
>> >>15133
>> >>15343
>> >>15552
>> >>15709
>> >>15826
>> >>16134
>> >>16454
>> >>17006
>> >>17135
>> >>17382
>> >>17456
>> >>17510
>> >>17562
>> >>17630
>> >>17679
>> >>17743
>> >>17761
>> >>17770
>> >>17937
>> >>18570
>> >>18720
>> >>18725
>> >>18770
>> >>19014
>> >>19084
>> >>19098
>> >>19337
>> >>19384
>> >>19425
>> >>19587
>> >>19608
>> >>
>> >>_______________________________________________
>> >>gutvol-d mailing list
>> >>gutvol-d@lists.pglaf.org
>> >>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>> >_______________________________________________
>> >gutvol-d mailing list
>> >gutvol-d@lists.pglaf.org
>> >http://lists.pglaf.org/listinfo.cgi/gutvol-d
>> >
>> >
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
>> Comment: Key available from all major key servers.
>>
>> iD8DBQFFVyhyI7J99hVZuJcRAhjlAKC2A/+v9c600Eiw0ZVRN226kin63gCfTQy7
>> r6cEhnQhWsTAIZkGu8eDn/Y=
>> =m4zM
>> -----END PGP SIGNATURE-----
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFFV6HdI7J99hVZuJcRArQXAKCsVUboLCj3l/64lruBXPt0R5t49gCgq41H
Mvh6V8NYM96MbJojRmxGk6A=
=LaUm
-----END PGP SIGNATURE-----

From lee at novomail.net  Mon Nov 13 13:27:27 2006
From: lee at novomail.net (Lee Passey)
Date: Mon Nov 13 13:25:48 2006
Subject: [gutvol-d] TEI to XHTML transformations - continued
In-Reply-To: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
Message-ID: <4558E33F.5020404@novomail.net>

joshua@hutchinson.net wrote:

>  I am serious about wanting your feedback on the XHTML we generate and
>  ideas for ways to improve the default CSS that is in those files.

Left alignment is the default for blocks in HTML, so setting the style 
"text-align: left" is usually superfluous. But hopefully, you're 
starting to believe that setting /any/ style is a bad idea.

As a general rule, markup of the form "<span style='whatever'>" should 
be avoided whenever there is a more specific HTML element. As the CSS 
working group stated:

<blockquote>
CSS gives so much power to the "class" attribute, that authors could 
conceivably design their own "document language" based on elements with 
almost no associated presentation (such as DIV and SPAN in HTML) and 
assigning style information through the "class" attribute. Authors 
should avoid this practice since the structural elements of a document 
language often have recognized and accepted meanings and author-defined 
classes may not.
</blockquote>

Thus, the TEI phrase "<hi rend='vertical-align: super'>" (which is the 
example I encountered here) should be converted to "<sup>", and /not/ to 
"<span style='vertical-align: super;'>." The enclosing <span> that 
identifies this as a member of the "tei-hi" class could also be merged 
with the <sup> element (if it is not omitted altogether, which is 
preferable) such than "<hi rend='vertical-align: super'>" would be 
converted to "<sup class='tei-hi'>," and /not/ to "<span class='tei 
tei-hi'><span style='vertical-align: super;'>."

Likewise, the auto generated footnotes should use the <sup> element, if 
such an element is used at all. Personally, I prefer footnote links to 
be fairly large so they make a good target for my mouse pointer. The 
three elements now being used could be merged into a single element so 
instead of generating: "<a id='noteref_1' href='#note_1'><span 
class='tei tei-noteref'><span style='font-size: 60%; vertical-align: 
super;'>" you would have simply: "<a id='noteref_1' href='#note_1' 
class='tei-noteref'>" and then the actual presentation would be 
controlled by a ".tei-noteref" CSS rule.

I also think that the TEI phrase "<hi rend='italic'>" should be 
converted to "<i>". Even though the <i> element is deprecated, it serves 
a very useful purpose which <em> does not satisfy. Indeed, a bare <hi> 
should probably also be converted to <i>, with a notation indicating 
that the actual rendering is unknown, such as "<i class='tei-hi'>".

I note that every <h?> element generated is labeled with 
"class='tei-head'." There is no "tei-head" CSS rule, so unless you're 
round tripping the classification is superfluous. Even if you /are/ 
round tripping, a rule that converts <h?> to <head> would probably be 
adequate so the classification is still superfluous. Again I see that 
every <h?> element has a style attribute, so even if I created a 
"tei-head" rule, or a rule for all or some HTML header elements, these 
rules would be overridden by the element specific style. These style 
attributes should be removed.

I also note that every <h?> elements 1, 2, and 3 contains a <span> 
element attempting to set the font size for the contents of the header. 
HTML already has default font sizes and styles for all header elements, 
and you should not attempt to override these styles without a compelling 
reason. Even if you can find a compelling reason, you should use CSS 
style rules to implement these new styles and you should /not/ add an 
unnecessary <span> to the <h?> element.

Personally, I like to reserve <h1> for the document title page. Thus, 
when you start creating headers based upon division levels I think you 
should start with <h2>

There are places in the generated title page where <div> elements are 
marked with "class='block'." In HTML the <div> element is a block 
element by default, so the classification is simply redundant. There is 
also at least one place where there is an empty <div> elements that 
apparently serves only to create to create a blank area on the screen. 
Needless to say, this should be done away with. I also note that as the 
various levels of <div> in TEI are processed, it creates corresponding 
<div>s in the HTML, but the generated <div>s all carry presentational 
baggage. First level <div>s are set to have top and bottom margins of 5 
lines, second level <div>s are set to have top and bottom margins of 4 
lines, third level <div>s are set to have top and bottom margins of 3 
lines, and so on. I can see no reason for this presentational baggage; 
it should be omitted.

I like the idea of grouping parts of a file by classed <div>s because it 
makes it possible to perform automated functions, such as creation of 
metadata files, or splitting and recombining a file for use on 
"less-capable" platforms. However, a <div> with neither a "style" 
attribute (which should be studiously avoided) nor a "class" attribute 
is of no use. Thus, I would suggest that all conversions from TEI <div>s 
to HTML <div>s be given /meaningful/ class attributes. If the TEI <div> 
has a "type" attribute, that should be used as the HTML class. If the 
TEI element is a <divn>, the number should be used as the HTML class 
attribute (e.g. <div class='tei-div6'>). If the <div> is unadorned, a 
class attribute should be generated reflecting the nesting level (e.g. 
<div class='tei-level3'>).

To be continued ...

-- 
Nothing of significance below this line.


From vlsimpson at gmail.com  Tue Nov 14 12:11:00 2006
From: vlsimpson at gmail.com (V. L. Simpson)
Date: Tue Nov 14 12:17:12 2006
Subject: [gutvol-d] TEI to XHTML transformations - continued
In-Reply-To: <4558E33F.5020404@novomail.net>
References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
	<4558E33F.5020404@novomail.net>
Message-ID: <bd09bf340611141211g374a25f7s9f7a1923c892a595@mail.gmail.com>

On 11/13/06, Lee Passey <lee@novomail.net> wrote:

> I also think that the TEI phrase "<hi rend='italic'>" should be
> converted to "<i>". Even though the <i> element is deprecated, it serves
> a very useful purpose which <em> does not satisfy. Indeed, a bare <hi>
> should probably also be converted to <i>, with a notation indicating
> that the actual rendering is unknown, such as "<i class='tei-hi'>".

The italic element is not deprecated.

vls
From lee at novomail.net  Tue Nov 14 14:32:03 2006
From: lee at novomail.net (Lee Passey)
Date: Tue Nov 14 14:30:23 2006
Subject: [gutvol-d] TEI to XHTML transformations - continued
In-Reply-To: <bd09bf340611141211g374a25f7s9f7a1923c892a595@mail.gmail.com>
References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>	<4558E33F.5020404@novomail.net>
	<bd09bf340611141211g374a25f7s9f7a1923c892a595@mail.gmail.com>
Message-ID: <455A43E3.3080704@novomail.net>

V. L. Simpson wrote:
> On 11/13/06, Lee Passey <lee@novomail.net> wrote:
> 
>> I also think that the TEI phrase "<hi rend='italic'>" should be
>> converted to "<i>". Even though the <i> element is deprecated, it serves
>> a very useful purpose which <em> does not satisfy. Indeed, a bare <hi>
>> should probably also be converted to <i>, with a notation indicating
>> that the actual rendering is unknown, such as "<i class='tei-hi'>".
> 
> The italic element is not deprecated.

You are correct. I was thinking about the fact that <i> is not supported 
in the new, proposed XHTML 2, although I have heard rumors that the W3C 
is reconsidering that proposal given <i>'s obvious usefulness.

-- 
Nothing of significance below this line.

From lee at novomail.net  Tue Nov 14 15:58:29 2006
From: lee at novomail.net (Lee Passey)
Date: Tue Nov 14 15:56:40 2006
Subject: [gutvol-d] TEI rendering in Web browsers - part 3
In-Reply-To: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
Message-ID: <455A5825.1090004@novomail.net>

joshua@hutchinson.net wrote:

> I am serious about wanting your feedback on the XHTML we generate 
> and ideas for ways to improve the default CSS that is in those files.

I have noticed a few <span> elements in the generated file that attempt 
to center the phrase using "style='text-align: center;'." In HTML this 
style is only significant when applied to elements where 
"display='block';" it should not be applied to <span> elements.

HTML has always had quite good support for lists. Thus, I was quite 
surprised to see that the PGTEI XSL scripts convert TEI lists to HTML 
tables!

In addition to <p> abuse (using the <p> element for generic blocks 
because the default <p> presentation is desired) the use of <table> 
elements to force a certain layout is a hallmark of ill-conceived HTML. 
TEI <list>s should always map to HTML <list>s, and /only/ TEI <table>s 
should map to HTML <table>s.

I think I'm beginning to see a pattern here.

When HTML started its life as an offshoot of SGML, it was primarily 
presentationally oriented. The tags were things like <center>, <i>, <b>, 
<strike>, <u> etc. As time went on people began to learn that when the 
canvas is as fluid as a computer screen, markup that reflects the 
semantics of a document is more flexible than markup that attempts to 
fix the presentation of a document. As time has gone on, new elements 
have been added to focus more on the semantics of a document (e.g. <em>, 
<strong>) and other elements which are purely presentational (e.g. 
<center>, <u>) have been deprecated.

The output of the PGTEI XSL scripts suggests that PGTEI is still stuck 
in the presentational orientation of the past. This may be due to the 
fact that the tools to generate XHTML and PDF were created more or less 
simultaneously. PDF is designed to mimic an immutable ink on paper 
presentation; and structure in the format is accidental. TEI is designed 
to describe how a document is structured in the abstract without regard 
to how it should be presented; indeed, my sense is that the "rend" 
attribute is designed more to indicate how a phrase was rendered in the 
original than a suggestion as to how it should be re-rendered in the 
future. HTML started out being closer to PDF, but has been evolving to 
become more like TEI. The PGTEI XSL scripts should be developed to 
follow this evolution and produce XHTML that is more TEI-like than PDF-like.

I think I have exhausted the suggestions I can derive from _The Siouan 
Indians_. There are no doubt other issues outstanding (for example, we 
should verify that TEI <foreign> maps to HTML's <i class="foreign">). If 
you're interested I can address these unresolved issues when I encounter 
them in a different text, or when I get into the PGTEI XSL scripts.

As I have reviewed the subject e-text, I have altered it to reflect my 
suggestions. I have published the resulting file (which is now largely 
devoid of "style" attributes and <span> elements) at 
http://www.passkeysoft.com/~lee/19628.html if you're interested in 
seeing what I think the XHTML output /ought/ to look like.

HTH

-- 
Nothing of significance below this line.

From joshua at hutchinson.net  Tue Nov 14 17:32:54 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Tue Nov 14 17:33:01 2006
Subject: [gutvol-d] TEI to XHTML transformations - continued
Message-ID: <15320629.1163554374271.JavaMail.?@fh1038.dia.cp.net>

>----Original Message----
>From: lee@novomail.net
>
>You are correct. I was thinking about the fact that <i> is not 
supported 
>in the new, proposed XHTML 2, although I have heard rumors that the 
W3C 
>is reconsidering that proposal given <i>'s obvious usefulness.
>

If I remember right, that is why <i> was not used in the PGTEI 
conversion process, but maybe that should be revisited.

(I'm not ignoring your commentary ... there is a lot of good stuff in 
there and I want to tackle it when I have time to give it the attention 
it deserves.)

Josh
From joshua at hutchinson.net  Wed Nov 15 09:20:59 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov 15 09:21:07 2006
Subject: [gutvol-d] TEI to XHTML transformations
Message-ID: <20445274.1163611259801.JavaMail.?@fh1037.dia.cp.net>

**I'm going to compile all three of Lee's feedback emails into one 
response.

First of all, thank you Lee, for putting this together.  I really do 
appreciate this kind of feedback.  And the feedback you gave is well 
thought out and researched.

quote:
    if the XHTML files are generated dynamically using
    XSLT each time they are downloaded then an end user will /always/ 
get
    the result of the most recent iteration of the transformation 
tools no
    matter when the master TEI file was generated.

I think I mentioned this previously, but right now, they are 
statically posted to the archive.  The PG whitewasher's take the TEI 
master file and run it through a script that generates a zip archive 
with the PDF, HTML, UTF-8 text, image files, etc.  This zip file is 
then pushed to the master PG archives.  Any fixes identified by the 
errata email folks require another pass through the script and push to 
the archive.  The biggest benefit, though, is that the fix need only be 
applied to the TEI master and it "populates" down through the generated 
formats.  No need to fix something in multiple files and worry about 
making sure you hit everything in all spots.

quote:
    I ran them through HTML Tidy
    to regularize the presentation for human analysis and to convert 
the
    native UTF-8 encoding to ASCII

It doesn't sound like you were holding this up as a complaint, but 
just in case you were, I wanted to outline my philosophy here.  I 
believe sticking to a UTF-8 encoding everywhere is the "right thing to 
do".  The TEI master *can* be encoded in Latin-1 or even US-ASCII.  The 
conversion scripts *could* output the HTML as Latin1 or even US-ASCII.  
But I believe that UTF-8 allows for a clearer source code.  Plus, it 
guarantees that anyone with a modern computer on basically ANY 
operating system will be able to open the file and see exactly the same 
thing as everyone else.

quote:
    Older versions of Tidy complained about
    "nested emphasis" when encountering the construct "<span
    class='whatever'><span style='whatever'>...</span></span>" ...the 
construct, 
    while inelegant, is not incorrect, and given the limitation of XSL 
transformations it is
    probably not objectionable.

This is an artifact of how the script converts the original TEI.  
While it is possible to construct the tool chain to add a step that 
"consolidates" the nested spans into a single complex span, the code 
required would be non-trivial and honestly not worth the effort.

quote:
    /every/ <p> tag is qualified with both a "class" attribute and a 
"style"
    attribute ... The amount of effort required to replace all of 
these
    style attributes with what it should really be ("style='margin: 0;
    text-indent: 2em") is simply too great ... Thus we can see that 
not only is the
    "style" attribute attached to each paragraph redundant, 
identifying each
    <p> element as "class='tei-p'" is also redundant.

I'm not going to argue that this is pretty (it isn't).  I honestly 
can't remember what led to the design decision to have both a class and 
a style (I remember talking about the issue, but that's all).  I'll 
check with Marcello to see if he remembers why it was done that way.

<p class="tei tei-p" style="margin-bottom: 1.00em">

The class has two entries, one for tei (which pulls from the CSS "p.
tei" line) and tei-p (which pulls nothing from anywhere that I could 
find).  This is redundant and useless.

Class fulfills two purposes.  One, it allows a style to be pulled from 
the CSS section and applied to that section.  Two, it provides a "name" 
for the type of object.  For a standard <p>, the name aspect isn't that 
important.  But for other elements (like a line of poetry), that class 
becomes important to more easily indentify its purpose to someone 
looking at the source markup.

In the interests of such markup philosophy, every element in our HTML 
has a class assigned, whether that class has a style defined for it or 
not.  It makes it more wordy, but has merit, I believe.

Now, in regards to the example above, the best markup would probably 
be something like this:

<p class="tei-p">

assuming that the style really ISN'T necessary for some reason.  The 
tei-p can be styled in the CSS section and also identifies what purpose 
this element fulfills (it is just a paragraph).

quote:
    Remove all style definitions which are duplicates of the default 
styles.

I don't agree here for one simple reason.  Assuming we remove the 
style attribute from the <p> element, having a class attribute allows a 
global change in the CSS section to affect all the paragraphs.  So, for 
instance, is someone prefers indented paragraphs, they can make that 
style change to p.tei-p and it will cascade down to the rest of the 
document.

Even if we aren't using a non-standard style on a class in our default 
format, having the class there provides a certain flexibility for 
someone else to ADD non-standard styling.

quote:
    I did note a fair amount of <p> element abuse in the resulting 
XHTML
    file, but it appears that it is a result of <p> element abuse in 
the
    original TEI file.

*hangs head in shame* that is no doubt true.  I have probably misused 
lots of things in the past (some I used to do but have learned better 
... some I no doubt still do and need to learn more).

quote:
    Why you would want to force me to put up with justified text is
    incomprehensible. Get rid of all references to "text-align: 
justify."

I'll only say that you are in a vast minority here.  Every time I've 
gotten layout feedback, it has been overwhelmingly in favor of 
justified text.  People tend to think it looks more "professional".  
That being said, that one is easy to change.  Under the CSS section 
change:

body.tei		        { margin: 4ex 10%; text-align: justify }

to

body.tei		        { margin: 4ex 10%; text-align: left }

quote:
    in your generated Tables of Contents ... (you) create a single 
list and have
    attempted to create a hierarchical structure by using styles to 
attempt
    to offset the list items. Not only is using styles to try and 
indicate
    structure a bad idea, it's going to fall apart if you're using a 
User
    Agent which doesn't support style sheets.

I'll admit, I never noticed this before.  The ToC just worked and I 
don't remember examining the HTML structure of it before.  I'll check 
with Marcello on this, too, but I admit on the surface it does look 
like a nested list structure would make more logical sense.

quote:
    Many handheld User Agents, such as ?Book, Microsoft Reader, and
    MobiPocket Reader, understand the "page-break-before" attribute 
... 
    I would recommend placing a "<br style='page-break-before:always' 
/>" before
    every element which would otherwise contain the attribute.

So you would recommend this:

<br style='page-break-before:always' /><hr class="doublepage" />

instead of just

<hr class="doublepage" />

I don't have a problem with that, provided it really would provide 
benefit to a significant number of User Agents.  But, it isn't a 
defined style, is it?  I can see people complaining that we are 
including "non-standard" markup/CSS...

quote:
    I don't like the use of Definition Lists for footnotes.

By the W3C documentation, a definition list "consist of two parts: a 
term and a description."  So, while the name is a "definition list", it 
doesn't specify that it is a term and definition.  So, while probably a 
slight perversion of the intent, it can also be argued that a footnote 
can fall within the definition list specs.  And, as you say, it 
admirably meets the layout we need with a minimum of extra effort.

I guess it comes down to this:  Is this important enough to take the 
time to change it?

quote:
    As a general rule, markup of the form "<span style='whatever'>" 
should
    be avoided whenever there is a more specific HTML element ...
    Thus, the TEI phrase "<hi rend='vertical-align: super'>" (which is 
the
    example I encountered here) should be converted to "<sup>", and 
/not/ to
    "<span style='vertical-align: super;'>." 

Ok, I see your point.  Further, a <sup class="footnote"> would allow a 
single CSS change to affect all footnote markers in one fell swoop.  
I'll check with Marcello to see if there is anything I am missing here.

***

Thanks again for the feedback.  I'll try to get some more information 
on the issues I didn't have much to offer and see about getting some of 
the suggestions rolled into the conversion scripts.

Josh
From prosfilaes at gmail.com  Wed Nov 15 10:25:09 2006
From: prosfilaes at gmail.com (David Starner)
Date: Wed Nov 15 10:31:30 2006
Subject: [gutvol-d] Translations of books
Message-ID: <6d99d1fd0611151025m18b9bac3gb29090477f3b55ef@mail.gmail.com>

I've been looking at a translation online recently:
<http://no-sword.jp/botchan/>. As he mentions, PG has a translation of
this book, but it is obviously not written by a native English
speaker. I've considered asking him if PG could store a copy of his
book, but I'm not sure about the issues on our side. First, it's not a
polished translation; he says that he spent a month on it, and offers
links to more careful, scholarly translations. Secondly, it came from
an Aozora Bunko copy, and he translates their trailer here.

Source text: Chikuma Nihon Bungaku Zenshuu: Natsume Soseki, Chikuma
Shobo, first printing January 20th 1992 (Heisei 4)
That book based on: Natsume Soseki Zenshuu 2, Chikuma Bunko, Chikuma
Shobo, first printing October 27, 1987 (Showa 62)

Entered by: MASAKI Yoshiaki (????)
Proofread by: YANAGISAWA Shigeo (????)

Posted on September 13, 1999
Last modified February 27, 2004

AN AOZORA BUNKO FILE:
This file was created by the "internet library", Aozora Bunko. Data
entry, proofreading and all other work is performed by volunteers.

There's no way we could clear this, could we?
From marcello at perathoner.de  Wed Nov 15 12:52:37 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov 15 12:52:40 2006
Subject: [gutvol-d] TEI to XHTML transformations
In-Reply-To: <20445274.1163611259801.JavaMail.?@fh1037.dia.cp.net>
References: <20445274.1163611259801.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <455B7E15.9000205@perathoner.de>

joshua@hutchinson.net wrote:


The current design of the conversion chain is to convert the TEI into an 
internal presentation-oriented format with structural hints. The 
internal format is then converted into the end-user formats: HTML, TeX 
and nroff.

Advantages:

1. more internal->end-user converters are easier to add
2. different output formats "look" more consistent

Disadvantages:

1. All structural information has to be carried over by "hinting"

If we use <span style="bla"> instead of <i> etc. it is because the 
hinting is currently only implemented for TEI elements and not for the 
rend attribute. Thus a <head> will turn into a <h*> because of hinting, 
a rend="italic" will not turn into an <i>.  This has reasons, see below.

The next iteration of the design will be to convert into standard XSL-FO 
with structural hinting. This will allow users to use their own XSL-FO 
to <format of choice> converters like PassiveTeX etc.


> <p class="tei tei-p" style="margin-bottom: 1.00em">
> 
> The class has two entries, one for tei (which pulls from the CSS "p.
> tei" line) and tei-p (which pulls nothing from anywhere that I could 
> find).  This is redundant and useless.

Lee made a very strong argument in favor of user control of text 
appearance but has failed to see over the rim of his teacup. In fact the 
"useless" classes are in there exactly to allow what Lee is so loudly 
advocating.

If somebody wants to embed a PG text into his own website, he needs some 
way to differentiate between an <elementname> of his website and an 
<elementname> of the PG text. Every element of our HTML has a 
class="tei" and a class="tei-teielementname" attached to allow just this.

eg. a very plausible stylesheet would contain:

   p { text-align: left }
   p.tei { text-align: justified }

this way the <p>s in the website are left-aligned, while the <p>s in the 
PG text stay justified.

Of course to make this really fly, you'll have to patch css.pl in the 
converter to suit your personal preferences.

Furthermore the class="teielementname" is a structural hint that users 
may evaluate to find out which TEI construct the HTML element came from.


> quote:
>     in your generated Tables of Contents ... (you) create a single 
> list and have
>     attempted to create a hierarchical structure by using styles to 
> attempt
>     to offset the list items.

This is simply a missing feature. TeX and nroff work that way. So we 
just need to pass some hints to the HTML converter. Patches are welcome.


> quote:
>     Many handheld User Agents, such as ?Book, Microsoft Reader, and
>     MobiPocket Reader, understand the "page-break-before" attribute 
> ... 
>     I would recommend placing a "<br style='page-break-before:always' 
> />" before
>     every element which would otherwise contain the attribute.

Why not just put

   hr.doublepage { page-break-before: always }

into the stylesheet?


> quote:
>     I don't like the use of Definition Lists for footnotes.

Why not? The list defines the text for the footnotes declared earlier in 
the text. All known alternatives would be worse: it is not a list 
because the entries have no relation to each other, it is not a table, 
it is not a div, not a p, ...


> quote:
>     As a general rule, markup of the form "<span style='whatever'>" 
> should
>     be avoided whenever there is a more specific HTML element ...
>     Thus, the TEI phrase "<hi rend='vertical-align: super'>" (which is 
> the
>     example I encountered here) should be converted to "<sup>", and 
> /not/ to
>     "<span style='vertical-align: super;'>." 


The TEI philosophy is to confine all presentational text attributes into 
the rend attribute. We simply go along with this philosophy and confine 
all presentation in style attributes.

In HTML 4.01 the <sup> element is still good, the <i> element is 
"discouraged" and the <font> element is "deprecated" but still valid.

Just because a <sup> element does still exist in XHTML it doesn't mean 
we have to use it. You could just as well make a point in favor of the 
<font> element.

The advantages of using <i> and <sup> etc. are so small, it isn't worth 
the trouble implementing. Same goes for removing "redundant" style or 
class attributes. The validator is happy and browsers dig them too. Even 
HTMLTidy does not replace them.

The only users that will have a problem with the style attributes are 
those who want to change the look of the text, eg. display all italic 
text as roman green instead. These users can post-process the html with 
perl or patch the TEI and feed it thru the online converter or install 
and patch the converter locally.

   perl -e 's/text-align:\s*justify/text-align: left/g;'

In my opinion people wanting to do that are extremely rare. It will also 
make the converter more complex and harder to maintain. Thus the 
priority to implement this is very low on my agenda.


The reason I'm using HTML tables for TEI lists is that there is no 
standard construct in HTML that lets you choose an arbitrary list 
marker. Say like this:

   1. First
   AA. Second
   $$$. Third
   Another good point is this: Fourth

In TEI this is allowed, so we have to support it one way or the other. 
We could also use definition lists, but tables are easier because they 
adjust to the width of the list marker while dl wraps the dd to a new line.

Oh! and before you say: CSS2 "content" attribute will allow this, double 
check it with IE6!


-- 
Marcello Perathoner
webmaster@gutenberg.org

From lee at novomail.net  Wed Nov 15 13:20:57 2006
From: lee at novomail.net (Lee Passey)
Date: Wed Nov 15 13:20:59 2006
Subject: [gutvol-d] TEI to XHTML transformations
In-Reply-To: <20445274.1163611259801.JavaMail.?@fh1037.dia.cp.net>
References: <20445274.1163611259801.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <455B84B9.2080509@novomail.net>

joshua@hutchinson.net wrote:

[snip]

>  quote: I ran them through HTML Tidy to regularize the presentation
>  for human analysis and to convert the native UTF-8 encoding
 >           to ASCII
>
>  It doesn't sound like you were holding this up as a complaint, but
>  just in case you were, I wanted to outline my philosophy here. I
>  believe sticking to a UTF-8 encoding everywhere is the "right thing
>  to do". The TEI master *can* be encoded in Latin-1 or even US-ASCII.
>  The conversion scripts *could* output the HTML as Latin1 or even
>  US-ASCII. But I believe that UTF-8 allows for a clearer source code.
>  Plus, it guarantees that anyone with a modern computer on basically
>  ANY operating system will be able to open the file and see exactly
>  the same thing as everyone else.

None of what I said should be interpreted as complaints, merely 
observations. I mentioned Tidy only for the sake of completeness, so the 
process I used would be (hopefully) transparent. There is also the risk 
that Tidy cleaned something up for me, so there could be minor (and 
probably trivial) flaws that I missed.

The principal reason I converted to ASCII is that very few of the 
editors I use grok UTF-8; it's just easier to convert to ASCII or 
Latin-1 if I'm going to edit the file. This does raise and interesting 
issue, though.

In my mind, the biggest advantage to HTML over any other format (with 
the possible exception of TEI+CSS) is that in permits Just In Time 
layout. With Just In Time layout you want to push all of the layout 
decisions as close to the end user as you can, and give the end user the 
maximum amount of control over those layout decisions.

If you choose to embed the PGTEI HTML stylesheet in the document file 
itself, and don't provide a link for a user-supplied CSS file, in order 
for the end user to make any changes to the presentation at all, she/he 
will have to edit the souirce document file. Because support for UTF-8 
is spotty, I would think that if you're going to force an end user to 
edit the file to make any stylistic changes you probably ought to output 
pure ASCII, so as to support the most common software tools. As you 
pointed out, there is nothing you can encode in UTF-8 that you can't 
encode in ASCII; it just makes the resulting source look a little 
messier. This is probably an OK trade-off in this case.

On the other hand, if you created an HTML file that looked acceptable in 
the /absence/ of any stylesheet (by using standard HTML tags), contained 
miminal styling on individual elements, and referenced all stylesheets 
from external files, you would have removed just about every reason 
there would be to need to edit the source file. In this scenario using 
UTF-8 encoding makes much more sense.


From joshua at hutchinson.net  Wed Nov 15 14:05:03 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov 15 14:05:06 2006
Subject: [gutvol-d] TEI to XHTML transformations
Message-ID: <17115451.1163628303898.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: marcello@perathoner.de
>
>joshua@hutchinson.net wrote:
>
>
>> <p class="tei tei-p" style="margin-bottom: 1.00em">
>> 
>> The class has two entries, one for tei (which pulls from the CSS 
"p.
>> tei" line) and tei-p (which pulls nothing from anywhere that I 
could 
>> find).  This is redundant and useless.
>
>
>If somebody wants to embed a PG text into his own website, he needs 
some 
>way to differentiate between an <elementname> of his website and an 
><elementname> of the PG text. Every element of our HTML has a 
>class="tei" and a class="tei-teielementname" attached to allow just 
this.
>

I have to admit, I'm not following the logic here completely.  I like 
the idea of a class attribute on every element in the file.  I don't 
see why each one has TWO class attributes.  p.tei seems sufficient.

ie.
<p class="tei">

>
>The only users that will have a problem with the style attributes 
are 
>those who want to change the look of the text, eg. display all 
italic 
>text as roman green instead. These users can post-process the html 
with 
>perl or patch the TEI and feed it thru the online converter or 
install 
>and patch the converter locally.

This one I'm not sure I agree with this (if, indeed, I'm understanding 
this one correctly, either).  Right now, we have class assignments 
which provide a style ... but then we have a style attribute which 
overrides the global class.  If we got rid of the individual style 
attribute, then a single change in the CSS section at the top of the 
HTML would allow someone to make a global change without the 
possibility of a local override.


>
>The reason I'm using HTML tables for TEI lists is that there is no 
>standard construct in HTML that lets you choose an arbitrary list 
>marker. Say like this:
>
>   1. First
>   AA. Second
>   $$$. Third
>   Another good point is this: Fourth
>

THAT was the reason.  I knew we had discussed this in the past, but I 
couldn't bring it to mind.  Thanks, Marcello.

(And for those interested, I have had texts that had this kind of list 
weirdness!)

Thanks,
Josh
From marcello at perathoner.de  Wed Nov 15 14:49:01 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov 15 14:49:04 2006
Subject: [gutvol-d] TEI to XHTML transformations
In-Reply-To: <17115451.1163628303898.JavaMail.?@fh1037.dia.cp.net>
References: <17115451.1163628303898.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <455B995D.9030404@perathoner.de>

joshua@hutchinson.net wrote:

> I have to admit, I'm not following the logic here completely.  I like 
> the idea of a class attribute on every element in the file.  I don't 
> see why each one has TWO class attributes.  p.tei seems sufficient.

The converter has to output <html:div> for quite a few different tei 
elements for want of a better element. To differentiate between them you 
can use the tei-elementname class.

If you want docTitle rendered in red you say:

   .tei-docTitle { color: red; }


> This one I'm not sure I agree with this (if, indeed, I'm understanding 
> this one correctly, either).  Right now, we have class assignments 
> which provide a style ... but then we have a style attribute which 
> overrides the global class.  If we got rid of the individual style 
> attribute, then a single change in the CSS section at the top of the 
> HTML would allow someone to make a global change without the 
> possibility of a local override.

Since we have to embed the css into every file, the difference is 
neglectible. If the user knows enough to change the class attribute she 
will know enough to use the search / replace function of her editor: a 
few keystrokes more but same results.

Remember that we are talking about attributes like italics and 
superscript. Attributes that nobody will want to change. Now if somebody 
could give me a valid reason why people should want to change a span of 
italic text into something else ...

Attributes people actually may want to change like: font, base font size 
and background-color are never used in the style attributes.


It would be different if we had an external style sheet. By changing the 
style sheet all downloaded texts would change appearance at once. But 
the WWers nixed that, so its no use putting much programming effort there.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From lee at novomail.net  Wed Nov 15 19:27:41 2006
From: lee at novomail.net (Lee Passey)
Date: Wed Nov 15 19:26:56 2006
Subject: [gutvol-d] TEI to XHTML transformations
In-Reply-To: <455B995D.9030404@perathoner.de>
References: <17115451.1163628303898.JavaMail.?@fh1037.dia.cp.net>
	<455B995D.9030404@perathoner.de>
Message-ID: <455BDAAD.8080105@novomail.net>

Marcello Perathoner wrote:
> Remember that we are talking about attributes like italics and 
> superscript. Attributes that nobody will want to change. Now if 
> somebody could give me a valid reason why people should want to change 
> a span of italic text into something else ...

A woman I know has a hard time reading the slanted text that is 
sometimes used for italics. Her solution is to have the italicized text 
rendered in a different font, such as Comic Sans MS which is both 
distinctive and legible.

I further believe that any file labeled HTML should be adequately 
rendered by a User Agent which does not support stylesheets, such as 
MobiPocket Reader or ?Book reader. Thus, "<span 
style='font-style:italic'>" is not an adequate replacement for <i>, and 
"<span style="font-size: 60%; vertical-align: super">" is not an 
adequate replacement for <sup>.

> Attributes people actually may want to change like: font, base font 
> size and background-color are never used in the style attributes.

OTOH, attributes people may want to change like: text-align and top and 
bottom margins are frequently used in the style attributes.
From sly at victoria.tc.ca  Wed Nov 15 22:14:41 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov 15 22:14:53 2006
Subject: [gutvol-d] Christmas-related text
Message-ID: <Pine.GSO.4.58.0611152214010.18164@vtn1.victoria.tc.ca>


I've put the list of Christmas-related material I've organized up at:
http://www.gutenberg.org/wiki/Christmas_(Bookshelf)

Some of the subdivisions are perhaps not very exact, but it's
better than having just one long list.

I would welcome any efforts to add to it, or improve it.

Thanks,
Andrew
From Bowerbird at aol.com  Mon Nov 20 13:55:50 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Nov 20 13:56:04 2006
Subject: [gutvol-d] gvd061120 -- the good
Message-ID: <d1b.44c268.32937e66@aol.com>

michael hart tells us december 10th has been a significant day
for him over the years, and this year will be no exception, since
the project gutenberg library will pass 20,000 items before then.

(with all the audio files that are being posted now, it becomes
more and more inaccurate to refer to those items as _e-texts_.
then again, my local library also has more stuff besides books.)

it was just 3 years ago that we were celebrating 10,000 e-texts.

congratulations, michael, on your vision and your dedication,
and your success in building the cyberspace library which has
attracted the volunteer efforts of so many committed people...

what you have accomplished is truly remarkable.

when i give thanks on thursday, you'll get prominent mention.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061120/4949e59b/attachment.html
From Bowerbird at aol.com  Tue Nov 21 12:26:21 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov 21 12:26:52 2006
Subject: [gutvol-d] gvd061121 -- the good, the bad
Message-ID: <3c9.f52a19f.3294baed@aol.com>

when i go away, this list just kind of dries up into nothing.

***

here's a curious little factoid.   since i started my babelfish
series on our little open-source project on this listserve,
using "my antonia" as the content, josh has put it into .tei
and submitted it to the p.g. library.   see if you can find it,
and download the .pdf and the .html to take a look at it...

we'll soon have some fun doing straight-up comparisons.

this will be good from the standpoint of the end-readers.

and it will be bad for any side that ends up looking bad...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061121/fc77dd07/attachment.html
From j.hagerson at comcast.net  Tue Nov 21 20:09:57 2006
From: j.hagerson at comcast.net (John Hagerson)
Date: Tue Nov 21 20:10:39 2006
Subject: [gutvol-d] New goal?
Message-ID: <018b01c70dec$15a4e700$1f12fea9@sarek>

The smaller that "20,000 books" appears in the rear-view mirror, the
stranger the daily e-mail progress reports appear. Would anyone care to pick
the next numerical goal and update the progress report messages
appropriately?

Thank you.


From joshua at hutchinson.net  Wed Nov 22 05:00:10 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov 22 05:00:50 2006
Subject: [gutvol-d] New goal?
Message-ID: <1642873.1164200410877.JavaMail.?@fh1037.dia.cp.net>

Well, we haven't actually hit 20,000 yet.  The progress report counts 
in Australia book count (and maybe PG Europe?)

Straight PG book count numbers are right around 19900.  We should hit 
20,000 in the next two weeks sometime.

Josh

>----Original Message----
>From: j.hagerson@comcast.net
>Date: Nov 21, 2006 23:09 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d@lists.pglaf.
org>
>Subj: [gutvol-d] New goal?
>
>The smaller that "20,000 books" appears in the rear-view mirror, the
>stranger the daily e-mail progress reports appear. Would anyone care 
to pick
>the next numerical goal and update the progress report messages
>appropriately?
>
>Thank you.
>
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From jeroen.mailinglist at bohol.ph  Wed Nov 22 12:57:43 2006
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Wed Nov 22 12:57:00 2006
Subject: [gutvol-d] New goal?
In-Reply-To: <1642873.1164200410877.JavaMail.?@fh1037.dia.cp.net>
References: <1642873.1164200410877.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <4564B9C7.3060600@bohol.ph>


After 20,000 comes 25,000, but that is quite early. I next great event
would be 50,000, but may be too far in the future, maybe we can go for
33,333....

Some other statistics may also be interesting trivia...

How many characters
How many words
How many pages
How many illustrations
How many HTML tags

Have we included.

Jeroen.


joshua@hutchinson.net wrote:
> Well, we haven't actually hit 20,000 yet.  The progress report counts 
> in Australia book count (and maybe PG Europe?)
>
> Straight PG book count numbers are right around 19900.  We should hit 
> 20,000 in the next two weeks sometime.
>
> Josh
>
>   
From gbnewby at pglaf.org  Wed Nov 22 17:41:55 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Nov 22 17:41:57 2006
Subject: [gutvol-d] New goal?
In-Reply-To: <1642873.1164200410877.JavaMail.?@fh1037.dia.cp.net>
References: <1642873.1164200410877.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <20061123014155.GA6963@mail.pglaf.org>

On Wed, Nov 22, 2006 at 01:00:10PM +0000, joshua@hutchinson.net wrote:
> Well, we haven't actually hit 20,000 yet.  The progress report counts 
> in Australia book count (and maybe PG Europe?)
> 
> Straight PG book count numbers are right around 19900.  We should hit 
> 20,000 in the next two weeks sometime.

BTW, we don't yet have a particular eBook scheduled for PG's
#20000.  If you have suggestions, please speak up -- soon!
  -- Greg


> Josh
> 
> >----Original Message----
> >From: j.hagerson@comcast.net
> >Date: Nov 21, 2006 23:09 
> >To: "Project Gutenberg Volunteer Discussion"<gutvol-d@lists.pglaf.
> org>
> >Subj: [gutvol-d] New goal?
> >
> >The smaller that "20,000 books" appears in the rear-view mirror, the
> >stranger the daily e-mail progress reports appear. Would anyone care 
> to pick
> >the next numerical goal and update the progress report messages
> >appropriately?
> >
> >Thank you.
> >
> >
> >_______________________________________________
> >gutvol-d mailing list
> >gutvol-d@lists.pglaf.org
> >http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
> 
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
From tb at baechler.net  Thu Nov 23 01:05:26 2006
From: tb at baechler.net (Tony Baechler)
Date: Thu Nov 23 01:04:34 2006
Subject: [gutvol-d] New goal?
In-Reply-To: <20061123014155.GA6963@mail.pglaf.org>
References: <1642873.1164200410877.JavaMail.?@fh1037.dia.cp.net>
	<20061123014155.GA6963@mail.pglaf.org>
Message-ID: <20061123090435.SHSP12305.dukecmmtar01.coxmail.com@Tony.baechler.net>

Hi,

I know this is already in the PG collection, but _20,000 Leagues 
Under the Sea_ comes to mind for book 20000.

At 05:41 PM 11/22/06 -0800, you wrote:
>BTW, we don't yet have a particular eBook scheduled for PG's
>#20000.  If you have suggestions, please speak up -- soon!
>   -- Greg


-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.14.13/546 - Release Date: 11/22/06


From Bowerbird at aol.com  Mon Nov 27 13:35:26 2006
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 27 Nov 2006 16:35:26 EST
Subject: [gutvol-d] errors in the strangest places
Message-ID: <c7b.656de24.329cb41e@aol.com>

even on a children's book, with <15k of text,
one can find errors in the work of d.p./p.g.,
as shown in e-text #19915.

>    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy028.png
pbook>    "HERE, Charlotte," said Mamma one day.
dp/pg>    "Here, Charlotte," Said Mamma One Day.

>    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy088.png
pbook>    I never Saw a Girl Or Boy
dp/pg>    I NEVER saw a girl or boy

those u.r.l.s show the scans of the pages, which -- in d.p. fashion --
are inexplicably huge, so you'll need to scroll the image if your browser
doesn't resize larger images to the window-size for you automatically...

i'm fine with eliminating uppercasing of the lead-words in a chapter
-- i do that myself -- but the uppercasing of the other words is wrong.

of course these aren't "serious", verging on complete meaninglessness
as "errors".   but for an outfit whose motto is "make it match the page",
how can mismatches like this be made, and go through so many eyes?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061127/2e5952a1/attachment.htm 

From Bowerbird at aol.com  Mon Nov 27 14:13:07 2006
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 27 Nov 2006 17:13:07 EST
Subject: [gutvol-d] errors in the strangest places
Message-ID: <cd9.42981b2.329cbcf3@aol.com>

i said:
>    pbook>?? I never Saw a Girl Or Boy
>    dp/pg>?? I NEVER saw a girl or boy

oops!   made an error of my own!

these two should be switched...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061127/2fa456a9/attachment.htm 

From gbnewby at pglaf.org  Mon Nov 27 16:57:43 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon, 27 Nov 2006 16:57:43 -0800
Subject: [gutvol-d] errors in the strangest places
In-Reply-To: <c7b.656de24.329cb41e@aol.com>
References: <c7b.656de24.329cb41e@aol.com>
Message-ID: <20061128005743.GA26198@mail.pglaf.org>

On Mon, Nov 27, 2006 at 04:35:26PM -0500, Bowerbird at aol.com wrote:
> even on a children's book, with <15k of text,
> one can find errors in the work of d.p./p.g.,
> as shown in e-text #19915.
> 
> >    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy028.png
> pbook>    "HERE, Charlotte," said Mamma one day.
> dp/pg>    "Here, Charlotte," Said Mamma One Day.
> 
> >    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy088.png
> pbook>    I never Saw a Girl Or Boy
> dp/pg>    I NEVER saw a girl or boy
> 
> those u.r.l.s show the scans of the pages, which -- in d.p. fashion --
> are inexplicably huge, so you'll need to scroll the image if your browser
> doesn't resize larger images to the window-size for you automatically...
> 
> i'm fine with eliminating uppercasing of the lead-words in a chapter
> -- i do that myself -- but the uppercasing of the other words is wrong.

You might think it's wrong.  I think it's within the latitude
given to eBook producers to make their own choices concerning 
typography.
  -- Greg

> of course these aren't "serious", verging on complete meaninglessness
> as "errors".   but for an outfit whose motto is "make it match the page",
> how can mismatches like this be made, and go through so many eyes?
> 
> -bowerbird

> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d


From sly at victoria.tc.ca  Mon Nov 27 17:46:05 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Mon, 27 Nov 2006 17:46:05 -0800 (PST)
Subject: [gutvol-d] errors in the strangest places
In-Reply-To: <20061128005743.GA26198@mail.pglaf.org>
References: <c7b.656de24.329cb41e@aol.com>
	<20061128005743.GA26198@mail.pglaf.org>
Message-ID: <Pine.GSO.4.58.0611271740590.25717@vtn1.victoria.tc.ca>

That's a good point. There have been a decent number of times
I've glances at some PG text and gritted my teeth, thinking
"_I_ wouldn't do it that way." But I've had to get used to
moving on. The latest in my mind was an illustrated children's
poetry book which laid out everything in html using only
tables and <br> tags.

Andrew

On Mon, 27 Nov 2006, Greg Newby wrote:

> >
> > i'm fine with eliminating uppercasing of the lead-words in a chapter
> > -- i do that myself -- but the uppercasing of the other words is wrong.
>
> You might think it's wrong.  I think it's within the latitude
> given to eBook producers to make their own choices concerning
> typography.
>   -- Greg
>

From Bowerbird at aol.com  Mon Nov 27 18:31:29 2006
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 27 Nov 2006 21:31:29 EST
Subject: [gutvol-d] errors in the strangest places
Message-ID: <c4e.9179fe5.329cf981@aol.com>

greg said:
>    You might think it's wrong.? 
>    I think it's within the latitude
>    given to eBook producers 
>   to make their own choices 
>   concerning typography.

if it does indeed reflect a d.p. choice,
i would love to hear the rationale...

however, since no other section-starts
had such capitalization added to them,
and given d.p. policy to match the page,
i believe d.p. will agree these are errors;
they don't change capitalization willy-nilly.

my guess is that it was small-cap markup
gone awry.   the argument for small-caps
was that, in some cases (which i'd argue
are extremely rare), they are semantically
significant.   in practice, though, especially
because of that "match the page" mantra,
all kinds of non-semantic small-caps are
currently being "matched".   since it doesn't
carry over into the plain-ascii text version,
it doesn't bother me.   but i do believe it's
an interesting aspect of the d.p. thinking,
that the rationale for _adoping_ a process
can get lost in a too-literal interpretation...

and, just because the mere fact of discussing
something sometimes makes it appear to be
more "important" than it really is, i will stress
one more time that these "errors" are trivial,
so trivial that i feel a need to put quotemarks
around the word "errors".   the primary reason
that i brought this up is that it happened on
such a _simple_ and straightforward book...
certainly nobody would say that these were
_difficult_ pages to proofread.   yet still...

and if i asserted that d.p. makes mistakes on
even the simplest of pages, i'm positive that
someone here would accuse me of "trolling",
and "harboring ill will toward the d.p. people".
um, no.   i'm just reporting the facts.   sorry...

another reason i brought it up is because
-- if i wouldn't have been able to point to
the _scans_ -- nobody would have probably
believed these were "errors" in the first place.

i distance myself from noring's "authenticity"
b.s., but i do believe that if you are going to
state that your text comes from one specific
version of one specific book, it should match,
and any deviations should be due to a policy
that you can articulate clearly.   now of course,
errors do happen, because we're "only" human,
so you just go in and fix 'em when you find 'em.

but you mount the scans so the general public
can help you find them.   it's just common sense.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061127/012c0507/attachment.htm 

From vze3rknp at verizon.net  Mon Nov 27 19:28:59 2006
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Mon, 27 Nov 2006 22:28:59 -0500
Subject: [gutvol-d] errors in the strangest places
In-Reply-To: <c7b.656de24.329cb41e@aol.com>
References: <c7b.656de24.329cb41e@aol.com>
Message-ID: <456BACFB.6050501@verizon.net>

Bowerbird at aol.com wrote:

> even on a children's book, with <15k of text,
> one can find errors in the work of d.p./p.g.,
> as shown in e-text #19915.
>
> >   http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy028.png
> pbook>   "HERE, Charlotte," said Mamma one day.
> dp/pg>   "Here, Charlotte," Said Mamma One Day.
>
> >   http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy088.png
> pbook>   I never Saw a Girl Or Boy
> dp/pg>   I NEVER saw a girl or boy
>
> those u.r.l.s show the scans of the pages, which -- in d.p. fashion --
> are inexplicably huge, so you'll need to scroll the image if your browser
> doesn't resize larger images to the window-size for you automatically...
>
> i'm fine with eliminating uppercasing of the lead-words in a chapter
> -- i do that myself -- but the uppercasing of the other words is wrong.

In these cases, the upper case words are at the beginning of poems, 
which are
treated just like chapters by DP folks. In fact, the first words of all 
the poems are in
upper case in the scans, not just the two examples you gave. So these were
handled consistently across the entire e-book, in accordance with standard
DP (and PG) guidelines, and even in a manner that you say you approve of
(assuming that you agree that beginnings of poems are like beginnings of 
chapters).

JulietS


From jon at noring.name  Mon Nov 27 21:38:02 2006
From: jon at noring.name (Jon Noring)
Date: Mon, 27 Nov 2006 22:38:02 -0700
Subject: [gutvol-d] errors in the strangest places
In-Reply-To: <456BACFB.6050501@verizon.net>
References: <c7b.656de24.329cb41e@aol.com> <456BACFB.6050501@verizon.net>
Message-ID: <779157567.20061127223802@noring.name>

Juliet wrote:
> Bowerbird wrote:

>> even on a children's book, with <15k of text,
>> one can find errors in the work of d.p./p.g.,
>> as shown in e-text #19915.
>>
>>    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy088.png
>>
>> pbook>   I never Saw a Girl Or Boy
>> dp/pg>   I NEVER saw a girl or boy

> In these cases, the upper case words are at the beginning of poems, 
> which are treated just like chapters by DP folks. In fact, the first
> words of all the poems are in upper case in the scans, not just the
> two examples you gave. So these were handled consistently across the
> entire e-book, in accordance with standard DP (and PG) guidelines,
> and even in a manner that you say you approve of (assuming that you
> agree that beginnings of poems are like beginnings of chapters).

Hmmm, referring to:

   http://pge.rastko.net/dirs/1/9/9/1/19915/19915-h/19915-h.htm

It is clear by looking at all the other poetry, that indeed Bowerbird
is right, in my estimation, and that line should be:

"I never saw a girl or boy"

not

"I never Saw a Girl or Boy"


And the original page scan bears this out -- it has "I NEVER saw a girl
or boy", where "I NEVER" is in small-caps, per convention. "saw",
"girl" and "boy" are not capitalized in the original. Note, too, that
poem does have a title, "SOPHIE SPOILALL"

(The above is further verified by looking at the other original page
scans. See, for example, page 80, which is the start of another poem
in the book. The words in the first line, other than the leading word
"Oh!" (in small-caps per convention), are likewise not capitalized:

   http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy080.png )


Btw, nicely done HTML version of the book at rastko.net! (I'll have to
record the color values for the background pages -- nice and warm
and old paper-like.)

DP did a great job on this book, despite the capitalization error, which
is something correctable if the DP folk believe it to be an error.


Jon Noring


From Bowerbird at aol.com  Mon Nov 27 23:50:28 2006
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 28 Nov 2006 02:50:28 EST
Subject: [gutvol-d] errors in the strangest places
Message-ID: <d0c.318cec4.329d4444@aol.com>

juliet said:
>    So these were handled consistently across the entire e-book, 
>   in accordance with standard DP (and PG) guidelines, and even 
>    in a manner that you say you approve of (assuming that you agree 
>   that beginnings of poems are like beginnings of chapters).

take another look please.


>    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy028.png
>    pbook>?? "HERE, Charlotte," said Mamma one day.
>    dp/pg>?? "Here, Charlotte," Said Mamma One Day.

"said" and "one" and "day" should be lowercased.

i agree that "here" should be initial-cap-only, since 
the all-uppercase in the p-book was "ornamental"...


>?? http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy088.png
>    pbook>?? I NEVER saw a girl or boy
>    dp/pg>?? I never Saw a Girl Or Boy

"saw" and "girl" and "or" and "boy" should be lowercased.

i do agree that "never" should be lowercased, since 
the all-uppercase in the p-book was "ornamental"...

i could agree that retaining ornamental uppercasing
is an acceptable option -- even though i don't do it.
i think a better policy is to discard the ornamental
and use all-caps only when the _author_ wrote that.
but i could not count that as "a digitization error"...

still, i don't think anyone is arguing that an extension
of ornamental initial-caps to additional words is wise.
(and i don't think that's what juliet suggested here...)

also, to repeat, so no one gets the wrong impression:
these "errors" are not important in and of themselves.
only that they were on a simple text, and got posted.

what _should_ be of concern here is that _someone_
needs to double-check _every_ change that is made,
even when the change has been made by what would
ordinarily be considered "the last person" in the chain.

(whether that's the "last proofer" or the "last formatter"
or the "last postprocessor" or the "last whitewasher" is
a matter that will need to be worked out by the powers;
my concern is with the sausage plopped out at the end.)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061128/14eae465/attachment-0001.htm 

From mlockey at magma.ca  Tue Nov 28 08:23:26 2006
From: mlockey at magma.ca (Michael Lockey)
Date: Tue, 28 Nov 2006 11:23:26 -0500
Subject: [gutvol-d] Errors...
In-Reply-To: <mailman.27.1164700238.985.gutvol-d@lists.pglaf.org>
Message-ID: <200611281621.kASGLUnB017084@mail1.magma.ca>

-----Original Message-----

pbook>   "HERE, Charlotte," said Mamma one day.
dp/pg>   "Here, Charlotte," Said Mamma One Day.

>   http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy088.png
pbook>   I never Saw a Girl Or Boy
dp/pg>   I NEVER saw a girl or boy

of course these aren't "serious", verging on complete meaninglessness
as "errors".  but for an outfit whose motto is "make it match the page",
how can mismatches like this be made, and go through so many eyes?

-bowerbird

...Possibly because some folk prefer talk to being part of the solution...

Michael Lockey


From Bowerbird at aol.com  Tue Nov 28 09:58:39 2006
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 28 Nov 2006 12:58:39 EST
Subject: [gutvol-d] Errors...
Message-ID: <bfc.a005a46.329dd2cf@aol.com>

michael said:
>    ...Possibly because some folk prefer talk to being part of the 
solution...

what _is_ the solution, michael?

i've suggested that any change be double-checked after it's made.

but i'd love to hear what _your_ recommendation might be...

or are you just "talking" now (or should i say "throwing rocks"?)
without being able to suggest concrete "solutions" of your own?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061128/758fc87e/attachment.htm 

From ke at gnu.franken.de  Tue Nov 28 20:57:05 2006
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Wed, 29 Nov 2006 05:57:05 +0100
Subject: [gutvol-d] errors in the strangest places
In-Reply-To: <779157567.20061127223802@noring.name> (Jon Noring's message of
	"Mon\, 27 Nov 2006 22\:38\:02 -0700")
References: <c7b.656de24.329cb41e@aol.com> <456BACFB.6050501@verizon.net>
	<779157567.20061127223802@noring.name>
Message-ID: <sh7ixe4zq6.fsf@tux.gnu.franken.de>

Jon Noring <jon at noring.name> writes:

> Hmmm, referring to:

[...]

> Btw, nicely done HTML version of the book at rastko.net! (I'll have to
> record the color values for the background pages -- nice and warm
> and old paper-like.)
>
> DP did a great job on this book, despite the capitalization error, which
> is something correctable if the DP folk believe it to be an error.

I think different.  Not bad, but bolding the text seems to be invented
by the post-processor and I do not like it.  Black on a white (or
lightgrey) background would be much better.  A lightgrey background
would also be better because of the color plates.

-- 
http://www.gnu.franken.de/ke/                           |      ,__o
                                                        |    _-\_<,
                                                        |   (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4  AA7F C90A 35C3 E9D0 5D1C

From prosfilaes at gmail.com  Thu Nov 30 06:05:12 2006
From: prosfilaes at gmail.com (David Starner)
Date: Thu, 30 Nov 2006 08:05:12 -0600
Subject: [gutvol-d] Fwd: Undelivered Mail Returned to Sender
In-Reply-To: <20061130135841.7AB9935270E@mail1.pglaf.org>
References: <20061130135841.7AB9935270E@mail1.pglaf.org>
Message-ID: <6d99d1fd0611300605rae12b57p6bfea0f04a421713@mail.gmail.com>

Is the errata address having problems?

---------- Forwarded message ----------
From: Mail Delivery System <MAILER-DAEMON at pglaf.org>
Date: Nov 30, 2006 7:58 AM
Subject: Undelivered Mail Returned to Sender
To: prosfilaes at gmail.com


This is the mail system at host mail1.pglaf.org.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to <postmaster>

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

                   The mail system

<errata at pglaf.org>: Command died with status 1: " cat >>
    /home/cleared/errata-mbox.txt". Command output: sh:
    /home/cleared/errata-mbox.txt: Permission denied


Final-Recipient: rfc822; errata at pglaf.org
Original-Recipient: rfc822;errata at pglaf.org
Action: failed
Status: 5.3.0
Diagnostic-Code: x-unix; sh: /home/cleared/errata-mbox.txt: Permission denied


---------- Forwarded message ----------
From: "David Starner" <prosfilaes at gmail.com>
To: errata at pglaf.org
Date: Thu, 30 Nov 2006 07:51:32 -0600
Subject: errata for etext 19964, Regeneration
The line:

Somehow Builder had known they wore going to kill him before arousing the
rest of the tribe to the fact that Thor was back.

should obviously be

Somehow Builder had known they were going to kill him before arousing the
rest of the tribe to the fact that Thor was back.

It's on page p086 of the original scans, and it's clearly "were" in those.

From joshua at hutchinson.net  Thu Nov 30 06:38:34 2006
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Thu, 30 Nov 2006 14:38:34 +0000 (UTC)
Subject: [gutvol-d] Fwd: Undelivered Mail Returned to Sender
Message-ID: <5515118.1164897514424.JavaMail.?@fh1035.dia.cp.net>

Probably related to the recent server move.  I'll let Greg figure that 
out.

In the meantime, I'll fix this error and repost to the archive.

Thanks,
Josh

>----Original Message----
>From: prosfilaes at gmail.com
>Date: Nov 30, 2006 9:05 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
org>
>Subj: [gutvol-d] Fwd: Undelivered Mail Returned to Sender
>
>Is the errata address having problems?
>
>---------- Forwarded message ----------
>From: Mail Delivery System <MAILER-DAEMON at pglaf.org>
>Date: Nov 30, 2006 7:58 AM
>Subject: Undelivered Mail Returned to Sender
>To: prosfilaes at gmail.com
>
>
>This is the mail system at host mail1.pglaf.org.
>
>I'm sorry to have to inform you that your message could not
>be delivered to one or more recipients. It's attached below.
>
>For further assistance, please send mail to <postmaster>
>
>If you do so, please include this problem report. You can
>delete your own text from the attached returned message.
>
>                   The mail system
>
><errata at pglaf.org>: Command died with status 1: " cat >>
>    /home/cleared/errata-mbox.txt". Command output: sh:
>    /home/cleared/errata-mbox.txt: Permission denied
>
>
>Final-Recipient: rfc822; errata at pglaf.org
>Original-Recipient: rfc822;errata at pglaf.org
>Action: failed
>Status: 5.3.0
>Diagnostic-Code: x-unix; sh: /home/cleared/errata-mbox.txt: 
Permission denied
>
>
>
>---------- Forwarded message ----------
>From: "David Starner" <prosfilaes at gmail.com>
>To: errata at pglaf.org
>Date: Thu, 30 Nov 2006 07:51:32 -0600
>Subject: errata for etext 19964, Regeneration
>The line:
>
>Somehow Builder had known they wore going to kill him before arousing 
the
>rest of the tribe to the fact that Thor was back.
>
>should obviously be
>
>Somehow Builder had known they were going to kill him before arousing 
the
>rest of the tribe to the fact that Thor was back.
>
>It's on page p086 of the original scans, and it's clearly "were" in 
those.
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From joey at joeysmith.com  Thu Nov 30 11:15:46 2006
From: joey at joeysmith.com (joey)
Date: Thu, 30 Nov 2006 12:15:46 -0700
Subject: [gutvol-d] Fwd: Undelivered Mail Returned to Sender
In-Reply-To: <5515118.1164897514424.JavaMail.?@fh1035.dia.cp.net>
References: <5515118.1164897514424.JavaMail.?@fh1035.dia.cp.net>
Message-ID: <20061130191545.GA9321@joeysmith.com>

I think this issue should now be fixed, if someone wants to send a test
and let me know how it goes.

On Thu, Nov 30, 2006 at 02:38:34PM +0000, joshua at hutchinson.net wrote:
> Probably related to the recent server move.  I'll let Greg figure that 
> out.
> 
> In the meantime, I'll fix this error and repost to the archive.
> 
> Thanks,
> Josh
> 
> >----Original Message----
> >From: prosfilaes at gmail.com
> >Date: Nov 30, 2006 9:05 
> >To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
> org>
> >Subj: [gutvol-d] Fwd: Undelivered Mail Returned to Sender
> >
> >Is the errata address having problems?

From hart at pglaf.org  Thu Nov 30 13:46:16 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu, 30 Nov 2006 13:46:16 -0800 (PST)
Subject: [gutvol-d] Attn: eBook Standards
Message-ID: <Pine.LNX.4.64.0611301345450.20507@mail1.pglaf.org>


---------- Forwarded message ----------
Date: Tue, 28 Nov 2006 20:06:46 -0700
From: Edward Goas <edwgoas at uat.edu>
To: hart at pobox.com
Subject: Attn: Michael Hart

Hi Mr Hart, my name is Edward Goas. I found your name and email address on 
your organization?s public website. I assure you THIS IS NOT SPAM.

I am a graduate student at the University of Advancing Technology in Tempe, 
AZ. I?m a candidate for an M.S. in Technology Management and have chosen to 
complete my graduate thesis on forecasting the success of a proposed eBook 
standard file format and whether a standard eBook format will solve any 
existing issues.

I am wondering if you can take 5 minutes and submit a survey for my thesis. 
Feel free to leave any answers blank and submit an incomplete survey; it?s 
quite all right!

Here is a link to the survey:
http://www.questionpro.com/akira/TakeSurvey?id=570587

If you complete a survey, I will happily provide you with my data and thesis 
upon its completion. You may use the research and data any way you?d like. 
I?ll provide it in any format you?d like. Please don't hesitate email me with 
any questions. Even if you choose not to participate, send me an email if 
you?d like a copy of the final thesis document and data.

If you do not feel comfortable taking this survey and know someone else in 
your organization that might be more qualified, please forward this message 
on.

Thank you for your time.

Sincerely,
Edward Goas
University of Advancing Technology
edwgoas at uat.edu
(201) 819-9687

From gbnewby at pglaf.org  Thu Nov 30 15:38:00 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu, 30 Nov 2006 15:38:00 -0800
Subject: [gutvol-d] Fwd: Undelivered Mail Returned to Sender
In-Reply-To: <20061130191545.GA9321@joeysmith.com>
References: <5515118.1164897514424.JavaMail.?@fh1035.dia.cp.net>
	<20061130191545.GA9321@joeysmith.com>
Message-ID: <20061130233800.GA23092@mail.pglaf.org>

On Thu, Nov 30, 2006 at 12:15:46PM -0700, joey wrote:
> I think this issue should now be fixed, if someone wants to send a test
> and let me know how it goes.

Actually, it was still broken :)

Should be fixed now.
   -- Greg

> On Thu, Nov 30, 2006 at 02:38:34PM +0000, joshua at hutchinson.net wrote:
> > Probably related to the recent server move.  I'll let Greg figure that 
> > out.
> > 
> > In the meantime, I'll fix this error and repost to the archive.
> > 
> > Thanks,
> > Josh
> > 
> > >----Original Message----
> > >From: prosfilaes at gmail.com
> > >Date: Nov 30, 2006 9:05 
> > >To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
> > org>
> > >Subj: [gutvol-d] Fwd: Undelivered Mail Returned to Sender
> > >
> > >Is the errata address having problems?
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d